[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11219: [AIRFLOW-6585] Fixed Timestamp bug in RefreshKubeConfigLoader

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on pull request #11219:
URL: https://github.com/apache/airflow/pull/11219#issuecomment-701949674


   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contribution Guide 
(https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in 
the `docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst).
 Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze 
environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for 
testing locally. It's a heavy Docker image, but it ships with a working Airflow 
and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] mcepok opened a new pull request #11219: [AIRFLOW-6585] Fixed Timestamp bug in RefreshKubeConfigLoader

2020-10-01 Thread GitBox


mcepok opened a new pull request #11219:
URL: https://github.com/apache/airflow/pull/11219


   # Jira
   https://issues.apache.org/jira/browse/AIRFLOW-6585
   
   # Description
   This is basically #7153, which I extended by using pendulum to parse time 
strings, as asked in 
https://github.com/apache/airflow/pull/7153#pullrequestreview-389880157. Since 
that PR is closed, I opened a new one.
   **Original description:**
   When using the KubernetesPodOperator on an AWS Kubernetes cluster, the 
aws-iam-authenticator is used to obtain Kubernetes authentication tokens. The 
AWS tokens contain ISO-8601-formatted timestamps, which couldn't be parsed in 
case of a "Z" (Zulu time) timezone.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AIRFLOW-6585) Timestamp bug in RefreshKubeConfigLoader

2020-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205334#comment-17205334
 ] 

ASF GitHub Bot commented on AIRFLOW-6585:
-

mcepok opened a new pull request #11219:
URL: https://github.com/apache/airflow/pull/11219


   # Jira
   https://issues.apache.org/jira/browse/AIRFLOW-6585
   
   # Description
   This is basically #7153, which I extended by using pendulum to parse time 
strings, as asked in 
https://github.com/apache/airflow/pull/7153#pullrequestreview-389880157. Since 
that PR is closed, I opened a new one.
   **Original description:**
   When using the KubernetesPodOperator on an AWS Kubernetes cluster, the 
aws-iam-authenticator is used to obtain Kubernetes authentication tokens. The 
AWS tokens contain ISO-8601-formatted timestamps, which couldn't be parsed in 
case of a "Z" (Zulu time) timezone.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Timestamp bug in RefreshKubeConfigLoader
> 
>
> Key: AIRFLOW-6585
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6585
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor-kubernetes, executors, hooks
>Affects Versions: 1.10.7
>Reporter: Jan Brusch
>Assignee: Jan Brusch
>Priority: Major
>
> When using the KubernetesPodOperator on an AWS Kubernetes cluster, the 
> aws-iam-authenticator is used to obtain Kubernetes authentication tokens. The 
> AWS tokens contain ISO-8601-formatted timestamps, which couldn't be parsed in 
> case of a "Z" (Zulu time) timezone. This PR fixes the problem by converting 
> the "Z" timezone into a regular "+" offset format.
> Upon further review, this is only a problem with Python versions <= 3.6, but 
> that should not keep the issue from being fixed.
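The workaround described above can be sketched in plain Python. This is a minimal stdlib illustration of the "Z"-to-offset rewrite, not the merged PR's actual code (which delegates parsing to pendulum); `parse_expiry` and the sample timestamp are hypothetical names chosen for the example:

```python
from datetime import datetime


def parse_expiry(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as the token expiry returned
    by aws-iam-authenticator, e.g. '2020-10-01T17:00:00Z'.

    On Python <= 3.6, strptime's %z directive does not accept the
    literal 'Z' (Zulu time) suffix, so it is rewritten to the
    equivalent numeric offset '+0000' before parsing.
    """
    if ts.endswith("Z"):
        ts = ts[:-1] + "+0000"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")
```

On Python 3.7+, `datetime.fromisoformat` still rejects the bare "Z" suffix (until 3.11), so the rewrite remains a portable approach when a third-party parser is not available.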



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6585) Timestamp bug in RefreshKubeConfigLoader

2020-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205335#comment-17205335
 ] 

ASF GitHub Bot commented on AIRFLOW-6585:
-

boring-cyborg[bot] commented on pull request #11219:
URL: https://github.com/apache/airflow/pull/11219#issuecomment-701949674


   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contribution Guide 
(https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in 
the `docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst).
 Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze 
environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for 
testing locally. It's a heavy Docker image, but it ships with a working Airflow 
and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Timestamp bug in RefreshKubeConfigLoader
> 
>
> Key: AIRFLOW-6585
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6585
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor-kubernetes, executors, hooks
>Affects Versions: 1.10.7
>Reporter: Jan Brusch
>Assignee: Jan Brusch
>Priority: Major
>
> When using the KubernetesPodOperator on an AWS Kubernetes cluster, the 
> aws-iam-authenticator is used to obtain Kubernetes authentication tokens. The 
> AWS tokens contain ISO-8601-formatted timestamps, which couldn't be parsed in 
> case of a "Z" (Zulu time) timezone. This PR fixes the problem by converting 
> the "Z" timezone into a regular "+" offset format.
> Upon further review, this is only a problem with Python versions <= 3.6, but 
> that should not keep the issue from being fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] rounakdatta opened a new pull request #11220: introduce get_partition_by_name within hive hook

2020-10-01 Thread GitBox


rounakdatta opened a new pull request #11220:
URL: https://github.com/apache/airflow/pull/11220


   Hello Airflow Maintainers,
   This PR introduces the capability to get a partition object by name 
directly through a hook method. This is often useful for deriving 
statistics for a particular partition.
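As a rough illustration of what such a hook method might look like (a hypothetical sketch, not the PR's actual code; `HiveMetastoreHookSketch` and `FakeMetastoreClient` are stand-ins invented for this example), the method would typically delegate to the metastore client's `get_partition_by_name` call, which is part of the Hive metastore Thrift API:

```python
class FakeMetastoreClient:
    """Stand-in for a real Thrift metastore client; a real client
    returns a Partition thrift object rather than a dict."""

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        return {"db": db_name, "table": tbl_name, "partition": part_name}


class HiveMetastoreHookSketch:
    def __init__(self, client):
        self.client = client

    def get_partition_by_name(self, schema, table_name, partition_name):
        """Fetch a single partition by its name, where the name is
        'key=value' pairs joined by '/', e.g. 'ds=2020-10-01'."""
        return self.client.get_partition_by_name(
            schema, table_name, partition_name
        )
```

A caller could then inspect one partition directly, e.g. `hook.get_partition_by_name("default", "events", "ds=2020-10-01")`, instead of listing all partitions and filtering client-side.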
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11220: introduce get_partition_by_name within hive hook

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on pull request #11220:
URL: https://github.com/apache/airflow/pull/11220#issuecomment-701988305


   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contribution Guide 
(https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in 
the `docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst).
 Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze 
environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for 
testing locally. It's a heavy Docker image, but it ships with a working Airflow 
and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] mik-laj merged pull request #11216: Strict type checking for ssh

2020-10-01 Thread GitBox


mik-laj merged pull request #11216:
URL: https://github.com/apache/airflow/pull/11216


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[airflow] branch master updated (5093245 -> b6d5d1e)

2020-10-01 Thread kamilbregula
This is an automated email from the ASF dual-hosted git repository.

kamilbregula pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 5093245  Strict type coverage for Oracle and Yandex provider  (#11198)
 add b6d5d1e  Strict type checking for SSH (#11216)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/ssh/hooks/ssh.py | 32 +---
 airflow/providers/ssh/operators/ssh.py | 25 +
 2 files changed, 30 insertions(+), 27 deletions(-)




[GitHub] [airflow] turbaszek commented on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


turbaszek commented on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-702037209


   > This PR (with 2 Schedulers)
   > TI Lag Avg: 74.028s (± 17.215)
   > Time in Queued State: 68.708s (± 15.556)
   
   Is the last measurement correct?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


potiuk commented on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-702041010


   > > This PR (with 2 Schedulers)
   > > TI Lag Avg: 74.028s (± 17.215)
   > > Time in Queued State: 68.708s (± 15.556)
   > 
   > Is the last measurement correct?
   
   Yeah. Looks like copy&paste error ? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk edited a comment on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


potiuk edited a comment on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-702041010


   > > This PR (with 2 Schedulers)
   > > TI Lag Avg: 74.028s (± 17.215)
   > > Time in Queued State: 68.708s (± 15.556)
   > 
   > Is the last measurement correct?
   
   Yeah. Looks like copy&paste error from 1.10.10 ? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil edited a comment on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


kaxil edited a comment on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-701053868


   Ran some benchmarks against different versions (more comprehensive 
benchmarks will be provided -- consider this too as **_preliminary_**):
   
   **Scenario**: 10 Dag File | 1 Dag Per File | 10 Tasks per DAG
   **Configs**:
   - Executor: CeleryExecutor
   - Parallelism: 128
   - Workers: 4 (1 CPU - 3.75 GB memory)
   - Each Scheduler (1 CPU - 3.75 GB memory)
   
   **Airflow 1.10.10**
   TI Lag Avg: 64.845s  (± 7.995) 
   Time in Queued State: 57.6485s (± 7.395)
   
   **Airflow Master**
   TI Lag Avg: 48.5754s (± 62.991) 
   Time in Queued State: 11.909s (± 8.252)
   
   **This PR** (Single Scheduler)
   TI Lag Avg: 10.262s (± 5.299) 
   Time in Queued State: 9.3494s (± 5.355)
   
   --
   
   **Scenario**: 100 Dag Files | 1 Dag Per File | 10 Tasks per DAG
   **Configs**:
   - Executor: CeleryExecutor
   - Parallelism: 128
   - Workers: 4 (1 CPU - 3.75 GB memory)
   - Each Scheduler (1 CPU - 3.75 GB memory)
   
   
   **Airflow 1.10.10**
   TI Lag Avg: 809.537s (± 29.6730) 
   Time in Queued State: 751.241s (± 35.5153)
   
   **Airflow Master**
   TI Lag Avg: 392.806s (± 318.620) 
   Time in Queued State: 10.694s (± 9.930)
   
   **This PR (with 2 Schedulers)**
   TI Lag Avg: 74.028s (± 17.215) 
   Time in Queued State: 68.708s (± 15.556)
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


kaxil commented on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-702043623


   > > This PR (with 2 Schedulers)
   > > TI Lag Avg: 74.028s (± 17.215)
   > > Time in Queued State: 68.708s (± 15.556)
   > 
   > Is the last measurement correct?
   
   Whoops yeah updated **Time in Queued State: 751.241s (± 35.5153)** for 
1.10.10



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] turbaszek commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


turbaszek commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498160604



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   Do you think we should run this on schedule?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


kaxil commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498167461



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   I don't think we need to run it on a Schedule

##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   Running it on PRs and Master would suffice





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


kaxil commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498168087



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   ```suggestion
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] JeffryMAC commented on issue #11163: Fully separate provider packages from the Airflow core (AIP-8)

2020-10-01 Thread GitBox


JeffryMAC commented on issue #11163:
URL: https://github.com/apache/airflow/issues/11163#issuecomment-702076047


   @potiuk how the separation is going to work with hive macros?
   https://github.com/apache/airflow/blob/master/airflow/macros/hive.py
   Will Airflow have also macros defined per provider which will be able to 
integrate with core?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11221: Update README.md

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on pull request #11221:
URL: https://github.com/apache/airflow/pull/11221#issuecomment-702077899


   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contribution Guide 
(https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in 
the `docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst).
 Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze 
environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for 
testing locally. It's a heavy Docker image, but it ships with a working Airflow 
and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] Suyog942 opened a new pull request #11221: Update README.md

2020-10-01 Thread GitBox


Suyog942 opened a new pull request #11221:
URL: https://github.com/apache/airflow/pull/11221


   Corrected some grammar mistakes.
   
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


potiuk commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498184364



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   But I think this was deliberate. It's better to check in on schedule 
rather than per-commit because the vulnerabilities are really appearing by 
someone adding new stuff but they are triggered by CVE being reported. I think 
weekly schedule for those is pretty good cadence. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


potiuk commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498184364



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   But I think this was deliberate. It's better to check in on schedule 
rather than per-commit because the vulnerabilities are rarely appearing by 
someone adding new stuff but they are triggered by CVE being reported. I think 
weekly schedule for those is pretty good cadence. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


kaxil commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498188318



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   We will already have a higher cadence of commit though, so if anything 
we would catch it before that schedule





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


potiuk commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498191250



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   I think it will just be overwhelming and counter-productive to get those 
reports for all commits, but Let's see :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on issue #11163: Fully separate provider packages from the Airflow core (AIP-8)

2020-10-01 Thread GitBox


potiuk commented on issue #11163:
URL: https://github.com/apache/airflow/issues/11163#issuecomment-702094290


   That was the plan to handle all such dependencies.
   
   I've implemented a POC that implements some of those, but for now I do not 
have time to move it further in the coming week or two.
   
   The issue is up for grabs to anyone to work on it - maybe you would like to 
pick it up :) ? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] potiuk commented on pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


potiuk commented on pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#issuecomment-702097998


   > > > This PR (with 2 Schedulers)
   > > > TI Lag Avg: 74.028s (± 17.215)
   > > > Time in Queued State: 68.708s (± 15.556)
   > > 
   > > 
   > > Is the last measurement correct?
   > 
   > Whoops yeah updated **Time in Queued State: 751.241s (± 35.5153)** for 
1.10.10
   
   Coool!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] kaxil commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


kaxil commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498206410



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+branches: [master]
+  pull_request:
+# The branches below must be a subset of the branches above
+branches: [master]
+  schedule:
+- cron: '0 17 * * 3'

Review comment:
   I mean yeah I don't have a strong opinion on that one tbh. We can keep 
it too.
   
   > I think it will just be overwhelming and counter-productive to get those 
reports for all commits, but Let's see :)
   
   Wouldn't we get that if we keep "schedule" too, i.e. based on what we have 
in PR: we will currently have it when a PR is pushed, a commit is pushed and 
"on schedule" too, correct me if I am wrong though
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] edeNFed opened a new pull request #11222: Add UniqueConnIdRule rule and unittest

2020-10-01 Thread GitBox


edeNFed opened a new pull request #11222:
URL: https://github.com/apache/airflow/pull/11222


   Adds UniqueConnIdRule rule to upgrade/rules as per:
   
   
https://github.com/apache/airflow/blob/master/UPDATING.md#unique-conn_id-in-connection-table
   
   Closes: #11037 
   
   ---
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   







[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11222: Add UniqueConnIdRule rule and unittest

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on pull request #11222:
URL: https://github.com/apache/airflow/pull/11222#issuecomment-702110666


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in 
`docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst)
 Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it 
better 🚀.
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   







[GitHub] [airflow] FHoffmannCode opened a new pull request #11223: Airflow 11045 create stat name handler not supported rule

2020-10-01 Thread GitBox


FHoffmannCode opened a new pull request #11223:
URL: https://github.com/apache/airflow/pull/11223


   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   







[GitHub] [airflow] potiuk commented on a change in pull request #11211: Add Github Code Scanning

2020-10-01 Thread GitBox


potiuk commented on a change in pull request #11211:
URL: https://github.com/apache/airflow/pull/11211#discussion_r498223576



##
File path: .github/workflows/codeql-analysis.yml
##
@@ -0,0 +1,66 @@
+name: "CodeQL"
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    # The branches below must be a subset of the branches above
+    branches: [master]
+  schedule:
+    - cron: '0 17 * * 3'

Review comment:
   Yeah. I missed that when commenting on the comment. No strong opinion either. We need to see what the report looks like, etc. I am ok with experimenting :)









[GitHub] [airflow] ashb commented on a change in pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


ashb commented on a change in pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#discussion_r498237415



##
File path: airflow/www/templates/airflow/dag_code.html
##
@@ -22,34 +22,33 @@
 {% block page_title %}{{ dag.dag_id }} - Code - Airflow{% endblock %}
 
 {% block content %}
-{{ super() }}
-{{ title }}
-
-  Toggle wrap
-
+  {{ super() }}
+  
+Toggle 
Wrap
 {{ html_code }}
+  
 {% endblock %}
 
-{% block tail %}
-{{ super() }}
-
-  function toggleWrap() {
-$('.code pre').toggleClass('wrap')
-  };
-
-  // We blur task_ids in demo mode
-  $( document ).ready(function() {
-if ("{{ demo_mode }}" == "True") {
-$("pre span.s").css({
-'text-shadow': '0px 0px 10px red',
-'color': 'transparent',
-});
-}
-  });
-
-  // pygments generates the HTML so set wrap toggle via js
-  if ("{{ wrapped }}" == "True") {
-toggleWrap();
-  };
-
+{% block tail_js %}

Review comment:
   Any reason for different block in use here?









[GitHub] [airflow] kaxil commented on pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


kaxil commented on pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#issuecomment-702130832


   > I love most of it, but there are a few things I'd like to see differently.
   > 
   > * I'll use the 'Search Dags' bar more than the 'Search by tags' search 
bar, so I'd like to see their positions reversed
   > * the existence of the 'Delete DAG' button makes it impossible to grant 
semi-technical users access to the UI (e.g. stakeholders who sometimes want to 
rerun relevant ETL's at will). Would it be possible to either hide the button 
altogether (after all - proper dag bag configuration should be done in code) or 
have an option in the admin settings to disable it there?
   
   The Delete DAG button won't be an issue because from Airflow 2.0 the RBAC UI will be the default UI, and you can grant DELETE permissions to admins only, for example







[GitHub] [airflow] ryanahamilton commented on a change in pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


ryanahamilton commented on a change in pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#discussion_r498240742



##
File path: airflow/www/templates/airflow/dag_code.html
##
@@ -22,34 +22,33 @@
 {% block page_title %}{{ dag.dag_id }} - Code - Airflow{% endblock %}
 
 {% block content %}
-{{ super() }}
-{{ title }}
-
-  Toggle wrap
-
+  {{ super() }}
+  
+Toggle 
Wrap
 {{ html_code }}
+  
 {% endblock %}
 
-{% block tail %}
-{{ super() }}
-
-  function toggleWrap() {
-$('.code pre').toggleClass('wrap')
-  };
-
-  // We blur task_ids in demo mode
-  $( document ).ready(function() {
-if ("{{ demo_mode }}" == "True") {
-$("pre span.s").css({
-'text-shadow': '0px 0px 10px red',
-'color': 'transparent',
-});
-}
-  });
-
-  // pygments generates the HTML so set wrap toggle via js
-  if ("{{ wrapped }}" == "True") {
-toggleWrap();
-  };
-
+{% block tail_js %}

Review comment:
   Just attempting to make the templates more consistent—not sure if I got 
them all. Presumably the `tail_js` comes after the `tail` as it's best practice 
for the scripts to be at the end of the document. Also assuming that the "js" 
in the block name designates the purpose.
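To illustrate the block-ordering point, here is a small sketch of Jinja template inheritance, assuming the `jinja2` package is available; the base template below is a simplified stand-in, not Airflow's actual layout:

```python
from jinja2 import DictLoader, Environment

# Stand-in base layout: ``tail`` carries generic end-of-body markup and
# ``tail_js`` is reserved for page scripts at the very end of the document.
templates = {
    "base.html": (
        "<body>{% block content %}{% endblock %}"
        "{% block tail %}<footer>global footer</footer>{% endblock %}"
        "{% block tail_js %}{% endblock %}</body>"
    ),
    # The child overrides only the blocks it needs; the global footer
    # from ``tail`` is inherited automatically.
    "dag_code.html": (
        '{% extends "base.html" %}'
        "{% block content %}<pre>code listing</pre>{% endblock %}"
        "{% block tail_js %}<script>toggleWrap();</script>{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))
html = env.get_template("dag_code.html").render()
print(html)  # the page script lands after the footer, at the end of <body>
```

Overriding `tail_js` instead of `tail` therefore keeps shared footer markup intact while still placing page scripts last.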









[GitHub] [airflow] ashb commented on a change in pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


ashb commented on a change in pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#discussion_r498261316



##
File path: airflow/www/templates/airflow/dag_code.html
##
@@ -22,34 +22,33 @@
 {% block page_title %}{{ dag.dag_id }} - Code - Airflow{% endblock %}
 
 {% block content %}
-{{ super() }}
-{{ title }}
-
-  Toggle wrap
-
+  {{ super() }}
+  
+Toggle 
Wrap
 {{ html_code }}
+  
 {% endblock %}
 
-{% block tail %}
-{{ super() }}
-
-  function toggleWrap() {
-$('.code pre').toggleClass('wrap')
-  };
-
-  // We blur task_ids in demo mode
-  $( document ).ready(function() {
-if ("{{ demo_mode }}" == "True") {
-$("pre span.s").css({
-'text-shadow': '0px 0px 10px red',
-'color': 'transparent',
-});
-}
-  });
-
-  // pygments generates the HTML so set wrap toggle via js
-  if ("{{ wrapped }}" == "True") {
-toggleWrap();
-  };
-
+{% block tail_js %}

Review comment:
   Gotcha :+1:









[GitHub] [airflow] ashb commented on pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


ashb commented on pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#issuecomment-702151669


   @ryanahamilton Looks like your version change isn't working. Tests are failing with:
   
   ```
   E   jinja2.exceptions.UndefinedError: 'airflow_version' is undefined
   ```







[GitHub] [airflow] ashb commented on pull request #11195: 2.0 UI Overhaul/Refresh

2020-10-01 Thread GitBox


ashb commented on pull request #11195:
URL: https://github.com/apache/airflow/pull/11195#issuecomment-702152420


   Could you show a "full" page screenshot showing the global footer in place 
please?
   
   Additionally: if the git version is not set (which I think is fairly common) 
we should just not show it in the HTML, rather than showing N/A







[GitHub] [airflow] ivica-k commented on pull request #10337: Update redshift_to_s3.py

2020-10-01 Thread GitBox


ivica-k commented on pull request #10337:
URL: https://github.com/apache/airflow/pull/10337#issuecomment-702191464


   Would it be considered a bad move from my side if I also submitted a PR for 
the same problem, but with a different fix?







[GitHub] [airflow] ryw commented on pull request #5787: [AIRFLOW-5172] Add choice of interval edge scheduling

2020-10-01 Thread GitBox


ryw commented on pull request #5787:
URL: https://github.com/apache/airflow/pull/5787#issuecomment-702195003


   @iroddis I'm pushing to get this into 2.0.  
   
   Do you have time to iterate on the PR over the next week? Addressing 
@dstandish concerns, etc, rebasing it on the HA scheduler code (which we will 
merge very soon). 
   
   If you don't have the time, we can take the PR over to fix it up and get it 
merged.







[jira] [Commented] (AIRFLOW-5172) Add ability to have DAGs execute at the start of their scheduled interval

2020-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205584#comment-17205584
 ] 

ASF GitHub Bot commented on AIRFLOW-5172:
-

ryw commented on pull request #5787:
URL: https://github.com/apache/airflow/pull/5787#issuecomment-702195003


   @iroddis I'm pushing to get this into 2.0.  
   
   Do you have time to iterate on the PR over the next week? Addressing 
@dstandish concerns, etc, rebasing it on the HA scheduler code (which we will 
merge very soon). 
   
   If you don't have the time, we can take the PR over to fix it up and get it 
merged.





> Add ability to have DAGs execute at the start of their scheduled interval
> -
>
> Key: AIRFLOW-5172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5172
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: DAG, scheduler
>Affects Versions: 1.10.4
>Reporter: Ian Roddis
>Priority: Minor
>
> Airflow's scheduling of tasks at the end of their interval can be confusing, 
> and difficult to choreograph in cases where tasks must execute at a specific 
> time (e.g. collecting ephemeral data).
> I'm sure this feature has been requested before, but my Jira search skills 
> aren't good enough to find much on it.
> This issue is to add an option to schedule DAGs at the beginning of the 
> scheduled interval.
> I'll be submitting a PR to add this feature shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] potiuk commented on a change in pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


potiuk commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r498315428



##
File path: airflow/config_templates/config.yml
##
@@ -1547,6 +1547,15 @@
   type: string
   example: ~
   default: "512"
+- name: use_row_level_locking
+  description: |
+Should the scheduler issue `SELECT ... FOR UPDATE` in relevant queries.

Review comment:
   :heart: 
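As a sketch of what that config option controls, this is roughly how `SELECT ... FOR UPDATE` gets emitted through SQLAlchemy (1.4+ style; the `DagRun` model below is a simplified stand-in, not Airflow's actual model):

```python
from sqlalchemy import Column, Integer, String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DagRun(Base):
    # Simplified stand-in for the real model.
    __tablename__ = "dag_run"
    id = Column(Integer, primary_key=True)
    state = Column(String)


# skip_locked=True lets a second scheduler skip rows another scheduler
# already holds, instead of blocking on them.
stmt = (
    select(DagRun)
    .where(DagRun.state == "queued")
    .with_for_update(skip_locked=True)
)
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)  # the rendered statement includes: FOR UPDATE SKIP LOCKED
```

Disabling the option would correspond to building the same query without `with_for_update`, which is why it only makes sense with a single scheduler.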









[GitHub] [airflow] ryw commented on issue #11218: Connection Extra Field is not encrypted/masked on editing

2020-10-01 Thread GitBox


ryw commented on issue #11218:
URL: https://github.com/apache/airflow/issues/11218#issuecomment-702205858


   @ryanahamilton can you have a look at this maybe?







[GitHub] [airflow] adamo57 removed a comment on issue #8907: Airflow web UI is slow

2020-10-01 Thread GitBox


adamo57 removed a comment on issue #8907:
URL: https://github.com/apache/airflow/issues/8907#issuecomment-692349842


   this also worked for me @danielnazareth89 thank you







[GitHub] [airflow] mik-laj commented on a change in pull request #11223: Airflow 11045 create stat name handler not supported rule

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11223:
URL: https://github.com/apache/airflow/pull/11223#discussion_r498363356



##
File path: airflow/upgrade/rules/stat_name_handler_not_supported.py
##
@@ -0,0 +1,38 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.upgrade.rules.base_rule import BaseRule
+from tests.plugins.test_plugin import AirflowTestPlugin
+
+
+class StatNameHandlerNotSupportedRule(BaseRule):
+
+title = 'stat_name_handler field in AirflowTestPlugin is not supported.'
+
+description = '''\
+stat_name_handler field is no longer supported in AirflowTestPlugin.
+Instead there is stat_name_handler option in [scheduler] section of 
airflow.cfg.
+This change is intended to simplify the statsd configuration.
+'''
+
+def check(self):
+if getattr(AirflowTestPlugin, 'stat_name_handler'):

Review comment:
   It is okay that there is this attribute. This allows us to create 
plugins that are backwards compatible.
   
   We should check whether the `stat_name_handler` option has been set. If so, skip the rule. If not, check whether any plugins still use it. If none do, skip any further processing. If some do, return a message to the user.
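The check order described in that comment could be sketched like this (hypothetical helper and argument shapes, not the actual upgrade-rule API):

```python
from typing import List, Optional


def check_stat_name_handler(
    conf_value: Optional[str], plugin_handlers: List[str]
) -> Optional[str]:
    """Sketch of the suggested check order.

    conf_value: the [scheduler] stat_name_handler option from airflow.cfg.
    plugin_handlers: names of plugins that still set stat_name_handler.
    Returns None when the rule can be skipped, otherwise a message.
    """
    if conf_value:
        # The option has already been migrated to airflow.cfg: skip the rule.
        return None
    if not plugin_handlers:
        # No plugin uses the old field: nothing further to process.
        return None
    return (
        "stat_name_handler is set by plugins %s; move it to the "
        "[scheduler] section of airflow.cfg." % ", ".join(plugin_handlers)
    )
```

The real rule would read the config and iterate registered plugins; the branching order is the point here.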









[GitHub] [airflow] dimberman commented on a change in pull request #10996: Kubernetes executor can adopt tasks from other schedulers

2020-10-01 Thread GitBox


dimberman commented on a change in pull request #10996:
URL: https://github.com/apache/airflow/pull/10996#discussion_r498369468



##
File path: airflow/executors/base_executor.py
##
@@ -56,6 +56,8 @@ class BaseExecutor(LoggingMixin):
 ``0`` for infinity
 """
 
+_job_id: Optional[str] = None

Review comment:
   @ashb this wouldn't work: you can't have a public member with the same name as a property; it creates an infinite loop.
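A minimal sketch of the collision (not Airflow's actual `BaseExecutor`): the property and its backing attribute must have different names, otherwise the getter re-enters itself:

```python
class Executor:
    """Property backed by a privately named attribute: works."""

    @property
    def job_id(self):
        return self._job_id  # reads the differently named backing field

    @job_id.setter
    def job_id(self, value):
        self._job_id = value


class Broken:
    """Getter that reads its own name: recurses until RecursionError."""

    @property
    def job_id(self):
        return self.job_id  # same name as the property: calls this getter again
```

`Executor().job_id` round-trips through the setter/getter, while accessing `Broken().job_id` raises `RecursionError`, which is why the PR keeps the private `_job_id` field.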









[GitHub] [airflow] kaxil commented on a change in pull request #11027: Replace get accessible dag ids

2020-10-01 Thread GitBox


kaxil commented on a change in pull request #11027:
URL: https://github.com/apache/airflow/pull/11027#discussion_r498376155



##
File path: tests/www/test_security.py
##
@@ -235,8 +244,17 @@ def test_access_control_with_invalid_permission(self):
 'can_varimport',  # a real permission, but not a member of 
DAG_PERMS
 'can_eat_pudding',  # clearly not a real permission
 ]
+username = "Mrs. User"

Review comment:
   Do we allow spaces in usernames?









[GitHub] [airflow] jhtimmins commented on a change in pull request #11027: Replace get accessible dag ids

2020-10-01 Thread GitBox


jhtimmins commented on a change in pull request #11027:
URL: https://github.com/apache/airflow/pull/11027#discussion_r498376579



##
File path: tests/www/test_security.py
##
@@ -235,8 +244,17 @@ def test_access_control_with_invalid_permission(self):
 'can_varimport',  # a real permission, but not a member of 
DAG_PERMS
 'can_eat_pudding',  # clearly not a real permission
 ]
+username = "Mrs. User"

Review comment:
   Not sure. But for the sake of the test it doesn't impact behavior.









[GitHub] [airflow] kaxil merged pull request #11027: Replace get accessible dag ids

2020-10-01 Thread GitBox


kaxil merged pull request #11027:
URL: https://github.com/apache/airflow/pull/11027


   







[airflow] branch master updated: Replace get accessible dag ids (#11027)

2020-10-01 Thread kaxilnaik
This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
 new 427a4a8  Replace get accessible dag ids (#11027)
427a4a8 is described below

commit 427a4a8f01c414ab571578bb6b8fbe5a8c6b32ef
Author: James Timmins 
AuthorDate: Thu Oct 1 09:37:00 2020 -0700

Replace get accessible dag ids (#11027)
---
 airflow/www/security.py| 64 +--
 airflow/www/views.py   | 21 ++--
 tests/www/test_security.py | 84 --
 tests/www/test_views.py|  2 +-
 4 files changed, 93 insertions(+), 78 deletions(-)

diff --git a/airflow/www/security.py b/airflow/www/security.py
index 355ccf0..20686b7 100644
--- a/airflow/www/security.py
+++ b/airflow/www/security.py
@@ -22,7 +22,9 @@ from typing import Set
 from flask import current_app, g
 from flask_appbuilder.security.sqla import models as sqla_models
 from flask_appbuilder.security.sqla.manager import SecurityManager
+from flask_appbuilder.security.sqla.models import PermissionView, Role, User
 from sqlalchemy import and_, or_
+from sqlalchemy.orm import joinedload
 
 from airflow import models
 from airflow.exceptions import AirflowException
@@ -41,7 +43,9 @@ EXISTING_ROLES = {
 
 CAN_CREATE = 'can_create'
 CAN_READ = 'can_read'
+CAN_DAG_READ = 'can_dag_read'
 CAN_EDIT = 'can_edit'
+CAN_DAG_EDIT = 'can_dag_edit'
 CAN_DELETE = 'can_delete'
 
 
@@ -276,60 +280,54 @@ class AirflowSecurityManager(SecurityManager, LoggingMixin):
 
     def get_readable_dags(self, user):
         """Gets the DAGs readable by authenticated user."""
-        return self.get_accessible_dags(CAN_READ, user)
+        return self.get_accessible_dags([CAN_READ, CAN_DAG_READ], user)
 
     def get_editable_dags(self, user):
         """Gets the DAGs editable by authenticated user."""
-        return self.get_accessible_dags(CAN_EDIT, user)
+        return self.get_accessible_dags([CAN_EDIT, CAN_DAG_EDIT], user)
 
-    def get_readable_dag_ids(self, user):
+    def get_readable_dag_ids(self, user) -> Set[str]:
         """Gets the DAG IDs readable by authenticated user."""
-        return [dag.dag_id for dag in self.get_readable_dags(user)]
+        return set(dag.dag_id for dag in self.get_readable_dags(user))
 
-    def get_editable_dag_ids(self, user):
+    def get_editable_dag_ids(self, user) -> Set[str]:
         """Gets the DAG IDs editable by authenticated user."""
-        return [dag.dag_id for dag in self.get_editable_dags(user)]
+        return set(dag.dag_id for dag in self.get_editable_dags(user))
+
+    def get_accessible_dag_ids(self, user) -> Set[str]:
+        """Gets the DAG IDs editable or readable by authenticated user."""
+        accessible_dags = self.get_accessible_dags([CAN_EDIT, CAN_DAG_EDIT, CAN_READ, CAN_DAG_READ], user)
+        return set(dag.dag_id for dag in accessible_dags)
 
     @provide_session
-    def get_accessible_dags(self, user_action, user, session=None):
+    def get_accessible_dags(self, user_actions, user, session=None):
         """Generic function to get readable or writable DAGs for authenticated user."""
         if user.is_anonymous:
             return set()
 
+        user_query = (
+            session.query(User)
+            .options(
+                joinedload(User.roles)
+                .subqueryload(Role.permissions)
+                .options(joinedload(PermissionView.permission), joinedload(PermissionView.view_menu))
+            )
+            .filter(User.id == user.id)
+            .first()
+        )
         resources = set()
-        for role in user.roles:
+        for role in user_query.roles:
             for permission in role.permissions:
                 resource = permission.view_menu.name
                 action = permission.permission.name
-                if action == user_action:
+                if action in user_actions:
                     resources.add(resource)
-        if 'Dag' in resources:
+
+        if bool({'Dag', 'all_dags'}.intersection(resources)):
             return session.query(DagModel)
 
         return session.query(DagModel).filter(DagModel.dag_id.in_(resources))
 
-    def get_accessible_dag_ids(self, username=None) -> Set[str]:
-        """
-        Return a set of dags that user has access to(either read or write).
-
-        :param username: Name of the user.
-        :return: A set of dag ids that the user could access.
-        """
-        if not username:
-            username = g.user
-
-        if username.is_anonymous or 'Public' in username.roles:
-            # return an empty set if the role is public
-            return set()
-
-        roles = {role.name for role in username.roles}
-        if {'Admin', 'Viewer', 'User', 'Op'} & roles:
-            return self.DAG_VMS
-
-        user_perms_views = self.get_all_pe
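The permission filtering in that commit reduces to a set computation over role permissions; a standalone sketch with plain dicts (hypothetical data shapes, not the SQLAlchemy models):

```python
def accessible_dag_ids(role_perms, user_actions, all_dag_ids):
    """Sketch of the filtering logic.

    role_perms: role name -> list of (action, resource) pairs.
    user_actions: actions that count as access, e.g. can_read/can_dag_read.
    """
    resources = {
        resource
        for perms in role_perms.values()
        for action, resource in perms
        if action in user_actions
    }
    # 'Dag' and the pre-2.0 'all_dags' resource act as wildcards
    # granting access to every DAG.
    if {'Dag', 'all_dags'} & resources:
        return set(all_dag_ids)
    return {dag_id for dag_id in all_dag_ids if dag_id in resources}


print(accessible_dag_ids(
    {"User": [("can_read", "example_dag")]},
    ["can_read", "can_dag_read"],
    ["example_dag", "other_dag"],
))  # {'example_dag'}
```

The real code performs this against the database with eager-loaded roles; the set logic above is what determines which DAG IDs come back.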


[GitHub] [airflow] RaviTezu commented on issue #11043: Create MesosExecutorRemovedRule to ease upgrade to Airflow 2.0

2020-10-01 Thread GitBox


RaviTezu commented on issue #11043:
URL: https://github.com/apache/airflow/issues/11043#issuecomment-702278704


   Hi @turbaszek I would like to take this up. Could you please assign this to 
me? Thanks. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] arsalan993 commented on issue #11202: Airflow flower: Connection reset

2020-10-01 Thread GitBox


arsalan993 commented on issue #11202:
URL: https://github.com/apache/airflow/issues/11202#issuecomment-702283149


   While tasks are being executed on Airflow with the Celery executor, toggling between different tabs in Flower suddenly stops it working and displays the error I shared above.
   
   Arsalan Ahmed
   
   From: Tomek Urbaszek 
   Sent: Thursday, October 1, 2020 2:05:57 AM
   To: apache/airflow 
   Cc: Arsalan Ahmed ; Mention 

   Subject: Re: [apache/airflow] Airflow flower: Connection reset (#11202)
   
   
   @arsalan993 is there any problem with your 
Airflow deployment or you only see log messages like this one (which I think 
comes from postgres?)?
   







[GitHub] [airflow] wyattshapiro commented on pull request #10890: Add s3 key to template fields for s3/redshift transfer operators

2020-10-01 Thread GitBox


wyattshapiro commented on pull request #10890:
URL: https://github.com/apache/airflow/pull/10890#issuecomment-702291132


   @feluelle just wanted to touch base on this







[GitHub] [airflow] freephys opened a new issue #11224: Branch airflow in middle and test task with different parameters in parallel

2020-10-01 Thread GitBox


freephys opened a new issue #11224:
URL: https://github.com/apache/airflow/issues/11224


   Hi,
   I am looking for a way to achieve this inside Airflow, ideally with some easy demo code to study. Start with several jobs that take, say, 50 parameters. At some point (after finishing tasks A->B), I want to run task C in parallel with different values of, say, 5 parameters (each run trying different values). After task C runs in parallel, all results are passed on to tasks D and E with the same parameters. The DAG IDs etc. need to be generated dynamically.

   Is this possible with current Airflow, or should it be a feature request?

   The advantage would be saving time on tasks A and B while trying task C with several different parameter sets, so we can pick the best one based on the results of task D.

   Thanks







[GitHub] [airflow] boring-cyborg[bot] commented on issue #11224: Branch airflow in middle and test task with different parameters in parallel

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on issue #11224:
URL: https://github.com/apache/airflow/issues/11224#issuecomment-702292884


   Thanks for opening your first issue here! Be sure to follow the issue 
template!
   







[GitHub] [airflow] sudarshan2906 opened a new issue #11225: celery executer with SQS

2020-10-01 Thread GitBox


sudarshan2906 opened a new issue #11225:
URL: https://github.com/apache/airflow/issues/11225


   Hi, I am using Airflow 1.10.12 with the Celery executor and SQS. When configuring `predefined_queues` in Celery as described [here](https://docs.celeryproject.org/en/latest/getting-started/brokers/sqs.html#predefined-queues), I get the error below:
   
   ```
   [2020-10-01 17:37:54,498: CRITICAL/MainProcess] Unrecoverable error: 
AttributeError("'str' object has no attribute 'items'")
   Traceback (most recent call last):
 File 
"/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 
921, in create_channel
   return self._avail_channels.pop()
   IndexError: pop from empty list
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
 File "/usr/local/lib/python3.7/site-packages/celery/worker/worker.py", 
line 208, in start
   self.blueprint.start(self)
 File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 
119, in start
   step.start(parent)
 File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 
369, in start
   return self.obj.start()
 File 
"/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", 
line 318, in start
   blueprint.start(self)
 File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 
119, in start
   step.start(parent)
 File 
"/usr/local/lib/python3.7/site-packages/celery/worker/consumer/connection.py", 
line 23, in start
   c.connection = c.connect()
 File 
"/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", 
line 405, in connect
   conn = self.connection_for_read(heartbeat=self.amqheartbeat)
 File 
"/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", 
line 412, in connection_for_read
   self.app.connection_for_read(heartbeat=heartbeat))
 File 
"/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", 
line 439, in ensure_connected
   callback=maybe_shutdown,
 File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 
389, in ensure_connection
   self._ensure_connection(*args, **kwargs)
 File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 
445, in _ensure_connection
   callback, timeout=timeout
 File "/usr/local/lib/python3.7/site-packages/kombu/utils/functional.py", 
line 344, in retry_over_time
   return fun(*args, **kwargs)
 File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 
874, in _connection_factory
   self._connection = self._establish_connection()
 File "/usr/local/lib/python3.7/site-packages/kombu/connection.py", line 
809, in _establish_connection
   conn = self.transport.establish_connection()
 File 
"/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 
941, in establish_connection
   self._avail_channels.append(self.create_channel(self))
 File 
"/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 
923, in create_channel
   channel = self.Channel(connection)
 File "/usr/local/lib/python3.7/site-packages/kombu/transport/SQS.py", line 
134, in __init__
   self._update_queue_cache(self.queue_name_prefix)
 File "/usr/local/lib/python3.7/site-packages/kombu/transport/SQS.py", line 
140, in _update_queue_cache
   for queue_name, q in self.predefined_queues.items():
   AttributeError: 'str' object has no attribute 'items'
   
   ```
   
   airflow.cfg:
   
   ```
   [celery_broker_transport_options]
   'predefined_queues': = { 'my-q': {
   'url': 'https://ap-southeast-2.queue.amazonaws.com/123456/my-q',
   'access_key_id': 'xxx',
   'secret_access_key': 'xxx',
   }
   }
   ```
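The `AttributeError: 'str' object has no attribute 'items'` arises because kombu's SQS transport iterates `predefined_queues.items()`, i.e. it expects a mapping, while the value from the cfg section above reaches it as a plain string. A sketch of the shape the transport expects follows (the queue URL and keys are placeholders copied from the report, not working credentials); one possible route in Airflow 1.10 is to supply this via a custom Celery config dict referenced from `[celery] celery_config_options`, though treat that wiring as an assumption to verify:

```python
# Sketch: the mapping kombu's SQS transport expects. URL and keys below are
# placeholders from the bug report, not working credentials.
broker_transport_options = {
    'predefined_queues': {
        'my-q': {
            'url': 'https://ap-southeast-2.queue.amazonaws.com/123456/my-q',
            'access_key_id': 'xxx',
            'secret_access_key': 'xxx',
        },
    },
}

# kombu does roughly this at channel setup, which is what raises on a str:
for queue_name, q in broker_transport_options['predefined_queues'].items():
    assert isinstance(q, dict) and 'url' in q
```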







[GitHub] [airflow] jedcunningham commented on pull request #10663: Show a few lines of tracebacks for DAG import errors in web UI

2020-10-01 Thread GitBox


jedcunningham commented on pull request #10663:
URL: https://github.com/apache/airflow/pull/10663#issuecomment-702320416


   @turbaszek @ashb  I rebased to rekick the tests... not sure why that one 
test got killed.







[GitHub] [airflow] ryw commented on pull request #10996: Kubernetes executor can adopt tasks from other schedulers

2020-10-01 Thread GitBox


ryw commented on pull request #10996:
URL: https://github.com/apache/airflow/pull/10996#issuecomment-702328729


   @dimberman is there some documentation that should be updated w/ this change?







[GitHub] [airflow] dimberman merged pull request #10996: Kubernetes executor can adopt tasks from other schedulers

2020-10-01 Thread GitBox


dimberman merged pull request #10996:
URL: https://github.com/apache/airflow/pull/10996


   







[GitHub] [airflow] dimberman commented on pull request #11222: Add UniqueConnIdRule rule and unittest

2020-10-01 Thread GitBox


dimberman commented on pull request #11222:
URL: https://github.com/apache/airflow/pull/11222#issuecomment-702339873


   @edeNFed please get tests to pass and then I'll merge







[airflow] branch master updated (427a4a8 -> 3ca11eb)

2020-10-01 Thread dimberman
This is an automated email from the ASF dual-hosted git repository.

dimberman pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 427a4a8  Replace get accessible dag ids (#11027)
 add 3ca11eb  Kubernetes executor can adopt tasks from other schedulers 
(#10996)

No new revisions were added by this update.

Summary of changes:
 airflow/cli/commands/dag_command.py|   6 +-
 airflow/executors/base_executor.py |   2 +
 airflow/executors/kubernetes_executor.py   | 192 +++--
 airflow/jobs/scheduler_job.py  |   2 +
 airflow/kubernetes/pod_generator.py|   4 +-
 ... bef4f3d11e8b_drop_kuberesourceversion_and_.py} |  67 +--
 airflow/models/__init__.py |   5 -
 airflow/models/connection.py   |   2 +-
 airflow/models/kubernetes.py   |  88 --
 chart/templates/rbac/pod-launcher-role.yaml|   1 +
 tests/executors/test_kubernetes_executor.py|  80 -
 tests/kubernetes/test_pod_generator.py |   4 +-
 tests/models/test_kubernetes.py|  56 --
 13 files changed, 277 insertions(+), 232 deletions(-)
 copy 
airflow/migrations/versions/{33ae817a1ff4_add_kubernetes_resource_checkpointing.py
 => bef4f3d11e8b_drop_kuberesourceversion_and_.py} (53%)
 delete mode 100644 airflow/models/kubernetes.py
 delete mode 100644 tests/models/test_kubernetes.py






[GitHub] [airflow] ashb commented on a change in pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


ashb commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r498462189



##
File path: airflow/jobs/scheduler_job.py
##
@@ -1179,39 +832,50 @@ def __get_concurrency_maps(
 
     # pylint: disable=too-many-locals,too-many-statements
     @provide_session
-    def _find_executable_task_instances(
-        self,
-        simple_dag_bag: SimpleDagBag,
-        session: Session = None
-    ) -> List[TI]:
+    def _executable_task_instances_to_queued(self, max_tis: int, session: Session = None) -> List[TI]:
         """
         Finds TIs that are ready for execution with respect to pool limits,
         dag concurrency, executor state, and priority.
 
-        :param simple_dag_bag: TaskInstances associated with DAGs in the
-            simple_dag_bag will be fetched from the DB and executed
-        :type simple_dag_bag: airflow.utils.dag_processing.SimpleDagBag
+        :param max_tis: Maximum number of TIs to queue in this loop.
+        :type max_tis: int
         :return: list[airflow.models.TaskInstance]
         """
         executable_tis: List[TI] = []
 
+        # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
+        # Throws an exception if lock cannot be obtained, rather than blocking
+        pools = models.Pool.slots_stats(with_for_update=nowait(session), session=session)
+
+        # If the pools are full, there is no point doing anything!
+        # If _somehow_ the pool is overfull, don't let the limit go negative - it breaks SQL
+        pool_slots_free = max(0, sum(map(operator.itemgetter('open'), pools.values())))

Review comment:
   Done in 
https://github.com/apache/airflow/pull/10956/commits/ef9101ff910eba85a2782f894f0e9dfbafc04689
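The clamp discussed in that hunk is small but worth seeing in isolation; here is a dependency-free sketch, assuming the pools dict carries per-pool stats with an `'open'` slot count as the diff implies:

```python
import operator

def free_pool_slots(pools):
    """Sum the 'open' slot counts across pools, clamped at zero so an
    overfull pool can never produce a negative LIMIT in the later SQL."""
    return max(0, sum(map(operator.itemgetter('open'), pools.values())))
```

For instance, `free_pool_slots({'default_pool': {'open': 3}, 'etl': {'open': -5}})` clamps the net deficit to 0 instead of -2.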









[GitHub] [airflow] dimberman commented on pull request #10996: Kubernetes executor can adopt tasks from other schedulers

2020-10-01 Thread GitBox


dimberman commented on pull request #10996:
URL: https://github.com/apache/airflow/pull/10996#issuecomment-702344266


   @ryw I don't think further docs are needed, as this isn't intended to be used externally.







[GitHub] [airflow] ashb commented on a change in pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


ashb commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r498462551



##
File path: airflow/jobs/scheduler_job.py
##
@@ -1705,62 +1305,216 @@ def _run_scheduler_loop(self) -> None:
             loop_duration = loop_end_time - loop_start_time
             self.log.debug("Ran scheduling loop in %.2f seconds", loop_duration)
 
-            if not is_unit_test:
+            if not is_unit_test and not num_queued_tis and not num_finished_events:
+                # If the scheduler is doing things, don't sleep. This means when there is work to do, the
+                # scheduler will run "as quick as possible", but when it's stopped, it can sleep, dropping CPU
+                # usage when "idle"
                 time.sleep(self._processor_poll_interval)
 
-            if self.processor_agent.done:
+            if self.num_runs > 0 and loop_count >= self.num_runs and self.processor_agent.done:
                 self.log.info(
-                    "Exiting scheduler loop as all files have been processed %d times", self.num_runs
+                    "Exiting scheduler loop as requested number of runs (%d - got to %d) has been reached",
+                    self.num_runs, loop_count,
                 )
                 break
 
-    def _validate_and_run_task_instances(self, simple_dag_bag: SimpleDagBag) -> bool:
-        if simple_dag_bag.serialized_dags:
+    def _do_scheduling(self, session) -> int:
+        """
+        This function is where the main scheduling decisions take places. It:
+
+        - Creates any necessary DAG runs by examining the next_dagrun_create_after column of DagModel
+
+        - Finds the "next n oldest" running DAG Runs to examine for scheduling (n=20 by default) and tries to
+          progress state (TIs to SCHEDULED, or DagRuns to SUCCESS/FAILURE etc)
+
+          By "next oldest", we mean hasn't been examined/scheduled in the most time.
+
+        - Then, via a Critical Section (locking the rows of the Pool model) we queue tasks, and then send them
+          to the executor.
+
+          See docs of _critical_section_execute_task_instances for more.
+
+        :return: Number of TIs enqueued in this iteration
+        :rtype: int
+        """
+        try:
+            from sqlalchemy import event
+            expected_commit = False
+
+            # Put a check in place to make sure we don't commit unexpectedly
+            @event.listens_for(session.bind, 'commit')
+            def validate_commit(_):
+                nonlocal expected_commit
+                if expected_commit:
+                    expected_commit = False
+                    return
+                raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")

Review comment:
   Done in 
https://github.com/apache/airflow/pull/10956/commits/ef9101ff910eba85a2782f894f0e9dfbafc04689
 -- I think keeping it at runtime is the easiest for now -- it doesn't add much 
overhead.
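The commit guard in that hunk is a useful defensive pattern on its own. A dependency-free sketch of the same control flow follows; a toy session class stands in for SQLAlchemy's engine-level `'commit'` event, so this illustrates the idea rather than the actual hook:

```python
class GuardedSession:
    """Toy stand-in: commits must be announced first, otherwise we raise,
    mimicking the validate_commit listener in the diff above."""

    def __init__(self):
        self._expected_commit = False

    def expect_commit(self):
        # Announce that the next commit is intentional.
        self._expected_commit = True

    def commit(self):
        if not self._expected_commit:
            raise RuntimeError("UNEXPECTED COMMIT - THIS WILL BREAK HA LOCKS!")
        self._expected_commit = False  # one announcement covers one commit
```

Any code path that commits without first calling `expect_commit()` fails loudly, which is exactly what makes accidental lock releases visible during development.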









[GitHub] [airflow] ashb commented on a change in pull request #10956: Officially support running more than one scheduler concurrently.

2020-10-01 Thread GitBox


ashb commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r498462855



##
File path: airflow/models/dag.py
##
@@ -1941,6 +2087,36 @@ def deactivate_deleted_dags(cls, alive_dag_filelocs: List[str], session=None):
             session.rollback()
             raise
 
+    @classmethod
+    def dags_needing_dagruns(cls, session: Session):
+        """
+        Return (and lock) a list of Dag objects that are due to create a new DagRun This will return a
+        resultset of rows that is row-level-locked with a "SELECT ... FOR UPDATE" query, you should ensure
+        that any scheduling decisions are made in a single transaction -- as soon as the transaction is
+        committed it will be unlocked.
+        """
+
+        # TODO[HA]: Bake this query, it is run _A lot_
+        # TODO[HA]: Make this limit a tunable. We limit so that _one_ scheduler
+        # doesn't try to do all the creation of dag runs
+        return session.query(cls).filter(
+            cls.is_paused.is_(False),
+            cls.is_active.is_(True),
+            cls.next_dagrun_create_after <= func.now(),
+        ).order_by(
+            cls.next_dagrun_create_after
+        ).limit(10).with_for_update(of=cls, **skip_locked(session=session))

Review comment:
   Changed to tunable in 
https://github.com/apache/airflow/pull/10956/commits/ef9101ff910eba85a2782f894f0e9dfbafc04689
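The `with_for_update(... skip_locked ...)` in that query is what lets several schedulers claim disjoint sets of DAGs. A toy in-memory sketch of the SKIP LOCKED claim semantics (no database involved; the dict fields mirror the filter columns in the diff):

```python
def claim_due_dags(dags, now, held_locks, limit=10):
    """Return up to `limit` unpaused, active DAGs due for a run, oldest
    first, skipping rows already 'locked' by another scheduler. Claimed
    rows are added to held_locks, standing in for SELECT ... FOR UPDATE."""
    due = sorted(
        (d for d in dags
         if not d["is_paused"] and d["is_active"]
         and d["next_dagrun_create_after"] <= now
         and d["dag_id"] not in held_locks),
        key=lambda d: d["next_dagrun_create_after"],
    )[:limit]
    for d in due:
        held_locks.add(d["dag_id"])  # lock claimed rows
    return due
```

Two schedulers sharing `held_locks` each claim a disjoint batch: the second caller simply skips whatever the first already holds, which is the whole point of SKIP LOCKED over plain FOR UPDATE (no blocking, no double scheduling).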









[GitHub] [airflow] jhtimmins commented on a change in pull request #11221: Update README.md

2020-10-01 Thread GitBox


jhtimmins commented on a change in pull request #11221:
URL: https://github.com/apache/airflow/pull/11221#discussion_r498465374



##
File path: README.md
##
@@ -79,22 +73,22 @@ Apache Airflow is tested with:
 
 Visit the official Airflow website documentation (latest **stable** release) 
for help with [installing 
Airflow](https://airflow.apache.org/installation.html), [getting 
started](https://airflow.apache.org/start.html), or walking through a more 
complete [tutorial](https://airflow.apache.org/tutorial.html).
 
-> Note: If you're looking for documentation for master branch (latest 
development branch): you can find it on 
[ReadTheDocs](https://airflow.readthedocs.io/en/latest/).
+> Note: If you're looking for documentation for the master branch (latest 
development branch): you can find it on 
[ReadTheDocs](https://airflow.readthedocs.io/en/latest/).
 
 For more information on Airflow's Roadmap or Airflow Improvement Proposals 
(AIPs), visit the [Airflow 
Wiki](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Home).
 
 Official Docker (container) images for Apache Airflow are described in 
[IMAGES.rst](IMAGES.rst).
 
 ## Installing from PyPI
 
-We publish Apache Airflow as `apache-airflow` package in PyPI. Installing it 
however might be sometimes tricky

Review comment:
   `an` isn't the correct article in this case, since `apache-airflow` is 
the name of the specific package, and does not refer to a general type or group 
of packages.

##
File path: README.md
##
@@ -128,20 +122,20 @@ and our official source code releases:
   [Release Approval 
Process](http://www.apache.org/legal/release-policy.html#release-approval)
 
 Following the ASF rules, the source packages released must be sufficient for a 
user to build and test the
-release provided they have access to the appropriate platform and tools.
+the release provided they have access to the appropriate platform and tools.

Review comment:
   `the` is duplicated

##
File path: README.md
##
@@ -128,20 +122,20 @@ and our official source code releases:
   [Release Approval 
Process](http://www.apache.org/legal/release-policy.html#release-approval)
 
 Following the ASF rules, the source packages released must be sufficient for a 
user to build and test the
-release provided they have access to the appropriate platform and tools.
+the release provided they have access to the appropriate platform and tools.
 
 ## Convenience packages
 
 There are other ways of installing and using Airflow. Those are "convenience" 
methods - they are
 not "official releases" as stated by the `ASF Release Policy`, but they can be 
used by the users
-who do not want to build the software themselves.

Review comment:
   `do` is correct in this case since `users` is plural.









[GitHub] [airflow] jhtimmins commented on a change in pull request #11221: Update README.md

2020-10-01 Thread GitBox


jhtimmins commented on a change in pull request #11221:
URL: https://github.com/apache/airflow/pull/11221#discussion_r498472638



##
File path: README.md
##
@@ -1,20 +1,14 @@
 

[GitHub] [airflow] turbaszek commented on a change in pull request #11185: Add LegacyUIDeprecated rule and unittests

2020-10-01 Thread GitBox


turbaszek commented on a change in pull request #11185:
URL: https://github.com/apache/airflow/pull/11185#discussion_r498473593



##
File path: airflow/upgrade/rules/legacy_ui_deprecated.py
##
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import absolute_import
+
+from airflow.configuration import conf
+from airflow.upgrade.rules.base_rule import BaseRule
+
+
+class LegacyUIDeprecatedRule(BaseRule):
+title = "RBAC is enabled by default"
+
+description = "Legacy UI is deprecated. FAB RBAC is enabled by default in 
order to increase security."
+
+def check(self):
+rbac_conf = conf.getboolean("webserver", "rbac")
+if not rbac_conf:
+msg = """rbac in airflow.cfg must be explicitly set empty as RBAC 
mechanism is enabled by default. """

Review comment:
   ```suggestion
   msg = "rbac in airflow.cfg must be explicitly set empty as RBAC 
mechanism is enabled by default. "
   ```









[GitHub] [airflow] jhtimmins commented on a change in pull request #11221: Update README.md

2020-10-01 Thread GitBox


jhtimmins commented on a change in pull request #11221:
URL: https://github.com/apache/airflow/pull/11221#discussion_r498473040



##
File path: README.md
##
@@ -55,7 +49,7 @@ Use Airflow to author workflows as directed acyclic graphs 
(DAGs) of tasks. The
 - [Airflow merchandise](#airflow-merchandise)
 - [Links](#links)
 
-
+

Review comment:
   The inclusion of a dash (`-`) here is causing a test to fail, as this 
text is auto-generated.









[GitHub] [airflow] turbaszek commented on a change in pull request #10991: add azure files to gcs transfer operator

2020-10-01 Thread GitBox


turbaszek commented on a change in pull request #10991:
URL: https://github.com/apache/airflow/pull/10991#discussion_r498474502



##
File path: airflow/providers/google/cloud/transfers/azure_fileshare_to_gcs.py
##
@@ -0,0 +1,190 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from tempfile import NamedTemporaryFile
+from typing import Optional, Union, Sequence, Iterable
+
+from airflow import AirflowException
+from airflow.models import BaseOperator
+from airflow.providers.google.cloud.hooks.gcs import _parse_gcs_url, GCSHook
+from airflow.providers.microsoft.azure.hooks.azure_fileshare import 
AzureFileShareHook
+from airflow.utils.decorators import apply_defaults
+
+
+class AzureFileShareToGCSOperator(BaseOperator):
+"""
+Synchronizes an Azure FileShare directory's content (excluding subdirectories),
+possibly filtered by a prefix, with a Google Cloud Storage destination 
path.
+
+:param share_name: The Azure FileShare share where to find the objects. 
(templated)
+:type share_name: str
+:param directory_name: (Optional) Path to Azure FileShare directory which 
content is to be transferred.
+Defaults to root directory (templated)
+:type directory_name: str
+:param prefix: Prefix string which filters objects whose name begin with
+such prefix. (templated)
+:type prefix: str
+:param wasb_conn_id: The source WASB connection
+:type wasb_conn_id: str
+:param gcp_conn_id: (Optional) The connection ID used to connect to Google 
Cloud.
+:type gcp_conn_id: str
+:param dest_gcs_conn_id: (Deprecated) The connection ID used to connect to 
Google Cloud.
+This parameter has been deprecated. You should pass the gcp_conn_id 
parameter instead.
+:type dest_gcs_conn_id: str
+:param dest_gcs: The destination Google Cloud Storage bucket and prefix
+where you want to store the files. (templated)
+:type dest_gcs: str
+:param delegate_to: Google account to impersonate using domain-wide 
delegation of authority,
+if any. For this to work, the service account making the request must 
have
+domain-wide delegation enabled.
+:type delegate_to: str
+:param replace: Whether you want to replace existing destination files
+or not.
+:type replace: bool
+:param gzip: Option to compress file for upload
+:type gzip: bool
+:param google_impersonation_chain: Optional Google service account to 
impersonate using
+short-term credentials, or chained list of accounts required to get 
the access_token
+of the last account in the list, which will be impersonated in the 
request.
+If set as a string, the account must grant the originating account
+the Service Account Token Creator IAM role.
+If set as a sequence, the identities from the list must grant
+Service Account Token Creator IAM role to the directly preceding 
identity, with first
+account from the list granting this role to the originating account 
(templated).
+:type google_impersonation_chain: Optional[Union[str, Sequence[str]]]
+
+Note that ``share_name``, ``directory_name``, ``prefix``, ``delimiter`` 
and ``dest_gcs`` are
+templated, so you can use variables in them if you wish.
+"""
+
+template_fields: Iterable[str] = (
+'share_name',
+'directory_name',
+'prefix',
+'dest_gcs',
+)
+
+@apply_defaults
+def __init__(
+self,
+*,
+share_name: str,
+dest_gcs: str,
+directory_name: Optional[str] = None,
+prefix: str = '',
+wasb_conn_id: str = 'wasb_default',
+gcp_conn_id: str = 'google_cloud_default',
+delegate_to: Optional[str] = None,
+replace: bool = False,
+gzip: bool = False,
+google_impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
+**kwargs,
+):
+super().__init__(**kwargs)
+
+self.share_name = share_name
+self.directory_name = directory_name
+self.prefix = prefix
+self.wasb_conn_id = wasb_conn_id
+self.gcp_conn_id = gcp_conn_id
+self.dest_gcs = dest_gcs
+self.delegate_to 

[GitHub] [airflow] turbaszek commented on a change in pull request #10991: add azure files to gcs transfer operator

2020-10-01 Thread GitBox


turbaszek commented on a change in pull request #10991:
URL: https://github.com/apache/airflow/pull/10991#discussion_r498474835



##
File path: airflow/providers/microsoft/azure/hooks/azure_fileshare.py
##
@@ -96,6 +97,44 @@ def list_directories_and_files(self, share_name, 
directory_name=None, **kwargs):
 """
 return self.get_conn().list_directories_and_files(share_name, 
directory_name, **kwargs)
 
+def list_files(self, share_name: str, directory_name: Optional[str] = 
None, **kwargs) -> List[str]:
+"""
+Return the list of files stored on an Azure File Share.
+
+:param share_name: Name of the share.
+:type share_name: str
+:param directory_name: Name of the directory.
+:type directory_name: str
+:param kwargs: Optional keyword arguments that
+`FileService.list_directories_and_files()` takes.
+:type kwargs: object
+:return: A list of files
+:rtype: list
+"""
+return [
+obj.name
+for obj in self.list_directories_and_files(share_name, 
directory_name, **kwargs)
+if isinstance(obj, File)
+]
+
+def create_share(self, share_name: str, **kwargs):
+"""
+Create a new Azure File Share.
+
+:param share_name: Name of the share.
+:type share_name: str

Review comment:
   Please describe what kwargs can be passed
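To make the filtering step concrete, here is a minimal self-contained sketch of the `isinstance`-based selection that `list_files` performs (the `File`/`Directory` classes below are hypothetical stand-ins for the Azure SDK models returned by `list_directories_and_files`):

```python
from dataclasses import dataclass


@dataclass
class File:          # stand-in for the Azure SDK's File model
    name: str


@dataclass
class Directory:     # stand-in for the Directory model
    name: str


def only_file_names(listing):
    """Keep file names, dropping directories, from a mixed share listing."""
    return [obj.name for obj in listing if isinstance(obj, File)]
```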





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] boring-cyborg[bot] commented on pull request #11226: Add option to bulk clear DAG Runs in Browse DAG Runs page (#11076)

2020-10-01 Thread GitBox


boring-cyborg[bot] commented on pull request #11226:
URL: https://github.com/apache/airflow/pull/11226#issuecomment-702357507






[GitHub] [airflow] arunvelsriram opened a new pull request #11226: Add option to bulk clear DAG Runs in Browse DAG Runs page (#11076)

2020-10-01 Thread GitBox


arunvelsriram opened a new pull request #11226:
URL: https://github.com/apache/airflow/pull/11226


   Added an option to clear selected DAG runs in the Browse -> DAG Runs UI.
   
   closes: #11076 







[GitHub] [airflow] mik-laj commented on issue #11225: celery executer with SQS

2020-10-01 Thread GitBox


mik-laj commented on issue #11225:
URL: https://github.com/apache/airflow/issues/11225#issuecomment-702357824


   This section only accepts strings as values. You need to use 
"celery_config_options" to set up other types. See: 
https://github.com/apache/airflow/blob/master/airflow/executors/celery_executor.py#L66-L69
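The comment above points at `celery_config_options`; as a sketch of the Python module it can point to (all option names and values below are illustrative, not a recommendation):

```python
# Illustrative Celery config module for use with Airflow's
# [celery] celery_config_options setting, e.g.
#   celery_config_options = my_company.celery_config.CELERY_CONFIG
# airflow.cfg itself only accepts string values in the [celery] section,
# so nested or typed options such as broker_transport_options live here.
CELERY_CONFIG = {
    "broker_transport_options": {
        "region": "us-east-1",        # hypothetical SQS region
        "visibility_timeout": 21600,  # an int, not expressible as a cfg string
    },
    "worker_prefetch_multiplier": 1,
}
```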







[GitHub] [airflow] mik-laj edited a comment on issue #11225: celery executer with SQS

2020-10-01 Thread GitBox


mik-laj edited a comment on issue #11225:
URL: https://github.com/apache/airflow/issues/11225#issuecomment-702357824


   This section only accepts strings as values. You need to use 
"celery_config_options" to set up other types. See: 
https://github.com/apache/airflow/blob/3ca11eb9b02a2c2591292fd6b76e0e98b8f22656/airflow/executors/celery_executor.py#L66-L69







[GitHub] [airflow] kaxil commented on pull request #11217: Delete __init__.py

2020-10-01 Thread GitBox


kaxil commented on pull request #11217:
URL: https://github.com/apache/airflow/pull/11217#issuecomment-702358873


   The `__init__.py` file is needed here. All K8s tests are failing.







[GitHub] [airflow] kaxil closed pull request #11217: Delete __init__.py

2020-10-01 Thread GitBox


kaxil closed pull request #11217:
URL: https://github.com/apache/airflow/pull/11217


   







[GitHub] [airflow] turbaszek merged pull request #10663: Show a few lines of tracebacks for DAG import errors in web UI

2020-10-01 Thread GitBox


turbaszek merged pull request #10663:
URL: https://github.com/apache/airflow/pull/10663


   







[GitHub] [airflow] turbaszek commented on pull request #9464: Fix DockerOperator xcom

2020-10-01 Thread GitBox


turbaszek commented on pull request #9464:
URL: https://github.com/apache/airflow/pull/9464#issuecomment-702360591


   Hey @nullhack, the tests are failing, can you take a look?







[airflow] branch master updated (3ca11eb -> c74b3ac)

2020-10-01 Thread turbaszek
This is an automated email from the ASF dual-hosted git repository.

turbaszek pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git.


from 3ca11eb  Kubernetes executor can adopt tasks from other schedulers 
(#10996)
 add c74b3ac  Optional import error tracebacks in web ui (#10663)

No new revisions were added by this update.

Summary of changes:
 airflow/config_templates/config.yml  |  15 
 airflow/config_templates/default_airflow.cfg |   7 ++
 airflow/models/dagbag.py |  18 +++-
 airflow/www/static/css/main.css  |   4 +
 airflow/www/templates/appbuilder/flash.html  |   4 +-
 docs/spelling_wordlist.txt   |   2 +
 tests/jobs/test_scheduler_job.py | 123 +++
 7 files changed, 168 insertions(+), 5 deletions(-)






[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498481854



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^

Review comment:
   ```suggestion
   Waits for a non-empty directory
   ^^^
   ```









[GitHub] [airflow] TobKed commented on issue #11203: Big Query Location Not working

2020-10-01 Thread GitBox


TobKed commented on issue #11203:
URL: https://github.com/apache/airflow/issues/11203#issuecomment-702364275


   @turbaszek `get_records` method is inherited from `DbApiHook` and, if I am 
not mistaken, is present in the backported packages as well.
   
   @Iskz I reproduced your case and it seems that the error message is a 
little bit misleading. The problem is not the location (I played with it 
extensively as well) but the backticks enclosing the table reference.
   According to the documentation:
   https://cloud.google.com/bigquery/docs/reference/legacy-sql#from-tables
   you should enclose the table reference in brackets, so it would be 
`[my-dashed-project:dataset1.tableName]` (or `[dataset1.tableName]`; the 
project name is optional if the dataset belongs to the project defined in the 
GCP connection used).
   Let me know if it works for you.
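As a hedged illustration, the bracketed legacy-SQL table reference discussed above can be built with a small helper (the function name and signature are mine, not Airflow's):

```python
from typing import Optional


def legacy_table_ref(dataset: str, table: str, project: Optional[str] = None) -> str:
    """Build a BigQuery *legacy SQL* table reference.

    Legacy SQL encloses table references in [brackets]; backticks are
    Standard SQL syntax and lead to the misleading error described above.
    The project part is optional when the dataset belongs to the project
    of the GCP connection in use.
    """
    qualified = f"{project}:{dataset}.{table}" if project else f"{dataset}.{table}"
    return f"[{qualified}]"
```

For example, `legacy_table_ref("dataset1", "tableName", "my-dashed-project")` returns `[my-dashed-project:dataset1.tableName]`.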
   







[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498482306



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Here you must pass the full path to the file, not the directory.









[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498482612



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_folder_sensor]
+:end-before: [END howto_operator_hdfs_folder_sensor]
+
+
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^

Review comment:
   ```suggestion
   ^
   ```
   









[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498482779



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_folder_sensor]
+:end-before: [END howto_operator_hdfs_folder_sensor]
+
+
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor` 
operator is used to check for matching files by matching on regex in HDFS.
+
+Use the ``filepath`` parameter to mention the keyspace and table for the 
record. Use dot notation to target a specific keyspace.
+
+Use the ``regex`` parameter to poke until the provided record is found. 
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_regex_sensor]
+:end-before: [END howto_operator_hdfs_regex_sensor]
+
+
+.. _howto/operator:HdfsSensor:
+
+Waits for a file or folder to land in HDFS
+^

Review comment:
   ```suggestion
   ^^
   ```









[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498482887



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_folder_sensor]
+:end-before: [END howto_operator_hdfs_folder_sensor]
+
+
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor` 
operator is used to check for matching files by matching on regex in HDFS.
+
+Use the ``filepath`` parameter to mention the keyspace and table for the 
record. Use dot notation to target a specific keyspace.
+
+Use the ``regex`` parameter to poke until the provided record is found. 
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_regex_sensor]
+:end-before: [END howto_operator_hdfs_regex_sensor]
+
+
+.. _howto/operator:HdfsSensor:
+
+Waits for a file or folder to land in HDFS
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor` operator 
is used to check for a file or folder to land in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Here you must pass the full path to the file, not the directory.
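For intuition, the poke step of a regex-based HDFS sensor reduces to matching listed file names against the pattern. A minimal sketch (the function name is mine, and the real `HdfsRegexSensor` talks to an HDFS client rather than taking a plain list):

```python
import re


def poke_for_regex(listing, pattern):
    """Return True once any file name in the listing matches the regex."""
    rx = re.compile(pattern)
    return any(rx.match(name) for name in listing)
```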









[GitHub] [airflow] iadi7ya commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


iadi7ya commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498484136



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Hi @mik-laj We don't have any example code file for hdfs in that 
directory.









[GitHub] [airflow] iadi7ya commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


iadi7ya commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498484511



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop(HDFS) Operators
+==
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+
+
+To use operators, you must configure a :doc:`HDFS Connection 
<../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_folder_sensor]
+:end-before: [END howto_operator_hdfs_folder_sensor]
+
+
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor` 
operator is used to check for matching files by matching on regex in HDFS.
+
+Use the ``filepath`` parameter to mention the keyspace and table for the 
record. Use dot notation to target a specific keyspace.
+
+Use the ``regex`` parameter to poke until the provided record is found. 
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+:language: python
+:dedent: 4
+:start-after: [START howto_operator_hdfs_regex_sensor]
+:end-before: [END howto_operator_hdfs_regex_sensor]
+
+
+.. _howto/operator:HdfsSensor:
+
+Waits for a file or folder to land in HDFS
+^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor` operator 
is used to check for a file or folder to land in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Hi @mik-laj We don't have any example code file for hdfs in that 
directory.
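For context, all three sensors discussed in this guide follow Airflow's poke pattern: a condition is re-checked at an interval until it holds or a timeout expires. Below is a stdlib-only sketch of that loop; `FakeHdfsClient` and `poke_until_found` are illustrative stand-ins invented here, not Airflow or HDFS APIs.

```python
import time

class FakeHdfsClient:
    """Illustrative stand-in for an HDFS client; holds a set of known paths."""
    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, filepath):
        return filepath in self.paths

def poke_until_found(client, filepath, timeout=5.0, poke_interval=0.1):
    """Re-check ("poke") for `filepath` until it exists or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if client.exists(filepath):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)

client = FakeHdfsClient({"/data/input/part-00000"})
print(poke_until_found(client, "/data/input/part-00000"))  # prints: True
```

A real sensor wraps only the `client.exists(...)` check; the scheduler drives the retry loop via `poke_interval` and `timeout` arguments of the sensor.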





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498484961



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+
+
+
+Apache Hadoop (HDFS) Operators
+==============================
+
+`Apache Hadoop HDFS 
`__  is a distributed 
file system designed to run on commodity hardware. It has many similarities 
with existing distributed file systems. However, the differences from other 
distributed file systems are significant. HDFS is highly fault-tolerant and is 
designed to be deployed on low-cost hardware. HDFS provides high throughput 
access to application data and is suitable for applications that have large 
data sets. HDFS relaxes a few POSIX requirements to enable streaming access to 
file system data. HDFS was originally built as infrastructure for the Apache 
Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
+
+.. contents::
+  :depth: 1
+  :local:
+
+Prerequisite
+------------
+
+To use these operators, you must configure an :doc:`HDFS Connection <../../connection/hdfs>`.
+
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` 
operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided directory exists and is non-empty.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_hdfs_folder_sensor]
+    :end-before: [END howto_operator_hdfs_folder_sensor]
+
+
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsRegexSensor` 
operator is used to check for matching files by matching on regex in HDFS.
+
+Use the ``filepath`` parameter to specify the directory whose files are checked against the pattern.
+
+Use the ``regex`` parameter to poke until a matching file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_hdfs_regex_sensor]
+    :end-before: [END howto_operator_hdfs_regex_sensor]
+
+
+.. _howto/operator:HdfsSensor:
+
+Waits for a file or folder to land in HDFS
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsSensor` operator waits for a file or folder to land in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_hdfs_sensor]
+    :end-before: [END howto_operator_hdfs_sensor]
+
+
+

Review comment:
   ```suggestion
   
   ```









[GitHub] [airflow] iadi7ya commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


iadi7ya commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498485167



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+.. _howto/operator:HdfsSensor:
+
+Waits for a file or folder to land in HDFS
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Review comment:
   @mik-laj Thank you!









[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498485341



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+.. _howto/operator:HdfsRegexSensor:
+
+Waits for matching files by matching on regex
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/
+    :language: python
+    :dedent: 4
+    :start-after: [START howto_operator_hdfs_regex_sensor]
+    :end-before: [END howto_operator_hdfs_regex_sensor]
+
+

Review comment:
   ```suggestion
   
   
   ```
   I removed trailing whitespaces









[GitHub] [airflow] sudarshan2906 commented on issue #11225: celery executer with SQS and predefined_queues

2020-10-01 Thread GitBox


sudarshan2906 commented on issue #11225:
URL: https://github.com/apache/airflow/issues/11225#issuecomment-702368780


   Thanks. I am using my own Celery configuration file for now.
   
   But do you think we could add `predefined_queues` to DEFAULT_CELERY_CONFIG, 
so that when `predefined_queues` is passed in the Celery config it is handled 
as JSON? It's quite an important option for Celery when using SQS as the queue.
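A hedged sketch of the conversion being asked for: decoding a string-valued `predefined_queues` option into the nested dict that Celery's SQS transport expects under `broker_transport_options`. The `decode_predefined_queues` helper, the queue name, and the URL are invented for illustration; this is not existing Airflow code.

```python
import json

def decode_predefined_queues(value):
    """Return the option as a dict, decoding it from JSON when given a string."""
    if isinstance(value, str):
        return json.loads(value)
    return value

# A string, as the option might arrive from an ini-style airflow.cfg;
# the queue name and URL below are made up for illustration.
raw_option = '{"my-queue": {"url": "https://sqs.us-east-1.amazonaws.com/123/my-queue"}}'

broker_transport_options = {
    "predefined_queues": decode_predefined_queues(raw_option),
}
# prints: https://sqs.us-east-1.amazonaws.com/123/my-queue
print(broker_transport_options["predefined_queues"]["my-queue"]["url"])
```

The point of the decode step is that ini-style config files can only carry strings, while Celery wants a nested mapping here.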
   







[GitHub] [airflow] mik-laj commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


mik-laj commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498486625



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Can you add these examples? Without examples, the guides are not very 
useful.  The most important thing in the guides for operators is that they show 
example operator usage, so examples are required.
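Until real example DAGs exist, here is a hedged, stdlib-only sketch of the matching step an `HdfsRegexSensor` example would demonstrate: filtering a directory listing with a compiled pattern. The listing format (a list of dicts) is invented for illustration and is not the provider's API.

```python
import re

def matching_files(listing, regex):
    """Filter a directory listing down to entries whose names match `regex`."""
    return [entry for entry in listing if regex.match(entry["name"])]

# Fake listing standing in for what an HDFS client would return.
listing = [
    {"name": "events_2020-10-01.csv", "size": 1024},
    {"name": "events_2020-10-02.csv", "size": 2048},
    {"name": "_SUCCESS", "size": 0},
]
pattern = re.compile(r"events_\d{4}-\d{2}-\d{2}\.csv")

found = matching_files(listing, pattern)
# prints: ['events_2020-10-01.csv', 'events_2020-10-02.csv']
print([e["name"] for e in found])
```

A sensor built on this would return `True` from its poke method exactly when `matching_files(...)` is non-empty.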









[GitHub] [airflow] iadi7ya commented on a change in pull request #11212: Created initial guide for HDFS operators

2020-10-01 Thread GitBox


iadi7ya commented on a change in pull request #11212:
URL: https://github.com/apache/airflow/pull/11212#discussion_r498488305



##
File path: docs/howto/operator/apache/hdfs.rst
##
@@ -0,0 +1,88 @@
+.. _howto/operator:HdfsFolderSensor:
+
+Waits for a non-empty directory
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :class:`~airflow.providers.apache.hdfs.sensors.hdfs.HdfsFolderSensor` operator is used to check for a non-empty directory in HDFS.
+
+Use the ``filepath`` parameter to poke until the provided file is found.
+
+.. exampleinclude:: /../airflow/providers/apache/hdfs/example_dags/

Review comment:
   Hi @mik-laj I want to do that, but I don't have much idea how to go about it.









[GitHub] [airflow] dimberman closed pull request #11119: [AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator

2020-10-01 Thread GitBox


dimberman closed pull request #11119:
URL: https://github.com/apache/airflow/pull/11119


   







[GitHub] [airflow] dimberman commented on pull request #11119: [AIP-34] TaskGroup: A UI task grouping concept as an alternative to SubDagOperator

2020-10-01 Thread GitBox


dimberman commented on pull request #11119:
URL: https://github.com/apache/airflow/pull/11119#issuecomment-702372954


   Hi @yuqian90 apologies but I need to close this as we're already 
feature-frozen on 1.10. The 2.0 alpha is coming out in the next 1-2 weeks, and 
at this point the only things we want to add to 1.10 are critical bugfixes and 
migration steps/scripts. Anyone who wants to use this feature will have full 
access to it when the alpha comes out :).







[GitHub] [airflow] dimberman commented on pull request #6905: [AIRFLOW-6361] Run LocalTaskJob directly in Celery task

2020-10-01 Thread GitBox


dimberman commented on pull request #6905:
URL: https://github.com/apache/airflow/pull/6905#issuecomment-702382201


   @mik-laj are we still pursuing this?







[jira] [Commented] (AIRFLOW-6361) Run LocalTaskJob directly in Celery task

2020-10-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205807#comment-17205807
 ] 

ASF GitHub Bot commented on AIRFLOW-6361:
-

dimberman commented on pull request #6905:
URL: https://github.com/apache/airflow/pull/6905#issuecomment-702382201


   @mik-laj are we still pursuing this?





> Run LocalTaskJob directly in Celery task
> 
>
> Key: AIRFLOW-6361
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6361
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: executors
>Affects Versions: 1.10.6
>Reporter: Kamil Bregula
>Priority: Major
>  Labels: performance
>
> Hello,
> Celery first runs a CLI command, which contains LocalTaskJob. LocalTaskJob 
> is responsible for starting the next user-code process. This level of 
> isolation is redundant because LocalTaskJob doesn't execute unsafe code. The 
> first command is run by creating a new process, not by forking, so it is an 
> expensive operation.
> According to preliminary measurements, this change results in an increase in 
> performance close to 30%.
> I will provide more information in PR.
> Best regards
> Kamil Bregula
> After:
> ```
> real  0m38.394s
> user  0m4.340s
> sys   0m1.600s
> real  0m38.355s
> user  0m4.700s
> sys   0m1.340s
> real  0m38.675s
> user  0m4.760s
> sys   0m1.530s
> real  0m38.488s
> user  0m4.770s
> sys   0m1.280s
> real  0m38.434s
> user  0m4.600s
> sys   0m1.390s
> real  0m38.378s
> user  0m4.500s
> sys   0m1.270s
> real  0m38.106s
> user  0m4.200s
> sys   0m1.100s
> real  0m38.082s
> user  0m4.170s
> sys   0m1.030s
> real  0m38.173s
> user  0m4.290s
> sys   0m1.340s
> real  0m38.161s
> user  0m4.460s
> sys   0m1.370s
> ```
> Before:
> ```
> real  0m53.488s
> user  0m5.140s
> sys   0m1.700s
> real  1m8.288s
> user  0m6.430s
> sys   0m2.200s
> real  0m53.371s
> user  0m5.330s
> sys   0m1.630s
> real  0m58.939s
> user  0m6.470s
> sys   0m1.730s
> real  0m53.255s
> user  0m4.950s
> sys   0m1.640s
> real  0m58.802s
> user  0m5.970s
> sys   0m1.790s
> real  0m58.449s
> user  0m5.380s
> sys   0m1.580s
> real  0m53.308s
> user  0m5.120s
> sys   0m1.430s
> real  0m53.485s
> user  0m5.220s
> sys   0m1.290s
> real  0m53.387s
> user  0m5.020s
> sys   0m1.590s
> ```
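The "close to 30%" figure can be sanity-checked from the quoted wall-clock samples by averaging the `real` times before and after the change:

```python
# `real` wall-clock times (seconds) transcribed from the ticket above.
after = [38.394, 38.355, 38.675, 38.488, 38.434,
         38.378, 38.106, 38.082, 38.173, 38.161]
before = [53.488, 68.288, 53.371, 58.939, 53.255,
          58.802, 58.449, 53.308, 53.485, 53.387]

mean_after = sum(after) / len(after)
mean_before = sum(before) / len(before)
# Relative reduction in mean runtime.
gain = (mean_before - mean_after) / mean_before
print(f"{gain:.1%}")  # prints: 32.1%, consistent with the "close to 30%" estimate
```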



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

