[jira] [Commented] (AIRFLOW-6824) EMRAddStepsOperator does not work well with multi-step XCom

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045157#comment-17045157
 ] 

ASF subversion and git services commented on AIRFLOW-6824:
--

Commit 3ea3e1a2b580b7ed10efe668de0cc37b03673500 in airflow's branch 
refs/heads/master from Bjorn Olsen
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3ea3e1a ]

[AIRFLOW-6824] EMRAddStepsOperator problem with multi-step XCom (#7443)



> EMRAddStepsOperator does not work well with multi-step XCom
> ---
>
> Key: AIRFLOW-6824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6824
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.10.9
>Reporter: Bjorn Olsen
>Assignee: Bjorn Olsen
>Priority: Minor
> Fix For: 2.0.0
>
>
> EmrAddStepsOperator allows you to add several steps to EMR for processing - 
> the steps must be supplied as a list.
> This works well when passing an actual Python list as the 'steps' value, but 
> we want to be able to generate the list of steps from a previous task - using 
> an XCom.
> We must use the operator as follows, for the templating to work correctly and 
> for it to resolve the XCom:
>  
> {code:python}
> add_steps_task = EmrAddStepsOperator(
>     task_id='add_steps',
>     job_flow_id=job_flow_id,
>     aws_conn_id='aws_default',
>     provide_context=True,
>     steps="{{task_instance.xcom_pull(task_ids='generate_steps')}}"
> ){code}
>  
> The value in XCom from the 'generate_steps' task looks like (simplified):
> {code:python}
> [{'Name':'Step1'}, {'Name':'Step2'}]
> {code}
> However, the rendered template arrives at the operator as a string, which cannot 
> be handed to the underlying boto3 library, which expects a list object.
> The following won't work either, since it is not valid Python:
> {code:python}
> add_steps_task = EmrAddStepsOperator(
>     task_id='add_steps',
>     job_flow_id=job_flow_id,
>     aws_conn_id='aws_default',
>     provide_context=True,
>     steps={{task_instance.xcom_pull(task_ids='generate_steps')}}
> ){code}
> We have to pass the steps to the operator as a string and then convert it 
> into a list after render_template_fields has run (immediately before the 
> execute). Therefore the only option is to do the string-to-list conversion 
> in the operator's execute method.
>  
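The conversion described above can be sketched as follows. This is an illustrative helper, not necessarily the exact merged code; `ast.literal_eval` is a safe way to parse the rendered string back into a Python list before handing it to boto3:

```python
import ast


def resolve_steps(steps):
    """Return `steps` as a list, parsing it first when templating has
    rendered it to the string repr of a list (e.g. "[{'Name': 'Step1'}]")."""
    if isinstance(steps, str):
        # literal_eval only accepts Python literals, so arbitrary code in
        # the rendered template cannot execute here (unlike eval).
        steps = ast.literal_eval(steps)
    return steps
```

Calling this at the top of an execute method makes the operator accept either a real list or an XCom-templated string.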



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AIRFLOW-6824) EMRAddStepsOperator does not work well with multi-step XCom

2020-02-25 Thread Kamil Bregula (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Bregula resolved AIRFLOW-6824.

Fix Version/s: 2.0.0
   Resolution: Fixed




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6824) EMRAddStepsOperator does not work well with multi-step XCom

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045156#comment-17045156
 ] 

ASF GitHub Bot commented on AIRFLOW-6824:
-

mik-laj commented on pull request #7443: [AIRFLOW-6824] EMRAddStepsOperator 
problem with multi-step XCom
URL: https://github.com/apache/airflow/pull/7443
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] mik-laj merged pull request #7443: [AIRFLOW-6824] EMRAddStepsOperator problem with multi-step XCom

2020-02-25 Thread GitBox
mik-laj merged pull request #7443: [AIRFLOW-6824] EMRAddStepsOperator problem 
with multi-step XCom
URL: https://github.com/apache/airflow/pull/7443
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-6822) AWS hooks dont always cache the boto3 client

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045153#comment-17045153
 ] 

ASF GitHub Bot commented on AIRFLOW-6822:
-

baolsen commented on pull request #7541: [AIRFLOW-6822] AWS hooks should cache 
boto3 client
URL: https://github.com/apache/airflow/pull/7541
 
 
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-XXXX]`. AIRFLOW-XXXX = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> AWS hooks dont always cache the boto3 client
> 
>
> Key: AIRFLOW-6822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6822
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: aws
>Affects Versions: 1.10.9
>Reporter: Bjorn Olsen
>Assignee: Bjorn Olsen
>Priority: Minor
>
> Implementation of the Amazon AWS hooks (e.g. the S3 hook, Glue hook, etc.) 
> varies in how they call the underlying aws_hook.get_client_type(X) method.
> Most of the time the client that gets returned is cached by the superclass, 
> but not always. The client should always be cached for performance reasons: 
> creating a client is a time-consuming process.
> Example of how to do it (athena.py):
>  
> {code:python}
> def get_conn(self):
>     """
>     Check if aws conn exists already or create one and return it
>     :return: boto3 session
>     """
>     if not self.conn:
>         self.conn = self.get_client_type('athena')
>     return self.conn{code}
>  
>  
> Example of how not to do it (s3.py):
> {code:python}
> def get_conn(self):
>     return self.get_client_type('s3'){code}
>  
>  
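Stripped of Airflow specifics, the caching pattern the issue asks for can be sketched generically. The class and parameter names here are illustrative, not Airflow's actual API:

```python
class CachingHook:
    """Illustrative hook showing the pattern: construct the expensive
    client once on first use, then reuse the cached instance."""

    def __init__(self, client_factory):
        # client_factory stands in for something like get_client_type('s3')
        self._client_factory = client_factory
        self._conn = None

    def get_conn(self):
        # Only the first call pays the client-construction cost.
        if self._conn is None:
            self._conn = self._client_factory()
        return self._conn
```

Every hook's get_conn would then return the same client object on repeated calls instead of rebuilding it.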



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] baolsen opened a new pull request #7541: [AIRFLOW-6822] AWS hooks should cache boto3 client

2020-02-25 Thread GitBox
baolsen opened a new pull request #7541: [AIRFLOW-6822] AWS hooks should cache 
boto3 client
URL: https://github.com/apache/airflow/pull/7541
 
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7476: [AIRFLOW-6856][depends on AIRFLOW-6862][WIP] Bulk fetch paused_dag_ids

2020-02-25 Thread GitBox
codecov-io edited a comment on issue #7476: [AIRFLOW-6856][depends on 
AIRFLOW-6862][WIP] Bulk fetch paused_dag_ids
URL: https://github.com/apache/airflow/pull/7476#issuecomment-589383726
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=h1) 
Report
   > Merging 
[#7476](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0ec2774120d43fa667a371b384e6006e1d1c7821?src=pr=desc)
 will **decrease** coverage by `0.36%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7476/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7476      +/-   ##
   ==========================================
   - Coverage   86.81%   86.44%   -0.37%
   ==========================================
     Files         893      893
     Lines       42193    42342     +149
   ==========================================
   - Hits        36629    36604      -25
   - Misses       5564     5738     +174
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `88.63% <ø> (+0.7%)` | :arrow_up: |
   | 
[airflow/dag/base\_dag.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9kYWcvYmFzZV9kYWcucHk=)
 | `69.56% <ø> (+1.56%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `90.29% <100%> (+0.05%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5)
 | `50% <0%> (-50%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `47.18% <0%> (-45.08%)` | :arrow_down: |
   | 
[...roviders/google/cloud/operators/postgres\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9wb3N0Z3Jlc190b19nY3MucHk=)
 | `52.94% <0%> (-32.36%)` | :arrow_down: |
   | 
[...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==)
 | `69.69% <0%> (-25.26%)` | :arrow_down: |
   | 
[airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5)
 | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | ... and [15 
more](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=footer). 
Last update 
[0ec2774...72478de](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7476: [AIRFLOW-6856][depends on AIRFLOW-6862][WIP] Bulk fetch paused_dag_ids

2020-02-25 Thread GitBox
codecov-io edited a comment on issue #7476: [AIRFLOW-6856][depends on 
AIRFLOW-6862][WIP] Bulk fetch paused_dag_ids
URL: https://github.com/apache/airflow/pull/7476#issuecomment-589383726
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=h1) 
Report
   > Merging 
[#7476](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/0ec2774120d43fa667a371b384e6006e1d1c7821?src=pr=desc)
 will **decrease** coverage by `0.26%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7476/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7476      +/-   ##
   ==========================================
   - Coverage   86.81%   86.55%   -0.27%
   ==========================================
     Files         893      893
     Lines       42193    42342     +149
   ==========================================
   + Hits        36629    36647      +18
   - Misses       5564     5695     +131
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `88.63% <ø> (+0.7%)` | :arrow_up: |
   | 
[airflow/dag/base\_dag.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9kYWcvYmFzZV9kYWcucHk=)
 | `69.56% <ø> (+1.56%)` | :arrow_up: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `90.72% <100%> (+0.48%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `47.18% <0%> (-45.08%)` | :arrow_down: |
   | 
[...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==)
 | `69.69% <0%> (-25.26%)` | :arrow_down: |
   | 
[airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5)
 | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | 
[airflow/providers/amazon/aws/hooks/sns.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvYW1hem9uL2F3cy9ob29rcy9zbnMucHk=)
 | `96.42% <0%> (-3.58%)` | :arrow_down: |
   | 
[airflow/models/dagrun.py](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncnVuLnB5)
 | `95.81% <0%> (-0.76%)` | :arrow_down: |
   | ... and [9 
more](https://codecov.io/gh/apache/airflow/pull/7476/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=footer). 
Last update 
[0ec2774...72478de](https://codecov.io/gh/apache/airflow/pull/7476?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj commented on a change in pull request #7537: [AIRFLOW-6454] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
mik-laj commented on a change in pull request #7537: [AIRFLOW-6454] And script 
to benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384258420
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
+        self.reset(stop_when_these_completed)
+
+    def reset(self, stop_when_these_completed):
+        self.stop_when_these_completed = {
+            # Store the date as a timestamp, as sometimes this is a Pendulum
+            # object, others it is a datetime object.
+            (run.dag_id, run.execution_date.timestamp()): run
+            for run in stop_when_these_completed
+        }
+
+    def change_state(self, key, state):
+        from airflow.utils.state import State
+        super().change_state(key, state)
+
+        dag_id, task_id, execution_date, __ = key
+        run_key = (dag_id, execution_date.timestamp())
+        run = self.stop_when_these_completed.get(run_key, None)
+        if run and all(t.state == State.SUCCESS for t in run.get_task_instances()):
+            self.stop_when_these_completed.pop(run_key)
+
+            if not self.stop_when_these_completed:
+                self.log.warning("STOPPING SCHEDULER -- all runs complete")
+                self.scheduler_job.processor_agent._done = True
+            else:
+                self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+        elif state == State.SUCCESS:
+            self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+
+
+def get_executor_under_test():
+    try:
+        # Run against master and 1.10.x releases
+        from tests.test_utils.mock_executor import MockExecutor
+    except ImportError:
+        from tests.executors.test_executor import TestExecutor as MockExecutor
+
+    # from airflow.executors.local_executor import LocalExecutor
+
+    # Change this to try other executors
+    Executor = MockExecutor
+
+    class ShortCircutExecutor(ShortCircutExecutorMixin, Executor):
+        pass
+
+    return ShortCircutExecutor
+
+
+def reset_dag(dag, num_runs, session):
+    import airflow.models
+    from airflow.utils import timezone
+    from airflow.utils.state import State
+
+    DR = airflow.models.DagRun
+    DM = airflow.models.DagModel
+    TI = airflow.models.TaskInstance
+    TF = airflow.models.TaskFail
+    dag_id = dag.dag_id
+
+    session.query(DM).filter(DM.dag_id == dag_id).update({'is_paused': False})
+    session.query(DR).filter(DR.dag_id == dag_id).delete()
+    session.query(TI).filter(TI.dag_id == dag_id).delete()
+    session.query(TF).filter(TF.dag_id == dag_id).delete()
+
+    next_run_date = dag.normalize_schedule(dag.start_date or min(t.start_date for t in dag.tasks))
+
+    for _ in range(num_runs):
+        next_run = dag.create_dagrun(
+            run_id=DR.ID_PREFIX + next_run_date.isoformat(),
+            execution_date=next_run_date,
+            start_date=timezone.utcnow(),
+            state=State.RUNNING,
+            external_trigger=False,
+            session=session,
+        )
+        next_run_date = dag.following_schedule(next_run_date)
+    return next_run
+
+
+def pause_all_dags(session):
+    from airflow.models.dag import DagModel
+    session.query(DagModel).update({'is_paused': True})
+
+
+@click.command()
+@click.option('--num-runs', default=1, help='number of DagRun, to run for each DAG')
+@click.option('--repeat', default=3, help='number of times to run test, to reduce variance')
+@click.argument('dag_ids', required=True, nargs=-1)
+def main(num_runs, repeat, dag_ids):
+    """
+    This script will run the SchedulerJob for the specified dags "to completion".
+
+    That is it creates a fixed number of DAG runs for the specified DAGs (from
+    the configured dag path/example dags etc), disable the scheduler from
+    creating more, and then monitor them for completion. When the file task of
+    the final dag run is 
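The timing loop implied by the script's `--repeat` option and its `statistics`/`time` imports can be sketched in isolation (the function name here is illustrative, not part of the script):

```python
import statistics
import time


def benchmark(fn, repeat=3):
    """Run fn several times and summarize wall-clock durations, mirroring
    the --repeat idea above: more repeats reduce run-to-run variance."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if repeat > 1 else 0.0
    return mean, stdev
```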

[GitHub] [airflow] codecov-io commented on issue #7539: [AIRFLOW-6919] Make Breeze DAG-test friendly

2020-02-25 Thread GitBox
codecov-io commented on issue #7539: [AIRFLOW-6919] Make Breeze DAG-test friendly
URL: https://github.com/apache/airflow/pull/7539#issuecomment-591192171
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=h1) 
Report
   > Merging 
[#7539](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/311140616daafe496310d642e4164bc53fbd2ad2?src=pr=desc)
 will **decrease** coverage by `0.04%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7539/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7539      +/-   ##
   ==========================================
   - Coverage   86.76%   86.72%   -0.05%
   ==========================================
     Files         896      896
     Lines       42649    42649
   ==========================================
   - Hits        37005    36987      -18
   - Misses       5644     5662      +18
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/executors/sequential\_executor.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvc2VxdWVudGlhbF9leGVjdXRvci5weQ==)
 | `56% <0%> (-44%)` | :arrow_down: |
   | 
[airflow/utils/helpers.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5)
 | `75.77% <0%> (-6.84%)` | :arrow_down: |
   | 
[airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==)
 | `81.83% <0%> (-6.12%)` | :arrow_down: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `89.49% <0%> (-0.15%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `84.93% <0%> (+1.36%)` | :arrow_up: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `91.73% <0%> (+1.65%)` | :arrow_up: |
   | 
[airflow/providers/postgres/hooks/postgres.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvaG9va3MvcG9zdGdyZXMucHk=)
 | `94.36% <0%> (+16.9%)` | :arrow_up: |
   | 
[...roviders/google/cloud/operators/postgres\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9wb3N0Z3Jlc190b19nY3MucHk=)
 | `85.29% <0%> (+32.35%)` | :arrow_up: |
   | 
[airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7539/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5)
 | `100% <0%> (+50%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=footer). 
Last update 
[3111406...c966f8d](https://codecov.io/gh/apache/airflow/pull/7539?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] codecov-io edited a comment on issue #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
codecov-io edited a comment on issue #7492: [AIRFLOW-6871] optimize tree view 
for large DAGs
URL: https://github.com/apache/airflow/pull/7492#issuecomment-589921201
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=h1) 
Report
   > Merging 
[#7492](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/311140616daafe496310d642e4164bc53fbd2ad2?src=pr=desc)
 will **increase** coverage by `0.08%`.
   > The diff coverage is `84.21%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/7492/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7492      +/-   ##
   ==========================================
   + Coverage   86.76%   86.85%   +0.08%
   ==========================================
     Files         896      896
     Lines       42649    42663      +14
   ==========================================
   + Hits        37005    37055      +50
   + Misses       5644     5608      -36
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/www/views.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=)
 | `76.19% <84.21%> (-0.05%)` | :arrow_down: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `90.07% <0%> (+0.43%)` | :arrow_up: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `84.93% <0%> (+1.36%)` | :arrow_up: |
   | 
[airflow/hooks/dbapi\_hook.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy9ob29rcy9kYmFwaV9ob29rLnB5)
 | `91.73% <0%> (+1.65%)` | :arrow_up: |
   | 
[airflow/providers/postgres/hooks/postgres.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvaG9va3MvcG9zdGdyZXMucHk=)
 | `94.36% <0%> (+16.9%)` | :arrow_up: |
   | 
[...roviders/google/cloud/operators/postgres\_to\_gcs.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvZ29vZ2xlL2Nsb3VkL29wZXJhdG9ycy9wb3N0Z3Jlc190b19nY3MucHk=)
 | `85.29% <0%> (+32.35%)` | :arrow_up: |
   | 
[airflow/providers/postgres/operators/postgres.py](https://codecov.io/gh/apache/airflow/pull/7492/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvcG9zdGdyZXMvb3BlcmF0b3JzL3Bvc3RncmVzLnB5)
 | `100% <0%> (+50%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=footer). 
Last update 
[3111406...adf0c6d](https://codecov.io/gh/apache/airflow/pull/7492?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [airflow] kaxil merged pull request #7540: [AIRFLOW-XXXX] Add instructions on using templating in Bash Script

2020-02-25 Thread GitBox
kaxil merged pull request #7540: [AIRFLOW-XXXX] Add instructions on using 
templating in Bash Script
URL: https://github.com/apache/airflow/pull/7540
 
 
   




[GitHub] [airflow] codecov-io edited a comment on issue #6342: [AIRFLOW-5662] fix incorrect naming for scheduler used slot metric

2020-02-25 Thread GitBox
codecov-io edited a comment on issue #6342: [AIRFLOW-5662] fix incorrect naming 
for scheduler used slot metric
URL: https://github.com/apache/airflow/pull/6342#issuecomment-547121627
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=h1) 
Report
   > Merging 
[#6342](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/311140616daafe496310d642e4164bc53fbd2ad2?src=pr=desc)
 will **decrease** coverage by `0.17%`.
   > The diff coverage is `95.23%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/6342/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6342      +/-   ##
   ==========================================
   - Coverage   86.76%   86.58%   -0.18%
   ==========================================
     Files         896      896
     Lines       42649    42677      +28
   ==========================================
   - Hits        37005    36953      -52
   - Misses       5644     5724      +80
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/ti\_deps/deps/pool\_slots\_available\_dep.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvcG9vbF9zbG90c19hdmFpbGFibGVfZGVwLnB5)
 | `100% <100%> (ø)` | :arrow_up: |
   | 
[airflow/models/pool.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvcG9vbC5weQ==)
 | `96.55% <95%> (-0.82%)` | :arrow_down: |
   | 
[airflow/jobs/scheduler\_job.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzL3NjaGVkdWxlcl9qb2IucHk=)
 | `90.18% <95.23%> (+0.54%)` | :arrow_up: |
   | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==)
 | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | 
[airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==)
 | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | 
[airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==)
 | `47.18% <0%> (-45.08%)` | :arrow_down: |
   | 
[...viders/cncf/kubernetes/operators/kubernetes\_pod.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9wcm92aWRlcnMvY25jZi9rdWJlcm5ldGVzL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZC5weQ==)
 | `69.69% <0%> (-25.26%)` | :arrow_down: |
   | 
[airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5)
 | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | 
[airflow/utils/sqlalchemy.py](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9zcWxhbGNoZW15LnB5)
 | `84.93% <0%> (+1.36%)` | :arrow_up: |
   | ... and [4 
more](https://codecov.io/gh/apache/airflow/pull/6342/diff?src=pr=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=footer). 
Last update 
[3111406...9255ec7](https://codecov.io/gh/apache/airflow/pull/6342?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [airflow] kaxil opened a new pull request #7540: [AIRFLOW-XXXX] Add instructions on using templating in Bash Script

2020-02-25 Thread GitBox
kaxil opened a new pull request #7540: [AIRFLOW-XXXX] Add instructions on using 
templating in Bash Script
URL: https://github.com/apache/airflow/pull/7540
 
 
   Add instructions on using Jinja templating in Bash Script passed to 
BashOperator
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   




[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384191672
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -1374,90 +1376,115 @@ def tree(self):
             .all()
         )
         dag_runs = {
-            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs}
+            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs
+        }
 
         dates = sorted(list(dag_runs.keys()))
         max_date = max(dates) if dates else None
         min_date = min(dates) if dates else None
 
         tis = dag.get_task_instances(start_date=min_date, end_date=base_date)
-        task_instances = {}
+        task_instances: Dict[Tuple[str, datetime], models.TaskInstance] = {}
         for ti in tis:
-            tid = alchemy_to_dict(ti)
-            dr = dag_runs.get(ti.execution_date)
-            tid['external_trigger'] = dr['external_trigger'] if dr else False
-            task_instances[(ti.task_id, ti.execution_date)] = tid
+            task_instances[(ti.task_id, ti.execution_date)] = ti
 
-        expanded = []
+        expanded = set()
         # The default recursion traces every path so that tree view has full
         # expand/collapse functionality. After 5,000 nodes we stop and fall
         # back on a quick DFS search for performance. See PR #320.
-        node_count = [0]
+        node_count = 0
         node_limit = 5000 / max(1, len(dag.leaves))
 
+        def encode_ti(ti: Optional[models.TaskInstance]) -> Optional[List]:
+            if not ti:
+                return None
+
+            # NOTE: order of entry is important here because client JS relies on it for
+            # tree node reconstruction. Remember to change JS code in tree.html
+            # whenever order is altered.
+            data = [
+                ti.state,
+                ti.try_number,
+                None,  # start_ts
+                None,  # duration
+            ]
+
+            if ti.start_date:
+                # round to seconds to reduce payload size
+                data[2] = int(ti.start_date.timestamp())
+            if ti.duration is not None:
+                data[3] = int(ti.duration)
+
+            return data
+
         def recurse_nodes(task, visited):
+            nonlocal node_count
+            node_count += 1
             visited.add(task)
-            node_count[0] += 1
-
-            children = [
-                recurse_nodes(t, visited) for t in task.downstream_list
-                if node_count[0] < node_limit or t not in visited]
-
-            # D3 tree uses children vs _children to define what is
-            # expanded or not. The following block makes it such that
-            # repeated nodes are collapsed by default.
-            children_key = 'children'
-            if task.task_id not in expanded:
-                expanded.append(task.task_id)
-            elif children:
-                children_key = "_children"
-
-            def set_duration(tid):
-                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
-                        tid["start_date"] is not None):
-                    d = timezone.utcnow() - timezone.parse(tid["start_date"])
-                    tid["duration"] = d.total_seconds()
-                return tid
-
-            return {
+            task_id = task.task_id
+
+            node = {
                 'name': task.task_id,
                 'instances': [
-                    set_duration(task_instances.get((task.task_id, d))) or {
-                        'execution_date': d.isoformat(),
-                        'task_id': task.task_id
-                    }
-                    for d in dates],
-                children_key: children,
+                    encode_ti(task_instances.get((task_id, d)))
+                    for d in dates
+                ],
                 'num_dep': len(task.downstream_list),
                 'operator': task.task_type,
                 'retries': task.retries,
                 'owner': task.owner,
-                'start_date': task.start_date,
-                'end_date': task.end_date,
-                'depends_on_past': task.depends_on_past,
                 'ui_color': task.ui_color,
-                'extra_links': task.extra_links,
             }
 
+            if task.downstream_list:
+                children = [
+                    recurse_nodes(t, visited) for t in task.downstream_list
+                    if node_count < node_limit or t not in visited]
+
+                # D3 tree uses children vs _children to define what is
+                # expanded or not. The following block makes it such that
+                # repeated nodes are collapsed by default.
+                if task.task_id not in expanded:
+                    children_key = 'children'
+                    expanded.add(task.task_id)
+                else:
+  
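
The compact positional encoding introduced in this diff can be sketched standalone. This is a simplified, hypothetical version for illustration: the field order and second-rounding come from the diff above, but the flattened function signature is an assumption (the real code takes a TaskInstance object):

```python
from datetime import datetime, timezone
from typing import List, Optional


def encode_ti(state: Optional[str], try_number: int,
              start_date: Optional[datetime],
              duration: Optional[float]) -> List:
    # Order is significant: the client-side JS in tree.html decodes these
    # positionally, so [state, try_number, start_ts, duration] must not be
    # reordered.
    data: List = [state, try_number, None, None]
    if start_date:
        # truncate to whole seconds to keep the JSON payload small
        data[2] = int(start_date.timestamp())
    if duration is not None:
        data[3] = int(duration)
    return data
```

Compared with shipping a full dict per task instance, a four-element list per cell keeps the tree-view payload small for large DAGs, which is the point of this optimization.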

[jira] [Updated] (AIRFLOW-6389) add config for 'allow_multi_scheduler_instances' default True

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6389:
--
Description: 
Right now a common deployment pattern with blue/green builds is:
1. On EC2 1, start the scheduler
2. Assign 'final' DNS to EC2 1
3. Create EC2 2
4. Start the scheduler on EC2 2
5. Assign 'final' DNS to EC2 2
6. Tear down EC2 1

The issue is that since the metastore db (i.e. MySQL) is shared by both EC2s, 
there is a period of time between points 4 and 6 above where multiple 
schedulers are running. To avoid this, proposing a config option 
'allow_multi_scheduler_instances': when set to False, scheduler startup will 
detect that another scheduler is running and exit (i.e. not start up) with a 
WARNING message.

7. We have cron/systemd set up to keep retrying to start the scheduler pid, 
so as soon as point 6 completes the scheduler should successfully launch on EC2 1



  was:
right now common deployment pattern with blue/green build is:
1. on EC2 1, start scheduler
2. Assign 'final' DNS to EC2 1
3. create EC2 2
4. start scheduler on EC2 2
5.  Assign 'final' DNS to EC2 2
6. Teardown EC2 1

Issue is that since the metastore db (ie mysql) is shared to both EC2s there is 
a period of time between point 4 and 6 above where there are multiple 
schedulers running. To avoid this proposing config for 
'allow_multi_scheduler_instances' that when set to False, the startup of 
scheduler will detect that another scheduler is running then exit (ie not 
startup) with WARNING message




> add config for 'allow_multi_scheduler_instances' default True
> -
>
> Key: AIRFLOW-6389
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6389
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.6
>Reporter: t oo
>Priority: Minor
>
> Right now a common deployment pattern with blue/green builds is:
> 1. On EC2 1, start the scheduler
> 2. Assign 'final' DNS to EC2 1
> 3. Create EC2 2
> 4. Start the scheduler on EC2 2
> 5. Assign 'final' DNS to EC2 2
> 6. Tear down EC2 1
> The issue is that since the metastore db (i.e. MySQL) is shared by both EC2s, 
> there is a period of time between points 4 and 6 above where multiple 
> schedulers are running. To avoid this, proposing a config option 
> 'allow_multi_scheduler_instances': when set to False, scheduler startup will 
> detect that another scheduler is running and exit (i.e. not start up) with a 
> WARNING message.
> 7. We have cron/systemd set up to keep retrying to start the scheduler pid, 
> so as soon as point 6 completes the scheduler should successfully launch on EC2 1
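
A minimal sketch of the proposed check (all names here are hypothetical; the real implementation would query the shared metastore for the latest heartbeats of other scheduler jobs):

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, Optional


def another_scheduler_alive(heartbeats: Iterable[datetime],
                            threshold: timedelta = timedelta(seconds=30),
                            now: Optional[datetime] = None) -> bool:
    # A scheduler is considered alive if its last heartbeat is recent enough.
    now = now or datetime.utcnow()
    return any(now - hb < threshold for hb in heartbeats)


def maybe_exit_on_startup(allow_multi: bool,
                          heartbeats: Iterable[datetime]) -> bool:
    """Return True if this scheduler should refuse to start."""
    if not allow_multi and another_scheduler_alive(heartbeats):
        print("WARNING: another scheduler appears to be running; exiting",
              file=sys.stderr)
        return True
    return False
```

With 'allow_multi_scheduler_instances' left at its default of True, the check is skipped and behaviour is unchanged; setting it to False makes the blue/green overlap window safe.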



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-4502) new cli command - task_states_for_dag_run

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo closed AIRFLOW-4502.
-
Resolution: Fixed

> new cli command - task_states_for_dag_run
> -
>
> Key: AIRFLOW-4502
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4502
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.10.3
>Reporter: t oo
>Assignee: t oo
>Priority: Minor
>
> This would be a new cli command where you pass arguments of dag_id and 
> execution_date and you get a response of all the task names, their states 
> (i.e. skipped, failed, success, etc.) and execution time
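
The described command could be sketched as a simple query-and-format step (hypothetical names and row shape; the real CLI would read the task instances for the given dag_id/execution_date from the metadata DB):

```python
from typing import Dict, List, Tuple


def task_states_for_dag_run(task_instances: List[Dict]) -> List[Tuple[str, str, str]]:
    # One row per task: (task_id, state, execution time), matching the
    # requested output of task names, states and exec time.
    return [(ti['task_id'], ti['state'], ti.get('exec_time', ''))
            for ti in task_instances]
```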



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-6919) Make Breeze more DAG-test friendly

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045000#comment-17045000
 ] 

ASF GitHub Bot commented on AIRFLOW-6919:
-

potiuk commented on pull request #7539: [AIRFLOW-6919] Make Breeze DAG-test 
friendly
URL: https://github.com/apache/airflow/pull/7539
 
 
   Originally Breeze was used to run unit and integration tests, then system
   tests, and now we have made it a bit more friendly for testing your DAGs
   there. You can now install any older airflow version in Breeze via the
   --install-airflow-version switch; the "files/dags" folder is mounted to
   "/files/dags" and DAGs are read from that folder.
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   
 



> Make Breeze more DAG-test friendly
> ---
>
> Key: AIRFLOW-6919
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6919
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: breeze
>Affects Versions: 2.0.0, 1.10.9
>Reporter: Jarek Potiuk
>Priority: Major
>
> Originally Breeze was used to run unit and integration tests, then system 
> tests, and now we have made it a bit more friendly for testing your DAGs 
> there. You can now install any older
> airflow version in Breeze via the --install-airflow-version switch; the 
> "files/dags" folder is mounted to "/files/dags" and DAGs are read from 
> that folder.
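
The described workflow can be illustrated with a hypothetical local layout (the file name below is made up; the flag name is taken from the issue text):

```shell
# DAG files dropped into files/dags on the host are visible at
# /files/dags inside the Breeze container.
mkdir -p files/dags
cat > files/dags/example_dag.py <<'EOF'
# DAG definition would go here
EOF
ls files/dags
# then, inside the checkout (flag name from the issue text):
#   ./breeze --install-airflow-version 1.10.9
```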



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AIRFLOW-6920) AIRFLOW Feature Parity with LUIGI & CONTROLM

2020-02-25 Thread t oo (Jira)
t oo created AIRFLOW-6920:
-

 Summary: AIRFLOW Feature Parity with LUIGI & CONTROLM 
 Key: AIRFLOW-6920
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6920
 Project: Apache Airflow
  Issue Type: Improvement
  Components: tests
Affects Versions: 1.10.7
Reporter: t oo


*LUIGI* vs *AIRFLOW*

 

200 sequential tasks (so no parallelism):

 

+LUIGI:+
mkdir -p test_output8
pip install luigi
# no need to start web server, scheduler or meta db
# *8.3secs* total time for all 200
time python3 -m luigi --module cloop --local-scheduler ManyMany

 

+AIRFLOW:+
# *1032 sec* total time for all 200: .16s per task but a 5 sec gap between tasks
# intention was for the tasks in the DAG to be completely sequential, i.e. task 3 
# must wait for task 2, which must wait for task 1, etc., but chain() was not 
# working as intended?? so used default_pool=1
airflow initdb
nohup airflow webserver -p 8080 &
nohup airflow scheduler &
airflow trigger_dag looper2
# look at the dagrun start/end time

 

cloop.py
{code:python}
import os

import luigi

# To run:
# cd ~/luigi_workflows
# PYTHONPATH=.. luigi --module=luigi_workflows.test_resources ManyMany --workers=100


class Sleep(luigi.Task):
    fname = luigi.Parameter()

    def requires(self):
        ii = int(self.fname.split('_')[1])
        if ii > 1:
            return Sleep(fname='marker_{}'.format(ii - 1))
        else:
            return []

    def full_path(self):
        return os.path.join(os.path.dirname(os.path.realpath(__file__)),
                            'test_output8', self.fname)

    def run(self):
        with open(self.full_path(), 'w') as f:
            print('', file=f)

    def output(self):
        return luigi.LocalTarget(self.full_path())


class Many(luigi.WrapperTask):
    n = luigi.IntParameter()

    def requires(self):
        for i in range(self.n):
            yield Sleep(fname='marker_{}'.format(i))


class ManyMany(luigi.WrapperTask):
    n = luigi.IntParameter(default=200)

    def requires(self):
        for i in range(self.n):
            yield Many(n=self.n)
{code}
looper2.py
{code:python}
import airflow
from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.helpers import chain

args = {
    'owner': 'airflow',
    'retries': 3,
    'start_date': airflow.utils.dates.days_ago(2)
}

dag = DAG(
    dag_id='looper2', default_args=args,
    schedule_interval=None)

# chain() takes *args; passing the list unexpanded links nothing, which is
# why the tasks did not run sequentially as intended
chain(*[DummyOperator(task_id='op' + str(i), dag=dag) for i in range(1, 201)])

if __name__ == "__main__":
    dag.cli()
{code}
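
The chain() pitfall above (passing the whole list as one argument instead of unpacking it) can be illustrated without Airflow, using a minimal stand-in for airflow.utils.helpers.chain:

```python
from typing import List


class Task:
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.downstream: List['Task'] = []


def chain(*tasks: Task) -> None:
    # Minimal stand-in for Airflow's chain(): link each task to the next one.
    for up, down in zip(tasks, tasks[1:]):
        up.downstream.append(down)


tasks = [Task('op' + str(i)) for i in range(1, 5)]

chain(tasks)   # BUG: a single list argument -> zip over one item -> no links
no_links = sum(len(t.downstream) for t in tasks)

chain(*tasks)  # correct: varargs -> op1 -> op2 -> op3 -> op4
links = sum(len(t.downstream) for t in tasks)
```

This is why the looper2 DAG ran with no dependencies between its 200 tasks; the default_pool=1 workaround only serialized them by limiting concurrency.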

I saw a similar test in 
https://github.com/apache/airflow/pull/5096 but it did not seem to be 
sequential or to use the scheduler



Possible test scenarios:
1. 1 DAG with 200 tasks running sequentially
2. 1 DAG with 200 tasks running all in parallel (200 slots)
3. 1 DAG with 200 tasks running all in parallel (48 slots)
4. 200 DAGs each with 1 task
Then repeat above changing 200 to 2000 or 20.etc


Qs: 
1. any plans for an 'in-memory' scheduler like Luigi's? 
2. Anyone open to a Luigi Operator? 
3. Any speedups to make existing scheduler faster? Noting that the tasks here 
are sequential (should be similar time to 200 dags of 1 task each)


ControlM comparison:
Is it envisioned that Airflow becomes a replacement for 
https://www.bmcsoftware.uk/it-solutions/control-m.html ?
execution_date seems similar to Order Date, a DAG seems similar to a job, and 
tasks in a DAG seem similar to commands called by a job, but some of the items 
I see missing:
1. integrating public holiday calendars
2. ability to specify a schedule like 11am on '2nd weekday of the month', 'last 
5 days of the month', 'last business day of the month'
3. ability to visualise dependencies between DAGs (there does not seem to be 
a high-level way to say: at 11am schedule DAGc after DAGa and DAGb, then at 3pm 
schedule DAGd after DAGc only if DAGc was successful)
4. ability to click 1 to many DAGs in a UI and change their state to 
killed/success (force ok) etc. and have it instantly affect task instances 
(i.e. stopping them)
5. ability to set whole DAGs to 'dummy' on certain days of the week, i.e. DAGb 
(runs 7 days a week and does stuff) must run after DAGa for each execdate (DAGa 
should do stuff on Mon-Fri but on Sat/Sun DAGa should 'do' nothing, i.e. the 
entire DAG is 'dummy' just to satisfy the 'IN condition' of DAGb)
6. ability to change the number of tasks within a DAG for a different exec date 
without 'stuffing' up the scheduler/metadb
7. ability to 'order up' any day in the past/future (for all or some DAGs) 
and keep it on 'hold', visualise which DAGs 'would' be scheduled, see DAG 
dependencies, and choose to run all/some (or just do nothing and delete them) 
of the DAGs while maintaining dependencies between them and optionally 'forcing 
[GitHub] [airflow] potiuk opened a new pull request #7539: [AIRFLOW-6919] Make Breeze DAG-test friendly

2020-02-25 Thread GitBox
potiuk opened a new pull request #7539: [AIRFLOW-6919] Make Breeze DAG-test 
friendly
URL: https://github.com/apache/airflow/pull/7539
 
 
   Originally Breeze was used to run unit and integration tests, then system
   tests, and now we have made it a bit more friendly for testing your DAGs
   there. You can now install any older airflow version in Breeze via the
   --install-airflow-version switch; the "files/dags" folder is mounted to
   "/files/dags" and DAGs are read from that folder.
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   




[jira] [Created] (AIRFLOW-6919) Make Breeze more DAG-test friendly

2020-02-25 Thread Jarek Potiuk (Jira)
Jarek Potiuk created AIRFLOW-6919:
-

 Summary: Make Breeze more DAG-test friendly
 Key: AIRFLOW-6919
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6919
 Project: Apache Airflow
  Issue Type: Bug
  Components: breeze
Affects Versions: 1.10.9, 2.0.0
Reporter: Jarek Potiuk


Originally Breeze was used to run unit and integration tests, then system 
tests, and now we have made it a bit more friendly for testing your DAGs there. 
You can now install any older airflow version in Breeze via the 
--install-airflow-version switch; the "files/dags" folder is mounted to 
"/files/dags" and DAGs are read from that folder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-5307) Move the BaseOperator to the operators package

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044991#comment-17044991
 ] 

ASF GitHub Bot commented on AIRFLOW-5307:
-

BasPH commented on pull request #5910: [WIP][AIRFLOW-5307] Move BaseOperator to 
airflow.operators.base_operator
URL: https://github.com/apache/airflow/pull/5910
 
 
   **WIP: working out a bug in the docs.**
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5307
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The BaseOperator currently resides in /airflow/models but has never been a 
database model, i.e. it is not stored in a database table. I suggest to move it 
to its logical place, the operators package.
   
   To preserve backwards compatibility I import the BaseOperator in 
/airflow/models/\_\_init\_\_.py and raise a DeprecationWarning when imported 
from there. All references to airflow.models.BaseOperator have been removed and 
I suggest to remove backward compatibility in Airflow 2.0.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   No test logic altered, only moved package names.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 



> Move the BaseOperator to the operators package
> --
>
> Key: AIRFLOW-5307
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5307
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Bas Harenslak
>Priority: Major
>
> The BaseOperator currently resides in /airflow/models but has never been a 
> database model, i.e. it is not stored in a database table. I suggest to move 
> it to its logical place, the operators package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [airflow] kaxil commented on issue #5910: [WIP][AIRFLOW-5307] Move BaseOperator to airflow.operators.base_operator

2020-02-25 Thread GitBox
kaxil commented on issue #5910: [WIP][AIRFLOW-5307] Move BaseOperator to 
airflow.operators.base_operator
URL: https://github.com/apache/airflow/pull/5910#issuecomment-591124831
 
 
   Agree, this should be moved to the operators package. Can you rebase to the latest master?




[GitHub] [airflow] BasPH opened a new pull request #5910: [WIP][AIRFLOW-5307] Move BaseOperator to airflow.operators.base_operator

2020-02-25 Thread GitBox
BasPH opened a new pull request #5910: [WIP][AIRFLOW-5307] Move BaseOperator to 
airflow.operators.base_operator
URL: https://github.com/apache/airflow/pull/5910
 
 
   **WIP: working out a bug in the docs.**
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5307
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   The BaseOperator currently resides in /airflow/models but has never been a
database model, i.e. it is not stored in a database table. I suggest moving it
to its logical place, the operators package.
   
   To preserve backwards compatibility I import the BaseOperator in
/airflow/models/\_\_init\_\_.py and raise a DeprecationWarning when it is
imported from there. All references to airflow.models.BaseOperator have been
removed, and I suggest removing the backwards-compatibility import in Airflow 2.0.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   No test logic altered, only moved package names.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings
that explain what they do
 - If you implement backwards incompatible changes, please leave a note in
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[jira] [Updated] (AIRFLOW-6389) add config for 'allow_multi_scheduler_instances' default True

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6389:
--
Description: 
right now common deployment pattern with blue/green build is:
1. on EC2 1, start scheduler
2. Assign 'final' DNS to EC2 1
3. create EC2 2
4. start scheduler on EC2 2
5.  Assign 'final' DNS to EC2 2
6. Teardown EC2 1

The issue is that, since the metastore db (ie mysql) is shared by both EC2s,
there is a period of time between points 4 and 6 above where multiple schedulers
are running. To avoid this, I propose a config option
'allow_multi_scheduler_instances' which, when set to False, makes scheduler
startup detect that another scheduler is already running and exit (ie not start
up) with a WARNING message.
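The proposed startup check could be sketched as follows. This is a hypothetical illustration only: the function names and the idea of passing a list of heartbeat timestamps are mine, not Airflow's actual scheduler-job schema, which would be queried from the shared metastore.

```python
import datetime
import logging


def another_scheduler_running(heartbeats, threshold_seconds=30, now=None):
    """Return True if any heartbeat timestamp is fresher than the threshold."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    limit = datetime.timedelta(seconds=threshold_seconds)
    return any(now - hb < limit for hb in heartbeats)


def maybe_start_scheduler(heartbeats, allow_multi_scheduler_instances=True):
    # With the proposed config set to False, refuse to start when another
    # scheduler's heartbeat looks alive; otherwise behave as today.
    if not allow_multi_scheduler_instances and another_scheduler_running(heartbeats):
        logging.warning("Another scheduler appears to be running; exiting.")
        return False  # a real implementation would sys.exit() here
    return True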



  was:
right now common deployment pattern with blue/green build is:
1. on EC2 1, start scheduler
2. Assign 'final' DNS to EC2 1
3. create EC2 2
4. start scheduler on EC2 2
5.  Assign 'final' DNS to EC2 2
6. Teardown EC2 1

Issue is that since the metastore db (ie mysql) is shared to both EC2s there is 
a period of time between point 4 and 6 above where there are multiple 
schedulers running. To avoid this proposing config for 
'allow_multi_scheduler_instances' that when set tot False, the startup of 
scheduler will detect that another scheduler is running then exit (ie not 
startup) with WARNING message




> add config for 'allow_multi_scheduler_instances' default True
> -
>
> Key: AIRFLOW-6389
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6389
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.6
>Reporter: t oo
>Priority: Minor
>
> right now common deployment pattern with blue/green build is:
> 1. on EC2 1, start scheduler
> 2. Assign 'final' DNS to EC2 1
> 3. create EC2 2
> 4. start scheduler on EC2 2
> 5.  Assign 'final' DNS to EC2 2
> 6. Teardown EC2 1
> Issue is that since the metastore db (ie mysql) is shared to both EC2s there 
> is a period of time between point 4 and 6 above where there are multiple 
> schedulers running. To avoid this proposing config for 
> 'allow_multi_scheduler_instances' that when set to False, the startup of 
> scheduler will detect that another scheduler is running then exit (ie not 
> startup) with WARNING message





[jira] [Updated] (AIRFLOW-6389) add config for 'allow_multi_scheduler_instances' default True

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6389:
--
Description: 
right now common deployment pattern with blue/green build is:
1. on EC2 1, start scheduler
2. Assign 'final' DNS to EC2 1
3. create EC2 2
4. start scheduler on EC2 2
5.  Assign 'final' DNS to EC2 2
6. Teardown EC2 1

The issue is that, since the metastore db (ie mysql) is shared by both EC2s,
there is a period of time between points 4 and 6 above where multiple schedulers
are running. To avoid this, I propose a config option
'allow_multi_scheduler_instances' which, when set to False, makes scheduler
startup detect that another scheduler is already running and exit (ie not start
up) with a WARNING message.



  was:
right now common deployment pattern with blue/green build is:
1. on EC2 1, start scheduler
2. Assign 'final' DNS to EC2 1
3. create EC2 2
4. start scheduler on EC2 2
5.  Assign 'final' DNS to EC2 2
6. Teardown EC2 1

Issue is that since the megastore db (ie mysql) is shared to both EC2s there is 
a period of time between point 4 and 6 above where there are multiple 
schedulers running. To avoid this proposing config for 
'allow_multi_scheduler_instances' that when set tot False, the startup of 
scheduler will detect that another scheduler is running then exit (ie not 
startup) with WARNING message




> add config for 'allow_multi_scheduler_instances' default True
> -
>
> Key: AIRFLOW-6389
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6389
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Affects Versions: 1.10.6
>Reporter: t oo
>Priority: Minor
>
> right now common deployment pattern with blue/green build is:
> 1. on EC2 1, start scheduler
> 2. Assign 'final' DNS to EC2 1
> 3. create EC2 2
> 4. start scheduler on EC2 2
> 5.  Assign 'final' DNS to EC2 2
> 6. Teardown EC2 1
> Issue is that since the metastore db (ie mysql) is shared to both EC2s there 
> is a period of time between point 4 and 6 above where there are multiple 
> schedulers running. To avoid this proposing config for 
> 'allow_multi_scheduler_instances' that when set tot False, the startup of 
> scheduler will detect that another scheduler is running then exit (ie not 
> startup) with WARNING message





[jira] [Updated] (AIRFLOW-6388) SparkSubmitOperator polling should not 'consume' a pool slot

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6388:
--
Summary: SparkSubmitOperator polling should not 'consume' a pool slot  
(was: SparkSubmitOperator polling should not 'consume' a slot)

> SparkSubmitOperator polling should not 'consume' a pool slot
> 
>
> Key: AIRFLOW-6388
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6388
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: dependencies, scheduler
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Minor
>
> Spark jobs can often take many minutes (or even hours) to complete. 
> The spark submit operator submits a job to a spark cluster, then continually 
> polls its status until it detects the spark job has ended. This means it 
> could be consuming a 'slot' (ie parallelism, dag_concurrency, 
> max_active_dag_runs_per_dag, non_pooled_task_slot_count) for hours when it is 
> not 'doing' anything but polling for status. 
> https://github.com/apache/airflow/pull/6909#discussion_r361838225 suggested 
> it should move to a poke/reschedule model.
> Another thing to note is that in cluster mode a spark-submit made to a 'full' 
> spark cluster will sit in WAITING state on spark side until some cores/memory 
> is freed, then the driver/app can go into RUNNING
> "This actually means occupy worker and do nothing for n seconds is it not?
> It was OK when it was 1 second but users may set it to even 5 min without 
> realising that it occupys the worker.
> My comment here is more of a concern rather than an action to do.
> Should this work by occupying the worker "indefinitely" or can it be 
> something like the sensors with (poke/reschedule)?"
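The poke/reschedule model suggested in the linked discussion can be sketched like this. The names (`get_spark_job_state`, the state strings, `run_sensor_step`) are illustrative stand-ins, not the actual SparkSubmitOperator or sensor API:

```python
# States in which a Spark application has finished (illustrative set).
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def poke(get_spark_job_state, job_id):
    """Return True once the job has reached a terminal state."""
    return get_spark_job_state(job_id) in TERMINAL_STATES


def run_sensor_step(get_spark_job_state, job_id):
    # In reschedule mode the slot is released between pokes: instead of
    # sleeping inside the worker, the task asks to be rescheduled later.
    if poke(get_spark_job_state, job_id):
        return "done"
    return "reschedule"
```

The key difference from the current operator is that each poke is a short-lived task execution, so the pool slot is only occupied for the duration of a status check rather than for the whole Spark job.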





[jira] [Updated] (AIRFLOW-6443) Pool name - increase length > 50, make cli give error if too large

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6443:
--
Labels: gsoc gsoc2020 mentor pool  (was: pool)

> Pool name - increase length > 50, make cli give error if too large
> --
>
> Key: AIRFLOW-6443
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6443
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli, database, models
>Affects Versions: 1.10.3
>Reporter: t oo
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor, pool
>
> create some pool names (using cli) with 70 or 80 character length
>  
> 1. The UI does not allow creating names longer than 50 characters, so why does the cli?
> click on one of the pool names listed (link is cut to the 50-char name: 
> [https://domain:8080/admin/airflow/task?flt1_pool_equals=qjfdal_CRCE_INTERCONNECTION_FORECAST_TNC_EJFLSA_LP)]
> If you click 'edit' it shows the full 80 chars in Description but only the first 50 in Pool
> 2. Why limit the length to 50 at all? It should be increased - say to 256
> 3. If trying to create a name longer than the model's column length, the cli 
> should give an error
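Point 3 could be addressed with a simple length check in the cli before touching the database. This is a hypothetical sketch; `POOL_NAME_MAX_LEN` is an illustrative constant, and a real fix should derive the limit from the model's column definition:

```python
POOL_NAME_MAX_LEN = 256  # illustrative; should match the Pool model column


def validate_pool_name(name):
    """Reject pool names longer than the database column before inserting."""
    if len(name) > POOL_NAME_MAX_LEN:
        raise ValueError(
            f"Pool name is {len(name)} characters; maximum is {POOL_NAME_MAX_LEN}."
        )
    return name
```

Failing fast in the cli avoids creating rows whose names are silently truncated or unusable in the UI.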





[GitHub] [airflow] saguziel commented on issue #7269: [AIRFLOW-6651] Add Redis Heartbeat option

2020-02-25 Thread GitBox
saguziel commented on issue #7269: [AIRFLOW-6651] Add Redis Heartbeat option
URL: https://github.com/apache/airflow/pull/7269#issuecomment-591116708
 
 
   Celery does not support this type of thing, and it would not be portable 
across executors. Celery's consistency model is really simple, which is why it 
gets really messy with the task_acks_late arg and the fair scheduler and 
whatnot.  I would say Celery is not designed to offer near-exactly-once type 
semantics which is why we have so many hacks on top of it.




[GitHub] [airflow] tooptoop4 commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
tooptoop4 commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark 
scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591113663
 
 
   maybe link to https://jira.apache.org/jira/browse/AIRFLOW-6454 ?




[GitHub] [airflow] ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384173851
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
 
 Review comment:
   Used like this:
   
   
https://github.com/apache/airflow/pull/7537/files#diff-fda366d49e19f54d95380db869c369e1R68-R71
   
   ```python
   Executor = MockExecutor
   
   class ShortCircutExecutor(ShortCircutExecutorMixin, Executor):
    pass
   ```




[GitHub] [airflow] ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384173467
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
 
 Review comment:
   Yes, because it's a MixIn.




[GitHub] [airflow] kaxil commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
kaxil commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384173156
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,221 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
 
 Review comment:
   Do we need a super call here as we are not subclassing anything?




[GitHub] [airflow] ashb commented on issue #7510: [AIRFLOW-6887][WIP] Do not check the state of fresh DAGRun

2020-02-25 Thread GitBox
ashb commented on issue #7510: [AIRFLOW-6887][WIP] Do not check the state of 
fresh DAGRun
URL: https://github.com/apache/airflow/pull/7510#issuecomment-591112184
 
 
   @Fokko `run.refresh_from_db()` just loads the id and state columns. The case
you are talking about is still handled correctly (by the slightly badly named
`verify_integrity` function).




[GitHub] [airflow] houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384172061
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -1374,90 +1376,115 @@ def tree(self):
             .all()
         )
         dag_runs = {
-            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs}
+            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs
+        }
 
         dates = sorted(list(dag_runs.keys()))
         max_date = max(dates) if dates else None
         min_date = min(dates) if dates else None
 
         tis = dag.get_task_instances(start_date=min_date, end_date=base_date)
-        task_instances = {}
+        task_instances: Dict[Tuple[str, datetime], models.TaskInstance] = {}
         for ti in tis:
-            tid = alchemy_to_dict(ti)
-            dr = dag_runs.get(ti.execution_date)
-            tid['external_trigger'] = dr['external_trigger'] if dr else False
-            task_instances[(ti.task_id, ti.execution_date)] = tid
+            task_instances[(ti.task_id, ti.execution_date)] = ti
 
-        expanded = []
+        expanded = set()
         # The default recursion traces every path so that tree view has full
         # expand/collapse functionality. After 5,000 nodes we stop and fall
         # back on a quick DFS search for performance. See PR #320.
-        node_count = [0]
+        node_count = 0
         node_limit = 5000 / max(1, len(dag.leaves))
 
+        def encode_ti(ti: Optional[models.TaskInstance]) -> Optional[List]:
+            if not ti:
+                return None
+
+            # NOTE: order of entry is important here because client JS relies on it for
+            # tree node reconstruction. Remember to change JS code in tree.html
+            # whenever order is altered.
+            data = [
+                ti.state,
+                ti.try_number,
+                None,  # start_ts
+                None,  # duration
+            ]
+
+            if ti.start_date:
+                # round to seconds to reduce payload size
+                data[2] = int(ti.start_date.timestamp())
+            if ti.duration is not None:
+                data[3] = int(ti.duration)
+
+            return data
+
         def recurse_nodes(task, visited):
+            nonlocal node_count
+            node_count += 1
             visited.add(task)
-            node_count[0] += 1
-
-            children = [
-                recurse_nodes(t, visited) for t in task.downstream_list
-                if node_count[0] < node_limit or t not in visited]
-
-            # D3 tree uses children vs _children to define what is
-            # expanded or not. The following block makes it such that
-            # repeated nodes are collapsed by default.
-            children_key = 'children'
-            if task.task_id not in expanded:
-                expanded.append(task.task_id)
-            elif children:
-                children_key = "_children"
-
-            def set_duration(tid):
-                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
-                        tid["start_date"] is not None):
-                    d = timezone.utcnow() - timezone.parse(tid["start_date"])
-                    tid["duration"] = d.total_seconds()
-                return tid
-
-            return {
+            task_id = task.task_id
+
+            node = {
                 'name': task.task_id,
                 'instances': [
-                    set_duration(task_instances.get((task.task_id, d))) or {
-                        'execution_date': d.isoformat(),
-                        'task_id': task.task_id
-                    }
-                    for d in dates],
-                children_key: children,
+                    encode_ti(task_instances.get((task_id, d)))
+                    for d in dates
+                ],
                 'num_dep': len(task.downstream_list),
                 'operator': task.task_type,
                 'retries': task.retries,
                 'owner': task.owner,
-                'start_date': task.start_date,
-                'end_date': task.end_date,
-                'depends_on_past': task.depends_on_past,
                 'ui_color': task.ui_color,
-                'extra_links': task.extra_links,
             }
 
+            if task.downstream_list:
+                children = [
+                    recurse_nodes(t, visited) for t in task.downstream_list
+                    if node_count < node_limit or t not in visited]
+
+                # D3 tree uses children vs _children to define what is
+                # expanded or not. The following block makes it such that
+                # repeated nodes are collapsed by default.
+                if task.task_id not in expanded:
+                    children_key = 'children'
+                    expanded.add(task.task_id)
+                else:

[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384171265
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 

[jira] [Commented] (AIRFLOW-6518) Task did not retry when there was temporary metastore db connectivity loss

2020-02-25 Thread t oo (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044969#comment-17044969
 ] 

t oo commented on AIRFLOW-6518:
---

[~ash] [~potiuk] thoughts?

> Task did not retry when there was temporary metastore db connectivity loss
> --
>
> Key: AIRFLOW-6518
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6518
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: database, scheduler
>Affects Versions: 1.10.6
>Reporter: t oo
>Priority: Major
>
> My DAG has retries configured at the task level. I started a dagrun; while a 
> task was running, the metastore db crashed and the task failed, but the dagrun 
> did not attempt to retry the task (even though task retries are configured!). 
> The db recovered 3 seconds after the task failed, but the dagrun instead went 
> to the FAILED state.
> *Last part of log of TaskInstance:*
> [2020-01-08 17:34:46,301] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk Traceback (most recent call last):
> [2020-01-08 17:34:46,301] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File "/home/ec2-user/venv/bin/airflow", line 37, in 
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk args.func(args)
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", 
> line 74, in wrapper
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk return f(*args, **kwargs)
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", 
> line 551, in run
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk _run(args, dag, ti)
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", 
> line 469, in _run
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk pool=args.pool,
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", 
> line 74, in wrapper
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk return func(*args, **kwargs)
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/models/taskinstance.py",
>  line 962, in _run_raw_task
> [2020-01-08 17:34:46,302] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk self.refresh_from_db()
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", 
> line 74, in wrapper
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk return func(*args, **kwargs)
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/models/taskinstance.py",
>  line 461, in refresh_from_db
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk ti = qry.first()
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 3265, in first
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk ret = list(self[0:1])
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 3043, in __getitem__
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk return list(res)
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 3367, in __iter__
> [2020-01-08 17:34:46,303] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk return self._execute_and_instances(context)
> [2020-01-08 17:34:46,304] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> mytsk   File 
> "/home/ec2-user/venv/local/lib64/python2.7/site-packages/sqlalchemy/orm/query.py",
>  line 3389, in _execute_and_instances
> [2020-01-08 17:34:46,304] {base_task_runner.py:115} INFO - Job 34662: Subtask 
> 

[jira] [Updated] (AIRFLOW-6747) UI - Show count of tasks in each dag on the main dags page

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6747:
--
Labels: gsoc gsoc2020 mentor  (was: )

> UI - Show count of tasks in each dag on the main dags page
> --
>
> Key: AIRFLOW-6747
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6747
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: 1.10.7
>Reporter: t oo
>Priority: Minor
>  Labels: gsoc, gsoc2020, mentor
>
> Main DAGs page in UI - would benefit from showing a new column: number of 
> tasks for each dag id





[GitHub] [airflow] ashb commented on issue #7538: [AIRFLOW-6382] Add reason for pre-commit rule

2020-02-25 Thread GitBox
ashb commented on issue #7538: [AIRFLOW-6382] Add reason for pre-commit rule
URL: https://github.com/apache/airflow/pull/7538#issuecomment-591108820
 
 
   (so much for the CI skip)




[jira] [Updated] (AIRFLOW-6918) don't use 'is' in if conditions comparing STATE

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6918:
--
Description: Trivial code tidy-up - 'is' is usually used for testing 
booleans or None presence. This change ensures STATE values are tested with == 
/ != instead of 'is'

> don't use 'is' in if conditions comparing STATE
> ---
>
> Key: AIRFLOW-6918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: t oo
>Assignee: t oo
>Priority: Trivial
>
> Trivial code tidy-up - 'is' is usually used for testing booleans or None 
> presence. This change ensures STATE values are tested with == / != instead of 
> 'is'
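A minimal illustration of why `==` is the right operator here (plain strings standing in for State values): `is` tests object identity, which can differ even when two strings hold the same value.

```python
interned = "running"                 # compile-time literal, interned by CPython
computed = "".join(["run", "ning"])  # equal value, but a freshly created object

# Value equality is what a state check actually needs.
assert interned == computed

# Identity differs for the runtime-built string, so an `is` comparison
# would silently evaluate to False here.
assert interned is not computed
```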





[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384168703
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 
 $('span.status_square').tooltip({html: true});
 
+function ts_to_dtstr(ts) {
+  var dt = new Date(ts * 1000);
+  return dt.toISOString();
+}
+
+function is_dag_run(d) {
+  return d.run_id !== undefined;
+}
+
+var now_ts = Date.now()/1000;
+
+function populate_taskinstance_properties(node) {
+  // populate task instance properties for display purpose
+  var j;
+  for (j=0; j

[GitHub] [airflow] tooptoop4 commented on issue #7536: [AIRFLOW-6918] don't use 'is' in if conditions comparing STATE

2020-02-25 Thread GitBox
tooptoop4 commented on issue #7536: [AIRFLOW-6918] don't use 'is' in if 
conditions comparing STATE
URL: https://github.com/apache/airflow/pull/7536#issuecomment-591107895
 
 
   @kaxil updated




[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384164611
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 
 $('span.status_square').tooltip({html: true});
 
+function ts_to_dtstr(ts) {
+  var dt = new Date(ts * 1000);
+  return dt.toISOString();
+}
+
+function is_dag_run(d) {
+  return d.run_id !== undefined;
+}
+
+var now_ts = Date.now()/1000;
+
+function populate_taskinstance_properties(node) {
+  // populate task instance properties for display purpose
+  var j;
+  for (j=0; j

[GitHub] [airflow] ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7537: [AIRFLOW-] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384161251
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
+        self.reset(stop_when_these_completed)
+
+    def reset(self, stop_when_these_completed):
+        self.stop_when_these_completed = {
+            # Store the date as a timestamp, as sometimes this is a Pendulum
+            # object, other times it is a datetime object.
+            (run.dag_id, run.execution_date.timestamp()): run
+            for run in stop_when_these_completed
+        }
+
+    def change_state(self, key, state):
+        from airflow.utils.state import State
+        super().change_state(key, state)
+
+        dag_id, task_id, execution_date, __ = key
+        run_key = (dag_id, execution_date.timestamp())
+        run = self.stop_when_these_completed.get(run_key, None)
+        if run and all(t.state == State.SUCCESS for t in run.get_task_instances()):
+            self.stop_when_these_completed.pop(run_key)
+
+            if not self.stop_when_these_completed:
+                self.log.warning("STOPPING SCHEDULER -- all runs complete")
+                self.scheduler_job.processor_agent._done = True
+            else:
+                self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+        elif state == State.SUCCESS:
+            self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+
+
+def get_executor_under_test():
+    try:
+        # Run against master and 1.10.x releases
+        from tests.test_utils.mock_executor import MockExecutor
+    except ImportError:
+        from tests.executors.test_executor import TestExecutor as MockExecutor
+
+    # from airflow.executors.local_executor import LocalExecutor
+
+    # Change this to try other executors
+    Executor = MockExecutor
+
+    class ShortCircutExecutor(ShortCircutExecutorMixin, Executor):
+        pass
+
+    return ShortCircutExecutor
+
+
+def reset_dag(dag, num_runs, session):
+    import airflow.models
+    from airflow.utils import timezone
+    from airflow.utils.state import State
+
+    DR = airflow.models.DagRun
+    DM = airflow.models.DagModel
+    TI = airflow.models.TaskInstance
+    TF = airflow.models.TaskFail
+    dag_id = dag.dag_id
+
+    session.query(DM).filter(DM.dag_id == dag_id).update({'is_paused': False})
+    session.query(DR).filter(DR.dag_id == dag_id).delete()
+    session.query(TI).filter(TI.dag_id == dag_id).delete()
+    session.query(TF).filter(TF.dag_id == dag_id).delete()
+
+    next_run_date = dag.normalize_schedule(dag.start_date or min(t.start_date for t in dag.tasks))
+
+    for _ in range(num_runs):
+        next_run = dag.create_dagrun(
+            run_id=DR.ID_PREFIX + next_run_date.isoformat(),
+            execution_date=next_run_date,
+            start_date=timezone.utcnow(),
+            state=State.RUNNING,
+            external_trigger=False,
+            session=session,
+        )
+        next_run_date = dag.following_schedule(next_run_date)
+    return next_run
+
+
+def pause_all_dags(session):
+    from airflow.models.dag import DagModel
+    session.query(DagModel).update({'is_paused': True})
+
+
+@click.command()
+@click.option('--num-runs', default=1, help='number of DagRuns to run for each DAG')
+@click.option('--repeat', default=3, help='number of times to run the test, to reduce variance')
+@click.argument('dag_ids', required=True, nargs=-1)
+def main(num_runs, repeat, dag_ids):
+    """
+    This script will run the SchedulerJob for the specified dags "to completion".
+
+    That is, it creates a fixed number of DAG runs for the specified DAGs (from
+    the configured dag path/example dags etc), disables the scheduler from
+    creating more, and then monitors them for completion. When the final task of
+    the final dag run is 
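In `get_executor_under_test` above, the short-circuit behaviour is layered onto whichever executor is selected by composing the mixin with that executor class. A stripped-down sketch of the composition pattern (the class names and bodies below are illustrative stand-ins, not Airflow's):

```python
class FakeBaseExecutor:
    """Stand-in for MockExecutor / LocalExecutor."""
    def change_state(self, key, state):
        return f"executor recorded {key}={state}"

class ShortCircuitMixin:
    """Runs first in the MRO, then delegates to the real executor."""
    def change_state(self, key, state):
        result = super().change_state(key, state)
        return f"mixin observed {key}; {result}"

def get_executor_under_test():
    Executor = FakeBaseExecutor  # swap in a different executor class here

    # The mixin is listed first so its change_state wraps the executor's.
    class ShortCircuitExecutor(ShortCircuitMixin, Executor):
        pass

    return ShortCircuitExecutor

executor = get_executor_under_test()()
print(executor.change_state("task_1", "success"))
```

The point of building the class inside the function is that the base executor can be chosen at runtime while the mixin's hook always runs first.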

[jira] [Commented] (AIRFLOW-6866) export EXTRA_DC_OPTIONS+= fails on mac

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044952#comment-17044952
 ] 

ASF subversion and git services commented on AIRFLOW-6866:
--

Commit 34a1f92ffd5e8977f640e0f1333c70b0c15bdb99 in airflow's branch 
refs/heads/v1-10-test from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=34a1f92 ]

[AIRFLOW-6866] Fix wrong export for Mac on Breeze (#7485)


(cherry picked from commit 6502cfa8e61cb57673df327f8d1548c4bef89bcf)


> export EXTRA_DC_OPTIONS+= fails on mac
> --
>
> Key: AIRFLOW-6866
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6866
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: breeze
>Affects Versions: 2.0.0, 1.10.9
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Commented] (AIRFLOW-6763) Make system test ready for backport tests

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044954#comment-17044954
 ] 

ASF subversion and git services commented on AIRFLOW-6763:
--

Commit 3b03f861af570629649099900e60fbc2a82daf82 in airflow's branch 
refs/heads/v1-10-test from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3b03f86 ]

[AIRFLOW-6763] Make systems tests ready for backport tests (#7389)

We will run system tests on back-ported operators for the 1.10.* series of Airflow,
and for that we need to have support for running system tests using pytest's
markers and reading environment variables passed from the HOST machine (to pass
credentials).

This is the first step to automate system tests execution.

(cherry picked from commit 848fbab5bddb5250d323a2ae5337b34a6e0734dd)


> Make system test ready for backport tests
> -
>
> Key: AIRFLOW-6763
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6763
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ci
>Affects Versions: 2.0.0
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 2.0.0
>
>
> We will run system tests on back-ported operators for the 1.10.* series of Airflow, 
> and for that we need to have support for running system tests using pytest's 
> markers and reading environment variables passed from the HOST machine (to pass 
> credentials). 
> This is the first step to automate system test execution.





[jira] [Commented] (AIRFLOW-6892) Fix broken non-wheel releases

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044953#comment-17044953
 ] 

ASF subversion and git services commented on AIRFLOW-6892:
--

Commit 69f7d2e0ae305648d4c3746dfe4f82dc4cc1118f in airflow's branch 
refs/heads/v1-10-test from Kaxil Naik
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=69f7d2e ]

[AIRFLOW-6892] Fix broken non-wheel releases (#7514)

(cherry picked from commit 20c507f0928efb406a7677d68a6382c100bdcfd3)


> Fix broken non-wheel releases
> -
>
> Key: AIRFLOW-6892
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6892
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: packages
>Affects Versions: 1.10.7, 1.10.8, 1.10.9
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.10
>
>
> Our non-wheel releases on pypi are broken
> {code}
> pip install --no-binary apache-airflow apache-airflow
> {code}
>  will install a version that does this:
> {noformat}
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/home/ash/.virtualenvs/clean/lib/python3.7/site-packages/airflow/serialization/schema.json'
> {noformat}
> There are no issues with the wheel package. Pip installs the wheel package if 
> one is available, unless you pass "--no-binary" explicitly.





[jira] [Commented] (AIRFLOW-6838) Introduce real subcommands for Breeze

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044955#comment-17044955
 ] 

ASF subversion and git services commented on AIRFLOW-6838:
--

Commit 156b946d05abaae1c30517050f9045c7bd538815 in airflow's branch 
refs/heads/v1-10-test from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=156b946 ]

[AIRFLOW-6838] Introduce real subcommands for Breeze (#7515)

This change introduces sub-commands in breeze tool.
It is much needed as we have many commands now
and it was difficult to separate commands from flags.

Also --help output was very long and unreadable.

With this change it is much easier to discover
what Breeze can do for you and to navigate it.

Co-authored-by: Jarek Potiuk 

Co-authored-by: Kamil Breguła 


> Introduce real subcommands for Breeze
> -
>
> Key: AIRFLOW-6838
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6838
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0, 1.10.9
>Reporter: Kamil Bregula
>Priority: Major
> Fix For: 1.10.10
>
>






[GitHub] [airflow] houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384160648
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -1374,90 +1376,115 @@ def tree(self):
             .all()
         )
         dag_runs = {
-            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs}
+            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs
+        }

         dates = sorted(list(dag_runs.keys()))
         max_date = max(dates) if dates else None
         min_date = min(dates) if dates else None

         tis = dag.get_task_instances(start_date=min_date, end_date=base_date)
-        task_instances = {}
+        task_instances: Dict[Tuple[str, datetime], models.TaskInstance] = {}
         for ti in tis:
-            tid = alchemy_to_dict(ti)
-            dr = dag_runs.get(ti.execution_date)
-            tid['external_trigger'] = dr['external_trigger'] if dr else False
-            task_instances[(ti.task_id, ti.execution_date)] = tid
+            task_instances[(ti.task_id, ti.execution_date)] = ti

-        expanded = []
+        expanded = set()
         # The default recursion traces every path so that tree view has full
         # expand/collapse functionality. After 5,000 nodes we stop and fall
         # back on a quick DFS search for performance. See PR #320.
-        node_count = [0]
+        node_count = 0
         node_limit = 5000 / max(1, len(dag.leaves))

+        def encode_ti(ti: Optional[models.TaskInstance]) -> Optional[List]:
+            if not ti:
+                return None
+
+            # NOTE: order of entry is important here because client JS relies on it for
+            # tree node reconstruction. Remember to change JS code in tree.html
+            # whenever order is altered.
+            data = [
+                ti.state,
+                ti.try_number,
+                None,  # start_ts
+                None,  # duration
+            ]
+
+            if ti.start_date:
+                # round to seconds to reduce payload size
+                data[2] = int(ti.start_date.timestamp())
+            if ti.duration is not None:
+                data[3] = int(ti.duration)
+
+            return data
+
         def recurse_nodes(task, visited):
+            nonlocal node_count
+            node_count += 1
             visited.add(task)
-            node_count[0] += 1
-
-            children = [
-                recurse_nodes(t, visited) for t in task.downstream_list
-                if node_count[0] < node_limit or t not in visited]
-
-            # D3 tree uses children vs _children to define what is
-            # expanded or not. The following block makes it such that
-            # repeated nodes are collapsed by default.
-            children_key = 'children'
-            if task.task_id not in expanded:
-                expanded.append(task.task_id)
-            elif children:
-                children_key = "_children"
-
-            def set_duration(tid):
-                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
-                        tid["start_date"] is not None):
-                    d = timezone.utcnow() - timezone.parse(tid["start_date"])
-                    tid["duration"] = d.total_seconds()
-                return tid
-
-            return {
+            task_id = task.task_id
+
+            node = {
                 'name': task.task_id,
                 'instances': [
-                    set_duration(task_instances.get((task.task_id, d))) or {
-                        'execution_date': d.isoformat(),
-                        'task_id': task.task_id
-                    }
-                    for d in dates],
-                children_key: children,
+                    encode_ti(task_instances.get((task_id, d)))
+                    for d in dates
+                ],
                 'num_dep': len(task.downstream_list),
                 'operator': task.task_type,
                 'retries': task.retries,
                 'owner': task.owner,
-                'start_date': task.start_date,
-                'end_date': task.end_date,
-                'depends_on_past': task.depends_on_past,
                 'ui_color': task.ui_color,
-                'extra_links': task.extra_links,
             }

+            if task.downstream_list:
+                children = [
+                    recurse_nodes(t, visited) for t in task.downstream_list
+                    if node_count < node_limit or t not in visited]
+
+                # D3 tree uses children vs _children to define what is
+                # expanded or not. The following block makes it such that
+                # repeated nodes are collapsed by default.
+                if task.task_id not in expanded:
+                    children_key = 'children'
+                    expanded.add(task.task_id)
+                else:
+ 

[jira] [Commented] (AIRFLOW-6382) Extract provide/create session to session module

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044949#comment-17044949
 ] 

ASF GitHub Bot commented on AIRFLOW-6382:
-

ashb commented on pull request #7538: [AIRFLOW-6382] Add reason for pre-commit 
rule
URL: https://github.com/apache/airflow/pull/7538
 
 
   [ci-skip]
   
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-]`. AIRFLOW- = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
 



> Extract provide/create session to session module
> 
>
> Key: AIRFLOW-6382
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6382
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
> Fix For: 2.0.0
>
>






[GitHub] [airflow] ashb opened a new pull request #7538: [AIRFLOW-6382] Add reason for pre-commit rule

2020-02-25 Thread GitBox
ashb opened a new pull request #7538: [AIRFLOW-6382] Add reason for pre-commit 
rule
URL: https://github.com/apache/airflow/pull/7538
 
 
   [ci-skip]
   
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-]`. AIRFLOW- = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.




[GitHub] [airflow] ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to 
session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591098934
 
 
   https://github.com/apache/airflow/pull/7538 will I think solve my 
issue/brain fart.




[GitHub] [airflow] kaxil commented on issue #7536: [AIRFLOW-6918] don't use 'is' in if conditions comparing STATE

2020-02-25 Thread GitBox
kaxil commented on issue #7536: [AIRFLOW-6918] don't use 'is' in if conditions 
comparing STATE
URL: https://github.com/apache/airflow/pull/7536#issuecomment-591097807
 
 
   The PR and JIRA has no description, please add it before someone can review




[GitHub] [airflow] houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384158366
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 

[GitHub] [airflow] houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
houqp commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384158262
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 

[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384158218
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -1374,90 +1376,115 @@ def tree(self):
             .all()
         )
         dag_runs = {
-            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs}
+            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs
+        }

         dates = sorted(list(dag_runs.keys()))
         max_date = max(dates) if dates else None
         min_date = min(dates) if dates else None

         tis = dag.get_task_instances(start_date=min_date, end_date=base_date)
-        task_instances = {}
+        task_instances: Dict[Tuple[str, datetime], models.TaskInstance] = {}
         for ti in tis:
-            tid = alchemy_to_dict(ti)
-            dr = dag_runs.get(ti.execution_date)
-            tid['external_trigger'] = dr['external_trigger'] if dr else False
-            task_instances[(ti.task_id, ti.execution_date)] = tid
+            task_instances[(ti.task_id, ti.execution_date)] = ti

-        expanded = []
+        expanded = set()
         # The default recursion traces every path so that tree view has full
         # expand/collapse functionality. After 5,000 nodes we stop and fall
         # back on a quick DFS search for performance. See PR #320.
-        node_count = [0]
+        node_count = 0
         node_limit = 5000 / max(1, len(dag.leaves))

+        def encode_ti(ti: Optional[models.TaskInstance]) -> Optional[List]:
+            if not ti:
+                return None
+
+            # NOTE: order of entry is important here because client JS relies on it for
+            # tree node reconstruction. Remember to change JS code in tree.html
+            # whenever order is altered.
+            data = [
+                ti.state,
+                ti.try_number,
+                None,  # start_ts
+                None,  # duration
+            ]
+
+            if ti.start_date:
+                # round to seconds to reduce payload size
+                data[2] = int(ti.start_date.timestamp())
+            if ti.duration is not None:
+                data[3] = int(ti.duration)
+
+            return data
+
         def recurse_nodes(task, visited):
+            nonlocal node_count
+            node_count += 1
             visited.add(task)
-            node_count[0] += 1
-
-            children = [
-                recurse_nodes(t, visited) for t in task.downstream_list
-                if node_count[0] < node_limit or t not in visited]
-
-            # D3 tree uses children vs _children to define what is
-            # expanded or not. The following block makes it such that
-            # repeated nodes are collapsed by default.
-            children_key = 'children'
-            if task.task_id not in expanded:
-                expanded.append(task.task_id)
-            elif children:
-                children_key = "_children"
-
-            def set_duration(tid):
-                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
-                        tid["start_date"] is not None):
-                    d = timezone.utcnow() - timezone.parse(tid["start_date"])
-                    tid["duration"] = d.total_seconds()
-                return tid
-
-            return {
+            task_id = task.task_id
+
+            node = {
                 'name': task.task_id,
                 'instances': [
-                    set_duration(task_instances.get((task.task_id, d))) or {
-                        'execution_date': d.isoformat(),
-                        'task_id': task.task_id
-                    }
-                    for d in dates],
-                children_key: children,
+                    encode_ti(task_instances.get((task_id, d)))
+                    for d in dates
+                ],
                 'num_dep': len(task.downstream_list),
                 'operator': task.task_type,
                 'retries': task.retries,
                 'owner': task.owner,
-                'start_date': task.start_date,
-                'end_date': task.end_date,
-                'depends_on_past': task.depends_on_past,
                 'ui_color': task.ui_color,
-                'extra_links': task.extra_links,
             }

+            if task.downstream_list:
+                children = [
+                    recurse_nodes(t, visited) for t in task.downstream_list
+                    if node_count < node_limit or t not in visited]
+
+                # D3 tree uses children vs _children to define what is
+                # expanded or not. The following block makes it such that
+                # repeated nodes are collapsed by default.
+                if task.task_id not in expanded:
+                    children_key = 'children'
+                    expanded.add(task.task_id)
+                else:
+ 
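The review diff above replaces the `node_count = [0]` list wrapper with a plain integer managed via `nonlocal`. Both techniques let the inner `recurse_nodes` closure mutate a counter owned by the enclosing scope; a minimal side-by-side sketch:

```python
# Old style: wrap the counter in a mutable list so the closure can
# update it without rebinding the outer name.
def count_with_list():
    node_count = [0]
    def visit():
        node_count[0] += 1  # mutates the list, no rebinding needed
    for _ in range(3):
        visit()
    return node_count[0]

# New style (Python 3): rebind the outer name directly via 'nonlocal'.
def count_with_nonlocal():
    node_count = 0
    def visit():
        nonlocal node_count
        node_count += 1  # rebinds the enclosing-scope variable
    for _ in range(3):
        visit()
    return node_count

print(count_with_list(), count_with_nonlocal())  # prints: 3 3
```

The list wrapper predates `nonlocal` (a Python 2 idiom); with Python 3 only, the `nonlocal` form is the idiomatic choice.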

[GitHub] [airflow] ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to 
session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591096260
 
 
   It's probably fine as it is. Def not worth changing all the files again 
anyway.




[GitHub] [airflow] houqp edited a comment on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
houqp edited a comment on issue #7537: [AIRFLOW-] And script to benchmark 
scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591094143
 
 
   Looks like a useful script that's worth mentioning in CONTRIBUTING.rst?




[GitHub] [airflow] houqp edited a comment on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
houqp edited a comment on issue #7537: [AIRFLOW-] And script to benchmark 
scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591094143
 
 
   Looks like something that's worth mentioning in CONTRIBUTING.rst?




[GitHub] [airflow] ashb commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on issue #7537: [AIRFLOW-] And script to benchmark scheduler 
dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591094351
 
 
   Example invocation, and final line of output (there are a lot of logs in the 
middle)
   
   ```
   ./scripts/perf/scheduler_dag_execution_timing.py --num-runs 2 example_dag example_0{1,2,3}_dag

   ... # Lots of logs here

   Time for 2 dag runs of 4 dags with 48 total tasks: 91.8814s (±0.149s)
   ```
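The `±0.149s` spread in the output above comes from repeating the benchmark (`--repeat`, default 3) and summarising the timings with the script's `statistics` import. A sketch of that summary step (the timing values below are invented for illustration):

```python
import statistics

# Hypothetical wall-clock timings from three repeated benchmark runs.
times = [91.73, 91.88, 92.03]

mean = statistics.mean(times)
# stdev needs at least two samples; guard the single-repeat case.
spread = statistics.stdev(times) if len(times) > 1 else 0.0

print(f"Time: {mean:.4f}s (±{spread:.3f}s)")  # prints: Time: 91.8800s (±0.150s)
```

`statistics.stdev` is the sample standard deviation, which is the usual choice when the repeats are a sample of possible runs rather than the whole population.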




[GitHub] [airflow] houqp commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
houqp commented on issue #7537: [AIRFLOW-] And script to benchmark 
scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591094143
 
 
   Looks like something that's worth mentioning in CONTRIBUTING.rst? 




[GitHub] [airflow] ashb commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7537: [AIRFLOW-] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384154553
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
+        self.reset(stop_when_these_completed)
+
+    def reset(self, stop_when_these_completed):
+        self.stop_when_these_completed = {
+            # Store the date as a timestamp, as sometimes this is a Pendulum
+            # object, other times it is a datetime object.
+            (run.dag_id, run.execution_date.timestamp()): run
+            for run in stop_when_these_completed
+        }
+
+    def change_state(self, key, state):
+        from airflow.utils.state import State
+        super().change_state(key, state)
+
+        dag_id, task_id, execution_date, __ = key
+        run_key = (dag_id, execution_date.timestamp())
+        run = self.stop_when_these_completed.get(run_key, None)
+        if run and all(t.state == State.SUCCESS for t in run.get_task_instances()):
+            self.stop_when_these_completed.pop(run_key)
+
+            if not self.stop_when_these_completed:
+                self.log.warning("STOPPING SCHEDULER -- all runs complete")
+                self.scheduler_job.processor_agent._done = True
+            else:
+                self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+        elif state == State.SUCCESS:
+            self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+
+
+def get_executor_under_test():
+    try:
+        # Run against master and 1.10.x releases
+        from tests.test_utils.mock_executor import MockExecutor
+    except ImportError:
+        from tests.executors.test_executor import TestExecutor as MockExecutor
+
+    # from airflow.executors.local_executor import LocalExecutor
+
+    # Change this to try other executors
+    Executor = MockExecutor
+
+    class ShortCircutExecutor(ShortCircutExecutorMixin, Executor):
+        pass
+
+    return ShortCircutExecutor
+
+
+def reset_dag(dag, num_runs, session):
+    import airflow.models
+    from airflow.utils import timezone
+    from airflow.utils.state import State
+
+    DR = airflow.models.DagRun
+    DM = airflow.models.DagModel
+    TI = airflow.models.TaskInstance
+    TF = airflow.models.TaskFail
+    dag_id = dag.dag_id
+
+    session.query(DM).filter(DM.dag_id == dag_id).update({'is_paused': False})
+    session.query(DR).filter(DR.dag_id == dag_id).delete()
+    session.query(TI).filter(TI.dag_id == dag_id).delete()
+    session.query(TF).filter(TF.dag_id == dag_id).delete()
+
+    next_run_date = dag.normalize_schedule(dag.start_date or min(t.start_date for t in dag.tasks))
+
+    for _ in range(num_runs):
+        next_run = dag.create_dagrun(
+            run_id=DR.ID_PREFIX + next_run_date.isoformat(),
+            execution_date=next_run_date,
+            start_date=timezone.utcnow(),
+            state=State.RUNNING,
+            external_trigger=False,
+            session=session,
+        )
+        next_run_date = dag.following_schedule(next_run_date)
+    return next_run
+
+
+def pause_all_dags(session):
+    from airflow.models.dag import DagModel
+    session.query(DagModel).update({'is_paused': True})
+
+
+@click.command()
+@click.option('--num-runs', default=1, help='number of DagRuns to run for each DAG')
+@click.option('--repeat', default=3, help='number of times to run the test, to reduce variance')
+@click.argument('dag_ids', required=True, nargs=-1)
+def main(num_runs, repeat, dag_ids):
+    """
+    This script will run the SchedulerJob for the specified dags "to completion".
+
+    That is, it creates a fixed number of DAG runs for the specified DAGs (from
+    the configured dag path/example dags etc), disables the scheduler from
+    creating more, and then monitors them for completion. When the final task of
+    the final dag run is 

[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384154242
 
 

 ##
 File path: airflow/www/templates/airflow/tree.html
 ##
 @@ -85,8 +85,59 @@
 

[GitHub] [airflow] houqp commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
houqp commented on a change in pull request #7537: [AIRFLOW-XXXX] And script to 
benchmark scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537#discussion_r384154018
 
 

 ##
 File path: scripts/perf/scheduler_dag_execution_timing.py
 ##
 @@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import gc
+import os
+import statistics
+import time
+
+import click
+
+
+class ShortCircutExecutorMixin:
+    def __init__(self, stop_when_these_completed):
+        super().__init__()
+        self.reset(stop_when_these_completed)
+
+    def reset(self, stop_when_these_completed):
+        self.stop_when_these_completed = {
+            # Store the date as a timestamp, as sometimes this is a Pendulum
+            # object, others it is a datetime object.
+            (run.dag_id, run.execution_date.timestamp()): run
+            for run in stop_when_these_completed
+        }
+
+    def change_state(self, key, state):
+        from airflow.utils.state import State
+        super().change_state(key, state)
+
+        dag_id, task_id, execution_date, __ = key
+        run_key = (dag_id, execution_date.timestamp())
+        run = self.stop_when_these_completed.get(run_key, None)
+        if run and all(t.state == State.SUCCESS for t in run.get_task_instances()):
+            self.stop_when_these_completed.pop(run_key)
+
+            if not self.stop_when_these_completed:
+                self.log.warning("STOPPING SCHEDULER -- all runs complete")
+                self.scheduler_job.processor_agent._done = True
+            else:
+                self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+        elif state == State.SUCCESS:
+            self.log.warning("WAITING ON %d RUNS", len(self.stop_when_these_completed))
+
+
+def get_executor_under_test():
+    try:
+        # Run against master and 1.10.x releases
+        from tests.test_utils.mock_executor import MockExecutor
+    except ImportError:
+        from tests.executors.test_executor import TestExecutor as MockExecutor
+
+    # from airflow.executors.local_executor import LocalExecutor
+
+    # Change this to try other executors
+    Executor = MockExecutor
+
+    class ShortCircutExecutor(ShortCircutExecutorMixin, Executor):
+        pass
+
+    return ShortCircutExecutor
+
+
+def reset_dag(dag, num_runs, session):
+    import airflow.models
+    from airflow.utils import timezone
+    from airflow.utils.state import State
+
+    DR = airflow.models.DagRun
+    DM = airflow.models.DagModel
+    TI = airflow.models.TaskInstance
+    TF = airflow.models.TaskFail
+    dag_id = dag.dag_id
+
+    session.query(DM).filter(DM.dag_id == dag_id).update({'is_paused': False})
+    session.query(DR).filter(DR.dag_id == dag_id).delete()
+    session.query(TI).filter(TI.dag_id == dag_id).delete()
+    session.query(TF).filter(TF.dag_id == dag_id).delete()
+
+    next_run_date = dag.normalize_schedule(dag.start_date or min(t.start_date for t in dag.tasks))
+
+    for _ in range(num_runs):
+        next_run = dag.create_dagrun(
+            run_id=DR.ID_PREFIX + next_run_date.isoformat(),
+            execution_date=next_run_date,
+            start_date=timezone.utcnow(),
+            state=State.RUNNING,
+            external_trigger=False,
+            session=session,
+        )
+        next_run_date = dag.following_schedule(next_run_date)
+    return next_run
+
+
+def pause_all_dags(session):
+    from airflow.models.dag import DagModel
+    session.query(DagModel).update({'is_paused': True})
+
+
+@click.command()
+@click.option('--num-runs', default=1, help='number of DagRuns to run for each DAG')
+@click.option('--repeat', default=3, help='number of times to run the test, to reduce variance')
+@click.argument('dag_ids', required=True, nargs=-1)
+def main(num_runs, repeat, dag_ids):
+    """
+    This script will run the SchedulerJob for the specified dags "to completion".
+
+    That is, it creates a fixed number of DAG runs for the specified DAGs (from
+    the configured dag path/example dags etc), disables the scheduler from
+    creating more, and then monitors them for completion. When the last task of
+    the final dag run is 

[GitHub] [airflow] nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591092356
 
 
   That was a big cycle removal. As discussed above, I don't know of any other 
"session" in airflow, so `session.py` was quite intuitive. Do you think it 
would be worth renaming it to `db_session`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb commented on issue #7537: [AIRFLOW-XXXX] And script to benchmark scheduler 
dag-run time
URL: https://github.com/apache/airflow/pull/7537#issuecomment-591092242
 
 
   I haven't created a Jira for this, figuring it's not really "code" as it 
doesn't get shipped with the app, but I can do so if anyone would rather.




[GitHub] [airflow] nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591091572
 
 
   Yeah, so long that at first I edited your comment instead of replying :D




[GitHub] [airflow] ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree view for large DAGs

2020-02-25 Thread GitBox
ashb commented on a change in pull request #7492: [AIRFLOW-6871] optimize tree 
view for large DAGs
URL: https://github.com/apache/airflow/pull/7492#discussion_r384152129
 
 

 ##
 File path: airflow/www/views.py
 ##
 @@ -1374,90 +1376,113 @@ def tree(self):
         .all()
         )
         dag_runs = {
-            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs}
+            dr.execution_date: alchemy_to_dict(dr) for dr in dag_runs
+        }
 
         dates = sorted(list(dag_runs.keys()))
         max_date = max(dates) if dates else None
         min_date = min(dates) if dates else None
 
         tis = dag.get_task_instances(start_date=min_date, end_date=base_date)
-        task_instances = {}
+        task_instances: Dict[Tuple[str, datetime], Optional[models.TaskInstance]] = {}
         for ti in tis:
-            tid = alchemy_to_dict(ti)
-            dr = dag_runs.get(ti.execution_date)
-            tid['external_trigger'] = dr['external_trigger'] if dr else False
-            task_instances[(ti.task_id, ti.execution_date)] = tid
+            task_instances[(ti.task_id, ti.execution_date)] = ti
 
-        expanded = []
+        expanded = set()
         # The default recursion traces every path so that tree view has full
         # expand/collapse functionality. After 5,000 nodes we stop and fall
         # back on a quick DFS search for performance. See PR #320.
-        node_count = [0]
+        node_count = 0
         node_limit = 5000 / max(1, len(dag.leaves))
 
+        def encode_ti(ti: models.TaskInstance) -> Optional[List]:
+            if not ti:
+                return None
+
+            # NOTE: order of entry is important here because client JS relies on it for
+            # tree node reconstruction. Remember to change JS code in tree.html
+            # whenever order is altered.
+            data = [
+                ti.state,
+                ti.try_number,
+                None,  # start_ts
+                None,  # duration
+            ]
+
+            if ti.start_date:
+                # round to seconds to reduce payload size
+                data[2] = int(ti.start_date.timestamp())
+            if ti.duration is not None:
+                data[3] = int(ti.duration)
+
+            return data
+
         def recurse_nodes(task, visited):
+            nonlocal node_count
+            node_count += 1
             visited.add(task)
-            node_count[0] += 1
-
-            children = [
-                recurse_nodes(t, visited) for t in task.downstream_list
-                if node_count[0] < node_limit or t not in visited]
-
-            # D3 tree uses children vs _children to define what is
-            # expanded or not. The following block makes it such that
-            # repeated nodes are collapsed by default.
-            children_key = 'children'
-            if task.task_id not in expanded:
-                expanded.append(task.task_id)
-            elif children:
-                children_key = "_children"
-
-            def set_duration(tid):
-                if (isinstance(tid, dict) and tid.get("state") == State.RUNNING and
-                        tid["start_date"] is not None):
-                    d = timezone.utcnow() - timezone.parse(tid["start_date"])
-                    tid["duration"] = d.total_seconds()
-                return tid
-
-            return {
+            task_id = task.task_id
+
+            node = {
                 'name': task.task_id,
                 'instances': [
-                    set_duration(task_instances.get((task.task_id, d))) or {
-                        'execution_date': d.isoformat(),
-                        'task_id': task.task_id
-                    }
-                    for d in dates],
-                children_key: children,
+                    encode_ti(task_instances.get((task_id, d)))
+                    for d in dates
+                ],
                 'num_dep': len(task.downstream_list),
                 'operator': task.task_type,
                 'retries': task.retries,
                 'owner': task.owner,
-                'start_date': task.start_date,
-                'end_date': task.end_date,
-                'depends_on_past': task.depends_on_past,
                 'ui_color': task.ui_color,
-                'extra_links': task.extra_links,
             }
 
+            if task.downstream_list:
+                children = [
+                    recurse_nodes(t, visited) for t in task.downstream_list
+                    if node_count < node_limit or t not in visited]
+
+                # D3 tree uses children vs _children to define what is
+                # expanded or not. The following block makes it such that
+                # repeated nodes are collapsed by default.
+                if task.task_id not in expanded:
+                    children_key = 'children'
 
 Review comment:
   Nope, what you've done sounds good.
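
The payload trick in the diff above can be shown standalone. The sketch below assumes a task instance exposes `state`, `try_number`, `start_date`, and `duration`; it is an illustration of the encoding scheme, not the exact Airflow code:

```python
from datetime import datetime, timezone
from types import SimpleNamespace

def encode_ti(ti):
    """Compact, order-sensitive encoding of one task instance.

    A fixed-order list [state, try_number, start_ts, duration] replaces a
    full dict, so the JSON payload shrinks and the client JS can rebuild
    tree nodes positionally.
    """
    if not ti:
        return None  # no instance for this (task_id, execution_date) cell
    data = [ti.state, ti.try_number, None, None]
    if ti.start_date:
        # round to whole seconds to reduce payload size
        data[2] = int(ti.start_date.timestamp())
    if ti.duration is not None:
        data[3] = int(ti.duration)
    return data

# Stand-in task instance (the real one is airflow.models.TaskInstance)
ti = SimpleNamespace(state="success", try_number=1,
                     start_date=datetime(2020, 2, 25, tzinfo=timezone.utc),
                     duration=12.7)
print(encode_ti(ti))  # ['success', 1, 1582588800, 12]
```

Because the list is positional, any reordering here must be mirrored in the tree.html JavaScript, as the NOTE in the diff warns.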


[GitHub] [airflow] ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to 
session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591091381
 
 
   :man_facepalming:  oh yeah. Sorry, long day.




[GitHub] [airflow] nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
nuclearpinguin commented on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591091208
 
 
   > And how did this reduce cyclic imports when utils.db imports the new file still?!
   
   That is not a problem. The main reason is that db.py imports all the models, and plenty of model methods use the provide_session decorator.
   
   So every model imported db.provide_session, and db.py imported every model.
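
The cycle described here can be sketched in isolation. The names mirror Airflow's `provide_session`/`create_session`, but the bodies below are illustrative stand-ins, not Airflow's implementation: a leaf module that imports no models can be depended on by both the models and utils.db without a cycle.

```python
import contextlib
import functools

@contextlib.contextmanager
def create_session():
    """Yield a session; stand-in for a real SQLAlchemy session factory."""
    session = object()
    try:
        yield session
    finally:
        pass  # real code would commit/rollback and close here

def provide_session(func):
    """Inject a session into `func` unless the caller supplied one."""
    @functools.wraps(func)
    def wrapper(*args, session=None, **kwargs):
        if session is not None:
            return func(*args, session=session, **kwargs)
        with create_session() as session:
            return func(*args, session=session, **kwargs)
    return wrapper

@provide_session
def get_dag(dag_id, session=None):
    # a model method that needs a session, with or without one passed in
    return (dag_id, session is not None)

print(get_dag("example_dag"))  # ('example_dag', True)
```

If these two helpers lived in utils.db, every model decorated with `provide_session` would import utils.db while utils.db imported every model, which is exactly the cycle the extraction removes.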




[GitHub] [airflow] ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591081255
 
 
   Why did we make this change?
   
   Neither the PR nor the linked Jira give any justification for the change. It 
doesn't reduce any imports, and personally, I found the previous name clearer.
   
   Now seeing that code for the first time, my question would be "what is this session?"
   
   Edit, sorry, not used to old PR template.
   
   > Extracting provide_session and create_session to a separate module
   > reduces the number of cyclic imports and makes a distinction between
   > session and database.
   
   Can you explain to me what the difference between DB and session is? Cos I wouldn't be able to tell you if you asked me. The session is for the db.
   
   And how did this reduce cyclic imports when utils.db imports the new file 
still?!




[GitHub] [airflow] ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591081255
 
 
   > And how did this reduce cyclic imports when utils.db imports the new file still?!
   
   That is not a problem. The main reason is that db.py imports all the models, and plenty of model methods use the `provide_session` decorator.
   
   So every model imported db.provide_session, and db.py imported every model.




[GitHub] [airflow] ashb opened a new pull request #7537: [AIRFLOW-XXXX] And script to benchmark scheduler dag-run time

2020-02-25 Thread GitBox
ashb opened a new pull request #7537: [AIRFLOW-XXXX] And script to benchmark 
scheduler dag-run time
URL: https://github.com/apache/airflow/pull/7537
 
 
   This script will run the SchedulerJob for the specified dags "to completion".
   
   That is, it creates a fixed number of DAG runs for the specified DAGs (from
   the configured dag path/example dags etc), disables the scheduler from
   creating more, and then monitors them for completion. When the last task of
   the final dag run is completed the scheduler will be terminated.
   
   The aim of this script is to have a benchmark for real-world scheduler
   performance -- i.e. total time take to run N dag runs to completion.
   
   It is recommended to repeat the test at least 3 times so that you can get
   somewhat-accurate variance on the reported timing numbers.
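   
The repeat-to-reduce-variance advice can be sketched as a small harness. `time_runs` below is a hypothetical helper for illustration; the real script times whole scheduler runs, not an arbitrary callable.

```python
import gc
import statistics
import time

def time_runs(fn, repeat=3):
    """Run fn() `repeat` times and summarise wall-clock timings.

    Disabling GC during each run keeps collector pauses out of the
    measurement; repeating lets us report a variance estimate.
    """
    timings = []
    for _ in range(repeat):
        gc.collect()
        gc.disable()  # keep GC pauses out of the timed region
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
        gc.enable()
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, stdev

mean, stdev = time_runs(lambda: sum(x * x for x in range(10_000)))
print(f"mean={mean:.6f}s stdev={stdev:.6f}s")
```

With only three repeats the stdev is a rough indicator rather than a rigorous confidence interval, which matches the PR's "somewhat-accurate variance" wording.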
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-XXXX]`. AIRFLOW-XXXX = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.




[GitHub] [airflow] ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb edited a comment on issue #6938: [AIRFLOW-6382] Extract provide/create 
session to session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591081255
 
 
   Why did we make this change?
   
   Neither the PR nor the linked Jira give _any_ justification for the change. 
It doesn't reduce any imports, and personally, I found the previous name 
_clearer_.
   
   Now seeing that code for the first time, my question would be "what is this session?"
   
   Edit, sorry, not used to old PR template.
   
   > Extracting provide_session and create_session to a separate module
   > reduces the number of cyclic imports and makes a distinction between
   > session and database.
   
   Can you explain to me what the difference between DB and session is? Cos I wouldn't be able to tell you if you asked me. The session _is for the db_.
   
   And how did this reduce cyclic imports when utils.db imports the new file 
still?!




[GitHub] [airflow] ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to 
session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591084955
 
 
   (excuse the grump, but this change, and the pre-commit rule we put in place, 
meant I had to work around it to have code live in the repo that works on both 
master and 1.10.)




[GitHub] [airflow-on-k8s-operator] barney-s opened a new issue #8: Setup CI for the repo

2020-02-25 Thread GitBox
barney-s opened a new issue #8: Setup CI for the repo
URL: https://github.com/apache/airflow-on-k8s-operator/issues/8
 
 
   




[GitHub] [airflow] ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to session module

2020-02-25 Thread GitBox
ashb commented on issue #6938: [AIRFLOW-6382] Extract provide/create session to 
session module
URL: https://github.com/apache/airflow/pull/6938#issuecomment-591081255
 
 
   Why did we make this change?
   
   Neither the PR nor the linked Jira give _any_ justification for the change. 
It doesn't reduce any imports, and personally, I found the previous name 
_clearer_.
   
   Now seeing that code for the first time, my question would be "what is this session?"




[GitHub] [airflow] tooptoop4 commented on issue #7169: [AIRFLOW-6555] mushroom cloud error when clicking 'task instance details' from graph view of dag that has never been run yet

2020-02-25 Thread GitBox
tooptoop4 commented on issue #7169: [AIRFLOW-6555] mushroom cloud error when 
clicking 'task instance details' from graph view of dag that has never been run 
yet
URL: https://github.com/apache/airflow/pull/7169#issuecomment-591075723
 
 
   @ashb are there any similar tests I could 'reuse' as a starting point?




[jira] [Updated] (AIRFLOW-6918) don't use 'is' in if conditions comparing STATE

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6918:
--
Priority: Trivial  (was: Major)

> don't use 'is' in if conditions comparing STATE
> ---
>
> Key: AIRFLOW-6918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: t oo
>Assignee: t oo
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
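

The pitfall behind AIRFLOW-6918 is easy to demonstrate: Airflow states are plain strings, and `is` compares object identity rather than equality, so two equal state strings can still compare unequal under `is`. A minimal illustration (in CPython; `SUCCESS` is a stand-in for `airflow.utils.state.State.SUCCESS`):

```python
# `is` tests object identity, `==` tests value equality. Two equal strings
# are not guaranteed to be the same object, so `is` can silently be False.
SUCCESS = "success"  # stand-in for airflow.utils.state.State.SUCCESS

state = "".join(["succ", "ess"])  # built at runtime, not interned

print(state == SUCCESS)  # True  -- value equality, always correct here
print(state is SUCCESS)  # False -- a different object with equal text
```

This is why the fix replaces `if state is State.RUNNING:`-style conditions with `==` comparisons throughout the scheduler.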


[jira] [Work started] (AIRFLOW-6918) don't use 'is' in if conditions comparing STATE

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-6918 started by t oo.
-
> don't use 'is' in if conditions comparing STATE
> ---
>
> Key: AIRFLOW-6918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: t oo
>Assignee: t oo
>Priority: Major
>






[GitHub] [airflow] tooptoop4 opened a new pull request #7536: [AIRFLOW-6918] don't use 'is' in if conditions comparing STATE

2020-02-25 Thread GitBox
tooptoop4 opened a new pull request #7536: [AIRFLOW-6918] don't use 'is' in if 
conditions comparing STATE
URL: https://github.com/apache/airflow/pull/7536
 
 
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-XXXX]`. AIRFLOW-XXXX = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   




[jira] [Commented] (AIRFLOW-6918) don't use 'is' in if conditions comparing STATE

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044890#comment-17044890
 ] 

ASF GitHub Bot commented on AIRFLOW-6918:
-

tooptoop4 commented on pull request #7536: [AIRFLOW-6918] don't use 'is' in if 
conditions comparing STATE
URL: https://github.com/apache/airflow/pull/7536
 
 
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-XXXX]`. AIRFLOW-XXXX = 
JIRA ID*
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   * For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   
 



> don't use 'is' in if conditions comparing STATE
> ---
>
> Key: AIRFLOW-6918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: t oo
>Priority: Major
>






[jira] [Updated] (AIRFLOW-6918) don't use 'is' in if conditions comparing STATE

2020-02-25 Thread t oo (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

t oo updated AIRFLOW-6918:
--
Summary: don't use 'is' in if conditions comparing STATE  (was: don't use 
is in if comparing state)

> don't use 'is' in if conditions comparing STATE
> ---
>
> Key: AIRFLOW-6918
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.9
>Reporter: t oo
>Priority: Major
>






[jira] [Created] (AIRFLOW-6918) don't use is in if comparing state

2020-02-25 Thread t oo (Jira)
t oo created AIRFLOW-6918:
-

 Summary: don't use is in if comparing state
 Key: AIRFLOW-6918
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6918
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 1.10.9
Reporter: t oo








[jira] [Commented] (AIRFLOW-6864) Make airfow/jobs pylint compatible

2020-02-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044861#comment-17044861
 ] 

ASF subversion and git services commented on AIRFLOW-6864:
--

Commit 311140616daafe496310d642e4164bc53fbd2ad2 in airflow's branch 
refs/heads/master from Tomek Urbaszek
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3111406 ]

[AIRFLOW-6864] Make airfow/jobs pylint compatible (#7484)

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

fixup! [AIRFLOW-6864] Make airfow/jobs pylint compatible

> Make airfow/jobs pylint compatible
> --
>
> Key: AIRFLOW-6864
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6864
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: pylint
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
> Fix For: 2.0.0
>
>







[jira] [Resolved] (AIRFLOW-6864) Make airfow/jobs pylint compatible

2020-02-25 Thread Tomasz Urbaszek (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Urbaszek resolved AIRFLOW-6864.
--
Fix Version/s: 2.0.0
   Resolution: Done

> Make airfow/jobs pylint compatible
> --
>
> Key: AIRFLOW-6864
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6864
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: pylint
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
> Fix For: 2.0.0
>
>







[GitHub] [airflow] nuclearpinguin merged pull request #7484: [AIRFLOW-6864] Make airflow/jobs pylint compatible

2020-02-25 Thread GitBox
nuclearpinguin merged pull request #7484: [AIRFLOW-6864] Make airflow/jobs 
pylint compatible
URL: https://github.com/apache/airflow/pull/7484
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-6864) Make airfow/jobs pylint compatible

2020-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044858#comment-17044858
 ] 

ASF GitHub Bot commented on AIRFLOW-6864:
-

nuclearpinguin commented on pull request #7484: [AIRFLOW-6864] Make 
airflow/jobs pylint compatible
URL: https://github.com/apache/airflow/pull/7484
 
 
   
 



> Make airfow/jobs pylint compatible
> --
>
> Key: AIRFLOW-6864
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6864
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: pylint
>Affects Versions: 2.0.0
>Reporter: Tomasz Urbaszek
>Priority: Major
>






[jira] [Updated] (AIRFLOW-6917) Remove Pickling

2020-02-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik updated AIRFLOW-6917:

Labels: dag-serialization  (was: )

> Remove Pickling 
> 
>
> Key: AIRFLOW-6917
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6917
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: serialization
>Affects Versions: 2.0.0
>Reporter: Kaxil Naik
>Priority: Major
>  Labels: dag-serialization
> Fix For: 2.0.0
>
>
> We are going to use Serialized DAGs from the DB. We no longer require pickling 
> (it was already deprecated and was only used when a config option is enabled).





[jira] [Created] (AIRFLOW-6917) Remove Pickling

2020-02-25 Thread Kaxil Naik (Jira)
Kaxil Naik created AIRFLOW-6917:
---

 Summary: Remove Pickling 
 Key: AIRFLOW-6917
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6917
 Project: Apache Airflow
  Issue Type: Improvement
  Components: serialization
Affects Versions: 2.0.0
Reporter: Kaxil Naik
 Fix For: 2.0.0


We are going to use Serialized DAGs from the DB. We no longer require pickling (it 
was already deprecated and was only used when a config option is enabled).
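The rationale above can be illustrated with a small sketch (not Airflow code; the dictionary is a made-up stand-in for a serialized DAG) contrasting pickle with the JSON-style serialization that replaces it:

```python
import json
import pickle

# Hypothetical stand-in for a DAG's serialized representation.
dag_repr = {"dag_id": "example", "schedule": "@daily", "tasks": ["extract", "load"]}

# pickle produces opaque bytes, and unpickling can execute arbitrary code,
# which is one reason pickling DAGs was deprecated.
blob = pickle.dumps(dag_repr)
assert isinstance(blob, bytes)

# A JSON document is human-readable, diffable, and safe to deserialize,
# making it suitable for storing DAG definitions in the metadata DB.
doc = json.dumps(dag_repr, sort_keys=True)
restored = json.loads(doc)
assert restored == dag_repr
```

The round trip through JSON preserves the structure while keeping the stored form inspectable, which pickle's binary format does not.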





[jira] [Resolved] (AIRFLOW-3831) Remove DagBag from /pickle_info

2020-02-25 Thread Kaxil Naik (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-3831.
-
Resolution: Fixed

> Remove DagBag from /pickle_info
> ---
>
> Key: AIRFLOW-3831
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3831
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Peter van 't Hof
>Assignee: Bas Harenslak
>Priority: Major
>





