[jira] [Created] (AIRFLOW-825) Add Dataflow semantics

2017-02-01 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-825:
--

 Summary: Add Dataflow semantics
 Key: AIRFLOW-825
 URL: https://issues.apache.org/jira/browse/AIRFLOW-825
 Project: Apache Airflow
  Issue Type: Improvement
  Components: Dataflow
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


Following discussion on the dev list, this adds first-class Dataflow semantics 
to Airflow. 

Please see my PR for examples and unit tests. From the documentation:

A Dataflow object represents the result of an upstream task. If the upstream
task has multiple outputs contained in a tuple, dict, or other indexable form,
an index may be provided so the Dataflow only uses the appropriate output.


Dataflows are passed to downstream tasks with a key. This has two effects:
1. It sets up a dependency between the upstream and downstream tasks to
   ensure that the downstream task does not run before the upstream result
   is available.
2. It ensures that the [indexed] upstream result is available in the
   downstream task's context as ``context['dataflows'][key]``. In addition,
   the result will be passed directly to PythonOperators as a keyword
   argument.

Dataflows use the XCom mechanism to exchange data. Data is passed through the
following series of steps:
1. After the upstream task runs, data is passed to the Dataflow object's
   _set_data() method.
2. The Dataflow's serialize() method is called on the data. This method
   takes the data object and returns a representation that can be used to
   reconstruct it later.
3. _set_data() stores the serialized result as an XCom.
4. Before the downstream task runs, it calls the Dataflow _get_data()
   method.
5. _get_data() retrieves the upstream XCom.
6. The Dataflow's deserialize() method is called. This method takes the
   serialized representation and returns the data object.
7. The data object is passed to the downstream task.

The basic Dataflow object has identity serialize and deserialize methods,
meaning data is stored directly in the Airflow database. Therefore, for
performance and practical reasons, basic Dataflows should not be used with
large or complex results.

Dataflows can easily be extended to use remote storage. In this case, the
serialize method should write the data into storage and return a URI, which
will be stored as an XCom. The URI will be passed to deserialize() so that
the data can be downloaded and reconstructed.
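To make the mechanics concrete, here is a minimal sketch of the interface
described above. It is an illustration only -- the real API lives in the PR,
so the constructor arguments and the storage client should be read as
assumptions.

{code}
# Sketch of the Dataflow interface described above, not the code from the PR.
# The constructor arguments and the 'storage' client are assumptions.

class Dataflow(object):
    """Represents the result of an upstream task. The basic class uses
    identity serialize/deserialize, so data is stored directly in the
    Airflow database via XCom -- suitable for small results only."""

    def __init__(self, task, index=None):
        self.task = task    # upstream task whose result this represents
        self.index = index  # optional index into a tuple/dict result

    def serialize(self, data):
        return data

    def deserialize(self, serialized):
        return serialized


class RemoteStorageDataflow(Dataflow):
    """The remote-storage extension described above: serialize() uploads
    the data and returns a URI (which is what gets stored as the XCom);
    deserialize() receives that URI and reconstructs the data."""

    def __init__(self, task, storage, index=None):
        super(RemoteStorageDataflow, self).__init__(task, index)
        self.storage = storage  # assumed to expose put(data) -> uri, get(uri)

    def serialize(self, data):
        return self.storage.put(data)

    def deserialize(self, uri):
        return self.storage.get(uri)
{code}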



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-825) Add Dataflow semantics

2017-02-01 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-825:
---
Issue Type: New Feature  (was: Improvement)

> Add Dataflow semantics
> --
>
> Key: AIRFLOW-825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-825
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: Dataflow
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> Following discussion on the dev list, this adds first-class Dataflow 
> semantics to Airflow. 
> Please see my PR for examples and unit tests. From the documentation:
> A Dataflow object represents the result of an upstream task. If the upstream
> task has multiple outputs contained in a tuple, dict, or other indexable form,
> an index may be provided so the Dataflow only uses the appropriate output.
> Dataflows are passed to downstream tasks with a key. This has two effects:
> 1. It sets up a dependency between the upstream and downstream tasks to
>ensure that the downstream task does not run before the upstream result
>is available.
> 2. It ensures that the [indexed] upstream result is available in the
>downstream task's context as ``context['dataflows'][key]``. In 
> addition,
>the result will be passed directly to PythonOperators as a keyword
>argument.
> Dataflows use the XCom mechanism to exchange data. Data is passed through the
> following series of steps:
> 1. After the upstream task runs, data is passed to the Dataflow object's
>_set_data() method.
> 2. The Dataflow's serialize() method is called on the data. This method
>takes the data object and returns a representation that can be used to
>reconstruct it later.
> 3. _set_data() stores the serialized result as an XCom.
> 4. Before the downstream task runs, it calls the Dataflow _get_data()
>method.
> 5. _get_data() retrieves the upstream XCom.
> 6. The Dataflow's deserialize() method is called. This method takes the
>    serialized representation and returns the data object.
> 7. The data object is passed to the downstream task.
> The basic Dataflow object has identity serialize and deserialize methods,
> meaning data is stored directly in the Airflow database. Therefore, for
> performance and practical reasons, basic Dataflows should not be used with
> large or complex results.
> Dataflows can easily be extended to use remote storage. In this case, the
> serialize method should write the data into storage and return a URI, which
> will be stored as an XCom. The URI will be passed to deserialize() so that
> the data can be downloaded and reconstructed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-825) Add Dataflow semantics

2017-02-01 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-825:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2046
 Fix Version/s: 1.9.0

> Add Dataflow semantics
> --
>
> Key: AIRFLOW-825
> URL: https://issues.apache.org/jira/browse/AIRFLOW-825
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: Dataflow
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.9.0
>
>
> Following discussion on the dev list, this adds first-class Dataflow 
> semantics to Airflow. 
> Please see my PR for examples and unit tests. From the documentation:
> A Dataflow object represents the result of an upstream task. If the upstream
> task has multiple outputs contained in a tuple, dict, or other indexable form,
> an index may be provided so the Dataflow only uses the appropriate output.
> Dataflows are passed to downstream tasks with a key. This has two effects:
> 1. It sets up a dependency between the upstream and downstream tasks to
>ensure that the downstream task does not run before the upstream result
>is available.
> 2. It ensures that the [indexed] upstream result is available in the
>downstream task's context as ``context['dataflows'][key]``. In 
> addition,
>the result will be passed directly to PythonOperators as a keyword
>argument.
> Dataflows use the XCom mechanism to exchange data. Data is passed through the
> following series of steps:
> 1. After the upstream task runs, data is passed to the Dataflow object's
>_set_data() method.
> 2. The Dataflow's serialize() method is called on the data. This method
>takes the data object and returns a representation that can be used to
>reconstruct it later.
> 3. _set_data() stores the serialized result as an XCom.
> 4. Before the downstream task runs, it calls the Dataflow _get_data()
>method.
> 5. _get_data() retrieves the upstream XCom.
> 6. The Dataflow's deserialize() method is called. This method takes the
>    serialized representation and returns the data object.
> 7. The data object is passed to the downstream task.
> The basic Dataflow object has identity serialize and deserialize methods,
> meaning data is stored directly in the Airflow database. Therefore, for
> performance and practical reasons, basic Dataflows should not be used with
> large or complex results.
> Dataflows can easily be extended to use remote storage. In this case, the
> serialize method should write the data into storage and return a URI, which
> will be stored as an XCom. The URI will be passed to deserialize() so that
> the data can be downloaded and reconstructed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-828) Add maximum size for XComs

2017-02-02 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-828:
--

 Summary: Add maximum size for XComs
 Key: AIRFLOW-828
 URL: https://issues.apache.org/jira/browse/AIRFLOW-828
 Project: Apache Airflow
  Issue Type: Improvement
  Components: xcom
Reporter: Jeremiah Lowin
 Fix For: 1.8.1


Adds a configurable maximum XCom size (default 20 KB).
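A minimal sketch of how the check might work, assuming a hypothetical
max_xcom_size option in the core section (the PR defines the real key and
enforcement point):

{code}
# Sketch only; the 'max_xcom_size' config key is an assumption.
import pickle

from airflow import configuration as conf
from airflow.exceptions import AirflowException

def serialize_xcom_value(value):
    serialized = pickle.dumps(value)
    max_size = conf.getint('core', 'max_xcom_size')  # e.g. 20480 bytes
    if len(serialized) > max_size:
        raise AirflowException(
            'XCom value is %d bytes, exceeding the %d byte limit'
            % (len(serialized), max_size))
    return serialized
{code}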



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-02 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-829:
--

 Summary: Reduce verbosity of successful Travis unit tests
 Key: AIRFLOW-829
 URL: https://issues.apache.org/jira/browse/AIRFLOW-829
 Project: Apache Airflow
  Issue Type: Improvement
  Components: tests
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


Unit tests are run with the -s flag, so every single test prints every single 
debug log -- including multiple lines for every single dependency check of 
every single operator in every single backfill iteration. This means that the 
full test suite log is so large it can not be downloaded from Travis (4MB 
cap!!). It is difficult at best and impossible at worst to identify failed 
tests.

The -s flag should not be used with Travis. This way, successful tests will 
have their logging suppressed but failed tests will show captured log output, 
making identifying failures easy. Local testing can retain the -s flag by 
default.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-830) Plugin manager should log to debug, not info

2017-02-02 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-830:
--

 Summary: Plugin manager should log to debug, not info
 Key: AIRFLOW-830
 URL: https://issues.apache.org/jira/browse/AIRFLOW-830
 Project: Apache Airflow
  Issue Type: Improvement
  Components: logging
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor


The plugin manager reports every single action as an info log, which is 
unnecessary and overly verbose.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-02 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-829:
---
Description: 
Unit tests are run with the -s flag, so every single test prints every single 
debug log -- including multiple lines for every single dependency check of 
every single operator in every single backfill iteration. This means that the 
full test suite log is so large it can not be downloaded from Travis (4MB 
cap!!). It is difficult at best and impossible at worst to identify failed 
tests.

The -s flag should not be used with Travis. This way, successful tests will 
have their logging suppressed (or at least reduced) but failed tests will show 
captured log output, making identifying failures easy. Local testing can retain 
the -s flag by default.

Note that all task logs are still printed because of how those logs are 
presented in the Airflow log.

  was:
Unit tests are run with the -s flag, so every single test prints every single 
debug log -- including multiple lines for every single dependency check of 
every single operator in every single backfill iteration. This means that the 
full test suite log is so large it can not be downloaded from Travis (4MB 
cap!!). It is difficult at best and impossible at worst to identify failed 
tests.

The -s flag should not be used with Travis. This way, successful tests will 
have their logging suppressed but failed tests will show captured log output, 
making identifying failures easy. Local testing can retain the -s flag by 
default.


> Reduce verbosity of successful Travis unit tests
> 
>
> Key: AIRFLOW-829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> Unit tests are run with the -s flag, so every single test prints every single 
> debug log -- including multiple lines for every single dependency check of 
> every single operator in every single backfill iteration. This means that the 
> full test suite log is so large it can not be downloaded from Travis (4MB 
> cap!!). It is difficult at best and impossible at worst to identify failed 
> tests.
> The -s flag should not be used with Travis. This way, successful tests will 
> have their logging suppressed (or at least reduced) but failed tests will 
> show captured log output, making identifying failures easy. Local testing can 
> retain the -s flag by default.
> Note that all task logs are still printed because of how those logs are 
> presented in the Airflow log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-831) Fix broken unit tests

2017-02-02 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-831:
--

 Summary: Fix broken unit tests
 Key: AIRFLOW-831
 URL: https://issues.apache.org/jira/browse/AIRFLOW-831
 Project: Apache Airflow
  Issue Type: Bug
  Components: tests
Reporter: Jeremiah Lowin
Priority: Critical


AIRFLOW-794 (https://github.com/apache/incubator-airflow/pull/2013) removed an 
import statement that was required by the PR for AIRFLOW-780 
(https://github.com/apache/incubator-airflow/pull/2018). At the time 
AIRFLOW-794 was tested, the tests passed, but AIRFLOW-780 was merged prior to 
merging AIRFLOW-794. Restoring the import statement should fix the failing 
tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-831) Fix broken unit tests

2017-02-02 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reassigned AIRFLOW-831:
--

Assignee: Jeremiah Lowin

> Fix broken unit tests
> -
>
> Key: AIRFLOW-831
> URL: https://issues.apache.org/jira/browse/AIRFLOW-831
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Critical
>
> AIRFLOW-794 (https://github.com/apache/incubator-airflow/pull/2013) removed 
> an import statement that was required by the PR for AIRFLOW-780 
> (https://github.com/apache/incubator-airflow/pull/2018). At the time 
> AIRFLOW-794 was tested, the tests passed, but AIRFLOW-780 was merged prior to 
> merging AIRFLOW-794. Restoring the import statement should fix the failing 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-780) The UI no longer shows broken DAGs

2017-02-02 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850131#comment-15850131
 ] 

Jeremiah Lowin commented on AIRFLOW-780:


The PR for this was merged, can the issue be closed?

> The UI no longer shows broken DAGs
> --
>
> Key: AIRFLOW-780
> URL: https://issues.apache.org/jira/browse/AIRFLOW-780
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>Priority: Critical
> Fix For: 1.8.1
>
>
> When a faulty DAG is placed in the dags folder, the UI would report a parsing
> error. Now it doesn't, due to the separate parsing (which does not report
> errors back).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-831) Fix broken unit tests

2017-02-02 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-831.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2050
[https://github.com/apache/incubator-airflow/pull/2050]

> Fix broken unit tests
> -
>
> Key: AIRFLOW-831
> URL: https://issues.apache.org/jira/browse/AIRFLOW-831
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Critical
> Fix For: 1.8.1
>
>
> AIRFLOW-794 (https://github.com/apache/incubator-airflow/pull/2013) removed 
> an import statement that was required by the PR for AIRFLOW-780 
> (https://github.com/apache/incubator-airflow/pull/2018). At the time 
> AIRFLOW-794 was tested, the tests passed, but AIRFLOW-780 was merged prior to 
> merging AIRFLOW-794. Restoring the import statement should fix the failing 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-830) Plugin manager should log to debug, not info

2017-02-05 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-830:
---
Fix Version/s: 1.8.1

> Plugin manager should log to debug, not info
> 
>
> Key: AIRFLOW-830
> URL: https://issues.apache.org/jira/browse/AIRFLOW-830
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> The plugin manager reports every single action as an info log, which is 
> unnecessary and overly verbose.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-05 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-829:
---
Fix Version/s: 1.8.1

> Reduce verbosity of successful Travis unit tests
> 
>
> Key: AIRFLOW-829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.8.1
>
>
> Unit tests are run with the -s flag, so every single test prints every single 
> debug log -- including multiple lines for every single dependency check of 
> every single operator in every single backfill iteration. This means that the 
> full test suite log is so large it can not be downloaded from Travis (4MB 
> cap!!). It is difficult at best and impossible at worst to identify failed 
> tests.
> The -s flag should not be used with Travis. This way, successful tests will 
> have their logging suppressed (or at least reduced) but failed tests will 
> show captured log output, making identifying failures easy. Local testing can 
> retain the -s flag by default.
> Note that all task logs are still printed because of how those logs are 
> presented in the Airflow log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-88) Improve clarity Travis CI reports

2017-02-05 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853234#comment-15853234
 ] 

Jeremiah Lowin commented on AIRFLOW-88:
---

See specific fixes in AIRFLOW-829 and AIRFLOW-830. Not sure if that covers the 
whole of this request but after those changes the output is much cleaner!

> Improve clarity Travis CI reports
> -
>
> Key: AIRFLOW-88
> URL: https://issues.apache.org/jira/browse/AIRFLOW-88
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Amikam Snir
>Priority: Minor
>
> Make the report readable. It should be easier to find the failed
> tests/errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-830) Plugin manager should log to debug, not info

2017-02-05 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-830:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2049

> Plugin manager should log to debug, not info
> 
>
> Key: AIRFLOW-830
> URL: https://issues.apache.org/jira/browse/AIRFLOW-830
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> The plugin manager reports every single action as an info log, which is 
> unnecessary and overly verbose.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-05 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-829:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2049

> Reduce verbosity of successful Travis unit tests
> 
>
> Key: AIRFLOW-829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.8.1
>
>
> Unit tests are run with the -s flag, so every single test prints every single 
> debug log -- including multiple lines for every single dependency check of 
> every single operator in every single backfill iteration. This means that the 
> full test suite log is so large it can not be downloaded from Travis (4MB 
> cap!!). It is difficult at best and impossible at worst to identify failed 
> tests.
> The -s flag should not be used with Travis. This way, successful tests will 
> have their logging suppressed (or at least reduced) but failed tests will 
> show captured log output, making identifying failures easy. Local testing can 
> retain the -s flag by default.
> Note that all task logs are still printed because of how those logs are 
> presented in the Airflow log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-88) Improve clarity Travis CI reports

2017-02-08 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-88.
---
   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2049
[https://github.com/apache/incubator-airflow/pull/2049]

> Improve clarity Travis CI reports
> -
>
> Key: AIRFLOW-88
> URL: https://issues.apache.org/jira/browse/AIRFLOW-88
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Amikam Snir
>Priority: Minor
> Fix For: 1.8.1
>
>
> Make the report readable. It should be easier to find the failed
> tests/errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-830) Plugin manager should log to debug, not info

2017-02-08 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-830.

Resolution: Fixed

Issue resolved by pull request #2049
[https://github.com/apache/incubator-airflow/pull/2049]

> Plugin manager should log to debug, not info
> 
>
> Key: AIRFLOW-830
> URL: https://issues.apache.org/jira/browse/AIRFLOW-830
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> The plugin manager reports every single action as an info log, which is 
> unnecessary and overly verbose.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-08 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-829.

Resolution: Fixed

Issue resolved by pull request #2049
[https://github.com/apache/incubator-airflow/pull/2049]

> Reduce verbosity of successful Travis unit tests
> 
>
> Key: AIRFLOW-829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.8.1
>
>
> Unit tests are run with the -s flag, so every single test prints every single 
> debug log -- including multiple lines for every single dependency check of 
> every single operator in every single backfill iteration. This means that the 
> full test suite log is so large it can not be downloaded from Travis (4MB 
> cap!!). It is difficult at best and impossible at worst to identify failed 
> tests.
> The -s flag should not be used with Travis. This way, successful tests will 
> have their logging suppressed (or at least reduced) but failed tests will 
> show captured log output, making identifying failures easy. Local testing can 
> retain the -s flag by default.
> Note that all task logs are still printed because of how those logs are 
> presented in the Airflow log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-862) Add DaskExecutor

2017-02-10 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-862:
--

 Summary: Add DaskExecutor
 Key: AIRFLOW-862
 URL: https://issues.apache.org/jira/browse/AIRFLOW-862
 Project: Apache Airflow
  Issue Type: New Feature
  Components: executor
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


The Dask Distributed sub-project makes it very easy to create pure-python 
clusters of Dask workers ranging from a personal laptop to thousands of 
networked cores. The workers can execute arbitrary functions submitted to the 
Dask scheduler node. A full Dask app would involve multiple tasks with 
data-dependencies (similar in philosophy to an Airflow DAG) but it will happily 
run single functions as well.

The DaskExecutor is configured by supplying the IP address of the Dask 
Scheduler. It submits Airflow commands to the cluster for execution (note: the 
cluster should have access to any Airflow dependencies, including Airflow 
itself!) and checks the resulting futures to see if the tasks completed 
successfully.

Some advantages of using Dask for parallel execution over LocalExecutor or 
CeleryExecutor are:
  - simple scaling, from local machines to remote clusters
  - pure-python implementation (minimal dependencies and no need to run 
additional databases)
  - built-in live-updating web UI for monitoring the cluster
  
** Note: This does NOT replace the Airflow scheduler or DAG engine with the 
analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.
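As a rough illustration of the mechanism (not the executor code itself),
submitting a shell command to a Dask cluster and checking its future looks
like this; the scheduler address is a placeholder:

{code}
# Illustration of the mechanism, not the DaskExecutor implementation.
import subprocess

from distributed import Client

client = Client('127.0.0.1:8786')  # placeholder Dask scheduler address

def run_airflow_command(command):
    # The worker must have Airflow (and the DAG files) installed locally.
    subprocess.check_call(command, shell=True)

future = client.submit(run_airflow_command,
                       'airflow run example_dag task_1 2017-02-10')
future.result()  # raises if the command failed on the worker
{code}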







--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-853) ssh_execute_operator.py stdout decode default to ASCII

2017-02-10 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-853.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2060
[https://github.com/apache/incubator-airflow/pull/2060]

> ssh_execute_operator.py stdout decode default to ASCII
> --
>
> Key: AIRFLOW-853
> URL: https://issues.apache.org/jira/browse/AIRFLOW-853
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: Airflow 2.0
>Reporter: Ming Wu
>Assignee: Ming Wu
> Fix For: 1.8.1
>
>
> I'm running the tutorial example to define a pipeline, and when I ran this 
> command:
> $ sudo airflow test flowtest print_date 2016-03-11
> [2017-02-09 17:01:06,221] {models.py:1286} ERROR - 'ascii' codec can't decode 
> byte 0xe2 in position 79: ordinal not in range(128)
> Traceback (most recent call last):
>   File 
> "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/models.py", 
> line 1245, in run
> result = task_copy.execute(context=context)
>   File 
> "/opt/rh/python27/root/usr/lib/python2.7/site-packages/airflow/contrib/operators/ssh_execute_operator.py",
>  line 129, in execute
> line = line.decode().strip()
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 79: 
> ordinal not in range(128)
> Solution:
> line.decode() should use 'utf-8' encoding; the default is ASCII.
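In other words, the one-line change in ssh_execute_operator.py is roughly:

{code}
# The fix described above: decode SSH output explicitly as UTF-8 instead of
# relying on the default codec (ASCII on Python 2).
raw = b'caf\xc3\xa9\n'              # example bytes read from the SSH channel
line = raw.decode('utf-8').strip()  # was: raw.decode().strip()
{code}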



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-862) Add DaskExecutor

2017-02-10 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-862:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2067

> Add DaskExecutor
> 
>
> Key: AIRFLOW-862
> URL: https://issues.apache.org/jira/browse/AIRFLOW-862
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: executor
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> The Dask Distributed sub-project makes it very easy to create pure-python 
> clusters of Dask workers ranging from a personal laptop to thousands of 
> networked cores. The workers can execute arbitrary functions submitted to the 
> Dask scheduler node. A full Dask app would involve multiple tasks with 
> data-dependencies (similar in philosophy to an Airflow DAG) but it will 
> happily run single functions as well.
> The DaskExecutor is configured by supplying the IP address of the Dask 
> Scheduler. It submits Airflow commands to the cluster for execution (note: 
> the cluster should have access to any Airflow dependencies, including Airflow 
> itself!) and checks the resulting futures to see if the tasks completed 
> successfully.
> Some advantages of using Dask for parallel execution over LocalExecutor or 
> CeleryExecutor are:
>   - simple scaling, from local machines to remote clusters
>   - pure-python implementation (minimal dependencies and no need to run 
> additional databases)
>   - built-in live-updating web UI for monitoring the cluster
>   
> ** Note: This does NOT replace the Airflow scheduler or DAG engine with the 
> analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-863) Example DAG start dates should be recent to avoid unnecessary backfills

2017-02-11 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-863:
--

 Summary: Example DAG start dates should be recent to avoid 
unnecessary backfills
 Key: AIRFLOW-863
 URL: https://issues.apache.org/jira/browse/AIRFLOW-863
 Project: Apache Airflow
  Issue Type: Improvement
  Components: examples
Affects Versions: 1.8.0rc3
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor
 Fix For: 1.8.1


Posted on dev mailing list:

"
2) when you install airflow, there are two new example DAGs
(last_task_only) which are going back very far in the past and scheduled to
run every hour - a bunch of dags triggered on the first start of scheduler
and hosed my CPU
"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-863) Example DAG start dates should be recent to avoid unnecessary backfills

2017-02-11 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-863:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2068

> Example DAG start dates should be recent to avoid unnecessary backfills
> ---
>
> Key: AIRFLOW-863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-863
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 1.8.0rc3
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> Posted on dev mailing list:
> "
> 2) when you install airflow, there are two new example DAGs
> (last_task_only) which are going back very far in the past and scheduled to
> run every hour - a bunch of dags triggered on the first start of scheduler
> and hosed my CPU
> "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-863) Example DAG start dates should be recent to avoid unnecessary backfills

2017-02-12 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-863.

Resolution: Fixed

Issue resolved by pull request #2068
[https://github.com/apache/incubator-airflow/pull/2068]

> Example DAG start dates should be recent to avoid unnecessary backfills
> ---
>
> Key: AIRFLOW-863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-863
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 1.8.0rc3
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> Posted on dev mailing list:
> "
> 2) when you install airflow, there are two new example DAGs
> (last_task_only) which are going back very far in the past and scheduled to
> run every hour - a bunch of dags triggered on the first start of scheduler
> and hosed my CPU
> "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-858) Configurable database name for DB operators

2017-02-12 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-858.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2063
[https://github.com/apache/incubator-airflow/pull/2063]

> Configurable database name for DB operators
> ---
>
> Key: AIRFLOW-858
> URL: https://issues.apache.org/jira/browse/AIRFLOW-858
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ján Koščo
>Assignee: Ján Koščo
>Priority: Minor
> Fix For: 1.8.1
>
>
> As a user, I want to override the database name for simple DB operators so 
> that a single connection configuration can be reused.
> Related Operators: PostgresOperator, MySqlOperator, MsSqlOperator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-862) Add DaskExecutor

2017-02-12 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-862.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2067
[https://github.com/apache/incubator-airflow/pull/2067]

> Add DaskExecutor
> 
>
> Key: AIRFLOW-862
> URL: https://issues.apache.org/jira/browse/AIRFLOW-862
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: executor
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.8.1
>
>
> The Dask Distributed sub-project makes it very easy to create pure-python 
> clusters of Dask workers ranging from a personal laptop to thousands of 
> networked cores. The workers can execute arbitrary functions submitted to the 
> Dask scheduler node. A full Dask app would involve multiple tasks with 
> data-dependencies (similar in philosophy to an Airflow DAG) but it will 
> happily run single functions as well.
> The DaskExecutor is configured by supplying the IP address of the Dask 
> Scheduler. It submits Airflow commands to the cluster for execution (note: 
> the cluster should have access to any Airflow dependencies, including Airflow 
> itself!) and checks the resulting futures to see if the tasks completed 
> successfully.
> Some advantages of using Dask for parallel execution over LocalExecutor or 
> CeleryExecutor are:
>   - simple scaling, from local machines to remote clusters
>   - pure-python implementation (minimal dependencies and no need to run 
> additional databases)
>   - built-in live-updating web UI for monitoring the cluster
>   
> ** Note: This does NOT replace the Airflow scheduler or DAG engine with the 
> analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-832) Fix debug server

2017-02-12 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-832.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2051
[https://github.com/apache/incubator-airflow/pull/2051]

> Fix debug server
> 
>
> Key: AIRFLOW-832
> URL: https://issues.apache.org/jira/browse/AIRFLOW-832
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: George Sakkis
>Assignee: George Sakkis
> Fix For: 1.8.1
>
>
> Running the Flask webserver ({{airflow webserver --debug}}) requires SSL 
> which makes it effectively broken on localhost.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (AIRFLOW-862) Add DaskExecutor

2017-02-13 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reopened AIRFLOW-862:


Reopened pending a fix to some unit tests that aren't running

> Add DaskExecutor
> 
>
> Key: AIRFLOW-862
> URL: https://issues.apache.org/jira/browse/AIRFLOW-862
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: executor
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.8.1
>
>
> The Dask Distributed sub-project makes it very easy to create pure-python 
> clusters of Dask workers ranging from a personal laptop to thousands of 
> networked cores. The workers can execute arbitrary functions submitted to the 
> Dask scheduler node. A full Dask app would involve multiple tasks with 
> data-dependencies (similar in philosophy to an Airflow DAG) but it will 
> happily run single functions as well.
> The DaskExecutor is configured by supplying the IP address of the Dask 
> Scheduler. It submits Airflow commands to the cluster for execution (note: 
> the cluster should have access to any Airflow dependencies, including Airflow 
> itself!) and checks the resulting futures to see if the tasks completed 
> successfully.
> Some advantages of using Dask for parallel execution over LocalExecutor or 
> CeleryExecutor are:
>   - simple scaling, from local machines to remote clusters
>   - pure-python implementation (minimal dependencies and no need to run 
> additional databases)
>   - built-in live-updating web UI for monitoring the cluster
>   
> ** Note: This does NOT replace the Airflow scheduler or DAG engine with the 
> analogous Dask versions; it just uses the Dask cluster to run Airflow tasks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-842) scheduler.clean_dirty raises warning: SAWarning: The IN-predicate on "dag_run.dag_id" was invoked with an empty sequence.

2017-02-13 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-842.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2072
[https://github.com/apache/incubator-airflow/pull/2072]

> scheduler.clean_dirty raises warning: SAWarning: The IN-predicate on 
> "dag_run.dag_id" was invoked with an empty sequence.
> -
>
> Key: AIRFLOW-842
> URL: https://issues.apache.org/jira/browse/AIRFLOW-842
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8, 1.8.0b5
>Reporter: Marek Baczyński
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.8.1
>
>
> sqlalchemy/sql/default_comparator.py:161: SAWarning: The IN-predicate on 
> "dag_run.dag_id" was invoked with an empty sequence. This results in a 
> contradiction, which nonetheless can be expensive to evaluate.  Consider 
> alternative strategies for improved performance.
> {noformat}
> qry = (
> session.query(DagRun.dag_id, DagRun.state, func.count('*'))
> .filter(DagRun.dag_id.in_(dirty_ids))
> .group_by(DagRun.dag_id, DagRun.state)
> )
> {noformat}
> dirty_ids can be empty here, which means there's no point in running this 
> part of the code at all.
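The fix is essentially to guard the query, reusing the names from the snippet
above:

{code}
# Skip the query entirely when there are no dirty DAG ids, avoiding the
# contradictory (and warning-producing) empty IN-predicate.
if dirty_ids:
    qry = (
        session.query(DagRun.dag_id, DagRun.state, func.count('*'))
        .filter(DagRun.dag_id.in_(dirty_ids))
        .group_by(DagRun.dag_id, DagRun.state)
    )
{code}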



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-875) Allow HttpSensor params to be templated

2017-02-13 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-875:
--

 Summary: Allow HttpSensor params to be templated
 Key: AIRFLOW-875
 URL: https://issues.apache.org/jira/browse/AIRFLOW-875
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Priority: Minor
 Fix For: 1.8.1


Unlike SimpleHttpOperator, HttpSensor's parameters aren't templated.
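Operators and sensors opt their attributes into Jinja templating via the
template_fields class attribute, so the fix is roughly the following sketch
(the exact field names are an assumption, mirroring SimpleHttpOperator):

{code}
from airflow.operators.sensors import BaseSensorOperator

class HttpSensor(BaseSensorOperator):
    # Declaring template_fields makes these constructor arguments render as
    # Jinja templates before execution. The field names here are an
    # assumption, not copied from the eventual fix.
    template_fields = ('endpoint', 'params', 'headers')

    # ... existing __init__ and poke() unchanged ...
{code}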



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment

2017-02-18 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-882.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2088
[https://github.com/apache/incubator-airflow/pull/2088]

> Code example in docs has unnecessary DAG>>Operator assignment
> -
>
> Key: AIRFLOW-882
> URL: https://issues.apache.org/jira/browse/AIRFLOW-882
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
> Fix For: 1.8.1
>
>
> The docs currently say:
> {code}
> We can put this all together to build a simple pipeline:
> with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
> (
> dag
> >> DummyOperator(task_id='dummy_1')
> >> BashOperator(
> task_id='bash_1',
> bash_command='echo "HELLO!"')
> >> PythonOperator(
> task_id='python_1',
> python_callable=lambda: print("GOODBYE!"))
> )
> {code}
> But the {{dag >> ...}} is unnecessary because the operators are already 
> initialized with the proper DAG 
> (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-871) multiple places use logging.warn() instead of warning()

2017-02-18 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-871.

   Resolution: Fixed
Fix Version/s: 1.8.1

Issue resolved by pull request #2082
[https://github.com/apache/incubator-airflow/pull/2082]

> multiple places use logging.warn() instead of warning()
> ---
>
> Key: AIRFLOW-871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-871
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 2.0, Airflow 1.8, 1.8.1, 1.8.0
>Reporter: Marek Baczyński
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.8.1
>
>
> This causes the following warning to be raised:
> airflow/airflow/utils/dag_processing.py:578: DeprecationWarning: The 'warn'
> method is deprecated, use 'warning' instead



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-886) Pass Operator result to post_execute hook

2017-02-18 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-886:
--

 Summary: Pass Operator result to post_execute hook
 Key: AIRFLOW-886
 URL: https://issues.apache.org/jira/browse/AIRFLOW-886
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


Airflow Operators have two customizable hooks: pre_execute() and 
post_execute(), called before and after the operator's execute() method. Both 
are passed the execution context; in addition, the post_execute() hook should 
receive the value returned by the Operator (if any).
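With that change, a subclass could inspect its own return value along these
lines (the result parameter name is an assumption):

{code}
import logging

from airflow.operators.python_operator import PythonOperator

class AuditedPythonOperator(PythonOperator):

    def post_execute(self, context, result=None):
        # 'result' is the value returned by execute(); logging it is just
        # one example of what the hook could do with it.
        logging.info('task %s returned: %r', self.task_id, result)
{code}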



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-887) Add compatibility with future v0.16

2017-02-18 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-887:
--

 Summary: Add compatibility with future v0.16 
 Key: AIRFLOW-887
 URL: https://issues.apache.org/jira/browse/AIRFLOW-887
 Project: Apache Airflow
  Issue Type: Improvement
  Components: dependencies
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor


Currently Airflow pins future to 0.15.x (>= 0.15 and < 0.16). Future 0.16 has 
been out for some time and is compatible with Airflow.

http://python-future.org/whatsnew.html#what-s-new-in-version-0-16-0-2016-10-27



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-887) Add compatibility with future v0.16

2017-02-18 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-887:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2091

> Add compatibility with future v0.16 
> 
>
> Key: AIRFLOW-887
> URL: https://issues.apache.org/jira/browse/AIRFLOW-887
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> Currently Airflow pins future to 0.15.x (>= 0.15 and < 0.16). Future 0.16 
> has been out for some time and is compatible with Airflow.
> http://python-future.org/whatsnew.html#what-s-new-in-version-0-16-0-2016-10-27



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-886) Pass Operator result to post_execute hook

2017-02-18 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-886:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/2091

> Pass Operator result to post_execute hook
> -
>
> Key: AIRFLOW-886
> URL: https://issues.apache.org/jira/browse/AIRFLOW-886
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> Airflow Operators have two customizable hooks: pre_execute() and 
> post_execute(), called before and after the operator's execute() method. Both 
> are passed the execution context; in addition, the post_execute() hook should 
> receive the value returned by the Operator (if any).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-829) Reduce verbosity of successful Travis unit tests

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-829:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Reduce verbosity of successful Travis unit tests
> 
>
> Key: AIRFLOW-829
> URL: https://issues.apache.org/jira/browse/AIRFLOW-829
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
> Fix For: 1.9.0
>
>
> Unit tests are run with the -s flag, so every single test prints every single 
> debug log -- including multiple lines for every single dependency check of 
> every single operator in every single backfill iteration. This means that the 
> full test suite log is so large it can not be downloaded from Travis (4MB 
> cap!!). It is difficult at best and impossible at worst to identify failed 
> tests.
> The -s flag should not be used with Travis. This way, successful tests will 
> have their logging suppressed (or at least reduced) but failed tests will 
> show captured log output, making identifying failures easy. Local testing can 
> retain the -s flag by default.
> Note that all task logs are still printed because of how those logs are 
> presented in the Airflow log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-863) Example DAG start dates should be recent to avoid unnecessary backfills

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-863:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Example DAG start dates should be recent to avoid unnecessary backfills
> ---
>
> Key: AIRFLOW-863
> URL: https://issues.apache.org/jira/browse/AIRFLOW-863
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 1.8.0rc3
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> Posted on dev mailing list:
> "
> 2) when you install airflow, there are two new example DAGs
> (last_task_only) which are going back very far in the past and scheduled to
> run every hour - a bunch of dags triggered on the first start of scheduler
> and hosed my CPU
> "



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-830) Plugin manager should log to debug, not info

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-830:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Plugin manager should log to debug, not info
> 
>
> Key: AIRFLOW-830
> URL: https://issues.apache.org/jira/browse/AIRFLOW-830
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> The plugin manager reports every single action as an info log, which is 
> unnecessary and overly verbose.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-831) Fix broken unit tests

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-831:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Fix broken unit tests
> -
>
> Key: AIRFLOW-831
> URL: https://issues.apache.org/jira/browse/AIRFLOW-831
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Critical
> Fix For: 1.9.0
>
>
> AIRFLOW-794 (https://github.com/apache/incubator-airflow/pull/2013) removed 
> an import statement that was required by the PR for AIRFLOW-780 
> (https://github.com/apache/incubator-airflow/pull/2018). At the time 
> AIRFLOW-794 was tested, the tests passed, but AIRFLOW-780 was merged prior to 
> merging AIRFLOW-794. Restoring the import statement should fix the failing 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-822) Close the connection before throwing exception in BaseHook

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-822:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Close the connection before throwing exception in BaseHook
> --
>
> Key: AIRFLOW-822
> URL: https://issues.apache.org/jira/browse/AIRFLOW-822
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: Fokko Driesprong
> Fix For: 1.9.0
>
>
> Hi Guys,
> The BaseHook contains functionality to retrieve connections from the 
> database. If a connection doesn't exist, it will throw an exception. This 
> exception is thrown before the connection to the database is closed, so the 
> session to the DB might stay open.
> Cheers, Fokko



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-821) Scheduler dagbag importing not Py3 compatible

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-821:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Scheduler dagbag importing not Py3 compatible
> -
>
> Key: AIRFLOW-821
> URL: https://issues.apache.org/jira/browse/AIRFLOW-821
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.8.0b5
> Environment: Python 3.4.4
>Reporter: Szymon Matejczyk
>Priority: Blocker
> Fix For: 1.9.0
>
>
> Function {{update_import_errors}} in the scheduler 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L694)
> is not Py3 compatible (it uses {{iteritems}} instead of {{items}}).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-828) Add maximum size for XComs

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-828:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Add maximum size for XComs
> --
>
> Key: AIRFLOW-828
> URL: https://issues.apache.org/jira/browse/AIRFLOW-828
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: xcom
>Reporter: Jeremiah Lowin
> Fix For: 1.9.0
>
>
> Adds a configurable maximum XCom size (default 20 KB).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-871) multiple places use logging.warn() instead of warning()

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-871:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> multiple places use logging.warn() instead of warning()
> ---
>
> Key: AIRFLOW-871
> URL: https://issues.apache.org/jira/browse/AIRFLOW-871
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 2.0, Airflow 1.8, 1.8.1, 1.8.0
>Reporter: Marek Baczyński
>Priority: Trivial
>  Labels: easyfix
> Fix For: 1.9.0
>
>
> This causes the following warning to be raised:
> airflow/airflow/utils/dag_processing.py:578: DeprecationWarning: The 'warn'
> method is deprecated, use 'warning' instead



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-842) scheduler.clean_dirty raises warning: SAWarning: The IN-predicate on "dag_run.dag_id" was invoked with an empty sequence.

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-842:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> scheduler.clean_dirty raises warning: SAWarning: The IN-predicate on 
> "dag_run.dag_id" was invoked with an empty sequence.
> -
>
> Key: AIRFLOW-842
> URL: https://issues.apache.org/jira/browse/AIRFLOW-842
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.8, 1.8.0b5
>Reporter: Marek Baczyński
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.9.0
>
>
> sqlalchemy/sql/default_comparator.py:161: SAWarning: The IN-predicate on 
> "dag_run.dag_id" was invoked with an empty sequence. This results in a 
> contradiction, which nonetheless can be expensive to evaluate.  Consider 
> alternative strategies for improved performance.
> {noformat}
> qry = (
> session.query(DagRun.dag_id, DagRun.state, func.count('*'))
> .filter(DagRun.dag_id.in_(dirty_ids))
> .group_by(DagRun.dag_id, DagRun.state)
> )
> {noformat}
> dirty_ids can be empty here, which means there's no point in running this 
> part of the code at all.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment

2017-02-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-882:
---
Fix Version/s: (was: 1.8.1)
   1.9.0

> Code example in docs has unnecessary DAG>>Operator assignment
> -
>
> Key: AIRFLOW-882
> URL: https://issues.apache.org/jira/browse/AIRFLOW-882
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
> Fix For: 1.9.0
>
>
> The docs currently say:
> {code}
> We can put this all together to build a simple pipeline:
> with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
>     (
>         dag
>         >> DummyOperator(task_id='dummy_1')
>         >> BashOperator(
>             task_id='bash_1',
>             bash_command='echo "HELLO!"')
>         >> PythonOperator(
>             task_id='python_1',
>             python_callable=lambda: print("GOODBYE!"))
>     )
> {code}
> But the {{dag >> ...}} is unnecessary because the operators are already 
> initialized with the proper DAG 
> (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699).
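> For comparison, the pipeline behaves identically with the {{dag >>}} edge 
> removed (sketch of the corrected docs snippet):
> {code}
> with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
>     (
>         DummyOperator(task_id='dummy_1')
>         >> BashOperator(
>             task_id='bash_1',
>             bash_command='echo "HELLO!"')
>         >> PythonOperator(
>             task_id='python_1',
>             python_callable=lambda: print("GOODBYE!"))
>     )
> {code}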



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-888) Operators should not push XComs by default

2017-02-19 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-888:
--

 Summary: Operators should not push XComs by default
 Key: AIRFLOW-888
 URL: https://issues.apache.org/jira/browse/AIRFLOW-888
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators, xcom
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
 Fix For: 1.9.0


Currently, Airflow pushes an XCom every time an Operator returns a result. This 
behavior is overeager and potentially taxing on the database (since there are 
no restrictions on what an Operator can return).

Some users may rely on this behavior, so it can still be enabled in airflow.cfg.
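A sketch of how the gate could look (the option name xcom_push_returns and the 
helper function are illustrative, not the actual setting or code):

{code}
from airflow import configuration as conf

def push_operator_result(task_instance, result):
    # hypothetical: only push the operator's return value as an XCom when
    # the (assumed) legacy-behavior flag is enabled in airflow.cfg
    if result is not None and conf.getboolean('core', 'xcom_push_returns'):
        task_instance.xcom_push(key='return_value', value=result)
{code}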



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-889) Minor error in the docstrings for BaseOperator.

2017-02-20 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-889.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2084
[https://github.com/apache/incubator-airflow/pull/2084]

> Minor error in the docstrings for BaseOperator. 
> 
>
> Key: AIRFLOW-889
> URL: https://issues.apache.org/jira/browse/AIRFLOW-889
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Ketan Bhatt
>Assignee: Ketan Bhatt
>Priority: Trivial
> Fix For: 1.9.0
>
>
> There is a minor error in the docstrings for BaseOperator.
> At one place it says:
> "Operators derived from this task should perform or trigger certain tasks".
> This should be changed to:
> "Operators derived from this class should perform or trigger certain tasks"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-915) Allow task context to be modified by pre_execute hook

2017-02-25 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-915:
--

 Summary: Allow task context to be modified by pre_execute hook
 Key: AIRFLOW-915
 URL: https://issues.apache.org/jira/browse/AIRFLOW-915
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Priority: Minor


Currently, Operators do two things that prevent the pre_execute hook from 
modifying the jinja context:

1. the Operator templates are rendered immediately BEFORE calling pre_execute
2. even though a context was already generated for the operator, the context is 
regenerated for template rendering, meaning that modifications to the operator 
context wouldn't matter anyway.

The proper course of events should be:
1. generate operator context
2. pass context to pre_execute where it could (potentially) be modified
3. use that context to render operator templates
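In sketch form (the function and method signatures here are illustrative, not 
the actual models.py code):

{code}
def run_operator(operator, context):
    # 1. the context is generated once, upstream of this function
    # 2. pre_execute may mutate the context in place
    operator.pre_execute(context)
    # 3. templates are rendered from the (possibly modified) context
    operator.render_templates(context)
    return operator.execute(context)
{code}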



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-916) Fix ConfigParser deprecation warning

2017-02-26 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-916:
--

 Summary: Fix ConfigParser deprecation warning 
 Key: AIRFLOW-916
 URL: https://issues.apache.org/jira/browse/AIRFLOW-916
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Trivial


ConfigParser.readfp() is deprecated in favor of ConfigParser.read_file(), 
according to warning messages
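A minimal before/after, shown with the plain stdlib parser for illustration:

{code}
from configparser import ConfigParser

parser = ConfigParser()
with open('airflow.cfg') as f:
    parser.read_file(f)  # preferred; parser.readfp(f) is deprecated
{code}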



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-916) Fix ConfigParser deprecation warning

2017-02-26 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-916.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2108
[https://github.com/apache/incubator-airflow/pull/2108]

> Fix ConfigParser deprecation warning 
> -
>
> Key: AIRFLOW-916
> URL: https://issues.apache.org/jira/browse/AIRFLOW-916
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Trivial
> Fix For: 1.9.0
>
>
> ConfigParser.readfp() is deprecated in favor of ConfigParser.read_file(), 
> according to warning messages



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-725) Make merge tool use OS' keyring for password storage

2017-02-26 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-725.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #1966
[https://github.com/apache/incubator-airflow/pull/1966]

> Make merge tool use OS' keyring for password storage
> 
>
> Key: AIRFLOW-725
> URL: https://issues.apache.org/jira/browse/AIRFLOW-725
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection

2017-03-13 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-770.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2056
[https://github.com/apache/incubator-airflow/pull/2056]

> HDFS hooks should support alternative ways of getting connection
> 
>
> Key: AIRFLOW-770
> URL: https://issues.apache.org/jira/browse/AIRFLOW-770
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
> Fix For: 1.9.0
>
>
> The HDFS hook currently uses {{get_connections()}} instead of 
> {{get_connection()}} to grab the connection info. I believe this is so if 
> multiple connections are specified, instead of choosing them at random, it 
> appropriately passes them all via snakebite's HAClient.
> As far as I can tell, this means connection info can't be set outside of the 
> UI, since environment variables are not looked at (which had me confused for 
> a bit). I think ideally we'd want to be able to do so for the three different 
> snakebite clients. Here are some possible suggestions for allowing this:
> * AutoConfigClient: add attribute like {{HDFSHook(..., 
> autoconfig=True).get_conn()}}
> * Client: specify single URI in environment variable
> * HAClient: specify multiple URIs in an environment variable, separated by 
> commas? That doesn't adhere closely to any standard, and if we did this we'd 
> probably want to support it across all hooks.
> WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73
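> A rough sketch of the environment-variable idea for the HAClient case (the 
> variable name and comma-separated format are assumptions of this sketch, and 
> constructing the snakebite client from the parsed pairs is left out):
> {code}
> import os
>
> def namenodes_from_env(var='AIRFLOW_CONN_HDFS_DEFAULT'):
>     # hypothetical: parse comma-separated host:port pairs from an env var
>     pairs = []
>     for uri in filter(None, os.environ.get(var, '').split(',')):
>         host, _, port = uri.partition(':')
>         pairs.append((host, int(port or 8020)))
>     return pairs
> {code}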



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-917) Incorrectly formatted failure status message

2017-03-13 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-917.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2109
[https://github.com/apache/incubator-airflow/pull/2109]

> Incorrectly formatted failure status message
> 
>
> Key: AIRFLOW-917
> URL: https://issues.apache.org/jira/browse/AIRFLOW-917
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Vijay Krishna Ramesh
>Priority: Trivial
> Fix For: 1.9.0
>
>
> The formatting of the error message at 
> https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dag_ti_slots_available_dep.py#L27
>  is incorrect.
> It logs like:
> {code}
> {models.py:1122} INFO - Dependencies not met for <TaskInstance: etl_queries_v3.subscriptions_query 2017-02-25 07:00:00 [queued]>, dependency 
> 'Task Instance Slots Available' FAILED: The maximum number of running tasks 
> (etl_queries_v3) for this task's DAG '2' has been reached.
> [2017-02-26 07:25:46,141] {jobs.py:2062} INFO - Task exited with return code 0
> {code}
> with the num tasks and dag id mixed up. 
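> In other words, the two format arguments appear to be swapped; sketched 
> below with illustrative variable names:
> {code}
> dag_id = 'etl_queries_v3'
> max_running_tasks = 2
>
> # buggy: the arguments are reversed, so the dag id prints where the
> # count should be (matching the log above)
> msg = ("The maximum number of running tasks ({0}) for this task's DAG "
>        "'{1}' has been reached.".format(dag_id, max_running_tasks))
>
> # fixed:
> msg = ("The maximum number of running tasks ({0}) for this task's DAG "
>        "'{1}' has been reached.".format(max_running_tasks, dag_id))
> {code}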



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-979) Add GovTech GDS

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-979.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2149
[https://github.com/apache/incubator-airflow/pull/2149]

> Add GovTech GDS
> ---
>
> Key: AIRFLOW-979
> URL: https://issues.apache.org/jira/browse/AIRFLOW-979
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: docs
>Reporter: Chris Sng
>Assignee: Chris Sng
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.9.0
>
>
> Add to README.md:
> ```
> 1. [GovTech GDS](https://gds-gov.tech) 
> [[@chrissng](https://github.com/chrissng) & 
> [@datagovsg](https://github.com/datagovsg)]
> ```



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-903) Add configuration setting for default DAG view.

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-903.

   Resolution: Fixed
Fix Version/s: (was: Airflow 1.8)
   1.9.0

Issue resolved by pull request #2103
[https://github.com/apache/incubator-airflow/pull/2103]

> Add configuration setting for default DAG view.
> ---
>
> Key: AIRFLOW-903
> URL: https://issues.apache.org/jira/browse/AIRFLOW-903
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: Airflow 1.8
>Reporter: Jason Kromm
>Assignee: Jason Kromm
>Priority: Minor
> Fix For: 1.9.0
>
>
> The default view when clicking on a DAG used to be graph view; it is now tree 
> view instead. There should be a configuration setting default_dag_view = 
> ['tree','graph','duration','gant','landing_times']



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reassigned AIRFLOW-883:
--

Assignee: Jeremiah Lowin

> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: Daniel Huang
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
>     'owner': 'airflow',
>     'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe that to fix this, on assignment, we would need to go back through 
> dag.default_args to see if any of those attrs weren't explicitly set on the 
> task... not the cleanest. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2017-03-15 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926575#comment-15926575
 ] 

Jeremiah Lowin commented on AIRFLOW-883:


I'm not totally sure this is a "bug" per se, though it is confusing. 

"default_args" are arguments that are passed to Operators by the parent DAG. 
Critically, that happens when the Operators are created. While bitshift 
operators allow deferred DAG assignment, the Operator in question has already 
been created. The reason the distinction matters is that the Operator's 
__init__ may include logic related to its arguments. If we pass/assign those 
arguments after initialization, the logic won't run. 

However, if we do want to tackle this:
1. The simplest thing would be to walk "default_args" and replace any matching 
Operator attributes that are None.
2. The more proper thing would be to defer Operator initialization until it is 
added to a DAG. This would require a bit of a refactor though.
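Option 1 might look roughly like this (a sketch that assumes unset operator 
attributes are None; the helper is hypothetical and would run when a task is 
assigned to a DAG):

{code}
def backfill_default_args(task, dag):
    # copy dag.default_args onto the task for any attribute the user left
    # unset at construction time (note: __init__ logic still won't re-run,
    # which is the caveat discussed above)
    for key, value in (dag.default_args or {}).items():
        if getattr(task, key, None) is None:
            setattr(task, key, value)
{code}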




> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: Daniel Huang
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
>     'owner': 'airflow',
>     'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe that to fix this, on assignment, we would need to go back through 
> dag.default_args to see if any of those attrs weren't explicitly set on the 
> task... not the cleanest. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-993:
--

 Summary: Dags should modify the start date and end date of tasks 
when they are added
 Key: AIRFLOW-993
 URL: https://issues.apache.org/jira/browse/AIRFLOW-993
 Project: Apache Airflow
  Issue Type: Bug
  Components: DAG
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor
 Fix For: 1.8.1


When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the 
later of the task's start date and its own start date; or the earlier for 
end_date.
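In sketch form, the add-task logic could clamp both dates (the method shape is 
illustrative):

{code}
def add_task(self, task):
    # inherit missing dates from the DAG; otherwise clamp to the DAG's own
    # window (later of the start dates, earlier of the end dates)
    if task.start_date is None:
        task.start_date = self.start_date
    elif self.start_date is not None:
        task.start_date = max(task.start_date, self.start_date)
    if task.end_date is None:
        task.end_date = self.end_date
    elif self.end_date is not None:
        task.end_date = min(task.end_date, self.end_date)
{code}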



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it would make sense for the DAG to set the task start_date as the 
later of the task's start date and its own start date; or the earlier for 
end_date.


> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)
with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}



> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> Furthermore, it may make sense for the task start date to always be the later 
> of the task start date and the dag start date; similarly for the end date 
> (but using the earlier date)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-993) Dags should modify the start date and end date of tasks when they are added

2017-03-15 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-993:
---
Description: 
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)


  was:
When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
it doesn't, the DAG sets it to its own start date. This isn't done for 
end_date, but it should be.

Otherwise, this simple code leads to a surprising failure as the backfill tries 
to run the task every day, even though the DAG clearly has an end date set.

{code}

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
import datetime

dt = datetime.datetime(2017, 1, 1)

Furthermore, it may make sense for the task start date to always be the later 
of the task start date and the dag start date; similarly for the end date (but 
using the earlier date)
with DAG('test', start_date=dt, end_date=dt) as dag:
    op = DummyOperator(task_id='dummy')

op.run()
{code}



> Dags should modify the start date and end date of tasks when they are added
> ---
>
> Key: AIRFLOW-993
> URL: https://issues.apache.org/jira/browse/AIRFLOW-993
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.8.1
>
>
> When tasks are added to DAGs, the DAG checks if the task has a start_date. If 
> it doesn't, the DAG sets it to its own start date. This isn't done for 
> end_date, but it should be.
> Otherwise, this simple code leads to a surprising failure as the backfill 
> tries to run the task every day, even though the DAG clearly has an end date 
> set.
> {code}
> from airflow.models import DAG
> from airflow.operators.dummy_operator import DummyOperator
> import datetime
> dt = datetime.datetime(2017, 1, 1)
> with DAG('test', start_date=dt, end_date=dt) as dag:
>     op = DummyOperator(task_id='dummy')
> op.run()
> {code}
> Furthermore, it may make sense for the task start date to always be the later 
> of the task start date and the dag start date; similarly for the end date 
> (but using the earlier date)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-995) Update Github PR template

2017-03-16 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-995:
--

 Summary: Update Github PR template
 Key: AIRFLOW-995
 URL: https://issues.apache.org/jira/browse/AIRFLOW-995
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor


The Github PR template looks great when rendered but is harder to parse while 
in editing mode (which is how all PR authors initially see it). A new template 
would be clear whether editing or previewing and include checkboxes to force 
some user acknowledgement of each required step.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-995) Update Github PR template

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-995.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2160
[https://github.com/apache/incubator-airflow/pull/2160]

> Update Github PR template
> -
>
> Key: AIRFLOW-995
> URL: https://issues.apache.org/jira/browse/AIRFLOW-995
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> The Github PR template looks great when rendered but is harder to parse while 
> in editing mode (which is how all PR authors initially see it). A new 
> template would be clear whether editing or previewing and include checkboxes 
> to force some user acknowledgement of each required step.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-994) Add MiNODES to the AIRFLOW Active Users List

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-994.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2159
[https://github.com/apache/incubator-airflow/pull/2159]

> Add MiNODES to the AIRFLOW Active Users List
> 
>
> Key: AIRFLOW-994
> URL: https://issues.apache.org/jira/browse/AIRFLOW-994
> Project: Apache Airflow
>  Issue Type: Task
>Reporter: Alexander Mueller
>Assignee: Alexander Mueller
>Priority: Trivial
> Fix For: 1.9.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Please add MiNODES to the Active Users list, using their website 
> https://www.minodes.com and their airflow users '@dice89' and 'diazcelsa'.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-984) Subdags unrecognized when subclassing SubDagOperator

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-984.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2152
[https://github.com/apache/incubator-airflow/pull/2152]

> Subdags unrecognized when subclassing SubDagOperator
> 
>
> Key: AIRFLOW-984
> URL: https://issues.apache.org/jira/browse/AIRFLOW-984
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Patrick McKenna
>Priority: Minor
> Fix For: 1.9.0
>
>
> If a user subclasses SubDagOperator, the parent DAG will not pick up the 
> subdags:
> https://github.com/apache/incubator-airflow/blob/c44e2009ee625ce4a82c50e585a3c8617d9b4ff8/airflow/models.py#L2974
> which means a DagBag won't find them:
> https://github.com/apache/incubator-airflow/blob/c44e2009ee625ce4a82c50e585a3c8617d9b4ff8/airflow/models.py#L311
> https://github.com/apache/incubator-airflow/blob/c44e2009ee625ce4a82c50e585a3c8617d9b4ff8/airflow/models.py#L365
> This PR appears to be the cause: 
> https://github.com/apache/incubator-airflow/pull/1196/files.
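> Since the failure mode is subclassing, whatever check models.py uses 
> presumably matches the exact class only; an isinstance-based lookup would 
> also match subclasses (a sketch, not the actual fix):
> {code}
> from airflow.operators.subdag_operator import SubDagOperator
>
> def find_subdags(dag):
>     # isinstance matches SubDagOperator and any subclass of it
>     return [t.subdag for t in dag.tasks if isinstance(t, SubDagOperator)]
> {code}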



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-969) Catch bad python_callable argument at DAG construction rather than Task run

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-969.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2142
[https://github.com/apache/incubator-airflow/pull/2142]

> Catch bad python_callable argument at DAG construction rather than Task run
> ---
>
> Key: AIRFLOW-969
> URL: https://issues.apache.org/jira/browse/AIRFLOW-969
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Adam Bloomston
>Assignee: Adam Bloomston
>Priority: Minor
> Fix For: 1.9.0
>
>
> If a non-callable parameter for python_callable is passed to PythonOperator, 
> it should fail to instantiate.  This will move such failures from task run to 
> DAG instantiation. Better to catch such errors sooner rather than later in 
> execution.
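> A minimal sketch of the construction-time check:
> {code}
> from airflow.exceptions import AirflowException
> from airflow.models import BaseOperator
>
> class PythonOperator(BaseOperator):
>     def __init__(self, python_callable, *args, **kwargs):
>         if not callable(python_callable):
>             # fail at DAG construction rather than at task run time
>             raise AirflowException('python_callable must be callable')
>         super(PythonOperator, self).__init__(*args, **kwargs)
>         self.python_callable = python_callable
> {code}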



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-990) DockerOperator fails when logging unicode string

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-990.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2155
[https://github.com/apache/incubator-airflow/pull/2155]

> DockerOperator fails when logging unicode string
> 
>
> Key: AIRFLOW-990
> URL: https://issues.apache.org/jira/browse/AIRFLOW-990
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: Airflow 1.7.1
> Environment: Python 2.7
>Reporter: Vitor Baptista
>Assignee: Vitor Baptista
> Fix For: 1.9.0
>
>
> On line 
> https://github.com/apache/incubator-airflow/blob/1.7.1.3/airflow/operators/docker_operator.py#L164,
>  we're calling:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
> logging.info("{}".format(line.strip()))
> {code}
> If `self.cli.logs()` return a string with a unicode character, this raises 
> the UnicodeDecodeError:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format
> raise e
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: 
> ordinal not in range(128)
> Logged from file docker_operator.py, line 165
> {noformat}
> A possible fix is to change that line to:
> {code:title=airflow/operators/docker_operator.py}
> for line in self.cli.logs(container=self.container['Id'], stream=True):
>     logging.info(line.decode('utf-8').strip())
> {code}
> This error doesn't happen on Python 3. I haven't tested, but reading the code 
> it seems the same error exists on `master` as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (AIRFLOW-995) Update Github PR template

2017-03-16 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reopened AIRFLOW-995:


Reopening because the new template has "AIRFLOW-1" in it (as an example), which 
is picked up by the PR tool and inserted into the commit subjects.

> Update Github PR template
> -
>
> Key: AIRFLOW-995
> URL: https://issues.apache.org/jira/browse/AIRFLOW-995
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> The Github PR template looks great when rendered but is harder to parse while 
> in editing mode (which is how all PR authors initially see it). A new 
> template would be clear whether editing or previewing and include checkboxes 
> to force some user acknowledgement of each required step.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-681) homepage doc link should point to apache's repo not airbnb's repo

2017-03-17 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-681.

   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2164
[https://github.com/apache/incubator-airflow/pull/2164]

> homepage doc link should point to apache's repo not airbnb's repo
> 
>
> Key: AIRFLOW-681
> URL: https://issues.apache.org/jira/browse/AIRFLOW-681
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Trivial
> Fix For: 1.9.0
>
>
> Right now on the home page, the default URL of the "Github" tab points to 
> https://github.com/airbnb/airflow.
> Though it redirects to https://github.com/apache/incubator-airflow, we 
> should fix it to point explicitly to apache's github.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-995) Update Github PR template

2017-03-17 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-995.

Resolution: Fixed

Issue resolved by pull request #2163
[https://github.com/apache/incubator-airflow/pull/2163]

> Update Github PR template
> -
>
> Key: AIRFLOW-995
> URL: https://issues.apache.org/jira/browse/AIRFLOW-995
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> The Github PR template looks great when rendered but is harder to parse while 
> in editing mode (which is how all PR authors initially see it). A new 
> template would be clear whether editing or previewing and include checkboxes 
> to force some user acknowledgement of each required step.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1005) Speed up Airflow startup time

2017-03-17 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-1005:
---

 Summary: Speed up Airflow startup time
 Key: AIRFLOW-1005
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1005
 Project: Apache Airflow
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor


Airflow takes approximately 1 second to import. It turns out that more than 
half the time is spent doing two things: importing Cryptography to create 
Fernet keys and importing Alembic. 

The first Cryptography import is in configuration.py and is only necessary if 
Airflow is generating a new airflow.cfg file (but currently gets run every 
time). Therefore it can be easily deferred.

The second is in models.py to check if encryption is turned on. This can also 
be deferred until encryption checks are actually needed.

Alembic is always imported even though it is only needed for running initdb() 
and upgradedb(). It can be lazily imported inside those functions.

These simple changes reduce Airflow's startup time by half on my machine.
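The lazy-import pattern in question, sketched:

{code}
def initdb():
    # defer the expensive import until the function is actually called,
    # so that `import airflow` no longer pays the alembic startup cost
    from alembic import command
    # ... run the migrations as before (elided)
{code}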



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1005) Speed up Airflow startup time

2017-03-17 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-1005:

Attachment: Screen Shot 2017-03-17 at 6.04.23 PM.png

> Speed up Airflow startup time
> -
>
> Key: AIRFLOW-1005
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1005
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Attachments: Screen Shot 2017-03-17 at 6.04.23 PM.png
>
>
> Airflow takes approximately 1 second to import. It turns out that more than 
> half the time is spent doing two things: importing Cryptography to create 
> Fernet keys and importing Alembic. 
> The first Cryptography import is in configuration.py and is only necessary if 
> Airflow is generating a new airflow.cfg file (but currently gets run every 
> time). Therefore it can be easily deferred.
> The second is in models.py to check if encryption is turned on. This can also 
> be deferred until encryption checks are actually needed.
> Alembic is always imported even though it is only needed for running initdb() 
> and upgradedb(). It can be lazily imported inside those functions.
> These simple changes reduce Airflow's startup time by half on my machine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1006) Move configuration templates to separate files

2017-03-17 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-1006:
---

 Summary: Move configuration templates to separate files
 Key: AIRFLOW-1006
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1006
 Project: Apache Airflow
  Issue Type: Improvement
  Components: configuration
Affects Versions: Airflow 1.8
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin
Priority: Minor


Currently both the default and test configuration templates are just strings 
inside configuration.py. This makes them difficult to work with. It would be 
much better to expose them as separate files, "default_airflow.cfg" and 
"default_test.cfg", to make it clear they are distinct config templates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1010) Add a convenience script for signing

2017-03-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-1010.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2169
[https://github.com/apache/incubator-airflow/pull/2169]

> Add a convenience script for signing
> 
>
> Key: AIRFLOW-1010
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1010
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
> Fix For: 1.9.0
>
>
> Apache requires signed releases and it is convenient not to be required to 
> type in every command.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1009) Remove SQLOperator from Concepts page

2017-03-19 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-1009.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2168
[https://github.com/apache/incubator-airflow/pull/2168]

> Remove SQLOperator from Concepts page
> -
>
> Key: AIRFLOW-1009
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1009
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: Airflow 1.8
>Reporter: Ruslan Dautkhanov
> Fix For: 1.9.0
>
>
> As discussed on the dev list, SQLOperator appears neither in the source code 
> nor in the API Reference, although it is mentioned on the Concepts page:
> {quote}
> SqlOperator - executes a SQL command
> {quote}
> https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst#operators
>  
> It should be updated with 
> ``MySqlOperator``, ``SqliteOperator``, ``PostgresOperator``, 
> ``MsSqlOperator``, ``OracleOperator``, ``JdbcOperator``, etc. - executes a 
> SQL command



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-1006) Move configuration templates to separate files

2017-03-20 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932554#comment-15932554
 ] 

Jeremiah Lowin commented on AIRFLOW-1006:
-

Reopened due to issue pointed out by [~bolke] 
https://github.com/apache/incubator-airflow/commit/1da7450c96c631a75da96ba00f6d3ad116c9061b#commitcomment-21392365

Config templates are not included when Airflow is installed. The solution is to 
add them to the MANIFEST.

> Move configuration templates to separate files
> --
>
> Key: AIRFLOW-1006
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1006
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Affects Versions: Airflow 1.8
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> Currently both the default and test configuration templates are just strings 
> inside configuration.py. This makes them difficult to work with. It would be 
> much better to expose them as separate files, "default_airflow.cfg" and 
> "default_test.cfg", to make it clear they are distinct config templates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (AIRFLOW-1006) Move configuration templates to separate files

2017-03-20 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin reopened AIRFLOW-1006:
-

> Move configuration templates to separate files
> --
>
> Key: AIRFLOW-1006
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1006
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Affects Versions: Airflow 1.8
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> Currently both the default and test configuration templates are just strings 
> inside configuration.py. This makes them difficult to work with. It would be 
> much better to expose them as separate files, "default_airflow.cfg" and 
> "default_test.cfg", to make it clear they are distinct config templates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1006) Move configuration templates to separate files

2017-03-20 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-1006.
-
Resolution: Fixed

Issue resolved by pull request #2173
[https://github.com/apache/incubator-airflow/pull/2173]

> Move configuration templates to separate files
> --
>
> Key: AIRFLOW-1006
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1006
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Affects Versions: Airflow 1.8
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: 1.9.0
>
>
> Currently both the default and test configuration templates are just strings 
> inside configuration.py. This makes them difficult to work with. It would be 
> much better to expose them as separate files, "default_airflow.cfg" and 
> "default_test.cfg", to make it clear they are distinct config templates.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (AIRFLOW-1040) Fix typos in comments/docstrings in models.py

2017-03-24 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-1040.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Issue resolved by pull request #2174
[https://github.com/apache/incubator-airflow/pull/2174]

> Fix typos in comments/docstrings in models.py
> -
>
> Key: AIRFLOW-1040
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1040
> Project: Apache Airflow
>  Issue Type: Task
>  Components: docs
>Reporter: Matthew Schmoyer
>Assignee: Matthew Schmoyer
>Priority: Trivial
> Fix For: 1.9.0
>
>
> There are several small spelling typos in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py
> Also some docstring formatting needs to be fixed, such as colons being in 
> the incorrect spot in things like `:param: default`, and the function 
> `clean_dirty()` has docstring params that don't exist in the actual function.
> This issue is being addressed by PR: 
> https://github.com/apache/incubator-airflow/pull/2174



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-7) Unit test for ExternalTaskSensor depends on a different unit test

2016-04-26 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-7:


 Summary: Unit test for ExternalTaskSensor depends on a different 
unit test
 Key: AIRFLOW-7
 URL: https://issues.apache.org/jira/browse/AIRFLOW-7
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Jeremiah Lowin
Priority: Minor


The unit test {{core:CoreTest.test_external_task_sensor}} appears to depend on 
the result of a different unit test. I discovered this when I created a 
{{tearDown()}} method that deleted any TaskInstances created by a unit test. I 
think it's bad to have cross-test dependencies, especially since I'm not sure 
if there is a guarantee about unit test run order.

Full test:
{code}
def test_external_task_sensor_delta(self):
    t = operators.ExternalTaskSensor(
        task_id='test_external_task_sensor_check_delta',
        external_dag_id=TEST_DAG_ID,
        external_task_id='time_sensor_check',
        execution_delta=timedelta(0),
        allowed_states=['success'],
        dag=self.dag)
    t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, force=True)
{code}
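For context, the tearDown described above was along these lines (a sketch, not 
the committed test code):

{code}
import unittest

from airflow import settings
from airflow.models import TaskInstance

class CoreTest(unittest.TestCase):
    def tearDown(self):
        # delete all TaskInstances after each test so that no test can
        # depend on another test's leftover state
        session = settings.Session()
        session.query(TaskInstance).delete()
        session.commit()
        session.close()
{code}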



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Deleted] (AIRFLOW-8) ~!Call^%@(1877-791-9980)%% Kindle fire Support phone numberUSA HELPDESK Kindle fire customer Support phone number?

2016-04-27 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin deleted AIRFLOW-8:
-


> ~!Call^%@(1877-791-9980)%% Kindle fire Support phone numberUSA HELPDESK  
> Kindle fire customer Support phone number?
> ---
>
> Key: AIRFLOW-8
> URL: https://issues.apache.org/jira/browse/AIRFLOW-8
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: rahul sharma
>

[jira] [Commented] (AIRFLOW-11) Migrate mailing list to Apache

2016-04-27 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260757#comment-15260757
 ] 

Jeremiah Lowin commented on AIRFLOW-11:
---

[~criccomini] excellent note. My suggestion is to explicitly say "send an email 
to dev-subscribe@... to subscribe" rather than just giving the list 
subscription link because I think it's not obvious how to interact with the 
mailing lists.

Also, it's implied that a PR can't exist without an associated issue, but 
that's a more strict requirement than GitHub and I think it should be 
highlighted.

> Migrate mailing list to Apache
> --
>
> Key: AIRFLOW-11
> URL: https://issues.apache.org/jira/browse/AIRFLOW-11
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-11) Migrate mailing list to Apache

2016-04-27 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260757#comment-15260757
 ] 

Jeremiah Lowin edited comment on AIRFLOW-11 at 4/27/16 7:08 PM:


@[~criccomini] excellent note. My suggestion is to explicitly say "send an 
email to dev-subscribe@... to subscribe" rather than just giving the list 
subscription link because I think it's not obvious how to interact with the 
mailing lists.

Also, it's implied that a PR can't exist without an associated issue, but 
that's a more strict requirement than GitHub and I think it should be 
highlighted.


was (Author: jlowin):
[~criccomini] excellent note. My suggestion is to explicitly say "send an email 
to dev-subscribe@... to subscribe" rather than just giving the list 
subscription link because I think it's not obvious how to interact with the 
mailing lists.

Also, it's implied that a PR can't exist without an associated issue, but 
that's a more strict requirement than GitHub and I think it should be 
highlighted.

> Migrate mailing list to Apache
> --
>
> Key: AIRFLOW-11
> URL: https://issues.apache.org/jira/browse/AIRFLOW-11
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Chris Riccomini
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-14:
-

 Summary: DagRun Refactor (Scheduler 2.0)
 Key: AIRFLOW-14
 URL: https://issues.apache.org/jira/browse/AIRFLOW-14
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


For full proposal, please see the Wiki: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286

Borrowing from that page: 

*Description of New Workflow*

DagRuns represent the state of a DAG at a certain point in time (perhaps they 
should be called DagInstances?). To run a DAG – or to manage the execution of a 
DAG – a DagRun must first be created. This can be done manually (simply by 
creating a DagRun object) or automatically, using methods like 
dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
runs can be done by any process at any time, simply by creating the appropriate 
object.

Just creating a DagRun is not enough to actually run the DAG (just as creating 
a TaskInstance is not the same as actually running a task). We need a Job for 
that. The DagRunJob is fairly simple in structure. It maintains a set of 
DagRuns that it is tasked with executing, and loops over that set until all the 
DagRuns either succeed or fail. New DagRuns can be passed to the job explicitly 
via DagRunJob.submit_dagruns() or by defining its DagRunJob.collect_dagruns() 
method, which is called during each loop. When the DagRunJob is executing a 
specific DagRun, it locks it. Other DagRunJobs will not try to execute locked 
DagRuns. This way, many DagRunJobs can run simultaneously in either a local or 
distributed setting, and can even be pointed at the same DagRuns, without 
worrying about collisions or interference.
The basic DagRunJob loop works like this:
- refresh dags
- collect new dagruns
- process dagruns (including updating dagrun states for success/failure)
- call executor/own heartbeat
By tweaking the DagRunJob, we can easily recreate the behavior of the current 
SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
corresponding to the requested start/end dates and submits them to itself prior 
to initiating its loop.
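A skeletal version of that loop (method names follow the proposal; bodies are 
elided):

{code}
from airflow.jobs import BaseJob

class DagRunJob(BaseJob):
    def _execute(self):
        while self.dagruns:
            self.refresh_dags()
            self.collect_dagruns()   # pull in newly eligible DagRuns
            self.process_dagruns()   # run tasks, update DagRun states
            self.heartbeat()         # call executor / own heartbeat
{code}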



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262303#comment-15262303
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

Excellent question. I've tried to address this with an addition to the existing 
{{kill_zombies()}} function. 

Basically, multiple DRJs can be trying to run multiple DagRuns -- that's not 
an issue. They just take a lock when they're actually executing so the other 
DRJs know to skip that one. (This is a slight repeat of something I put in the 
email, but better to have it here than there.) The lock could have been a 
simple True/False flag, but that could lead to exactly the situation you're 
describing -- the DRJ locks the DR, dies, and then no one ever touches that DR 
again because of the lock. So the new kill_zombies() method looks at the 
lock_id and looks for an active job with that id. If the job exists, it takes 
no action. If the job is gone/ended, it unlocks the DagRun, which makes it 
available for other DRJs (like a new Scheduler).

See: 
https://github.com/jlowin/airflow/blob/dagrun-refactor/airflow/models.py#L337
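
For orientation (the linked models.py is authoritative), the reaping logic 
amounts to something like the following sketch; the model and column names 
here are assumptions, not the actual code:

{code:python}
# Hypothetical sketch of the lock-reaping step described above; DagRun.lock_id
# and Job are assumed SQLAlchemy models, not the actual code at the link.
def kill_zombies(session):
    locked = session.query(DagRun).filter(DagRun.lock_id.isnot(None))
    for run in locked:
        job = session.query(Job).get(run.lock_id)
        # If the locking job has died or finished, release the lock so that
        # other DagRunJobs (e.g. a new Scheduler) can pick this DagRun up.
        if job is None or not job.is_alive():
            run.lock_id = None
    session.commit()
{code}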

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262304#comment-15262304
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

That is exactly correct. Multiple DRJs are not just supported but encouraged. 

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262324#comment-15262324
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

DagRuns are primary-keyed by (dag_id, execution_date), so there is only one 
canonical version. A DRJ does need to refresh from the db to check for a lock 
immediately before running a DagRun, however. The mechanism is very similar to 
TaskInstance -- you can create as many TI objects as you want, but they all 
point at the one canonical version and can be refreshed at any time to reflect 
the "true" state.

DRJs pick up DagRuns in two ways:
1. explicitly via {{DagRunJob.submit_dagruns()}}. This is used, for example, 
by BackfillJob; it generates a bunch of DagRuns and calls {{submit_dagruns()}} 
to submit them to itself. Then it enters its loop. Scheduler also uses this 
after scheduling a DagRun, though it's actually redundant because of the 
second way (below).
2. automatically via {{DagRunJob.collect_dagruns()}}. This is used by 
SchedulerJob to look for any active DagRuns and add them to its list.
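
A sketch of the refresh-then-check step, by analogy with 
TaskInstance.refresh_from_db(); the method and column names below are 
assumptions rather than the actual API:

{code:python}
# Hypothetical DagRun method: re-read the canonical row, then take the lock
# only if no other job holds it. Not the actual implementation.
def try_lock(self, job_id, session):
    self.refresh_from_db(session)          # sync with the canonical row first
    if self.lock_id is not None and self.lock_id != job_id:
        return False                       # another DRJ is executing this run
    self.lock_id = job_id
    session.merge(self)
    session.commit()
    return True
{code}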

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262324#comment-15262324
 ] 

Jeremiah Lowin edited comment on AIRFLOW-14 at 4/28/16 3:22 PM:


DagRuns are primary-keyed by (dag_id, execution_date), so there is only one 
canonical version. A DRJ does need to refresh from the db to check for a lock 
immediately before running a DagRun, however. The mechanism is very similar to 
TaskInstance -- you can create as many TI objects as you want, but they all 
point at the one canonical version and can be refreshed at any time to reflect 
the "true" state.

DRJs pick up DagRuns in two ways:
1. explicitly via {{DagRunJob.submit_dagruns()}}. This is used, for example, 
by BackfillJob; it generates a bunch of DagRuns and calls {{submit_dagruns()}} 
to submit them to itself. Then it enters its loop. Scheduler also uses this 
after scheduling a DagRun, though it's actually redundant because of the 
second way (below).
2. automatically via {{DagRunJob.collect_dagruns()}}. This is called inside 
each DRJ loop and is used by SchedulerJob to look for any active DagRuns and 
add them to its set of DagRuns to execute.


was (Author: jlowin):
DagRuns are primary-keyed by (dag_id, execution_date), so there is only one 
canonical version. A DRJ does need to refresh from the db to check for a lock 
immediately before running a DagRun, however. The mechanism is very similar to 
TaskInstance -- you can create as many TI objects as you want, but they all 
point at the one canonical version and can be refreshed at any time to reflect 
the "true" state.

DRJs pick up DagRuns in two ways:
1. explicitly via {{DagRunJob.submit_dagruns()}}. This is used, for example, 
by BackfillJob; it generates a bunch of DagRuns and calls {{submit_dagruns()}} 
to submit them to itself. Then it enters its loop. Scheduler also uses this 
after scheduling a DagRun, though it's actually redundant because of the 
second way (below).
2. automatically via {{DagRunJob.collect_dagruns()}}. This is used by 
SchedulerJob to look for any active DagRuns and add them to its list.

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262357#comment-15262357
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

Yes, that situation could arise... I'm not sure of the best way to handle this 
in a database-agnostic way (i.e. via SQLAlchemy). I'd appreciate any 
suggestions!

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-14:
--
Comment: was deleted

(was: Yes, that situation could arise... I'm not sure of the best way to 
handle this in a database-agnostic way (i.e. via SQLAlchemy). I'd appreciate 
any suggestions!)

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262359#comment-15262359
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

Yes, that situation could arise... I'm not sure of the best way to handle this 
in a database-agnostic way (i.e. via SQLAlchemy). I'd appreciate any 
suggestions!

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (AIRFLOW-13) Migrate Travis CI to Apache repo

2016-04-28 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-13 started by Jeremiah Lowin.
-
> Migrate Travis CI to Apache repo
> 
>
> Key: AIRFLOW-13
> URL: https://issues.apache.org/jira/browse/AIRFLOW-13
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Chris Riccomini
>Assignee: Jeremiah Lowin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-14) DagRun Refactor (Scheduler 2.0)

2016-04-28 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262693#comment-15262693
 ] 

Jeremiah Lowin commented on AIRFLOW-14:
---

The good news is that lock_id is already set to the job's id, so I just need 
to add a check that it matches (right now a job assumes that if it was able to 
take the lock, it owns the lock, which is less strict). But I'm still curious 
about your point:

> Even if they then check to verify that they are the owners of the lock, it's 
> not trustworthy.

Is that not still an issue?
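
For reference, one common database-agnostic pattern for this (a suggestion, 
not something from the PR) is to claim the lock with a single conditional 
UPDATE, so the check and the write happen atomically in the database rather 
than as a separate read followed by a write:

{code:python}
# Claim the lock via compare-and-swap in SQL. DagRun here is an assumed
# SQLAlchemy model keyed by (dag_id, execution_date) with a lock_id column.
from sqlalchemy import update

def try_lock(session, dag_id, execution_date, job_id):
    result = session.execute(
        update(DagRun)
        .where(DagRun.dag_id == dag_id)
        .where(DagRun.execution_date == execution_date)
        .where(DagRun.lock_id.is_(None))
        .values(lock_id=job_id)
    )
    session.commit()
    # rowcount == 1 means we won the race; 0 means another job holds the lock.
    return result.rowcount == 1
{code}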

> DagRun Refactor (Scheduler 2.0)
> ---
>
> Key: AIRFLOW-14
> URL: https://issues.apache.org/jira/browse/AIRFLOW-14
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>  Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page: 
> *Description of New Workflow*
> DagRuns represent the state of a DAG at a certain point in time (perhaps they 
> should be called DagInstances?). To run a DAG – or to manage the execution of 
> a DAG – a DagRun must first be created. This can be done manually (simply by 
> creating a DagRun object) or automatically, using methods like 
> dag.schedule_dag(). Therefore, both scheduling new runs OR introducing ad-hoc 
> runs can be done by any process at any time, simply by creating the 
> appropriate object.
> Just creating a DagRun is not enough to actually run the DAG (just as 
> creating a TaskInstance is not the same as actually running a task). We need 
> a Job for that. The DagRunJob is fairly simple in structure. It maintains a 
> set of DagRuns that it is tasked with executing, and loops over that set 
> until all the DagRuns either succeed or fail. New DagRuns can be passed to 
> the job explicitly via DagRunJob.submit_dagruns() or by defining its 
> DagRunJob.collect_dagruns() method, which is called during each loop. When 
> the DagRunJob is executing a specific DagRun, it locks it. Other DagRunJobs 
> will not try to execute locked DagRuns. This way, many DagRunJobs can run 
> simultaneously in either a local or distributed setting, and can even be 
> pointed at the same DagRuns, without worrying about collisions or 
> interference.
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
> By tweaking the DagRunJob, we can easily recreate the behavior of the current 
> SchedulerJob and BackfillJob. The Scheduler simply runs forever and picks up 
> ALL active DagRuns in collect_dagruns(); Backfill generates DagRuns 
> corresponding to the requested start/end dates and submits them to itself 
> prior to initiating its loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-16) Use GCP-specific fields in hook view

2016-04-29 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264632#comment-15264632
 ] 

Jeremiah Lowin commented on AIRFLOW-16:
---

I don't have an easy way to test the Datastore or BigQuery hooks, but I can 
confirm that I was able to very easily create a `google_cloud_platform` 
connection in the UI and use it to write/read remote logs. The DS/BQ code 
didn't seem to raise any errors, for what it's worth.

> Use GCP-specific fields in hook view
> 
>
> Key: AIRFLOW-16
> URL: https://issues.apache.org/jira/browse/AIRFLOW-16
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Chris Riccomini
>  Labels: gcp
>
> Once AIRFLOW-15 is done, we should update the Google cloud base hook to use 
> fields for project, service account, etc. We currently just use a JSON blob 
> in the {{extras}} field. We can steal this code from 
> [this|https://github.com/airbnb/airflow/pull/1119/files] PR, where 
> {{extras\_\_google_cloud_platform_*}} is introduced in views.py.
> We should also look at creating just one hook of type google_cloud_platform, 
> rather than one hook per Google cloud service. Again, this is how the PR 
> (above) works, and it's pretty handy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-16) Use GCP-specific fields in hook view

2016-04-29 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264864#comment-15264864
 ] 

Jeremiah Lowin commented on AIRFLOW-16:
---

+1 from me [~criccomini]

> Use GCP-specific fields in hook view
> 
>
> Key: AIRFLOW-16
> URL: https://issues.apache.org/jira/browse/AIRFLOW-16
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Chris Riccomini
>  Labels: gcp
>
> Once AIRFLOW-15 is done, we should update the Google cloud base hook to use 
> fields for project, service account, etc. We currently just use a JSON blob 
> in the {{extras}} field. We can steal this code from 
> [this|https://github.com/airbnb/airflow/pull/1119/files] PR, where 
> {{extras\_\_google_cloud_platform_*}} is introduced in views.py.
> We should also look at creating just one hook of type google_cloud_platform, 
> rather than one hook per Google cloud service. Again, this is how the PR 
> (above) works, and it's pretty handy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-28) Add @latest, @now, @start_date, @end_date to the `airflow test` CLI

2016-05-02 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-28:
--
Assignee: Siddharth Anand

> Add @latest, @now, @start_date, @end_date to the `airflow test` CLI
> ---
>
> Key: AIRFLOW-28
> URL: https://issues.apache.org/jira/browse/AIRFLOW-28
> Project: Apache Airflow
>  Issue Type: Wish
>Reporter: Bence Nagy
>Assignee: Siddharth Anand
>Priority: Minor
>
> It's often quite a drag to have to calculate and type a valid datestring for 
> {{airflow test}} when I just want to see whether my code runs. It'd be really 
> nice if I could just run {{airflow test dag task @latest}} and have airflow 
> think about the execution date instead of me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-26) GCP hook naming alignment

2016-05-02 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266496#comment-15266496
 ] 

Jeremiah Lowin commented on AIRFLOW-26:
---

Good idea. I have a handy way of deprecating modules in a backwards-compatible 
way -- let me see if I can apply it here.
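
The comment doesn't spell out the mechanism, but a minimal sketch of one 
common approach looks like this: keep the old module as a thin shim that warns 
and re-exports from the new location. The module names below are illustrative 
assumptions, not the actual rename.

{code:python}
# Contents of the old module (e.g. a hypothetical airflow/hooks/gcs_hook.py)
# after the rename; the gcp_storage_hook target is an assumed name.
import warnings

warnings.warn(
    "airflow.hooks.gcs_hook is deprecated; "
    "import from airflow.hooks.gcp_storage_hook instead",
    DeprecationWarning,
    stacklevel=2,
)

from airflow.hooks.gcp_storage_hook import *  # noqa: F401,F403  (re-export)
{code}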

> GCP hook naming alignment
> -
>
> Key: AIRFLOW-26
> URL: https://issues.apache.org/jira/browse/AIRFLOW-26
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Alex Van Boxel
>Priority: Minor
>
> Because we have quite a few GCP services, it's better to align the naming so 
> as not to confuse new users of Google Cloud Platform:
> gcp_storage > renamed from gcs
> gcp_bigquery > renamed from bigquery
> gcp_datastore > rename from datastore
> gcp_dataflow > TBD
> gcp_dataproc > TBD
> gcp_bigtable > TBD
> Note this could break 'custom' operators if they use the hooks.
> Can be assigned to me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-31) Use standard imports for hooks/operators

2016-05-02 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-31:
-

 Summary: Use standard imports for hooks/operators
 Key: AIRFLOW-31
 URL: https://issues.apache.org/jira/browse/AIRFLOW-31
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


(Migrated from https://github.com/airbnb/airflow/issues/1238)

Currently, Airflow uses a relatively complex import mechanism to import hooks 
and operators without polluting the namespace with submodules. I would like to 
propose that Airflow abandon that system and use standard Python importing.

Here are a few major reasons why I think the current system has run its course.

### Polluting namespace
The biggest advantage of the current system, as I understand it, is that only 
Operators appear in the `airflow.operators` namespace.  The submodules that 
actually contain the operators do not.
So for example while `airflow.operators.python_operator.PythonOperator` is a 
thing, `PythonOperator` is in the `airflow.operators` namespace but 
`python_operator` is not.

I think this sort of namespace pollution was helpful when Airflow was a smaller 
project, but as the number of hooks/operators grows -- and especially as the 
`contrib` hooks/operators grow -- I'd argue that namespacing is a *good thing*. 
It provides structure and organization, and opportunities for documentation 
(through module docstrings).

In fact, I'd argue that the current namespace is itself getting quite polluted 
-- the only way to know what's available is to use something like IPython 
tab-completion to browse an alphabetical list of Operator names, or to load the 
source file and grok the import definition (which no one installing from PyPI 
is likely to do).

### Conditional imports
There's a second advantage to the current system: any module that fails to 
import is silently ignored. This makes it easy to have optional dependencies. 
For example, if someone doesn't have `boto` installed, then they don't have an 
`S3Hook` either. The same goes for `HiveOperator`.

Again, as Airflow grows and matures, I think this is a little too magic. If my 
environment is missing a dependency, I want to hear about it.

On the other hand, the `contrib` namespace sort of depends on this -- we don't 
want users to have to install every single dependency. So I propose that 
contrib modules all live in their submodules: `from 
airflow.contrib.operators.my_operator import MyOperator`. As mentioned 
previously, having structure and namespacing is a good thing as the project 
gets more complex.

Other ways to handle this include putting "non-standard" imports inside the 
operator/hook rather than at module level (see `HiveOperator`/`HiveHook`), so 
the module can be imported even if the dependency is missing. Another is 
judicious use of `try`/`except ImportError`. The simplest is to make people 
import things explicitly from submodules.

### Operator dependencies
Right now, operators can't depend on each other if they aren't in the same 
file, for the simple reason that there is no guarantee on what order the 
operators will be loaded: it all comes down to which dictionary key gets 
loaded first. One day Operator B could be loaded after Operator A; the next 
day it might be loaded before. Consequently, A and B can't safely depend on 
each other. Worse, if a user writes two operators that do depend on each 
other, they won't get an error message when one fails to import.

For contrib modules in particular, this is sort of killer.

### Ease of use
It's *hard* to set up imports for a new operator. The dictionary-based import 
instructions aren't obvious to new users, and errors are silently dismissed, 
which makes debugging difficult.

### Identity
Surprisingly, `airflow.operators.SubDagOperator != 
airflow.operators.subdag_operator.SubDagOperator`. See #1168.

# Proposal

Use standard python importing for hooks/operators/etc.
- `__init__.py` files use straightforward, standard Python imports
- major operators are available at `airflow.operators.OperatorName` or 
`airflow.operators.operator_module.OperatorName`.
- contrib operators are only available at 
`airflow.contrib.operators.operator_module.OperatorName` in order to manage 
dependencies
- operator authors are encouraged to use `__all__` to define their module's 
exports

Possibly delete namespace afterward
- in `operators/__init__.py`, run a function at the end of the file which 
deletes all modules from the namespace, leaving only `Operators`. This keeps 
the namespace clear but lets people use familiar import mechanisms.

Possibly use an import function to handle `ImportError` gracefully
- rewrite `import_module_attrs` to take one module name at a time instead of a 
dictionary. 
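
As a sketch of what the proposed style could look like in 
`operators/__init__.py` (module names are examples rather than the full list, 
and the ImportError guard is shown only for a genuinely optional dependency):

{code:python}
# Hypothetical operators/__init__.py under the proposal: plain imports for
# core operators, an explicit guard only where a dependency is optional.
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

__all__ = ["BashOperator", "PythonOperator"]

try:
    # Optional: only available if the Hive extras are installed.
    from airflow.operators.hive_operator import HiveOperator
    __all__.append("HiveOperator")
except ImportError:
    pass
{code}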




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

