[jira] [Created] (AIRFLOW-3000) Allow to print into the log from operators (ability to Base Operator)

2018-09-04 Thread jack (JIRA)
jack created AIRFLOW-3000:
-

 Summary: Allow to print into the log from operators (ability to 
Base Operator)
 Key: AIRFLOW-3000
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3000
 Project: Apache Airflow
  Issue Type: Task
Affects Versions: 1.10.0
Reporter: jack


As described on Stack Overflow: 
[https://stackoverflow.com/questions/52144108/how-to-print-a-unique-message-in-airflow-operator]

 

Any print in the code will be shown in the log file. However, this is a problem when 
creating operators dynamically.

 

Assume this code:

 
{code:python}
for i in range(5, 0, -1):
    print("My name is load_ads_to_BigQuery-{}".format(i))
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        task_id='load_ads_to_BigQuery-{}'.format(i), …){code}
 

This creates 5 operators.

However, all 5 print statements are executed for every operator, meaning that if you 
go to the log of 
{code:java}
load_ads_to_BigQuery-1 {code}
you will see:

 

 
{code:java}
My name is load_ads_to_BigQuery-1
My name is load_ads_to_BigQuery-2
My name is load_ads_to_BigQuery-3
My name is load_ads_to_BigQuery-4
My name is load_ads_to_BigQuery-5
{code}
 

 

This is a problem because each operator's log also contains the messages of the other operators.

 

Each message is unique only within its own operator, meaning that the print should 
happen inside the operator, for example:

{code:python}
for i in range(5, 0, -1):
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        task_id='load_ads_to_BigQuery-{}'.format(i),
        print("My name is load_ads_to_BigQuery-{}".format(i)),
        …){code}
 

or something like it. However, Airflow does not support printing inside operators; 
it is not one of the allowed arguments.

Add an optional parameter called 
{code:java}
msg_log {code}
that, if it is assigned a value, will print that value to the log when the 
operator is executed.

 

Please add this argument to the BaseOperator for printing and extend it as an 
optional ability for all operators.
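
For illustration, here is a minimal sketch of the requested behaviour as a custom 
subclass. The msg_log argument shown is hypothetical (it does not exist on 
BaseOperator today); self.log is the existing per-task logger, so the message lands 
in that task's own log:

{code:python}
# Hypothetical sketch only: msg_log is not an existing BaseOperator argument.
# It illustrates the requested behaviour by logging from inside execute().
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class AnnounceOperator(BaseOperator):

    @apply_defaults
    def __init__(self, msg_log=None, *args, **kwargs):
        super(AnnounceOperator, self).__init__(*args, **kwargs)
        self.msg_log = msg_log

    def execute(self, context):
        if self.msg_log:
            self.log.info(self.msg_log)  # self.log is provided by LoggingMixin
{code}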



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-416533459
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=h1)
 Report
   > Merging 
[#3813](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/5869feae871dd2b70e9c0b1fdb6315d034196aea?src=pr&el=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3813/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3813      +/-   ##
   ==========================================
   + Coverage   77.43%   77.43%    +<.01%
   ==========================================
     Files         203      203
     Lines       15846    15846
   ==========================================
   + Hits        12270    12271       +1
   + Misses       3576     3575       -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3813/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=)
 | `88.79% <0%> (+0.04%)` | :arrow_up: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=footer).
 Last update 
[5869fea...402d8dd](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators

2018-09-04 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated AIRFLOW-2997:
--
Fix Version/s: 1.10.1

> Support for clustered tables in Bigquery hooks/operators
> 
>
> Key: AIRFLOW-2997
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2997
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Reporter: Gordon Ball
>Priority: Minor
> Fix For: 1.10.1
>
>
> Bigquery support for clustered tables was added (at GCP "Beta" level) on 
> 2018-07-30. This feature allows load or table-creating query operations to 
> request that data be stored sorted by a subset of columns, allowing more 
> efficient (and potentially cheaper) subsequent queries.
>  Support for specifying fields to cluster on should be added to at least the 
> bigquery hook, load-from-GCS operator and query operator.
>  Documentation: https://cloud.google.com/bigquery/docs/clustered-tables
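
As a rough illustration of what the hook and operators would need to pass through, a 
clustered load job is expressed in the BigQuery jobs API along these lines (the 
'clustering' and 'timePartitioning' field names come from the linked documentation; 
the surrounding values are placeholders):

{code:python}
# Sketch of the job configuration a clustered load would need (placeholder
# project/dataset/table values; 'clustering' holds the ordered cluster fields).
cluster_fields = ['customer_id', 'event_date']

configuration = {
    'load': {
        'destinationTable': {
            'projectId': 'my-project',
            'datasetId': 'my_dataset',
            'tableId': 'my_table',
        },
        # At the time of writing, clustered tables require a partitioned table.
        'timePartitioning': {'type': 'DAY'},
        'clustering': {'fields': cluster_fields},
    }
}
{code}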



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported Python versions

2018-09-04 Thread GitBox
Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported 
Python versions
URL: https://github.com/apache/incubator-airflow/pull/3839
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/README.md b/README.md
index e911225aee..211d9844d1 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@
 [![Coverage 
Status](https://img.shields.io/codecov/c/github/apache/incubator-airflow/master.svg)](https://codecov.io/github/apache/incubator-airflow?branch=master)
 [![Documentation 
Status](https://readthedocs.org/projects/airflow/badge/?version=latest)](https://airflow.readthedocs.io/en/latest/?badge=latest)
 
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt)
+[![PyPI - Python 
Version](https://img.shields.io/pypi/pyversions/apache-airflow.svg)](https://pypi.org/project/apache-airflow/)
 [![Join the chat at 
https://gitter.im/apache/incubator-airflow](https://badges.gitter.im/apache/incubator-airflow.svg)](https://gitter.im/apache/incubator-airflow?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
 _NOTE: The transition from 1.8.0 (or before) to 1.8.1 (or after) requires 
uninstalling Airflow before installing the new version. The package name was 
changed from `airflow` to `apache-airflow` as of version 1.8.1._


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-208) Adding badge to README.md to show supported Python versions

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602689#comment-16602689
 ] 

ASF GitHub Bot commented on AIRFLOW-208:


Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported 
Python versions
URL: https://github.com/apache/incubator-airflow/pull/3839
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/README.md b/README.md
index e911225aee..211d9844d1 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@
 [![Coverage 
Status](https://img.shields.io/codecov/c/github/apache/incubator-airflow/master.svg)](https://codecov.io/github/apache/incubator-airflow?branch=master)
 [![Documentation 
Status](https://readthedocs.org/projects/airflow/badge/?version=latest)](https://airflow.readthedocs.io/en/latest/?badge=latest)
 
[![License](http://img.shields.io/:license-Apache%202-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt)
+[![PyPI - Python 
Version](https://img.shields.io/pypi/pyversions/apache-airflow.svg)](https://pypi.org/project/apache-airflow/)
 [![Join the chat at 
https://gitter.im/apache/incubator-airflow](https://badges.gitter.im/apache/incubator-airflow.svg)](https://gitter.im/apache/incubator-airflow?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
 
 _NOTE: The transition from 1.8.0 (or before) to 1.8.1 (or after) requires 
uninstalling Airflow before installing the new version. The package name was 
changed from `airflow` to `apache-airflow` as of version 1.8.1._


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Adding badge to README.md to show supported Python versions
> ---
>
> Key: AIRFLOW-208
> URL: https://issues.apache.org/jira/browse/AIRFLOW-208
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Maxime Beauchemin
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on issue #2187: [AIRFLOW-1042] Easy Unit Testing with Docker

2018-09-04 Thread GitBox
Fokko commented on issue #2187: [AIRFLOW-1042] Easy Unit Testing with Docker
URL: 
https://github.com/apache/incubator-airflow/pull/2187#issuecomment-418270573
 
 
   @gerardo Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker

2018-09-04 Thread GitBox
Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker
URL: https://github.com/apache/incubator-airflow/pull/2187
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 04b0d7f713..87f0a24cd7 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -141,11 +141,26 @@ We *highly* recommend setting up [Travis 
CI](https://travis-ci.org/) on
 your repo to automate this. It is free for open source projects. If for
 some reason you cannot, you can use the steps below to run tests.
 
-Here are loose guidelines on how to get your environment to run the unit tests.
-We do understand that no one out there can run the full test suite since
-Airflow is meant to connect to virtually any external system and that you most
-likely have only a subset of these in your environment. You should run the
-CoreTests and tests related to things you touched in your PR.
+Unit tests can be run locally using Docker. Running this command:
+
+docker-compose up -d
+
+builds and starts three Docker containers: one for MySQL, one for Postgres,
+and one for Airflow. Once the Docker containers are built and running you can
+then run:
+
+./scripts/docker/unittest/run.sh tests.core:CoreTest
+
+The Airflow container has a volume mapped to the Airflow source directory so
+that any edits made to source files are reflected in the container. You can
+make edits and then run tests specific to the area you're working on.
+
+If you want to run unit tests without Docker, here are loose guidelines on
+how to get your environment to run the unit tests. We do understand that no
+one out there can run the full test suite since Airflow is meant to connect
+to virtually any external system and that you most likely have only a subset
+of these in your environment. You should run the CoreTests and tests related
+to things you touched in your PR.
 
 To set up a unit test environment, first take a look at `run_unit_tests.sh` and
 understand that your ``AIRFLOW_CONFIG`` points to an alternate config file
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index ecf7f4ebb0..93a4f9fde3 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -70,8 +70,8 @@ smtp_mail_from = airf...@airflow.com
 celery_app_name = airflow.executors.celery_executor
 celeryd_concurrency = 16
 worker_log_server_port = 8793
-broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
-celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
+broker_url = sqla+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow
+celery_result_backend = 
db+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow
 flower_host = 0.0.0.0
 flower_port = 
 default_queue = default
diff --git a/airflow/configuration.py b/airflow/configuration.py
index f140be2bc1..9ddaf5b4c1 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -318,6 +318,11 @@ def mkdir_p(path):
 else:
 AIRFLOW_CONFIG = expand_env_var(os.environ['AIRFLOW_CONFIG'])
 
+if 'AIRFLOW_MYSQL_HOST' not in os.environ:
+AIRFLOW_MYSQL_HOST = 'localhost'
+else:
+AIRFLOW_MYSQL_HOST = expand_env_var(os.environ['AIRFLOW_MYSQL_HOST'])
+
 # Set up dags folder for unit tests
 # this directory won't exist if users install via pip
 _TEST_DAGS_FOLDER = os.path.join(
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 618e00200b..4ca59e704f 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -27,6 +27,7 @@
 
 from airflow import settings
 
+
 def provide_session(func):
 """
 Function decorator that provides a session if it isn't provided.
@@ -94,6 +95,21 @@ def checkout(dbapi_connection, connection_record, 
connection_proxy):
 )
 
 
+def get_mysql_host(default='localhost'):
+return default if 'AIRFLOW_MYSQL_HOST' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_HOST']
+
+
+def get_mysql_login(default='root'):
+return default if 'AIRFLOW_MYSQL_USER' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_USER']
+
+
+def get_mysql_password(default=None):
+return default if 'AIRFLOW_MYSQL_PASSWORD' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_PASSWORD']
+
+
 def initdb():
 session = settings.Session()
 
@@ -103,12 +119,13 @@ def initdb():
 merge_conn(
 models.Connection(
 conn_id='airflow_db', conn_type='mysql',
-host='localhost', login='root', password='',
+login=get_mysql_login(), host=get_mysql_host(), 
password=get_mysql_password(),
 schema='airflow'))
 merge_conn(
 models.Connection(
 conn_id='airflow_ci', conn_type='mysql',
-host='loca

[jira] [Commented] (AIRFLOW-1042) Easy Unit Testing with Docker

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602690#comment-16602690
 ] 

ASF GitHub Bot commented on AIRFLOW-1042:
-

Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker
URL: https://github.com/apache/incubator-airflow/pull/2187
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 04b0d7f713..87f0a24cd7 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -141,11 +141,26 @@ We *highly* recommend setting up [Travis 
CI](https://travis-ci.org/) on
 your repo to automate this. It is free for open source projects. If for
 some reason you cannot, you can use the steps below to run tests.
 
-Here are loose guidelines on how to get your environment to run the unit tests.
-We do understand that no one out there can run the full test suite since
-Airflow is meant to connect to virtually any external system and that you most
-likely have only a subset of these in your environment. You should run the
-CoreTests and tests related to things you touched in your PR.
+Unit tests can be run locally using Docker. Running this command:
+
+docker-compose up -d
+
+builds and starts three Docker containers: one for MySQL, one for Postgres,
+and one for Airflow. Once the Docker containers are built and running you can
+then run:
+
+./scripts/docker/unittest/run.sh tests.core:CoreTest
+
+The Airflow container has a volume mapped to the Airflow source directory so
+that any edits made to source files are reflected in the container. You can
+make edits and then run tests specific to the area you're working on.
+
+If you want to run unit tests without Docker, here are loose guidelines on
+how to get your environment to run the unit tests. We do understand that no
+one out there can run the full test suite since Airflow is meant to connect
+to virtually any external system and that you most likely have only a subset
+of these in your environment. You should run the CoreTests and tests related
+to things you touched in your PR.
 
 To set up a unit test environment, first take a look at `run_unit_tests.sh` and
 understand that your ``AIRFLOW_CONFIG`` points to an alternate config file
diff --git a/airflow/config_templates/default_test.cfg 
b/airflow/config_templates/default_test.cfg
index ecf7f4ebb0..93a4f9fde3 100644
--- a/airflow/config_templates/default_test.cfg
+++ b/airflow/config_templates/default_test.cfg
@@ -70,8 +70,8 @@ smtp_mail_from = airf...@airflow.com
 celery_app_name = airflow.executors.celery_executor
 celeryd_concurrency = 16
 worker_log_server_port = 8793
-broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
-celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
+broker_url = sqla+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow
+celery_result_backend = 
db+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow
 flower_host = 0.0.0.0
 flower_port = 
 default_queue = default
diff --git a/airflow/configuration.py b/airflow/configuration.py
index f140be2bc1..9ddaf5b4c1 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -318,6 +318,11 @@ def mkdir_p(path):
 else:
 AIRFLOW_CONFIG = expand_env_var(os.environ['AIRFLOW_CONFIG'])
 
+if 'AIRFLOW_MYSQL_HOST' not in os.environ:
+AIRFLOW_MYSQL_HOST = 'localhost'
+else:
+AIRFLOW_MYSQL_HOST = expand_env_var(os.environ['AIRFLOW_MYSQL_HOST'])
+
 # Set up dags folder for unit tests
 # this directory won't exist if users install via pip
 _TEST_DAGS_FOLDER = os.path.join(
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index 618e00200b..4ca59e704f 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -27,6 +27,7 @@
 
 from airflow import settings
 
+
 def provide_session(func):
 """
 Function decorator that provides a session if it isn't provided.
@@ -94,6 +95,21 @@ def checkout(dbapi_connection, connection_record, 
connection_proxy):
 )
 
 
+def get_mysql_host(default='localhost'):
+return default if 'AIRFLOW_MYSQL_HOST' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_HOST']
+
+
+def get_mysql_login(default='root'):
+return default if 'AIRFLOW_MYSQL_USER' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_USER']
+
+
+def get_mysql_password(default=None):
+return default if 'AIRFLOW_MYSQL_PASSWORD' not in os.environ \
+else os.environ['AIRFLOW_MYSQL_PASSWORD']
+
+
 def initdb():
 session = settings.Session()
 
@@ -103,12 +119,13 @@ def initdb():
 merge_conn(
 models.Connection(
 conn_id='airflow_db', conn_type='mysql',
-host='localhost', login='root


[GitHub] chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support 
for Bigquery clustered tables
URL: https://github.com/apache/incubator-airflow/pull/3838#discussion_r214822929
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -943,6 +962,14 @@ def run_load(self,
 'timePartitioning': time_partitioning
 })
 
+if cluster_fields:
 
 Review comment:
   After rebasing this morning, the changes made by AIRFLOW-491 (#3733) mean that 
the logic for `run_query` and `run_load` has diverged somewhat, so factoring it out 
probably now makes less sense.
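
   For context, the `run_load` branch under discussion has roughly this shape (a 
sketch assumed from the surrounding diff, not the merged code; `configuration` and 
`cluster_fields` stand in for the hook's local variables of the same names):

   ```python
   # Assumed shape of the block being reviewed: attach a clustering spec to the
   # load configuration only when cluster_fields is provided.
   configuration = {'load': {}}
   cluster_fields = ['customer_id', 'event_date']

   if cluster_fields:
       configuration['load'].update({
           'clustering': {'fields': cluster_fields}
       })

   print(configuration)
   ```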


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support 
for Bigquery clustered tables
URL: https://github.com/apache/incubator-airflow/pull/3838#discussion_r214822929
 
 

 ##
 File path: airflow/contrib/hooks/bigquery_hook.py
 ##
 @@ -943,6 +962,14 @@ def run_load(self,
 'timePartitioning': time_partitioning
 })
 
+if cluster_fields:
 
 Review comment:
   After rebasing this morning, the changes made by AIRFLOW-491 mean that the logic 
for `run_query` and `run_load` has diverged somewhat, so factoring it out probably 
now makes less sense.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Resolved] (AIRFLOW-208) Adding badge to README.md to show supported Python versions

2018-09-04 Thread Kaxil Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kaxil Naik resolved AIRFLOW-208.

Resolution: Fixed

Resolved by https://github.com/apache/incubator-airflow/pull/3839

> Adding badge to README.md to show supported Python versions
> ---
>
> Key: AIRFLOW-208
> URL: https://issues.apache.org/jira/browse/AIRFLOW-208
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Maxime Beauchemin
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3001) accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)
Jason Kim created AIRFLOW-3001:
--

 Summary: accumulative tis slow allocation of new schedule
 Key: AIRFLOW-3001
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 1.10.0
Reporter: Jason Kim


I have created a very long-term schedule with a short interval (2~3 years at a 
10-minute interval).

So the DAG grows bigger and bigger as scheduling goes on.

Finally, at some critical point (I don't know exactly when it is), the allocation of 
new task_instances gets slow and then almost stops.

I found that at this point many slow query logs had occurred (I was using MySQL as 
the metadata repository), with queries like this:

"SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"

I could resolve this issue by adding a new index consisting of dag_id and 
execution_date.

So I would like the 1.10 branch to be modified to create the task_instance table 
with this index.

Thanks.
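
For reference, the index described above corresponds to a migration along these 
lines (an Alembic sketch matching the `ti_dag_date` index proposed in PR #3840 later 
in this thread, not the exact merged code):

{code:python}
from alembic import op


def upgrade():
    # Composite index so lookups by (dag_id, execution_date) stop scanning
    # the ever-growing task_instance table.
    op.create_index(
        'ti_dag_date',
        'task_instance',
        ['dag_id', 'execution_date'],
        unique=False
    )
{code}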



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Summary: Accumulative tis slow allocation of new schedule  (was: 
accumulative tis slow allocation of new schedule)

> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag would be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"
> I could resolve this issue by adding new index consist of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Description: 
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"

I could resolve this issue by adding new index consist of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.

  was:
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag would be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"

I could resolve this issue by adding new index consist of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"
> I could resolve this issue by adding new index consist of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Description: 
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"

I could resolve this issue by adding new index consists of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.

  was:
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"

I could resolve this issue by adding new index consist of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"
> I could resolve this issue by adding new index consists of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Description: 
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"

I could resolve this issue by adding new index consist of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.

  was:
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'"

I could resolve this issue by adding new index consist of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"
> I could resolve this issue by adding new index consist of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'

2018-09-04 Thread GitBox
ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table 
index 'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim reassigned AIRFLOW-3001:
--

Assignee: Jason Kim

> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"
> I could resolve this issue by adding new index consists of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602762#comment-16602762
 ] 

ASF GitHub Bot commented on AIRFLOW-3001:
-

ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table 
index 'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"
> I could resolve this issue by adding new index consists of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Description: 
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = 
''2018-09-01 00:00:00"

I could resolve this issue by adding new index consists of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.

  was:
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = '' and execution_date = ''"

I could resolve this issue by adding new index consists of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval (2~3 years as 10 min 
> interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date 
> = ''2018-09-01 00:00:00"
> I could resolve this issue by adding new index consists of dag_id and 
> execution_date
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread Jason Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Kim updated AIRFLOW-3001:
---
Description: 
I have created very long term schedule in short interval. (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = 
''2018-09-01 00:00:00"

I could resolve this issue by adding new index consists of dag_id and 
execution_date.

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.

  was:
I have created very long term schedule in short interval (2~3 years as 10 min 
interval)

So, dag could be bigger and bigger as scheduling goes on.

Finally, at critical point (I don't know exactly when it is), the allocation of 
new task_instances get slow and then almost stop.

I found that in this point, many slow query logs had occurred. (I was using 
mysql as meta repository)

queries like this

"SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = 
''2018-09-01 00:00:00"

I could resolve this issue by adding new index consists of dag_id and 
execution_date

So, I wanted 1.10 branch to be modified to create task_instance table with the 
index.

Thanks.


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
>
> I have created very long term schedule in short interval. (2~3 years as 10 
> min interval)
> So, dag could be bigger and bigger as scheduling goes on.
> Finally, at critical point (I don't know exactly when it is), the allocation 
> of new task_instances get slow and then almost stop.
> I found that in this point, many slow query logs had occurred. (I was using 
> mysql as meta repository)
> queries like this
> "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date 
> = ''2018-09-01 00:00:00"
> I could resolve this issue by adding new index consists of dag_id and 
> execution_date.
> So, I wanted 1.10 branch to be modified to create task_instance table with 
> the index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'

2018-09-04 Thread GitBox
ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table 
index 'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'

2018-09-04 Thread GitBox
ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 
'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/migrations/versions/e3a246e0dc1_current_schema.py 
b/airflow/migrations/versions/e3a246e0dc1_current_schema.py
index 6c63d0a9dd..22624a4c8d 100644
--- a/airflow/migrations/versions/e3a246e0dc1_current_schema.py
+++ b/airflow/migrations/versions/e3a246e0dc1_current_schema.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -176,6 +176,12 @@ def upgrade():
 ['dag_id', 'state'],
 unique=False
 )
+op.create_index(
+'ti_dag_date',
+'task_instance',
+['dag_id', 'execution_date'],
+unique=False
+)
 op.create_index(
 'ti_pool',
 'task_instance',
@@ -269,6 +275,7 @@ def downgrade():
 op.drop_index('ti_state_lkp', table_name='task_instance')
 op.drop_index('ti_pool', table_name='task_instance')
 op.drop_index('ti_dag_state', table_name='task_instance')
+op.drop_index('ti_dag_date', table_name='task_instance')
 op.drop_table('task_instance')
 op.drop_table('slot_pool')
 op.drop_table('sla_miss')
diff --git a/airflow/models.py b/airflow/models.py
index 2096785b41..c41f2a9dbe 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -880,6 +880,7 @@ class TaskInstance(Base, LoggingMixin):
 
 __table_args__ = (
 Index('ti_dag_state', dag_id, state),
+Index('ti_dag_date', dag_id, execution_date),
 Index('ti_state', state),
 Index('ti_state_lkp', dag_id, task_id, execution_date, state),
 Index('ti_pool', pool, state, priority_weight),


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602779#comment-16602779
 ] 

ASF GitHub Bot commented on AIRFLOW-3001:
-

ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table 
index 'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
>
> I have created a very long-term schedule at a short interval (2~3 years at a 
> 10-minute interval).
> So the DAG keeps growing as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation 
> of new task_instances gets slow and then almost stops.
> I found that at this point many slow query logs had occurred (I was using 
> MySQL as the meta repository), with queries like this:
> "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date 
> = '2018-09-01 00:00:00'"
> I could resolve this issue by adding a new index consisting of dag_id and 
> execution_date.
> So I would like the 1.10 branch to be modified to create the task_instance 
> table with this index.
> Thanks.
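
For operators of existing deployments, one way to apply the same index to an already-initialised metadata database is a small standalone Alembic migration rather than editing the initial schema migration. A hedged sketch follows; the revision identifiers are placeholders and this is not the patch in the PR above:

{code:java}
# Hedged sketch: adds the ti_dag_date index to an existing metadata database.
# Revision identifiers below are placeholders, not real Airflow revisions.
from alembic import op

revision = 'add_ti_dag_date_idx'
down_revision = None  # set to the current head of your migration chain
branch_labels = None
depends_on = None


def upgrade():
    # Equivalent SQL: CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)
    op.create_index(
        'ti_dag_date',
        'task_instance',
        ['dag_id', 'execution_date'],
        unique=False
    )


def downgrade():
    op.drop_index('ti_dag_date', table_name='task_instance')
{code}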



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602778#comment-16602778
 ] 

ASF GitHub Bot commented on AIRFLOW-3001:
-

ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 
'ti_dag_date'
URL: https://github.com/apache/incubator-airflow/pull/3840
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/migrations/versions/e3a246e0dc1_current_schema.py 
b/airflow/migrations/versions/e3a246e0dc1_current_schema.py
index 6c63d0a9dd..22624a4c8d 100644
--- a/airflow/migrations/versions/e3a246e0dc1_current_schema.py
+++ b/airflow/migrations/versions/e3a246e0dc1_current_schema.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-# 
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -176,6 +176,12 @@ def upgrade():
 ['dag_id', 'state'],
 unique=False
 )
+op.create_index(
+'ti_dag_date',
+'task_instance',
+['dag_id', 'execution_date'],
+unique=False
+)
 op.create_index(
 'ti_pool',
 'task_instance',
@@ -269,6 +275,7 @@ def downgrade():
 op.drop_index('ti_state_lkp', table_name='task_instance')
 op.drop_index('ti_pool', table_name='task_instance')
 op.drop_index('ti_dag_state', table_name='task_instance')
+op.drop_index('ti_dag_date', table_name='task_instance')
 op.drop_table('task_instance')
 op.drop_table('slot_pool')
 op.drop_table('sla_miss')
diff --git a/airflow/models.py b/airflow/models.py
index 2096785b41..c41f2a9dbe 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -880,6 +880,7 @@ class TaskInstance(Base, LoggingMixin):
 
 __table_args__ = (
 Index('ti_dag_state', dag_id, state),
+Index('ti_dag_date', dag_id, execution_date),
 Index('ti_state', state),
 Index('ti_state_lkp', dag_id, task_id, execution_date, state),
 Index('ti_pool', pool, state, priority_weight),


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Accumulative tis slow allocation of new schedule
> 
>
> Key: AIRFLOW-3001
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3001
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 1.10.0
>Reporter: Jason Kim
>Assignee: Jason Kim
>Priority: Major
>
> I have created a very long-term schedule at a short interval (2~3 years at a 
> 10-minute interval).
> So the DAG keeps growing as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation 
> of new task_instances gets slow and then almost stops.
> I found that at this point many slow query logs had occurred (I was using 
> MySQL as the meta repository), with queries like this:
> "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date 
> = '2018-09-01 00:00:00'"
> I could resolve this issue by adding a new index consisting of dag_id and 
> execution_date.
> So I would like the 1.10 branch to be modified to create the task_instance 
> table with this index.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery 
clustered tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418294239
 
 
   @kaxil I've addressed your comments wrt better docstrings and correct 
indentation.
   
   Rebasing after #3733 has resulted in some quite large changes to the 
implementation in `bigquery_hook.py`; since the logic is now rather different 
in `run_query` and `run_load`, it's less obvious that there is an easy piece of 
common code to factor out.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery 
clustered tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418050587
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=h1)
 Report
   > Merging 
[#3838](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3838/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3838   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=footer).
 Last update 
[da052ff...8e3325a](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery 
clustered tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418050587
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=h1)
 Report
   > Merging 
[#3838](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3838/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3838   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=footer).
 Last update 
[da052ff...8e3325a](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Closed] (AIRFLOW-3000) Allow to print into the log from operators (ability to Base Operator)

2018-09-04 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-3000.
--
Resolution: Won't Do

The example given in the Stack Overflow answer (of sub-classing the Operator) 
is one way to do this.

However this is an anti-pattern: this is going to get printed _every time 
Airflow parses the DAG_ - not just when it runs, but every time Airflow goes 
around its parsing loop. This is going to be noisy and create possibly many GB 
of logs per day.
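
For reference, a minimal sketch of the run-time-logging alternative the answer points at (the operator class, DAG id and schedule below are made up for illustration): anything emitted through self.log inside execute() lands only in the log of the task instance that runs, whereas a module-level print() executes on every parse of the DAG file.

{code:java}
# Minimal sketch, assuming Airflow 1.10-style imports; names are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class AnnouncingOperator(BaseOperator):
    """Hypothetical operator that logs a per-task message at run time."""

    @apply_defaults
    def __init__(self, *args, **kwargs):
        super(AnnouncingOperator, self).__init__(*args, **kwargs)

    def execute(self, context):
        # self.log comes from LoggingMixin; this line appears only in the log
        # of the task instance that actually runs.
        self.log.info("My name is %s", self.task_id)


with DAG(dag_id='example_logging', start_date=datetime(2018, 9, 1),
         schedule_interval=None) as dag:
    for i in range(5, 0, -1):
        AnnouncingOperator(task_id='load_ads_to_BigQuery-{}'.format(i))
{code}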

> Allow to print into the log from operators (ability to Base Operator)
> -
>
> Key: AIRFLOW-3000
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3000
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Minor
>
> As described on stack overflow: 
> [https://stackoverflow.com/questions/52144108/how-to-print-a-unique-message-in-airflow-operator]
>  
> Any print in the code will be shown in the log file. However, it is a problem 
> when creating operators dynamically.
>  
> assume this code:
>  
> {code:java}
> for i in range(5, 0, -1):
>  print("My name is load_ads_to_BigQuery-{}".format(i))
>  update_bigquery = GoogleCloudStorageToBigQueryOperator(
> task_id='load_ads_to_BigQuery-{}'.format(i), …){code}
>  
> This creates 5 operators.
> The print will be executed 5 times for each operator, meaning that if you go 
> to the log of 
> {code:java}
> load_ads_to_BigQuery-1 {code}
> you will see:
>  
>  
> {code:java}
> My name is load_ads_to_BigQuery-1
> My name is load_ads_to_BigQuery-2
> My name is load_ads_to_BigQuery-3
> My name is load_ads_to_BigQuery-4
> My name is load_ads_to_BigQuery-5
> {code}
>  
>  
> This is a problem because it logs messages of the other operators.
>  
> Each operator is unique only within the operator itself, meaning that the 
> print should be inside the operator, as in:
> for i in range(5, 0, -1):
>  
>  
> {code:java}
> update_bigquery = GoogleCloudStorageToBigQueryOperator(
> task_id='load_ads_to_BigQuery-{}'.format(i), print("My name is 
> load_ads_to_BigQuery-{}".format(i)), …){code}
>  
> or something like it. However, Airflow does not support printing inside 
> operators; it's not one of the allowed arguments.
> Add an optional parameter called 
> {code:java}
> msg_log {code}
> that, if assigned a value, will print the value to the log when the 
> operator is executed.
>  
> Please add an argument on the Base Operator for printing and extend it as an 
> optional ability to all operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] isknight commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-09-04 Thread GitBox
isknight commented on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418303178
 
 
   Apologies @Fokko I was offline for a bit. For some odd reason my git client 
is erroring out when I try to interactively rebase and squash my commits. I'll 
look into it some more tomorrow. Otherwise, if @andrewmchen (and others) are ok 
with my changes, perhaps it would be simpler to make a new PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] isknight edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-09-04 Thread GitBox
isknight edited a comment on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418303178
 
 
   Apologies @Fokko I was offline for a bit. For some odd reason my git client 
is erroring out when I try to interactively rebase and squash my commits. I'll 
look into it some more tomorrow. Otherwise, if @andrewmchen (and others) are ok 
with my changes, perhaps it would be simpler for me to make a new PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb opened a new pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira

2018-09-04 Thread GitBox
ashb opened a new pull request #3841: [AIRFLOW-XXX] Fix python3 errors in 
dev/airflow-jira
URL: https://github.com/apache/incubator-airflow/pull/3841
 
 
   This is a script that checks if the Jiras marked as fixed in a release
   are actually merged in - getting this working is helpful to me in
   preparing 1.10.1
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes: fix the `airflow-jira compare 1.10.1` script to make building the 
point release easier :)
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3841: [AIRFLOW-XXX] Fix python3 errors in 
dev/airflow-jira
URL: 
https://github.com/apache/incubator-airflow/pull/3841#issuecomment-418314621
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=h1)
 Report
   > Merging 
[#3841](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3841/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3841   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=footer).
 Last update 
[da052ff...0a663ed](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira

2018-09-04 Thread GitBox
codecov-io commented on issue #3841: [AIRFLOW-XXX] Fix python3 errors in 
dev/airflow-jira
URL: 
https://github.com/apache/incubator-airflow/pull/3841#issuecomment-418314621
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=h1)
 Report
   > Merging 
[#3841](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3841/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3841   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=footer).
 Last update 
[da052ff...0a663ed](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wmorris75 opened a new pull request #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)

2018-09-04 Thread GitBox
wmorris75 opened a new pull request #3842: [AIRFLOW-2993] s3_to_sftp and 
sftp_to_s3 operators (#3828)
URL: https://github.com/apache/incubator-airflow/pull/3842
 
 
   add 8fit to list of companies
   
   [AIRFLOW-XXX] Add THE ICONIC to the list of orgs using Airflow
   
   Closes #3807 from ksaagariconic/patch-2
   
   [AIRFLOW-2933] Enable Codecov on Docker-CI Build (#3780)
   
   - Add missing variables and use codecov instead of coveralls.
 The reason it wasn't working was missing environment variables.
 The codecov library heavily depends on the environment variables in
 the CI to determine how to push the reports to codecov.
   
   - Remove the explicit passing of the variables in the `tox.ini`
 since it is already done in the `docker-compose.yml`,
 having to maintain this at two places makes it brittle.
   
   - Removed the empty Codecov yml since codecov was complaining that
 it was unable to parse it
   
   [AIRFLOW-2960] Pin boto3 to <1.8 (#3810)
   
   Boto 1.8 was released a few days ago and it breaks our tests.
   
   [AIRFLOW-2957] Remove obsolete sensor references
   
   [AIRFLOW-2959] Refine HTTPSensor doc (#3809)
   
   HTTP Error code other than 404,
   or Connection Refused, would fail the sensor
   itself directly (no more poking).
   
   [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test 
(#3811)
   
   Simplify this test since it takes up 15% of all the time. This is because
   every example dag, with some exclusions, is backfilled. This will put some
   pressure on the scheduler and everything. If the test just covers a couple
   of dags, that should be sufficient.
   
   254 seconds:
   [success] 15.03% tests.BackfillJobTest.test_backfill_examples: 254.9323s
   
   [AIRFLOW-XXX] Remove residual line in Changelog (#3814)
   
   [AIRFLOW-2930] Fix celery executor scheduler crash (#3784)
   
   Caused by an update in PR #3740.
   execute_command.apply_async(args=command, ...)
   -command is a list of short unicode strings and the above code passes multiple
   arguments to a function defined as taking only one argument.
   -command = ["airflow", "run", "dag323",...]
   -args = command = ["airflow", "run", "dag323", ...]
   -execute_command("airflow", "run", "dag323", ...) will error and exit.
   
   [AIRFLOW-2916] Arg `verify` for AwsHook() & S3 sensors/operators (#3764)
   
   This is useful when
   1. users want to use a different CA cert bundle than the
 one used by botocore.
   2. users want to have '--no-verify-ssl'. This is especially useful
 when we're using on-premises S3 or other implementations of
 object storage, like IBM's Cloud Object Storage.
   
   The default value here is `None`, which is also the default
   value in boto3, so that backward compatibility is ensured too.
   
   Reference:
   https://boto3.readthedocs.io/en/latest/reference/core/session.html
   
   [AIRFLOW-2709] Improve error handling in Databricks hook (#3570)
   
   * Use float for default value
   * Use status code to determine whether an error is retryable
   * Fix wrong type in assertion
   * Fix style to prevent lines from exceeding 90 characters
   * Fix wrong way of checking exception type
   
   [AIRFLOW-2854] kubernetes_pod_operator add more configuration items (#3697)
   
   * kubernetes_pod_operator add more configuration items
   * fix test_kubernetes_pod_operator test_faulty_service_account failure case
   * fix review comment issues
   * pod_operator add hostnetwork config
   * add doc example
   
   [AIRFLOW-2994] Fix command status check in Qubole Check operator (#3790)
   
   [AIRFLOW-2928] Use uuid4 instead of uuid1 (#3779)
   
   for better randomness.
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828)
   
   [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828)
   
   [AIRFLOW-2949] Add syntax highlight for single quote strings (#3795)
   
   * AIRFLOW-2949: Add syntax highlight for single quote strings
   
   * AIRFLOW-2949: Also updated new UI main.css
   
   [AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (#3793)
   
   There may be different combinations of arguments, and
   some processing is being done 'silently', while users
   may not be fully aware of it.
   
   For example
   - User only needs to provide either `ssh_hook`
 or `ssh_conn_id`, while this is not clear in the doc
   - if both are provided, `ssh_conn_id` will be ignored.
   - if `remote_host` is provided, it will replace
 the `remote_host` which was defined in `ssh_hook`
 or predefined in the connection of `ssh_conn_id`
   
   These should be documented clearly 

[jira] [Commented] (AIRFLOW-2993) Addition of S3_to_SFTP and SFTP_to_S3 Operators

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603047#comment-16603047
 ] 

ASF GitHub Bot commented on AIRFLOW-2993:
-

wmorris75 opened a new pull request #3842: [AIRFLOW-2993] s3_to_sftp and 
sftp_to_s3 operators (#3828)
URL: https://github.com/apache/incubator-airflow/pull/3842
 
 
   add 8fit to list of companies
   
   [AIRFLOW-XXX] Add THE ICONIC to the list of orgs using Airflow
   
   Closes #3807 from ksaagariconic/patch-2
   
   [AIRFLOW-2933] Enable Codecov on Docker-CI Build (#3780)
   
   - Add missing variables and use codecov instead of coveralls.
 The reason it wasn't working was missing environment variables.
 The codecov library heavily depends on the environment variables in
 the CI to determine how to push the reports to codecov.
   
   - Remove the explicit passing of the variables in the `tox.ini`
 since it is already done in the `docker-compose.yml`,
 having to maintain this at two places makes it brittle.
   
   - Removed the empty Codecov yml since codecov was complaining that
 it was unable to parse it
   
   [AIRFLOW-2960] Pin boto3 to <1.8 (#3810)
   
   Boto 1.8 was released a few days ago and it breaks our tests.
   
   [AIRFLOW-2957] Remove obsolete sensor references
   
   [AIRFLOW-2959] Refine HTTPSensor doc (#3809)
   
   HTTP Error code other than 404,
   or Connection Refused, would fail the sensor
   itself directly (no more poking).
   
   [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test 
(#3811)
   
   Simplify this test since it takes up 15% of all the time. This is because
   every example dag, with some exclusions, is backfilled. This will put some
   pressure on the scheduler and everything. If the test just covers a couple
   of dags, that should be sufficient.
   
   254 seconds:
   [success] 15.03% tests.BackfillJobTest.test_backfill_examples: 254.9323s
   
   [AIRFLOW-XXX] Remove residual line in Changelog (#3814)
   
   [AIRFLOW-2930] Fix celery executor scheduler crash (#3784)
   
   Caused by an update in PR #3740.
   execute_command.apply_async(args=command, ...)
   -command is a list of short unicode strings and the above code passes multiple
   arguments to a function defined as taking only one argument.
   -command = ["airflow", "run", "dag323",...]
   -args = command = ["airflow", "run", "dag323", ...]
   -execute_command("airflow", "run", "dag323", ...) will error and exit.
   
   [AIRFLOW-2916] Arg `verify` for AwsHook() & S3 sensors/operators (#3764)
   
   This is useful when
   1. users want to use a different CA cert bundle than the
 one used by botocore.
   2. users want to have '--no-verify-ssl'. This is especially useful
 when we're using on-premises S3 or other implementations of
 object storage, like IBM's Cloud Object Storage.
   
   The default value here is `None`, which is also the default
   value in boto3, so that backward compatibility is ensured too.
   
   Reference:
   https://boto3.readthedocs.io/en/latest/reference/core/session.html
   
   [AIRFLOW-2709] Improve error handling in Databricks hook (#3570)
   
   * Use float for default value
   * Use status code to determine whether an error is retryable
   * Fix wrong type in assertion
   * Fix style to prevent lines from exceeding 90 characters
   * Fix wrong way of checking exception type
   
   [AIRFLOW-2854] kubernetes_pod_operator add more configuration items (#3697)
   
   * kubernetes_pod_operator add more configuration items
   * fix test_kubernetes_pod_operator test_faulty_service_account failure case
   * fix review comment issues
   * pod_operator add hostnetwork config
   * add doc example
   
   [AIRFLOW-2994] Fix command status check in Qubole Check operator (#3790)
   
   [AIRFLOW-2928] Use uuid4 instead of uuid1 (#3779)
   
   for better randomness.
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
   
   [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828)
   
   [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828)
   
   [AIRFLOW-2949] Add syntax highlight for single quote strings (#3795)
   
   * AIRFLOW-2949: Add syntax highlight for single quote strings
   
   * AIRFLOW-2949: Also updated new UI main.css
   
   [AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (#3793)
   
   There may be different combinations of arguments, and
   some processing is being done 'silently', while users
   may not be fully aware of it.
   
   For example
   - User only needs to provide either `ssh_hook`
 or `ssh_conn_id`, while this is not clear 

[GitHub] gauthiermartin commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose

2018-09-04 Thread GitBox
gauthiermartin commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + 
docker-compose
URL: 
https://github.com/apache/incubator-airflow/pull/3797#issuecomment-418388832
 
 
   @dimberman Currently I'm having an issue while running ./docker/build.sh 
locally. There still seems to be an issue with 
SLUGIFY_USES_TEXT_UNIDECODE=yes when running the script locally. I know you 
have added that env var in the travis-ci.yml file, but it is also required when 
running the script locally. Should we export it in the build.sh file?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] dalupus commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-09-04 Thread GitBox
dalupus commented on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418395125
 
 
   lgtm


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214944518
 
 

 ##
 File path: airflow/contrib/operators/s3_delete_objects_operator.py
 ##
 @@ -0,0 +1,92 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3DeleteObjectsOperator(BaseOperator):
+"""
+To enable users to delete a single object or multiple objects from
+a bucket using a single HTTP request.
+
+Users may specify up to 1000 keys to delete.
+
+:param bucket: Name of the bucket in which you are going to delete 
object(s)
+:type bucket: str
+:param keys: The key(s) to delete from S3 bucket.
+
+When ``keys`` is a string, it's supposed to be the key name of
+the single object to delete.
+
+When ``keys`` is a list, it's supposed to be the list of the
+keys to delete.
+:type keys: str or list
+:param s3_conn_id: Connection id of the S3 connection to use
+:type s3_conn_id: str
+:param verify: Whether or not to verify SSL certificates for S3 connection.
+By default SSL certificates are verified.
+
+You can provide the following values:
+
+- False: do not validate SSL certificates. SSL will still be used,
+ but SSL certificates will not be
+ verified.
+- path/to/cert/bundle.pem: A filename of the CA cert bundle to use.
+ You can specify this argument if you want to use a different
+ CA cert bundle than the one used by botocore.
+:type verify: bool or str
+:param silent_on_errors: If set to `True`, ignore `Errors` in the boto3 
response.
+Default value is `False`.
+
+`Errors` here arise for reasons like users trying to delete 
non-existent
+objects. They don't necessarily indicate an exception (the request sent 
to S3
+itself succeeds).
+:type silent_on_errors: bool
+"""
+
+@apply_defaults
+def __init__(
+self,
+bucket,
+keys,
+s3_conn_id='aws_default',
+verify=None,
+silent_on_errors=False,
+*args, **kwargs):
+super(S3DeleteObjectsOperator, self).__init__(*args, **kwargs)
+self.bucket = bucket
+self.keys = keys
+self.s3_conn_id = s3_conn_id
+self.verify = verify
+self.silent_on_errors = silent_on_errors
+
+def execute(self, context):
+s3_hook = S3Hook(aws_conn_id=self.s3_conn_id, verify=self.verify)
+
+response = s3_hook._delete_objects(bucket=self.bucket, keys=self.keys)
+
+deleted_keys = [x['Key'] for x in response.get("Deleted", [])]
+self.log.info("Deleted: {}".format(deleted_keys))
+
+if not self.silent_on_errors and "Errors" in response:
 
 Review comment:
   Regarding the `Errors` in the boto3 response, I use the argument 
`silent_on_errors` to let users decide whether they consider `deleting a 
non-existent object` an exception and whether the operator should fail.
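
To make the flag concrete, here is a hedged usage sketch based on the signature shown in the diff above (the bucket name, keys and DAG id are invented, and the argument names may still change during review):

```python
# Usage sketch only; assumes the operator as proposed in this PR.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.s3_delete_objects_operator import S3DeleteObjectsOperator

with DAG(dag_id='example_s3_cleanup', start_date=datetime(2018, 9, 1),
         schedule_interval=None) as dag:
    # silent_on_errors=True ignores per-key "Errors" entries in the boto3
    # response (e.g. a key that no longer exists) instead of failing the task.
    cleanup = S3DeleteObjectsOperator(
        task_id='delete_stale_keys',
        bucket='my-bucket',
        keys=['path/data0', 'path/data1'],
        s3_conn_id='aws_default',
        silent_on_errors=True,
    )
```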


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214945536
 
 

 ##
 File path: airflow/hooks/S3_hook.py
 ##
 @@ -384,3 +384,89 @@ def load_bytes(self,
 
 client = self.get_conn()
 client.upload_fileobj(filelike_buffer, bucket_name, key, 
ExtraArgs=extra_args)
+
+def _copy_object(self,
 
 Review comment:
   I name the methods in `S3Hook()` as `_copy_object`, in order to avoid 
confusion between `boto3.client.copy_object` and `S3Hook.copy_object`.
   
   The same applies to  `_delete_objects`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214945861
 
 

 ##
 File path: tests/contrib/operators/test_s3_delete_objects_operator.py
 ##
 @@ -0,0 +1,111 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import io
+import unittest
+
+import boto3
+from moto import mock_s3
+
+from airflow.contrib.operators.s3_delete_objects_operator import 
S3DeleteObjectsOperator
+from airflow.exceptions import AirflowException
+
+
+class TestS3DeleteObjectsOperator(unittest.TestCase):
+
+@mock_s3
+def test_s3_delete_single_object(self):
+bucket = "testbucket"
+key = "path/data.txt"
+
+conn = boto3.client('s3')
+conn.create_bucket(Bucket=bucket)
+conn.upload_fileobj(Bucket=bucket,
+Key=key,
+Fileobj=io.BytesIO(b"input"))
+
+# The object should be detected before the DELETE action is taken
+objects_in_dest_bucket = conn.list_objects(Bucket=bucket,
+   Prefix=key)
+self.assertEqual(len(objects_in_dest_bucket['Contents']), 1)
+self.assertEqual(objects_in_dest_bucket['Contents'][0]['Key'], key)
+
+t = 
S3DeleteObjectsOperator(task_id="test_task_s3_delete_single_object",
+bucket=bucket,
+keys=key)
+t.execute(None)
+
+# There should be no object found in the bucket created earlier
+self.assertFalse('Contents' in conn.list_objects(Bucket=bucket,
+ Prefix=key))
+
+@mock_s3
+def test_s3_delete_multiple_objects(self):
+bucket = "testbucket"
+key_pattern = "path/data"
+n_keys = 3
+keys = [key_pattern + str(i) for i in range(n_keys)]
+
+conn = boto3.client('s3')
+conn.create_bucket(Bucket=bucket)
+for k in keys:
+conn.upload_fileobj(Bucket=bucket,
+Key=k,
+Fileobj=io.BytesIO(b"input"))
+
+# The objects should be detected before the DELETE action is taken
+objects_in_dest_bucket = conn.list_objects(Bucket=bucket,
+   Prefix=key_pattern)
+self.assertEqual(len(objects_in_dest_bucket['Contents']), n_keys)
+self.assertEqual(sorted([x['Key'] for x in 
objects_in_dest_bucket['Contents']]),
+ sorted(keys))
+
+t = 
S3DeleteObjectsOperator(task_id="test_task_s3_delete_multiple_objects",
+bucket=bucket,
+keys=keys)
+t.execute(None)
+
+# There should be no object found in the bucket created earlier
+self.assertFalse('Contents' in conn.list_objects(Bucket=bucket,
+ Prefix=key_pattern))
+
+@mock_s3
+def test_s3_delete_non_existent_object(self):
 
 Review comment:
   A test case is added to test the argument `silent_on_errors`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214946897
 
 

 ##
 File path: airflow/contrib/operators/s3_delete_objects_operator.py
 ##
 @@ -0,0 +1,92 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3DeleteObjectsOperator(BaseOperator):
 
 Review comment:
   Personally I still suggest not supporting S3-style URLs in 
`S3DeleteObjectsOperator`. This is to
   - keep the argument combination clear
   - support deleting a single object and deleting multiple objects within one 
operator.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214947531
 
 

 ##
 File path: airflow/hooks/S3_hook.py
 ##
 @@ -384,3 +384,89 @@ def load_bytes(self,
 
 client = self.get_conn()
 client.upload_fileobj(filelike_buffer, bucket_name, key, 
ExtraArgs=extra_args)
+
+def _copy_object(self,
 
 Review comment:
   I wouldn't. A method prefixed with `_` in Python is usually an indication 
that it is "private" and shouldn't be called from outside the class.
   
   Given the S3Hook doesn't directly expose any methods from the boto 
client/session object, I think the chance of confusion is slight. (And a doc 
string highlighting any differences will help dispel confusion.)
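
As a small illustration of the convention being discussed (the class and method names below are invented, not the actual hook):

```python
# Illustration of the leading-underscore convention; names are made up.
class ExampleHook(object):
    def copy_object(self, source_key, dest_key):
        """Public method: callers outside the class are expected to use this.

        The docstring is also the place to note any differences from the
        underlying boto3 call of the same name.
        """
        return self._do_copy(source_key, dest_key)

    def _do_copy(self, source_key, dest_key):
        # Leading underscore marks an internal helper; by convention it is
        # not called from outside the class.
        return (source_key, dest_key)
```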


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object 
copying/deleting
URL: 
https://github.com/apache/incubator-airflow/pull/3823#issuecomment-418398128
 
 
   Hi @ashb , I have addressed your earlier review comments. Could you take 
another look? Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator 
for S3 object copying/deleting
URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214949579
 
 

 ##
 File path: airflow/hooks/S3_hook.py
 ##
 @@ -384,3 +384,89 @@ def load_bytes(self,
 
 client = self.get_conn()
 client.upload_fileobj(filelike_buffer, bucket_name, key, 
ExtraArgs=extra_args)
+
+def _copy_object(self,
 
 Review comment:
   Sure, I will change this part.
   
   Actually the chance of misuse is low given the required arguments are 
totally different.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Assigned] (AIRFLOW-2062) Support fine-grained Connection encryption

2018-09-04 Thread Anonymous (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anonymous reassigned AIRFLOW-2062:
--

Assignee: Jasper Kahn

> Support fine-grained Connection encryption
> --
>
> Key: AIRFLOW-2062
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2062
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Reporter: Wilson Lian
>Assignee: Jasper Kahn
>Priority: Minor
>
> This effort targets containerized tasks (e.g., those launched by 
> KubernetesExecutor). Under that paradigm, each task could potentially operate 
> under different credentials, and fine-grained Connection encryption will 
> enable an administrator to restrict which connections can be accessed by 
> which tasks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object 
copying/deleting
URL: 
https://github.com/apache/incubator-airflow/pull/3823#issuecomment-418412622
 
 
   Hi @ashb , I have addressed the naming of methods within `S3Hook()` (removed 
the leading `_`).
   
   Any other comment?
   
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 
object copying/deleting
URL: 
https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=h1)
 Report
   > Merging 
[#3823](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `90.47%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3823/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=tree)
   
   ```diff
   @@Coverage Diff @@
   ##   master#3823  +/-   ##
   ==
   + Coverage   77.43%   77.45%   +0.01% 
   ==
 Files 203  203  
 Lines   1584615867  +21 
   ==
   + Hits1227112290  +19 
   - Misses   3575 3577   +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=tree) 
| Coverage Δ | |
   |---|---|---|
   | 
[airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3823/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5)
 | `94.32% <90.47%> (-0.68%)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=footer).
 Last update 
[da052ff...3144472](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk

2018-09-04 Thread jack (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603239#comment-16603239
 ] 

jack commented on AIRFLOW-2999:
---

[~XD-DENG] seems like your territory if you would like to take it :) 

> S3_hook  - add the ability to download file to local disk
> -
>
> Key: AIRFLOW-2999
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2999
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> The [S3_hook 
> |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177]
>  has a get_key method that returns a boto3.s3.Object; it also has a load_file 
> method which loads a file from the local file system to S3.
>  
> What it doesn't have is a method to download a file from S3 to the local file 
> system.
> Basically it should be something very simple... an extension to the get_key 
> method with a parameter for the destination on the local file system, adding 
> code that takes the boto3.s3.Object and saves it to disk.  Note that it can be 
> more than one file if the user chooses a folder in S3.
>  
>  
>  
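
A minimal sketch of what such a method might look like, reusing the existing get_key and the boto3 Object.download_file call (the method name download_file and the subclass are assumptions for illustration, not part of the current hook, and this covers the single-object case only):

{code:java}
# Hypothetical extension sketch; not part of the current S3Hook.
from airflow.hooks.S3_hook import S3Hook


class S3HookWithDownload(S3Hook):

    def download_file(self, key, local_path, bucket_name=None):
        """Download a single S3 object to a path on the local file system."""
        obj = self.get_key(key, bucket_name)  # existing S3Hook method
        obj.download_file(local_path)         # boto3 Object -> local disk
        return local_path
{code}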



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file

2018-09-04 Thread Jeffrey Payne (JIRA)
Jeffrey Payne created AIRFLOW-3002:
--

 Summary: ValueError in dataflow operators when using GCS jar or 
py_file
 Key: AIRFLOW-3002
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3002
 Project: Apache Airflow
  Issue Type: Bug
  Components: contrib, Dataflow
Affects Versions: 1.9.0, 2.0.0
Reporter: Jeffrey Payne
Assignee: Kaxil Naik
 Fix For: 1.10.1


The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to 
compare a list to an int, resulting in the TypeError, with:
{noformat}
...
path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
if path_components < 2:
...
{noformat}
This should be {{if len(path_components) < 2:}}.

Also, fix {{if file_size > 0:}} in same function...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file

2018-09-04 Thread Jeffrey Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Payne updated AIRFLOW-3002:
---
Description: 
The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with a 
ValueError, with:
{noformat}
...
file_size = self._gcs_hook.download(bucket_id, object_id, local_file)

if os.stat(file_size).st_size > 0:
return local_file
...
{noformat}
The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed in 
is actually the downloaded bytes from {{GoogleCloudStorageHook.download()}}.

The error is like:
{noformat}
[2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask 
surge_export   File 
"/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py",
 line 372, in google_cloud_to_local
[2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
surge_export if os.stat(file_size).st_size > 0:
[2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
surge_export ValueError: stat: embedded null character in path
{noformat}


  was:
The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to 
compare a list to an int, resulting in the TypeError, with:
{noformat}
...
path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
if path_components < 2:
...
{noformat}
This should be {{if len(path_components) < 2:}}.

Also, fix {{if file_size > 0:}} in same function...


> ValueError in dataflow operators when using GCS jar or py_file
> --
>
> Key: AIRFLOW-3002
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3002
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, Dataflow
>Affects Versions: 1.9.0, 2.0.0
>Reporter: Jeffrey Payne
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.1
>
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with 
> a ValueError, with:
> {noformat}
> ...
> file_size = self._gcs_hook.download(bucket_id, object_id, local_file)
> if os.stat(file_size).st_size > 0:
> return local_file
> ...
> {noformat}
> The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed 
> in is actually the downloaded bytes from 
> {{GoogleCloudStorageHook.download()}}.
> The error is like:
> {noformat}
> [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export   File 
> "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py",
>  line 372, in google_cloud_to_local
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export if os.stat(file_size).st_size > 0:
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export ValueError: stat: embedded null character in path
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file

2018-09-04 Thread Jeffrey Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603276#comment-16603276
 ] 

Jeffrey Payne commented on AIRFLOW-3002:


[~kaxilnaik] Opening a PR for this.  Change should just be to {{if 
os.stat(local_file).st_size > 0:}}, no?
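
For reference, a minimal sketch of that corrected check, using a stand-in 
temporary file rather than an actual GCS download:
{code:python}
import os
import tempfile


def downloaded_file_is_nonempty(local_file):
    # check the file written to disk, not the bytes returned by download()
    return os.stat(local_file).st_size > 0


# stand-in for a file the hook would have written to local_file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'dummy dataflow artifact')
    local_file = f.name

print(downloaded_file_is_nonempty(local_file))  # True
{code}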

> ValueError in dataflow operators when using GCS jar or py_file
> --
>
> Key: AIRFLOW-3002
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3002
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, Dataflow
>Affects Versions: 1.9.0, 2.0.0
>Reporter: Jeffrey Payne
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.1
>
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with 
> a ValueError, with:
> {noformat}
> ...
> file_size = self._gcs_hook.download(bucket_id, object_id, local_file)
> if os.stat(file_size).st_size > 0:
> return local_file
> ...
> {noformat}
> The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed 
> in is actually the downloaded bytes from 
> {{GoogleCloudStorageHook.download()}}.
> The error is like:
> {noformat}
> [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export   File 
> "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py",
>  line 372, in google_cloud_to_local
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export if os.stat(file_size).st_size > 0:
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export ValueError: stat: embedded null character in path
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: https://github.com/apache/incubator-airflow/pull/3843
 
 
   …le_cloud_to_local
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3002
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603287#comment-16603287
 ] 

ASF GitHub Bot commented on AIRFLOW-3002:
-

jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: https://github.com/apache/incubator-airflow/pull/3843
 
 
   …le_cloud_to_local
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3002
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ValueError in dataflow operators when using GCS jar or py_file
> --
>
> Key: AIRFLOW-3002
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3002
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, Dataflow
>Affects Versions: 1.9.0, 2.0.0
>Reporter: Jeffrey Payne
>Assignee: Kaxil Naik
>Priority: Major
> Fix For: 1.10.1
>
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with 
> a ValueError, with:
> {noformat}
> ...
> file_size = self._gcs_hook.download(bucket_id, object_id, local_file)
> if os.stat(file_size).st_size > 0:
> return local_file
> ...
> {noformat}
> The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed 
> in is actually the downloaded bytes from 
> {{GoogleCloudStorageHook.download()}}.
> The error is like:
> {noformat}
> [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export   File 
> "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py",
>  line 372, in google_cloud_to_local
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export if os.stat(file_size).st_size > 0:
> [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask 
> surge_export ValueError: stat: embedded null character in path
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-09-04 Thread Trevor Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302
 ] 

Trevor Edwards commented on AIRFLOW-2319:
-

+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384])
 should be enforced as unique. The current behavior feels like a bug.

 

This issue becomes problematic if you have event-driven DAGs (e.g. 
https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which 
may execute simultaneously with different parameters, causing an execution_date 
collision.

 

Andreas, are you working on a fix for this?

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} is right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on a change in pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira

2018-09-04 Thread GitBox
feng-tao commented on a change in pull request #3841: [AIRFLOW-XXX] Fix python3 
errors in dev/airflow-jira
URL: https://github.com/apache/incubator-airflow/pull/3841#discussion_r214988582
 
 

 ##
 File path: dev/airflow-jira
 ##
 @@ -134,7 +134,7 @@ def compare(target_version):
 
 for issue in issues:
 is_merged = issue.key in merges
-print("{:<18}|{:<12}||{:<10}||{:<10}|{:<50}|{:<6}|{:<6}|{:<40}"
+print("{:<18}|{!s:<12}||{!s:<10}||{!s:<10}|{:<50}|{:<6}|{:<6}|{:<40}"
 
 Review comment:
   what is this line change trying to fix?
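
   For context, a short illustration of the difference the `!s` conversion makes 
on Python 3 when a field can be `None` (the value below is made up):
   ```python
# applying an alignment format spec directly to None raises on Python 3,
# while `!s` converts the value to str first
fix_version = None  # stand-in for a possibly-missing Jira field

try:
    print("{:<12}|".format(fix_version))
except TypeError as exc:
    print(exc)  # unsupported format string passed to NoneType.__format__

print("{!s:<12}|".format(fix_version))  # "None        |"
   ```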


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size 
of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442065
 
 
   Good Spot. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size 
of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442350
 
 
   Will merge once the Travis passes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size 
of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418443141
 
 
   @jeffkpayne Can you please add a unit test for this as well ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil removed a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
kaxil removed a comment on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442350
 
 
   Will merge once the Travis passes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
kaxil edited a comment on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418443141
 
 
   @jeffkpayne Can you please add a unit test for this as well? Also, when you 
push new commits, remember to squash them and make sure the subject is limited 
to 50 characters (not including the Jira issue reference), as we would use this 
in CHANGELOG.md.
   
   Thanks. Appreciate the effort.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1)
 Report
   > Merging 
[#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3843   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer).
 Last update 
[da052ff...df9ef49](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
codecov-io commented on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1)
 Report
   > Merging 
[#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3843   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer).
 Last update 
[da052ff...df9ef49](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on issue #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'

2018-09-04 Thread GitBox
feng-tao commented on issue #3840: [AIRFLOW-3001] Add task_instance table index 
'ti_dag_date'
URL: 
https://github.com/apache/incubator-airflow/pull/3840#issuecomment-418449675
 
 
   Could you add more description on what this change is about? Why do we need 
a new index? Also, I think we need a new Alembic script for any model change 
instead of modifying an existing one.
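
   For illustration, a standalone Alembic revision of that shape might look like 
the sketch below; the revision ids, down_revision, and indexed columns are 
placeholders inferred from the index name, not taken from the PR diff:
   ```python
"""Add ti_dag_date index on task_instance (sketch only)."""
from alembic import op

# revision identifiers, used by Alembic (placeholders)
revision = 'ffffffffffff'
down_revision = 'eeeeeeeeeeee'
branch_labels = None
depends_on = None


def upgrade():
    op.create_index('ti_dag_date', 'task_instance',
                    ['dag_id', 'execution_date'], unique=False)


def downgrade():
    op.drop_index('ti_dag_date', table_name='task_instance')
   ```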


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418455599
 
 
   @kaxil Will do, but wrt squashed commits, I only had one commit so far ;)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3003) Pull the krb5 image instead of building it

2018-09-04 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created AIRFLOW-3003:
-

 Summary: Pull the krb5 image instead of building it
 Key: AIRFLOW-3003
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3003
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Fokko Driesprong


For the CI we use a krb5 image to test Kerberos functionality. Building it on 
every run is not something we want, since it is faster to pull the finished image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AIRFLOW-3003) Pull the krb5 image instead of building it

2018-09-04 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong reassigned AIRFLOW-3003:
-

Assignee: Fokko Driesprong

> Pull the krb5 image instead of building it
> --
>
> Key: AIRFLOW-3003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3003
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>
> For the CI we use a krb5 image to test kerberos functionality. This is not 
> something that we want to since it is faster to pull the finished image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3003) Pull the krb5 image instead of building it

2018-09-04 Thread Fokko Driesprong (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated AIRFLOW-3003:
--
Issue Type: Improvement  (was: Bug)

> Pull the krb5 image instead of building it
> --
>
> Key: AIRFLOW-3003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3003
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
>
> For the CI we use a krb5 image to test kerberos functionality. This is not 
> something that we want to since it is faster to pull the finished image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] r39132 commented on a change in pull request #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime

2018-09-04 Thread GitBox
r39132 commented on a change in pull request #3834: [AIRFLOW-2965] CLI tool to 
show the next execution datetime
URL: https://github.com/apache/incubator-airflow/pull/3834#discussion_r215010936
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -551,6 +551,17 @@ def dag_state(args):
 print(dr[0].state if len(dr) > 0 else None)
 
 
+@cli_utils.action_logging
 
 Review comment:
   @XD-DENG I just tested this with some of the example dags in 
https://github.com/apache/incubator-airflow/tree/master/airflow/example_dags. 
Can you test your code with different schedule types, including `@once`, 
`daily/weekly`, `timedelta(hours=1)`, etc., in addition to the cron-expression 
case you provided? Also, can you add tests for these?
   
   ```
   (venv) sianand@LM-SJN-21002367:~/Projects/airflow_incubator $ airflow 
next_execution latest_only
   [2018-09-04 10:52:19,613] {__init__.py:51} INFO - Using executor 
SequentialExecutor
   
/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py:1724:
 DeprecationWarning: The celeryd_concurrency option in [celery] has been 
renamed to worker_concurrency - the old setting has been used, but please 
update your config.
 default=conf.get('celery', 'worker_concurrency')),
   [2018-09-04 10:52:19,822] {models.py:260} INFO - Filling up the DagBag from 
/Users/sianand/Projects/airflow_incubator/dags
   [2018-09-04 10:52:19,882] {example_kubernetes_operator.py:55} WARNING - 
Could not import KubernetesPodOperator: No module named 'kubernetes'
   [2018-09-04 10:52:19,882] {example_kubernetes_operator.py:56} WARNING - 
Install kubernetes dependencies with: pip install apache-airflow[kubernetes]
   Traceback (most recent call last):
 File "/Users/sianand/miniconda3/bin/airflow", line 4, in 
   
__import__('pkg_resources').run_script('apache-airflow==2.0.0.dev0+incubating', 
'airflow')
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 654, in run_script
   self.require(requires)[0].run_script(script_name, ns)
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py",
 line 1434, in run_script
   exec(code, namespace, namespace)
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow",
 line 32, in 
   args.func(args)
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/utils/cli.py",
 line 74, in wrapper
   return f(*args, **kwargs)
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py",
 line 562, in next_execution
   print(dag.following_schedule(dag.latest_execution_date))
 File 
"/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/models.py",
 line 3371, in following_schedule
   return dttm + self._schedule_interval
   TypeError: unsupported operand type(s) for +: 'NoneType' and 
'datetime.timedelta'
   
   
   ```
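
   For what it's worth, a sketch of how the never-run case from the traceback 
above could be guarded; the `following_schedule` and `latest_execution_date` 
names are taken from the traceback, `airflow.utils.timezone.utcnow` is assumed, 
and the fallback behaviour is only one option, not necessarily what this PR 
should do:
   ```python
from airflow.utils import timezone


def describe_next_execution(dag):
    # DAGs with no schedule (schedule_interval=None) have no "next execution"
    if dag.schedule_interval is None:
        return None
    latest = dag.latest_execution_date
    if latest is None:
        # no run yet: compute the next schedule from "now" instead of
        # calling following_schedule(None), which raises as shown above
        return dag.following_schedule(timezone.utcnow())
    return dag.following_schedule(latest)
   ```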


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3004) Add configuration option to disable schedules

2018-09-04 Thread Jacob Greenfield (JIRA)
Jacob Greenfield created AIRFLOW-3004:
-

 Summary: Add configuration option to disable schedules
 Key: AIRFLOW-3004
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3004
 Project: Apache Airflow
  Issue Type: Improvement
  Components: configuration, scheduler
Reporter: Jacob Greenfield
Assignee: Jacob Greenfield


We have a particular use case where we'd like a configuration option that 
prevents the scheduler from using cron schedules globally, for all DAGs, while 
still allowing manual submission (trigger_dag) and the scheduling of task 
instances for those manually triggered runs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] Fokko commented on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime

2018-09-04 Thread GitBox
Fokko commented on issue #3834: [AIRFLOW-2965] CLI tool to show the next 
execution datetime
URL: 
https://github.com/apache/incubator-airflow/pull/3834#issuecomment-418470603
 
 
   I've tried working with next_execution but stumbled on some problems. For 
example, if the DAG hasn't run yet, I got an error since it tries to fetch the 
latest execution date from the database. Please keep this in mind. Personally, 
I would prefer a next_execution date that is computed from the schedule instead 
of having the scheduler fill this in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko removed a comment on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime

2018-09-04 Thread GitBox
Fokko removed a comment on issue #3834: [AIRFLOW-2965] CLI tool to show the 
next execution datetime
URL: 
https://github.com/apache/incubator-airflow/pull/3834#issuecomment-418470603
 
 
   I've tried working with next_execution but stumbled on some problems. For 
example, if the DAG hasn't run yet, I got an error since it tries to fetch the 
latest execution date from the database. Please keep this in mind. Personally, 
I would prefer a next_execution date that is computed from the schedule instead 
of having the scheduler fill this in.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …

2018-09-04 Thread GitBox
Fokko commented on issue #3813: [AIRFLOW-1998] Implemented 
DatabricksRunNowOperator for jobs/run-now …
URL: 
https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418474040
 
 
   No problem @isknight 
   
   @andrewmchen any final thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko commented on issue #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)

2018-09-04 Thread GitBox
Fokko commented on issue #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 
operators (#3828)
URL: 
https://github.com/apache/incubator-airflow/pull/3842#issuecomment-418474332
 
 
   @wmorris75 Something obviously went wrong. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418490734
 
 
   Looks like the 2.7 builds are failing.  Looking into this...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418491999
 
 
   Gah, found it...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and 
sftp_to_s3 operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-417764201
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=h1)
 Report
   > Merging 
[#3828](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3828/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3828   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=footer).
 Last update 
[da052ff...2463827](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators

2018-09-04 Thread GitBox
wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 
operators
URL: 
https://github.com/apache/incubator-airflow/pull/3828#issuecomment-418505257
 
 
   I made some fixes to the commits with the most recent push. Hopefully that 
should resolve the commit issues that came up earlier.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1)
 Report
   > Merging 
[#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3843   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer).
 Last update 
[da052ff...9413435](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1)
 Report
   > Merging 
[#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3843   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer).
 Last update 
[da052ff...9413435](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image instead of building

2018-09-04 Thread GitBox
Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image 
instead of building
URL: https://github.com/apache/incubator-airflow/pull/3844
 
 
   Pull the image instead of building it; this will speed up the CI process 
since we don't have to build it every time.
   
   I did a test, on the current master (17.430 seconds in total): 
https://travis-ci.org/Fokko/incubator-airflow/builds/424438601 The PR takes 
17.004 seconds in total: 
https://travis-ci.org/Fokko/incubator-airflow/builds/424479941
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3003\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3003
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3003\], code changes always need a Jira issue.
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3003) Pull the krb5 image instead of building it

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603581#comment-16603581
 ] 

ASF GitHub Bot commented on AIRFLOW-3003:
-

Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image 
instead of building
URL: https://github.com/apache/incubator-airflow/pull/3844
 
 
   Pull the image instead of building it; this will speed up the CI process 
since we don't have to build it every time.
   
   I did a test, on the current master (17.430 seconds in total): 
https://travis-ci.org/Fokko/incubator-airflow/builds/424438601 The PR takes 
17.004 seconds in total: 
https://travis-ci.org/Fokko/incubator-airflow/builds/424479941
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-3003\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3003
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-3003\], code changes always need a Jira issue.
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Pull the krb5 image instead of building it
> --
>
> Key: AIRFLOW-3003
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3003
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>
> For the CI we use a krb5 image to test kerberos functionality. This is not 
> something that we want to since it is faster to pull the finished image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3005) Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-3005:
---

 Summary: Replace 'Airbnb Airflow' with 'Apache Airflow'
 Key: AIRFLOW-3005
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3005
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs, Documentation
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 2.0.0


There are still many files where Airbnb is mentioned or the links point to 
broken pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread GitBox
kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' 
with 'Apache Airflow'
URL: https://github.com/apache/incubator-airflow/pull/3845
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3005
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-3005) Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603601#comment-16603601
 ] 

ASF GitHub Bot commented on AIRFLOW-3005:
-

kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' 
with 'Apache Airflow'
URL: https://github.com/apache/incubator-airflow/pull/3845
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3005
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace 'Airbnb Airflow' with 'Apache Airflow'
> --
>
> Key: AIRFLOW-3005
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3005
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, Documentation
>Reporter: Kaxil Naik
>Assignee: Kaxil Naik
>Priority: Minor
> Fix For: 2.0.0
>
>
> There are still many files where Airbnb is mentioned or the links point to 
> broken pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] codecov-io commented on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building

2018-09-04 Thread GitBox
codecov-io commented on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead 
of building
URL: 
https://github.com/apache/incubator-airflow/pull/3844#issuecomment-418521145
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=h1)
 Report
   > Merging 
[#3844](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3844/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3844   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=footer).
 Last update 
[da052ff...3336883](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] codecov-io edited a comment on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building

2018-09-04 Thread GitBox
codecov-io edited a comment on issue #3844: [AIRFLOW-3003] Pull the krb5 image 
instead of building
URL: 
https://github.com/apache/incubator-airflow/pull/3844#issuecomment-418521145
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=h1)
 Report
   > Merging 
[#3844](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=desc)
 into 
[master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-airflow/pull/3844/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=tree)
   
   ```diff
   @@   Coverage Diff   @@
   ##   master#3844   +/-   ##
   ===
 Coverage   77.43%   77.43%   
   ===
 Files 203  203   
 Lines   1584615846   
   ===
 Hits1227112271   
 Misses   3575 3575
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=footer).
 Last update 
[da052ff...3336883](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread GitBox
feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 
'Airbnb Airflow' with 'Apache Airflow'
URL: https://github.com/apache/incubator-airflow/pull/3845#discussion_r215070750
 
 

 ##
 File path: tests/sensors/test_http_sensor.py
 ##
 @@ -178,7 +178,7 @@ def test_get_response_check(self):
 method='GET',
 endpoint='/search',
 data={"client": "ubuntu", "q": "airflow"},
-response_check=lambda response: ("airbnb/airflow" in 
response.text),
+response_check=lambda response: ("apache/airflow" in 
response.text),
 
 Review comment:
   same


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread GitBox
feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 
'Airbnb Airflow' with 'Apache Airflow'
URL: https://github.com/apache/incubator-airflow/pull/3845#discussion_r215070725
 
 

 ##
 File path: tests/sensors/test_http_sensor.py
 ##
 @@ -140,7 +140,7 @@ class FakeSession(object):
 def __init__(self):
 self.response = requests.Response()
 self.response.status_code = 200
-self.response._content = 'airbnb/airflow'.encode('ascii', 'ignore')
+self.response._content = 'apache/airflow'.encode('ascii', 'ignore')
 
 Review comment:
   should it be apache/incubator-airflow ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-09-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603611#comment-16603611
 ] 

Andreas Költringer commented on AIRFLOW-2319:
-

I was thinking about possible fixes (see my comment above). The problem is the 
different database backends - e.g. SQLite does not support dropping uniqueness 
constraints in place.

Lacking confirmation from the project's top committers/leaders that this is 
actually a bug (and not intended for some reason I might not see), I did not 
proceed.
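
If this is confirmed as a bug, here is a minimal Alembic sketch of what such a 
migration might look like, using batch mode so it also works on SQLite (which 
has to recreate the table to drop a constraint). The constraint name is an 
assumption and would have to match the actual schema:

{code:python}
# Sketch only, not a committed migration. batch_alter_table makes Alembic
# copy-and-recreate the table on SQLite, which is the only way to drop a
# constraint there.
from alembic import op


def upgrade():
    with op.batch_alter_table('dag_run') as batch_op:
        # The constraint name is assumed; unnamed constraints need a naming
        # convention to be addressable.
        batch_op.drop_constraint('dag_run_dag_id_execution_date_key',
                                 type_='unique')


def downgrade():
    with op.batch_alter_table('dag_run') as batch_op:
        batch_op.create_unique_constraint('dag_run_dag_id_execution_date_key',
                                          ['dag_id', 'execution_date'])
{code}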

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} is right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-09-04 Thread Trevor Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302
 ] 

Trevor Edwards edited comment on AIRFLOW-2319 at 9/4/18 9:23 PM:
-

+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384])
 should be enforced as unique. The current behavior feels like a bug.

 

This issue becomes problematic if you have event-driven DAGs (e.g. 
[https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf]) 
which may be triggered simultaneously with different parameters, causing an 
execution_date collision, as sketched below.
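
A minimal sketch of how the collision shows up with event-driven triggering 
(the DAG id and payloads are made-up names, and it assumes the experimental 
API with execution_date left to its default):

{code:python}
# Illustrative only: 'process_upload' and the payloads are invented names.
# Both triggers land at effectively the same moment, so both runs get the
# same default execution_date, and the second insert violates the
# UNIQUE (dag_id, execution_date) constraint even though the run_ids differ.
import json

from airflow.api.common.experimental.trigger_dag import trigger_dag

for payload in ({'file': 'a.csv'}, {'file': 'b.csv'}):
    trigger_dag(dag_id='process_upload',
                run_id='gcf_{}'.format(payload['file']),
                conf=json.dumps(payload))
{code}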

 

Andreas, are you working on a fix for this?


was (Author: trevoredwards):
+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|[https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384])]
 should be enforced as unique. The current behavior feels like a bug.

 

This issue becomes problematic if you have event-driven DAGs (e.g. 
https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which 
may have different parameters execute simultaneously, causing an execution_date 
collision.

 

Andreas, are you working on a fix for this?

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} is right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)

2018-09-04 Thread Trevor Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302
 ] 

Trevor Edwards edited comment on AIRFLOW-2319 at 9/4/18 9:25 PM:
-

+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384])
 should be enforced as unique. The current behavior feels like a bug.

 

 

This issue becomes problematic if you have event-driven DAGs (e.g. 
[https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf]) 
which may be triggered simultaneously with different parameters, causing an 
execution_date collision.

 

Andreas, are you working on a fix for this?


was (Author: trevoredwards):
+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|[https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]])
 should be enforced as unique. The current behavior feels like a bug.

 

This issue becomes problematic if you have event-driven DAGs (e.g. 
[https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf]) 
which may have different parameters execute simultaneously, causing an 
execution_date collision.

 

Andreas, are you working on a fix for this?

> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> --
>
> Key: AIRFLOW-2319
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.9.0
>Reporter: Andreas Költringer
>Priority: Major
>
> Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} is right 
> before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>     id INTEGER NOT NULL, 
>     dag_id VARCHAR(250), 
>     execution_date DATETIME, 
>     state VARCHAR(50), 
>     run_id VARCHAR(250), 
>     external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date 
> DATETIME, 
>     PRIMARY KEY (id), 
>     UNIQUE (dag_id, execution_date), 
>     UNIQUE (dag_id, run_id), 
>     CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index)
> The {{DagRun}} class in {{models.py}} does not reflect this, however it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor

2018-09-04 Thread GitBox
yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
URL: 
https://github.com/apache/incubator-airflow/pull/3830#issuecomment-418531406
 
 
   @kaxil Tyvm. We definitely should test thoroughly. Just to provide a data 
point here: the change has been running in Airbnb production for 2+ months, plus 
additional time in a stress-test cluster (we're running 1.8 + the Celery executor).
   
   For the Codecov failure, should I rebase to fix it?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'

2018-09-04 Thread GitBox
kaxil commented on issue #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 
'Apache Airflow'
URL: 
https://github.com/apache/incubator-airflow/pull/3845#issuecomment-418531520
 
 
   @feng-tao Made the necessary changes :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] kaxil commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables

2018-09-04 Thread GitBox
kaxil commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered 
tables
URL: 
https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418531811
 
 
   Can you squash commits?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…

2018-09-04 Thread GitBox
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test 
os.stat().st_size of local_file in goog…
URL: 
https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418537507
 
 
   Added additional test and fixed inconsistent exception message formatting.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-3006) Error when schedule_interval="None"

2018-09-04 Thread Kaxil Naik (JIRA)
Kaxil Naik created AIRFLOW-3006:
---

 Summary: Error when schedule_interval="None"
 Key: AIRFLOW-3006
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3006
 Project: Apache Airflow
  Issue Type: Improvement
  Components: core, scheduler
Affects Versions: 1.10.0, 1.9.0, 1.8.2
Reporter: Kaxil Naik
Assignee: Kaxil Naik
 Fix For: 1.10.1


When `schedule_interval` is set to the string `"None"` (not the Python literal 
`None`), it gives the following error:
{code:python}
dag = DAG('params-temp3',
  default_args=default_args, schedule_interval='None')
{code}

{code:python}
[2018-09-04 23:26:21,515] {dag_processing.py:582} INFO - Started a process 
(PID: 65903) to generate tasks for /Users/kaxil/airflow/dags/params-temp1.py
Process DagFileProcessor386-Process:
Traceback (most recent call last):
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 
267, in _bootstrap
self.run()
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 
114, in run
self._target(*self._args, **self._kwargs)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 388, in helper
pickle_dags)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py",
 line 74, in wrapper
return func(*args, **kwargs)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 1832, in process_file
self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 1422, in _process_dags
dag_run = self.create_dag_run(dag)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py",
 line 74, in wrapper
return func(*args, **kwargs)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 856, in create_dag_run
next_run_date = dag.normalize_schedule(min(task_start_dates))
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py",
 line 3410, in normalize_schedule
following = self.following_schedule(dttm)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py",
 line 3353, in following_schedule
cron = croniter(self._schedule_interval, dttm)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py",
 line 92, in __init__
self.expanded, self.nth_weekday_of_month = self.expand(expr_format)
  File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py",
 line 467, in expand
raise CroniterBadCronError(cls.bad_length)
CroniterBadCronError: Exactly 5 or 6 columns has to be specified for 
iteratorexpression.
[2018-09-04 23:26:22,657] {dag_processing.py:495} INFO - Processor for 
/Users/kaxil/airflow/dags/params-temp1.py finished
{code}
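
Until a fix lands, a minimal workaround sketch (the DAG id and default_args are 
just illustrative) is to pass the Python literal None instead of the string:

{code:python}
# Workaround sketch only: with the literal None the DAG has no automatic
# schedule, so the scheduler never tries to parse 'None' as a cron expression.
from datetime import datetime

from airflow import DAG

default_args = {'owner': 'airflow', 'start_date': datetime(2018, 9, 1)}

dag = DAG('params-temp3', default_args=default_args, schedule_interval=None)
{code}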




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] kaxil opened a new pull request #3846: [AIRFLOW-3006] Fix issue with schedule_interval='None'

2018-09-04 Thread GitBox
kaxil opened a new pull request #3846: [AIRFLOW-3006] Fix issue with 
schedule_interval='None'
URL: https://github.com/apache/incubator-airflow/pull/3846
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3006
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   When `schedule_interval` is set to `"None"` (the string 'None', not the Python 
literal `None`), as shown in the example below:
   ```python
   dag = DAG('params-temp3',
 default_args=default_args, schedule_interval='None')
   ```
   
   it gives the following error:
   
   ```python
   [2018-09-04 23:26:21,515] {dag_processing.py:582} INFO - Started a process 
(PID: 65903) to generate tasks for /Users/kaxil/airflow/dags/params-temp1.py
   Process DagFileProcessor386-Process:
   Traceback (most recent call last):
 File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", 
line 267, in _bootstrap
   self.run()
 File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", 
line 114, in run
   self._target(*self._args, **self._kwargs)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 388, in helper
   pickle_dags)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py",
 line 74, in wrapper
   return func(*args, **kwargs)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 1832, in process_file
   self._process_dags(dagbag, dags, ti_keys_to_schedule)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 1422, in _process_dags
   dag_run = self.create_dag_run(dag)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py",
 line 74, in wrapper
   return func(*args, **kwargs)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py",
 line 856, in create_dag_run
   next_run_date = dag.normalize_schedule(min(task_start_dates))
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py",
 line 3410, in normalize_schedule
   following = self.following_schedule(dttm)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py",
 line 3353, in following_schedule
   cron = croniter(self._schedule_interval, dttm)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py",
 line 92, in __init__
   self.expanded, self.nth_weekday_of_month = self.expand(expr_format)
 File 
"/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py",
 line 467, in expand
   raise CroniterBadCronError(cls.bad_length)
   CroniterBadCronError: Exactly 5 or 6 columns has to be specified for 
iteratorexpression.
   [2018-09-04 23:26:22,657] {dag_processing.py:495} INFO - Processor for 
/Users/kaxil/airflow/dags/params-temp1.py finished
   ```
   
   Our documentation at https://airflow.apache.org/scheduler.html#dag-runs has 
listed 'None' as a **preset** since 1.8.2 or even before, hence we should accept 
the string **"None"** as a valid `schedule_interval` in addition to **None**, as 
sketched below.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   - `test_scheduler_dagrun_none`
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   
   ### Code Quality
   
   - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
   


This is an automated message from the Apache Git Service
