[jira] [Created] (AIRFLOW-3000) Allow to print into the log from operators (ability to Base Operator)
jack created AIRFLOW-3000: - Summary: Allow to print into the log from operators (ability to Base Operator) Key: AIRFLOW-3000 URL: https://issues.apache.org/jira/browse/AIRFLOW-3000 Project: Apache Airflow Issue Type: Task Affects Versions: 1.10.0 Reporter: jack

As described on Stack Overflow: [https://stackoverflow.com/questions/52144108/how-to-print-a-unique-message-in-airflow-operator] any print statement in a DAG file is written to the log. However, this is a problem when creating operators dynamically. Assume this code:
{code:python}
for i in range(5, 0, -1):
    print("My name is load_ads_to_BigQuery-{}".format(i))
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        task_id='load_ads_to_BigQuery-{}'.format(i), …)
{code}
This creates 5 operators, but the print runs once per loop iteration at parse time, so its output appears in the log of every operator. If you go to the log of {code}load_ads_to_BigQuery-1{code} you will see:
{code}
My name is load_ads_to_BigQuery-1
My name is load_ads_to_BigQuery-2
My name is load_ads_to_BigQuery-3
My name is load_ads_to_BigQuery-4
My name is load_ads_to_BigQuery-5
{code}
This is a problem because each operator's log contains the messages of all the other operators. A message is unique to an operator only if it is produced inside the operator itself, e.g.:
{code:python}
for i in range(5, 0, -1):
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        task_id='load_ads_to_BigQuery-{}'.format(i),
        print("My name is load_ads_to_BigQuery-{}".format(i)), …)
{code}
or something like it. However, Airflow does not support printing from inside operators this way; it is not one of the allowed arguments. Please add an optional parameter, e.g. {code}msg_log{code}, that, when assigned a value, prints that value to the log when the operator is executed. Add this argument to the Base Operator and extend it as an optional ability to all operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
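The behaviour can be sketched without Airflow. Below, a hypothetical `MiniOperator` class stands in for BaseOperator with the proposed `msg_log` argument: module-level prints run on every DAG-file parse (and so land in every task's log), while a message stored on the operator and emitted at execute time stays with its own task.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")


class MiniOperator:
    """Toy stand-in for BaseOperator with the proposed msg_log argument."""

    def __init__(self, task_id, msg_log=None):
        self.task_id = task_id
        self.msg_log = msg_log  # hypothetical: emitted only when this task runs

    def execute(self):
        if self.msg_log:
            logging.info(self.msg_log)  # appears only in this task's log
        return self.task_id


# DAG-file top level: a print() here would run on every parse and show up
# in every task's log, which is exactly the problem described in the issue.
operators = []
for i in range(1, 6):
    operators.append(MiniOperator(
        task_id="load_ads_to_BigQuery-{}".format(i),
        msg_log="My name is load_ads_to_BigQuery-{}".format(i)))

# Executing a single task emits only that task's own message.
result = operators[0].execute()
```

The key design point is that the message is stored as task state and deferred until `execute()`, rather than evaluated at DAG-parse time.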
[GitHub] codecov-io edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …
codecov-io edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now … URL: https://github.com/apache/incubator-airflow/pull/3813#issuecomment-416533459 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=h1) Report > Merging [#3813](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/5869feae871dd2b70e9c0b1fdb6315d034196aea?src=pr&el=desc) will **increase** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3813/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=tree)
```diff
@@            Coverage Diff            @@
##           master    #3813    +/-   ##
=========================================
+ Coverage   77.43%   77.43%   +<.01%
=========================================
  Files         203      203
  Lines       15846    15846
=========================================
+ Hits        12270    12271        +1
+ Misses       3576     3575        -1
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3813/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `88.79% <0%> (+0.04%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=footer). Last update [5869fea...402d8dd](https://codecov.io/gh/apache/incubator-airflow/pull/3813?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-2997) Support for clustered tables in Bigquery hooks/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated AIRFLOW-2997: -- Fix Version/s: 1.10.1 > Support for clustered tables in Bigquery hooks/operators > > > Key: AIRFLOW-2997 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2997 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Reporter: Gordon Ball >Priority: Minor > Fix For: 1.10.1 > > > Bigquery support for clustered tables was added (at GCP "Beta" level) on > 2018-07-30. This feature allows load or table-creating query operations to > request that data be stored sorted by a subset of columns, allowing more > efficient (and potentially cheaper) subsequent queries. > Support for specifying fields to cluster on should be added to at least the > bigquery hook, load-from-GCS operator and query operator. > Documentation: https://cloud.google.com/bigquery/docs/clustered-tables -- This message was sent by Atlassian JIRA (v7.6.3#76005)
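In the BigQuery REST API, clustering is requested alongside `timePartitioning` via a `clustering.fields` list in the load or query job configuration. A minimal sketch of how a hook might assemble such a configuration (the `build_load_config` helper is hypothetical, not the actual Airflow hook code; field names follow the BigQuery job-configuration schema):

```python
def build_load_config(destination_table, schema_fields,
                      time_partitioning=None, cluster_fields=None):
    """Assemble a BigQuery load-job configuration dict (hypothetical helper).

    `cluster_fields` maps to the REST API's `clustering.fields`; BigQuery
    requires the destination table to be partitioned for clustering to apply.
    """
    config = {
        "destinationTable": destination_table,
        "schema": {"fields": schema_fields},
    }
    if time_partitioning:
        config["timePartitioning"] = time_partitioning
    if cluster_fields:
        config["clustering"] = {"fields": cluster_fields}
    return config


config = build_load_config(
    destination_table={"projectId": "my-project", "datasetId": "my_dataset",
                       "tableId": "events"},
    schema_fields=[{"name": "event_date", "type": "DATE"},
                   {"name": "customer_id", "type": "STRING"}],
    time_partitioning={"type": "DAY", "field": "event_date"},
    cluster_fields=["customer_id", "event_date"],
)
```

Column order in `cluster_fields` matters: BigQuery sorts stored data by the listed columns in order, so queries filtering on the leading column benefit most.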
[GitHub] Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported Python versions
Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported Python versions URL: https://github.com/apache/incubator-airflow/pull/3839 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/README.md b/README.md index e911225aee..211d9844d1 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ [![Coverage Status](https://img.shields.io/codecov/c/github/apache/incubator-airflow/master.svg)](https://codecov.io/github/apache/incubator-airflow?branch=master) [![Documentation Status](https://readthedocs.org/projects/airflow/badge/?version=latest)](https://airflow.readthedocs.io/en/latest/?badge=latest) [![License](http://img.shields.io/:license-Apache%202-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/apache-airflow.svg)](https://pypi.org/project/apache-airflow/) [![Join the chat at https://gitter.im/apache/incubator-airflow](https://badges.gitter.im/apache/incubator-airflow.svg)](https://gitter.im/apache/incubator-airflow?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) _NOTE: The transition from 1.8.0 (or before) to 1.8.1 (or after) requires uninstalling Airflow before installing the new version. The package name was changed from `airflow` to `apache-airflow` as of version 1.8.1._ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-208) Adding badge to README.md to show supported Python versions
[ https://issues.apache.org/jira/browse/AIRFLOW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602689#comment-16602689 ] ASF GitHub Bot commented on AIRFLOW-208: Fokko closed pull request #3839: [AIRFLOW-208] Add badge to show supported Python versions URL: https://github.com/apache/incubator-airflow/pull/3839 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/README.md b/README.md index e911225aee..211d9844d1 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,7 @@ [![Coverage Status](https://img.shields.io/codecov/c/github/apache/incubator-airflow/master.svg)](https://codecov.io/github/apache/incubator-airflow?branch=master) [![Documentation Status](https://readthedocs.org/projects/airflow/badge/?version=latest)](https://airflow.readthedocs.io/en/latest/?badge=latest) [![License](http://img.shields.io/:license-Apache%202-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt) +[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/apache-airflow.svg)](https://pypi.org/project/apache-airflow/) [![Join the chat at https://gitter.im/apache/incubator-airflow](https://badges.gitter.im/apache/incubator-airflow.svg)](https://gitter.im/apache/incubator-airflow?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) _NOTE: The transition from 1.8.0 (or before) to 1.8.1 (or after) requires uninstalling Airflow before installing the new version. The package name was changed from `airflow` to `apache-airflow` as of version 1.8.1._ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Adding badge to README.md to show supported Python versions > --- > > Key: AIRFLOW-208 > URL: https://issues.apache.org/jira/browse/AIRFLOW-208 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Maxime Beauchemin >Assignee: Kaxil Naik >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko commented on issue #2187: [AIRFLOW-1042] Easy Unit Testing with Docker
Fokko commented on issue #2187: [AIRFLOW-1042] Easy Unit Testing with Docker URL: https://github.com/apache/incubator-airflow/pull/2187#issuecomment-418270573 @gerardo Thanks This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker
Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker URL: https://github.com/apache/incubator-airflow/pull/2187 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 04b0d7f713..87f0a24cd7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -141,11 +141,26 @@ We *highly* recommend setting up [Travis CI](https://travis-ci.org/) on your repo to automate this. It is free for open source projects. If for some reason you cannot, you can use the steps below to run tests. -Here are loose guidelines on how to get your environment to run the unit tests. -We do understand that no one out there can run the full test suite since -Airflow is meant to connect to virtually any external system and that you most -likely have only a subset of these in your environment. You should run the -CoreTests and tests related to things you touched in your PR. +Unit tests can be run locally using Docker. Running this command: + +docker-compose up -d + +builds and starts three Docker containers: one for MySQL, one for Postgres, +and one for Airflow. Once the Docker containers are built and running you can +then run: + +./scripts/docker/unittest/run.sh tests.core:CoreTest + +The Airflow container has a volume mapped to the Airflow source directory so +that any edits made to source files are reflected in the container. You can +make edits and then run tests specific to the area you're working on. + +If you want to run unit tests without Docker, here are loose guidelines on +how to get your environment to run the unit tests. 
We do understand that no +one out there can run the full test suite since Airflow is meant to connect +to virtually any external system and that you most likely have only a subset +of these in your environment. You should run the CoreTests and tests related +to things you touched in your PR. To set up a unit test environment, first take a look at `run_unit_tests.sh` and understand that your ``AIRFLOW_CONFIG`` points to an alternate config file diff --git a/airflow/config_templates/default_test.cfg b/airflow/config_templates/default_test.cfg index ecf7f4ebb0..93a4f9fde3 100644 --- a/airflow/config_templates/default_test.cfg +++ b/airflow/config_templates/default_test.cfg @@ -70,8 +70,8 @@ smtp_mail_from = airf...@airflow.com celery_app_name = airflow.executors.celery_executor celeryd_concurrency = 16 worker_log_server_port = 8793 -broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow -celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow +broker_url = sqla+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow +celery_result_backend = db+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow flower_host = 0.0.0.0 flower_port = default_queue = default diff --git a/airflow/configuration.py b/airflow/configuration.py index f140be2bc1..9ddaf5b4c1 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -318,6 +318,11 @@ def mkdir_p(path): else: AIRFLOW_CONFIG = expand_env_var(os.environ['AIRFLOW_CONFIG']) +if 'AIRFLOW_MYSQL_HOST' not in os.environ: +AIRFLOW_MYSQL_HOST = 'localhost' +else: +AIRFLOW_MYSQL_HOST = expand_env_var(os.environ['AIRFLOW_MYSQL_HOST']) + # Set up dags folder for unit tests # this directory won't exist if users install via pip _TEST_DAGS_FOLDER = os.path.join( diff --git a/airflow/utils/db.py b/airflow/utils/db.py index 618e00200b..4ca59e704f 100644 --- a/airflow/utils/db.py +++ b/airflow/utils/db.py @@ -27,6 +27,7 @@ from airflow import settings + def provide_session(func): """ Function decorator 
that provides a session if it isn't provided. @@ -94,6 +95,21 @@ def checkout(dbapi_connection, connection_record, connection_proxy): ) +def get_mysql_host(default='localhost'): +return default if 'AIRFLOW_MYSQL_HOST' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_HOST'] + + +def get_mysql_login(default='root'): +return default if 'AIRFLOW_MYSQL_USER' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_USER'] + + +def get_mysql_password(default=None): +return default if 'AIRFLOW_MYSQL_PASSWORD' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_PASSWORD'] + + def initdb(): session = settings.Session() @@ -103,12 +119,13 @@ def initdb(): merge_conn( models.Connection( conn_id='airflow_db', conn_type='mysql', -host='localhost', login='root', password='', +login=get_mysql_login(), host=get_mysql_host(), password=get_mysql_password(), schema='airflow')) merge_conn( models.Connection( conn_id='airflow_ci', conn_type='mysql', -host='loca
[jira] [Commented] (AIRFLOW-1042) Easy Unit Testing with Docker
[ https://issues.apache.org/jira/browse/AIRFLOW-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602690#comment-16602690 ] ASF GitHub Bot commented on AIRFLOW-1042: - Fokko closed pull request #2187: [AIRFLOW-1042] Easy Unit Testing with Docker URL: https://github.com/apache/incubator-airflow/pull/2187 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 04b0d7f713..87f0a24cd7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -141,11 +141,26 @@ We *highly* recommend setting up [Travis CI](https://travis-ci.org/) on your repo to automate this. It is free for open source projects. If for some reason you cannot, you can use the steps below to run tests. -Here are loose guidelines on how to get your environment to run the unit tests. -We do understand that no one out there can run the full test suite since -Airflow is meant to connect to virtually any external system and that you most -likely have only a subset of these in your environment. You should run the -CoreTests and tests related to things you touched in your PR. +Unit tests can be run locally using Docker. Running this command: + +docker-compose up -d + +builds and starts three Docker containers: one for MySQL, one for Postgres, +and one for Airflow. Once the Docker containers are built and running you can +then run: + +./scripts/docker/unittest/run.sh tests.core:CoreTest + +The Airflow container has a volume mapped to the Airflow source directory so +that any edits made to source files are reflected in the container. You can +make edits and then run tests specific to the area you're working on. 
+ +If you want to run unit tests without Docker, here are loose guidelines on +how to get your environment to run the unit tests. We do understand that no +one out there can run the full test suite since Airflow is meant to connect +to virtually any external system and that you most likely have only a subset +of these in your environment. You should run the CoreTests and tests related +to things you touched in your PR. To set up a unit test environment, first take a look at `run_unit_tests.sh` and understand that your ``AIRFLOW_CONFIG`` points to an alternate config file diff --git a/airflow/config_templates/default_test.cfg b/airflow/config_templates/default_test.cfg index ecf7f4ebb0..93a4f9fde3 100644 --- a/airflow/config_templates/default_test.cfg +++ b/airflow/config_templates/default_test.cfg @@ -70,8 +70,8 @@ smtp_mail_from = airf...@airflow.com celery_app_name = airflow.executors.celery_executor celeryd_concurrency = 16 worker_log_server_port = 8793 -broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow -celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow +broker_url = sqla+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow +celery_result_backend = db+mysql://airflow:airflow@{AIRFLOW_MYSQL_HOST}:3306/airflow flower_host = 0.0.0.0 flower_port = default_queue = default diff --git a/airflow/configuration.py b/airflow/configuration.py index f140be2bc1..9ddaf5b4c1 100644 --- a/airflow/configuration.py +++ b/airflow/configuration.py @@ -318,6 +318,11 @@ def mkdir_p(path): else: AIRFLOW_CONFIG = expand_env_var(os.environ['AIRFLOW_CONFIG']) +if 'AIRFLOW_MYSQL_HOST' not in os.environ: +AIRFLOW_MYSQL_HOST = 'localhost' +else: +AIRFLOW_MYSQL_HOST = expand_env_var(os.environ['AIRFLOW_MYSQL_HOST']) + # Set up dags folder for unit tests # this directory won't exist if users install via pip _TEST_DAGS_FOLDER = os.path.join( diff --git a/airflow/utils/db.py b/airflow/utils/db.py index 618e00200b..4ca59e704f 100644 --- 
a/airflow/utils/db.py +++ b/airflow/utils/db.py @@ -27,6 +27,7 @@ from airflow import settings + def provide_session(func): """ Function decorator that provides a session if it isn't provided. @@ -94,6 +95,21 @@ def checkout(dbapi_connection, connection_record, connection_proxy): ) +def get_mysql_host(default='localhost'): +return default if 'AIRFLOW_MYSQL_HOST' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_HOST'] + + +def get_mysql_login(default='root'): +return default if 'AIRFLOW_MYSQL_USER' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_USER'] + + +def get_mysql_password(default=None): +return default if 'AIRFLOW_MYSQL_PASSWORD' not in os.environ \ +else os.environ['AIRFLOW_MYSQL_PASSWORD'] + + def initdb(): session = settings.Session() @@ -103,12 +119,13 @@ def initdb(): merge_conn( models.Connection( conn_id='airflow_db', conn_type='mysql', -host='localhost', login='root
[GitHub] chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
chronitis commented on a change in pull request #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#discussion_r214822929 ## File path: airflow/contrib/hooks/bigquery_hook.py ## @@ -943,6 +962,14 @@ def run_load(self, 'timePartitioning': time_partitioning }) +if cluster_fields: Review comment: After rebasing this morning, the changes made by AIRFLOW-491 (#3733) mean the logic for `run_query` and `run_load` have diverged somewhat, so factoring it out probably now makes less sense. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-208) Adding badge to README.md to show supported Python versions
[ https://issues.apache.org/jira/browse/AIRFLOW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik resolved AIRFLOW-208. Resolution: Fixed Resolved by https://github.com/apache/incubator-airflow/pull/3839 > Adding badge to README.md to show supported Python versions > --- > > Key: AIRFLOW-208 > URL: https://issues.apache.org/jira/browse/AIRFLOW-208 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Maxime Beauchemin >Assignee: Kaxil Naik >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3001) accumulative tis slow allocation of new schedule
Jason Kim created AIRFLOW-3001: -- Summary: accumulative tis slow allocation of new schedule Key: AIRFLOW-3001 URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 Project: Apache Airflow Issue Type: Improvement Components: scheduler Affects Versions: 1.10.0 Reporter: Jason Kim

I created a very long-term schedule with a short interval (2-3 years at a 10-minute interval), so the DAG's history grows larger and larger as scheduling goes on. Eventually, at some critical point (I don't know exactly when it is), the allocation of new task instances becomes slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'". I was able to resolve this issue by adding a new index consisting of dag_id and execution_date. I would like the 1.10 branch to be modified to create the task_instance table with this index. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
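The effect of the proposed composite index can be sketched with SQLite standing in for MySQL (the table is trimmed to the relevant columns, and `ti_dag_date` is an illustrative index name, not necessarily what Airflow would use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_instance (
    task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)""")

query = ("SELECT * FROM task_instance "
         "WHERE dag_id = ? AND execution_date = ?")

# Without an index, the planner must scan the whole table for every lookup,
# which is why the query degrades as task_instance rows accumulate.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query,
                           ("dag", "2018-09-01")).fetchall()

# The fix proposed in the issue: a composite index on (dag_id, execution_date).
conn.execute("CREATE INDEX ti_dag_date "
             "ON task_instance (dag_id, execution_date)")

# With the index, the same query becomes an index search whose cost does not
# grow with the total number of rows in the table.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query,
                          ("dag", "2018-09-01")).fetchall()
```

The plan detail (the last column of each plan row) changes from a table scan to a search using the composite index.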
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Summary: Accumulative tis slow allocation of new schedule (was: accumulative tis slow allocation of new schedule) > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.0 >Reporter: Jason Kim >Priority: Major > > I have created very long term schedule in short interval (2~3 years as 10 min > interval) > So, dag would be bigger and bigger as scheduling goes on. > Finally, at critical point (I don't know exactly when it is), the allocation > of new task_instances get slow and then almost stop. > I found that in this point, many slow query logs had occurred. (I was using > mysql as meta repository) > queries like this > "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'" > I could resolve this issue by adding new index consist of dag_id and > execution_date > So, I wanted 1.10 branch to be modified to create task_instance table with > the index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Description: I have created very long term schedule in short interval (2~3 years as 10 min interval) So, dag could be bigger and bigger as scheduling goes on. Finally, at critical point (I don't know exactly when it is), the allocation of new task_instances get slow and then almost stop. I found that in this point, many slow query logs had occurred. (I was using mysql as meta repository) queries like this "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'" I could resolve this issue by adding new index consist of dag_id and execution_date So, I wanted 1.10 branch to be modified to create task_instance table with the index. Thanks. was: I have created very long term schedule in short interval (2~3 years as 10 min interval) So, dag would be bigger and bigger as scheduling goes on. Finally, at critical point (I don't know exactly when it is), the allocation of new task_instances get slow and then almost stop. I found that in this point, many slow query logs had occurred. (I was using mysql as meta repository) queries like this "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'" I could resolve this issue by adding new index consist of dag_id and execution_date So, I wanted 1.10 branch to be modified to create task_instance table with the index. Thanks. > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.0 >Reporter: Jason Kim >Priority: Major > > I have created very long term schedule in short interval (2~3 years as 10 min > interval) > So, dag could be bigger and bigger as scheduling goes on. 
> Finally, at critical point (I don't know exactly when it is), the allocation > of new task_instances get slow and then almost stop. > I found that in this point, many slow query logs had occurred. (I was using > mysql as meta repository) > queries like this > "SELECT * FROM task_instance WHERE dag_id = '~' and execution_date = '~'" > I could resolve this issue by adding new index consist of dag_id and > execution_date > So, I wanted 1.10 branch to be modified to create task_instance table with > the index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Description: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. was: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Description: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. was: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = '~' AND execution_date = '~'". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
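The fix the report describes (a composite index on dag_id and execution_date) can be illustrated with a minimal, self-contained sketch. This uses SQLite rather than the MySQL metadata database from the report, and a trimmed-down task_instance table, purely to show how the index changes the query plan for the slow query quoted above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task_instance (
        task_id TEXT,
        dag_id TEXT,
        execution_date TIMESTAMP,
        state TEXT
    )
    """
)

query = (
    "SELECT * FROM task_instance "
    "WHERE dag_id = ? AND execution_date = ?"
)
params = ("some_dag_id", "2018-09-01 00:00:00")

# Without an index, the lookup is a full table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()

# The composite index proposed in this issue (named ti_dag_date in the PR).
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)

# With the index, the same lookup becomes an index search.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()

print(plan_before[0][-1])  # a SCAN of task_instance
print(plan_after[0][-1])   # a SEARCH using index ti_dag_date
```

The same principle applies to MySQL: the composite index lets the scheduler's per-(dag_id, execution_date) lookups avoid scanning an ever-growing task_instance table.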
[GitHub] ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'
ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim reassigned AIRFLOW-3001: -- Assignee: Jason Kim > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Assignee: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. > Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602762#comment-16602762 ] ASF GitHub Bot commented on AIRFLOW-3001: - ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. > Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Description: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. was: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = '' AND execution_date = ''". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Assignee: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Kim updated AIRFLOW-3001: --- Description: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'". I could resolve this issue by adding a new index consisting of dag_id and execution_date. So, I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. was: I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'". I could resolve this issue by adding a new index consisting of dag_id and execution_date, so I would like the 1.10 branch to be modified to create the task_instance table with that index. Thanks. > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Assignee: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on.
> Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'
ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'
ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/migrations/versions/e3a246e0dc1_current_schema.py b/airflow/migrations/versions/e3a246e0dc1_current_schema.py index 6c63d0a9dd..22624a4c8d 100644 --- a/airflow/migrations/versions/e3a246e0dc1_current_schema.py +++ b/airflow/migrations/versions/e3a246e0dc1_current_schema.py @@ -7,9 +7,9 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -176,6 +176,12 @@ def upgrade(): ['dag_id', 'state'], unique=False ) +op.create_index( +'ti_dag_date', +'task_instance', +['dag_id', 'execution_date'], +unique=False +) op.create_index( 'ti_pool', 'task_instance', @@ -269,6 +275,7 @@ def downgrade(): op.drop_index('ti_state_lkp', table_name='task_instance') op.drop_index('ti_pool', table_name='task_instance') op.drop_index('ti_dag_state', table_name='task_instance') +op.drop_index('ti_dag_date', table_name='task_instance') op.drop_table('task_instance') op.drop_table('slot_pool') op.drop_table('sla_miss') diff --git a/airflow/models.py b/airflow/models.py index 2096785b41..c41f2a9dbe 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -880,6 +880,7 @@ class TaskInstance(Base, LoggingMixin): __table_args__ = ( Index('ti_dag_state', dag_id, state), +Index('ti_dag_date', dag_id, 
execution_date), Index('ti_state', state), Index('ti_state_lkp', dag_id, task_id, execution_date, state), Index('ti_pool', pool, state, priority_weight), This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
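Note that the diff above adds the index only to the initial-schema migration and the model definition, so an existing metadata database would need the index applied by hand. A hedged SQL equivalent of the migration's upgrade() and downgrade() steps (assuming a MySQL metadata database and the index name used in the PR) would be:

```sql
-- Equivalent of the migration's upgrade() step
CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date);

-- Equivalent of the migration's downgrade() step
DROP INDEX ti_dag_date ON task_instance;
```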
[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602779#comment-16602779 ] ASF GitHub Bot commented on AIRFLOW-3001: - ubermen opened a new pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler > Affects Versions: 1.10.0 > Reporter: Jason Kim > Assignee: Jason Kim > Priority: Major > > I have created a very long-term schedule with a short interval (2~3 years at a 10-minute interval), so the DAG could grow bigger and bigger as scheduling goes on. > Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3001) Accumulative tis slow allocation of new schedule
[ https://issues.apache.org/jira/browse/AIRFLOW-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602778#comment-16602778 ] ASF GitHub Bot commented on AIRFLOW-3001: - ubermen closed pull request #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/migrations/versions/e3a246e0dc1_current_schema.py b/airflow/migrations/versions/e3a246e0dc1_current_schema.py index 6c63d0a9dd..22624a4c8d 100644 --- a/airflow/migrations/versions/e3a246e0dc1_current_schema.py +++ b/airflow/migrations/versions/e3a246e0dc1_current_schema.py @@ -7,9 +7,9 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. 
You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -176,6 +176,12 @@ def upgrade(): ['dag_id', 'state'], unique=False ) +op.create_index( +'ti_dag_date', +'task_instance', +['dag_id', 'execution_date'], +unique=False +) op.create_index( 'ti_pool', 'task_instance', @@ -269,6 +275,7 @@ def downgrade(): op.drop_index('ti_state_lkp', table_name='task_instance') op.drop_index('ti_pool', table_name='task_instance') op.drop_index('ti_dag_state', table_name='task_instance') +op.drop_index('ti_dag_date', table_name='task_instance') op.drop_table('task_instance') op.drop_table('slot_pool') op.drop_table('sla_miss') diff --git a/airflow/models.py b/airflow/models.py index 2096785b41..c41f2a9dbe 100755 --- a/airflow/models.py +++ b/airflow/models.py @@ -880,6 +880,7 @@ class TaskInstance(Base, LoggingMixin): __table_args__ = ( Index('ti_dag_state', dag_id, state), +Index('ti_dag_date', dag_id, execution_date), Index('ti_state', state), Index('ti_state_lkp', dag_id, task_id, execution_date, state), Index('ti_pool', pool, state, priority_weight), This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Accumulative tis slow allocation of new schedule > > > Key: AIRFLOW-3001 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3001 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.0 >Reporter: Jason Kim >Assignee: Jason Kim >Priority: Major > > I have created very long term schedule in short interval. (2~3 years as 10 > min interval) > So, dag could be bigger and bigger as scheduling goes on. 
> Finally, at a critical point (I don't know exactly when it is), the allocation of new task_instances gets slow and then almost stops. > I found that at this point many slow-query logs had occurred (I was using MySQL as the metadata repository), with queries like this: > "SELECT * FROM task_instance WHERE dag_id = 'some_dag_id' AND execution_date = '2018-09-01 00:00:00'" > I could resolve this issue by adding a new index consisting of dag_id and execution_date. > So, I would like the 1.10 branch to be modified to create the task_instance table with that index. > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
chronitis commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418294239 @kaxil I've addressed your comments wrt better docstrings, correct indentation. Rebasing after #3733 has resulted in some quite large changes to the implementation in `bigquery_hook.py`; since the logic is now rather different in `run_query` and `run_load`, it's less obvious that there is an easy piece of common code to factor out. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
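For context on what PR #3838 enables: in the BigQuery jobs API, clustered tables are requested through a `clustering` block in the job configuration, alongside `timePartitioning` (clustering required a partitioned destination table at the time). The fragment below is an illustrative sketch of such a load-job configuration in the REST API's shape, not the hook's actual parameters; the project, dataset, bucket, and cluster-field names are made up:

```python
# Illustrative BigQuery load-job configuration requesting a clustered,
# day-partitioned destination table. All resource names are hypothetical.
load_job_config = {
    "load": {
        "destinationTable": {
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "orders",
        },
        "sourceUris": ["gs://my-bucket/orders/*.json"],
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        # Clustering is only valid together with a partitioned table.
        "timePartitioning": {"type": "DAY", "field": "order_date"},
        # The part the PR adds support for: cluster the table by these columns.
        "clustering": {"fields": ["customer_id", "order_date"]},
    }
}

print(load_job_config["load"]["clustering"]["fields"])
```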
[GitHub] codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
codecov-io edited a comment on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418050587 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=h1) Report > Merging [#3838](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3838/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master    #3838   +/-   ##
=======================================
  Coverage   77.43%   77.43%
=======================================
  Files         203      203
  Lines       15846    15846
=======================================
  Hits        12271    12271
  Misses       3575     3575
```

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=footer). Last update [da052ff...8e3325a](https://codecov.io/gh/apache/incubator-airflow/pull/3838?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (AIRFLOW-3000) Allow to print into the log from operators (ability to Base Operator)
[ https://issues.apache.org/jira/browse/AIRFLOW-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-3000. -- Resolution: Won't Do The example given in the Stack Overflow answer (of sub-classing the Operator) is one way to do this. However this is an anti-pattern: this is going to get printed _every time Airflow parses the DAG_ - not just when it runs, but every time Airflow goes around its parsing loop. This is going to be noisy and create possibly many GB of logs per day.
> Allow to print into the log from operators (ability to Base Operator)
> -
>
> Key: AIRFLOW-3000
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3000
> Project: Apache Airflow
> Issue Type: Task
> Affects Versions: 1.10.0
> Reporter: jack
> Priority: Minor
>
> As described on Stack Overflow:
> [https://stackoverflow.com/questions/52144108/how-to-print-a-unique-message-in-airflow-operator]
>
> Any print in the code will be shown in the log file. However it is a problem
> when creating operators dynamically.
>
> Assume this code:
>
> {code:java}
> for i in range(5, 0, -1):
>     print("My name is load_ads_to_BigQuery-{}".format(i))
>     update_bigquery = GoogleCloudStorageToBigQueryOperator(task_id='load_ads_to_BigQuery-{}'.format(i), …){code}
>
> This creates 5 operators.
> The print will be executed 5 times for each operator,
> meaning that if you go to the log of
> {code:java}
> load_ads_to_BigQuery-1 {code}
> you will see:
>
> {code:java}
> My name is load_ads_to_BigQuery-1
> My name is load_ads_to_BigQuery-2
> My name is load_ads_to_BigQuery-3
> My name is load_ads_to_BigQuery-4
> My name is load_ads_to_BigQuery-5
> {code}
>
> This is a problem because it logs messages of the other operators.
>
> Each operator is unique only within the operator itself, meaning that the
> print should be inside the operator, as:
> {code:java}
> for i in range(5, 0, -1):
>     update_bigquery = GoogleCloudStorageToBigQueryOperator(task_id='load_ads_to_BigQuery-{}'.format(i), print("My name is load_ads_to_BigQuery-{}".format(i)), …){code}
> or something like it. However Airflow does not support printing inside
> operators. It's not one of the allowed arguments.
> Add an optional parameter called
> {code:java}
> msg_log {code}
> that, if assigned a value, will print the value to the log when the
> operator is executed.
>
> Please add an argument on the Base Operator for printing and extend it as an
> optional ability to all operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
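Ash's point above - that a top-level `print` in a DAG file runs at parse time, for every task, on every parsing loop - can be sketched without Airflow itself. The snippet below is a minimal stand-in (the `FakeOperator` class and its names are hypothetical, not Airflow's real API): a message emitted inside `execute()` is scoped to the task that actually runs, while a `print` in the DAG-definition loop fires once per task on every parse.

```python
import logging

logging.basicConfig(level=logging.INFO)


class FakeOperator:
    """Hypothetical stand-in for an Airflow operator (not the real BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        # Per-task logger, loosely mimicking self.log on Airflow operators.
        self.log = logging.getLogger(self.task_id)

    def execute(self, context):
        # Runs only when THIS task instance runs, so the message lands
        # only in this task's log - unlike a print in the DAG file body.
        self.log.info("My name is %s", self.task_id)
        return self.task_id


# "Parse time": this loop runs every time the DAG file is evaluated.
# A print() here would be emitted for every task, on every parse.
tasks = [FakeOperator("load_ads_to_BigQuery-{}".format(i)) for i in range(5, 0, -1)]

# "Run time": only the executed task emits its own message.
result = tasks[0].execute(context={})
```

This is the distinction behind the "Won't Do" resolution: the right place for a per-task message is inside the operator's execution, not the module-level DAG code.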
[GitHub] isknight commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …
isknight commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now … URL: https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418303178 Apologies @Fokko I was offline for a bit. For some odd reason my git client is erroring out when I try to interactively rebase and squash my commits. I'll look into it some more tomorrow. Otherwise, if @andrewmchen (and others) are ok with my changes, perhaps it would be simpler to make a new PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] isknight edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …
isknight edited a comment on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now … URL: https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418303178 Apologies @Fokko I was offline for a bit. For some odd reason my git client is erroring out when I try to interactively rebase and squash my commits. I'll look into it some more tomorrow. Otherwise, if @andrewmchen (and others) are ok with my changes, perhaps it would be simpler for me to make a new PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
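When `git rebase -i` misbehaves, as isknight describes, the same squash can often be achieved with a soft reset: move the branch pointer back, keep the combined changes staged, and make one new commit. The sketch below demonstrates this in a throwaway repository (the commit messages and file names are invented for illustration; on a real PR branch you would reset to the upstream base instead of `HEAD~2`).

```shell
#!/usr/bin/env bash
# Squash-without-interactive-rebase sketch: demonstrated in a temp repo.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Build a small history: a base commit plus two commits to squash.
git -c user.email=a@b -c user.name=t commit -q --allow-empty -m "base"
echo one > f.txt; git add f.txt
git -c user.email=a@b -c user.name=t commit -q -m "first"
echo two >> f.txt; git add f.txt
git -c user.email=a@b -c user.name=t commit -q -m "second"

# Soft reset keeps the working tree and index; only history is rewound.
git reset --soft HEAD~2
git -c user.email=a@b -c user.name=t commit -q -m "[AIRFLOW-1998] squashed"

count=$(git rev-list --count HEAD)
echo "commits: $count"
```

The resulting tree is byte-for-byte identical to the pre-squash tip; only the commit history changes, which is what a rebase-and-squash would have produced.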
[GitHub] ashb opened a new pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira
ashb opened a new pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira URL: https://github.com/apache/incubator-airflow/pull/3841 This is a script that checks whether the Jiras marked as fixed in a release are actually merged in; getting this working is helpful to me in preparing 1.10.1. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: fix the `airflow-jira compare 1.10.1` script to make building the point release easier :) ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira
codecov-io edited a comment on issue #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira URL: https://github.com/apache/incubator-airflow/pull/3841#issuecomment-418314621 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=h1) Report > Merging [#3841](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3841/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=tree)
```diff
@@           Coverage Diff           @@
##           master   #3841   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15846    15846
======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=footer). Last update [da052ff...0a663ed](https://codecov.io/gh/apache/incubator-airflow/pull/3841?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wmorris75 opened a new pull request #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
wmorris75 opened a new pull request #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) URL: https://github.com/apache/incubator-airflow/pull/3842 add 8fit to list of companies [AIRFLOW-XXX] Add THE ICONIC to the list of orgs using Airflow Closes #3807 from ksaagariconic/patch-2 [AIRFLOW-2933] Enable Codecov on Docker-CI Build (#3780) - Add missing variables and use codecov instead of coveralls. It wasn't working because of missing environment variables; the codecov library heavily depends on the environment variables in the CI to determine how to push the reports to codecov. - Remove the explicit passing of the variables in the `tox.ini` since it is already done in the `docker-compose.yml`; having to maintain this in two places makes it brittle. - Removed the empty Codecov yml since codecov was complaining that it was unable to parse it [AIRFLOW-2960] Pin boto3 to <1.8 (#3810) Boto 1.8 was released a few days ago and it breaks our tests. [AIRFLOW-2957] Remove obsolete sensor references [AIRFLOW-2959] Refine HTTPSensor doc (#3809) HTTP error codes other than 404, or Connection Refused, would fail the sensor itself directly (no more poking). [AIRFLOW-2961] Refactor tests.BackfillJobTest.test_backfill_examples test (#3811) Simplify this test since it takes up 15% of all the time. This is because every example dag, with some exclusions, is backfilled. This puts some pressure on the scheduler and everything. If the test just covers a couple of dags, that should be sufficient. 254 seconds: [success] 15.03% tests.BackfillJobTest.test_backfill_examples: 254.9323s [AIRFLOW-XXX] Remove residual line in Changelog (#3814) [AIRFLOW-2930] Fix celery executor scheduler crash (#3784) Caused by an update in PR #3740. execute_command.apply_async(args=command, ...) - command is a list of short unicode strings, and the above code passes multiple arguments to a function defined as taking only one argument. - command = ["airflow", "run", "dag323", ...] - args = command = ["airflow", "run", "dag323", ...] - execute_command("airflow", "run", "dag323", ...) will error and exit. [AIRFLOW-2916] Arg `verify` for AwsHook() & S3 sensors/operators (#3764) This is useful when 1. users want to use a different CA cert bundle than the one used by botocore. 2. users want to have '--no-verify-ssl'. This is especially useful when we're using on-premises S3 or other implementations of object storage, like IBM's Cloud Object Storage. The default value here is `None`, which is also the default value in boto3, so that backward compatibility is ensured too. Reference: https://boto3.readthedocs.io/en/latest/reference/core/session.html [AIRFLOW-2709] Improve error handling in Databricks hook (#3570) * Use float for default value * Use status code to determine whether an error is retryable * Fix wrong type in assertion * Fix style to prevent lines from exceeding 90 characters * Fix wrong way of checking exception type [AIRFLOW-2854] kubernetes_pod_operator add more configuration items (#3697) * kubernetes_pod_operator add more configuration items * fix test_kubernetes_pod_operator test_faulty_service_account failure case * fix review comment issues * pod_operator add hostnetwork config * add doc example [AIRFLOW-2994] Fix command status check in Qubole Check operator (#3790) [AIRFLOW-2928] Use uuid4 instead of uuid1 (#3779) for better randomness. [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828) [AIRFLOW-2993] Added sftp_to_s3 and s3_to_sftp operators (#3828) [AIRFLOW-2949] Add syntax highlight for single quote strings (#3795) * AIRFLOW-2949: Add syntax highlight for single quote strings * AIRFLOW-2949: Also updated new UI main.css [AIRFLOW-2948] Arg check & better doc - SSHOperator & SFTPOperator (#3793) There may be different combinations of arguments, and some processing is done 'silently', while users may not be fully aware of it. For example - User only needs to provide either `ssh_hook` or `ssh_conn_id`, while this is not clear in the doc - if both are provided, `ssh_conn_id` will be ignored. - if `remote_host` is provided, it will replace the `remote_host` which was defined in `ssh_hook` or predefined in the connection of `ssh_conn_id` These should be documented clearly
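The [AIRFLOW-2930] changelog entry above can be illustrated without Celery: `apply_async(args=...)` unpacks the `args` sequence into positional arguments, so passing the command list directly calls the task with one argument per list element. The sketch below uses a hypothetical `apply_async_like` helper to mimic that unpacking; it is a simplified stand-in, not Celery's API.

```python
def execute_command(command):
    """Takes ONE argument: the full command as a list (as the Celery task does)."""
    return " ".join(command)


def apply_async_like(func, args):
    """Mimics how Celery unpacks `args` into the task call: func(*args)."""
    return func(*args)


command = ["airflow", "run", "dag323"]

# Buggy call (from PR #3740): args=command unpacks into three positional
# arguments, but execute_command accepts only one -> TypeError.
try:
    apply_async_like(execute_command, command)
    buggy_ok = True
except TypeError:
    buggy_ok = False

# Fix (from #3784): wrap the command in a list so exactly one argument
# - the whole command list - reaches the task.
fixed = apply_async_like(execute_command, [command])
```

The one-character fix (`args=[command]`) is why the scheduler crash was so easy to introduce and so hard to spot in review.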
[GitHub] gauthiermartin commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose
gauthiermartin commented on issue #3797: [AIRFLOW-2952] Splits CI into k8s + docker-compose URL: https://github.com/apache/incubator-airflow/pull/3797#issuecomment-418388832 @dimberman Currently I'm having an issue while running ./docker/build.sh locally. There still seems to be an issue with SLUGIFY_USES_TEXT_UNIDECODE=yes when running the script locally. I know you have added that env var in the travis-ci.yml file, but it is also required when running the script locally. Should we export it in the build.sh file? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
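The suggestion in the comment above - exporting the variable inside the build script itself so local runs do not depend on the Travis config - could look like the sketch below. This is a hypothetical fragment for the top of `docker/build.sh`, not the change that was actually merged.

```shell
#!/usr/bin/env bash
# Hypothetical sketch for docker/build.sh: export the env var the pip
# install of Airflow requires, so local runs match what .travis.yml sets.
# The ":-yes" default lets a caller override the value from the outside.
export SLUGIFY_USES_TEXT_UNIDECODE="${SLUGIFY_USES_TEXT_UNIDECODE:-yes}"

echo "SLUGIFY_USES_TEXT_UNIDECODE=${SLUGIFY_USES_TEXT_UNIDECODE}"
```

Exporting (rather than merely assigning) matters here: the variable must be visible to the child `pip`/`docker` processes the script spawns, not just to the script itself.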
[GitHub] dalupus commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …
dalupus commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now … URL: https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418395125 lgtm This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214944518 ## File path: airflow/contrib/operators/s3_delete_objects_operator.py ##
@@ -0,0 +1,92 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3DeleteObjectsOperator(BaseOperator):
+    """
+    To enable users to delete a single object or multiple objects from
+    a bucket using a single HTTP request.
+
+    Users may specify up to 1000 keys to delete.
+
+    :param bucket: Name of the bucket in which you are going to delete object(s)
+    :type bucket: str
+    :param keys: The key(s) to delete from the S3 bucket.
+
+        When ``keys`` is a string, it's supposed to be the key name of
+        the single object to delete.
+
+        When ``keys`` is a list, it's supposed to be the list of the
+        keys to delete.
+    :type keys: str or list
+    :param s3_conn_id: Connection id of the S3 connection to use
+    :type s3_conn_id: str
+    :param verify: Whether or not to verify SSL certificates for the S3 connection.
+        By default SSL certificates are verified.
+
+        You can provide the following values:
+
+        - False: do not validate SSL certificates. SSL will still be used,
+          but SSL certificates will not be verified.
+        - path/to/cert/bundle.pem: A filename of the CA cert bundle to use.
+          You can specify this argument if you want to use a different
+          CA cert bundle than the one used by botocore.
+    :type verify: bool or str
+    :param silent_on_errors: If set to `True`, ignore `Errors` in the boto3 response.
+        Default value is `False`.
+
+        `Errors` here arise for reasons such as users trying to delete non-existent
+        objects. They don't necessarily indicate an exception (the request sent to S3
+        itself succeeds).
+    :type silent_on_errors: bool
+    """
+
+    @apply_defaults
+    def __init__(
+            self,
+            bucket,
+            keys,
+            s3_conn_id='aws_default',
+            verify=None,
+            silent_on_errors=False,
+            *args, **kwargs):
+        super(S3DeleteObjectsOperator, self).__init__(*args, **kwargs)
+        self.bucket = bucket
+        self.keys = keys
+        self.s3_conn_id = s3_conn_id
+        self.verify = verify
+        self.silent_on_errors = silent_on_errors
+
+    def execute(self, context):
+        s3_hook = S3Hook(aws_conn_id=self.s3_conn_id, verify=self.verify)
+
+        response = s3_hook._delete_objects(bucket=self.bucket, keys=self.keys)
+
+        deleted_keys = [x['Key'] for x in response.get("Deleted", [])]
+        self.log.info("Deleted: {}".format(deleted_keys))
+
+        if not self.silent_on_errors and "Errors" in response:
Review comment: Regarding the `Errors` in the boto3 response, I use the argument `silent_on_errors` to let users decide whether they consider `deleting a non-existent object` an exception and whether the operator should fail. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
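The `silent_on_errors` behaviour discussed in the review above can be sketched independently of S3: boto3's `delete_objects` returns a dict that may carry an `Errors` list even when the HTTP request itself succeeds, and the proposed flag only decides whether that list is fatal. Below, `check_delete_response` is a hypothetical helper written for illustration, not part of the PR.

```python
def check_delete_response(response, silent_on_errors=False):
    """Raise if a boto3-style delete_objects response reports Errors,
    unless silent_on_errors suppresses them (mirroring the PR's flag)."""
    deleted = [x['Key'] for x in response.get("Deleted", [])]
    if not silent_on_errors and "Errors" in response:
        failed = [x['Key'] for x in response["Errors"]]
        raise RuntimeError("Failed to delete: {}".format(failed))
    return deleted


# A fully successful response: only a Deleted list.
ok = check_delete_response({"Deleted": [{"Key": "path/data.txt"}]})

# A response with per-key errors, e.g. deleting a non-existent object.
partial = {"Deleted": [], "Errors": [{"Key": "missing.txt", "Code": "NoSuchKey"}]}

# With the flag set, the Errors list is ignored and no exception is raised.
ignored = check_delete_response(partial, silent_on_errors=True)

# With the default (False), the same response is treated as a failure.
try:
    check_delete_response(partial)
    raised = False
except RuntimeError:
    raised = True
```

This separation - HTTP success vs per-key `Errors` - is exactly why the operator needs an explicit policy flag rather than relying on boto3 to raise.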
[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214945536 ## File path: airflow/hooks/S3_hook.py ##
@@ -384,3 +384,89 @@ def load_bytes(self,
         client = self.get_conn()
         client.upload_fileobj(filelike_buffer, bucket_name, key,
                               ExtraArgs=extra_args)
+
+    def _copy_object(self,
Review comment: I name the methods in `S3Hook()` as `_copy_object` in order to avoid confusion between `boto3.client.copy_object` and `S3Hook.copy_object`. The same applies to `_delete_objects`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214945861 ## File path: tests/contrib/operators/test_s3_delete_objects_operator.py ##
@@ -0,0 +1,111 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import io
+import unittest
+
+import boto3
+from moto import mock_s3
+
+from airflow.contrib.operators.s3_delete_objects_operator import S3DeleteObjectsOperator
+from airflow.exceptions import AirflowException
+
+
+class TestS3DeleteObjectsOperator(unittest.TestCase):
+
+    @mock_s3
+    def test_s3_delete_single_object(self):
+        bucket = "testbucket"
+        key = "path/data.txt"
+
+        conn = boto3.client('s3')
+        conn.create_bucket(Bucket=bucket)
+        conn.upload_fileobj(Bucket=bucket,
+                            Key=key,
+                            Fileobj=io.BytesIO(b"input"))
+
+        # The object should be detected before the DELETE action is taken
+        objects_in_dest_bucket = conn.list_objects(Bucket=bucket,
+                                                   Prefix=key)
+        self.assertEqual(len(objects_in_dest_bucket['Contents']), 1)
+        self.assertEqual(objects_in_dest_bucket['Contents'][0]['Key'], key)
+
+        t = S3DeleteObjectsOperator(task_id="test_task_s3_delete_single_object",
+                                    bucket=bucket,
+                                    keys=key)
+        t.execute(None)
+
+        # There should be no object found in the bucket created earlier
+        self.assertFalse('Contents' in conn.list_objects(Bucket=bucket,
+                                                         Prefix=key))
+
+    @mock_s3
+    def test_s3_delete_multiple_objects(self):
+        bucket = "testbucket"
+        key_pattern = "path/data"
+        n_keys = 3
+        keys = [key_pattern + str(i) for i in range(n_keys)]
+
+        conn = boto3.client('s3')
+        conn.create_bucket(Bucket=bucket)
+        for k in keys:
+            conn.upload_fileobj(Bucket=bucket,
+                                Key=k,
+                                Fileobj=io.BytesIO(b"input"))
+
+        # The objects should be detected before the DELETE action is taken
+        objects_in_dest_bucket = conn.list_objects(Bucket=bucket,
+                                                   Prefix=key_pattern)
+        self.assertEqual(len(objects_in_dest_bucket['Contents']), n_keys)
+        self.assertEqual(sorted([x['Key'] for x in objects_in_dest_bucket['Contents']]),
+                         sorted(keys))
+
+        t = S3DeleteObjectsOperator(task_id="test_task_s3_delete_multiple_objects",
+                                    bucket=bucket,
+                                    keys=keys)
+        t.execute(None)
+
+        # There should be no object found in the bucket created earlier
+        self.assertFalse('Contents' in conn.list_objects(Bucket=bucket,
+                                                         Prefix=key_pattern))
+
+    @mock_s3
+    def test_s3_delete_non_existent_object(self):
Review comment: A test case is added to test the argument `silent_on_errors`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214946897 ## File path: airflow/contrib/operators/s3_delete_objects_operator.py ##
@@ -0,0 +1,92 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from airflow.exceptions import AirflowException
+from airflow.hooks.S3_hook import S3Hook
+from airflow.models import BaseOperator
+from airflow.utils.decorators import apply_defaults
+
+
+class S3DeleteObjectsOperator(BaseOperator):
Review comment: Personally I still suggest not supporting S3-style URLs in `S3DeleteObjectsOperator`. This is to keep the argument combination clear, and to support deleting a single object and deleting multiple objects within one operator. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
ashb commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214947531 ## File path: airflow/hooks/S3_hook.py ## @@ -384,3 +384,89 @@ def load_bytes(self, client = self.get_conn() client.upload_fileobj(filelike_buffer, bucket_name, key, ExtraArgs=extra_args) + +def _copy_object(self, Review comment: I wouldn't. A method prefixed with `_` in Python is usually an indication that it is "private" and shouldn't be called from outside the class. Given that the S3Hook doesn't directly expose any methods from the boto client/session object, I think the chance of confusion is slight (and a docstring highlighting any differences will help dispel confusion).
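The underscore convention discussed in this review can be sketched as follows. This is illustrative only: `FakeS3Hook` and its methods are hypothetical names, not the real `S3Hook` API.

```python
class FakeS3Hook:
    """Illustrative stand-in for a hook class (not the real S3Hook)."""

    def copy_object(self, source_key, dest_key):
        # No leading underscore: part of the public API, intended to be
        # called from operators and user code.
        return (source_key, dest_key)

    def _parse_s3_url(self, url):
        # Leading underscore: a "private" helper by convention; code
        # outside the class should not rely on it.
        bucket, _, key = url[len("s3://"):].partition("/")
        return bucket, key
```

Python does not enforce this privacy; the prefix is purely a signal to readers, which is why the review argues against using it for methods meant to be called by operators.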
[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-418398128 Hi @ashb, I have addressed your earlier review comments. Could you take another look? Thanks!
[GitHub] XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on a change in pull request #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#discussion_r214949579 ## File path: airflow/hooks/S3_hook.py ## @@ -384,3 +384,89 @@ def load_bytes(self, client = self.get_conn() client.upload_fileobj(filelike_buffer, bucket_name, key, ExtraArgs=extra_args) + +def _copy_object(self, Review comment: Sure, I will change this part. Actually, the chance of misuse is low given that the required arguments are totally different.
[jira] [Assigned] (AIRFLOW-2062) Support fine-grained Connection encryption
[ https://issues.apache.org/jira/browse/AIRFLOW-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2062: -- Assignee: Jasper Kahn > Support fine-grained Connection encryption > -- > > Key: AIRFLOW-2062 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2062 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Wilson Lian >Assignee: Jasper Kahn >Priority: Minor > > This effort targets containerized tasks (e.g., those launched by > KubernetesExecutor). Under that paradigm, each task could potentially operate > under different credentials, and fine-grained Connection encryption will > enable an administrator to restrict which connections can be accessed by > which tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
XD-DENG commented on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-418412622 Hi @ashb, I have addressed the naming of methods within `S3Hook()` (removed the leading `_`). Any other comments? Thanks!
[GitHub] codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting
codecov-io edited a comment on issue #3823: [AIRFLOW-2985] An operator for S3 object copying/deleting URL: https://github.com/apache/incubator-airflow/pull/3823#issuecomment-417296502 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=h1) Report > Merging [#3823](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **increase** coverage by `0.01%`. > The diff coverage is `90.47%`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3823/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=tree)
```diff
@@            Coverage Diff             @@
##           master    #3823      +/-   ##
==========================================
+ Coverage   77.43%   77.45%   +0.01%
==========================================
  Files         203      203
  Lines       15846    15867      +21
==========================================
+ Hits        12271    12290      +19
- Misses       3575     3577       +2
```
| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3823/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5) | `94.32% <90.47%> (-0.68%)` | :arrow_down: |
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=footer). Last update [da052ff...3144472](https://codecov.io/gh/apache/incubator-airflow/pull/3823?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk
[ https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603239#comment-16603239 ] jack commented on AIRFLOW-2999: --- [~XD-DENG] seems like your territory if you would like to take it :) > S3_hook - add the ability to download file to local disk > - > > Key: AIRFLOW-2999 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2999 > Project: Apache Airflow > Issue Type: Task >Affects Versions: 1.10.0 >Reporter: jack >Priority: Major > > The [S3_hook > |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177] > has a get_key method that returns a boto3.s3.Object; it also has a load_file method > which loads a file from the local file system to S3. > > What it doesn't have is a method to download a file from S3 to the local file > system. > Basically it should be something very simple... an extension to the get_key > method with a parameter for the destination on the local file system, adding code > that takes the boto3.s3.Object and saves it on disk. Note that it can be > more than one file if the user chooses a folder in S3. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file
Jeffrey Payne created AIRFLOW-3002: -- Summary: ValueError in dataflow operators when using GCS jar or py_file Key: AIRFLOW-3002 URL: https://issues.apache.org/jira/browse/AIRFLOW-3002 Project: Apache Airflow Issue Type: Bug Components: contrib, Dataflow Affects Versions: 1.9.0, 2.0.0 Reporter: Jeffrey Payne Assignee: Kaxil Naik Fix For: 1.10.1 The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to compare a list to an int, resulting in a TypeError: {noformat} ... path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/') if path_components < 2: ... {noformat} This should be {{if len(path_components) < 2:}}. Also, fix {{if file_size > 0:}} in the same function... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file
[ https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Payne updated AIRFLOW-3002: --- Description: The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with a ValueError, with: {noformat} ... file_size = self._gcs_hook.download(bucket_id, object_id, local_file) if os.stat(file_size).st_size > 0: return local_file ... {noformat} The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed in is actually the downloaded bytes from {{GoogleCloudStorageHook.download()}}. The error is like: {noformat} [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export File "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py", line 372, in google_cloud_to_local [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export if os.stat(file_size).st_size > 0: [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export ValueError: stat: embedded null character in path {noformat} was: The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to compare a list to an int, resulting in the TypeError, with: {noformat} ... path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/') if path_components < 2: ... {noformat} This should be {{if len(path_components) < 2:}}. Also, fix {{if file_size > 0:}} in same function... > ValueError in dataflow operators when using GCS jar or py_file > -- > > Key: AIRFLOW-3002 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3002 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, Dataflow >Affects Versions: 1.9.0, 2.0.0 >Reporter: Jeffrey Payne >Assignee: Kaxil Naik >Priority: Major > Fix For: 1.10.1 > > > The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with > a ValueError, with: > {noformat} > ... 
> file_size = self._gcs_hook.download(bucket_id, object_id, local_file) > if os.stat(file_size).st_size > 0: > return local_file > ... > {noformat} > The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed > in is actually the downloaded bytes from > {{GoogleCloudStorageHook.download()}}. > The error is like: > {noformat} > [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export File > "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py", > line 372, in google_cloud_to_local > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export if os.stat(file_size).st_size > 0: > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export ValueError: stat: embedded null character in path > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file
[ https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603276#comment-16603276 ] Jeffrey Payne commented on AIRFLOW-3002: [~kaxilnaik] Opening a PR for this. Change should just be to {{if os.stat(local_file).st_size > 0:}}, no? > ValueError in dataflow operators when using GCS jar or py_file > -- > > Key: AIRFLOW-3002 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3002 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, Dataflow >Affects Versions: 1.9.0, 2.0.0 >Reporter: Jeffrey Payne >Assignee: Kaxil Naik >Priority: Major > Fix For: 1.10.1 > > > The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with > a ValueError, with: > {noformat} > ... > file_size = self._gcs_hook.download(bucket_id, object_id, local_file) > if os.stat(file_size).st_size > 0: > return local_file > ... > {noformat} > The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed > in is actually the downloaded bytes from > {{GoogleCloudStorageHook.download()}}. > The error is like: > {noformat} > [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export File > "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py", > line 372, in google_cloud_to_local > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export if os.stat(file_size).st_size > 0: > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export ValueError: stat: embedded null character in path > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843 …le_cloud_to_local Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3002 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file
[ https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603287#comment-16603287 ] ASF GitHub Bot commented on AIRFLOW-3002: - jeffkpayne opened a new pull request #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843 …le_cloud_to_local Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3002 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [ ] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ValueError in dataflow operators when using GCS jar or py_file > -- > > Key: AIRFLOW-3002 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3002 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, Dataflow >Affects Versions: 1.9.0, 2.0.0 >Reporter: Jeffrey Payne >Assignee: Kaxil Naik >Priority: Major > Fix For: 1.10.1 > > > The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with > a ValueError, with: > {noformat} > ... > file_size = self._gcs_hook.download(bucket_id, object_id, local_file) > if os.stat(file_size).st_size > 0: > return local_file > ... > {noformat} > The {{os.stat()}} function takes a _path_, but the {{file_size}} var passed > in is actually the downloaded bytes from > {{GoogleCloudStorageHook.download()}}. > The error is like: > {noformat} > [2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export File > "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py", > line 372, in google_cloud_to_local > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export if os.stat(file_size).st_size > 0: > [2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask > surge_export ValueError: stat: embedded null character in path > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)
[ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302 ] Trevor Edwards commented on AIRFLOW-2319: - +1 to this issue. There is an id column, but aside from this, it seems like only the pair (dag_id, [run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) should be enforced as unique. The current behavior feels like a bug. This issue becomes problematic if you have event-driven DAGs (e.g. https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which may execute simultaneously with different parameters, causing an execution_date collision. Andreas, are you working on a fix for this? > Table "dag_run" has (bad) second index on (dag_id, execution_date) > -- > > Key: AIRFLOW-2319 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2319 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.9.0 >Reporter: Andreas Költringer >Priority: Major > > Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} > (multiple rows with the same {{(dag_id, execution_date)}}) raised the > following error: > {code:java} > {models.py:1644} ERROR - No row was found for one(){code} > This is weird as the {{session.add()}} and {{session.commit()}} is right > before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}. 
> Manually inspecting the database revealed that there is an extra index with > {{unique}} constraint on the columns {{(dag_id, execution_date)}}: > {code:java} > sqlite> .schema dag_run > CREATE TABLE dag_run ( > id INTEGER NOT NULL, > dag_id VARCHAR(250), > execution_date DATETIME, > state VARCHAR(50), > run_id VARCHAR(250), > external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date > DATETIME, > PRIMARY KEY (id), > UNIQUE (dag_id, execution_date), > UNIQUE (dag_id, run_id), > CHECK (external_trigger IN (0, 1)) > ); > CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code} > (On SQLite it's a unique constraint; on MariaDB it's also an index) > The {{DagRun}} class in {{models.py}} does not reflect this, however it is in > [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42] > I looked for other migrations correcting this, but could not find any. As this > is not reflected in the model, I guess this is a bug? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
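The collision described in this issue is easy to reproduce against the migration's schema. The sqlite3 sketch below trims the table to the relevant columns; two externally triggered runs with distinct run_ids but the same execution_date trip the extra (dag_id, execution_date) constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the dag_run schema from the .schema output above.
conn.execute(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id VARCHAR(250),
        execution_date DATETIME,
        run_id VARCHAR(250),
        UNIQUE (dag_id, execution_date),
        UNIQUE (dag_id, run_id)
    )
    """
)
conn.execute(
    "INSERT INTO dag_run (dag_id, execution_date, run_id) "
    "VALUES ('my_dag', '2018-09-04 00:00:00', 'run_1')"
)
# A second run with a different run_id but the same execution_date:
# only the (dag_id, run_id) pair differs, yet the insert fails.
try:
    conn.execute(
        "INSERT INTO dag_run (dag_id, execution_date, run_id) "
        "VALUES ('my_dag', '2018-09-04 00:00:00', 'run_2')"
    )
    collided = False
except sqlite3.IntegrityError:
    collided = True
```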
[GitHub] feng-tao commented on a change in pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira
feng-tao commented on a change in pull request #3841: [AIRFLOW-XXX] Fix python3 errors in dev/airflow-jira URL: https://github.com/apache/incubator-airflow/pull/3841#discussion_r214988582 ## File path: dev/airflow-jira ## @@ -134,7 +134,7 @@ def compare(target_version): for issue in issues: is_merged = issue.key in merges -print("{:<18}|{:<12}||{:<10}||{:<10}|{:<50}|{:<6}|{:<6}|{:<40}" +print("{:<18}|{!s:<12}||{!s:<10}||{!s:<10}|{:<50}|{:<6}|{:<6}|{:<40}" Review comment: what is this line change trying to fix?
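For context on the `!s` change asked about in this review: without the conversion, `str.format` passes the alignment spec to the value's own `__format__`, which several types (e.g. `None`) reject under Python 3; `!s` applies `str()` first, so the alignment always works. A quick illustration:

```python
# Under Python 3, aligning a non-string value directly can raise.
try:
    "{:<12}".format(None)
    raised = False
except TypeError:
    # "unsupported format string passed to NoneType.__format__"
    raised = True

# With !s the value is converted via str() before alignment is applied,
# so any value can be padded into a fixed-width column.
padded = "{!s:<12}".format(None)
```

Python 2's `object.__format__` tolerated non-empty format specs, which is presumably why the original line worked before this python3 fix.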
[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442065 Good spot. Thanks!
[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442350 Will merge once the Travis passes.
[GitHub] kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
kaxil commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418443141 @jeffkpayne Can you please add a unit test for this as well?
[GitHub] kaxil removed a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
kaxil removed a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418442350 Will merge once the Travis passes.
[GitHub] kaxil edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
kaxil edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418443141 @jeffkpayne Can you please add a unit test for this as well? Also, when you push new commits, remember to squash them and make sure that the subject is limited to 50 characters (not including the Jira issue reference), as we use this in CHANGELOG.md. Thanks, appreciate the effort.
[GitHub] codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1) Report > Merging [#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
```diff
@@           Coverage Diff            @@
##           master    #3843   +/-   ##
=======================================
  Coverage   77.43%   77.43%
=======================================
  Files         203      203
  Lines       15846    15846
=======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer). Last update [da052ff...df9ef49](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
codecov-io commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1) Report > Merging [#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
```diff
@@           Coverage Diff            @@
##           master    #3843   +/-   ##
=======================================
  Coverage   77.43%   77.43%
=======================================
  Files         203      203
  Lines       15846    15846
=======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer). Last update [da052ff...df9ef49](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] feng-tao commented on issue #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date'
feng-tao commented on issue #3840: [AIRFLOW-3001] Add task_instance table index 'ti_dag_date' URL: https://github.com/apache/incubator-airflow/pull/3840#issuecomment-418449675 Could you add more description about what this change does? Why do we need a new index? And I think we need a new Alembic script for any model change instead of modifying an existing one.
[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418455599 @kaxil Will do, but wrt squashed commits, I only had one commit so far ;)
[jira] [Created] (AIRFLOW-3003) Pull the krb5 image instead of building it
Fokko Driesprong created AIRFLOW-3003: - Summary: Pull the krb5 image instead of building it Key: AIRFLOW-3003 URL: https://issues.apache.org/jira/browse/AIRFLOW-3003 Project: Apache Airflow Issue Type: Bug Reporter: Fokko Driesprong For the CI we use a krb5 image to test Kerberos functionality. Building it ourselves is not something we want to do, since it is faster to pull the finished image. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3003) Pull the krb5 image instead of building it
[ https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong reassigned AIRFLOW-3003: - Assignee: Fokko Driesprong > Pull the krb5 image instead of building it > -- > > Key: AIRFLOW-3003 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3003 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > > For the CI we use a krb5 image to test kerberos functionality. This is not > something that we want to since it is faster to pull the finished image. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3003) Pull the krb5 image instead of building it
[ https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong updated AIRFLOW-3003: -- Issue Type: Improvement (was: Bug) > Pull the krb5 image instead of building it > -- > > Key: AIRFLOW-3003 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3003 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > > For the CI we use a krb5 image to test kerberos functionality. This is not > something that we want to since it is faster to pull the finished image.
[GitHub] r39132 commented on a change in pull request #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime
r39132 commented on a change in pull request #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime URL: https://github.com/apache/incubator-airflow/pull/3834#discussion_r215010936 ## File path: airflow/bin/cli.py ## @@ -551,6 +551,17 @@ def dag_state(args): print(dr[0].state if len(dr) > 0 else None) +@cli_utils.action_logging Review comment: @XD-DENG I just tested this with some of the example dags in https://github.com/apache/incubator-airflow/tree/master/airflow/example_dags. Can you test your code with different schedule types, including `@once`, `daily/weekly`, `timedelta(hours=1)`, etc., in addition to the cron-expression case you provided? Also, can you add tests for these?
```
(venv) sianand@LM-SJN-21002367:~/Projects/airflow_incubator $ airflow next_execution latest_only
[2018-09-04 10:52:19,613] {__init__.py:51} INFO - Using executor SequentialExecutor
/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py:1724: DeprecationWarning: The celeryd_concurrency option in [celery] has been renamed to worker_concurrency - the old setting has been used, but please update your config.
  default=conf.get('celery', 'worker_concurrency')),
[2018-09-04 10:52:19,822] {models.py:260} INFO - Filling up the DagBag from /Users/sianand/Projects/airflow_incubator/dags
[2018-09-04 10:52:19,882] {example_kubernetes_operator.py:55} WARNING - Could not import KubernetesPodOperator: No module named 'kubernetes'
[2018-09-04 10:52:19,882] {example_kubernetes_operator.py:56} WARNING - Install kubernetes dependencies with: pip install apache-airflow[kubernetes]
Traceback (most recent call last):
  File "/Users/sianand/miniconda3/bin/airflow", line 4, in <module>
    __import__('pkg_resources').run_script('apache-airflow==2.0.0.dev0+incubating', 'airflow')
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1434, in run_script
    exec(code, namespace, namespace)
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow", line 32, in <module>
    args.func(args)
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/bin/cli.py", line 562, in next_execution
    print(dag.following_schedule(dag.latest_execution_date))
  File "/Users/sianand/miniconda3/lib/python3.6/site-packages/apache_airflow-2.0.0.dev0+incubating-py3.6.egg/airflow/models.py", line 3371, in following_schedule
    return dttm + self._schedule_interval
TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
```
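The TypeError at the bottom of the traceback above occurs because `dag.latest_execution_date` is `None` for a DAG that has never run, so `following_schedule` ends up adding a timedelta to `None`. A minimal sketch of the kind of guard being asked for, using a toy stand-in class rather than the real `airflow.models.DAG` (the `Dag` class and `next_execution` function here are illustrative assumptions, not Airflow's actual code):

```python
from datetime import datetime, timedelta


class Dag:
    """Toy stand-in for airflow.models.DAG with a timedelta schedule."""

    def __init__(self, schedule_interval, latest_execution_date=None):
        self._schedule_interval = schedule_interval
        self.latest_execution_date = latest_execution_date

    def following_schedule(self, dttm):
        if dttm is None:
            return None  # guard: no previous run, nothing to add a timedelta to
        return dttm + self._schedule_interval


def next_execution(dag):
    # Mirrors the CLI call in the traceback, but returns None for a
    # never-run DAG instead of raising TypeError.
    following = dag.following_schedule(dag.latest_execution_date)
    print(following)
    return following


never_ran = Dag(timedelta(hours=1))
assert next_execution(never_ran) is None

ran_once = Dag(timedelta(hours=1), datetime(2018, 9, 4, 10, 0))
assert next_execution(ran_once) == datetime(2018, 9, 4, 11, 0)
```

With a guard like this, `airflow next_execution` would print `None` for a DAG that has not run yet instead of crashing.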
[jira] [Created] (AIRFLOW-3004) Add configuration option to disable schedules
Jacob Greenfield created AIRFLOW-3004: - Summary: Add configuration option to disable schedules Key: AIRFLOW-3004 URL: https://issues.apache.org/jira/browse/AIRFLOW-3004 Project: Apache Airflow Issue Type: Improvement Components: configuration, scheduler Reporter: Jacob Greenfield Assignee: Jacob Greenfield We have a particular use case where we'd like a configuration option that controls the scheduler and globally disables cron schedules for all DAGs, while still allowing manual submission (trigger_dag) and the scheduling of the resulting task instances.
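The requested semantics can be sketched as a tiny decision function: with the proposed global option on, every DAG would behave as if it had `schedule_interval=None`, which is how a single DAG is already made manual-trigger-only in Airflow. Everything below is a hypothetical model of the proposal, not existing Airflow configuration:

```python
def effective_schedule(dag_schedule, disable_schedules=False):
    """Return the schedule the scheduler should honor.

    disable_schedules models the proposed global config option: when set,
    every DAG behaves as if schedule_interval=None, i.e. it only runs when
    triggered manually via trigger_dag. This option does not exist yet.
    """
    if disable_schedules:
        return None
    return dag_schedule


# Normal operation: the DAG's own cron schedule is used.
assert effective_schedule("0 0 * * *") == "0 0 * * *"
# Proposed global switch: schedules are suppressed for all DAGs.
assert effective_schedule("0 0 * * *", disable_schedules=True) is None
```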
[GitHub] Fokko commented on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime
Fokko commented on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime URL: https://github.com/apache/incubator-airflow/pull/3834#issuecomment-418470603 I've tried working with the next_execution but stumbled on some problems. For example, if the dag hasn't run yet, I get an error since it tries to fetch it from the database. Please keep this in mind. Personally I would prefer a next_execution date that is computed based on the schedule instead of having the scheduler fill this.
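Computing next_execution from the schedule itself, as suggested above, avoids the database lookup that fails for never-run DAGs. A sketch for timedelta schedules only (the helper name is made up; cron expressions would need a library such as croniter, which Airflow itself uses for cron arithmetic):

```python
from datetime import datetime, timedelta


def next_execution_from_schedule(start_date, schedule, now=None):
    """Compute the next execution date from the schedule alone, without
    consulting the metadata database, so it also works for DAGs that have
    never run. Handles only timedelta schedules; the function name is
    hypothetical, not Airflow API.
    """
    now = now or datetime.utcnow()
    if now <= start_date:
        return start_date
    elapsed = now - start_date
    periods = elapsed // schedule + 1  # first whole period strictly after `now`
    return start_date + periods * schedule


start = datetime(2018, 9, 1)
# At Sep 4 12:00, the next daily run after the Sep 1 start is Sep 5 00:00.
assert next_execution_from_schedule(
    start, timedelta(days=1), now=datetime(2018, 9, 4, 12)) == datetime(2018, 9, 5)
```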
[GitHub] Fokko removed a comment on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime
Fokko removed a comment on issue #3834: [AIRFLOW-2965] CLI tool to show the next execution datetime URL: https://github.com/apache/incubator-airflow/pull/3834#issuecomment-418470603 I've tried working with the next_execution but stumbled on some problems. For example, if the dag hasn't run yet, I get an error since it tries to fetch it from the database. Please keep this in mind. Personally I would prefer a next_execution date that is computed based on the schedule instead of having the scheduler fill this.
[GitHub] Fokko commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now …
Fokko commented on issue #3813: [AIRFLOW-1998] Implemented DatabricksRunNowOperator for jobs/run-now … URL: https://github.com/apache/incubator-airflow/pull/3813#issuecomment-418474040 No problem @isknight @andrewmchen any final thoughts?
[GitHub] Fokko commented on issue #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828)
Fokko commented on issue #3842: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (#3828) URL: https://github.com/apache/incubator-airflow/pull/3842#issuecomment-418474332 @wmorris75 Something obviously went wrong.
[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418490734 Looks like the 2.7 builds are failing. Looking into this...
[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418491999 Gah, found it...
[GitHub] codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators
codecov-io edited a comment on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators URL: https://github.com/apache/incubator-airflow/pull/3828#issuecomment-417764201 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=h1) Report > Merging [#3828](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3828/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=tree)
```diff
@@           Coverage Diff           @@
##           master   #3828   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15846    15846
======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=footer). Last update [da052ff...2463827](https://codecov.io/gh/apache/incubator-airflow/pull/3828?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators
wmorris75 commented on issue #3828: [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators URL: https://github.com/apache/incubator-airflow/pull/3828#issuecomment-418505257 I made some fixes to the commits with the most recent push. Hopefully that should resolve the commit issues that came up earlier.
[GitHub] codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
codecov-io edited a comment on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418446140 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=h1) Report > Merging [#3843](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3843/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=tree)
```diff
@@           Coverage Diff           @@
##           master   #3843   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15846    15846
======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=footer). Last update [da052ff...9413435](https://codecov.io/gh/apache/incubator-airflow/pull/3843?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image instead of building
Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image instead of building URL: https://github.com/apache/incubator-airflow/pull/3844 Pull the image instead of building it, this will speed up the CI process since we don't have to build it every time. I did a test, on the current master (17.430 seconds in total): https://travis-ci.org/Fokko/incubator-airflow/builds/424438601 The PR takes 17.004 seconds in total: https://travis-ci.org/Fokko/incubator-airflow/builds/424479941 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-3003\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3003 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-3003\], code changes always need a Jira issue. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Commented] (AIRFLOW-3003) Pull the krb5 image instead of building it
[ https://issues.apache.org/jira/browse/AIRFLOW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603581#comment-16603581 ] ASF GitHub Bot commented on AIRFLOW-3003: - Fokko opened a new pull request #3844: [AIRFLOW-3003] Pull the krb5 image instead of building URL: https://github.com/apache/incubator-airflow/pull/3844 Pull the image instead of building it, this will speed up the CI process since we don't have to build it every time. I did a test, on the current master (17.430 seconds in total): https://travis-ci.org/Fokko/incubator-airflow/builds/424438601 The PR takes 17.004 seconds in total: https://travis-ci.org/Fokko/incubator-airflow/builds/424479941 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-3003\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3003 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-3003\], code changes always need a Jira issue. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. 
### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` > Pull the krb5 image instead of building it > -- > > Key: AIRFLOW-3003 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3003 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > > For the CI we use a krb5 image to test kerberos functionality. This is not > something that we want to since it is faster to pull the finished image.
[jira] [Created] (AIRFLOW-3005) Replace 'Airbnb Airflow' with 'Apache Airflow'
Kaxil Naik created AIRFLOW-3005: --- Summary: Replace 'Airbnb Airflow' with 'Apache Airflow' Key: AIRFLOW-3005 URL: https://issues.apache.org/jira/browse/AIRFLOW-3005 Project: Apache Airflow Issue Type: Improvement Components: docs, Documentation Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 2.0.0 There are still many files where Airbnb is mentioned or the links point to broken pages.
[GitHub] kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'
kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' URL: https://github.com/apache/incubator-airflow/pull/3845 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3005 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[jira] [Commented] (AIRFLOW-3005) Replace 'Airbnb Airflow' with 'Apache Airflow'
[ https://issues.apache.org/jira/browse/AIRFLOW-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603601#comment-16603601 ] ASF GitHub Bot commented on AIRFLOW-3005: - kaxil opened a new pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' URL: https://github.com/apache/incubator-airflow/pull/3845 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3005 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
> Replace 'Airbnb Airflow' with 'Apache Airflow' > -- > > Key: AIRFLOW-3005 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3005 > Project: Apache Airflow > Issue Type: Improvement > Components: docs, Documentation >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > There are still many files where Airbnb is mentioned or the links point to > broken pages.
[GitHub] codecov-io commented on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building
codecov-io commented on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building URL: https://github.com/apache/incubator-airflow/pull/3844#issuecomment-418521145 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=h1) Report > Merging [#3844](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3844/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=tree)
```diff
@@           Coverage Diff           @@
##           master   #3844   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15846    15846
======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=footer). Last update [da052ff...3336883](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io edited a comment on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building
codecov-io edited a comment on issue #3844: [AIRFLOW-3003] Pull the krb5 image instead of building URL: https://github.com/apache/incubator-airflow/pull/3844#issuecomment-418521145 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=h1) Report > Merging [#3844](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/da052ff7315cfd96a0deaab9577ba18a4089749e?src=pr&el=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3844/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=tree)
```diff
@@           Coverage Diff           @@
##           master   #3844   +/-   ##
======================================
  Coverage   77.43%   77.43%
======================================
  Files         203      203
  Lines       15846    15846
======================================
  Hits        12271    12271
  Misses       3575     3575
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=footer). Last update [da052ff...3336883](https://codecov.io/gh/apache/incubator-airflow/pull/3844?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'
feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' URL: https://github.com/apache/incubator-airflow/pull/3845#discussion_r215070750 ## File path: tests/sensors/test_http_sensor.py ## @@ -178,7 +178,7 @@ def test_get_response_check(self): method='GET', endpoint='/search', data={"client": "ubuntu", "q": "airflow"}, -response_check=lambda response: ("airbnb/airflow" in response.text), +response_check=lambda response: ("apache/airflow" in response.text), Review comment: same
[GitHub] feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'
feng-tao commented on a change in pull request #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' URL: https://github.com/apache/incubator-airflow/pull/3845#discussion_r215070725 ## File path: tests/sensors/test_http_sensor.py ## @@ -140,7 +140,7 @@ class FakeSession(object): def __init__(self): self.response = requests.Response() self.response.status_code = 200 -self.response._content = 'airbnb/airflow'.encode('ascii', 'ignore') +self.response._content = 'apache/airflow'.encode('ascii', 'ignore') Review comment: should it be apache/incubator-airflow?
[jira] [Commented] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)
[ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603611#comment-16603611 ] Andreas Költringer commented on AIRFLOW-2319: - I was thinking about possible fixes (see my comment above). The problem is the different database backends - e.g. Sqlite does not support dropping uniqueness constraints. Lacking confirmation by the project's top committers/leaders that this is actually a bug (and not intended due to some reason I might not see), I did not proceed. > Table "dag_run" has (bad) second index on (dag_id, execution_date) > -- > > Key: AIRFLOW-2319 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2319 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.9.0 >Reporter: Andreas Költringer >Priority: Major > > Inserting DagRun's via {{airflow.api.common.experimental.trigger_dag}} > (multiple rows with the same {{(dag_id, execution_date)}}) raised the > following error: > {code:java} > {models.py:1644} ERROR - No row was found for one(){code} > This is weird as the {{session.add()}} and {{session.commit()}} is right > before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}. 
> Manually inspecting the database revealed that there is an extra index with > {{unique}} constraint on the columns {{(dag_id, execution_date)}}: > {code:java} > sqlite> .schema dag_run > CREATE TABLE dag_run ( > id INTEGER NOT NULL, > dag_id VARCHAR(250), > execution_date DATETIME, > state VARCHAR(50), > run_id VARCHAR(250), > external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date > DATETIME, > PRIMARY KEY (id), > UNIQUE (dag_id, execution_date), > UNIQUE (dag_id, run_id), > CHECK (external_trigger IN (0, 1)) > ); > CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code} > (On SQLite it's a unique constraint; on MariaDB it's also an index) > The {{DagRun}} class in {{models.py}} does not reflect this; however, it is in > [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42] > I looked for other migrations correcting this, but could not find any. As this > is not reflected in the model, I guess this is a bug? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
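The collision described in this issue is easy to reproduce outside Airflow. The following sketch uses the stdlib `sqlite3` module and a trimmed-down `dag_run` schema (columns abbreviated from the `.schema` output quoted above; this is an illustration, not Airflow's actual code path):

```python
import sqlite3

# Trimmed-down dag_run table carrying the extra UNIQUE constraint
# on (dag_id, execution_date) described in the issue.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE dag_run (
        id INTEGER NOT NULL,
        dag_id VARCHAR(250),
        execution_date DATETIME,
        run_id VARCHAR(250),
        PRIMARY KEY (id),
        UNIQUE (dag_id, execution_date),
        UNIQUE (dag_id, run_id)
    )
""")

insert = ("INSERT INTO dag_run (dag_id, execution_date, run_id) "
          "VALUES (?, ?, ?)")
conn.execute(insert, ('my_dag', '2018-04-01 00:00:00', 'run_1'))

# A second run with a distinct run_id but the same (dag_id, execution_date)
# is rejected -- this is the collision that simultaneous triggering hits.
try:
    conn.execute(insert, ('my_dag', '2018-04-01 00:00:00', 'run_2'))
    raised = False
except sqlite3.IntegrityError:
    raised = True

print(raised)  # True: the extra unique constraint blocks the second insert
```

If only `(dag_id, run_id)` were enforced as unique, as the commenters suggest, the second insert would succeed.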
[jira] [Comment Edited] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)
[ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302 ] Trevor Edwards edited comment on AIRFLOW-2319 at 9/4/18 9:23 PM: - +1 to this issue. There is an id column, but aside from this, it seems like only the pair (dag_id, [run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) should be enforced as unique. The current behavior feels like a bug. This issue becomes problematic if you have event-driven DAGs (e.g. https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which may have different parameters execute simultaneously, causing an execution_date collision. Andreas, are you working on a fix for this? was (Author: trevoredwards): +1 to this issue. There is an id column, but aside from this, it seems like only the pair (dag_id, [run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) should be enforced as unique. The current behavior feels like a bug. This issue becomes problematic if you have event-driven DAGs (e.g. https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which may have different parameters execute simultaneously, causing an execution_date collision. Andreas, are you working on a fix for this? 
[jira] [Comment Edited] (AIRFLOW-2319) Table "dag_run" has (bad) second index on (dag_id, execution_date)
[ https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302 ] Trevor Edwards edited comment on AIRFLOW-2319 at 9/4/18 9:25 PM: - +1 to this issue. There is an id column, but aside from this, it seems like only the pair (dag_id, [run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) should be enforced as unique. The current behavior feels like a bug. This issue becomes problematic if you have event-driven DAGs (e.g. https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which may have different parameters execute simultaneously, causing an execution_date collision. Andreas, are you working on a fix for this? was (Author: trevoredwards): +1 to this issue. There is an id column, but aside from this, it seems like only the pair (dag_id, [run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) should be enforced as unique. The current behavior feels like a bug. This issue becomes problematic if you have event-driven DAGs (e.g. https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf) which may have different parameters execute simultaneously, causing an execution_date collision. Andreas, are you working on a fix for this? 
[GitHub] yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor
yrqls21 commented on issue #3830: [AIRFLOW-2156] Parallelize Celery Executor URL: https://github.com/apache/incubator-airflow/pull/3830#issuecomment-418531406 @kaxil Tyvm. We definitely should test thoroughly. Just to provide a data point here, the change has been running in Airbnb production for 2+ months, plus more time in a stress-test cluster (we're running 1.8 + the Celery executor). For the Codecov failure, should I rebase to fix it?
[GitHub] kaxil commented on issue #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow'
kaxil commented on issue #3845: [AIRFLOW-3005] Replace 'Airbnb Airflow' with 'Apache Airflow' URL: https://github.com/apache/incubator-airflow/pull/3845#issuecomment-418531520 @feng-tao Made the necessary changes :)
[GitHub] kaxil commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables
kaxil commented on issue #3838: [AIRFLOW-2997] Support for Bigquery clustered tables URL: https://github.com/apache/incubator-airflow/pull/3838#issuecomment-418531811 Can you squash commits?
[GitHub] jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog…
jeffkpayne commented on issue #3843: [AIRFLOW-3002] Correctly test os.stat().st_size of local_file in goog… URL: https://github.com/apache/incubator-airflow/pull/3843#issuecomment-418537507 Added additional test and fixed inconsistent exception message formatting.
[jira] [Created] (AIRFLOW-3006) Error when schedule_interval="None"
Kaxil Naik created AIRFLOW-3006: --- Summary: Error when schedule_interval="None" Key: AIRFLOW-3006 URL: https://issues.apache.org/jira/browse/AIRFLOW-3006 Project: Apache Airflow Issue Type: Improvement Components: core, scheduler Affects Versions: 1.10.0, 1.9.0, 1.8.2 Reporter: Kaxil Naik Assignee: Kaxil Naik Fix For: 1.10.1 When `schedule_interval` is set to `"None"`, it gives the following error:
{code:python}
dag = DAG('params-temp3', default_args=default_args, schedule_interval='None')
{code}
{code:python}
[2018-09-04 23:26:21,515] {dag_processing.py:582} INFO - Started a process (PID: 65903) to generate tasks for /Users/kaxil/airflow/dags/params-temp1.py
Process DagFileProcessor386-Process:
Traceback (most recent call last):
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 388, in helper
    pickle_dags)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 1832, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 1422, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 856, in create_dag_run
    next_run_date = dag.normalize_schedule(min(task_start_dates))
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py", line 3410, in normalize_schedule
    following = self.following_schedule(dttm)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py", line 3353, in following_schedule
    cron = croniter(self._schedule_interval, dttm)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py", line 92, in __init__
    self.expanded, self.nth_weekday_of_month = self.expand(expr_format)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py", line 467, in expand
    raise CroniterBadCronError(cls.bad_length)
CroniterBadCronError: Exactly 5 or 6 columns has to be specified for iterator expression.
[2018-09-04 23:26:22,657] {dag_processing.py:495} INFO - Processor for /Users/kaxil/airflow/dags/params-temp1.py finished
{code}
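The failure happens because the string 'None' reaches croniter, which expects a cron expression. The fix can be sketched as a small normalization step (the function name below is illustrative, not necessarily the shape of the actual patch): map the documented preset string 'None' to the literal None before any scheduling logic runs.

```python
def normalize_schedule_interval(schedule_interval):
    """Treat the documented preset string 'None' like the literal None,
    so it is never handed to croniter as a cron expression.
    (Illustrative helper, not Airflow's actual API.)"""
    if schedule_interval == 'None':
        return None
    return schedule_interval


# The literal None and the string 'None' now behave the same way,
# while real cron expressions pass through untouched:
assert normalize_schedule_interval('None') is None
assert normalize_schedule_interval(None) is None
assert normalize_schedule_interval('0 * * * *') == '0 * * * *'
```

With this guard in place, `following_schedule` would take the same "no schedule" branch for both values instead of raising `CroniterBadCronError`.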
[GitHub] kaxil opened a new pull request #3846: [AIRFLOW-3006] Fix issue with schedule_interval='None'
kaxil opened a new pull request #3846: [AIRFLOW-3006] Fix issue with schedule_interval='None' URL: https://github.com/apache/incubator-airflow/pull/3846

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-3006
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

When `schedule_interval` is set to `"None"` (not the Python literal `None` but the string 'None') as shown in the example below:

```python
dag = DAG('params-temp3', default_args=default_args, schedule_interval='None')
```

it gives the following error:

```python
[2018-09-04 23:26:21,515] {dag_processing.py:582} INFO - Started a process (PID: 65903) to generate tasks for /Users/kaxil/airflow/dags/params-temp1.py
Process DagFileProcessor386-Process:
Traceback (most recent call last):
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/Users/kaxil/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 388, in helper
    pickle_dags)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 1832, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 1422, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/jobs.py", line 856, in create_dag_run
    next_run_date = dag.normalize_schedule(min(task_start_dates))
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py", line 3410, in normalize_schedule
    following = self.following_schedule(dttm)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/airflow/models.py", line 3353, in following_schedule
    cron = croniter(self._schedule_interval, dttm)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py", line 92, in __init__
    self.expanded, self.nth_weekday_of_month = self.expand(expr_format)
  File "/Users/kaxil/.virtualenvs/tst-pip-airflow/lib/python2.7/site-packages/croniter/croniter.py", line 467, in expand
    raise CroniterBadCronError(cls.bad_length)
CroniterBadCronError: Exactly 5 or 6 columns has to be specified for iterator expression.
[2018-09-04 23:26:22,657] {dag_processing.py:495} INFO - Processor for /Users/kaxil/airflow/dags/params-temp1.py finished
```

Our documentation at https://airflow.apache.org/scheduler.html#dag-runs has listed 'None' as a **preset** since 1.8.2 or even earlier, hence we should accept the string **"None"** as a valid `schedule_interval` in addition to **None**.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  - `test_scheduler_dagrun_none`

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`