[GitHub] vardancse commented on a change in pull request #3994: [AIRFLOW-3136] Add retry_number to TaskInstance Key property to avoid race condition
vardancse commented on a change in pull request #3994: [AIRFLOW-3136] Add retry_number to TaskInstance Key property to avoid race condition
URL: https://github.com/apache/incubator-airflow/pull/3994#discussion_r223258140

## File path: airflow/models.py ##

@@ -1230,7 +1230,7 @@ def key(self):
         """
         Returns a tuple that identifies the task instance uniquely
         """
-        return self.dag_id, self.task_id, self.execution_date
+        return self.dag_id, self.task_id, self.execution_date, self.try_number

Review comment:
@ashb Looking forward to your feedback!

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
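To illustrate the change under review, here is a minimal sketch; it is not Airflow's actual code. `FakeTaskInstance`, `old_key`, `new_key`, and the two result dictionaries are hypothetical stand-ins: the dictionaries play the role of any structure indexed by the task instance key. With the three-field key, a report for try 1 and a report for try 2 land on the same entry; with `try_number` included they stay separate.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a task instance; only the fields used by `key`.
FakeTaskInstance = namedtuple(
    "FakeTaskInstance", ["dag_id", "task_id", "execution_date", "try_number"]
)


def old_key(ti):
    # Key as it was before the diff: retries of the same task share a key.
    return ti.dag_id, ti.task_id, ti.execution_date


def new_key(ti):
    # Key as proposed in the diff: each try gets its own key.
    return ti.dag_id, ti.task_id, ti.execution_date, ti.try_number


when = datetime(2018, 10, 8)
first_try = FakeTaskInstance("example_dag", "example_task", when, 1)
second_try = FakeTaskInstance("example_dag", "example_task", when, 2)

# Assumed stand-ins for a key-indexed result store.
old_results = {}
new_results = {}
for ti, state in ((first_try, "failed"), (second_try, "running")):
    old_results[old_key(ti)] = state  # second write overwrites the first
    new_results[new_key(ti)] = state  # both writes are kept
```

In this sketch `old_results` ends up with a single entry and `new_results` with two, which is the kind of collision the PR title describes as a race condition.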
[jira] [Commented] (AIRFLOW-3132) Allow to specify auto_remove option for DockerOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641283#comment-16641283 ]

Guoqiang Ding commented on AIRFLOW-3132:

GitHub pull request: https://github.com/apache/incubator-airflow/pull/3977

> Allow to specify auto_remove option for DockerOperator
> --
>
> Key: AIRFLOW-3132
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3132
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Guoqiang Ding
> Assignee: Guoqiang Ding
> Priority: Major
>
> Sometimes we want to run a Docker container command just once. The Docker API
> client allows specifying the auto_remove option when starting a container.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] gerardo commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption.
gerardo commented on issue #3805: [AIRFLOW-2062] Add per-connection KMS encryption. URL: https://github.com/apache/incubator-airflow/pull/3805#issuecomment-427702362 @Fokko @jakahn I feel like this PR should be presented as an [Airflow Improvement Proposal](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals), given the changes being introduced to `models.py`. It should be easy to do, as there is a [design doc](https://docs.google.com/document/d/1qaucGw52aoR96swHQqIYN9nQXn2QKSOWvw6-KyI6Wwc/edit) already. BTW @jakahn try rebasing your branch now. It's broken at this moment. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3141) Fix 500 on duration view when dag doesn't exist
[ https://issues.apache.org/jira/browse/AIRFLOW-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641239#comment-16641239 ] ASF GitHub Bot commented on AIRFLOW-3141: - feng-tao closed pull request #3991: [AIRFLOW-3141] Backfill missing sensor tests. URL: https://github.com/apache/incubator-airflow/pull/3991 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/sensors/s3_key_sensor.py b/airflow/sensors/s3_key_sensor.py index a743ba2e0c..462091ff86 100644 --- a/airflow/sensors/s3_key_sensor.py +++ b/airflow/sensors/s3_key_sensor.py @@ -73,10 +73,7 @@ def __init__(self, raise AirflowException('Please provide a bucket_name') else: bucket_name = parsed_url.netloc -if parsed_url.path[0] == '/': -bucket_key = parsed_url.path[1:] -else: -bucket_key = parsed_url.path +bucket_key = parsed_url.path.lstrip('/') else: parsed_url = urlparse(bucket_key) if parsed_url.scheme != '' or parsed_url.netloc != '': @@ -97,5 +94,4 @@ def poke(self, context): if self.wildcard_match: return hook.check_for_wildcard_key(self.bucket_key, self.bucket_name) -else: -return hook.check_for_key(self.bucket_key, self.bucket_name) +return hook.check_for_key(self.bucket_key, self.bucket_name) diff --git a/airflow/sensors/sql_sensor.py b/airflow/sensors/sql_sensor.py index de46e6d3db..d2ef6b3626 100644 --- a/airflow/sensors/sql_sensor.py +++ b/airflow/sensors/sql_sensor.py @@ -51,8 +51,4 @@ def poke(self, context): records = hook.get_records(self.sql) if not records: return False -else: -if str(records[0][0]) in ('0', '',): -return False -else: -return True +return str(records[0][0]) not in ('0', '') diff --git a/tests/sensors/test_s3_key_sensor.py b/tests/sensors/test_s3_key_sensor.py index a6e77058d8..0d7f5678f8 100644 --- 
a/tests/sensors/test_s3_key_sensor.py +++ b/tests/sensors/test_s3_key_sensor.py @@ -17,7 +17,10 @@ # specific language governing permissions and limitations # under the License. +import mock import unittest +from parameterized import parameterized + from airflow.exceptions import AirflowException from airflow.sensors.s3_key_sensor import S3KeySensor @@ -31,7 +34,9 @@ def test_bucket_name_None_and_bucket_key_as_relative_path(self): :return: """ with self.assertRaises(AirflowException): -S3KeySensor(bucket_key="file_in_bucket") +S3KeySensor( +task_id='s3_key_sensor', +bucket_key="file_in_bucket") def test_bucket_name_provided_and_bucket_key_is_s3_url(self): """ @@ -40,5 +45,49 @@ def test_bucket_name_provided_and_bucket_key_is_s3_url(self): :return: """ with self.assertRaises(AirflowException): -S3KeySensor(bucket_key="s3://test_bucket/file", -bucket_name='test_bucket') +S3KeySensor( +task_id='s3_key_sensor', +bucket_key="s3://test_bucket/file", +bucket_name='test_bucket') + +@parameterized.expand([ +['s3://bucket/key', None, 'key', 'bucket'], +['key', 'bucket', 'key', 'bucket'], +]) +def test_parse_bucket_key(self, key, bucket, parsed_key, parsed_bucket): +s = S3KeySensor( +task_id='s3_key_sensor', +bucket_key=key, +bucket_name=bucket, +) +self.assertEqual(s.bucket_key, parsed_key) +self.assertEqual(s.bucket_name, parsed_bucket) + +@mock.patch('airflow.hooks.S3_hook.S3Hook') +def test_poke(self, mock_hook): +s = S3KeySensor( +task_id='s3_key_sensor', +bucket_key='s3://test_bucket/file') + +mock_check_for_key = mock_hook.return_value.check_for_key +mock_check_for_key.return_value = False +self.assertFalse(s.poke(None)) +mock_check_for_key.assert_called_with(s.bucket_key, s.bucket_name) + +mock_hook.return_value.check_for_key.return_value = True +self.assertTrue(s.poke(None)) + +@mock.patch('airflow.hooks.S3_hook.S3Hook') +def test_poke_wildcard(self, mock_hook): +s = S3KeySensor( +task_id='s3_key_sensor', +bucket_key='s3://test_bucket/file', +wildcard_match=True) + 
+mock_check_for_wildcard_key = mock_hook.return_value.check_for_wildcard_key +mock_chec
[GitHub] feng-tao closed pull request #3991: [AIRFLOW-3141] Backfill missing sensor tests.
feng-tao closed pull request #3991: [AIRFLOW-3141] Backfill missing sensor tests.
URL: https://github.com/apache/incubator-airflow/pull/3991

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/sensors/s3_key_sensor.py b/airflow/sensors/s3_key_sensor.py
index a743ba2e0c..462091ff86 100644
--- a/airflow/sensors/s3_key_sensor.py
+++ b/airflow/sensors/s3_key_sensor.py
@@ -73,10 +73,7 @@ def __init__(self,
                 raise AirflowException('Please provide a bucket_name')
             else:
                 bucket_name = parsed_url.netloc
-                if parsed_url.path[0] == '/':
-                    bucket_key = parsed_url.path[1:]
-                else:
-                    bucket_key = parsed_url.path
+                bucket_key = parsed_url.path.lstrip('/')
         else:
             parsed_url = urlparse(bucket_key)
             if parsed_url.scheme != '' or parsed_url.netloc != '':
@@ -97,5 +94,4 @@ def poke(self, context):
         if self.wildcard_match:
             return hook.check_for_wildcard_key(self.bucket_key, self.bucket_name)
-        else:
-            return hook.check_for_key(self.bucket_key, self.bucket_name)
+        return hook.check_for_key(self.bucket_key, self.bucket_name)
diff --git a/airflow/sensors/sql_sensor.py b/airflow/sensors/sql_sensor.py
index de46e6d3db..d2ef6b3626 100644
--- a/airflow/sensors/sql_sensor.py
+++ b/airflow/sensors/sql_sensor.py
@@ -51,8 +51,4 @@ def poke(self, context):
         records = hook.get_records(self.sql)
         if not records:
             return False
-        else:
-            if str(records[0][0]) in ('0', '',):
-                return False
-            else:
-                return True
+        return str(records[0][0]) not in ('0', '')
diff --git a/tests/sensors/test_s3_key_sensor.py b/tests/sensors/test_s3_key_sensor.py
index a6e77058d8..0d7f5678f8 100644
--- a/tests/sensors/test_s3_key_sensor.py
+++ b/tests/sensors/test_s3_key_sensor.py
@@ -17,7 +17,10 @@
 # specific language governing permissions and limitations
 # under the License.

+import mock
 import unittest
+from parameterized import parameterized
+
 from airflow.exceptions import AirflowException
 from airflow.sensors.s3_key_sensor import S3KeySensor

@@ -31,7 +34,9 @@ def test_bucket_name_None_and_bucket_key_as_relative_path(self):
         :return:
         """
         with self.assertRaises(AirflowException):
-            S3KeySensor(bucket_key="file_in_bucket")
+            S3KeySensor(
+                task_id='s3_key_sensor',
+                bucket_key="file_in_bucket")

     def test_bucket_name_provided_and_bucket_key_is_s3_url(self):
         """
@@ -40,5 +45,49 @@ def test_bucket_name_provided_and_bucket_key_is_s3_url(self):
         :return:
         """
         with self.assertRaises(AirflowException):
-            S3KeySensor(bucket_key="s3://test_bucket/file",
-                        bucket_name='test_bucket')
+            S3KeySensor(
+                task_id='s3_key_sensor',
+                bucket_key="s3://test_bucket/file",
+                bucket_name='test_bucket')
+
+    @parameterized.expand([
+        ['s3://bucket/key', None, 'key', 'bucket'],
+        ['key', 'bucket', 'key', 'bucket'],
+    ])
+    def test_parse_bucket_key(self, key, bucket, parsed_key, parsed_bucket):
+        s = S3KeySensor(
+            task_id='s3_key_sensor',
+            bucket_key=key,
+            bucket_name=bucket,
+        )
+        self.assertEqual(s.bucket_key, parsed_key)
+        self.assertEqual(s.bucket_name, parsed_bucket)
+
+    @mock.patch('airflow.hooks.S3_hook.S3Hook')
+    def test_poke(self, mock_hook):
+        s = S3KeySensor(
+            task_id='s3_key_sensor',
+            bucket_key='s3://test_bucket/file')
+
+        mock_check_for_key = mock_hook.return_value.check_for_key
+        mock_check_for_key.return_value = False
+        self.assertFalse(s.poke(None))
+        mock_check_for_key.assert_called_with(s.bucket_key, s.bucket_name)
+
+        mock_hook.return_value.check_for_key.return_value = True
+        self.assertTrue(s.poke(None))
+
+    @mock.patch('airflow.hooks.S3_hook.S3Hook')
+    def test_poke_wildcard(self, mock_hook):
+        s = S3KeySensor(
+            task_id='s3_key_sensor',
+            bucket_key='s3://test_bucket/file',
+            wildcard_match=True)
+
+        mock_check_for_wildcard_key = mock_hook.return_value.check_for_wildcard_key
+        mock_check_for_wildcard_key.return_value = False
+        self.assertFalse(s.poke(None))
+        mock_check_for_wildcard_key.assert_called_with(s.bucket_key, s.bucket_name)
+
+        mock_check_for_wildcard_key.return_value = True
+        self.assertTrue(s.poke(None))
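The two simplifications in this diff can be sketched in isolation. This is a standalone illustration, not the sensor code itself: the function names are hypothetical, and it assumes Python 3's `urllib.parse`. It also shows that the `lstrip('/')` form is not strictly equivalent to the conditional it replaces: it tolerates an empty path and strips every leading slash rather than just one.

```python
from urllib.parse import urlparse


def key_with_conditional(url):
    """Logic removed by the diff: drop at most one leading slash."""
    path = urlparse(url).path
    if path[0] == '/':  # raises IndexError when the path is empty
        return path[1:]
    return path


def key_with_lstrip(url):
    """Logic added by the diff: drop all leading slashes."""
    return urlparse(url).path.lstrip('/')


def sql_poke_result(records):
    """Simplified SqlSensor check from the diff: a first cell of 0 or '' is falsy."""
    if not records:
        return False
    return str(records[0][0]) not in ('0', '')
```

For a typical URL such as `s3://bucket/key` both key functions return `key`. They differ on `s3://bucket` (the conditional version raises `IndexError`, the `lstrip` version returns an empty string) and on `s3://bucket//key` (`/key` versus `key`).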
[GitHub] codecov-io commented on issue #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring
codecov-io commented on issue #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring
URL: https://github.com/apache/incubator-airflow/pull/4016#issuecomment-427689430

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=h1) Report
> Merging [#4016](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/92b54bb0697339d7b2ab89d8bdd926eb8d2273bb?src=pr&el=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4016/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master   #4016   +/-   ##
=======================================
  Coverage    75.5%   75.5%
=======================================
  Files         199     199
  Lines       15949   15949
=======================================
  Hits        12043   12043
  Misses       3906    3906
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=footer). Last update [92b54bb...643c0bd](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring
codecov-io edited a comment on issue #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring
URL: https://github.com/apache/incubator-airflow/pull/4016#issuecomment-427689430

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=h1) Report
> Merging [#4016](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/92b54bb0697339d7b2ab89d8bdd926eb8d2273bb?src=pr&el=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4016/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master   #4016   +/-   ##
=======================================
  Coverage    75.5%   75.5%
=======================================
  Files         199     199
  Lines       15949   15949
=======================================
  Hits        12043   12043
  Misses       3906    3906
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=footer). Last update [92b54bb...643c0bd](https://codecov.io/gh/apache/incubator-airflow/pull/4016?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3893: [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
codecov-io edited a comment on issue #3893: [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
URL: https://github.com/apache/incubator-airflow/pull/3893#issuecomment-424189836

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=h1) Report
> Merging [#3893](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/92b54bb0697339d7b2ab89d8bdd926eb8d2273bb?src=pr&el=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3893/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master   #3893   +/-   ##
=======================================
  Coverage    75.5%   75.5%
=======================================
  Files         199     199
  Lines       15949   15949
=======================================
  Hits        12043   12043
  Misses       3906    3906
```

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=footer). Last update [92b54bb...4acabf9](https://codecov.io/gh/apache/incubator-airflow/pull/3893?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] kaxil opened a new pull request #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring
kaxil opened a new pull request #4016: [AIRFLOW-XXX] Fix Typo in SFTPOperator docstring URL: https://github.com/apache/incubator-airflow/pull/4016 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Fix Typo in SFTPOperator docstring ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: N/a - Docstring change ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. ### Code Quality - [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] neil90 commented on issue #3893: [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
neil90 commented on issue #3893: [AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file
URL: https://github.com/apache/incubator-airflow/pull/3893#issuecomment-427686729

@kaxil and @xnuinside I have updated the operator and added a test case, using test_file_to_wasb.py as a template.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #3989: [AIRFLOW-1945] Autoscale celery workers for airflow added
codecov-io edited a comment on issue #3989: [AIRFLOW-1945] Autoscale celery workers for airflow added
URL: https://github.com/apache/incubator-airflow/pull/3989#issuecomment-426543786

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=h1) Report
> Merging [#3989](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/92b54bb0697339d7b2ab89d8bdd926eb8d2273bb?src=pr&el=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3989/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #3989      +/-   ##
==========================================
- Coverage    75.5%   75.49%   -0.02%
==========================================
  Files         199      199
  Lines       15949    15952       +3
==========================================
  Hits        12043    12043
- Misses       3906     3909       +3
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3989/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.25% <100%> (-0.23%)` | :arrow_down: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=footer). Last update [92b54bb...b4d8180](https://codecov.io/gh/apache/incubator-airflow/pull/3989?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
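The figures in the coverage table above can be checked by hand: coverage is hits divided by lines, and misses are lines minus hits. The sketch below is only a worked check of those numbers; `coverage_percent` is a hypothetical helper, and truncating (rather than rounding) to two decimals is an assumption that happens to match the percentages shown in these reports, not a documented Codecov rule.

```python
def coverage_percent(hits, lines):
    """Hits as a percentage of lines, truncated to two decimals (assumed)."""
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (hits * 10000 // lines) / 100


def misses(hits, lines):
    """Lines that were not executed by the test suite."""
    return lines - hits


# Figures taken from the report for #3989: master, then the PR branch.
master_cov = coverage_percent(12043, 15949)  # 75.5
branch_cov = coverage_percent(12043, 15952)  # 75.49
```

Three new lines with no additional hits are enough to move the reported figure from 75.5% to 75.49%, which is why the report flags a decrease even though the diff coverage itself is listed as 100%.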
[GitHub] codecov-io edited a comment on issue #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log
codecov-io edited a comment on issue #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log
URL: https://github.com/apache/incubator-airflow/pull/3992#issuecomment-426519197

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=h1) Report
> Merging [#3992](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/92b54bb0697339d7b2ab89d8bdd926eb8d2273bb?src=pr&el=desc) will **decrease** coverage by `0.13%`.
> The diff coverage is `51.68%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3992/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #3992      +/-   ##
==========================================
- Coverage    75.5%   75.36%   -0.14%
==========================================
  Files         199      199
  Lines       15949    16025      +76
==========================================
+ Hits        12043    12078      +35
- Misses       3906     3947      +41
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/utils/log/file\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/3992/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZmlsZV90YXNrX2hhbmRsZXIucHk=) | `77.14% <33.33%> (-12.27%)` | :arrow_down: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3992/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `62.98% <4.54%> (-1.51%)` | :arrow_down: |
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3992/diff?src=pr&el=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `69.01% <90%> (+0.15%)` | :arrow_up: |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/3992/diff?src=pr&el=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.18% <90%> (+0.14%)` | :arrow_up: |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=footer). Last update [92b54bb...70132f0](https://codecov.io/gh/apache/incubator-airflow/pull/3992?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
[GitHub] phani8996 commented on a change in pull request #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log
phani8996 commented on a change in pull request #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log
URL: https://github.com/apache/incubator-airflow/pull/3992#discussion_r223224920

## File path: airflow/bin/cli.py ##

@@ -1023,17 +1023,41 @@ def scheduler(args):
 @cli_utils.action_logging
 def serve_logs(args):
     print("Starting flask")
-    import flask
-    flask_app = flask.Flask(__name__)
+    from flask import Flask, request, Response, stream_with_context, send_from_directory
+    flask_app = Flask(__name__)

     @flask_app.route('/log/')
     def serve_logs(filename):  # noqa
+        def tail_logs(logdir, filename, num_lines):
+            logpath = "{logdir}/{filename}".format(logdir=logdir, filename=filename)
+            logsize = os.path.getsize(logpath)
+            if logsize >= 100 * 1024 * 1024:
+                p1 = subprocess.Popen(["tail", "-n " + str(num_lines), filename],
+                                      stdout=subprocess.PIPE, cwd=log)
+                out, err = p1.communicate()
+                out = "Tailing file\n\n" + out.decode("utf-8")
+            else:
+                fl = open("{log}//{filename}".format(log=log, filename=filename), "r")
+                lines = fl.readlines()
+                fl.close()
+                out = "".join(l for l in lines[-num_lines:])
+            line = "* Showing only last {num_lines} lines from {filename} *" \
+                   "\n\n\n{out}".format(num_lines=num_lines, filename=filename, out=out)
+            yield line
+        num_lines = request.args.get("num_lines")
+        try:
+            num_lines = int(num_lines)
+        except ValueError or TypeError:
+            num_lines = None

Review comment:
@jeffkpayne that's a valid point. Logging invalid values is necessary, but logging them here won't identify the origin of the query params. At this point, the only origin of this value is a key in the config. If requests later come from multiple origins, we can log it then. For now, an admin or developer can debug this using the values provided in the Airflow config.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services
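One detail in the quoted diff is worth a standalone illustration. In Python, `except ValueError or TypeError:` first evaluates the expression `ValueError or TypeError`, which yields `ValueError`, so a `TypeError` is not caught. Since `int(None)` raises `TypeError`, a request without a `num_lines` parameter would not be handled by that clause. The sketch below uses hypothetical function names and is not part of the PR; it contrasts the clause as written with the tuple form.

```python
def parse_as_in_diff(raw):
    """Mirrors the except clause quoted in the diff."""
    try:
        return int(raw)
    except ValueError or TypeError:  # equivalent to `except ValueError:`
        return None


def parse_num_lines(raw):
    """Tuple form: returns None when raw is missing or not numeric."""
    try:
        return int(raw)
    except (ValueError, TypeError):
        return None
```

With the tuple form, `parse_num_lines(None)` and `parse_num_lines("abc")` both return `None`, while `parse_as_in_diff(None)` lets the `TypeError` propagate.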
[jira] [Resolved] (AIRFLOW-3055) add get_dataset and get_datasets_list to bigquery_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik resolved AIRFLOW-3055.
-
Resolution: Fixed
Fix Version/s: (was: 1.10.1) 2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/3894

> add get_dataset and get_datasets_list to bigquery_hook
> --
>
> Key: AIRFLOW-3055
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3055
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Iuliia Volkova
> Assignee: Iuliia Volkova
> Priority: Minor
> Fix For: 2.0.0
>
> Add an operator to check whether a dataset exists and an operator to list the
> datasets in BigQuery, implementing:
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/get]
> and
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list]
> I have already done this and will open a PR soon (planned for today).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3055) add get_dataset and get_datasets_list to bigquery_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-3055:

Affects Version/s: (was: 1.10.1)

> add get_dataset and get_datasets_list to bigquery_hook
> --
>
> Key: AIRFLOW-3055
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3055
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Iuliia Volkova
> Assignee: Iuliia Volkova
> Priority: Minor
> Fix For: 2.0.0
>
> Add an operator to check whether a dataset exists and an operator to list the
> datasets in BigQuery, implementing:
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/get]
> and
> [https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list]
> I have already done this and will open a PR soon (planned for today).

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3055) add get_dataset and get_datasets_list to bigquery_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641169#comment-16641169 ] ASF GitHub Bot commented on AIRFLOW-3055: - kaxil closed pull request #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook URL: https://github.com/apache/incubator-airflow/pull/3894 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index dd77df1283..dba4618e35 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -1441,6 +1441,86 @@ def delete_dataset(self, project_id, dataset_id): 'BigQuery job failed. Error was: {}'.format(err.content) ) +def get_dataset(self, dataset_id, project_id=None): +""" +Method returns dataset_resource if dataset exist +and raised 404 error if dataset does not exist + +:param dataset_id: The BigQuery Dataset ID +:type dataset_id: str +:param project_id: The GCP Project ID +:type project_id: str +:return: dataset_resource + +.. seealso:: +For more information, see Dataset Resource content: + https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource +""" + +if not dataset_id or not isinstance(dataset_id, str): +raise ValueError("dataset_id argument must be provided and has " + "a type 'str'. You provided: {}".format(dataset_id)) + +dataset_project_id = project_id if project_id else self.project_id + +try: +dataset_resource = self.service.datasets().get( +datasetId=dataset_id, projectId=dataset_project_id).execute() +self.log.info("Dataset Resource: {}".format(dataset_resource)) +except HttpError as err: +raise AirflowException( +'BigQuery job failed. 
Error was: {}'.format(err.content)) + +return dataset_resource + +def get_datasets_list(self, project_id=None): +""" +Method returns full list of BigQuery datasets in the current project + +.. seealso:: +For more information, see: + https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list + +:param project_id: Google Cloud Project for which you +try to get all datasets +:type project_id: str +:return: datasets_list + +Example of returned datasets_list: :: + + { + "kind":"bigquery#dataset", + "location":"US", + "id":"your-project:dataset_2_test", + "datasetReference":{ + "projectId":"your-project", + "datasetId":"dataset_2_test" + } + }, + { + "kind":"bigquery#dataset", + "location":"US", + "id":"your-project:dataset_1_test", + "datasetReference":{ + "projectId":"your-project", + "datasetId":"dataset_1_test" + } + } +] +""" +dataset_project_id = project_id if project_id else self.project_id + +try: +datasets_list = self.service.datasets().list( +projectId=dataset_project_id).execute()['datasets'] +self.log.info("Datasets List: {}".format(datasets_list)) + +except HttpError as err: +raise AirflowException( +'BigQuery job failed. Error was: {}'.format(err.content)) + +return datasets_list + class BigQueryCursor(BigQueryBaseCursor): """ diff --git a/tests/contrib/hooks/test_bigquery_hook.py b/tests/contrib/hooks/test_bigquery_hook.py index 84fe84043e..77a31f0320 100644 --- a/tests/contrib/hooks/test_bigquery_hook.py +++ b/tests/contrib/hooks/test_bigquery_hook.py @@ -360,6 +360,68 @@ def test_create_empty_dataset_duplicates_call_err(self, {"datasetId": "test_dataset", "projectId": "project_test2"}}) +def test_get_dataset_without_dataset_id(self): +with mock.patch.object(hook.BigQueryHook, 'get_service'): +with self.assertRaises(ValueError): +hook.BigQueryBaseCursor( +mock.Mock(), "test_create_empty_dataset").get_dataset( +dataset_id="", project_id="project_test") + +def test_get_da
[GitHub] kaxil closed pull request #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook
kaxil closed pull request #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook URL: https://github.com/apache/incubator-airflow/pull/3894 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index dd77df1283..dba4618e35 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -1441,6 +1441,86 @@ def delete_dataset(self, project_id, dataset_id): 'BigQuery job failed. Error was: {}'.format(err.content) ) +def get_dataset(self, dataset_id, project_id=None): +""" +Method returns dataset_resource if dataset exist +and raised 404 error if dataset does not exist + +:param dataset_id: The BigQuery Dataset ID +:type dataset_id: str +:param project_id: The GCP Project ID +:type project_id: str +:return: dataset_resource + +.. seealso:: +For more information, see Dataset Resource content: + https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource +""" + +if not dataset_id or not isinstance(dataset_id, str): +raise ValueError("dataset_id argument must be provided and has " + "a type 'str'. You provided: {}".format(dataset_id)) + +dataset_project_id = project_id if project_id else self.project_id + +try: +dataset_resource = self.service.datasets().get( +datasetId=dataset_id, projectId=dataset_project_id).execute() +self.log.info("Dataset Resource: {}".format(dataset_resource)) +except HttpError as err: +raise AirflowException( +'BigQuery job failed. Error was: {}'.format(err.content)) + +return dataset_resource + +def get_datasets_list(self, project_id=None): +""" +Method returns full list of BigQuery datasets in the current project + +.. 
seealso:: +For more information, see: + https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/list + +:param project_id: Google Cloud Project for which you +try to get all datasets +:type project_id: str +:return: datasets_list + +Example of returned datasets_list: :: + + { + "kind":"bigquery#dataset", + "location":"US", + "id":"your-project:dataset_2_test", + "datasetReference":{ + "projectId":"your-project", + "datasetId":"dataset_2_test" + } + }, + { + "kind":"bigquery#dataset", + "location":"US", + "id":"your-project:dataset_1_test", + "datasetReference":{ + "projectId":"your-project", + "datasetId":"dataset_1_test" + } + } +] +""" +dataset_project_id = project_id if project_id else self.project_id + +try: +datasets_list = self.service.datasets().list( +projectId=dataset_project_id).execute()['datasets'] +self.log.info("Datasets List: {}".format(datasets_list)) + +except HttpError as err: +raise AirflowException( +'BigQuery job failed. Error was: {}'.format(err.content)) + +return datasets_list + class BigQueryCursor(BigQueryBaseCursor): """ diff --git a/tests/contrib/hooks/test_bigquery_hook.py b/tests/contrib/hooks/test_bigquery_hook.py index 84fe84043e..77a31f0320 100644 --- a/tests/contrib/hooks/test_bigquery_hook.py +++ b/tests/contrib/hooks/test_bigquery_hook.py @@ -360,6 +360,68 @@ def test_create_empty_dataset_duplicates_call_err(self, {"datasetId": "test_dataset", "projectId": "project_test2"}}) +def test_get_dataset_without_dataset_id(self): +with mock.patch.object(hook.BigQueryHook, 'get_service'): +with self.assertRaises(ValueError): +hook.BigQueryBaseCursor( +mock.Mock(), "test_create_empty_dataset").get_dataset( +dataset_id="", project_id="project_test") + +def test_get_dataset(self): +expected_result = { +"kind": "bigquery#dataset", +"location": "US", +"id": "your-project:dataset_2_test", +"datasetReference": { +"projectId": "your-project", +"
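The two hook methods in the diff above can be sketched without a Google client. This is a minimal illustration, not the merged code: `FakeService`, `FakeDatasets` and `FakeRequest` are hypothetical stand-ins for the object the hook keeps in `self.service`, and the `.get("datasets", [])` fallback is an assumption about projects without datasets, where the diff indexes `['datasets']` directly.

```python
class FakeRequest:
    """Hypothetical stand-in for a request object exposing execute()."""

    def __init__(self, response):
        self._response = response

    def execute(self):
        return self._response


class FakeDatasets:
    """Hypothetical stand-in for service.datasets()."""

    def __init__(self, store):
        self._store = store  # {(project_id, dataset_id): dataset_resource}

    def get(self, datasetId, projectId):
        # The real client would raise HttpError here; the fake raises KeyError.
        return FakeRequest(self._store[(projectId, datasetId)])

    def list(self, projectId):
        found = [res for (proj, _), res in self._store.items() if proj == projectId]
        # Assumption: the "datasets" key is absent when nothing is found.
        return FakeRequest({"datasets": found} if found else {})


class FakeService:
    def __init__(self, store):
        self._store = store

    def datasets(self):
        return FakeDatasets(self._store)


def get_dataset(service, default_project_id, dataset_id, project_id=None):
    """Validate dataset_id, fall back to the default project, fetch the resource."""
    if not dataset_id or not isinstance(dataset_id, str):
        raise ValueError("dataset_id must be a non-empty str, got: {!r}".format(dataset_id))
    project = project_id if project_id else default_project_id
    return service.datasets().get(datasetId=dataset_id, projectId=project).execute()


def get_datasets_list(service, default_project_id, project_id=None):
    """Return the datasets of a project, or an empty list if there are none."""
    project = project_id if project_id else default_project_id
    response = service.datasets().list(projectId=project).execute()
    return response.get("datasets", [])


resource = {"kind": "bigquery#dataset", "id": "your-project:dataset_1_test"}
service = FakeService({("your-project", "dataset_1_test"): resource})
print(get_dataset(service, "your-project", "dataset_1_test"))
print(get_datasets_list(service, "your-project", project_id="other-project"))
```

Passing the service and default project in as arguments keeps the sketch testable; in the hook both come from `self`.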
[jira] [Commented] (AIRFLOW-820) Standardize GCP related connection id names and default values
[ https://issues.apache.org/jira/browse/AIRFLOW-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641164#comment-16641164 ] Kaxil Naik commented on AIRFLOW-820: At the time of writing, `DataProcClusterCreateOperator` has already been standardized to use `gcp_conn_id` instead of `google_cloud_default`. As far as I can remember, the only service that uses a different conn id is BigQuery. That is because we use it in the GCStoBQ operator, where a single conn id for both services is problematic if we need separate connections. Regarding [~dlamblin]'s comment: Having `source_conn_id` and `target_conn_id` defeats the main objective of this Jira issue, which is to pass the connections in default args. The `source_conn_id` and `target_conn_id` for GCStoBQ will be different from those of `GCStoS3`. > Standardize GCP related connection id names and default values > > > Key: AIRFLOW-820 > URL: https://issues.apache.org/jira/browse/AIRFLOW-820 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Feng Lu >Assignee: Feng Lu >Priority: Major > > A number of Google Cloud Platform (GCP) related operators, such as > BigQueryCheckOperator or DataFlowJavaOperator, are using different > connection_id var names and default values. For example, > BigQueryCheckOperator(.., big_query_conn_id='bigquery_default'..) > DataFlowJavaOperator(..., gcp_conn_id='google_cloud_default'...) > DataProcClusterCreateOperator(..., > google_cloud_conn_id='google_cloud_default',...) > This makes dag-level default_args problematic, one would have to specify each > connection_id explicitly in the default_args even though the same GCP > connection is shared throughout the DAG. 
We propose to: > - standardize all connection id names, e.g., > big_query_conn_id ---> gcp_conn_id > google_cloud_conn_id-->gcp_conn_id > - standardize all default values, e.g., > 'bigquery_default' --> 'google_cloud_default' > Therefore, if the same GCP connection is used, we only need to specify once > in the DAG default_args, e.g., > default_args = { > ... > gcp_conn_id='some_gcp_connection_id' > ... > } > Better still, if a connection with the default name 'google_cloud_default' > has already been created and used by all GCP operators, the gcp_conn_id > doesn't even need to be specified in DAG default_args. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
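The effect of the naming proposal quoted above can be sketched in plain Python. This is only an illustration, not Airflow's implementation: `resolve_kwargs` is a hypothetical helper that mimics the idea that an operator picks up only those `default_args` entries whose keys match its own parameter names.

```python
def resolve_kwargs(accepted_params, default_args, explicit_kwargs):
    """Hypothetical helper: merge DAG-level defaults into operator kwargs.

    Only keys the operator accepts are taken from default_args, and
    explicitly passed kwargs always win.
    """
    resolved = {k: v for k, v in default_args.items() if k in accepted_params}
    resolved.update(explicit_kwargs)
    return resolved


# Valid dict syntax for the default_args shown in the issue text.
default_args = {"owner": "airflow", "gcp_conn_id": "some_gcp_connection_id"}

# An operator whose parameter uses the proposed common name gets the connection.
dataflow_kwargs = resolve_kwargs({"owner", "gcp_conn_id"}, default_args, {})

# An operator that keeps its own parameter name never sees the shared value.
bigquery_kwargs = resolve_kwargs({"owner", "big_query_conn_id"}, default_args, {})

print(dataflow_kwargs)
print(bigquery_kwargs)
```

With differing parameter names, the second operator would need its own `big_query_conn_id` entry in `default_args`, which is the duplication the issue wants to remove.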
[GitHub] xnuinside commented on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook
xnuinside commented on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook URL: https://github.com/apache/incubator-airflow/pull/3894#issuecomment-427675981 @kaxil, thanks. As far as I can see, everything is okay with CI.
[GitHub] jeffkpayne commented on a change in pull request #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log
jeffkpayne commented on a change in pull request #3992: [AIRFLOW-620] Feature to tail custom number of logs instead of rendering whole log URL: https://github.com/apache/incubator-airflow/pull/3992#discussion_r223222849 ## File path: airflow/bin/cli.py ## @@ -1023,17 +1023,41 @@ def scheduler(args): @cli_utils.action_logging def serve_logs(args): print("Starting flask") -import flask -flask_app = flask.Flask(__name__) +from flask import Flask, request, Response, stream_with_context, send_from_directory +flask_app = Flask(__name__) @flask_app.route('/log/') def serve_logs(filename): # noqa +def tail_logs(logdir, filename, num_lines): +logpath = "{logdir}/{filename}".format(logdir=logdir, filename=filename) +logsize = os.path.getsize(logpath) +if logsize >= 100 * 1024 * 1024: +p1 = subprocess.Popen(["tail", "-n " + str(num_lines), filename], + stdout=subprocess.PIPE, cwd=log) +out, err = p1.communicate() +out = "Tailing file\n\n" + out.decode("utf-8") +else: +fl = open("{log}//{filename}".format(log=log, filename=filename), "r") +lines = fl.readlines() +fl.close() +out = "".join(l for l in lines[-num_lines:]) +line = "* Showing only last {num_lines} lines from {filename} *" \ + "\n\n\n{out}".format(num_lines=num_lines, filename=filename, out=out) +yield line +num_lines = request.args.get("num_lines") +try: +num_lines = int(num_lines) +except ValueError or TypeError: +num_lines = None Review comment: If `num_lines` is not a valid `int`, should something be logged so the user/admin knows that something is set up incorrectly?
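One way to act on the review comment above is sketched here. It is not part of the pull request: `parse_num_lines` is a hypothetical helper name, and logging at warning level is an assumption. Note that `except ValueError or TypeError` in the diff evaluates to `except ValueError`, so the `TypeError` raised by `int(None)` when the query parameter is absent would not be caught; a tuple of exception types catches both.

```python
import logging

log = logging.getLogger(__name__)


def parse_num_lines(raw):
    """Return raw as a positive int, or None if it is missing or invalid."""
    if raw is None:
        # Parameter not supplied: nothing is misconfigured, so stay quiet.
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):  # a tuple catches both exception types
        log.warning("Ignoring invalid num_lines value: %r", raw)
        return None
    if value <= 0:
        log.warning("Ignoring non-positive num_lines value: %r", raw)
        return None
    return value


print(parse_num_lines("10"))
print(parse_num_lines("ten"))
```

Treating a missing parameter separately from an invalid one keeps the log free of noise for ordinary requests that do not ask for tailing.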
[jira] [Commented] (AIRFLOW-820) Standardize GCP related connection id names and default values
[ https://issues.apache.org/jira/browse/AIRFLOW-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641137#comment-16641137 ] Daniel Lamblin commented on AIRFLOW-820: I think any operator that takes only 1 connection id should name that parameter as simply as {{conn_id}} and any transferring operator with two connection ids would be better off naming them {{source_conn_id}} and {{target_conn_id}} than having all these specific but unchecked type of connections like {{mysql_conn_id}} or {{aws_conn_id}}. > Standardize GCP related connection id names and default values > > > Key: AIRFLOW-820 > URL: https://issues.apache.org/jira/browse/AIRFLOW-820 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib >Reporter: Feng Lu >Assignee: Feng Lu >Priority: Major > > A number of Google Cloud Platform (GCP) related operators, such as > BigQueryCheckOperator or DataFlowJavaOperator, are using different > connection_id var names and default values. For example, > BigQueryCheckOperator(.., big_query_conn_id='bigquery_default'..) > DataFlowJavaOperator(..., gcp_conn_id='google_cloud_default'...) > DataProcClusterCreateOperator(..., > google_cloud_conn_id='google_cloud_default',...) > This makes dag-level default_args problematic, one would have to specify each > connection_id explicitly in the default_args even though the same GCP > connection is shared throughout the DAG. We propose to: > - standardize all connection id names, e.g., > big_query_conn_id ---> gcp_conn_id > google_cloud_conn_id-->gcp_conn_id > - standardize all default values, e.g., > 'bigquery_default' --> 'google_cloud_default' > Therefore, if the same GCP connection is used, we only need to specify once > in the DAG default_args, e.g., > default_args = { > ... > gcp_conn_id='some_gcp_connection_id' > ... 
> } > Better still, if a connection with the default name 'google_cloud_default' > has already been created and used by all GCP operators, the gcp_conn_id > doesn't even need to be specified in DAG default_args. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] antxxxx commented on issue #4003: [AIRFLOW-3163] add operator to enable setting table description in BigQuery table
antxxxx commented on issue #4003: [AIRFLOW-3163] add operator to enable setting table description in BigQuery table URL: https://github.com/apache/incubator-airflow/pull/4003#issuecomment-427662813 There is no BigQuery API exposed to set the description when you run a query and save the results to a table, which is the main reason I wanted this. My flow is: load data into staging tables -> aggregate data using SQL and save the results to a new table -> set the table description on the new table with metadata such as the timestamp when the aggregation was performed, source systems, etc. The other use case I want it for is to include metadata such as the latest extract date or load date in history tables. This changes each time new data is added to the table, so it cannot be set at table creation time, and again there does not seem to be an API exposed to set the description when adding data to an existing table. I could change the existing operators to include a description parameter and then call the hook to set the description after they have run, but it seemed simpler and more useful to add a new operator to set the description.
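The last step of the flow described above, setting a description that carries run metadata, might look roughly like this. It is a sketch under assumptions: `build_description` and `build_patch_body` are hypothetical helpers, the description format is invented for illustration, and the body is assumed to be sent with the BigQuery `tables.patch` call, which the sketch itself does not perform.

```python
from datetime import datetime, timezone


def build_description(aggregated_at, source_systems):
    """Render run metadata as a human-readable table description."""
    sources = ", ".join(sorted(source_systems))
    return "Aggregated at {}; sources: {}".format(aggregated_at.isoformat(), sources)


def build_patch_body(description):
    """Request body that changes only the table's description field."""
    return {"description": description}


body = build_patch_body(
    build_description(
        datetime(2018, 10, 8, 12, 0, tzinfo=timezone.utc),
        ["crm", "billing"],
    )
)
print(body)
```

Because the body contains only `description`, the same helper would serve both use cases in the comment: a freshly written aggregate table and a history table whose load date changes with each append.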
[GitHub] codecov-io edited a comment on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook
codecov-io edited a comment on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook URL: https://github.com/apache/incubator-airflow/pull/3894#issuecomment-423291432 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=h1) Report > Merging [#3894](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/6aabdf16455cb6ab2cbe0f6b2cee7f28c852f368?src=pr&el=desc) will **increase** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/3894/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=tree) ```diff @@Coverage Diff@@ ## master #3894 +/- ## = + Coverage75.5% 75.5% +<.01% = Files 199 199 Lines 15945 15949 +4 = + Hits12039 12043 +4 Misses 39063906 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [airflow/www\_rbac/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy93d3dfcmJhYy9hcHAucHk=) | `96.7% <0%> (-1.08%)` | :arrow_down: | | [airflow/www/app.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy93d3cvYXBwLnB5) | `98.95% <0%> (-1.05%)` | :arrow_down: | | [airflow/operators/redshift\_to\_s3\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcmVkc2hpZnRfdG9fczNfb3BlcmF0b3IucHk=) | `95.45% <0%> (-0.11%)` | :arrow_down: | | [airflow/default\_login.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy9kZWZhdWx0X2xvZ2luLnB5) | `58.97% <0%> (ø)` | :arrow_up: | | [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `91.71% <0%> 
(+0.04%)` | :arrow_up: | | [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.48% <0%> (+0.04%)` | :arrow_up: | | [airflow/hooks/http\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/3894/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9odHRwX2hvb2sucHk=) | `95.45% <0%> (+1.7%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=footer). Last update [6aabdf1...8bfb84a](https://codecov.io/gh/apache/incubator-airflow/pull/3894?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook
kaxil commented on issue #3894: [AIRFLOW-3055] add get_dataset and get_datasets_list to bigquery_hook URL: https://github.com/apache/incubator-airflow/pull/3894#issuecomment-427659777 I have made a few minor changes. Let's wait for the CI to pass. If it passes, I will squash the commits and merge.
[jira] [Updated] (AIRFLOW-3162) HttpHook fails to parse URL when port is specified
[ https://issues.apache.org/jira/browse/AIRFLOW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-3162: Fix Version/s: 1.10.1 > HttpHook fails to parse URL when port is specified > -- > > Key: AIRFLOW-3162 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3162 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > Fix For: 1.10.1 > > > https://github.com/apache/incubator-airflow/pull/3379 introduced a regression > on HttpHook.get() where if a port is specified on > airflow.models.Connection.port and the provided endpoint has no leading > slash, the URL will fail to parse. > With the connection: > > {{airflow.models.Connection(}} > {{ conn_id='test_conn',}} > {{ conn_type='http',}} > {{ host='test.com',}} > {{ port=1234}} > {{)}} > {{>>> hook = HttpHook(method='GET', http_conn_id='test_conn')}} > {{>>> hook.get(endpoint='some/endpoint')}} > {{--}} > {{Traceback (most recent call last):}} > {{InvalidURL: Failed to parse: test.com:1234some/endpoint}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3162) HttpHook fails to parse URL when port is specified
[ https://issues.apache.org/jira/browse/AIRFLOW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik updated AIRFLOW-3162: Affects Version/s: 1.10.0 > HttpHook fails to parse URL when port is specified > -- > > Key: AIRFLOW-3162 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3162 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > Fix For: 1.10.1 > > > https://github.com/apache/incubator-airflow/pull/3379 introduced a regression > on HttpHook.get() where if a port is specified on > airflow.models.Connection.port and the provided endpoint has no leading > slash, the URL will fail to parse. > With the connection: > > {{airflow.models.Connection(}} > {{ conn_id='test_conn',}} > {{ conn_type='http',}} > {{ host='test.com',}} > {{ port=1234}} > {{)}} > {{>>> hook = HttpHook(method='GET', http_conn_id='test_conn')}} > {{>>> hook.get(endpoint='some/endpoint')}} > {{--}} > {{Traceback (most recent call last):}} > {{InvalidURL: Failed to parse: test.com:1234some/endpoint}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] kaxil closed pull request #4001: [AIRFLOW-3162] Fix HttpHook URL parse error when port is specified
kaxil closed pull request #4001: [AIRFLOW-3162] Fix HttpHook URL parse error when port is specified URL: https://github.com/apache/incubator-airflow/pull/4001 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/hooks/http_hook.py b/airflow/hooks/http_hook.py index caa89d3142..1e0c7b3058 100644 --- a/airflow/hooks/http_hook.py +++ b/airflow/hooks/http_hook.py @@ -98,7 +98,11 @@ def run(self, endpoint, data=None, headers=None, extra_options=None): session = self.get_conn(headers) -url = self.base_url + endpoint +if not self.base_url.endswith('/') and not endpoint.startswith('/'): +url = self.base_url + '/' + endpoint +else: +url = self.base_url + endpoint + req = None if self.method == 'GET': # GET uses params diff --git a/tests/hooks/test_http_hook.py b/tests/hooks/test_http_hook.py index c8163322f9..e64e59fbca 100644 --- a/tests/hooks/test_http_hook.py +++ b/tests/hooks/test_http_hook.py @@ -42,6 +42,15 @@ def get_airflow_connection(conn_id=None): ) +def get_airflow_connection_with_port(conn_id=None): +return models.Connection( +conn_id='http_default', +conn_type='http', +host='test.com', +port=1234 +) + + class TestHttpHook(unittest.TestCase): """Test get, post and raise_for_status""" def setUp(self): @@ -69,6 +78,32 @@ def test_raise_for_status_with_200(self, m): resp = self.get_hook.run('v1/test') self.assertEquals(resp.text, '{"status":{"status": 200}}') +@requests_mock.mock() +@mock.patch('requests.Request') +def test_get_request_with_port(self, m, request_mock): +from requests.exceptions import MissingSchema + +with mock.patch( +'airflow.hooks.base_hook.BaseHook.get_connection', +side_effect=get_airflow_connection_with_port +): +expected_url = 'http://test.com:1234/some/endpoint' +for endpoint in ['some/endpoint', 
'/some/endpoint']: + +try: +self.get_hook.run(endpoint) +except MissingSchema: +pass + +request_mock.assert_called_once_with( +mock.ANY, +expected_url, +headers=mock.ANY, +params=mock.ANY +) + +request_mock.reset_mock() + @requests_mock.mock() def test_get_request_do_not_raise_for_status_if_check_response_is_false(self, m):
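The URL handling changed by the diff above can be isolated as a small function. This sketch mirrors the condition from the patch; `join_url` is a hypothetical name, since the patch performs the concatenation inline in `HttpHook.run`.

```python
def join_url(base_url, endpoint):
    """Concatenate base_url and endpoint, adding '/' only when both lack one."""
    if not base_url.endswith("/") and not endpoint.startswith("/"):
        return base_url + "/" + endpoint
    return base_url + endpoint


# Both spellings of the endpoint from the new test give the same URL.
for endpoint in ["some/endpoint", "/some/endpoint"]:
    print(join_url("http://test.com:1234", endpoint))
```

As in the patch, a base URL ending in `/` combined with an endpoint starting with `/` is left untouched, so that combination still yields a double slash.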
[jira] [Commented] (AIRFLOW-3162) HttpHook fails to parse URL when port is specified
[ https://issues.apache.org/jira/browse/AIRFLOW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641081#comment-16641081 ] ASF GitHub Bot commented on AIRFLOW-3162: - kaxil closed pull request #4001: [AIRFLOW-3162] Fix HttpHook URL parse error when port is specified URL: https://github.com/apache/incubator-airflow/pull/4001 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/hooks/http_hook.py b/airflow/hooks/http_hook.py index caa89d3142..1e0c7b3058 100644 --- a/airflow/hooks/http_hook.py +++ b/airflow/hooks/http_hook.py @@ -98,7 +98,11 @@ def run(self, endpoint, data=None, headers=None, extra_options=None): session = self.get_conn(headers) -url = self.base_url + endpoint +if not self.base_url.endswith('/') and not endpoint.startswith('/'): +url = self.base_url + '/' + endpoint +else: +url = self.base_url + endpoint + req = None if self.method == 'GET': # GET uses params diff --git a/tests/hooks/test_http_hook.py b/tests/hooks/test_http_hook.py index c8163322f9..e64e59fbca 100644 --- a/tests/hooks/test_http_hook.py +++ b/tests/hooks/test_http_hook.py @@ -42,6 +42,15 @@ def get_airflow_connection(conn_id=None): ) +def get_airflow_connection_with_port(conn_id=None): +return models.Connection( +conn_id='http_default', +conn_type='http', +host='test.com', +port=1234 +) + + class TestHttpHook(unittest.TestCase): """Test get, post and raise_for_status""" def setUp(self): @@ -69,6 +78,32 @@ def test_raise_for_status_with_200(self, m): resp = self.get_hook.run('v1/test') self.assertEquals(resp.text, '{"status":{"status": 200}}') +@requests_mock.mock() +@mock.patch('requests.Request') +def test_get_request_with_port(self, m, request_mock): +from requests.exceptions import MissingSchema + +with 
mock.patch( +'airflow.hooks.base_hook.BaseHook.get_connection', +side_effect=get_airflow_connection_with_port +): +expected_url = 'http://test.com:1234/some/endpoint' +for endpoint in ['some/endpoint', '/some/endpoint']: + +try: +self.get_hook.run(endpoint) +except MissingSchema: +pass + +request_mock.assert_called_once_with( +mock.ANY, +expected_url, +headers=mock.ANY, +params=mock.ANY +) + +request_mock.reset_mock() + @requests_mock.mock() def test_get_request_do_not_raise_for_status_if_check_response_is_false(self, m): This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > HttpHook fails to parse URL when port is specified > -- > > Key: AIRFLOW-3162 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3162 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > > https://github.com/apache/incubator-airflow/pull/3379 introduced a regression > on HttpHook.get() where if a port is specified on > airflow.models.Connection.port and the provided endpoint has no leading > slash, the URL will fail to parse. > With the connection: > > {{airflow.models.Connection(}} > {{ conn_id='test_conn',}} > {{ conn_type='http',}} > {{ host='test.com',}} > {{ port=1234}} > {{)}} > {{>>> hook = HttpHook(method='GET', http_conn_id='test_conn')}} > {{>>> hook.get(endpoint='some/endpoint')}} > {{--}} > {{Traceback (most recent call last):}} > {{InvalidURL: Failed to parse: test.com:1234some/endpoint}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-3162) HttpHook fails to parse URL when port is specified
[ https://issues.apache.org/jira/browse/AIRFLOW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaxil Naik resolved AIRFLOW-3162. - Resolution: Fixed Resolved by https://github.com/apache/incubator-airflow/pull/4001 > HttpHook fails to parse URL when port is specified > -- > > Key: AIRFLOW-3162 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3162 > Project: Apache Airflow > Issue Type: Bug >Reporter: Eric Chang >Assignee: Eric Chang >Priority: Minor > > https://github.com/apache/incubator-airflow/pull/3379 introduced a regression > on HttpHook.get() where if a port is specified on > airflow.models.Connection.port and the provided endpoint has no leading > slash, the URL will fail to parse. > With the connection: > > {{airflow.models.Connection(}} > {{ conn_id='test_conn',}} > {{ conn_type='http',}} > {{ host='test.com',}} > {{ port=1234}} > {{)}} > {{>>> hook = HttpHook(method='GET', http_conn_id='test_conn')}} > {{>>> hook.get(endpoint='some/endpoint')}} > {{--}} > {{Traceback (most recent call last):}} > {{InvalidURL: Failed to parse: test.com:1234some/endpoint}} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] kaxil commented on issue #4003: [AIRFLOW-3163] add operator to enable setting table description in BigQuery table
kaxil commented on issue #4003: [AIRFLOW-3163] add operator to enable setting table description in BigQuery table URL: https://github.com/apache/incubator-airflow/pull/4003#issuecomment-427658486 A different operator for setting table description? I don't think it is worth it. I am ok with having a param in BQ operators to add an optional description but definitely not a separate operator.
[GitHub] kaxil closed pull request #4002: [AIRFLOW-3099] Complete list of optional airflow.cfg sections
kaxil closed pull request #4002: [AIRFLOW-3099] Complete list of optional airflow.cfg sections URL: https://github.com/apache/incubator-airflow/pull/4002 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index dbab8b38d8..14928bbd48 100644 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -488,6 +488,24 @@ def _run(args, dag, ti): @cli_utils.action_logging def run(args, dag=None): +# Optional sections won't log an error if they're missing in airflow.cfg. +OPTIONAL_AIRFLOW_CFG_SECTIONS = [ +'atlas', +'celery', +'celery_broker_transport_options', +'dask', +'elasticsearch', +'github_enterprise', +'hive', +'kerberos', +'kubernetes', +'kubernetes_node_selectors', +'kubernetes_secrets', +'ldap', +'lineage', +'mesos', +] + if dag: args.dag_id = dag.dag_id @@ -510,18 +528,15 @@ def run(args, dag=None): try: conf.set(section, option, value) except NoSectionError: -optional_sections = [ -'atlas', 'mesos', 'elasticsearch', 'kubernetes', -'lineage', 'hive' -] -if section in optional_sections: -log.debug('Section {section} Option {option} ' - 'does not exist in the config!'.format(section=section, - option=option)) +no_section_msg = ( +'Section {section} Option {option} ' +'does not exist in the config!' +).format(section=section, option=option) + +if section in OPTIONAL_AIRFLOW_CFG_SECTIONS: +log.debug(no_section_msg) else: -log.error('Section {section} Option {option} ' - 'does not exist in the config!'.format(section=section, - option=option)) +log.error(no_section_msg) settings.configure_vars() This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
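The refactoring in the diff above builds the message once and only varies the log level. A minimal standalone sketch of that idea follows; `missing_section_level` and `report_missing` are hypothetical helpers, and the set below is a shortened subset of the optional sections listed in the patch.

```python
import logging

# Shortened subset of the optional sections listed in the patch.
OPTIONAL_SECTIONS = frozenset(["atlas", "celery", "hive", "kubernetes", "mesos"])


def missing_section_level(section):
    """DEBUG for sections that may legitimately be absent, ERROR otherwise."""
    return logging.DEBUG if section in OPTIONAL_SECTIONS else logging.ERROR


def report_missing(log, section, option):
    """Log one message whose severity depends on whether the section is optional."""
    msg = "Section {section} Option {option} does not exist in the config!".format(
        section=section, option=option
    )
    log.log(missing_section_level(section), msg)


logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("cli_sketch")
report_missing(logger, "atlas", "sasl_enabled")
report_missing(logger, "core", "dags_folder")
```

A `frozenset` is used here for constant-time membership checks; the patch keeps a list, which behaves the same for this purpose.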
[jira] [Commented] (AIRFLOW-3099) Errors raised when some blocs are missing in airflow.cfg
[ https://issues.apache.org/jira/browse/AIRFLOW-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641076#comment-16641076 ]

ASF GitHub Bot commented on AIRFLOW-3099:
-----------------------------------------

kaxil closed pull request #4002: [AIRFLOW-3099] Complete list of optional airflow.cfg sections
URL: https://github.com/apache/incubator-airflow/pull/4002

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index dbab8b38d8..14928bbd48 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -488,6 +488,24 @@ def _run(args, dag, ti):
 @cli_utils.action_logging
 def run(args, dag=None):
+    # Optional sections won't log an error if they're missing in airflow.cfg.
+    OPTIONAL_AIRFLOW_CFG_SECTIONS = [
+        'atlas',
+        'celery',
+        'celery_broker_transport_options',
+        'dask',
+        'elasticsearch',
+        'github_enterprise',
+        'hive',
+        'kerberos',
+        'kubernetes',
+        'kubernetes_node_selectors',
+        'kubernetes_secrets',
+        'ldap',
+        'lineage',
+        'mesos',
+    ]
+
     if dag:
         args.dag_id = dag.dag_id
@@ -510,18 +528,15 @@ def run(args, dag=None):
                 try:
                     conf.set(section, option, value)
                 except NoSectionError:
-                    optional_sections = [
-                        'atlas', 'mesos', 'elasticsearch', 'kubernetes',
-                        'lineage', 'hive'
-                    ]
-                    if section in optional_sections:
-                        log.debug('Section {section} Option {option} '
-                                  'does not exist in the config!'.format(section=section,
-                                                                         option=option))
+                    no_section_msg = (
+                        'Section {section} Option {option} '
+                        'does not exist in the config!'
+                    ).format(section=section, option=option)
+
+                    if section in OPTIONAL_AIRFLOW_CFG_SECTIONS:
+                        log.debug(no_section_msg)
                     else:
-                        log.error('Section {section} Option {option} '
-                                  'does not exist in the config!'.format(section=section,
-                                                                         option=option))
+                        log.error(no_section_msg)
         settings.configure_vars()

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Errors raised when some blocs are missing in airflow.cfg
> --------------------------------------------------------
>
> Key: AIRFLOW-3099
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3099
> Project: Apache Airflow
> Issue Type: Bug
> Components: configuration
> Affects Versions: 1.10.0
> Reporter: Christophe
> Assignee: Kaxil Naik
> Priority: Minor
> Fix For: 1.10.1
>
> When we upgrade from one version of Airflow to another and new config blocks
> become available, or when we delete unneeded blocks, many errors are raised if
> some blocks are missing from the airflow.cfg file.
> We need to avoid these errors for non-required blocks.
>
> For the record, the logs (not exhaustive):
>
> {noformat}
> [2018-09-21 10:49:37,727] {cli.py:464} ERROR - Section atlas Option sasl_enabled does not exist in the config!
> [2018-09-21 10:49:37,727] {cli.py:464} ERROR - Section atlas Option host does not exist in the config!
> [2018-09-21 10:49:37,727] {cli.py:464} ERROR - Section atlas Option port does not exist in the config!
> [2018-09-21 10:49:37,727] {cli.py:464} ERROR - Section atlas Option username does not exist in the config!
> [2018-09-21 10:49:37,727] {cli.py:464} ERROR - Section atlas Option password does not exist in the config!
> [2018-09-21 10:49:37,728] {cli.py:464} ERROR - Section hive Option default_hive_mapred_queue does not exist in the config!
> [2018-09-21 10:49:37,728] {cli.py:464} ERROR - Section mesos Option master does not exist in the config!
> [2018-09-21 10:49:37,728] {cli.py:464} ERROR - Section mesos Option framework_name does not exist in the config!
> [2018-09-21 10:49:37,728] {cli.py:464} ERROR - Section mesos Option task_cpu does not exist in the config!
> [2018-09-21 10:49:3
[GitHub] codecov-io edited a comment on issue #4015: [AIRFLOW-2789] Create single node cluster
codecov-io edited a comment on issue #4015: [AIRFLOW-2789] Create single node cluster
URL: https://github.com/apache/incubator-airflow/pull/4015#issuecomment-427650314

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=h1) Report
> Merging [#4015](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/9bea6228d92780e508f4cc69a273653878a191b4?src=pr&el=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4015/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #4015      +/-   ##
==========================================
- Coverage   75.51%   75.49%   -0.02%
==========================================
  Files         199      199
  Lines       15946    15946
==========================================
- Hits        12041    12039       -2
- Misses       3905     3907       +2
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4015/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.13% <0%> (-0.27%)` | :arrow_down: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4015/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `91.71% <0%> (+0.04%)` | :arrow_up: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=footer). Last update [9bea622...0927b66](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
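The percentages in the Codecov report for #4015 can be re-derived from the hit and line counts. This is a small sketch of that arithmetic; truncating to two decimals (rather than rounding) is an assumption that happens to reproduce the displayed figures, since the report itself does not say how Codecov computes them. `coverage_percent` is a hypothetical helper name.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer floor division performs the truncation before converting to float.
    return (hits * 10000 // lines) / 100


# Counts taken from the report: master versus the PR head.
base = coverage_percent(12041, 15946)
head = coverage_percent(12039, 15946)

# Hits plus misses should account for every tracked line.
assert 12041 + 3905 == 15946
assert 12039 + 3907 == 15946

# Unrounded change in percentage points: two fewer hits out of 15946 lines.
exact_drop = 2 / 15946 * 100

print(base, head, round(base - head, 2), round(exact_drop, 2))
```

Under this assumption the table's `-0.02%` is the difference between the two truncated values, while the headline `0.01%` is close to the unrounded change of about 0.0125 percentage points.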
[GitHub] codecov-io edited a comment on issue #2551: [AIRFLOW-1543] Improve error message for incorrect fernet_key
codecov-io edited a comment on issue #2551: [AIRFLOW-1543] Improve error message for incorrect fernet_key
URL: https://github.com/apache/incubator-airflow/pull/2551#issuecomment-325708902

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=h1) Report
> Merging [#2551](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/304c2afd20c0ea3139054a38f9eb38d5fa7d930e?src=pr&el=desc) will **decrease** coverage by `6.64%`.
> The diff coverage is `66.66%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/2551/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #2551      +/-   ##
==========================================
- Coverage   77.49%   70.85%   -6.65%
==========================================
  Files         200      150      -50
  Lines       15893    11585    -4308
==========================================
- Hits        12317     8209    -4108
+ Misses       3576     3376     -200
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `87.17% <66.66%> (-1.64%)` | :arrow_down: |
| [airflow/operators/email\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZW1haWxfb3BlcmF0b3IucHk=) | `0% <0%> (-100%)` | :arrow_down: |
| [airflow/hooks/pig\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9waWdfaG9vay5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
| [airflow/operators/redshift\_to\_s3\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcmVkc2hpZnRfdG9fczNfb3BlcmF0b3IucHk=) | `0% <0%> (-95.56%)` | :arrow_down: |
| [airflow/hooks/jdbc\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9qZGJjX2hvb2sucHk=) | `0% <0%> (-94.45%)` | :arrow_down: |
| [airflow/operators/s3\_file\_transform\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfZmlsZV90cmFuc2Zvcm1fb3BlcmF0b3IucHk=) | `0% <0%> (-93.88%)` | :arrow_down: |
| [airflow/executors/celery\_executor.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9leGVjdXRvcnMvY2VsZXJ5X2V4ZWN1dG9yLnB5) | `0% <0%> (-80.62%)` | :arrow_down: |
| [airflow/hooks/S3\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9TM19ob29rLnB5) | `22.27% <0%> (-72.05%)` | :arrow_down: |
| [airflow/hooks/mssql\_hook.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy9ob29rcy9tc3NxbF9ob29rLnB5) | `6.66% <0%> (-66.67%)` | :arrow_down: |
| [airflow/utils/log/s3\_task\_handler.py](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvczNfdGFza19oYW5kbGVyLnB5) | `37.5% <0%> (-61.08%)` | :arrow_down: |
| ... and [199 more](https://codecov.io/gh/apache/incubator-airflow/pull/2551/diff?src=pr&el=tree-more) | |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=footer). Last update [304c2af...1126a50](https://codecov.io/gh/apache/incubator-airflow/pull/2551?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Commented] (AIRFLOW-824) Allow writing to XCOM values via API
[ https://issues.apache.org/jira/browse/AIRFLOW-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641068#comment-16641068 ]

jack commented on AIRFLOW-824:
------------------------------

What is this supposed to help with? When did you need to change XCOM manually?

> Allow writing to XCOM values via API
> ------------------------------------
>
> Key: AIRFLOW-824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-824
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Robin Miller
> Assignee: Robin Miller
> Priority: Minor
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] exploy commented on issue #4015: [AIRFLOW-2789] Create single node cluster
exploy commented on issue #4015: [AIRFLOW-2789] Create single node cluster
URL: https://github.com/apache/incubator-airflow/pull/4015#issuecomment-427650912

> # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=h1) Report
> > Merging [#4015](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/9bea6228d92780e508f4cc69a273653878a191b4?src=pr&el=desc) will **decrease** coverage by `0.01%`.
> > The diff coverage is `n/a`.
>
> [![Impacted file tree graph](https://camo.githubusercontent.com/63e6afd98c2604367668c960c8f019aa77dd65c6/68747470733a2f2f636f6465636f762e696f2f67682f6170616368652f696e63756261746f722d616972666c6f772f70756c6c2f343031352f6772617068732f747265652e7376673f77696474683d36353026746f6b656e3d57644c4b6c4b484f4155266865696768743d313530267372633d7072)](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree)
>
> ```diff
> @@            Coverage Diff             @@
> ##           master    #4015      +/-   ##
> ==========================================
> - Coverage   75.51%   75.49%   -0.02%
> ==========================================
>   Files         199      199
>   Lines       15946    15946
> ==========================================
> - Hits        12041    12038       -3
> - Misses       3905     3908       +3
> ```
>
> | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree) | Coverage Δ |
> |---|---|
> | [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4015/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.13% <0%> (-0.27%)` |
>
> [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=continue).
> > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=footer). Last update [9bea622...7012ed1](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

Not sure how my changes impact `jobs.py`. Can anyone comment on this?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Commented] (AIRFLOW-2717) FileToGoogleCloudStorageOperator not shown in the Documentation.
[ https://issues.apache.org/jira/browse/AIRFLOW-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641046#comment-16641046 ]

jack commented on AIRFLOW-2717:
-------------------------------

[~ashb] I think it was fixed.
https://airflow.readthedocs.io/en/latest/_modules/airflow/contrib/operators/file_to_gcs.html

> FileToGoogleCloudStorageOperator not shown in the Documentation.
> ----------------------------------------------------------------
>
> Key: AIRFLOW-2717
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2717
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib, Documentation, gcp, operators
> Affects Versions: 1.9.0
> Reporter: Michele De Simoni
> Priority: Minor
> Labels: docuentation, easyfix
> Fix For: 1.9.0
>
> [FileToGoogleCloudStorageOperator|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/file_to_gcs.py]
> is present in the codebase but not in the
> [documentation|https://airflow.incubator.apache.org/code.html#community-contributed-operators].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] codecov-io commented on issue #4015: [AIRFLOW-2789] Create single node cluster
codecov-io commented on issue #4015: [AIRFLOW-2789] Create single node cluster
URL: https://github.com/apache/incubator-airflow/pull/4015#issuecomment-427650314

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=h1) Report
> Merging [#4015](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/9bea6228d92780e508f4cc69a273653878a191b4?src=pr&el=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4015/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #4015      +/-   ##
==========================================
- Coverage   75.51%   75.49%   -0.02%
==========================================
  Files         199      199
  Lines       15946    15946
==========================================
- Hits        12041    12038       -3
- Misses       3905     3908       +3
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4015/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `82.13% <0%> (-0.27%)` | :arrow_down: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=footer). Last update [9bea622...7012ed1](https://codecov.io/gh/apache/incubator-airflow/pull/4015?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[jira] [Updated] (AIRFLOW-2789) Add ability to create single node cluster to DataprocClusterCreateOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarosław Śmietanka updated AIRFLOW-2789:
    Affects Version/s: 1.10.0

> Add ability to create single node cluster to DataprocClusterCreateOperator
> --------------------------------------------------------------------------
>
> Key: AIRFLOW-2789
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2789
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib, gcp, operators
> Affects Versions: 1.10.0, 2.0.0
> Reporter: Jarosław Śmietanka
> Assignee: Jarosław Śmietanka
> Priority: Minor
>
> In GCP, it is possible to set up [Single node
> clusters|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters].
> The minimal size of a cluster (without this modification) is 3 (one
> master + two workers), so a single node cluster may be very helpful for,
> for example, small-scale non-critical data processing or building a
> proof of concept.
> Since I already have code that does this, I volunteer to bring it to the
> community :)
> This improvement won't change many components and should not require
> groundbreaking changes to DataprocClusterCreateOperator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work started] (AIRFLOW-2789) Add ability to create single node cluster to DataprocClusterCreateOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-2789 started by Jarosław Śmietanka.
---------------------------------------------------

> Add ability to create single node cluster to DataprocClusterCreateOperator
> --------------------------------------------------------------------------
>
> Key: AIRFLOW-2789
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2789
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib, gcp, operators
> Affects Versions: 1.10.0, 2.0.0
> Reporter: Jarosław Śmietanka
> Assignee: Jarosław Śmietanka
> Priority: Minor
>
> In GCP, it is possible to set up [Single node
> clusters|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters].
> The minimal size of a cluster (without this modification) is 3 (one
> master + two workers), so a single node cluster may be very helpful for,
> for example, small-scale non-critical data processing or building a
> proof of concept.
> Since I already have code that does this, I volunteer to bring it to the
> community :)
> This improvement won't change many components and should not require
> groundbreaking changes to DataprocClusterCreateOperator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2789) Add ability to create single node cluster to DataprocClusterCreateOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641039#comment-16641039 ]

ASF GitHub Bot commented on AIRFLOW-2789:
-----------------------------------------

exploy opened a new pull request #4015: [AIRFLOW-2789] Create single node cluster
URL: https://github.com/apache/incubator-airflow/pull/4015

Make sure you have checked _all_ steps below.

### Jira

- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description

In GCP, it is possible to set up [Single node clusters](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters). The minimal size of a cluster (without this modification) is 3 (one master + two workers), so a single node cluster may be very helpful for, for example, small-scale non-critical data processing or building a proof of concept.

### Tests

- [x] My PR adds the following unit tests `test_build_single_node_cluster` and `test_init_with_single_node_and_non_zero_workers`

### Commits

- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality

- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Add ability to create single node cluster to DataprocClusterCreateOperator
> --------------------------------------------------------------------------
>
> Key: AIRFLOW-2789
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2789
> Project: Apache Airflow
> Issue Type: Improvement
> Components: contrib, gcp, operators
> Affects Versions: 2.0.0
> Reporter: Jarosław Śmietanka
> Assignee: Jarosław Śmietanka
> Priority: Minor
>
> In GCP, it is possible to set up [Single node
> clusters|https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters].
> The minimal size of a cluster (without this modification) is 3 (one
> master + two workers), so a single node cluster may be very helpful for,
> for example, small-scale non-critical data processing or building a
> proof of concept.
> Since I already have code that does this, I volunteer to bring it to the
> community :)
> This improvement won't change many components and should not require
> groundbreaking changes to DataprocClusterCreateOperator.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
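As a rough illustration of what a single node cluster request looks like, here is a minimal sketch. `build_cluster_config` is a hypothetical helper, not the code from PR #4015 and not the operator's actual API; it assumes the cluster description is a plain dictionary and relies on the `dataproc:dataproc.allow.zero.workers` cluster property that the Dataproc documentation describes for single node clusters.

```python
def build_cluster_config(num_workers=2):
    """Hypothetical helper: build a Dataproc-style cluster config dict.

    With num_workers == 0 the cluster is requested as a single node
    cluster (one master, no workers); otherwise workers are added to
    the single master as usual.
    """
    if num_workers < 0:
        raise ValueError('num_workers must not be negative')

    config = {
        'masterConfig': {'numInstances': 1},
        'workerConfig': {'numInstances': num_workers},
    }
    if num_workers == 0:
        # Cluster property that permits a cluster with zero workers.
        config['softwareConfig'] = {
            'properties': {'dataproc:dataproc.allow.zero.workers': 'true'},
        }
    return config


single_node = build_cluster_config(num_workers=0)
default_cluster = build_cluster_config()
```

Deriving single node mode from `num_workers == 0` keeps the interface to one parameter; a separate flag would need an extra consistency check, which is presumably what a test named `test_init_with_single_node_and_non_zero_workers` guards against.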
[GitHub] exploy opened a new pull request #4015: [AIRFLOW-2789] Create single node cluster
exploy opened a new pull request #4015: [AIRFLOW-2789] Create single node cluster
URL: https://github.com/apache/incubator-airflow/pull/4015

Make sure you have checked _all_ steps below.

### Jira

- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-XXX
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.

### Description

In GCP, it is possible to set up [Single node clusters](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/single-node-clusters). The minimal size of a cluster (without this modification) is 3 (one master + two workers), so a single node cluster may be very helpful for, for example, small-scale non-critical data processing or building a proof of concept.

### Tests

- [x] My PR adds the following unit tests `test_build_single_node_cluster` and `test_init_with_single_node_and_non_zero_workers`

### Commits

- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality

- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
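The mechanical parts of the commit checklist in the PR template can be checked automatically. The sketch below is illustrative only: `check_commit_message` is a hypothetical helper, not part of Airflow's tooling, and it skips the guidelines a script cannot reasonably judge (imperative mood, and whether the body explains "what" and "why").

```python
import re

# Matches a leading Jira reference such as "[AIRFLOW-2789] " or "[AIRFLOW-XXX] ".
JIRA_PREFIX = re.compile(r'^\[AIRFLOW-(\d+|XXX)\]\s*')


def check_commit_message(message):
    """Hypothetical helper: return a list of guideline violations."""
    lines = message.splitlines()
    if not lines:
        return ['empty commit message']

    problems = []
    subject = lines[0]
    if not JIRA_PREFIX.match(subject):
        problems.append('subject does not start with a Jira reference')

    # The 50-character limit excludes the Jira issue reference.
    bare_subject = JIRA_PREFIX.sub('', subject)
    if len(bare_subject) > 50:
        problems.append('subject longer than 50 characters')
    if bare_subject.endswith('.'):
        problems.append('subject ends with a period')

    if len(lines) > 1 and lines[1].strip():
        problems.append('subject not separated from body by a blank line')
    if any(len(line) > 72 for line in lines[2:]):
        problems.append('body line longer than 72 characters')
    return problems


good = ('[AIRFLOW-2789] Create single node cluster\n'
        '\n'
        'Allow zero workers so that small jobs can run cheaply.')
bad = '[AIRFLOW-2789] Adding a thing.\nno blank line here'

good_problems = check_commit_message(good)
bad_problems = check_commit_message(bad)
```

A check like this could run in a commit-msg hook, alongside the `flake8 --diff` command the template already asks for.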