[jira] [Commented] (AIRFLOW-5701) Don't clear xcom explicitly before execution
[ https://issues.apache.org/jira/browse/AIRFLOW-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957572#comment-16957572 ] ASF subversion and git services commented on AIRFLOW-5701: -- Commit 74d2a0d9e77cf90b85654f65d1adba0875e0fb1f in airflow's branch refs/heads/master from Fokko Driesprong [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=74d2a0d ] AIRFLOW-5701: Don't clear xcom explicitly before execution (#6370) > Don't clear xcom explicitly before execution > > > Key: AIRFLOW-5701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5701 > Project: Apache Airflow > Issue Type: Bug > Components: xcom >Affects Versions: 1.10.5 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
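The commit above removes the explicit XCom clear that used to run before task execution. A minimal sketch of the underlying idea, assuming (as the title suggests) that setting an XCom value can safely overwrite a stale value from a previous try, which makes a separate pre-execution clear redundant. This toy in-memory store only illustrates that replace-on-set idea; it is not Airflow's actual implementation:

```python
# Toy in-memory XCom store illustrating replace-on-set semantics
# (assumption: setting an XCom for the same (dag_id, task_id,
# execution_date, key) tuple overwrites the old row, so clearing
# before execution is redundant). Not Airflow's real API.
class XComStore:
    def __init__(self):
        self._rows = {}

    def set(self, dag_id, task_id, execution_date, key, value):
        # Replace semantics: the new value overwrites any stale value
        # left over from a previous try of the same task instance.
        self._rows[(dag_id, task_id, execution_date, key)] = value

    def get(self, dag_id, task_id, execution_date, key):
        return self._rows.get((dag_id, task_id, execution_date, key))


store = XComStore()
store.set("my_dag", "my_task", "2019-10-23", "return_value", "stale")
store.set("my_dag", "my_task", "2019-10-23", "return_value", "fresh")
print(store.get("my_dag", "my_task", "2019-10-23", "return_value"))  # fresh
```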
[GitHub] [airflow] Fokko commented on a change in pull request #6379: [AIRFLOW-5708] Optionally check task pools when parsing dags.
Fokko commented on a change in pull request #6379: [AIRFLOW-5708] Optionally check task pools when parsing dags. URL: https://github.com/apache/airflow/pull/6379#discussion_r337857359 ## File path: airflow/config_templates/default_airflow.cfg ## @@ -208,6 +208,8 @@ dag_discovery_safe_mode = True # The number of retries each task is going to have by default. Can be overridden at dag or task level. default_task_retries = 0 +check_task_pools = False Review comment: Why this as an option? I think we should check if the pool exists anyway. Currently, the error message is horrible :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
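A sketch of the alternative Fokko suggests in this review: always validate that each task's pool exists at parse time and produce a clear error message, rather than gating the check behind a config option. The function name, the dict-based task representation, and the message format are hypothetical illustrations, not Airflow's code:

```python
def validate_task_pools(tasks, existing_pools):
    """Return readable error messages for tasks referencing a pool that
    does not exist, instead of failing later with an obscure error.

    tasks: iterable of dicts with 'task_id' and an optional 'pool' key.
    existing_pools: set of pool names known to the metadata DB.
    """
    errors = []
    for task in tasks:
        pool = task.get("pool")
        if pool is not None and pool not in existing_pools:
            errors.append(
                "Task %r references unknown pool %r; known pools: %s"
                % (task["task_id"], pool, sorted(existing_pools))
            )
    return errors


# Example: one task uses a pool that was never created.
problems = validate_task_pools(
    [{"task_id": "extract", "pool": "default_pool"},
     {"task_id": "load", "pool": "etl_pool"}],
    existing_pools={"default_pool"},
)
print(problems)
```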
[GitHub] [airflow] Fokko commented on issue #6341: Updated airflow
Fokko commented on issue #6341: Updated airflow URL: https://github.com/apache/airflow/pull/6341#issuecomment-545275692 Thank you @Fortschritt69 for your PR. Please create a ticket in Jira to describe the intention of the PR. For now, I'll close the PR.
[GitHub] [airflow] Fokko closed pull request #6341: Updated airflow
Fokko closed pull request #6341: Updated airflow URL: https://github.com/apache/airflow/pull/6341
[GitHub] [airflow] Fokko merged pull request #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko merged pull request #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370
[jira] [Commented] (AIRFLOW-5701) Don't clear xcom explicitly before execution
[ https://issues.apache.org/jira/browse/AIRFLOW-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957571#comment-16957571 ] ASF GitHub Bot commented on AIRFLOW-5701: - Fokko commented on pull request #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370 > Don't clear xcom explicitly before execution > > > Key: AIRFLOW-5701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5701 > Project: Apache Airflow > Issue Type: Bug > Components: xcom > Affects Versions: 1.10.5 > Reporter: Fokko Driesprong > Assignee: Fokko Driesprong > Priority: Major > Fix For: 2.0.0
[GitHub] [airflow] fangpenlin commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler (IN PROGRESS)
fangpenlin commented on issue #3229: [AIRFLOW-2325] Add cloudwatch task handler (IN PROGRESS) URL: https://github.com/apache/airflow/pull/3229#issuecomment-545265286 @ericabertugli @potiuk @mik-laj yeah, sorry I have no time to work on this. Feel free to take over my PR and continue working on it 👍
[jira] [Updated] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai updated AIRFLOW-5710: --- Fix Version/s: 1.10.6 > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core > Affects Versions: 1.10.5 > Reporter: Joshua Carp > Assignee: Joshua Carp > Priority: Trivial > Fix For: 1.10.6 > > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments.
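A hedged sketch of the proposal in this ticket: unused keyword arguments only warn today, and an opt-in flag upgrades the warning to an error, matching the planned Airflow 2.0 behavior. The class name and the `strict_kwargs` parameter (standing in for the proposed configuration flag) are illustrative, not the actual implementation:

```python
import warnings


class SketchOperator:
    """Illustrative stand-in for an operator's handling of unused kwargs."""

    def __init__(self, task_id, strict_kwargs=False, **kwargs):
        self.task_id = task_id
        if kwargs:
            msg = "Invalid arguments passed to %s: %s" % (
                task_id, sorted(kwargs))
            if strict_kwargs:
                # Opt-in Airflow 2.0-style behavior: fail fast.
                raise TypeError(msg)
            # Default 1.10.x-style behavior: an easy-to-miss warning.
            warnings.warn(msg, PendingDeprecationWarning)
```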
[jira] [Resolved] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5710. Resolution: Fixed > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core > Affects Versions: 1.10.5 > Reporter: Joshua Carp > Assignee: Joshua Carp > Priority: Trivial > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments.
[jira] [Commented] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957504#comment-16957504 ] ASF subversion and git services commented on AIRFLOW-5710: -- Commit e9d65e3d21f2aa6eaeb557b2960730628540127f in airflow's branch refs/heads/master from Joshua Carp [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=e9d65e3 ] [AIRFLOW-5710] Optionally raise exception on unused operator arguments. (#6332) > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core > Affects Versions: 1.10.5 > Reporter: Joshua Carp > Assignee: Joshua Carp > Priority: Trivial > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments.
[jira] [Commented] (AIRFLOW-5710) Optionally error on unused operator arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957503#comment-16957503 ] ASF GitHub Bot commented on AIRFLOW-5710: - milton0825 commented on pull request #6332: [AIRFLOW-5710] Optionally raise exception on unused operator arguments. URL: https://github.com/apache/airflow/pull/6332 > Optionally error on unused operator arguments > - > > Key: AIRFLOW-5710 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5710 > Project: Apache Airflow > Issue Type: Improvement > Components: core > Affects Versions: 1.10.5 > Reporter: Joshua Carp > Assignee: Joshua Carp > Priority: Trivial > > Airflow 2.0 will error when operators are instantiated with unused keyword > arguments, but for now unused arguments just raise a warning. My team has > passed unused arguments to operators and not noticed the warning a few times, > and it would be useful to be able to opt in to airflow 2.0 validation early. > I propose adding a configuration flag that causes tasks to error on unused > arguments.
[GitHub] [airflow] milton0825 merged pull request #6332: [AIRFLOW-5710] Optionally raise exception on unused operator arguments.
milton0825 merged pull request #6332: [AIRFLOW-5710] Optionally raise exception on unused operator arguments. URL: https://github.com/apache/airflow/pull/6332
[GitHub] [airflow] KevinYang21 merged pull request #6391: [AIRFLOW-XXX] Improve wording
KevinYang21 merged pull request #6391: [AIRFLOW-XXX] Improve wording URL: https://github.com/apache/airflow/pull/6391
[GitHub] [airflow] codecov-io edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
codecov-io edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#issuecomment-529755241

# [Codecov](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=h1) Report

> :exclamation: No coverage uploaded for pull request base (`master@4e661f5`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `83.81%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/5743/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=tree)

```diff
@@           Coverage Diff            @@
##           master    #5743   +/-   ##
========================================
  Coverage        ?   84.16%
========================================
  Files           ?      627
  Lines           ?    36531
  Branches        ?        0
========================================
  Hits            ?    30745
  Misses          ?     5786
  Partials        ?        0
```

| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/serialization/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZXJpYWxpemF0aW9uL19faW5pdF9fLnB5) | `100% <100%> (ø)` | |
| [airflow/models/baseoperator.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZW9wZXJhdG9yLnB5) | `95.91% <100%> (ø)` | |
| [airflow/settings.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==) | `88.48% <100%> (ø)` | |
| [airflow/utils/log/logging\_mixin.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvbG9nZ2luZ19taXhpbi5weQ==) | `96.15% <100%> (ø)` | |
| [airflow/www/utils.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy93d3cvdXRpbHMucHk=) | `80.19% <100%> (ø)` | |
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `100% <100%> (ø)` | |
| [airflow/serialization/enums.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZXJpYWxpemF0aW9uL2VudW1zLnB5) | `100% <100%> (ø)` | |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `58.15% <16.66%> (ø)` | |
| [airflow/models/dagbag.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnYmFnLnB5) | `85.24% <41.66%> (ø)` | |
| [airflow/models/dag.py](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFnLnB5) | `90.87% <80%> (ø)` | |
| ... and [8 more](https://codecov.io/gh/apache/airflow/pull/5743/diff?src=pr&el=tree-more) | |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=footer). Last update [4e661f5...4e1416a](https://codecov.io/gh/apache/airflow/pull/5743?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337801737

## File path: docs/howto/enable-dag-serialization.rst
## @@ -0,0 +1,109 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+ Enable DAG Serialization
+
+ Add the following settings in ``airflow.cfg``:
+
+ .. code-block:: ini
+
+     [core]
+     store_serialized_dags = True
+     min_serialized_dag_update_interval = 30
+
+ * ``store_serialized_dags``: This flag decides whether to serialize DAGs and persist them in the DB.
+   If set to True, the Webserver reads from the DB instead of parsing DAG files.
+ * ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which
+   the serialized DAG in the DB should be updated. This helps in reducing the database write rate.
+
+ If you are updating Airflow from <1.10.6, please do not forget to run ``airflow db upgrade``.
+
+ How it works
+
+ In order to make the Airflow Webserver stateless (almost!), Airflow >=1.10.6 supports
+ DAG Serialization and DB Persistence.
+
+ .. image:: ../img/dag_serialization.png
+
+ As shown in the image above, in vanilla Airflow the Webserver and the Scheduler both
+ need access to the DAG files, and both parse them.
+
+ With **DAG Serialization** we aim to decouple the Webserver from DAG parsing,
+ which would make the Webserver very light-weight.
+
+ As shown in the image above, when using the dag_serialization feature,
+ the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB.
+
+ The Webserver, instead of having to parse the DAG file again, reads the
+ serialized DAGs in JSON, de-serializes them, creates the DagBag and uses it
+ to show in the UI.
+
+ One of the key features implemented as part of DAG Serialization is that
+ instead of loading an entire DagBag when the Webserver starts, we only load each DAG on demand from the
+ Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable
+ when you have a large number of DAGs.
+
+ Below is a screenshot of the ``serialized_dag`` table in the Metadata DB:
+
+ .. image:: ../img/serialized_dag_table.png
+
+ Limitations
+ -----------
+ The Webserver will still need access to DAG files in the following cases,
+ which is why we said "almost" stateless.
+
+ * The **Rendered Template** tab will still have to parse the Python file, as it needs all the details like
+   the execution date and even the data passed by the upstream task using XCom.
+ * **Code View** will read the DAG file & show it using Pygments.
+   However, it does not need to parse the Python file, so it is still a small operation.
+ * :doc:`Extra Operator Links ` would not work out of
+   the box. They need to be defined in an Airflow Plugin.
+
+ **Existing Airflow Operators**:
+ To make extra operator links work with existing operators like BigQuery, copy all
+ the classes that are defined in the ``operator_extra_links`` property.
Review comment: Looks like the currently defined OperatorLinks for built-in operators depend on the instance of the operator object. For example, the following is the Qubole operator link:

```
class QDSLink(BaseOperatorLink):
    """Link to QDS"""

    name = 'Go to QDS'

    def get_link(self, operator, dttm):
        return operator.get_hook().get_extra_links(operator, dttm)
```

which depends on the `conn = BaseHook.get_connection(operator.kwargs['qubole_conn_id'])` line. We don't store `qubole_conn_id` as a task property in our serialized DAG. And the BigQueryOperator is:

```
@property
def operator_extra_links(self):
    """
    Return operator extra links
    """
    if isinstance(self.sql, str):
        return (
            BigQueryConsoleLink(),
        )
    return (
        BigQueryConsoleIndexableLink(i) for i, _ in enumerate(self.sql)
    )
```

this one depends on the `self.sql` property!
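The contrast kaxil draws can be sketched as follows: an operator link whose URL is built only from attributes that survive serialization (here, `task_id` and the run date) keeps working with serialized DAGs, while one that performs a hook or connection lookup at render time does not. The class name and URL below are hypothetical, not Airflow's API:

```python
class SerializationSafeLink:
    """Link whose URL depends only on serialized task attributes."""

    name = "Console"

    def get_link(self, operator, dttm):
        # No hook or connection lookup: everything used here is present
        # on a deserialized task, so the webserver can render the link
        # without re-parsing the original DAG file.
        return "https://console.example.com/jobs/%s?date=%s" % (
            operator.task_id, dttm)
```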
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337801737 ## File path: docs/howto/enable-dag-serialization.rst ## @@ -0,0 +1,109 @@ + .. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + + + + +Enable DAG Serialization + + +Add the following settings in ``airflow.cfg``: + +.. code-block:: ini + +[core] +store_serialized_dags = True +min_serialized_dag_update_interval = 30 + +* ``store_serialized_dags``: This flag decides whether to serialises DAGs and persist them in DB. +If set to True, Webserver reads from DB instead of parsing DAG files +* ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which +the serialized DAG in DB should be updated. This helps in reducing database write rate. + +If you are updating Airflow from <1.10.6, please do not forget to run ``airflow db upgrade``. + + +How it works + + +In order to make Airflow Webserver stateless (almost!), Airflow >=1.10.6 supports +DAG Serialization and DB Persistence. + +.. 
image:: ../img/dag_serialization.png + +As shown in the image above in Vanilla Airflow, the Webserver and the Scheduler both +needs access to the DAG files. Both the scheduler and webserver parses the DAG files. + +With **DAG Serialization** we aim to decouple the webserver from DAG parsing +which would make the Webserver very light-weight. + +As shown in the image above, when using the dag_serilization feature, +the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB. + +The Webserver now instead of having to parse the DAG file again, reads the +serialized DAGs in JSON, de-serializes them and create the DagBag and uses it +to show in the UI. + +One of the key features that is implemented as the part of DAG Serialization is that +instead of loading an entire DagBag when the WebServer starts we only load each DAG on demand from the +Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable +when you have large number of DAGs. + +Below is the screenshot of the ``serialized_dag`` table in Metadata DB: + +.. image:: ../img/serialized_dag_table.png + +Limitations +--- +The Webserver will still need access to DAG files in the following cases, +which is why we said "almost" stateless. + +* **Rendered Template** tab will still have to parse Python file as it needs all the details like +the execution date and even the data passed by the upstream task using Xcom. +* **Code View** will read the DAG File & show it using Pygments. +However, it does not need to Parse the Python file so it is still a small operation. +* :doc:`Extra Operator Links ` would not work out of +the box. They need to be defined in Airflow Plugin. + +**Existing Airflow Operators**: +To make extra operator links work with existing operators like BigQuery, copy all +the classes that are defined in ``operator_extra_links`` property. 
Review comment: Looks like the currently defined OperatorLinks for inbuilt operators are dependent on the Instance of operator class. Example the following is Qubole Operator: ``` class QDSLink(BaseOperatorLink): """Link to QDS""" name = 'Go to QDS' def get_link(self, operator, dttm): return operator.get_hook().get_extra_links(operator, dttm) ``` which depends on `conn = BaseHook.get_connection(operator.kwargs['qubole_conn_id'])` line. We don't store `qubole_conn_id` as the task property in our serialized DAG. and the BigqueryOperator is: ``` @property def operator_extra_links(self): """ Return operator extra lxinks """ if isinstance(self.sql, str): return ( BigQueryConsoleLink(), ) return ( BigQueryConsoleIndexableLink(i) for i, _ in enumerate(self.sql) ) ``` this one is dependent on `self.sql` property !!
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337801737 ## File path: docs/howto/enable-dag-serialization.rst ## @@ -0,0 +1,109 @@ + .. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + + + + +Enable DAG Serialization + + +Add the following settings in ``airflow.cfg``: + +.. code-block:: ini + +[core] +store_serialized_dags = True +min_serialized_dag_update_interval = 30 + +* ``store_serialized_dags``: This flag decides whether to serialises DAGs and persist them in DB. +If set to True, Webserver reads from DB instead of parsing DAG files +* ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which +the serialized DAG in DB should be updated. This helps in reducing database write rate. + +If you are updating Airflow from <1.10.6, please do not forget to run ``airflow db upgrade``. + + +How it works + + +In order to make Airflow Webserver stateless (almost!), Airflow >=1.10.6 supports +DAG Serialization and DB Persistence. + +.. 
image:: ../img/dag_serialization.png + +As shown in the image above in Vanilla Airflow, the Webserver and the Scheduler both +needs access to the DAG files. Both the scheduler and webserver parses the DAG files. + +With **DAG Serialization** we aim to decouple the webserver from DAG parsing +which would make the Webserver very light-weight. + +As shown in the image above, when using the dag_serilization feature, +the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB. + +The Webserver now instead of having to parse the DAG file again, reads the +serialized DAGs in JSON, de-serializes them and create the DagBag and uses it +to show in the UI. + +One of the key features that is implemented as the part of DAG Serialization is that +instead of loading an entire DagBag when the WebServer starts we only load each DAG on demand from the +Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable +when you have large number of DAGs. + +Below is the screenshot of the ``serialized_dag`` table in Metadata DB: + +.. image:: ../img/serialized_dag_table.png + +Limitations +--- +The Webserver will still need access to DAG files in the following cases, +which is why we said "almost" stateless. + +* **Rendered Template** tab will still have to parse Python file as it needs all the details like +the execution date and even the data passed by the upstream task using Xcom. +* **Code View** will read the DAG File & show it using Pygments. +However, it does not need to Parse the Python file so it is still a small operation. +* :doc:`Extra Operator Links ` would not work out of +the box. They need to be defined in Airflow Plugin. + +**Existing Airflow Operators**: +To make extra operator links work with existing operators like BigQuery, copy all +the classes that are defined in ``operator_extra_links`` property. 
Review comment: Looks like the currently defined OperatorLinks for built-in operators are dependent on the operator instance. For example, the following is the Qubole operator:

```
class QDSLink(BaseOperatorLink):
    """Link to QDS"""
    name = 'Go to QDS'

    def get_link(self, operator, dttm):
        return operator.get_hook().get_extra_links(operator, dttm)
```

which depends on the `conn = BaseHook.get_connection(operator.kwargs['qubole_conn_id'])` line. We don't store `qubole_conn_id` as a task property in our serialized DAG. And the BigQueryOperator is:

```
@property
def operator_extra_links(self):
    """Return operator extra links"""
    if isinstance(self.sql, str):
        return (BigQueryConsoleLink(),)
    return (BigQueryConsoleIndexableLink(i) for i, _ in enumerate(self.sql))
```

this one is dependent on the `self.sql` property !! whi
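[Editor's note] The concern above is that extra links which read live instance state (a connection id in `operator.kwargs`, or `self.sql`) break once the operator round-trips through serialization. Below is a minimal, self-contained sketch of a serialization-friendly alternative: a link that derives its URL only from data the task pushed to XCom at run time. `BaseOperatorLink` and the XCom store are stubbed here so the sketch runs without an Airflow installation; `ConsoleLink`, `console_url` and `FakeOperator` are hypothetical names, not Airflow APIs.

```python
class BaseOperatorLink:
    """Stub standing in for airflow.models.baseoperator.BaseOperatorLink."""
    name = ''

    def get_link(self, operator, dttm):
        raise NotImplementedError


# Toy XCom store: (task_id, execution_date, key) -> value.
XCOM = {}


def xcom_pull(task_id, dttm, key):
    """Toy replacement for TaskInstance.xcom_pull()."""
    return XCOM.get((task_id, dttm, key))


class ConsoleLink(BaseOperatorLink):
    """Serialization-friendly link: it reads a URL the task pushed to
    XCom at run time, instead of touching operator instance state."""
    name = 'Go to Console'

    def get_link(self, operator, dttm):
        return xcom_pull(operator.task_id, dttm, key='console_url') or ''


class FakeOperator:
    """Minimal object exposing the one attribute get_link() needs."""
    def __init__(self, task_id):
        self.task_id = task_id


op = FakeOperator('run_query')
XCOM[('run_query', '2019-10-23', 'console_url')] = 'https://console.example.com/job/42'
print(ConsoleLink().get_link(op, '2019-10-23'))
```

The design point: `get_link` only uses `operator.task_id` and the execution date, both of which survive serialization, so the webserver can render the link without re-importing the DAG file.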
[GitHub] [airflow] codecov-io edited a comment on issue #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage
codecov-io edited a comment on issue #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage URL: https://github.com/apache/airflow/pull/6317#issuecomment-541349899 # [Codecov](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=h1) Report > Merging [#6317](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/3cfe4a1c9dc49c91839e9c278b97f9c18033fdf4?src=pr&el=desc) will **decrease** coverage by `0.26%`. > The diff coverage is `95%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/6317/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=tree) ```diff @@Coverage Diff@@ ## master #6317 +/- ## = - Coverage 80.57% 80.3% -0.27% = Files 626 626 Lines 36237 36223 -14 = - Hits29198 29090 -108 - Misses 70397133 +94 ``` | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=tree) | Coverage Δ | | |---|---|---| | [...low/example\_dags/example\_trigger\_controller\_dag.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV90cmlnZ2VyX2NvbnRyb2xsZXJfZGFnLnB5) | `100% <100%> (+43.75%)` | :arrow_up: | | [airflow/operators/dagrun\_operator.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZGFncnVuX29wZXJhdG9yLnB5) | `96% <100%> (+1.26%)` | :arrow_up: | | [airflow/utils/dates.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYXRlcy5weQ==) | `82.6% <100%> (ø)` | :arrow_up: | | [airflow/example\_dags/example\_trigger\_target\_dag.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9leGFtcGxlX2RhZ3MvZXhhbXBsZV90cmlnZ2VyX3RhcmdldF9kYWcucHk=) | `90% <75%> (-2.31%)` | :arrow_down: | | 
[airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: | | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: | | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: | | [airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5) | `33.33% <0%> (-41.67%)` | :arrow_down: | | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `70.14% <0%> (-28.36%)` | :arrow_down: | | [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `93.28% <0%> (-0.51%)` | :arrow_down: | | ... and [4 more](https://codecov.io/gh/apache/airflow/pull/6317/diff?src=pr&el=tree-more) | | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=footer). Last update [3cfe4a1...467dfcc](https://codecov.io/gh/apache/airflow/pull/6317?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337773140 ## File path: airflow/models/serialized_dag.py ## @@ -0,0 +1,214 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Serialized DAG table in database.""" + +import hashlib +from datetime import timedelta +from typing import TYPE_CHECKING, Any, Dict, List, Optional + +# from sqlalchemy import Column, Index, Integer, String, Text, JSON, and_, exc +from sqlalchemy import JSON, Column, Index, Integer, String, and_ +from sqlalchemy.sql import exists + +from airflow.models.base import ID_LEN, Base +from airflow.utils import db, timezone +from airflow.utils.log.logging_mixin import LoggingMixin +from airflow.utils.sqlalchemy import UtcDateTime + +if TYPE_CHECKING: +from airflow.models import DAG # noqa: F401; # pylint: disable=cyclic-import +from airflow.serialization import SerializedDAG # noqa: F401 + + +log = LoggingMixin().log + + +class SerializedDagModel(Base): +"""A table for serialized DAGs. + +serialized_dag table is a snapshot of DAG files synchronized by scheduler. 
+This feature is controlled by: + +* ``[core] store_serialized_dags = True``: enable this feature +* ``[core] min_serialized_dag_update_interval = 30`` (s): + serialized DAGs are updated in DB when a file gets processed by scheduler, + to reduce DB write rate, there is a minimal interval of updating serialized DAGs. +* ``[scheduler] dag_dir_list_interval = 300`` (s): + interval of deleting serialized DAGs in DB when the files are deleted, suggest + to use a smaller interval such as 60 + +It is used by webserver to load dagbags when ``store_serialized_dags=True``. +Because reading from database is lightweight compared to importing from files, +it solves the webserver scalability issue. +""" +__tablename__ = 'serialized_dag' + +dag_id = Column(String(ID_LEN), primary_key=True) +fileloc = Column(String(2000), nullable=False) +# The max length of fileloc exceeds the limit of indexing. +fileloc_hash = Column(Integer, nullable=False) +data = Column(JSON, nullable=False) +last_updated = Column(UtcDateTime, nullable=False) + +__table_args__ = ( +Index('idx_fileloc_hash', fileloc_hash, unique=False), +) + +def __init__(self, dag: 'DAG'): +from airflow.serialization import SerializedDAG # noqa # pylint: disable=redefined-outer-name + +self.dag_id = dag.dag_id +self.fileloc = dag.full_filepath +self.fileloc_hash = self.dag_fileloc_hash(self.fileloc) +self.data = SerializedDAG.to_dict(dag) +self.last_updated = timezone.utcnow() + +@staticmethod +def dag_fileloc_hash(full_filepath: str) -> int: +"""Hashing file location for indexing. + +:param full_filepath: full filepath of DAG file +:return: hashed full_filepath +""" +# hashing is needed because the length of fileloc is 2000 as an Airflow convention, +# which is over the limit of indexing. If we can reduce the length of fileloc, then +# hashing is not needed. 
+return int.from_bytes( +hashlib.sha1(full_filepath.encode('utf-8')).digest()[-2:], byteorder='big', signed=False) + +@classmethod +@db.provide_session +def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, session=None): +"""Serializes a DAG and writes it into database. + +:param dag: a DAG to be written into database +:param min_update_interval: minimal interval in seconds to update serialized DAG +:param session: ORM Session +""" +log.debug("Writing DAG: %s to the DB", dag) +# Checks if (Current Time - Time when the DAG was written to DB) < min_update_interval +# If Yes, does nothing +# If No or the DAG does not exists, updates / writes Serialized DAG to DB +if min_update_interval is not None: +if session.query(exists().where( +and_(cls.dag_id == dag.d
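[Editor's note] The hashing trick quoted in the diff above is small enough to try standalone: it keeps only the last two bytes of the SHA-1 digest, so the result always fits in [0, 65535] — collisions are therefore expected, which is why the index is declared non-unique and lookups must still compare the full `fileloc`.

```python
import hashlib


def dag_fileloc_hash(full_filepath: str) -> int:
    """Same computation as SerializedDagModel.dag_fileloc_hash in the
    diff above: last 2 bytes of SHA-1, as an unsigned big-endian int."""
    return int.from_bytes(
        hashlib.sha1(full_filepath.encode('utf-8')).digest()[-2:],
        byteorder='big', signed=False)


h = dag_fileloc_hash('/home/airflow/dags/example_dag.py')
print(h)                 # deterministic, but depends on the path
assert 0 <= h <= 0xFFFF  # always fits an indexable Integer column
```

Because the hash space is only 16 bits, a query would filter on `fileloc_hash` to use the index and then on `fileloc` itself to discard collisions.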
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337772827 ## File path: airflow/dag/serialization/schema.json ## @@ -0,0 +1,196 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#";, + "$id": "https://airflow.apache.com/schemas/serialized-dags.json";, + "definitions": { +"datetime": { + "description": "A date time, stored as fractional seconds since the epoch", + "type": "number" +}, +"typed_datetime": { + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "datetime" +}, +"__var": { "$ref": "#/definitions/datetime" } + }, + "required": [ +"__type", +"__var" + ], + "additionalProperties": false +}, +"timedelta": { + "type": "number", + "minimum": 0 +}, +"typed_timedelta": { + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "timedelta" +}, +"__var": { "$ref": "#/definitions/timedelta" } + }, + "required": [ +"__type", +"__var" + ], + "additionalProperties": false +}, +"typed_relativedelta": { + "type": "object", + "description": "A dateutil.relativedelta.relativedelta object", + "properties": { +"__type": { + "type": "string", + "const": "relativedelta" +}, +"__var": { + "type": "object", + "properties": { +"weekday": { + "type": "array", + "items": { "type": "integer" }, + "minItems": 1, + "maxItems": 2 +} + }, + "additionalProperties": { "type": "integer" } +} + } +}, +"timezone": { + "type": "string" +}, +"dict": { + "description": "A python dictionary containing values of any type", + "type": "object" +}, +"color": { + "type": "string", + "pattern": "^#[a-fA-F0-9]{3,6}$" +}, +"stringset": { + "description": "A set of strings", + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "set" +}, +"__var": { + "type": "array", + "items": { "type": "string" } +} + }, + "required": [ +"__type", +"__var" + ] +}, +"dag": { + "type": 
"object", + "properties": { +"params": { "$ref": "#/definitions/dict" }, +"_dag_id": { "type": "string" }, +"task_dict": { "$ref": "#/definitions/task_dict" }, +"timezone": { "$ref": "#/definitions/timezone" }, +"schedule_interval": { + "anyOf": [ +{ "type": "null" }, +{ "type": "string" }, +{ "$ref": "#/definitions/typed_timedelta" }, +{ "$ref": "#/definitions/typed_relativedelta" } + ] +}, +"catchup": { "type": "boolean" }, +"is_subdag": { "type": "boolean" }, +"fileloc": { "type" : "string"}, +"orientation": { "type" : "string"}, +"_description": { "type" : "string"}, +"_concurrency": { "type" : "number"}, +"max_active_runs": { "type" : "number"}, +"default_args": { "$ref": "#/definitions/dict" }, +"start_date": { "$ref": "#/definitions/datetime" }, +"dagrun_timeout": { "$ref": "#/definitions/timedelta" }, +"doc_md": { "type" : "string"} + }, + "required": [ +"params", +"_dag_id", +"fileloc", +"task_dict" + ], + "additionalProperties": false +}, +"task_dict": { + "type": "object", + "additionalProperties": { "$ref": "#/definitions/operator" } +}, +"operator": { + "$comment": "A task/operator in a DAG", + "type": "object", + "required": [ +"_task_type", +"_task_module", +"task_id", +"ui_color", +"ui_fgcolor", +"template_fields" + ], + "properties": { +"_task_type": { "type": "string" }, +"_task_module": { "type": "string" }, +"task_id": { "type": "string" }, +"owner": { "type": "string" }, +"start_date": { "$ref": "#/definitions/datetime" }, +"end_date": { "$ref": "#/definitions/datetime" }, +"trigger_rule": { "type": "string" }, +"depends_on_past": { "type": "boolean" }, +"wait_for_downstream": { "type": "boolean" }, +"retries": { "type": "number" }, +"queue": { "type": "string" }, +"pool": { "type": "string" }, +"execution_timeout": { "$ref": "#/definitions/timedelta" }, +"retry_delay": { "$ref": "#/definitions/timedelta" }, +"retry_exponential_backoff": { "ty
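[Editor's note] The schema fragment above makes `params`, `_dag_id`, `fileloc` and `task_dict` mandatory for a serialized DAG and forbids unknown keys (`"additionalProperties": false`). A stdlib-only sketch of just those two constraints follows; a real implementation would run the full document through a JSON Schema validator (e.g. the `jsonschema` package), and the `check_serialized_dag` helper here is purely illustrative.

```python
# Mirrors two constraints from the "dag" definition in schema.json:
# the required keys, and the closed set of allowed properties.
REQUIRED = {'params', '_dag_id', 'fileloc', 'task_dict'}
ALLOWED = REQUIRED | {
    'timezone', 'schedule_interval', 'catchup', 'is_subdag', 'orientation',
    '_description', '_concurrency', 'max_active_runs', 'default_args',
    'start_date', 'dagrun_timeout', 'doc_md',
}


def check_serialized_dag(dag: dict) -> list:
    """Return a list of human-readable violations (empty list == valid)."""
    errors = ['missing required key: %s' % k for k in sorted(REQUIRED - dag.keys())]
    errors += ['unexpected key: %s' % k for k in sorted(dag.keys() - ALLOWED)]
    return errors


good = {'params': {}, '_dag_id': 'example', 'fileloc': '/dags/example.py',
        'task_dict': {}}
print(check_serialized_dag(good))                                  # []
print(check_serialized_dag({'_dag_id': 'broken', 'color': '#fff'}))
```

`additionalProperties: false` is the part that makes the schema a safety net: any attribute the serializer starts emitting without a schema update fails validation loudly instead of silently passing through.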
[GitHub] [airflow] mik-laj commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
mik-laj commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337772203 ## File path: airflow/models/serialized_dag.py ## @@ -0,0 +1,214 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Serialized DAG table in database.""" + +import hashlib +from datetime import timedelta +from typing import TYPE_CHECKING, Any, Dict, List, Optional + +# from sqlalchemy import Column, Index, Integer, String, Text, JSON, and_, exc +from sqlalchemy import JSON, Column, Index, Integer, String, and_ +from sqlalchemy.sql import exists + +from airflow.models.base import ID_LEN, Base +from airflow.utils import db, timezone +from airflow.utils.log.logging_mixin import LoggingMixin +from airflow.utils.sqlalchemy import UtcDateTime + +if TYPE_CHECKING: +from airflow.models import DAG # noqa: F401; # pylint: disable=cyclic-import +from airflow.serialization import SerializedDAG # noqa: F401 + + +log = LoggingMixin().log + + +class SerializedDagModel(Base): +"""A table for serialized DAGs. + +serialized_dag table is a snapshot of DAG files synchronized by scheduler. 
+This feature is controlled by: + +* ``[core] store_serialized_dags = True``: enable this feature +* ``[core] min_serialized_dag_update_interval = 30`` (s): + serialized DAGs are updated in DB when a file gets processed by scheduler, + to reduce DB write rate, there is a minimal interval of updating serialized DAGs. +* ``[scheduler] dag_dir_list_interval = 300`` (s): + interval of deleting serialized DAGs in DB when the files are deleted, suggest + to use a smaller interval such as 60 + +It is used by webserver to load dagbags when ``store_serialized_dags=True``. +Because reading from database is lightweight compared to importing from files, +it solves the webserver scalability issue. +""" +__tablename__ = 'serialized_dag' + +dag_id = Column(String(ID_LEN), primary_key=True) +fileloc = Column(String(2000), nullable=False) +# The max length of fileloc exceeds the limit of indexing. +fileloc_hash = Column(Integer, nullable=False) +data = Column(JSON, nullable=False) +last_updated = Column(UtcDateTime, nullable=False) + +__table_args__ = ( +Index('idx_fileloc_hash', fileloc_hash, unique=False), +) + +def __init__(self, dag: 'DAG'): +from airflow.serialization import SerializedDAG # noqa # pylint: disable=redefined-outer-name + +self.dag_id = dag.dag_id +self.fileloc = dag.full_filepath +self.fileloc_hash = self.dag_fileloc_hash(self.fileloc) +self.data = SerializedDAG.to_dict(dag) +self.last_updated = timezone.utcnow() + +@staticmethod +def dag_fileloc_hash(full_filepath: str) -> int: +"""Hashing file location for indexing. + +:param full_filepath: full filepath of DAG file +:return: hashed full_filepath +""" +# hashing is needed because the length of fileloc is 2000 as an Airflow convention, +# which is over the limit of indexing. If we can reduce the length of fileloc, then +# hashing is not needed. 
+return int.from_bytes( +hashlib.sha1(full_filepath.encode('utf-8')).digest()[-2:], byteorder='big', signed=False) + +@classmethod +@db.provide_session +def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, session=None): +"""Serializes a DAG and writes it into database. + +:param dag: a DAG to be written into database +:param min_update_interval: minimal interval in seconds to update serialized DAG +:param session: ORM Session +""" +log.debug("Writing DAG: %s to the DB", dag) +# Checks if (Current Time - Time when the DAG was written to DB) < min_update_interval +# If Yes, does nothing +# If No or the DAG does not exists, updates / writes Serialized DAG to DB +if min_update_interval is not None: +if session.query(exists().where( +and_(cls.dag_id == dag
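[Editor's note] The `write_dag` logic quoted above (truncated here) boils down to a rate limit: skip the DB write when a row for the DAG already exists and is fresher than `min_update_interval`. A sketch of just that decision, with plain datetimes standing in for the SQLAlchemy `exists()` query; `should_write` is an illustrative helper, not an Airflow function.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_write(last_updated: Optional[datetime],
                 now: datetime,
                 min_update_interval: Optional[int]) -> bool:
    """Mirror of the freshness check in SerializedDagModel.write_dag:
    skip the write when a stored copy exists and is younger than the
    configured interval."""
    if min_update_interval is None:   # no rate limit configured
        return True
    if last_updated is None:          # DAG not yet in the table
        return True
    return now - last_updated >= timedelta(seconds=min_update_interval)


now = datetime(2019, 10, 23, 12, 0, 0)
print(should_write(None, now, 30))                         # True
print(should_write(now - timedelta(seconds=10), now, 30))  # False
print(should_write(now - timedelta(seconds=45), now, 30))  # True
```

This is why the scheduler can reprocess a DAG file every few seconds without hammering the `serialized_dag` table: at most one write per DAG per interval.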
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337772008 ## File path: airflow/models/serialized_dag.py ## @@ -0,0 +1,214 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Serialized DAG table in database.""" + +import hashlib +from datetime import timedelta +from typing import TYPE_CHECKING, Any, Dict, List, Optional + +# from sqlalchemy import Column, Index, Integer, String, Text, JSON, and_, exc +from sqlalchemy import JSON, Column, Index, Integer, String, and_ +from sqlalchemy.sql import exists + +from airflow.models.base import ID_LEN, Base +from airflow.utils import db, timezone +from airflow.utils.log.logging_mixin import LoggingMixin +from airflow.utils.sqlalchemy import UtcDateTime + +if TYPE_CHECKING: +from airflow.models import DAG # noqa: F401; # pylint: disable=cyclic-import +from airflow.serialization import SerializedDAG # noqa: F401 + + +log = LoggingMixin().log + + +class SerializedDagModel(Base): +"""A table for serialized DAGs. + +serialized_dag table is a snapshot of DAG files synchronized by scheduler. 
+This feature is controlled by: + +* ``[core] store_serialized_dags = True``: enable this feature +* ``[core] min_serialized_dag_update_interval = 30`` (s): + serialized DAGs are updated in DB when a file gets processed by scheduler, + to reduce DB write rate, there is a minimal interval of updating serialized DAGs. +* ``[scheduler] dag_dir_list_interval = 300`` (s): + interval of deleting serialized DAGs in DB when the files are deleted, suggest + to use a smaller interval such as 60 + +It is used by webserver to load dagbags when ``store_serialized_dags=True``. +Because reading from database is lightweight compared to importing from files, +it solves the webserver scalability issue. +""" +__tablename__ = 'serialized_dag' + +dag_id = Column(String(ID_LEN), primary_key=True) +fileloc = Column(String(2000), nullable=False) +# The max length of fileloc exceeds the limit of indexing. +fileloc_hash = Column(Integer, nullable=False) +data = Column(JSON, nullable=False) +last_updated = Column(UtcDateTime, nullable=False) + +__table_args__ = ( +Index('idx_fileloc_hash', fileloc_hash, unique=False), +) + +def __init__(self, dag: 'DAG'): +from airflow.serialization import SerializedDAG # noqa # pylint: disable=redefined-outer-name + +self.dag_id = dag.dag_id +self.fileloc = dag.full_filepath +self.fileloc_hash = self.dag_fileloc_hash(self.fileloc) +self.data = SerializedDAG.to_dict(dag) +self.last_updated = timezone.utcnow() + +@staticmethod +def dag_fileloc_hash(full_filepath: str) -> int: +"""Hashing file location for indexing. + +:param full_filepath: full filepath of DAG file +:return: hashed full_filepath +""" +# hashing is needed because the length of fileloc is 2000 as an Airflow convention, +# which is over the limit of indexing. If we can reduce the length of fileloc, then +# hashing is not needed. 
+return int.from_bytes( +hashlib.sha1(full_filepath.encode('utf-8')).digest()[-2:], byteorder='big', signed=False) + +@classmethod +@db.provide_session +def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, session=None): +"""Serializes a DAG and writes it into database. + +:param dag: a DAG to be written into database +:param min_update_interval: minimal interval in seconds to update serialized DAG +:param session: ORM Session +""" +log.debug("Writing DAG: %s to the DB", dag) +# Checks if (Current Time - Time when the DAG was written to DB) < min_update_interval +# If Yes, does nothing +# If No or the DAG does not exists, updates / writes Serialized DAG to DB +if min_update_interval is not None: +if session.query(exists().where( +and_(cls.dag_id == dag.d
[GitHub] [airflow-site] kgabryje commented on a change in pull request #89: [depends on #83] Feature/blog tags
kgabryje commented on a change in pull request #89: [depends on #83] Feature/blog tags URL: https://github.com/apache/airflow-site/pull/89#discussion_r337771639 ## File path: landing-pages/site/assets/scss/main-custom.scss ## @@ -18,7 +18,7 @@ */ @import url('https://fonts.googleapis.com/css?family=Rubik:500&display=swap'); -@import url('https://fonts.googleapis.com/css?family=Roboto:400,500,700&display=swap'); +@import url('https://fonts.googleapis.com/css?family=Roboto:400,400i,500,700&display=swap'); Review comment: 👍
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337771516 ## File path: docs/howto/enable-dag-serialization.rst ## @@ -0,0 +1,109 @@ + .. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + + + + +Enable DAG Serialization + + +Add the following settings in ``airflow.cfg``: + +.. code-block:: ini + +[core] +store_serialized_dags = True +min_serialized_dag_update_interval = 30 + +* ``store_serialized_dags``: This flag decides whether to serialise DAGs and persist them in the DB. +If set to True, the Webserver reads from the DB instead of parsing DAG files. +* ``min_serialized_dag_update_interval``: This flag sets the minimum interval (in seconds) after which +the serialized DAG in the DB should be updated. This helps reduce the database write rate. + +If you are updating Airflow from <1.10.6, please do not forget to run ``airflow db upgrade``. + + +How it works + + +In order to make the Airflow Webserver stateless (almost!), Airflow >=1.10.6 supports +DAG Serialization and DB Persistence. + +..
image:: ../img/dag_serialization.png + +As shown in the image above, in vanilla Airflow the Webserver and the Scheduler both +need access to the DAG files, and both parse the DAG files. + +With **DAG Serialization** we aim to decouple the Webserver from DAG parsing, +which makes the Webserver very lightweight. + +As shown in the image above, when using the DAG Serialization feature, +the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB. + +The Webserver, instead of having to parse the DAG file again, reads the +serialized DAGs in JSON, de-serializes them, creates the DagBag and uses it +to populate the UI. + +One of the key features implemented as part of DAG Serialization is that +instead of loading an entire DagBag when the Webserver starts, we only load each DAG on demand from the +Serialized DAG table. This helps reduce Webserver startup time and memory. The reduction is notable +when you have a large number of DAGs. + +Below is a screenshot of the ``serialized_dag`` table in the Metadata DB: + +.. image:: ../img/serialized_dag_table.png + +Limitations +--- +The Webserver will still need access to DAG files in the following cases, +which is why we said "almost" stateless. + +* The **Rendered Template** tab will still have to parse the Python file, as it needs all the details like +the execution date and even the data passed by the upstream task using XCom. +* **Code View** will read the DAG file & show it using Pygments. +However, it does not need to parse the Python file, so it is still a small operation. +* :doc:`Extra Operator Links ` would not work out of +the box. They need to be defined in an Airflow Plugin. + +**Existing Airflow Operators**: +To make extra operator links work with existing operators like BigQuery, copy all +the classes that are defined in the ``operator_extra_links`` property.
Review comment: Yes, unfortunately in the current implementation you need to add operator links via plugin for in-built operators too. Let me try and see if I can remove the need to register links for in-built operators. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
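The flow described in the quoted docs — the Scheduler writes a DAG's JSON form to the metadata DB once, and the Webserver reads it back without ever parsing Python — can be sketched with a throwaway in-memory SQLite table. This is an illustration only: Airflow uses its own metadata DB and ORM models, and the real table also carries `fileloc`, `fileloc_hash`, and `last_updated` columns.

```python
import json
import sqlite3

# Throwaway stand-in for the metadata DB's serialized_dag table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")

# Scheduler side: parse the DAG file once, persist its JSON form.
dag = {"_dag_id": "example", "task_dict": {"start": {"task_id": "start"}}}
conn.execute("INSERT INTO serialized_dag VALUES (?, ?)",
             (dag["_dag_id"], json.dumps(dag)))

# Webserver side: no Python DAG parsing, just a cheap read + json.loads.
row = conn.execute("SELECT data FROM serialized_dag WHERE dag_id = ?",
                   ("example",)).fetchone()
restored = json.loads(row[0])
print(restored["_dag_id"])  # -> example
```

The read path is the reason the Webserver becomes lightweight: deserializing JSON is much cheaper than importing arbitrary DAG files.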
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337770457 ## File path: airflow/models/serialized_dag.py ## @@ -0,0 +1,214 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Serialized DAG table in database.""" + +import hashlib +from datetime import timedelta +from typing import TYPE_CHECKING, Any, Dict, List, Optional + +# from sqlalchemy import Column, Index, Integer, String, Text, JSON, and_, exc +from sqlalchemy import JSON, Column, Index, Integer, String, and_ +from sqlalchemy.sql import exists + +from airflow.models.base import ID_LEN, Base +from airflow.utils import db, timezone +from airflow.utils.log.logging_mixin import LoggingMixin +from airflow.utils.sqlalchemy import UtcDateTime + +if TYPE_CHECKING: +from airflow.models import DAG # noqa: F401; # pylint: disable=cyclic-import +from airflow.serialization import SerializedDAG # noqa: F401 + + +log = LoggingMixin().log + + +class SerializedDagModel(Base): +"""A table for serialized DAGs. + +serialized_dag table is a snapshot of DAG files synchronized by scheduler.
+This feature is controlled by: + +* ``[core] store_serialized_dags = True``: enable this feature +* ``[core] min_serialized_dag_update_interval = 30`` (s): + serialized DAGs are updated in DB when a file gets processed by scheduler, + to reduce DB write rate, there is a minimal interval of updating serialized DAGs. +* ``[scheduler] dag_dir_list_interval = 300`` (s): + interval of deleting serialized DAGs in DB when the files are deleted, suggest + to use a smaller interval such as 60 + +It is used by webserver to load dagbags when ``store_serialized_dags=True``. +Because reading from database is lightweight compared to importing from files, +it solves the webserver scalability issue. +""" +__tablename__ = 'serialized_dag' + +dag_id = Column(String(ID_LEN), primary_key=True) +fileloc = Column(String(2000), nullable=False) +# The max length of fileloc exceeds the limit of indexing. +fileloc_hash = Column(Integer, nullable=False) +data = Column(JSON, nullable=False) +last_updated = Column(UtcDateTime, nullable=False) + +__table_args__ = ( +Index('idx_fileloc_hash', fileloc_hash, unique=False), +) + +def __init__(self, dag: 'DAG'): +from airflow.serialization import SerializedDAG # noqa # pylint: disable=redefined-outer-name + +self.dag_id = dag.dag_id +self.fileloc = dag.full_filepath +self.fileloc_hash = self.dag_fileloc_hash(self.fileloc) +self.data = SerializedDAG.to_dict(dag) +self.last_updated = timezone.utcnow() + +@staticmethod +def dag_fileloc_hash(full_filepath: str) -> int: +Hashing file location for indexing. + +:param full_filepath: full filepath of DAG file +:return: hashed full_filepath +""" +# hashing is needed because the length of fileloc is 2000 as an Airflow convention, +# which is over the limit of indexing. If we can reduce the length of fileloc, then +# hashing is not needed. 
+return int.from_bytes( +hashlib.sha1(full_filepath.encode('utf-8')).digest()[-2:], byteorder='big', signed=False) + +@classmethod +@db.provide_session +def write_dag(cls, dag: 'DAG', min_update_interval: Optional[int] = None, session=None): +"""Serializes a DAG and writes it into database. + +:param dag: a DAG to be written into database +:param min_update_interval: minimal interval in seconds to update serialized DAG +:param session: ORM Session +""" +log.debug("Writing DAG: %s to the DB", dag) +# Checks if (Current Time - Time when the DAG was written to DB) < min_update_interval +# If Yes, does nothing +# If No or the DAG does not exists, updates / writes Serialized DAG to DB +if min_update_interval is not None: +if session.query(exists().where( +and_(cls.dag_id == dag.d
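The quoted model combines two mechanisms: a 16-bit hash of the DAG file path (`fileloc` can be up to 2000 characters, which exceeds index limits, so only the last 2 bytes of its SHA-1 digest are indexed), and a freshness guard that skips the DB write when the stored row is newer than `min_update_interval`. A standalone sketch of both ideas — `should_write` is a simplified stand-in, not the actual Airflow API:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def dag_fileloc_hash(full_filepath: str) -> int:
    # Last 2 bytes of the SHA-1 digest: a small integer in 0..65535
    # that fits comfortably in an indexed integer column.
    return int.from_bytes(
        hashlib.sha1(full_filepath.encode("utf-8")).digest()[-2:],
        byteorder="big", signed=False)

def should_write(last_updated, min_update_interval):
    # Skip the DB write if the stored copy was updated recently
    # ("Current Time - Time written to DB < min_update_interval").
    if min_update_interval is None or last_updated is None:
        return True
    age = datetime.now(timezone.utc) - last_updated
    return age >= timedelta(seconds=min_update_interval)

h = dag_fileloc_hash("/opt/airflow/dags/example_dag.py")
assert 0 <= h < 2 ** 16
```

Collisions between paths are possible with a 2-byte hash, which is fine here: the hash is only an index-narrowing aid, and the full `fileloc` column remains available for exact comparison.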
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337769472 ## File path: airflow/dag/serialization/schema.json ## @@ -0,0 +1,196 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://airflow.apache.com/schemas/serialized-dags.json", + "definitions": { +"datetime": { + "description": "A date time, stored as fractional seconds since the epoch", + "type": "number" +}, +"typed_datetime": { + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "datetime" +}, +"__var": { "$ref": "#/definitions/datetime" } + }, + "required": [ +"__type", +"__var" + ], + "additionalProperties": false +}, +"timedelta": { + "type": "number", + "minimum": 0 +}, +"typed_timedelta": { + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "timedelta" +}, +"__var": { "$ref": "#/definitions/timedelta" } + }, + "required": [ +"__type", +"__var" + ], + "additionalProperties": false +}, +"typed_relativedelta": { + "type": "object", + "description": "A dateutil.relativedelta.relativedelta object", + "properties": { +"__type": { + "type": "string", + "const": "relativedelta" +}, +"__var": { + "type": "object", + "properties": { +"weekday": { + "type": "array", + "items": { "type": "integer" }, + "minItems": 1, + "maxItems": 2 +} + }, + "additionalProperties": { "type": "integer" } +} + } +}, +"timezone": { + "type": "string" +}, +"dict": { + "description": "A python dictionary containing values of any type", + "type": "object" +}, +"color": { + "type": "string", + "pattern": "^#[a-fA-F0-9]{3,6}$" +}, +"stringset": { + "description": "A set of strings", + "type": "object", + "properties": { +"__type": { + "type": "string", + "const": "set" +}, +"__var": { + "type": "array", + "items": { "type": "string" } +} + }, + "required": [ +"__type", +"__var" + ] +}, +"dag": { + "type":
"object", + "properties": { +"params": { "$ref": "#/definitions/dict" }, +"_dag_id": { "type": "string" }, +"task_dict": { "$ref": "#/definitions/task_dict" }, +"timezone": { "$ref": "#/definitions/timezone" }, +"schedule_interval": { + "anyOf": [ +{ "type": "null" }, +{ "type": "string" }, +{ "$ref": "#/definitions/typed_timedelta" }, +{ "$ref": "#/definitions/typed_relativedelta" } + ] +}, +"catchup": { "type": "boolean" }, +"is_subdag": { "type": "boolean" }, +"fileloc": { "type" : "string"}, +"orientation": { "type" : "string"}, +"_description": { "type" : "string"}, +"_concurrency": { "type" : "number"}, +"max_active_runs": { "type" : "number"}, +"default_args": { "$ref": "#/definitions/dict" }, +"start_date": { "$ref": "#/definitions/datetime" }, +"dagrun_timeout": { "$ref": "#/definitions/timedelta" }, +"doc_md": { "type" : "string"} + }, + "required": [ +"params", +"_dag_id", +"fileloc", +"task_dict" + ], + "additionalProperties": false +}, +"task_dict": { + "type": "object", + "additionalProperties": { "$ref": "#/definitions/operator" } +}, +"operator": { + "$comment": "A task/operator in a DAG", + "type": "object", + "required": [ +"_task_type", +"_task_module", +"task_id", +"ui_color", +"ui_fgcolor", +"template_fields" + ], + "properties": { +"_task_type": { "type": "string" }, +"_task_module": { "type": "string" }, +"task_id": { "type": "string" }, +"owner": { "type": "string" }, +"start_date": { "$ref": "#/definitions/datetime" }, +"end_date": { "$ref": "#/definitions/datetime" }, +"trigger_rule": { "type": "string" }, +"depends_on_past": { "type": "boolean" }, +"wait_for_downstream": { "type": "boolean" }, +"retries": { "type": "number" }, +"queue": { "type": "string" }, +"pool": { "type": "string" }, +"execution_timeout": { "$ref": "#/definitions/timedelta" }, +"retry_delay": { "$ref": "#/definitions/timedelta" }, +"retry_exponential_backoff": { "ty
[GitHub] [airflow] kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability
kaxil commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability URL: https://github.com/apache/airflow/pull/5743#discussion_r337769592 ## File path: airflow/dag/serialization/serialization.py ## @@ -53,81 +50,81 @@ class Serialization: _primitive_types = (int, bool, float, str) # Time types. -_datetime_types = (datetime.datetime, datetime.date, datetime.time) +# datetime.date and datetime.time are converted to strings. +_datetime_types = (datetime.datetime,) # Object types that are always excluded in serialization. # FIXME: not needed if _included_fields of DAG and operator are customized. _excluded_types = (logging.Logger, Connection, type) -_json_schema = None # type: Optional[Dict] +_json_schema = None # type: Optional[Validator] + +_CONSTRUCTOR_PARAMS = {} # type: Dict[str, Parameter] + +SERIALIZER_VERSION = 1 @classmethod def to_json(cls, var: Union[DAG, BaseOperator, dict, list, set, tuple]) -> str: """Stringifies DAGs and operators contained by var and returns a JSON string of var. """ -return json.dumps(cls._serialize(var, {}), ensure_ascii=True) +return json.dumps(cls.to_dict(var), ensure_ascii=True) @classmethod -def from_json(cls, json_str: str) -> Union[ +def to_dict(cls, var: Union[DAG, BaseOperator, dict, list, set, tuple]) -> dict: +"""Stringifies DAGs and operators contained by var and returns a dict of var. 
+""" +# Don't call on this class directly - only SerializedDAG or +# SerializedBaseOperator should be used as the "entrypoint" +raise NotImplementedError() + +@classmethod +def from_json(cls, serialized_obj: str) -> Union[ 'SerializedDAG', 'SerializedBaseOperator', dict, list, set, tuple]: """Deserializes json_str and reconstructs all DAGs and operators it contains.""" -return cls._deserialize(json.loads(json_str), {}) +return cls.from_dict(json.loads(serialized_obj)) @classmethod -def validate_json(cls, json_str: str): -"""Validate json_str satisfies JSON schema.""" +def from_dict(cls, serialized_obj: dict) -> Union[ +'SerializedDAG', 'SerializedBaseOperator', dict, list, set, tuple]: +"""Deserializes a python dict stored with type decorators and +reconstructs all DAGs and operators it contains.""" +return cls._deserialize(serialized_obj) + +@classmethod +def validate_schema(cls, serialized_obj: Union[str, dict]): +"""Validate serialized_obj satisfies JSON schema.""" if cls._json_schema is None: raise AirflowException('JSON schema of {:s} is not set.'.format(cls.__name__)) -jsonschema.validate(json.loads(json_str), cls._json_schema) + +if isinstance(serialized_obj, dict): +cls._json_schema.validate(serialized_obj) +elif isinstance(serialized_obj, str): +cls._json_schema.validate(json.loads(serialized_obj)) Review comment: Updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] stevetaggart opened a new pull request #6391: [AIRFLOW-XXX] Improve wording
stevetaggart opened a new pull request #6391: [AIRFLOW-XXX] Improve wording URL: https://github.com/apache/airflow/pull/6391 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-XXX - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)). - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: a small change to documentation wording ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tiny documentation change ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. 
- All the public functions and the classes in the PR contain docstrings that explain what they do - If you implement backwards-incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow-site] mik-laj commented on a change in pull request #89: Feature/blog tags
mik-laj commented on a change in pull request #89: Feature/blog tags URL: https://github.com/apache/airflow-site/pull/89#discussion_r337763096 ## File path: landing-pages/site/assets/scss/main-custom.scss ## @@ -18,7 +18,7 @@ */ @import url('https://fonts.googleapis.com/css?family=Rubik:500&display=swap'); -@import url('https://fonts.googleapis.com/css?family=Roboto:400,500,700&display=swap'); +@import url('https://fonts.googleapis.com/css?family=Roboto:400,400i,500,700&display=swap'); Review comment: It doesn't have to be in a separate PR. I just didn't know what happened here and I thought it was a typo This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow-site] kgabryje commented on a change in pull request #89: Feature/blog tags
kgabryje commented on a change in pull request #89: Feature/blog tags URL: https://github.com/apache/airflow-site/pull/89#discussion_r337754367 ## File path: landing-pages/site/assets/scss/main-custom.scss ## @@ -18,7 +18,7 @@ */ @import url('https://fonts.googleapis.com/css?family=Rubik:500&display=swap'); -@import url('https://fonts.googleapis.com/css?family=Roboto:400,500,700&display=swap'); +@import url('https://fonts.googleapis.com/css?family=Roboto:400,400i,500,700&display=swap'); Review comment: I added Roboto italic because we have italic style in typography. Though i realise now that it shouldn't be a part of this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Standish updated AIRFLOW-5720: - Description: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. the first side effect does not help us; {{default_gcp_project_id}} is irrelevant Only one of the three connections in {{_setup_connections}} has any impact on the test. The call of {{BaseHook.get_connection}} only serves to discard the first connection in mock side effect list, {{gcp_connection}}. The second connection is the one that matters, and it is returned when {{CloudSqlDatabaseHook}} calls `self.get_connection` during init. Since it is a mock side effect, it doesn't matter what value is given for conn_id. 
So the {{CloudSqlDatabaseHook}} init param {{default_gcp_project_id}} has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading. We should not have extra code that does not serve a purpose because it makes it harder to understand what's actually happening. was: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. the first side effect does not help us; {{default_gcp_project_id}} is irrelevant Only one of the three connections in {{_setup_connections}} has any impact on the test. The call of {{BaseHook.get_connection}} only serves to discard the first connection in mock side effect list, {{gcp_connection}}. 
The second connection is the one that matters, and it is returned when {{CloudSqlDatabaseHook}} calls `self.get_connection` during init. Since it is a mock side effect, it doesn't matter what value is given for conn_id. So the {{CloudSqlDatabaseHook}} init param {{default_gcp_project_id}} has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading. > don't call _get_connections_from_db in TestCloudSqlDatabaseHook > --- > > Key: AIRFLOW-5720 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5720 > Project: Apache Airflow >
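The ticket's suggestion — patch the hook's own `get_connection` rather than the lower-level `BaseHook.get_connections`, and drop the side-effect list whose first entry only existed to be discarded — can be sketched with stdlib mocks. The `DatabaseHook` below is a hypothetical stand-in, not the real `CloudSqlDatabaseHook`:

```python
from unittest import mock

class DatabaseHook:
    # Stand-in for a hook that resolves its connection at init time.
    def __init__(self, conn_id):
        self.conn = self.get_connection(conn_id)

    def get_connection(self, conn_id):
        raise RuntimeError("would hit the metadata DB")

# Patch at the level the code under test actually calls: no
# side_effect lists, no throwaway connections padding the sequence.
fake_conn = mock.MagicMock(host="10.0.0.1", schema="testdb")
with mock.patch.object(DatabaseHook, "get_connection",
                       return_value=fake_conn):
    hook = DatabaseHook("gcpcloudsql_default")

assert hook.conn.host == "10.0.0.1"
```

Mocking the single method the hook calls makes the test's intent legible: one fake connection in, one hook attribute out, with nothing misleading left over.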
[GitHub] [airflow] codecov-io edited a comment on issue #6281: Run batches of (self-terminating) EMR JobFlows [AIRFLOW-XXX]
codecov-io edited a comment on issue #6281: Run batches of (self-terminating) EMR JobFlows [AIRFLOW-XXX] URL: https://github.com/apache/airflow/pull/6281#issuecomment-539810934 # [Codecov](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=h1) Report > :exclamation: No coverage uploaded for pull request base (`master@bc53412`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit). > The diff coverage is `100%`. [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/6281/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=tree)

```diff
@@          Coverage Diff           @@
##           master   #6281   +/-  ##
=====================================
  Coverage        ?   80.33%
=====================================
  Files           ?      627
  Lines           ?    36316
  Branches        ?        0
=====================================
  Hits            ?    29173
  Misses          ?     7143
  Partials        ?        0
```

| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/contrib/sensors/emr\_run\_job\_flows.py](https://codecov.io/gh/apache/airflow/pull/6281/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvZW1yX3J1bl9qb2JfZmxvd3MucHk=) | `100% <100%> (ø)` | |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=footer). Last update [bc53412...82f63ec](https://codecov.io/gh/apache/airflow/pull/6281?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage
BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage URL: https://github.com/apache/airflow/pull/6317#discussion_r337730794 ## File path: airflow/operators/dagrun_operator.py ## @@ -18,81 +18,64 @@ # under the License. import datetime -import json -from typing import Callable, Dict, Optional, Union +from typing import Dict, Optional, Union from airflow.api.common.experimental.trigger_dag import trigger_dag from airflow.models import BaseOperator from airflow.utils import timezone from airflow.utils.decorators import apply_defaults -class DagRunOrder: -def __init__(self, run_id=None, payload=None): -self.run_id = run_id -self.payload = payload - - class TriggerDagRunOperator(BaseOperator): """ Triggers a DAG run for a specified ``dag_id`` :param trigger_dag_id: the dag_id to trigger (templated) :type trigger_dag_id: str -:param python_callable: a reference to a python function that will be -called while passing it the ``context`` object and a placeholder -object ``obj`` for your callable to fill and return if you want -a DagRun created. This ``obj`` object contains a ``run_id`` and -``payload`` attribute that you can modify in your function. -The ``run_id`` should be a unique identifier for that DAG run, and -the payload has to be a picklable object that will be made available -to your tasks while executing that DAG run. 
Your function header -should look like ``def foo(context, dag_run_obj):`` -:type python_callable: python callable +:param conf: Configuration for the DAG run +:type conf: dict :param execution_date: Execution date for the dag (templated) :type execution_date: str or datetime.datetime """ -template_fields = ('trigger_dag_id', 'execution_date') -ui_color = '#ffefeb' + +template_fields = ("trigger_dag_id", "execution_date", "conf") +ui_color = "#ffefeb" @apply_defaults def __init__( -self, -trigger_dag_id: str, -python_callable: Optional[Callable[[Dict, DagRunOrder], DagRunOrder]] = None, -execution_date: Optional[Union[str, datetime.datetime]] = None, -*args, **kwargs) -> None: +self, +trigger_dag_id: str, +conf: Optional[Dict] = None, +execution_date: Optional[Union[str, datetime.datetime]] = None, +*args, +**kwargs +) -> None: super().__init__(*args, **kwargs) -self.python_callable = python_callable self.trigger_dag_id = trigger_dag_id +self.conf = conf -self.execution_date = None # type: Optional[Union[str, datetime.datetime]] -if isinstance(execution_date, datetime.datetime): -self.execution_date = execution_date.isoformat() -elif isinstance(execution_date, str): +if execution_date is None or isinstance(execution_date, (str, datetime.datetime)): self.execution_date = execution_date -elif execution_date is None: -self.execution_date = None else: raise TypeError( -'Expected str or datetime.datetime type ' -'for execution_date. Got {}'.format( -type(execution_date))) +"Expected str or datetime.datetime type for execution_date. 
" +"Got {}".format(type(execution_date)) +) -def execute(self, context): -if self.execution_date is not None: -run_id = 'trig__{}'.format(self.execution_date) -self.execution_date = timezone.parse(self.execution_date) +def execute(self, context: Dict): +if isinstance(self.execution_date, datetime.datetime): +run_id = "trig__{}".format(self.execution_date.isoformat()) +elif isinstance(self.execution_date, str): +run_id = "trig__{}".format(self.execution_date) +self.execution_date = timezone.parse(self.execution_date) # trigger_dag() expects datetime Review comment: Would like to use it as it is much shorter and simpler. However, the `execution_date` can be templated. For example `{{ execution_date }}` will fail in `timezone.parse()`. So, we have to save it first, wait for `execute()` to be called and all variables to be templated, and only then can we call `timezone.parse()` on the `execution_date` :( This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #2892: [AIRFLOW-XXX] Upgrade support to python 3.5 and disable dask
tooptoop4 commented on issue #2892: [AIRFLOW-XXX] Upgrade support to python 3.5 and disable dask URL: https://github.com/apache/airflow/pull/2892#issuecomment-545132250 Can tests be reenabled? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] tooptoop4 commented on issue #2067: [AIRFLOW-862] Add DaskExecutor
tooptoop4 commented on issue #2067: [AIRFLOW-862] Add DaskExecutor URL: https://github.com/apache/airflow/pull/2067#issuecomment-545130598 @jlowin can versions of `distributed` above v2 work? setup.py limits it to <2.
[GitHub] [airflow] BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage
BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage URL: https://github.com/apache/airflow/pull/6317#discussion_r337683917

## File path: airflow/operators/dagrun_operator.py ##

@@ -18,81 +18,64 @@
 # under the License.
 
 import datetime
-import json
-from typing import Callable, Dict, Optional, Union
+from typing import Dict, Optional, Union
 
 from airflow.api.common.experimental.trigger_dag import trigger_dag
 from airflow.models import BaseOperator
 from airflow.utils import timezone
 from airflow.utils.decorators import apply_defaults
 
 
-class DagRunOrder:
-    def __init__(self, run_id=None, payload=None):
-        self.run_id = run_id
-        self.payload = payload
-
-
 class TriggerDagRunOperator(BaseOperator):
     """
     Triggers a DAG run for a specified ``dag_id``
 
     :param trigger_dag_id: the dag_id to trigger (templated)
     :type trigger_dag_id: str
-    :param python_callable: a reference to a python function that will be
-        called while passing it the ``context`` object and a placeholder
-        object ``obj`` for your callable to fill and return if you want
-        a DagRun created. This ``obj`` object contains a ``run_id`` and
-        ``payload`` attribute that you can modify in your function.
-        The ``run_id`` should be a unique identifier for that DAG run, and
-        the payload has to be a picklable object that will be made available
-        to your tasks while executing that DAG run. Your function header
-        should look like ``def foo(context, dag_run_obj):``
-    :type python_callable: python callable
+    :param conf: Configuration for the DAG run
+    :type conf: dict
     :param execution_date: Execution date for the dag (templated)
     :type execution_date: str or datetime.datetime
     """
-    template_fields = ('trigger_dag_id', 'execution_date')
-    ui_color = '#ffefeb'
+
+    template_fields = ("trigger_dag_id", "execution_date", "conf")
+    ui_color = "#ffefeb"
 
     @apply_defaults
     def __init__(
-            self,
-            trigger_dag_id: str,
-            python_callable: Optional[Callable[[Dict, DagRunOrder], DagRunOrder]] = None,
-            execution_date: Optional[Union[str, datetime.datetime]] = None,
-            *args, **kwargs) -> None:
+        self,
+        trigger_dag_id: str,
+        conf: Optional[Dict] = None,
+        execution_date: Optional[Union[str, datetime.datetime]] = None,
+        *args,
+        **kwargs
+    ) -> None:
         super().__init__(*args, **kwargs)
-        self.python_callable = python_callable
         self.trigger_dag_id = trigger_dag_id
+        self.conf = conf
 
-        self.execution_date = None  # type: Optional[Union[str, datetime.datetime]]
-        if isinstance(execution_date, datetime.datetime):
-            self.execution_date = execution_date.isoformat()
-        elif isinstance(execution_date, str):
+        if execution_date is None or isinstance(execution_date, (str, datetime.datetime)):
             self.execution_date = execution_date
-        elif execution_date is None:
-            self.execution_date = None
         else:
             raise TypeError(
-                'Expected str or datetime.datetime type '
-                'for execution_date. Got {}'.format(
-                    type(execution_date)))
+                "Expected str or datetime.datetime type for execution_date. "
+                "Got {}".format(type(execution_date))
+            )
 
-    def execute(self, context):
-        if self.execution_date is not None:
-            run_id = 'trig__{}'.format(self.execution_date)
-            self.execution_date = timezone.parse(self.execution_date)
+    def execute(self, context: Dict):
+        if isinstance(self.execution_date, datetime.datetime):
+            run_id = "trig__{}".format(self.execution_date.isoformat())
+        elif isinstance(self.execution_date, str):
+            run_id = "trig__{}".format(self.execution_date)
+            self.execution_date = timezone.parse(self.execution_date)  # trigger_dag() expects datetime

Review comment: Much simpler, much better 👍
[jira] [Commented] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957306#comment-16957306 ] ASF GitHub Bot commented on AIRFLOW-5720: - dstandish commented on pull request #6390: [AIRFLOW-5720] don't call private method _get_connections_from_db URL: https://github.com/apache/airflow/pull/6390 * tests are mocking lower-level than they need to - Tests were mocking airflow.hook.BaseHook.get_connections. - Instead they can mock airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection which is more direct. * should not reference private method - This is an impediment to refactoring of connections / creds. * Tests had complexity that did not add a benefit - only one of the three connections in _setup_connections actually had any impact on the test - supplying a param for default_gcp_project_id is misleading because it has no impact on the test See Jira ticket [AIRFLOW-5720](https://issues.apache.org/jira/browse/AIRFLOW-5720) for more info. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [AIRFLOW-5720](https://issues.apache.org/jira/browse/AIRFLOW-5720) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-5720 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Issues with this test class: **tests are mocking lower-level than they need to** Tests were mocking airflow.hook.BaseHook.get_connections. Instead they can mock airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection which is more direct. **should not reference private method** This is an impediment to refactoring of connections / creds. 
**Tests had complexity that did not add a benefit**

They all had this bit:

    self._setup_connections(get_connections, uri)
    gcp_conn_id = 'google_cloud_default'
    hook = CloudSqlDatabaseHook(
        default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get(
            'extra__google_cloud_platform__project')
    )

`_setup_connections` was like this:

    @staticmethod
    def _setup_connections(get_connections, uri):
        gcp_connection = mock.MagicMock()
        gcp_connection.extra_dejson = mock.MagicMock()
        gcp_connection.extra_dejson.get.return_value = 'empty_project'
        cloudsql_connection = Connection()
        cloudsql_connection.parse_from_uri(uri)
        cloudsql_connection2 = Connection()
        cloudsql_connection2.parse_from_uri(uri)
        get_connections.side_effect = [[gcp_connection],
                                       [cloudsql_connection],
                                       [cloudsql_connection2]]

Issues here are as follows.

1. No test ever used the third side effect.
2. The first side effect does not help us; `default_gcp_project_id` is irrelevant.

Only one of the three connections in `_setup_connections` has any impact on the test. The call of `BaseHook.get_connection` only serves to discard the first connection in the mock side effect list, `gcp_connection`. The second connection is the one that matters, and it is returned when `CloudSqlDatabaseHook` calls `self.get_connection` during init. Since it is a mock side effect, it doesn't matter what value is given for conn_id. So the `CloudSqlDatabaseHook` init param `default_gcp_project_id` has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading.

### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits

- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1.
Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release
[jira] [Updated] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Standish updated AIRFLOW-5720: - Description: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. the first side effect does not help us; {{default_gcp_project_id}} is irrelevant Only one of the three connections in {{_setup_connections}} has any impact on the test. The call of {{BaseHook.get_connection}} only serves to discard the first connection in mock side effect list, {{gcp_connection}}. The second connection is the one that matters, and it is returned when {{CloudSqlDatabaseHook}} calls `self.get_connection` during init. Since it is a mock side effect, it doesn't matter what value is given for conn_id. 
So the {{CloudSqlDatabaseHook}} init param {{default_gcp_project_id}} has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading. was: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. the first side effect does not help us; {{default_gcp_project_id}} is irrelevant. All this line serves to accomplish is to discard the first connection, {{gcp_connection}}. This is the first invocation of {{BaseHook.get_connection}}. The second invocation is the one that matters, namely when {{CloudSqlDatabaseHook}} calls `self.get_connection`. This is when {{cloudsql_connection}} is returned. But since it is a mock side effect, it doesn't matter what value you give for conn_id. 
So the param {{default_gcp_project_id}} has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading. > don't call _get_connections_from_db in TestCloudSqlDatabaseHook > --- > > Key: AIRFLOW-5720 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5720 > Project: Apache Airflow > Issue Type: New Feature > Components: gcp >Affects Versions: 1.10.5 >Reporter: Daniel Standish >Assignee: Daniel Standish >Priority: Ma
[GitHub] [airflow] dstandish opened a new pull request #6390: [AIRFLOW-5720] don't call private method _get_connections_from_db
dstandish opened a new pull request #6390: [AIRFLOW-5720] don't call private method _get_connections_from_db URL: https://github.com/apache/airflow/pull/6390 * tests are mocking lower-level than they need to - Tests were mocking airflow.hook.BaseHook.get_connections. - Instead they can mock airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection which is more direct. * should not reference private method - This is an impediment to refactoring of connections / creds. * Tests had complexity that did not add a benefit - only one of the three connections in _setup_connections actually had any impact on the test - supplying a param for default_gcp_project_id is misleading because it has no impact on the test See Jira ticket [AIRFLOW-5720](https://issues.apache.org/jira/browse/AIRFLOW-5720) for more info. Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [AIRFLOW-5720](https://issues.apache.org/jira/browse/AIRFLOW-5720) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-5720 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Issues with this test class: **tests are mocking lower-level than they need to** Tests were mocking airflow.hook.BaseHook.get_connections. Instead they can mock airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection which is more direct. **should not reference private method** This is an impediment to refactoring of connections / creds. 
**Tests had complexity that did not add a benefit**

They all had this bit:

    self._setup_connections(get_connections, uri)
    gcp_conn_id = 'google_cloud_default'
    hook = CloudSqlDatabaseHook(
        default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get(
            'extra__google_cloud_platform__project')
    )

`_setup_connections` was like this:

    @staticmethod
    def _setup_connections(get_connections, uri):
        gcp_connection = mock.MagicMock()
        gcp_connection.extra_dejson = mock.MagicMock()
        gcp_connection.extra_dejson.get.return_value = 'empty_project'
        cloudsql_connection = Connection()
        cloudsql_connection.parse_from_uri(uri)
        cloudsql_connection2 = Connection()
        cloudsql_connection2.parse_from_uri(uri)
        get_connections.side_effect = [[gcp_connection],
                                       [cloudsql_connection],
                                       [cloudsql_connection2]]

Issues here are as follows.

1. No test ever used the third side effect.
2. The first side effect does not help us; `default_gcp_project_id` is irrelevant.

Only one of the three connections in `_setup_connections` has any impact on the test. The call of `BaseHook.get_connection` only serves to discard the first connection in the mock side effect list, `gcp_connection`. The second connection is the one that matters, and it is returned when `CloudSqlDatabaseHook` calls `self.get_connection` during init. Since it is a mock side effect, it doesn't matter what value is given for conn_id. So the `CloudSqlDatabaseHook` init param `default_gcp_project_id` has no consequence. And because it has no consequence, we should not supply a value for it because this is misleading.

### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits

- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1.
Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain docstrings that explain what it does - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about
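The fix this PR argues for — mocking the hook's own public `get_connection` rather than `BaseHook.get_connections` — can be sketched with a stand-in class. `FakeHook` below is hypothetical; in the real tests the patch target would be `airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection`, as stated above.

```python
from unittest import mock


class FakeHook:
    """Stand-in for CloudSqlDatabaseHook: resolves its connection during __init__."""

    def __init__(self, conn_id):
        self.connection = self.get_connection(conn_id)

    def get_connection(self, conn_id):
        # In the real hook this would query the Airflow metadata database.
        raise RuntimeError("would hit the metadata DB")


# Patch the public get_connection directly: one mock with a plain return_value,
# no side_effect list to keep in order, and no reference to any private method
# such as _get_connections_from_db.
with mock.patch.object(FakeHook, "get_connection", return_value="parsed_connection"):
    hook = FakeHook("any_conn_id")

print(hook.connection)  # → parsed_connection
```

Patching the most direct public seam keeps the test decoupled from internal refactoring of connection and credential handling, which is exactly the impediment the PR description calls out.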
[GitHub] [airflow-site] mik-laj commented on a change in pull request #89: Feature/blog tags
mik-laj commented on a change in pull request #89: Feature/blog tags URL: https://github.com/apache/airflow-site/pull/89#discussion_r337716708

## File path: landing-pages/site/assets/scss/main-custom.scss ##

@@ -18,7 +18,7 @@
 */
 
 @import url('https://fonts.googleapis.com/css?family=Rubik:500&display=swap');
-@import url('https://fonts.googleapis.com/css?family=Roboto:400,500,700&display=swap');
+@import url('https://fonts.googleapis.com/css?family=Roboto:400,400i,500,700&display=swap');

Review comment: ?
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-545123697 Actually, the current database layout is backward compatible. We don't use the `id` field, and I don't see much value in writing a complicated migration script (SQLite doesn't support dropping columns, and removing the primary keys can be tricky in Alembic). Ready for review :-)
[airflow-site] branch aip-11 updated: Add blog page layout (#87)
This is an automated email from the ASF dual-hosted git repository. kamilbregula pushed a commit to branch aip-11 in repository https://gitbox.apache.org/repos/asf/airflow-site.git The following commit(s) were added to refs/heads/aip-11 by this push: new 11ee2c4 Add blog page layout (#87) 11ee2c4 is described below commit 11ee2c435d79de5c0726abd1b5ccbd75943c3c95 Author: Kamil Breguła AuthorDate: Tue Oct 22 21:44:07 2019 +0200 Add blog page layout (#87) Co-Authored-By: Kamil Gabryjelski --- .../assets/scss/{main-custom.scss => _blog.scss} | 29 +-- landing-pages/site/assets/scss/_list-boxes.scss| 33 ++- landing-pages/site/assets/scss/main-custom.scss| 1 + landing-pages/site/config.toml | 6 +- .../Its-a-breeze-to-develop-apache-airflow.html| 8 + .../Its-a-breeze-to-develop-apache-airflow2.html | 8 + .../Its-a-breeze-to-develop-apache-airflow3.html | 8 + .../Its-a-breeze-to-develop-apache-airflow4.html | 8 + .../Its-a-breeze-to-develop-apache-airflow5.html | 8 + landing-pages/site/content/en/blog/_index.md | 7 +- landing-pages/site/content/en/blog/news/_index.md | 6 - .../blog/news/first-post/featured-sunset-get.png | Bin 387442 -> 0 bytes .../site/content/en/blog/news/first-post/index.md | 44 .../site/content/en/blog/news/second-post.md | 245 - .../site/content/en/blog/releases/_index.md| 6 - .../releases/in-depth-monoliths-detailed-spec.md | 245 - landing-pages/site/layouts/blog/baseof.html| 46 landing-pages/site/layouts/blog/list.html | 31 +++ .../site/layouts/partials/boxes/blogpost.html | 33 +++ landing-pages/site/layouts/taxonomy/baseof.html| 46 landing-pages/site/layouts/taxonomy/tag.html | 31 +++ landing-pages/src/js/showAllCommiters.js | 2 +- 22 files changed, 274 insertions(+), 577 deletions(-) diff --git a/landing-pages/site/assets/scss/main-custom.scss b/landing-pages/site/assets/scss/_blog.scss similarity index 60% copy from landing-pages/site/assets/scss/main-custom.scss copy to landing-pages/site/assets/scss/_blog.scss index 0813810..696d145 100644 --- 
a/landing-pages/site/assets/scss/main-custom.scss +++ b/landing-pages/site/assets/scss/_blog.scss @@ -17,21 +17,14 @@ * under the License. */ -@import url('https://fonts.googleapis.com/css?family=Rubik:500&display=swap'); -@import url('https://fonts.googleapis.com/css?family=Roboto:400,500,700&display=swap'); -@import url('https://fonts.googleapis.com/css?family=Roboto+Mono&display=swap'); - -@import "typography"; -@import "accordion"; -@import "buttons"; -@import "ol-ul"; -@import "list-boxes"; -@import "avatar"; -@import "quote"; -@import "pager"; -@import "case-study"; -@import "paragraph"; -@import "base-layout"; -@import "feature"; -@import "text-with-icon"; -@import "video"; +.tag { + // TODO(kgabryje): when using @extend .bodytext__medium--cerulean-blue, scss builds stylesheet but throws errors + font-family: "Roboto", sans-serif; + font-weight: 400; + font-size: 16px; + line-height: 1.63; + color: #017cee; + background-color: #d9ebfc; // cerulean-blue + opacity 0.15 + padding: 1px 30px; + border-radius: 5px; +} diff --git a/landing-pages/site/assets/scss/_list-boxes.scss b/landing-pages/site/assets/scss/_list-boxes.scss index 51b49ed..69cd658 100644 --- a/landing-pages/site/assets/scss/_list-boxes.scss +++ b/landing-pages/site/assets/scss/_list-boxes.scss @@ -16,7 +16,6 @@ * specific language governing permissions and limitations * under the License. 
*/ - @import "colors"; $box-margin: 20px; @@ -33,6 +32,11 @@ $box-margin: 20px; margin: $box-margin; padding: 30px 10px; width: 270px; + + &--wide { +max-width: 580px; +width: 100%; + } } .box-event { @@ -55,6 +59,32 @@ $box-margin: 20px; } } + &__blogpost { +padding: 0 20px; + +&--metadata { + display: flex; + flex-wrap: wrap; + justify-content: space-between; + margin-bottom: 20px; +} + +&--header { + @extend .subtitle__large--greyish-brown; + margin-bottom: 4px; +} + +&--author { + @extend .bodytext__medium--cerulean-blue; + font-weight: 500; +} + +&--description { + @extend .bodytext__medium--brownish-grey; + margin-bottom: 20px; +} + } + &__case-study { padding: 18px 18px 0; justify-content: space-between; @@ -109,6 +139,7 @@ $box-margin: 20px; &--members { @extend .bodytext__medium--brownish-grey; margin-bottom: 30px; + span { vertical-align: middle; } diff --git a/landing-pages/site/assets/scss/main-custom.scss b/landing-pages/site/assets/scss/main-custom.scss index 0813810..f39eef6 100644 --- a/landing-pages/site/assets/scss/main-custom.scss +++ b/landing-pages/site/assets/scss/main-custom.scss @@ -35,3 +35,4 @@ @import "feature"; @import "
[GitHub] [airflow-site] mik-laj merged pull request #87: [depends on #85, #86, #87] Add blogpost list layout
mik-laj merged pull request #87: [depends on #85, #86, #87] Add blogpost list layout URL: https://github.com/apache/airflow-site/pull/87
[jira] [Updated] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Standish updated AIRFLOW-5720: - Description: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. the first side effect does not help us; {{default_gcp_project_id}} is irrelevant. All this line serves to accomplish is to discard the first connection, {{gcp_connection}}. This is the first invocation of {{BaseHook.get_connection}}. The second invocation is the one that matters, namely when {{CloudSqlDatabaseHook}} calls `self.get_connection`. This is when {{cloudsql_connection}} is returned. But since it is a mock side effect, it doesn't matter what value you give for conn_id. So the param {{default_gcp_project_id}} has no consequence. 
And because it has no consequence, we should not supply a value for it because this is misleading. was: Issues with this test class: *tests are mocking lower-level than they need to* Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}} which is more direct. *should not reference private method* This is an impediment to refactoring of connections / creds. *Tests had complexity that did not add a benefit* They all had this bit: {code:python} self._setup_connections(get_connections, uri) gcp_conn_id = 'google_cloud_default' hook = CloudSqlDatabaseHook( default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get( 'extra__google_cloud_platform__project') ) {code} {{_setup_connections}} was like this: {code:python} @staticmethod def _setup_connections(get_connections, uri): gcp_connection = mock.MagicMock() gcp_connection.extra_dejson = mock.MagicMock() gcp_connection.extra_dejson.get.return_value = 'empty_project' cloudsql_connection = Connection() cloudsql_connection.parse_from_uri(uri) cloudsql_connection2 = Connection() cloudsql_connection2.parse_from_uri(uri) get_connections.side_effect = [[gcp_connection], [cloudsql_connection], [cloudsql_connection2]] {code} Issues here are as follows. 1. no test ever used the third side effect 2. this line: {{default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get(}} All this line serves to accomplish is to discard the first connection, {{gcp_connection}}. This is the first invocation of {{BaseHook.get_connection}}. The second invocation is the one that matters, namely when {{CloudSqlDatabaseHook}} calls `self.get_connection`. This is when {{cloudsql_connection}} is returned. But since it is a mock side effect, it doesn't matter what value you give for conn_id. So the param {{default_gcp_project_id}} has no consequence. 
And because it has no consequence, we should not supply a value for it because this is misleading. > don't call _get_connections_from_db in TestCloudSqlDatabaseHook > --- > > Key: AIRFLOW-5720 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5720 > Project: Apache Airflow > Issue Type: New Feature > Components: gcp >Affects Versions: 1.10.5 >Reporter: Daniel Standish >Assignee: Daniel Standish >Priority: Major > > Issues with this test class: > *tests are mo
[jira] [Updated] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Standish updated AIRFLOW-5720:
-------------------------------------
Description:
Issues with this test class:

*tests are mocking lower-level than they need to*

Tests were mocking {{airflow.hook.BaseHook.get_connections}}. Instead they can mock {{airflow.gcp.hooks.cloud_sql.CloudSqlDatabaseHook.get_connection}}, which is more direct.

*should not reference private method*

This is an impediment to refactoring of connections / creds.

*Tests had complexity that did not add a benefit*

They all had this bit:
{code:python}
self._setup_connections(get_connections, uri)
gcp_conn_id = 'google_cloud_default'
hook = CloudSqlDatabaseHook(
    default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get(
        'extra__google_cloud_platform__project')
)
{code}
{{_setup_connections}} was like this:
{code:python}
@staticmethod
def _setup_connections(get_connections, uri):
    gcp_connection = mock.MagicMock()
    gcp_connection.extra_dejson = mock.MagicMock()
    gcp_connection.extra_dejson.get.return_value = 'empty_project'
    cloudsql_connection = Connection()
    cloudsql_connection.parse_from_uri(uri)
    cloudsql_connection2 = Connection()
    cloudsql_connection2.parse_from_uri(uri)
    get_connections.side_effect = [[gcp_connection],
                                   [cloudsql_connection],
                                   [cloudsql_connection2]]
{code}
The issues here are as follows.
1. No test ever used the third side effect.
2. This line: {{default_gcp_project_id=BaseHook.get_connection(gcp_conn_id).extra_dejson.get(}} accomplishes nothing except discarding the first connection, {{gcp_connection}}; it is the first invocation of {{BaseHook.get_connection}}. The second invocation is the one that matters, namely when {{CloudSqlDatabaseHook}} calls {{self.get_connection}}, which is when {{cloudsql_connection}} is returned. But since it is a mock side effect, it doesn't matter what value you give for conn_id, so the param {{default_gcp_project_id}} has no consequence. And because it has no consequence, we should not supply a value for it; doing so is misleading.

was:
Test should not reference "private" methods. This impedes refactoring. Also some other issues with these tests. More detail TBD

> don't call _get_connections_from_db in TestCloudSqlDatabaseHook
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-5720
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5720
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: gcp
>    Affects Versions: 1.10.5
>            Reporter: Daniel Standish
>            Assignee: Daniel Standish
>            Priority: Major
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5715) Make email, owner context available
[ https://issues.apache.org/jira/browse/AIRFLOW-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957293#comment-16957293 ] ASF GitHub Bot commented on AIRFLOW-5715: - feng-tao commented on pull request #6385: [AIRFLOW-5715] Make email, owner context available URL: https://github.com/apache/airflow/pull/6385 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Make email, owner context available > --- > > Key: AIRFLOW-5715 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5715 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (AIRFLOW-5715) Make email, owner context available
[ https://issues.apache.org/jira/browse/AIRFLOW-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Feng resolved AIRFLOW-5715. --- Fix Version/s: 2.0.0 Resolution: Fixed > Make email, owner context available > --- > > Key: AIRFLOW-5715 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5715 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Minor > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5715) Make email, owner context available
[ https://issues.apache.org/jira/browse/AIRFLOW-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957294#comment-16957294 ] ASF subversion and git services commented on AIRFLOW-5715: -- Commit 641b8aaf04bcf68311cd490481360ea93a3d360d in airflow's branch refs/heads/master from Tao Feng [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=641b8aa ] [AIRFLOW-5715] Make email, owner context available (#6385) > Make email, owner context available > --- > > Key: AIRFLOW-5715 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5715 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Affects Versions: 1.10.5 >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] feng-tao merged pull request #6385: [AIRFLOW-5715] Make email, owner context available
feng-tao merged pull request #6385: [AIRFLOW-5715] Make email, owner context available URL: https://github.com/apache/airflow/pull/6385 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (AIRFLOW-5720) don't call _get_connections_from_db in TestCloudSqlDatabaseHook
Daniel Standish created AIRFLOW-5720: Summary: don't call _get_connections_from_db in TestCloudSqlDatabaseHook Key: AIRFLOW-5720 URL: https://issues.apache.org/jira/browse/AIRFLOW-5720 Project: Apache Airflow Issue Type: New Feature Components: gcp Affects Versions: 1.10.5 Reporter: Daniel Standish Assignee: Daniel Standish Test should not reference "private" methods. This impedes refactoring. Also some other issues with these tests. More detail TBD -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow-site] mik-laj merged pull request #86: Add features components
mik-laj merged pull request #86: Add features components URL: https://github.com/apache/airflow-site/pull/86 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Resolved] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA
[ https://issues.apache.org/jira/browse/AIRFLOW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao-Han Tsai resolved AIRFLOW-5714. Fix Version/s: 1.10.7 Resolution: Fixed > Collect SLA miss emails only from tasks missed SLA > -- > > Key: AIRFLOW-5714 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5714 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.7 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > Fix For: 1.10.7 > > > Currently when a task in the DAG missed the SLA, Airflow would traverse > through all the tasks in the DAG and collect all the task-level emails. Then > Airflow would send an SLA miss email to all those collected emails, which can > add unnecessary noise to task owners that does not contribute to the SLA miss. > Thus, changing the code to only collect emails from the tasks that missed the > SLA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA
[ https://issues.apache.org/jira/browse/AIRFLOW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957268#comment-16957268 ] ASF GitHub Bot commented on AIRFLOW-5714: - milton0825 commented on pull request #6384: [AIRFLOW-5714] Collect SLA miss emails only from tasks missed SLA URL: https://github.com/apache/airflow/pull/6384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Collect SLA miss emails only from tasks missed SLA > -- > > Key: AIRFLOW-5714 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5714 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.7 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently when a task in the DAG missed the SLA, Airflow would traverse > through all the tasks in the DAG and collect all the task-level emails. Then > Airflow would send an SLA miss email to all those collected emails, which can > add unnecessary noise to task owners that does not contribute to the SLA miss. > Thus, changing the code to only collect emails from the tasks that missed the > SLA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AIRFLOW-5714) Collect SLA miss emails only from tasks missed SLA
[ https://issues.apache.org/jira/browse/AIRFLOW-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957269#comment-16957269 ] ASF subversion and git services commented on AIRFLOW-5714: -- Commit bc5341223429efe777af7b492ad09b96c7c77b17 in airflow's branch refs/heads/master from Chao-Han Tsai [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=bc53412 ] [AIRFLOW-5714] Collect SLA miss emails only from tasks missed SLA (#6384) Currently when a task in the DAG missed the SLA, Airflow would traverse through all the tasks in the DAG and collect all the task-level emails. Then Airflow would send an SLA miss email to all those collected emails, which can add unnecessary noise to task owners that does not contribute to the SLA miss. Thus, changing the code to only collect emails from the tasks that missed the SLA. > Collect SLA miss emails only from tasks missed SLA > -- > > Key: AIRFLOW-5714 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5714 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Affects Versions: 1.10.7 >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Currently when a task in the DAG missed the SLA, Airflow would traverse > through all the tasks in the DAG and collect all the task-level emails. Then > Airflow would send an SLA miss email to all those collected emails, which can > add unnecessary noise to task owners that does not contribute to the SLA miss. > Thus, changing the code to only collect emails from the tasks that missed the > SLA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] milton0825 merged pull request #6384: [AIRFLOW-5714] Collect SLA miss emails only from tasks missed SLA
milton0825 merged pull request #6384: [AIRFLOW-5714] Collect SLA miss emails only from tasks missed SLA URL: https://github.com/apache/airflow/pull/6384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io commented on issue #6385: [AIRFLOW-5715] Make email, owner context available
codecov-io commented on issue #6385: [AIRFLOW-5715] Make email, owner context available URL: https://github.com/apache/airflow/pull/6385#issuecomment-545096426

# [Codecov](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=h1) Report
> Merging [#6385](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/3cfe4a1c9dc49c91839e9c278b97f9c18033fdf4?src=pr&el=desc) will **decrease** coverage by `0.28%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/6385/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #6385      +/-   ##
==========================================
- Coverage   80.57%   80.29%   -0.29%
==========================================
  Files         626      626
  Lines       36237    36248      +11
==========================================
- Hits        29198    29105      -93
- Misses       7039     7143     +104
```

| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/utils/operator\_helpers.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9vcGVyYXRvcl9oZWxwZXJzLnB5) | `100% <100%> (ø)` | :arrow_up: |
| [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
| [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
| [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
| [airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5) | `33.33% <0%> (-41.67%)` | :arrow_down: |
| [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `70.14% <0%> (-28.36%)` | :arrow_down: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `58.56% <0%> (-0.34%)` | :arrow_down: |
| [airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==) | `91.43% <0%> (+1.52%)` | :arrow_up: |
| [airflow/jobs/local\_task\_job.py](https://codecov.io/gh/apache/airflow/pull/6385/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2xvY2FsX3Rhc2tfam9iLnB5) | `90% <0%> (+5%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=footer). Last update [3cfe4a1...ae58749](https://codecov.io/gh/apache/airflow/pull/6385?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage
BasPH commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage URL: https://github.com/apache/airflow/pull/6317#discussion_r337683917

## File path: airflow/operators/dagrun_operator.py ##

@@ -18,81 +18,64 @@
 # under the License.
 
 import datetime
-import json
-from typing import Callable, Dict, Optional, Union
+from typing import Dict, Optional, Union
 
 from airflow.api.common.experimental.trigger_dag import trigger_dag
 from airflow.models import BaseOperator
 from airflow.utils import timezone
 from airflow.utils.decorators import apply_defaults
 
 
-class DagRunOrder:
-    def __init__(self, run_id=None, payload=None):
-        self.run_id = run_id
-        self.payload = payload
-
-
 class TriggerDagRunOperator(BaseOperator):
     """
     Triggers a DAG run for a specified ``dag_id``
 
     :param trigger_dag_id: the dag_id to trigger (templated)
     :type trigger_dag_id: str
-    :param python_callable: a reference to a python function that will be
-        called while passing it the ``context`` object and a placeholder
-        object ``obj`` for your callable to fill and return if you want
-        a DagRun created. This ``obj`` object contains a ``run_id`` and
-        ``payload`` attribute that you can modify in your function.
-        The ``run_id`` should be a unique identifier for that DAG run, and
-        the payload has to be a picklable object that will be made available
-        to your tasks while executing that DAG run. Your function header
-        should look like ``def foo(context, dag_run_obj):``
-    :type python_callable: python callable
+    :param conf: Configuration for the DAG run
+    :type conf: dict
     :param execution_date: Execution date for the dag (templated)
     :type execution_date: str or datetime.datetime
     """
-    template_fields = ('trigger_dag_id', 'execution_date')
-    ui_color = '#ffefeb'
+
+    template_fields = ("trigger_dag_id", "execution_date", "conf")
+    ui_color = "#ffefeb"
 
     @apply_defaults
     def __init__(
-            self,
-            trigger_dag_id: str,
-            python_callable: Optional[Callable[[Dict, DagRunOrder], DagRunOrder]] = None,
-            execution_date: Optional[Union[str, datetime.datetime]] = None,
-            *args, **kwargs) -> None:
+        self,
+        trigger_dag_id: str,
+        conf: Optional[Dict] = None,
+        execution_date: Optional[Union[str, datetime.datetime]] = None,
+        *args,
+        **kwargs
+    ) -> None:
         super().__init__(*args, **kwargs)
-        self.python_callable = python_callable
         self.trigger_dag_id = trigger_dag_id
+        self.conf = conf
 
-        self.execution_date = None  # type: Optional[Union[str, datetime.datetime]]
-        if isinstance(execution_date, datetime.datetime):
-            self.execution_date = execution_date.isoformat()
-        elif isinstance(execution_date, str):
+        if execution_date is None or isinstance(execution_date, (str, datetime.datetime)):
             self.execution_date = execution_date
-        elif execution_date is None:
-            self.execution_date = None
         else:
             raise TypeError(
-                'Expected str or datetime.datetime type '
-                'for execution_date. Got {}'.format(
-                    type(execution_date)))
+                "Expected str or datetime.datetime type for execution_date. "
+                "Got {}".format(type(execution_date))
+            )
 
-    def execute(self, context):
-        if self.execution_date is not None:
-            run_id = 'trig__{}'.format(self.execution_date)
-            self.execution_date = timezone.parse(self.execution_date)
+    def execute(self, context: Dict):
+        if isinstance(self.execution_date, datetime.datetime):
+            run_id = "trig__{}".format(self.execution_date.isoformat())
+        elif isinstance(self.execution_date, str):
+            run_id = "trig__{}".format(self.execution_date)
+            self.execution_date = timezone.parse(self.execution_date)  # trigger_dag() expects datetime

Review comment: Much simpler, much better 👍

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-5663) PythonVirtualenvOperator doesn't print logs until the operator is finished
[ https://issues.apache.org/jira/browse/AIRFLOW-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957243#comment-16957243 ] ASF GitHub Bot commented on AIRFLOW-5663:
-----------------------------------------
wjiangqc commented on pull request #6389: [AIRFLOW-5663]: Switch to real-time logging in PythonVirtualenvOperator. URL: https://github.com/apache/airflow/pull/6389

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-5663

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes: Allow us to see the logs while the operator is running.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Existing 19 unit tests passed.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
  - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> PythonVirtualenvOperator doesn't print logs until the operator is finished
> --------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5663
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5663
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.5
>            Reporter: Wei Jiang
>            Priority: Major
>
> It seems the {{PythonVirtualenvOperator}} doesn't print out the log until the job is done. When we use logging in {{PythonOperator}}, it does print out logging in real time. It would be nice for the {{PythonVirtualenvOperator}} to have the same functionality as the {{PythonOperator}} so we can see the logging output as the PythonVirtualenvOperator makes progress.
> [https://github.com/apache/airflow/blob/master/airflow/operators/python_operator.py#L332]
> {code:python}
> def _execute_in_subprocess(self, cmd):
>     try:
>         self.log.info("Executing cmd\n%s", cmd)
>         output = subprocess.check_output(cmd,
>                                          stderr=subprocess.STDOUT,
>                                          close_fds=True)
>         if output:
>             self.log.info("Got output\n%s", output)
>     except subprocess.CalledProcessError as e:
>         self.log.info("Got error output\n%s", e.output)
>         raise
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] ashb commented on a change in pull request #6377: [AIRFLOW-5589] monitor pods by labels instead of names
ashb commented on a change in pull request #6377: [AIRFLOW-5589] monitor pods by labels instead of names URL: https://github.com/apache/airflow/pull/6377#discussion_r337668858 ## File path: tests/integration/kubernetes/test_kubernetes_pod_operator.py ## @@ -61,7 +72,10 @@ def setUp(self): 'namespace': 'default', 'name': ANY, 'annotations': {}, -'labels': {'foo': 'bar'} +'labels': {'foo': 'bar', + 'exec_date': '2019-01-01T09-4bf747daa', Review comment: I wonder if it's worth using a non-standard date format here, rather than always hashing this label? (Ultimately it doesn't matter much to us as Airflow I guess.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] mik-laj commented on a change in pull request #6389: [AIRFLOW-5663]: Switch to real-time logging in PythonVirtualenvOperator.
mik-laj commented on a change in pull request #6389: [AIRFLOW-5663]: Switch to real-time logging in PythonVirtualenvOperator. URL: https://github.com/apache/airflow/pull/6389#discussion_r337667617

## File path: airflow/operators/python_operator.py ##

@@ -330,16 +330,20 @@ def _pass_op_args(self):
         return len(self.op_args) + len(self.op_kwargs) > 0
 
     def _execute_in_subprocess(self, cmd):
-        try:
-            self.log.info("Executing cmd\n%s", cmd)
-            output = subprocess.check_output(cmd,
-                                             stderr=subprocess.STDOUT,
-                                             close_fds=True)
-            if output:
-                self.log.info("Got output\n%s", output)
-        except subprocess.CalledProcessError as e:
-            self.log.info("Got error output\n%s", e.output)
-            raise
+        self.log.info("Executing cmd\n{}".format(cmd))
+        self._sp = subprocess.Popen(cmd,
+                                    stdout=subprocess.PIPE,
+                                    stderr=subprocess.STDOUT,
+                                    bufsize=0,
+                                    close_fds=True)
+        self.log.info("Got output\n")
+        with self._sp.stdout:
+            for line in iter(self._sp.stdout.readline, b''):
+                self.log.info(line)

Review comment:
```suggestion
                self.log.info("%s", line)
```
Data should be passed as arguments so that they are written in a different color.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow-site] mik-laj merged pull request #88: [depends on #84] Add video component
mik-laj merged pull request #88: [depends on #84] Add video component URL: https://github.com/apache/airflow-site/pull/88 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[airflow-site] branch aip-11 updated: [depends on #84] Add video component (#88)
This is an automated email from the ASF dual-hosted git repository. kamilbregula pushed a commit to branch aip-11 in repository https://gitbox.apache.org/repos/asf/airflow-site.git The following commit(s) were added to refs/heads/aip-11 by this push: new 6108236 [depends on #84] Add video component (#88) 6108236 is described below commit 61082363127a96bfdcf829dc68b9b822c1f73988 Author: Kamil Breguła AuthorDate: Tue Oct 22 19:53:48 2019 +0200 [depends on #84] Add video component (#88) Co-Authored-By: Kamil Gabryjelski --- landing-pages/site/assets/icons/play-icon.svg | 5 ++ landing-pages/site/assets/scss/_video.scss | 90 ++ landing-pages/site/assets/scss/main-custom.scss| 1 + landing-pages/site/data/videos.json| 56 ++ landing-pages/site/layouts/examples/list.html | 2 + .../site/layouts/partials/video-section.html | 42 ++ landing-pages/site/layouts/partials/youtube.html | 23 ++ landing-pages/src/index.js | 2 + .../js/handleActiveVideo.js} | 32 9 files changed, 239 insertions(+), 14 deletions(-) diff --git a/landing-pages/site/assets/icons/play-icon.svg b/landing-pages/site/assets/icons/play-icon.svg new file mode 100644 index 000..6f66cb0 --- /dev/null +++ b/landing-pages/site/assets/icons/play-icon.svg @@ -0,0 +1,5 @@ +http://www.w3.org/2000/svg"; width="20" height="20" viewBox="-0.5 -0.5 21 21"> + + diff --git a/landing-pages/site/assets/scss/_video.scss b/landing-pages/site/assets/scss/_video.scss new file mode 100644 index 000..f8dd998 --- /dev/null +++ b/landing-pages/site/assets/scss/_video.scss @@ -0,0 +1,90 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +@import "colors"; + +.video-section { + display: flex; + border: solid 1px #cbcbcb; + padding: 40px; +} + +.video-wrapper { + flex: 1; + + .video-container { +display: none; + +&:last-child { + display: block; +} + } + + .anchor { +position: fixed; + +&:target + .video-container { + display: block; +} + +&:target + .video-container ~ .video-container { + display: none; +} + } +} + +.video-list-wrapper { + overflow-y: scroll; + max-height: 403px; + max-width: 365px; + width: 100%; + margin-left: 40px; +} + +.video-list { + display: flex; + flex-direction: column-reverse; + justify-content: flex-end; + + &__item { +display: flex; +align-items: center; +border-bottom: solid 1px map-get($colors, very-light-pink); +padding: 16px 0; + +$item: &; +#{$item}--title { + @extend .bodytext__medium--brownish-grey; + margin-left: 9px; + vertical-align: middle; +} + +&:hover, &.active { + #{$item}--title { +font-weight: 500; + } + + svg { +path { + fill: map-get($colors, brownish-grey); + stroke: none; +} + } +} + } +} diff --git a/landing-pages/site/assets/scss/main-custom.scss b/landing-pages/site/assets/scss/main-custom.scss index 4b7330e..1d9bd8b 100644 --- a/landing-pages/site/assets/scss/main-custom.scss +++ b/landing-pages/site/assets/scss/main-custom.scss @@ -32,3 +32,4 @@ @import "case-study"; @import "paragraph"; @import "base-layout"; +@import "video"; diff --git a/landing-pages/site/data/videos.json b/landing-pages/site/data/videos.json new file mode 100644 index 000..64dd841 --- /dev/null +++ b/landing-pages/site/data/videos.json @@ -0,0 +1,56 
@@ +[ + { +"name": "video", +"date": "2019-01-23", +"title": "Airflow Meetup, London 23 Jan 2019", +"videoID": "E0asAgpHvaI" + }, + { +"name": "video", +"date": "2019-04-05", +"title": "Airflow Meetup, London 05 Apr 2019", +"videoID": "uN-TvWzeEvA" + }, + { +"name": "video", +"date": "2019-09-30", +"title": "Airflow Meetup, London 30 Sep 2019", +"videoID": "nUfb4UxnvJk" + }, + { +"name": "video", +"date": "2019-09-30", +"title": "Airflow Meetup, London 30 Sep 2019", +"videoID": "OhWZavn2OvM" + }, + { +"name": "video", +"date": "2019-09-30", +"title": "Airflow Meetup, London 30 Sep
[GitHub] [airflow] wjiangqc opened a new pull request #6389: [AIRFLOW-5663]: Switch to real-time logging in PythonVirtualenvOperator.
wjiangqc opened a new pull request #6389: [AIRFLOW-5663]: Switch to real-time logging in PythonVirtualenvOperator. URL: https://github.com/apache/airflow/pull/6389

Make sure you have checked _all_ steps below.

### Jira
- [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-5663

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes: Allow us to see the logs while the operator is running.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Existing 19 unit tests passed.

### Commits
- [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what it does
  - If you implement backwards incompatible changes, please leave a note in the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow] codecov-io edited a comment on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
codecov-io edited a comment on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-544232286

# [Codecov](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=h1) Report
> Merging [#6370](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/3cfe4a1c9dc49c91839e9c278b97f9c18033fdf4?src=pr&el=desc) will **decrease** coverage by `0.3%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/6370/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #6370      +/-   ##
==========================================
- Coverage   80.57%   80.26%   -0.31%
==========================================
  Files         626      626
  Lines       36237    36229       -8
==========================================
- Hits        29198    29081     -117
- Misses       7039     7148     +109
```

| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/taskinstance.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvdGFza2luc3RhbmNlLnB5) | `93.73% <ø> (-0.06%)` | :arrow_down: |
| [airflow/models/xcom.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMveGNvbS5weQ==) | `79.79% <100%> (-0.6%)` | :arrow_down: |
| [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
| [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
| [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
| [airflow/kubernetes/kube\_client.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL2t1YmVfY2xpZW50LnB5) | `33.33% <0%> (-41.67%)` | :arrow_down: |
| [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `70.14% <0%> (-28.36%)` | :arrow_down: |
| [airflow/ti\_deps/deps/base\_ti\_dep.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy90aV9kZXBzL2RlcHMvYmFzZV90aV9kZXAucHk=) | `85.71% <0%> (-4.77%)` | :arrow_down: |
| [airflow/utils/dag\_processing.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYWdfcHJvY2Vzc2luZy5weQ==) | `58.73% <0%> (-0.17%)` | :arrow_down: |
| [airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/6370/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==) | `91.43% <0%> (+1.52%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=footer). Last update [3cfe4a1...c9b9312](https://codecov.io/gh/apache/airflow/pull/6370?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [airflow-site] mik-laj merged pull request #84: Add accordion
mik-laj merged pull request #84: Add accordion URL: https://github.com/apache/airflow-site/pull/84
[airflow-site] branch aip-11 updated: Add accordian component (#84)
This is an automated email from the ASF dual-hosted git repository.

kamilbregula pushed a commit to branch aip-11 in repository https://gitbox.apache.org/repos/asf/airflow-site.git

The following commit(s) were added to refs/heads/aip-11 by this push:
     new 4976f21  Add accordian component (#84)
4976f21 is described below

commit 4976f218652975dff3e3c0cab5f699c68b0cba56
Author: Kamil Breguła
AuthorDate: Tue Oct 22 19:40:01 2019 +0200

    Add accordian component (#84)

    Co-Authored-By: Kamil Gabryjelski
---
 .../site/assets/icons/ask-question-icon.svg        | 16 +++
 landing-pages/site/assets/icons/bug-icon.svg       | 31 ++
 .../site/assets/icons/join-devlist-icon.svg        | 16 +++
 landing-pages/site/assets/scss/_accordion.scss     | 20 --
 landing-pages/site/content/en/examples/_index.html | 21 +++
 .../site/layouts/shortcodes/accordion.html         | 19 ++---
 6 files changed, 107 insertions(+), 16 deletions(-)

[New SVG icon files (ask-question-icon.svg, bug-icon.svg, join-devlist-icon.svg) added; their markup did not survive archiving.]

diff --git a/landing-pages/site/assets/scss/_accordion.scss b/landing-pages/site/assets/scss/_accordion.scss
index 7d9e96d..392e3e5 100644
--- a/landing-pages/site/assets/scss/_accordion.scss
+++ b/landing-pages/site/assets/scss/_accordion.scss
@@ -16,7 +16,6 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 @import "colors";

 details.accordion {
@@ -36,16 +35,21 @@ details.accordion {
     display: none;
   }

-  .accordion__header {
-    color: map_get($colors, cerulean-blue);
-    margin-bottom: 20px;
-  }
+  .accordion__summary-content {
+    display: flex;

-  .accordion__description {
-    color: map_get($colors, brownish-grey);
+    svg {
+      width: 60px;
+      margin-top: 28px;
+      margin-right: 42px;
+    }
+
+    &--header {
+      margin-bottom: 20px;
+    }
   }

-  .accordion__icon {
+  .accordion__arrow {
     display: flex;
     position: absolute;
     width: 36px;
diff --git a/landing-pages/site/content/en/examples/_index.html b/landing-pages/site/content/en/examples/_index.html
index a9b2fc3..4d6085b 100644
--- a/landing-pages/site/content/en/examples/_index.html
+++ b/landing-pages/site/content/en/examples/_index.html
@@ -43,16 +43,27 @@ menu:
 {{< blocks/section color="white" >}}
-{{< accordion title="Accordion 1" description="Tincidunt ornare massa eget egestas purus viverra." >}}
-Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Tellus at urna condimentum mattis pellentesque id. Neque laoreet suspendisse interdum consectetur. Non blandit massa enim nec.
-{{< /accordion >}}
-{{< accordion title="Accordion 2" description="Vestibulum morbi blandit cursus risus at." >}}
+{{< accordion title="Install Apache Airflow locally" description="Working on an Open Source project such as Apache Airflow is very demanding but also equally rewarding when you realize how many businesses use it every day." >}}
 Magna ac placerat vestibulum lectus mauris ultrices. Nullam non nisi est sit amet facilisis magna etiam tempor. Aliquet nec ullamcorper sit amet risus nullam eget felis. Rhoncus aenean vel elit scelerisque mauris pellentesque.
 {{< /accordion >}}
-{{< accordion title="Accordion 3" description="Id faucibus nisl tincidunt eget." >}}
+{{< accordion title="In
[GitHub] [airflow] Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution
Fokko commented on issue #6370: AIRFLOW-5701: Don't clear xcom explicitly before execution URL: https://github.com/apache/airflow/pull/6370#issuecomment-545043480

Test failure looks unrelated, pulled in master:

```
Summary of failed tests

tests.gcp.hooks.test_gcs.TestGoogleCloudStorageHookUpload.test_upload_data_str_gzip
Failure: builtins.AssertionError
```
[GitHub] [airflow] Fokko commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage
Fokko commented on a change in pull request #6317: [AIRFLOW-5644] Simplify TriggerDagRunOperator usage URL: https://github.com/apache/airflow/pull/6317#discussion_r337613334

File path: airflow/operators/dagrun_operator.py

```diff
@@ -75,24 +63,21 @@ def __init__(
             self.execution_date = None
         else:
             raise TypeError(
-                'Expected str or datetime.datetime type '
-                'for execution_date. Got {}'.format(
-                    type(execution_date)))
+                "Expected str or datetime.datetime type for execution_date. "
+                "Got {}".format(type(execution_date))
+            )

     def execute(self, context):
```

Review comment: No, it is okay like this.
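Editor's aside: the hunk above only restyles the exception message; the underlying validation accepts a `datetime`, a string (e.g. a templated value resolved later), or `None`, and rejects everything else. A minimal self-contained sketch of that check — the function name `normalize_execution_date` is invented for this illustration and is not part of Airflow:

```python
import datetime


def normalize_execution_date(execution_date):
    """Sketch of the type check shown in the diff: pass through a
    datetime, a string, or None; raise TypeError for anything else."""
    if execution_date is None or isinstance(execution_date, (str, datetime.datetime)):
        return execution_date
    raise TypeError(
        "Expected str or datetime.datetime type for execution_date. "
        "Got {}".format(type(execution_date))
    )


print(normalize_execution_date("{{ ds }}"))  # template strings pass through unchanged
try:
    normalize_execution_date(42)
except TypeError as err:
    print(err)
```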
[GitHub] [airflow-site] kgabryje opened a new pull request #89: Feature/blog tags
kgabryje opened a new pull request #89: Feature/blog tags URL: https://github.com/apache/airflow-site/pull/89
[jira] [Resolved] (AIRFLOW-5695) Cannot run a task from UI if its state is None
[ https://issues.apache.org/jira/browse/AIRFLOW-5695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-5695. Fix Version/s: 1.10.7 Resolution: Fixed > Cannot run a task from UI if its state is None > -- > > Key: AIRFLOW-5695 > URL: https://issues.apache.org/jira/browse/AIRFLOW-5695 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: 1.10.4 >Reporter: Ping Zhang >Priority: Major > Fix For: 1.10.7 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow-site] mik-laj opened a new pull request #88: [depends on #84] Add video component
mik-laj opened a new pull request #88: [depends on #84] Add video component URL: https://github.com/apache/airflow-site/pull/88
[airflow-site] branch aip-11 updated (d17ee44 -> dbb402c)
This is an automated email from the ASF dual-hosted git repository.

kamilbregula pushed a change to branch aip-11 in repository https://gitbox.apache.org/repos/asf/airflow-site.git.

 discard  d17ee44   [depends on #84] Add video component (#85)

This update removed existing revisions from the reference, leaving the reference pointing at a previous point in the repository history.

 * -- * -- N   refs/heads/aip-11 (dbb402c)
            \
 O -- O -- O   (d17ee44)

Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update.

Summary of changes:
 .../site/assets/icons/ask-question-icon.svg        | 16
 landing-pages/site/assets/icons/bug-icon.svg       | 31
 .../site/assets/icons/join-devlist-icon.svg        | 16
 landing-pages/site/assets/icons/play-icon.svg      |  5 --
 landing-pages/site/assets/scss/_accordion.scss     | 20 ++---
 landing-pages/site/assets/scss/_video.scss         | 90 --
 landing-pages/site/assets/scss/main-custom.scss    |  1 -
 landing-pages/site/content/en/examples/_index.html | 21 ++---
 landing-pages/site/data/videos.json                | 56 --
 landing-pages/site/layouts/examples/list.html      |  2 -
 .../site/layouts/partials/video-section.html       | 42 --
 landing-pages/site/layouts/partials/youtube.html   | 23 --
 .../site/layouts/shortcodes/accordion.html         | 19 +
 landing-pages/src/index.js                         |  2 -
 landing-pages/src/js/handleActiveVideo.js          | 38 -
 15 files changed, 16 insertions(+), 366 deletions(-)
 delete mode 100644 landing-pages/site/assets/icons/ask-question-icon.svg
 delete mode 100644 landing-pages/site/assets/icons/bug-icon.svg
 delete mode 100644 landing-pages/site/assets/icons/join-devlist-icon.svg
 delete mode 100644 landing-pages/site/assets/icons/play-icon.svg
 delete mode 100644 landing-pages/site/assets/scss/_video.scss
 delete mode 100644 landing-pages/site/data/videos.json
 delete mode 100644 landing-pages/site/layouts/partials/video-section.html
 delete mode 100644 landing-pages/site/layouts/partials/youtube.html
 delete mode 100644 landing-pages/src/js/handleActiveVideo.js
[GitHub] [airflow-site] mik-laj merged pull request #85: [depends on #84] Add video component
mik-laj merged pull request #85: [depends on #84] Add video component URL: https://github.com/apache/airflow-site/pull/85
[airflow-site] branch aip-11 updated: [depends on #84] Add video component (#85)
This is an automated email from the ASF dual-hosted git repository.

kamilbregula pushed a commit to branch aip-11 in repository https://gitbox.apache.org/repos/asf/airflow-site.git

The following commit(s) were added to refs/heads/aip-11 by this push:
     new d17ee44  [depends on #84] Add video component (#85)
d17ee44 is described below

commit d17ee44b86b21a4fc39893988f04a95d99047d17
Author: Kamil Breguła
AuthorDate: Tue Oct 22 15:39:36 2019 +0200

    [depends on #84] Add video component (#85)

    Co-Authored-By: Kamil Gabryjelski
---
 .../site/assets/icons/ask-question-icon.svg        | 16
 landing-pages/site/assets/icons/bug-icon.svg       | 31
 .../site/assets/icons/join-devlist-icon.svg        | 16
 landing-pages/site/assets/icons/play-icon.svg      |  5 ++
 landing-pages/site/assets/scss/_accordion.scss     | 20 +++--
 landing-pages/site/assets/scss/_video.scss         | 90 ++
 landing-pages/site/assets/scss/main-custom.scss    |  1 +
 landing-pages/site/content/en/examples/_index.html | 21 +++--
 landing-pages/site/data/videos.json                | 56 ++
 landing-pages/site/layouts/examples/list.html      |  2 +
 .../site/layouts/partials/video-section.html       | 42 ++
 .../accordion.html => partials/youtube.html}       | 14 +---
 .../site/layouts/shortcodes/accordion.html         | 19 -
 landing-pages/src/index.js                         |  2 +
 .../js/handleActiveVideo.js}                       | 32
 15 files changed, 327 insertions(+), 40 deletions(-)

[New SVG icon files (ask-question-icon.svg, bug-icon.svg, join-devlist-icon.svg, play-icon.svg) added; their markup did not survive archiving.]

diff --git a/landing-pages/site/assets/scss/_accordion.scss b/landing-pages/site/assets/scss/_accordion.scss
index 7d9e96d..392e3e5 100644
--- a/landing-pages/site/assets/scss/_accordion.scss
+++ b/landing-pages/site/assets/scss/_accordion.scss
@@ -16,7 +16,6 @@
  * specific language governing permissions and limitations
  * under the License.
  */
-
 @import "colors";

 details.accordion {
@@ -36,16 +35,21 @@ details.accordion {
     display: none;
   }

-  .accordion__header {
-    color: map_get($colors, cerulean-blue);
-    margin-bottom: 20px;
-  }
+  .accordion__summary-content {
+    display: flex;

-  .accordion__description {
-    color: map_get($colors, brownish-grey);
+    svg {
+      width: 60px;
+      margin-top: 28px;
+      margin-right: 42px;
+    }
+
+    &--header {
+      margin-bottom: 20px;
+    }
   }

-  .accordion__icon {
+  .accordion__arrow {
     display: flex;
     position: absolute;
     width: 36px;
diff --git a/landing-pages/site/assets/scss/_video.scss b/landing-pages/site/assets/scss/_video.scss
new file mode 100644
index 000..f8dd998
--- /dev/null
+++ b/landing-pages/site/assets/scss/_video.scss
@@ -0,0 +1,90 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you
[jira] [Resolved] (AIRFLOW-5632) Tests for renaming GCP operators
[ https://issues.apache.org/jira/browse/AIRFLOW-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kamil Bregula resolved AIRFLOW-5632.
Fix Version/s: 2.0.0
Resolution: Fixed

> Tests for renaming GCP operators
>
> Key: AIRFLOW-5632
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5632
> Project: Apache Airflow
> Issue Type: Improvement
> Components: gcp
> Affects Versions: 1.10.5
> Reporter: Michał Słowikowski
> Assignee: Michał Słowikowski
> Priority: Minor
> Fix For: 2.0.0
>
> Tests include:
> * if importing old module raising DeprecationWarning
> * if initializing deprecated class raising DeprecationWarning
> * if the old class is the new class subclass
> * check if new module path is shown in warning message
[jira] [Commented] (AIRFLOW-5632) Tests for renaming GCP operators
[ https://issues.apache.org/jira/browse/AIRFLOW-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957051#comment-16957051 ]

ASF subversion and git services commented on AIRFLOW-5632:

Commit 3cfe4a1c9dc49c91839e9c278b97f9c18033fdf4 in airflow's branch refs/heads/master from mislo
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=3cfe4a1 ]

[AIRFLOW-5632] Rename ComputeEngine operators (#6306)

> Tests for renaming GCP operators
>
> Key: AIRFLOW-5632
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5632
> Project: Apache Airflow
> Issue Type: Improvement
> Components: gcp
> Affects Versions: 1.10.5
> Reporter: Michał Słowikowski
> Assignee: Michał Słowikowski
> Priority: Minor
>
> Tests include:
> * if importing old module raising DeprecationWarning
> * if initializing deprecated class raising DeprecationWarning
> * if the old class is the new class subclass
> * check if new module path is shown in warning message
[jira] [Commented] (AIRFLOW-5632) Tests for renaming GCP operators
[ https://issues.apache.org/jira/browse/AIRFLOW-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957050#comment-16957050 ]

ASF GitHub Bot commented on AIRFLOW-5632:

mik-laj commented on pull request #6306: [AIRFLOW-5632] Rename ComputeEngine operators URL: https://github.com/apache/airflow/pull/6306

> Tests for renaming GCP operators
>
> Key: AIRFLOW-5632
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5632
> Project: Apache Airflow
> Issue Type: Improvement
> Components: gcp
> Affects Versions: 1.10.5
> Reporter: Michał Słowikowski
> Assignee: Michał Słowikowski
> Priority: Minor
>
> Tests include:
> * if importing old module raising DeprecationWarning
> * if initializing deprecated class raising DeprecationWarning
> * if the old class is the new class subclass
> * check if new module path is shown in warning message
[GitHub] [airflow] mik-laj merged pull request #6306: [AIRFLOW-5632] Rename ComputeEngine operators
mik-laj merged pull request #6306: [AIRFLOW-5632] Rename ComputeEngine operators URL: https://github.com/apache/airflow/pull/6306
[GitHub] [airflow-site] mik-laj opened a new pull request #87: [depends on #85, #86, #87] Add blog layout
mik-laj opened a new pull request #87: [depends on #85, #86, #87] Add blog layout URL: https://github.com/apache/airflow-site/pull/87
[GitHub] [airflow-site] mik-laj opened a new pull request #86: Add features components
mik-laj opened a new pull request #86: Add features components URL: https://github.com/apache/airflow-site/pull/86
[GitHub] [airflow-site] mik-laj opened a new pull request #85: [depends on #84]Add video component
mik-laj opened a new pull request #85: [depends on #84] Add video component URL: https://github.com/apache/airflow-site/pull/85
[GitHub] [airflow-site] mik-laj opened a new pull request #84: Add accordion
mik-laj opened a new pull request #84: Add accordion URL: https://github.com/apache/airflow-site/pull/84
[GitHub] [airflow] Fokko commented on a change in pull request #6210: [AIRFLOW-5567] BaseAsyncOperator
Fokko commented on a change in pull request #6210: [AIRFLOW-5567] BaseAsyncOperator URL: https://github.com/apache/airflow/pull/6210#discussion_r337499032

File path: airflow/models/base_reschedule_poke_operator.py (new file; indentation reconstructed from the flattened patch, with obvious typos in the quoted code fixed: `sumbit_request` → `submit_request`, `super().__init(` → `super().__init__(`, and the reserved keyword `async` renamed to `async_`)

```python
# -*- coding: utf-8 -*-
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Base Operator for kicking off long running
operations and polling for completion with reschedule mode.
"""

from abc import ABC, abstractmethod
from typing import Dict, List, Iterable, Optional, Union
from time import sleep
from datetime import timedelta

from airflow.exceptions import AirflowException, AirflowSensorTimeout, \
    AirflowSkipException, AirflowRescheduleException
from airflow.models import BaseOperator, SkipMixin, TaskReschedule
from airflow.models.xcom import XCOM_EXTERNAL_RESOURCE_ID_KEY
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils import timezone
from airflow.utils.decorators import apply_defaults
from airflow.ti_deps.deps.ready_to_reschedule import ReadyToRescheduleDep


class BaseReschedulePokeOperator(BaseOperator, SkipMixin, ABC):
    """
    ReschedulePokeOperators are derived from this class and inherit these attributes.
    ReschedulePokeOperators should be used for long running operations where the task
    can tolerate a longer poke interval. They use the task rescheduling
    mechanism, similar to sensors, to avoid occupying a worker slot between
    pokes.

    Developing concrete operators that provide parameterized flexibility
    for synchronous or asynchronous poking depending on the invocation is
    possible by programming against this `BaseReschedulePokeOperator` interface,
    and overriding the execute method as demonstrated below.

    .. code-block:: python

        class DummyFlexiblePokingOperator(BaseReschedulePokeOperator):
            def __init__(self, async_=False, *args, **kwargs):
                self.async_ = async_
                super().__init__(*args, **kwargs)

            def execute(self, context: Dict) -> None:
                if self.async_:
                    # use the BaseReschedulePokeOperator's execute
                    super().execute(context)
                else:
                    self.submit_request(context)
                    while not self.poke(context):
                        time.sleep(self.poke_interval)
                    self.process_results(context)

            def submit_request(self, context: Dict) -> Optional[str]:
                return None

            def poke(self, context: Dict) -> bool:
                return bool(random.getrandbits(1))

    ReschedulePokeOperators must override the following methods:
    :py:meth:`submit_request`: fire a request for a long running operation
    :py:meth:`poke`: a method to check if the long running operation is
    complete; it should return True when a success criteria is met.

    Optionally, ReschedulePokeOperators can override:
    :py:meth:`process_result` to perform any operations after the success
    criteria is met in :py:meth:`poke`

    :py:meth:`poke` is executed at a time interval and succeeds when a
    criteria is met, and fails if and when it times out.

    :param soft_fail: Set to true to mark the task as SKIPPED on failure
    :type soft_fail: bool
    :param poke_interval: Time in seconds that the job should wait in
        between each try
    :type poke_interval: int
    :param timeout: Time, in seconds before the task times out and fails.
    :type timeout: int
    """
    ui_color = '#9933ff'  # type: str

    @apply_defaults
    def __init__(self,
                 *args,
                 **kwargs) -> None:
        super().__init__(mode='reschedule', *args, **kwargs)

    @abstractmethod
    def submit_request(self, context: Dict) -> Optional[Union[str, List, Dict]]:
        """
        This method should kick off a long running operation.
        This method should return the ID for the long running operation if
        applicable.
        Context is the same dictionary used as when rendering jinja templates.

        Refer to get_template_context for more context.

        :returns: a resource_id for the long running operation.
```
[GitHub] [airflow] Fokko commented on a change in pull request #6210: [AIRFLOW-5567] BaseAsyncOperator
Fokko commented on a change in pull request #6210: [AIRFLOW-5567] BaseAsyncOperator URL: https://github.com/apache/airflow/pull/6210#discussion_r337498162

File path: airflow/gcp/hooks/dataproc.py

```diff
@@ -485,7 +490,8 @@ def submit(
         project_id: str,
         job: Dict,
         region: str = 'global',
-        job_error_states: Optional[Iterable[str]] = None
+        job_error_states: Optional[Iterable[str]] = None,
+        async: bool = False
```

Review comment: I believe this is a reserved keyword
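Editor's aside: the reviewer is right — `async` became a reserved keyword in Python 3.7, so the parameter name in the diff above is a `SyntaxError`. A quick demonstration; the trailing-underscore rename shown is the usual PEP 8 convention for avoiding keyword clashes, not necessarily what the PR ended up doing:

```python
# "async" is a reserved keyword from Python 3.7 on, so a parameter named
# "async" fails at compile time. Compiling the source dynamically shows
# the failure without crashing this script.
bad_source = "def submit(job, async=False):\n    pass\n"

try:
    compile(bad_source, "<example>", "exec")
    is_reserved = False
except SyntaxError:
    is_reserved = True

print(is_reserved)  # True on Python 3.7+


# The common workaround: a trailing underscore.
def submit(job, async_=False):
    return job, async_


print(submit("job-1"))  # ('job-1', False)
```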
[jira] [Commented] (AIRFLOW-5694) Check for blinker when detecting Sentry packages
[ https://issues.apache.org/jira/browse/AIRFLOW-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957024#comment-16957024 ]

ASF subversion and git services commented on AIRFLOW-5694:

Commit 770927ae8d12135adac7430a9ab6b39a6f4a1e68 in airflow's branch refs/heads/v1-10-test from Marcus Levine
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=770927a ]

[AIRFLOW-5694] Check for blinker in Sentry setup (#6365)

(cherry picked from commit c72c42730236fee1526fcc03dca7f88e1778ee94)

> Check for blinker when detecting Sentry packages
>
> Key: AIRFLOW-5694
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5694
> Project: Apache Airflow
> Issue Type: Bug
> Components: dependencies
> Affects Versions: 1.10.6
> Reporter: Marcus Levine
> Assignee: Marcus Levine
> Priority: Minor
> Fix For: 1.10.7
>
> After upgrading to 1.10.6rc1 with `sentry-sdk` installed but not specifying the `[sentry]` extra, the dependency `blinker` will cause failures of the following form:
> {code:python}
> ../lib/python3.7/site-packages/airflow/__init__.py:40: in
>     from airflow.models import DAG
> ../lib/python3.7/site-packages/airflow/models/__init__.py:21: in
>     from airflow.models.baseoperator import BaseOperator  # noqa: F401
> ../lib/python3.7/site-packages/airflow/models/baseoperator.py:42: in
>     from airflow.models.dag import DAG
> ../lib/python3.7/site-packages/airflow/models/dag.py:51: in
>     from airflow.models.taskinstance import TaskInstance, clear_task_instances
> ../lib/python3.7/site-packages/airflow/models/taskinstance.py:53: in
>     from airflow.sentry import Sentry
> ../lib/python3.7/site-packages/airflow/sentry.py:167: in
>     Sentry = ConfiguredSentry()
> ../lib/python3.7/site-packages/airflow/sentry.py:94: in __init__
>     init(integrations=integrations)
> ../lib/python3.7/site-packages/sentry_sdk/hub.py:81: in _init
>     client = Client(*args, **kwargs)  # type: ignore
> ../lib/python3.7/site-packages/sentry_sdk/client.py:80: in __init__
>     self._init_impl()
> ../lib/python3.7/site-packages/sentry_sdk/client.py:108: in _init_impl
>     with_defaults=self.options["default_integrations"],
> ../lib/python3.7/site-packages/sentry_sdk/integrations/__init__.py:82: in setup_integrations
>     type(integration).setup_once()
> ../lib/python3.7/site-packages/sentry_sdk/integrations/flask.py:57: in setup_once
>     appcontext_pushed.connect(_push_appctx)
> ../lib/python3.7/site-packages/flask/signals.py:39: in _fail
>     "Signalling support is unavailable because the blinker"
> E   RuntimeError: Signalling support is unavailable because the blinker library is not installed.
> {code}
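Editor's aside: the fix referenced above amounts to checking whether `blinker` is importable before enabling integrations that rely on Flask's signalling support. A minimal sketch of that guard — the function name is invented for illustration; the real patch lives in `airflow/sentry.py`:

```python
import importlib.util


def blinker_available() -> bool:
    """Return True when the optional "blinker" package is importable.
    Flask's signalling support (which sentry-sdk's Flask integration
    hooks into) raises RuntimeError without it, so an integration list
    should only include the Flask integration when this check passes."""
    return importlib.util.find_spec("blinker") is not None


# The result depends on the environment, so no fixed output is shown here.
print(blinker_available())
```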
[GitHub] [airflow] acroos commented on a change in pull request #6380: [AIRFLOW-3632] Allow replace_microseconds in trigger_dag REST request
acroos commented on a change in pull request #6380: [AIRFLOW-3632] Allow replace_microseconds in trigger_dag REST request URL: https://github.com/apache/airflow/pull/6380#discussion_r337497804

File path: airflow/www/api/experimental/endpoints.py

```diff
@@ -74,8 +75,12 @@ def trigger_dag(dag_id):
         return response

+    replace_microseconds = True
+    if 'replace_microseconds' in data:
+        replace_microseconds = to_boolean(data['replace_microseconds'])
+
     try:
-        dr = trigger.trigger_dag(dag_id, run_id, conf, execution_date)
+        dr = trigger.trigger_dag(dag_id, run_id, conf, execution_date, replace_microseconds)
```

Review comment: sounds great, will do!
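Editor's aside: the new branch in the diff reads an optional boolean flag out of the request payload, defaulting to `True` so callers that omit the flag keep the old behaviour. A self-contained sketch — this `to_boolean` is a stand-in written for the example, not necessarily the helper the PR imports:

```python
def to_boolean(value) -> bool:
    # Stand-in for the helper used in the diff: treat common textual
    # spellings of "true" as True, everything else as False.
    return str(value).strip().lower() in ("true", "t", "1", "yes")


def extract_replace_microseconds(data: dict) -> bool:
    # Default to True so requests that omit the flag behave as before.
    replace_microseconds = True
    if 'replace_microseconds' in data:
        replace_microseconds = to_boolean(data['replace_microseconds'])
    return replace_microseconds


print(extract_replace_microseconds({}))                                 # True
print(extract_replace_microseconds({'replace_microseconds': 'false'}))  # False
print(extract_replace_microseconds({'replace_microseconds': 'true'}))   # True
```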
[jira] [Created] (AIRFLOW-5719) Adding Knative Executor support to Airflow
Ash Berlin-Taylor created AIRFLOW-5719:

Summary: Adding Knative Executor support to Airflow
Key: AIRFLOW-5719
URL: https://issues.apache.org/jira/browse/AIRFLOW-5719
Project: Apache Airflow
Issue Type: Epic
Components: executor-kubernetes
Affects Versions: 2.0.0
Reporter: Ash Berlin-Taylor
Assignee: Daniel Imberman
[GitHub] [airflow] ashb commented on a change in pull request #6377: [AIRFLOW-5589] monitor pods by labels instead of names
ashb commented on a change in pull request #6377: [AIRFLOW-5589] monitor pods by labels instead of names URL: https://github.com/apache/airflow/pull/6377#discussion_r337479786

## File path: airflow/contrib/operators/kubernetes_pod_operator.py ##
@@ -112,55 +113,55 @@ class KubernetesPodOperator(BaseOperator):  # pylint: disable=too-many-instance-
     """
     template_fields = ('cmds', 'arguments', 'env_vars', 'config_file')

+    @staticmethod
+    def create_labels_for_pod(context):
+        """
+        Generate labels for the pod s.t. we can track it in case of Operator crash
+
+        :param context:
+        :return:
+        """
+        labels = {
+            'dag_id': context['dag'].dag_id,
+            'task_id': context['task'].task_id,
+            'exec_date': context['ts']
+        }
+        # In the case of sub dags this is just useful
+        if context['dag'].parent_dag:
+            labels['parent_dag_id'] = context['dag'].parent_dag.dag_id
+        # Replace unsupported characters with dashes & trim at 63 chars (max allowed)

Review comment: This comment probably doesn't reflect what make_safe_label_value does anymore, so:
```suggestion
+        # Ensure that the label is valid for Kubernetes, and if not, truncate/remove
+        # invalid characters and replace them with a short hash.
```
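The suggested comment describes what the sanitization helper has to do: Kubernetes label values are capped at 63 characters and allow only alphanumerics, dashes, dots, and underscores. A hedged sketch of that idea — a simplified stand-in, not Airflow's exact `make_safe_label_value`:

```python
import hashlib
import re

MAX_LABEL_LEN = 63  # Kubernetes caps label values at 63 characters


def make_safe_label_value(value):
    """Sketch of the behaviour the review comment describes (assumption:
    not Airflow's actual implementation).

    Replace characters Kubernetes disallows in label values; if the value
    was modified or exceeds the limit, truncate it and append a short hash
    so distinct inputs stay distinguishable.
    """
    safe = re.sub(r'[^a-z0-9A-Z\-\.\_]', '-', value)
    if len(safe) > MAX_LABEL_LEN or safe != value:
        digest = hashlib.md5(value.encode()).hexdigest()[:9]
        safe = safe[:MAX_LABEL_LEN - 10] + '-' + digest
    return safe
```

Appending a hash on truncation matters here because the labels are used to re-locate a pod after an operator crash; plain truncation could make two long execution dates collide.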
[jira] [Commented] (AIRFLOW-5669) [AIRFLOW-5669] Rename GoogleCloudBaseHook to CloudBaseHook
[ https://issues.apache.org/jira/browse/AIRFLOW-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956983#comment-16956983 ] ASF GitHub Bot commented on AIRFLOW-5669: - michalslowikowski00 commented on pull request #6388: [AIRFLOW-5669] Rename GoogleCloudBaseHook to CloudBaseHook URL: https://github.com/apache/airflow/pull/6388

Part of AIP-21:
- renamed `GoogleCloudBaseHook` to `CloudBaseHook`
- added a deprecation warning to `gcp_api_base_hook.py`

### Jira
- [ ] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-5569
  - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\]; code changes always need a Jira issue.
  - In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
  - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

### Description
- [ ] Here are some details about my PR, including screenshots of any UI changes:

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [ ] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what they do
  - If you implement backwards incompatible changes, please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so we can assign it to an appropriate release

> [AIRFLOW-5669] Rename GoogleCloudBaseHook to CloudBaseHook
> Key: AIRFLOW-5669
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5669
> Project: Apache Airflow
> Issue Type: Sub-task
> Components: gcp
> Affects Versions: 1.10.5
> Reporter: Michał Słowikowski
> Assignee: Michał Słowikowski
> Priority: Minor
>
> Rename GoogleCloudBaseHook to CloudBaseHook
[GitHub] [airflow] michalslowikowski00 opened a new pull request #6388: [AIRFLOW-5669] Rename GoogleCloudBaseHook to CloudBaseHook
michalslowikowski00 opened a new pull request #6388: [AIRFLOW-5669] Rename GoogleCloudBaseHook to CloudBaseHook URL: https://github.com/apache/airflow/pull/6388 (The pull-request description and checklist are identical to those quoted in the ASF GitHub Bot comment above.)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r337465801

## File path: setup.py ##
@@ -287,46 +286,75 @@ def write_version(filename: str = os.path.join(*["airflow", "git_version"])):
     'jira',
     'mongomock',
     'moto==1.3.5',
+    'mypy==0.720',
     'nose',
     'nose-ignore-docstring==0.2',
     'nose-timer',
     'parameterized',
-    'paramiko',
     'pre-commit',
     'pylint~=2.3.1',  # to be upgraded after fixing https://github.com/PyCQA/pylint/issues/3123
     # We should also disable checking docstring at the module level
-    'pysftp',
-    'pywinrm',
-    'qds-sdk>=1.9.6',
     'rednose',
     'requests_mock',
     'yamllint'
 ]
+
+devel = sorted(devel + doc)
+
 # IMPORTANT NOTE!!!
 # IF you are removing dependencies from the above list, please make sure that you also increase
 # DEPENDENCIES_EPOCH_NUMBER in the Dockerfile
-if PY3:
-    devel += ['mypy==0.720']
-else:
-    devel += ['unittest2']
-
 devel_minreq = devel + kubernetes + mysql + doc + password + cgroups
 devel_hadoop = devel_minreq + hive + hdfs + webhdfs + kerberos
-devel_all = (sendgrid + devel + all_dbs + doc + samba + slack + oracle +
-             docker + ssh + kubernetes + celery + redis + gcp + grpc +
-             datadog + zendesk + jdbc + ldap + kerberos + password + webhdfs + jenkins +
-             druid + pinot + segment + snowflake + elasticsearch + sentry +
-             atlas + azure + aws + salesforce + cgroups + papermill + virtualenv)
-# Snakebite & Google Cloud Dataflow are not Python 3 compatible :'(
-if PY3:
-    devel_ci = [package for package in devel_all if package not in
-                ['snakebite>=2.7.8', 'snakebite[kerberos]>=2.7.8']]
-else:
-    devel_ci = devel_all
+all_packages = (
+    async_packages +
+    atlas +
+    all_dbs +
+    aws +
+    azure +
+    celery +
+    cgroups +
+    datadog +
+    dask +
+    databricks +
+    datadog +
+    docker +
+    druid +
+    elasticsearch +
+    gcp +
+    grpc +
+    flask_oauth +
+    jdbc +
+    jenkins +
+    kerberos +
+    kubernetes +
+    ldap +
+    oracle +
+    papermill +
+    password +
+    pinot +
+    redis +
+    salesforce +
+    samba +
+    sendgrid +
+    sentry +
+    segment +
+    slack +
+    snowflake +
+    ssh +
+    statsd +
+    virtualenv +
+    webhdfs +
+    winrm +
+    zendesk
+)
+
+# Snakebite is not Python 3 compatible :'(
+all_packages = [package for package in all_packages if not package.startswith('snakebite')]

Review comment: There's an open issue for upgrading/replacing snakebite. Removing Kerberos: eek, no, definitely not. Some enterprises need Kerberos support. If it's broken on Py3 then we need to fix that before 2.0.0 can be released.
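The last line of the diff drops Python-3-incompatible requirements by string prefix, which works because each extras list holds plain requirement strings. A minimal illustration of that filter (the function name is ours, not setup.py's):

```python
def drop_incompatible(packages, prefix='snakebite'):
    """Remove any requirement string starting with `prefix`.

    Mirrors the snakebite filter in the diff above: requirement pins like
    'snakebite>=2.7.8' and 'snakebite[kerberos]>=2.7.8' both share the
    package-name prefix, so a single startswith check catches every variant.
    """
    return [package for package in packages if not package.startswith(prefix)]
```

Filtering by prefix rather than exact string means version-pin changes elsewhere in setup.py cannot silently reintroduce the package.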
[GitHub] [airflow] codecov-io edited a comment on issue #6306: [AIRFLOW-5632] Rename ComputeEngine operators
codecov-io edited a comment on issue #6306: [AIRFLOW-5632] Rename ComputeEngine operators URL: https://github.com/apache/airflow/pull/6306#issuecomment-541052532

# [Codecov](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=h1) Report
> :exclamation: No coverage uploaded for pull request base (`master@4e661f5`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/6306/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=tree)

```diff
@@           Coverage Diff           @@
##           master   #6306   +/-  ##
=======================================
  Coverage        ?   80.28%
=======================================
  Files           ?      626
  Lines           ?    36237
  Branches        ?        0
=======================================
  Hits            ?    29094
  Misses          ?     7143
  Partials        ?        0
```

| [Impacted Files](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [...irflow/contrib/operators/gcp\_container\_operator.py](https://codecov.io/gh/apache/airflow/pull/6306/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9nY3BfY29udGFpbmVyX29wZXJhdG9yLnB5) | `100% <ø> (ø)` | |
| [airflow/contrib/hooks/gcp\_compute\_hook.py](https://codecov.io/gh/apache/airflow/pull/6306/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2djcF9jb21wdXRlX2hvb2sucHk=) | `100% <100%> (ø)` | |
| [airflow/gcp/operators/compute.py](https://codecov.io/gh/apache/airflow/pull/6306/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2NvbXB1dGUucHk=) | `98.49% <100%> (ø)` | |
| [airflow/gcp/hooks/compute.py](https://codecov.io/gh/apache/airflow/pull/6306/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3AvaG9va3MvY29tcHV0ZS5weQ==) | `86.86% <100%> (ø)` | |
| [airflow/contrib/hooks/gcp\_container\_hook.py](https://codecov.io/gh/apache/airflow/pull/6306/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2djcF9jb250YWluZXJfaG9vay5weQ==) | `100% <100%> (ø)` | |

-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=footer). Last update [4e661f5...4b2ff7d](https://codecov.io/gh/apache/airflow/pull/6306?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] [airflow] ashb commented on a change in pull request #6380: [AIRFLOW-3632] Allow replace_microseconds in trigger_dag REST request
ashb commented on a change in pull request #6380: [AIRFLOW-3632] Allow replace_microseconds in trigger_dag REST request URL: https://github.com/apache/airflow/pull/6380#discussion_r337444292

## File path: airflow/www/api/experimental/endpoints.py ##
@@ -74,8 +75,12 @@ def trigger_dag(dag_id):
         return response

+    replace_microseconds = True
+    if 'replace_microseconds' in data:
+        replace_microseconds = to_boolean(data['replace_microseconds'])
+
     try:
-        dr = trigger.trigger_dag(dag_id, run_id, conf, execution_date)
+        dr = trigger.trigger_dag(dag_id, run_id, conf, execution_date, replace_microseconds)

Review comment: It doesn't seem needed, other than it's sometimes "nice" to see whole seconds. For the API it is probably more sensible to default to un-truncated time, but this would be a potentially "breaking" change, so we need to think about how we upgrade existing users. For the 1.10 release series I would suggest that the default shouldn't change, just in case someone is relying on it. Could you do this as two PRs (both against master)? The first adds the behaviour we talked about without changing the default behaviour (which we can get into the next 1.10.x release; I will cherry-pick the merged PR back to the release branch), and then, once that is merged, a second one that changes the default to be as you describe. How does that sound?
[jira] [Created] (AIRFLOW-5718) Add SFTPToGoogleCloudStorageOperator
Tobiasz Kedzierski created AIRFLOW-5718: --- Summary: Add SFTPToGoogleCloudStorageOperator Key: AIRFLOW-5718 URL: https://issues.apache.org/jira/browse/AIRFLOW-5718 Project: Apache Airflow Issue Type: Sub-task Components: gcp, operators Affects Versions: 1.10.5 Reporter: Tobiasz Kedzierski Assignee: Tobiasz Kedzierski
[jira] [Commented] (AIRFLOW-4971) Add Google Display & Video 360 integration
[ https://issues.apache.org/jira/browse/AIRFLOW-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956879#comment-16956879 ] ASF subversion and git services commented on AIRFLOW-4971: -- Commit 16d7accb22c866d4fbf368e4d979dc1c4a41d93c in airflow's branch refs/heads/master from Tomek [ https://gitbox.apache.org/repos/asf?p=airflow.git;h=16d7acc ] [AIRFLOW-4971] Add Google Display & Video 360 integration (#6170) > Add Google Display & Video 360 integration > -- > > Key: AIRFLOW-4971 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4971 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Affects Versions: 1.10.3 >Reporter: Kamil Bregula >Priority: Major > > Hi > This project lacks integration with the Google Display & Video 360 service. I > would be happy if Airflow had proper operators and hooks that integrate with > this service. > Product Documentation: > https://developers.google.com/bid-manager/guides/getting-started-api > API Documentation: > https://developers.google.com/bid-manager/guides/getting-started-api > Lots of love
[jira] [Commented] (AIRFLOW-4971) Add Google Display & Video 360 integration
[ https://issues.apache.org/jira/browse/AIRFLOW-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956878#comment-16956878 ] ASF GitHub Bot commented on AIRFLOW-4971: - mik-laj commented on pull request #6170: [AIRFLOW-4971] Add Google Display & Video 360 integration URL: https://github.com/apache/airflow/pull/6170 > Add Google Display & Video 360 integration > -- > > Key: AIRFLOW-4971 > URL: https://issues.apache.org/jira/browse/AIRFLOW-4971 > Project: Apache Airflow > Issue Type: Improvement > Components: gcp >Affects Versions: 1.10.3 >Reporter: Kamil Bregula >Priority: Major > > Hi > This project lacks integration with the Google Display & Video 360 service. I > would be happy if Airflow had proper operators and hooks that integrate with > this service. > Product Documentation: > https://developers.google.com/bid-manager/guides/getting-started-api > API Documentation: > https://developers.google.com/bid-manager/guides/getting-started-api > Lots of love
[jira] [Created] (AIRFLOW-5717) Add get_tree_map method to SFTPHook
Tobiasz Kedzierski created AIRFLOW-5717: --- Summary: Add get_tree_map method to SFTPHook Key: AIRFLOW-5717 URL: https://issues.apache.org/jira/browse/AIRFLOW-5717 Project: Apache Airflow Issue Type: Improvement Components: hooks Affects Versions: 1.10.5 Reporter: Tobiasz Kedzierski Assignee: Tobiasz Kedzierski