[GitHub] [airflow] dstandish edited a comment on pull request #15100: Add support for arbitrary json in conn uri format
dstandish edited a comment on pull request #15100: URL: https://github.com/apache/airflow/pull/15100#issuecomment-811659712 OK, so to be clear @mik-laj: are you in support of allowing an arbitrary dict under `extra`, just not a primitive or list? OR do you _not_ support this PR at all, and think extra should just be a dict of strings, i.e. `'key1':'val1'` and so on, e.g. _not_ `'key1':{'key3':'key4'}`? If you support this I'll remove the tests for primitive and list, so that the codebase does not explicitly endorse them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on pull request #15124: Minor Helm Chart doc enhancements
ephraimbuddy commented on pull request #15124: URL: https://github.com/apache/airflow/pull/15124#issuecomment-811660724 There's a small mistake in quick-start.rst (https://github.com/apache/airflow/blob/master/docs/helm-chart/quick-start.rst) for installing a release: `helm install airflow --n airflow .` should be `helm install airflow -n airflow .`. Wondering if we can add the fix to this PR?
[GitHub] [airflow] dstandish edited a comment on pull request #15100: Add support for arbitrary json in conn uri format
dstandish edited a comment on pull request #15100: URL: https://github.com/apache/airflow/pull/15100#issuecomment-811659712 OK, so to be clear @mik-laj: are you in support of allowing an arbitrary dict under `extra`, just not a primitive or list? OR do you _not_ support this PR at all, and think extra should just be a dict of strings, i.e. `'key1':'val1'` and so on, e.g. _not_ `'key1':{'key3':'key4'}`? If you support this I'll go ahead and remove primitive and list but keep arbitrary dict.
[GitHub] [airflow] dstandish commented on pull request #15100: Add support for arbitrary json in conn uri format
dstandish commented on pull request #15100: URL: https://github.com/apache/airflow/pull/15100#issuecomment-811659712 OK, so to be clear @mik-laj: are you in support of allowing an arbitrary dict under `extra`, just not a primitive or list? OR do you _not_ support this PR at all, and think extra should just be a dict of strings, i.e. `'key1':'val1'` and so on, e.g. _not_ `'key1':{'key3':'key4'}`?
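The two options being weighed in this thread can be illustrated with a short sketch (the helper names below are hypothetical, not Airflow's actual API): both a flat dict of strings and an arbitrarily nested dict survive a JSON round-trip, so the question is one of policy rather than mechanics.

```python
import json

# A flat "dict of strings" extra -- the conservative option under discussion.
flat_extra = {"key1": "val1", "key2": "val2"}

# An arbitrarily nested extra -- what the PR would additionally allow.
nested_extra = {"key1": {"key3": "key4"}, "key2": ["a", "b"]}


def serialize_extra(extra: dict) -> str:
    """Serialize an extra dict to the JSON string stored with the connection."""
    return json.dumps(extra)


def deserialize_extra(raw: str) -> dict:
    """Parse the stored JSON string back into a dict."""
    return json.loads(raw)


# Both shapes round-trip through JSON without loss.
for extra in (flat_extra, nested_extra):
    assert deserialize_extra(serialize_extra(extra)) == extra
```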
[GitHub] [airflow] dstandish commented on a change in pull request #15013: Enable Connection creation from Vault parameters
dstandish commented on a change in pull request #15013: URL: https://github.com/apache/airflow/pull/15013#discussion_r605388747

## File path: tests/providers/hashicorp/secrets/test_vault.py

@@ -59,6 +59,48 @@ def test_get_conn_uri(self, mock_hvac):
         returned_uri = test_client.get_conn_uri(conn_id="test_postgres")
         assert 'postgresql://airflow:airflow@host:5432/airflow' == returned_uri

+    @mock.patch("airflow.providers.hashicorp._internal_client.vault_client.hvac")
+    def test_get_connection(self, mock_hvac):
+        mock_client = mock.MagicMock()
+        mock_hvac.Client.return_value = mock_client
+        mock_client.secrets.kv.v2.read_secret_version.return_value = {
+            'request_id': '94011e25-f8dc-ec29-221b-1f9c1d9ad2ae',
+            'lease_id': '',
+            'renewable': False,
+            'lease_duration': 0,
+            'data': {
+                'data': {
+                    'conn_type': 'postgresql',
+                    'login': 'airflow',
+                    'password': 'airflow',
+                    'host': 'host',
+                    'port': '5432',
+                    'schema': 'airflow',
+                },

Review comment: probably should include `extra` since it's a case that should be supported. One thought I had: when storing creds in the "object" style, should we make it so that you can add values at the top level and they'll automatically get added to extra? So e.g. we could support this:

```
'data': {
    'conn_type': 'postgresql',
    'login': 'airflow',
    'password': 'airflow',
    'host': 'host',
    'port': '5432',
    'myfield1': 'hello',
    'myfield2': 'hello again',
    'schema': 'airflow',
},
```

and then when building the connection object, add these to the extra dict. It's maybe a little wacky, but it would allow for less nesting, and when defining the values in Vault you wouldn't have to dump to JSON and store it under `extra` as a JSON string. Just a thought 🤷‍♂️
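The idea floated in this review, promoting unrecognized top-level keys of the Vault secret into `extra`, could look roughly like the sketch below. The function and constant names are hypothetical; this is not what the Hashicorp provider actually does.

```python
import json

# Fields that map directly onto Connection constructor arguments.
KNOWN_CONN_FIELDS = {"conn_type", "host", "login", "password", "schema", "port", "extra"}


def split_vault_secret(secret: dict) -> dict:
    """Split a flat Vault secret into Connection kwargs, folding any
    unrecognized top-level keys into the `extra` JSON string."""
    kwargs = {k: v for k, v in secret.items() if k in KNOWN_CONN_FIELDS}
    leftovers = {k: v for k, v in secret.items() if k not in KNOWN_CONN_FIELDS}
    if leftovers:
        kwargs["extra"] = json.dumps(leftovers)
    return kwargs


secret = {
    "conn_type": "postgresql", "login": "airflow", "password": "airflow",
    "host": "host", "port": "5432", "schema": "airflow",
    "myfield1": "hello", "myfield2": "hello again",
}
conn_kwargs = split_vault_secret(secret)
assert json.loads(conn_kwargs["extra"]) == {"myfield1": "hello", "myfield2": "hello again"}
```

This avoids one level of nesting in the secret at the cost of reserving the known field names.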
[GitHub] [airflow] dstandish opened a new pull request #15125: Add latest only decorator
dstandish opened a new pull request #15125: URL: https://github.com/apache/airflow/pull/15125 This decorator lets us implement latest-only behavior without adding an extraneous task: you can make any operator a latest-only operator by applying the decorator to `execute`. I thought for a minute about pulling the `is_latest_execution` logic out into a function shared by both the latest-only decorator and the latest-only operator, but the logic is just a few lines, so I decided it wasn't worth the added obfuscation and object proliferation. Please let me know what you think.
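Since the PR body does not include the decorator itself, here is a rough sketch of the general shape of such a decorator. The context keys and skip semantics are assumptions for illustration only; Airflow's real latest-only logic consults the DAG's schedule.

```python
import functools
from datetime import datetime, timedelta


def latest_only(execute):
    """Skip `execute` unless "now" falls inside the run's data interval,
    i.e. this is the latest scheduled run. Context keys are assumed."""
    @functools.wraps(execute)
    def wrapper(self, context):
        left = context["execution_date"]              # start of the interval
        right = left + context["schedule_interval"]   # end of the interval
        now = context.get("now") or datetime.utcnow()
        if not (left < now <= right):
            return None  # an older (backfill/catch-up) run: do nothing
        return execute(self, context)
    return wrapper


class MyOperator:
    """Any operator becomes "latest only" by decorating its execute()."""
    @latest_only
    def execute(self, context):
        return "ran"


op = MyOperator()
now = datetime(2021, 4, 1, 12, 0)
latest = {"execution_date": datetime(2021, 4, 1), "schedule_interval": timedelta(days=1), "now": now}
stale = {"execution_date": datetime(2021, 3, 1), "schedule_interval": timedelta(days=1), "now": now}
assert op.execute(latest) == "ran"   # current interval: runs
assert op.execute(stale) is None     # old interval: skipped
```

Decorating `execute` rather than inserting a `LatestOnlyOperator` upstream avoids an extra task in the DAG, which is the trade-off the PR describes.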
[GitHub] [airflow] github-actions[bot] commented on pull request #15124: Minor Helm Chart doc enhancements
github-actions[bot] commented on pull request #15124: URL: https://github.com/apache/airflow/pull/15124#issuecomment-811613157 The PR is likely ready to be merged. No tests are needed, as no important environment files or Python files were modified by it. However, committers might decide that the full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest master or amend the last commit of the PR, and push it with --force-with-lease.
[GitHub] [airflow] wolfier commented on issue #9975: max_active_runs = 1 can still create multiple active execution runs
wolfier commented on issue #9975: URL: https://github.com/apache/airflow/issues/9975#issuecomment-811586928 I couldn't find any mention of this in [AIRFLOW-2535](https://issues.apache.org/jira/browse/AIRFLOW-2535). This is an issue for TriggerDagRunOperators as well in 1.10.10, where max_active_runs was not respected. PR [#13803](https://github.com/apache/airflow/pull/13803) does fix the issue in 1.10.15!
[GitHub] [airflow] kaxil opened a new pull request #15124: Minor Helm Chart doc enhancements
kaxil opened a new pull request #15124: URL: https://github.com/apache/airflow/pull/15124 - Better formatting for `tip` - Correct listing **Before**: ![image](https://user-images.githubusercontent.com/8811558/113229569-25f93600-928f-11eb-8ab8-668efcd63f4c.png) ![image](https://user-images.githubusercontent.com/8811558/113229595-33aebb80-928f-11eb-8137-e73fc7320c30.png) **After**: ![image](https://user-images.githubusercontent.com/8811558/113229584-2d204400-928f-11eb-853f-14a5e63bcc84.png) ![image](https://user-images.githubusercontent.com/8811558/113229612-3ad5c980-928f-11eb-850b-3827d3dbca56.png) --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
[GitHub] [airflow] ShakaibKhan commented on issue #15004: Running upgrade_check command doesn't process DAGs located in zip files
ShakaibKhan commented on issue #15004: URL: https://github.com/apache/airflow/issues/15004#issuecomment-811558458 Hey, I am trying to patch this issue and have a PR up at https://github.com/apache/airflow/pull/15123. I was wondering if I could get some guidance on testing my change.
[GitHub] [airflow] boring-cyborg[bot] commented on pull request #15123: check for zipped files in import rules for upgrade check
boring-cyborg[bot] commented on pull request #15123: URL: https://github.com/apache/airflow/pull/15123#issuecomment-811557830 Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst). Here are some useful points:
- Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using the [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally; it's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication, including (but not limited to) comments on Pull Requests, the Mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
Apache Airflow is a community-driven project and together we are making it better. In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack
[GitHub] [airflow] ShakaibKhan opened a new pull request #15123: check for zipped files in import rules for upgrade check
ShakaibKhan opened a new pull request #15123: URL: https://github.com/apache/airflow/pull/15123 This addresses #15004: added a zip file check to the import rules of the upgrade check.
[GitHub] [airflow] github-actions[bot] closed pull request #12617: Moved Contributors' Guide to CONTRIBUTING.rst file
github-actions[bot] closed pull request #12617: URL: https://github.com/apache/airflow/pull/12617
[GitHub] [airflow] github-actions[bot] commented on pull request #13894: critical class definition issue
github-actions[bot] commented on pull request #13894: URL: https://github.com/apache/airflow/pull/13894#issuecomment-811546288 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
[GitHub] [airflow] Dr-Denzy commented on pull request #15122: Skip `DatabaseVersionCheckRule` check if invalid version is detected
Dr-Denzy commented on pull request #15122: URL: https://github.com/apache/airflow/pull/15122#issuecomment-811522320 > When the version is something like `12.2 (Debian 12.2-2.pgdg100+1)`, the upgrade check errors with the following: > > ``` > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 130, in __main__ > run(args) > File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 118, in run > all_problems = check_upgrade(formatter, rules) > File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 38, in check_upgrade > rule_status = RuleStatus.from_rule(rule) > File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/problem.py", line 44, in from_rule > result = rule.check() > File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/rules/postgres_mysql_sqlite_version_upgrade_check.py", line 52, in check > installed_postgres_version = Version(session.execute('SHOW server_version;').scalar()) > File "/usr/local/lib/python3.7/site-packages/packaging/version.py", line 298, in __init__ > raise InvalidVersion("Invalid version: '{0}'".format(version)) > packaging.version.InvalidVersion: Invalid version: '12.2 (Debian 12.2-2.pgdg100+1)' > ``` > > This commit will SKIP the check during such occasions. > > cc @Dr-Denzy > > **^ Add meaningful description above** > > Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information. > In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. > In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). 
> In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md). Nice catch!
[GitHub] [airflow] ephraimbuddy commented on issue #14012: Changing state of task_instance to None
ephraimbuddy commented on issue #14012: URL: https://github.com/apache/airflow/issues/14012#issuecomment-811520833 It's the state of the pod that's been set to None after execution, not the task instance.
[GitHub] [airflow] jhtimmins edited a comment on pull request #14946: Standardize default fab perms
jhtimmins edited a comment on pull request #14946: URL: https://github.com/apache/airflow/pull/14946#issuecomment-811519584 @ashb This updates the permission model, so #14840 will need to integrate these changes, which I'm happy to help with. Can you and @kaxil take a look?
[GitHub] [airflow] jhtimmins commented on pull request #14946: Standardize default fab perms
jhtimmins commented on pull request #14946: URL: https://github.com/apache/airflow/pull/14946#issuecomment-811519584 @ashb This updates the permission model, so #14840 will need to integrate these changes. Can you and @kaxil take a look?
[GitHub] [airflow] kaxil opened a new pull request #15122: Skip `DatabaseVersionCheckRule` check if invalid version is detected
kaxil opened a new pull request #15122: URL: https://github.com/apache/airflow/pull/15122 When the version is something like `12.2 (Debian 12.2-2.pgdg100+1)`, the upgrade check errors with the following:

```
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 130, in __main__
    run(args)
  File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 118, in run
    all_problems = check_upgrade(formatter, rules)
  File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/checker.py", line 38, in check_upgrade
    rule_status = RuleStatus.from_rule(rule)
  File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/problem.py", line 44, in from_rule
    result = rule.check()
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/upgrade/rules/postgres_mysql_sqlite_version_upgrade_check.py", line 52, in check
    installed_postgres_version = Version(session.execute('SHOW server_version;').scalar())
  File "/usr/local/lib/python3.7/site-packages/packaging/version.py", line 298, in __init__
    raise InvalidVersion("Invalid version: '{0}'".format(version))
packaging.version.InvalidVersion: Invalid version: '12.2 (Debian 12.2-2.pgdg100+1)'
```

This commit will SKIP the check during such occasions. cc @Dr-Denzy --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information. In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
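For context, an alternative to skipping the check (which is not what this PR does) would be to normalize such strings before handing them to `packaging.version.Version`, e.g. by extracting the leading dotted number. The helper below is purely illustrative:

```python
import re


def extract_server_version(raw: str) -> str:
    """Pull the leading dotted version number out of a `SHOW server_version;`
    result such as '12.2 (Debian 12.2-2.pgdg100+1)'. Hypothetical helper."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", raw)
    if match is None:
        raise ValueError(f"cannot find a version number in {raw!r}")
    return match.group(1)


assert extract_server_version("12.2 (Debian 12.2-2.pgdg100+1)") == "12.2"
assert extract_server_version("9.6.21") == "9.6.21"
```

The extracted string is PEP 440-parseable, so `Version(extract_server_version(raw))` would no longer raise `InvalidVersion` for vendor-suffixed strings.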
[GitHub] [airflow] nkumar15 commented on issue #9486: ECSOperator failed to display logs from Cloudwatch after providing log configurations
nkumar15 commented on issue #9486: URL: https://github.com/apache/airflow/issues/9486#issuecomment-811507955 Thanks @zdenulo, after reading your article I finally realized that the awslogs_stream_prefix configuration should not have a leading / in it. So instead of `/ecs/hello-world` it should be `ecs/hello-world`. It worked.
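The gotcha described above can be guarded against with a one-line normalization. This is a hypothetical helper for illustration, not part of the ECSOperator:

```python
def normalize_stream_prefix(prefix: str) -> str:
    """Strip any leading slash from an awslogs stream prefix, since CloudWatch
    log stream names (e.g. 'ecs/hello-world/...') do not start with '/'."""
    return prefix.lstrip("/")


assert normalize_stream_prefix("/ecs/hello-world") == "ecs/hello-world"
assert normalize_stream_prefix("ecs/hello-world") == "ecs/hello-world"
```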
[airflow] branch master updated (b0e68eb -> 116a8a0)
This is an automated email from the ASF dual-hosted git repository. ash pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from b0e68eb Avoid scheduler/parser manager deadlock by using non-blocking IO (#15112) add 116a8a0 Remove duplicate call to sync_metadata inside DagFileProcessorManager (#15121) No new revisions were added by this update. Summary of changes:
 airflow/utils/dag_processing.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
[GitHub] [airflow] ashb merged pull request #15121: Remove duplicate call to sync_metadata inside DagFileProcessorManager
ashb merged pull request #15121: URL: https://github.com/apache/airflow/pull/15121
[GitHub] [airflow] ashb merged pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb merged pull request #15112: URL: https://github.com/apache/airflow/pull/15112
[GitHub] [airflow] ashb closed issue #7935: scheduler gets stuck without a trace
ashb closed issue #7935: URL: https://github.com/apache/airflow/issues/7935
[airflow] branch master updated (7417f81 -> b0e68eb)
This is an automated email from the ASF dual-hosted git repository. ash pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from 7417f81 Run UI tests selectively when UI files have changed (#15009) add b0e68eb Avoid scheduler/parser manager deadlock by using non-blocking IO (#15112) No new revisions were added by this update. Summary of changes:
 airflow/utils/dag_processing.py    | 32 +++
 pylintrc-tests                     |  4 +-
 tests/utils/test_dag_processing.py | 80 +-
 3 files changed, 106 insertions(+), 10 deletions(-)
[GitHub] [airflow] thekyz commented on issue #14128: Circular package imports in Airflow 2.0
thekyz commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811500469 Thanks for the links, definitely appreciated! I managed to make it work by depending directly on the providers + apache-airflow, since then the dependency on the extras is not expressed. For reference if somebody is encountering the same issue as me, this: ``` apache-airflow-providers-cncf-kubernetes==1.0.2 apache-airflow-providers-postgres==1.0.1 apache-airflow[statsd]==2.0.1 ``` instead of this: ``` apache-airflow[statsd,postgres,kubernetes]==2.0.1 ``` Thanks again @potiuk for the support!
[airflow] branch master updated: Run UI tests selectively when UI files have changed (#15009)
This is an automated email from the ASF dual-hosted git repository. ash pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. The following commit(s) were added to refs/heads/master by this push: new 7417f81 Run UI tests selectively when UI files have changed (#15009) 7417f81 is described below

commit 7417f81d36ad02c2a9d7feb9b9f881610f50ceba
Author: Ash Berlin-Taylor
AuthorDate: Wed Mar 31 23:10:00 2021 +0100

    Run UI tests selectively when UI files have changed (#15009)

    If we only change the new React UI files, we don't want to have to run
    the whole python test suite!
---
 .github/workflows/ci.yml           | 37 ++
 .pre-commit-config.yaml            |  7 ++
 BREEZE.rst                         |  2 +-
 STATIC_CODE_CHECKS.rst             |  2 ++
 breeze-complete                    |  1 +
 scripts/ci/selective_ci_checks.sh  | 46 +++---
 scripts/ci/static_checks/eslint.sh | 31 +
 7 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 49fb2e7..3f90b43 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -147,6 +147,7 @@ jobs:
       sqliteExclude: ${{ steps.selective-checks.outputs.sqlite-exclude }}
       kubernetesExclude: ${{ steps.selective-checks.outputs.kubernetes-exclude }}
       run-tests: ${{ steps.selective-checks.outputs.run-tests }}
+      run-ui-tests: ${{ steps.selective-checks.outputs.run-ui-tests }}
       run-kubernetes-tests: ${{ steps.selective-checks.outputs.run-kubernetes-tests }}
       basic-checks-only: ${{ steps.selective-checks.outputs.basic-checks-only }}
       image-build: ${{ steps.selective-checks.outputs.image-build }}
@@ -344,6 +345,12 @@ pre-commit-local-installation-${{steps.host-python-version.outputs.host-python-v
         key: "pre-commit-no-pylint-${{steps.host-python-version.outputs.host-python-version}}-\
 ${{ hashFiles('.pre-commit-config.yaml') }}"
         restore-keys: pre-commit-no-pylint-${{steps.host-python-version.outputs.host-python-version}}
+
+      - name: "Cache eslint"
+        uses: actions/cache@v2
+        with:
+          path: 'airflow/ui/node_modules'
+          key: ${{ runner.os }}-ui-node-modules-${{ hashFiles('airflow/ui/**/yarn.lock') }}
       - name: "Static checks: except pylint"
         run: ./scripts/ci/static_checks/run_static_checks.sh
         env:
@@ -1264,3 +1271,33 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
         tags: true
         force: true
         branch: master
+
+  tests-ui:
+    timeout-minutes: 10
+    name: React UI tests
+    runs-on: ${{ fromJson(needs.build-info.outputs.runsOn) }}
+    needs: [build-info, ci-images]
+    env:
+      GITHUB_REGISTRY: ${{ needs.ci-images.outputs.githubRegistry }}
+    if: needs.build-info.outputs.run-ui-tests == 'true'
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+        with:
+          persist-credentials: false
+          submodules: recursive
+      - name: "Setup node"
+        uses: actions/setup-node@v2
+        with:
+          node-version: 14
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh
+      - name: "Prepare CI image ${{env.PYTHON_MAJOR_MINOR_VERSION}}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}"
+        run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
+      - name: "Cache eslint"
+        uses: actions/cache@v2
+        with:
+          path: 'airflow/ui/node_modules'
+          key: ${{ runner.os }}-ui-node-modules-${{ hashFiles('airflow/ui/**/yarn.lock') }}
+      - run: yarn --cwd airflow/ui/ install --frozen-lockfile --non-interactive
+      - run: yarn --cwd airflow/ui/ run test

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index e3cc75a..50e18ed 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -620,6 +620,13 @@ repos:
         entry: "./scripts/ci/pre_commit/pre_commit_flake8.sh"
         files: \.py$
         pass_filenames: true
+      - id: ui-lint
+        name: ESLint against airflow/ui
+        language: node
+        'types_or': [javascript, tsx, ts]
+        files: ^airflow/ui/
+        entry: "scripts/ci/static_checks/eslint.sh"
+        pass_filenames: false
       - id: bats-in-container-tests
         name: Run in container bats tests
         language: system

diff --git a/BREEZE.rst b/BREEZE.rst
index 55cc46a..2d0a346 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -2298,7 +2298,7 @@ This is the current syntax for `./breeze <./breeze>`_:
      pre-commit-hook-names provide-create-sessions providers-init-file provider-yamls pydevd
      pydocstyle pylint pylint-tests python-no-log-warn pyupgrade restrict-start_date rst-backticks
      setup-order setup-extra-packages shellcheck -
[GitHub] [airflow] ashb merged pull request #15009: Run UI tests selectively
ashb merged pull request #15009: URL: https://github.com/apache/airflow/pull/15009
[GitHub] [airflow] ashb closed issue #14957: Run selective CI pipeline for UI-only PRs
ashb closed issue #14957: URL: https://github.com/apache/airflow/issues/14957
[GitHub] [airflow] github-actions[bot] commented on pull request #15009: Run UI tests selectively
github-actions[bot] commented on pull request #15009: URL: https://github.com/apache/airflow/pull/15009#issuecomment-811498731 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
[GitHub] [airflow] github-actions[bot] commented on pull request #14840: Add CUD REST API endpoints for Roles
github-actions[bot] commented on pull request #14840: URL: https://github.com/apache/airflow/pull/14840#issuecomment-811493957 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
[airflow] branch master updated (0e024c4 -> a2d9929)
This is an automated email from the ASF dual-hosted git repository. ash pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from 0e024c4 Fixed all remaining code usage of old task import (#15118) add a2d9929 Add REST API query sort and order to some endpoints (#14895) No new revisions were added by this update. Summary of changes: .../api_connexion/endpoints/connection_endpoint.py | 10 ++- .../api_connexion/endpoints/dag_run_endpoint.py| 26 +- .../api_connexion/endpoints/event_log_endpoint.py | 19 - .../endpoints/import_error_endpoint.py | 16 ++-- airflow/api_connexion/endpoints/pool_endpoint.py | 10 ++- .../endpoints/role_and_permission_endpoint.py | 10 ++- airflow/api_connexion/endpoints/task_endpoint.py | 11 ++- airflow/api_connexion/endpoints/user_endpoint.py | 19 - .../api_connexion/endpoints/variable_endpoint.py | 17 ++-- airflow/api_connexion/openapi/v1.yaml | 28 ++- airflow/api_connexion/parameters.py| 19 + airflow/api_connexion/schemas/dag_run_schema.py| 1 + .../endpoints/test_connection_endpoint.py | 49 .../endpoints/test_dag_run_endpoint.py | 93 ++ .../endpoints/test_event_log_endpoint.py | 65 ++- .../endpoints/test_import_error_endpoint.py| 58 ++ .../api_connexion/endpoints/test_pool_endpoint.py | 43 ++ .../endpoints/test_role_and_permission_endpoint.py | 8 ++ .../api_connexion/endpoints/test_task_endpoint.py | 75 +++-- .../api_connexion/endpoints/test_user_endpoint.py | 9 +++ .../endpoints/test_variable_endpoint.py| 10 +++ 21 files changed, 551 insertions(+), 45 deletions(-)
[GitHub] [airflow] ashb merged pull request #14895: Add REST API query sort and order to some endpoints
ashb merged pull request #14895: URL: https://github.com/apache/airflow/pull/14895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch master updated: Fixed all remaining code usage of old task import (#15118)
This is an automated email from the ASF dual-hosted git repository. ash pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/master by this push: new 0e024c4 Fixed all remaining code usage of old task import (#15118) 0e024c4 is described below commit 0e024c4a79ed5843d7eb6d53fc5f6e4950d2f42a Author: Andrew Godwin AuthorDate: Wed Mar 31 15:23:46 2021 -0600 Fixed all remaining code usage of old task import (#15118) This finishes the job of removing all imports of task from `operators.python` and replaces them with the new import from `decorators`. --- tests/models/test_dag.py | 2 +- tests/models/test_dagparam.py | 2 +- tests/utils/test_task_group.py | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/models/test_dag.py b/tests/models/test_dag.py index d2d0f9a..7f00134 100644 --- a/tests/models/test_dag.py +++ b/tests/models/test_dag.py @@ -38,6 +38,7 @@ from parameterized import parameterized from airflow import models, settings from airflow.configuration import conf +from airflow.decorators import task as task_decorator from airflow.exceptions import AirflowException, DuplicateTaskIdFound from airflow.models import DAG, DagModel, DagRun, DagTag, TaskFail, TaskInstance as TI from airflow.models.baseoperator import BaseOperator @@ -45,7 +46,6 @@ from airflow.models.dag import dag as dag_decorator from airflow.models.dagparam import DagParam from airflow.operators.bash import BashOperator from airflow.operators.dummy import DummyOperator -from airflow.operators.python import task as task_decorator from airflow.operators.subdag import SubDagOperator from airflow.security import permissions from airflow.utils import timezone diff --git a/tests/models/test_dagparam.py b/tests/models/test_dagparam.py index 1eb28cb..618eb88 100644 --- a/tests/models/test_dagparam.py +++ b/tests/models/test_dagparam.py @@ -19,8 +19,8 @@ import unittest from datetime import 
timedelta +from airflow.decorators import task from airflow.models.dag import DAG -from airflow.operators.python import task from airflow.utils import timezone from airflow.utils.state import State from airflow.utils.types import DagRunType diff --git a/tests/utils/test_task_group.py b/tests/utils/test_task_group.py index 0bc80e3..de227e4 100644 --- a/tests/utils/test_task_group.py +++ b/tests/utils/test_task_group.py @@ -269,7 +269,7 @@ def test_build_task_group_with_task_decorator(): """ Test that TaskGroup can be used with the @task decorator. """ -from airflow.operators.python import task +from airflow.decorators import task @task def task_1():
[GitHub] [airflow] ashb merged pull request #15118: Fixed all remaining usage of the old task import
ashb merged pull request #15118: URL: https://github.com/apache/airflow/pull/15118 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811474838 And here is explanation about airflow's extras http://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html - it's rather straightforward and helpful I think -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #14128: Circular package imports in Airflow 2.0
potiuk edited a comment on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811473508 https://stackoverflow.com/q/52474931/516701 gives a brief explanation about extras. I think it makes sense to at least take a brief look if you are going to use them, to understand what they are. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811473508 https://stackoverflow.com/q/52474931/516701 gives a brief explanation about extras. I think it makes sense to at least take a brief look if you are going to use them, to understand what they are. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811472295 And answering your question - yes. There is no point in installing a provider without Airflow, so it makes perfect sense to have this dependency. It also prevents installing providers with Airflow 1.10 (as there is an apache-airflow 2.0 dependency). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811470959 There is no need to install both the provider and the extra. The extra is just a legacy way to install providers; you can simply install 'apache-airflow' (no extra) and 'apache-airflow-providers-cncf-kubernetes' as two separate packages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
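The equivalence described in the comment above can be illustrated with hypothetical command lines (the package and extra names are assumed from this discussion, not verified against any particular Airflow release):

```shell
# Legacy style: the 'cncf.kubernetes' extra pulls the provider in
# as a dependency of the apache-airflow package
pip install 'apache-airflow[cncf.kubernetes]'

# Equivalent style without the extra: install the two packages
# separately, which avoids the circular-looking dependency edge
pip install apache-airflow apache-airflow-providers-cncf-kubernetes
```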
[GitHub] [airflow] ashb opened a new pull request #15121: Remove duplicate call to sync_metadata inside DagFileProcessorManager
ashb opened a new pull request #15121: URL: https://github.com/apache/airflow/pull/15121 `_process_message` already calls sync_metadata if it's a DagParsingStat, so there's no need to call it again. This isn't expensive, but there's no point doing it twice. I've also added a runtime check to ensure this function is only used in sync mode, which makes the purpose/logic of this function clearer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] thekyz commented on issue #14128: Circular package imports in Airflow 2.0
thekyz commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811463347 Well I'm using pip through rules_python (a set of bazel rules to use pip libraries; it actually uses pip wheel to get the modules). The download works fine, but then apache-airflow-providers-cncf-kubernetes declares a dependency on apache-airflow, which itself declares a dependency on apache-airflow-providers-cncf-kubernetes when installed with the extra, thus making bazel complain because of the circular dependency. Since the issue had been addressed above for the other (albeit preinstalled) providers, I thought this would be a similar issue. I'm sorry, I'm not very knowledgeable on pip, so I'm not aware of the underlying difference between the preinstalled providers and the extras. Is that dependency from apache-airflow-providers-cncf-kubernetes towards apache-airflow required for a regular pip install? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
kaxil commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605216893 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: lol yeah, good catch @XD-DENG `self._sync_metadata` is redundant here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605215817 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: Yeah, that looks like a bug. PR comin. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #14640: Allow ExternalTaskSensor to wait for taskgroup
github-actions[bot] commented on pull request #14640: URL: https://github.com/apache/airflow/pull/14640#issuecomment-811459419 [The Workflow run](https://github.com/apache/airflow/actions/runs/706274159) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #14640: Allow ExternalTaskSensor to wait for taskgroup
github-actions[bot] commented on pull request #14640: URL: https://github.com/apache/airflow/pull/14640#issuecomment-811459103 [The Workflow run](https://github.com/apache/airflow/actions/runs/706272731) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #14946: Standardize default fab perms
ashb commented on pull request #14946: URL: https://github.com/apache/airflow/pull/14946#issuecomment-811457323 FYI: This PR and https://github.com/apache/airflow/pull/14840 "fight" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] XD-DENG commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
XD-DENG commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605210067 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: > Checking `self._sync_metadata` also leads me to something else: > > https://github.com/apache/airflow/blob/1bec3b21266764f367431ab5d9a5b75f52b9b6d2/airflow/utils/dag_processing.py#L323-L326 > > is doing exactly the same thing in `self._process_message()`, which has been invoked just before L323-326. > A clean-up refactor to do? Or I misunderstood something? Thanks @ashb . How about this question? We can address it separately of course, but want to confirm if I overlooked anything regarding this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)
ashb commented on issue #15116: URL: https://github.com/apache/airflow/issues/15116#issuecomment-811450727 Long term the "fix" for us is probably to wait for Flask to support asyncio and then to drop support for gevent/eventlet. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on issue #15097: Errors when launching many pods simultaneously on GKE
ashb commented on issue #15097: URL: https://github.com/apache/airflow/issues/15097#issuecomment-811450057 I agree with you that cases like this where Airflow was never even able to _start_ the task don't feel like they should "consume" a retry attempt. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on issue #15097: Errors when launching many pods simultaneously on GKE
ashb commented on issue #15097: URL: https://github.com/apache/airflow/issues/15097#issuecomment-811449743 @SamWheating Assigned, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on issue #15119: Flag on CLI airflow tasks test to attach debugger
ashb commented on issue #15119: URL: https://github.com/apache/airflow/issues/15119#issuecomment-811448411 Are you aware of the `dag.cli()` function? That might help you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] dstandish opened a new pull request #15120: Do not log info when ssm secret not found
dstandish opened a new pull request #15120: URL: https://github.com/apache/airflow/pull/15120 Not finding a connection in a particular backend is not an event that merits logging. That's why we have 3 backends in the search path. It's only when the connection is not found in all 3 backends that something exceptional is happening. The log level should be DEBUG. Also, use of the word 'error' is misleading, because not finding a parameter is normal. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
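The search-path behavior argued for in the PR above can be sketched as follows. This is a hypothetical illustration of the logging point, not Airflow's actual secrets-backend code; `get_connection` and the dict-based backends are made up for the example:

```python
import logging

log = logging.getLogger("secrets")

def get_connection(conn_id, backends):
    """Look up conn_id in each backend in order.

    A miss in any single backend is the normal case (that is the whole
    point of having a search path), so it only merits a DEBUG line.
    """
    for backend in backends:
        value = backend.get(conn_id)
        if value is not None:
            return value
        # Not an error: we simply fall through to the next backend.
        log.debug("Connection %s not found in %s", conn_id, type(backend).__name__)
    # Only a miss in *every* backend is exceptional.
    raise KeyError(f"Connection {conn_id} not found in any backend")
```

With, say, an environment backend, a metastore backend, and a vault backend in the list, only the final `KeyError` represents something worth surfacing to the user.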
[GitHub] [airflow] GuillaumeDesforges opened a new issue #15119: Flag on CLI airflow tasks test to attach debugger
GuillaumeDesforges opened a new issue #15119: URL: https://github.com/apache/airflow/issues/15119 **Description** A way to attach a debugger when using `airflow tasks test`. **Use case / motivation** Following [discussion](https://github.com/apache/airflow/discussions/14621): I generate DAGs dynamically using a database. I want to debug tasks. The doc ([Debug Executor](https://airflow.apache.org/docs/apache-airflow/stable/executor/debug.html)) suggests adding

```python
if __name__ == '__main__':
    from airflow.utils.state import State

    dag.clear(dag_run_state=State.NONE)
    dag.run()
```

and running from the IDE. The above code snippet is something I'd rather avoid: * it is hard to select the right dag and tasks when they are generated automatically * it's a code artifact I'd rather avoid having, because every time I test another dag, task or execution time, it creates git diffs **Are you willing to submit a PR?** No **Related Issues** Haven't seen any (?) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605198494 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: Yes, done, was only used in one test, and that was strictly speaking not testing the right thing anymore, just a likely side-effect. ```diff -assert result.num_file_paths == 3 +assert sum(stat.run_count for stat in manager._file_stats.values()) == 3 ``` See 4d8da57eb for full diff. Good call @XD-DENG -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605194036 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: Fair point yes, this is only used for tests and could possibly be removed. Let me take a look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #15009: WIP: run UI tests
ashb commented on pull request #15009: URL: https://github.com/apache/airflow/pull/15009#issuecomment-811431004 Note that _this_ PR will build everything as the CI scripts have changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] XD-DENG commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
XD-DENG commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605181000 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: let's further challenge this part: why do we need `num_file_paths` (or the earlier `file_paths`) at all? Where we really need this `DagParsingStat` is `self._sync_metadata`, in which we don't use `num_file_paths` (or the earlier `file_paths`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] XD-DENG commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
XD-DENG commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605184402 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: Checking `self._sync_metadata` also leads me to something else: https://github.com/apache/airflow/blob/1bec3b21266764f367431ab5d9a5b75f52b9b6d2/airflow/utils/dag_processing.py#L323-L326 is doing exactly the same thing in `self._process_message()`, which has been invoked just before L323-326. A clean-up refactor to do? Or I misunderstood something? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] XD-DENG commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
XD-DENG commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605181000 ## File path: airflow/utils/dag_processing.py ## @@ -141,7 +141,7 @@ def waitable_handle(self): class DagParsingStat(NamedTuple): """Information on processing progress""" -file_paths: List[str] +num_file_paths: int Review comment: let's further challenge this part: why do we need `num_file_paths` or earlier `file_paths`? Where we really need this `DagParsingStat` is `self._sync_metadata`, in which we don't use `num_file_paths` or earlier `file_paths` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15086: Mark the test_scheduler_task_start_date as quarantined
github-actions[bot] commented on pull request #15086: URL: https://github.com/apache/airflow/pull/15086#issuecomment-811399680 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (AIRFLOW-2535) TriggerDagRunOperator ignores max_active_runs, leading to scheduler breakdown
[ https://issues.apache.org/jira/browse/AIRFLOW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312673#comment-17312673 ] Chris edited comment on AIRFLOW-2535 at 3/31/21, 7:43 PM: -- This is making me avoid the TriggerDagRunOperator like the plague. I think I'll stick with Sensors until this gets sorted out. I'm on Airflow 1.10.10. Will this happen in v2.0+? was (Author: cdabel): This is making me avoid the TriggerDagRunOperator like the plague. I think I'll stick with Sensors until this gets sorted out. > TriggerDagRunOperator ignores max_active_runs, leading to scheduler breakdown > - > > Key: AIRFLOW-2535 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2535 > Project: Apache Airflow > Issue Type: Bug > Components: operators, scheduler >Affects Versions: 1.9.0 > Environment: CeleryExec, Rabbitmq, mysql >Reporter: Tylar Murray >Priority: Minor > > `TriggerDagRunOperator` does not respect the `max_active_runs` setting. This > in itself is not a huge issue, but the scheduler breaks down if > `max_active_runs` is exceeded. Task scheduling becomes absurdly slow, quickly > leading to perpetual DAG buildup. > `TriggerDagRunOperator` could throw exception if `max_active_runs` will be > exceeded, or maybe dags could be put into a `queued` state rather than > directly into `running`? > Alternatively: the scheduler probably shouldn't slow task scheduling when > `max_active_runs` is exceeded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] potiuk edited a comment on issue #14128: Circular package imports in Airflow 2.0
potiuk edited a comment on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811389466 I don't believe it's an issue at all. Airflow is only officially installable with `pip` and it works nicely when you run the installation including constraints in the way described here: http://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-tools. This is the only installation method that brings repeatable results. Other methods might fail, but we are not supporting the full range of tools that behave differently than pip. If you are using poetry, pipenv or other tools like that, you are pretty much on your own. If this is not a 'serious issue' but 'some way of using some tool', it's likely to be closed as invalid, so possibly starting a discussion and discussing it there might be a better way to go about it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #14128: Circular package imports in Airflow 2.0
potiuk edited a comment on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811389466 I don't believe it's an issue at all. Airflow is only officially installable with `pip` and it works nicely when you run the installation including constraints in the way described here: http://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-tools. This is the only installation method that brings repeatable results. Other methods might fail, but we are not supporting the full range of tools that behave differently than pip. If you are using poetry, pipenv or other tools like that, you are pretty much on your own. If this is not a 'serious issue' but 'some way of using some tool', it's likely to be closed as invalid, so possibly starting a discussion and discussing it there might be a better way to go about it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15109: Store inventories in GitHub Action cache
github-actions[bot] commented on pull request #15109: URL: https://github.com/apache/airflow/pull/15109#issuecomment-811390591 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811389466 I don't believe it's an issue at all. Airflow is only officially installable with `pip` and it works nicely when you run the installation including constraints in the way described here: http://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-tools. This is the only installation method that brings repeatable results. Other methods might fail, but we are not supporting the full range of tools that behave differently than pip. If you are using poetry, pipenv or other tools like that, you are pretty much on your own. If this is not a 'serious issue' but 'some way of using some tool', it's likely to be closed as invalid, so possibly starting a discussion and discussing it there might be a better way to go about it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AIRFLOW-2535) TriggerDagRunOperator ignores max_active_runs, leading to scheduler breakdown
[ https://issues.apache.org/jira/browse/AIRFLOW-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312673#comment-17312673 ] Chris commented on AIRFLOW-2535: This is making me avoid the TriggerDagRunOperator like the plague. I think I'll stick with Sensors until this gets sorted out. > TriggerDagRunOperator ignores max_active_runs, leading to scheduler breakdown > - > > Key: AIRFLOW-2535 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2535 > Project: Apache Airflow > Issue Type: Bug > Components: operators, scheduler >Affects Versions: 1.9.0 > Environment: CeleryExec, Rabbitmq, mysql >Reporter: Tylar Murray >Priority: Minor > > `TriggerDagRunOperator` does not respect the `max_active_runs` setting. This > in itself is not a huge issue, but the scheduler breaks down if > `max_active_runs` is exceeded. Task scheduling becomes absurdly slow, quickly > leading to perpetual DAG buildup. > `TriggerDagRunOperator` could throw exception if `max_active_runs` will be > exceeded, or maybe dags could be put into a `queued` state rather than > directly into `running`? > Alternatively: the scheduler probably shouldn't slow task scheduling when > `max_active_runs` is exceeded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [airflow] XD-DENG commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
XD-DENG commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605163605 ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +524,90 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +@pytest.mark.timeout(60) +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). Review comment: ```suggestion # parent pipe's buffer (and keeps it full). ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] thekyz commented on issue #14128: Circular package imports in Airflow 2.0
thekyz commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811315008 Sorry I jumped to conclusions here. I get exactly the same issue as described above when installing apache-airflow[kubernetes] thus I thought it was the same root cause. Should I create a new ticket then? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15117: Remove user_id from API schema
github-actions[bot] commented on pull request #15117: URL: https://github.com/apache/airflow/pull/15117#issuecomment-811314368 The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Davide95 commented on issue #15108: PostgresHook do not use the specified cursor
Davide95 commented on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811302486 > Yes, as the implementation currently stands. Ok, thanks for the clarification > It is definitely a bit inconvenient, but why would you need to mix both kinds of cursors in the first place? It’s a nice feature to have, but _particularly tricky_ wouldn’t be my description of the situation. I'm inheriting a large and old codebase, and migrating everything could be an issue. Integrating different DAGs with different settings would be a nice feature to have -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #15108: PostgresHook do not use the specified cursor
uranusjr commented on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811300433 > Does this mean that if you want to use different cursor types you have to create different connections? Yes, as the implementation currently stands. > Isn't this behavior particularly tricky? It is definitely a bit inconvenient, but why would you need to mix both kinds of cursors in the first place? It’s a nice feature to have, but *particularly tricky* wouldn’t be my description of the situation. But nonetheless, it would still be a good feature request and contribution if the interface can be designed reasonably. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] Davide95 commented on issue #15108: PostgresHook do not use the specified cursor
Davide95 commented on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811295403 Hi @uranusjr > I believe the documentation is saying you need to set extra on your Connection, not the hook. Does this mean that if you want to use different cursor types you have to create different connections? Isn't this behavior particularly tricky? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #8665: subdag is not able to correctly render {{ next_ds }} jinja template when start_date uses timezone info.
uranusjr commented on issue #8665: URL: https://github.com/apache/airflow/issues/8665#issuecomment-811290623 The displayed time here is actually pretty weird. `Asia/Calcutta` is +05:30 (`pendulum` seems to implement this correctly), so I'd expect the output to be `2020-04-29T05:30:00+00:00` (the same time as supplied by the user, converted to UTC). I'm less sure the two issues are related (still could be). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
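For reference, the +05:30 offset can be checked with the standard library (`zoneinfo` used here instead of `pendulum`; the timestamp below is illustrative, not the one from the linked issue):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Asia/Calcutta is UTC+05:30, so local midnight maps to 18:30 UTC on the previous day.
local = datetime(2020, 4, 29, 0, 0, tzinfo=ZoneInfo("Asia/Calcutta"))
utc = local.astimezone(ZoneInfo("UTC"))
print(utc.isoformat())  # 2020-04-28T18:30:00+00:00
```

Converting to UTC keeps the instant but shifts the wall-clock digits by the offset, which is why "the same time, converted to UTC" cannot carry a `+00:00` suffix with unchanged digits.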
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605095651 ## File path: tests/utils/test_dag_processing.py ## @@ -17,10 +17,13 @@ # under the License. import multiprocessing +import logging Review comment: ```suggestion import logging import multiprocessing ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[airflow] branch master updated (6e99ae0 -> 1bec3b2)
This is an automated email from the ASF dual-hosted git repository. kaxilnaik pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/airflow.git. from 6e99ae0 Allow pathlib.Path in DagBag and various util fns (#15110) add 1bec3b2 Fix doc link permission name (#14972) No new revisions were added by this update. Summary of changes: airflow/security/permissions.py | 2 +- airflow/www/extensions/init_appbuilder_links.py | 10 +- airflow/www/security.py | 2 +- tests/www/test_security.py | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-)
[GitHub] [airflow] kaxil merged pull request #14972: Fix doc link permission name
kaxil merged pull request #14972: URL: https://github.com/apache/airflow/pull/14972 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
github-actions[bot] commented on pull request #15112: URL: https://github.com/apache/airflow/pull/15112#issuecomment-811278484 [The Workflow run](https://github.com/apache/airflow/actions/runs/705808662) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #15109: Store inventories in GitHub Action cache
kaxil commented on a change in pull request #15109: URL: https://github.com/apache/airflow/pull/15109#discussion_r605092312 ## File path: scripts/ci/docs/ci_docs.sh ## @@ -30,8 +30,7 @@ export PYTHONPATH=${AIRFLOW_SOURCES} pip install --upgrade pip==20.2.4 -pip install .[doc] --upgrade --constraint \ - "https://raw.githubusercontent.com/apache/airflow/constraints-${DEFAULT_BRANCH}/constraints-${PYTHON_MAJOR_MINOR_VERSION}.txt; +pip install .[doc] --upgrade --constraint "${AIRFLOW_SOURCES}/constraints.txt" Review comment: Wait is this change required? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #14176: Simplify CONTRIBUTING.rst
potiuk commented on pull request #14176: URL: https://github.com/apache/airflow/pull/14176#issuecomment-811276689 Cool! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] SamWheating commented on issue #15097: Errors when launching many pods simultaneously on GKE
SamWheating commented on issue #15097: URL: https://github.com/apache/airflow/issues/15097#issuecomment-811274036 - Can you assign this issue to me and I'll open a PR to add some backoff logic to the operator? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
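Backoff logic along the proposed lines could look like this minimal sketch (exponential backoff with jitter; the function name and retry policy are assumptions, not the eventual PR's implementation):

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Retry fn() on exceptions, sleeping base_delay * 2**attempt (with jitter) between tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads retries apart
```

Jitter matters when many pods launch simultaneously: without it, all clients retry at the same moments and hammer the API server in lockstep.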
[GitHub] [airflow] uranusjr commented on pull request #15109: Store inventories in GitHub Action cache
uranusjr commented on pull request #15109: URL: https://github.com/apache/airflow/pull/15109#issuecomment-811272928 Cool -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15034: Taskgroup decorator
github-actions[bot] commented on pull request #15034: URL: https://github.com/apache/airflow/pull/15034#issuecomment-811272342 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #15109: Store inventories in GitHub Action cache
potiuk commented on pull request #15109: URL: https://github.com/apache/airflow/pull/15109#issuecomment-811272172 > I'm not yet sure how best to implement the part pulling version from constraints to fetch the correct inventory versions. I'll probably start with checking each listed documentation and figure out if they have a version inventory (not so sure for e.g. boto). Actually, I think that's it. We do not need all those version changes now that you look at it. This will work beautifully as-is without any further changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #15109: Store inventories in GitHub Action cache
potiuk commented on pull request #15109: URL: https://github.com/apache/airflow/pull/15109#issuecomment-811270930 Nice! Waay simpler than the initial idea :). Would you mind running `git commit --amend` and `git push --force-with-lease` to re-push it (and changing it to non-draft)? This way we will see if the cache works (it seems to have been pushed successfully) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr edited a comment on issue #15108: PostgresHook do not use the specified cursor
uranusjr edited a comment on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811268206 That said, it is definitely a bit confusing the hook just throws away unused arguments like this. Maybe it should raise an exception instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] uranusjr commented on issue #15108: PostgresHook do not use the specified cursor
uranusjr commented on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811268206 That said, it is definitely a bit confusing the hook just throws away unused arguments like this. Maybe it should throw an error instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
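Raising instead of silently dropping could be as simple as validating keyword arguments up front; a hypothetical sketch (the accepted argument names are made up for illustration, not PostgresHook's real signature):

```python
def make_hook(**kwargs):
    """Reject unknown keyword arguments instead of silently ignoring them."""
    allowed = {"postgres_conn_id", "schema"}  # hypothetical accepted arguments
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError(f"unexpected hook arguments: {sorted(unknown)}")
    return kwargs
```

With this shape, passing `cursor=...` to the hook would fail loudly at construction time, pointing the user toward configuring it on the Connection instead.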
[GitHub] [airflow] uranusjr commented on issue #15108: PostgresHook do not use the specified cursor
uranusjr commented on issue #15108: URL: https://github.com/apache/airflow/issues/15108#issuecomment-811267577 I believe the documentation is saying you need to set `extra` on your *Connection*, not the hook. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
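A sketch of how a hook might pick a cursor from the Connection's `extra` JSON (the key name and accepted values here are assumptions for illustration; the hook's actual parsing may differ):

```python
import json

# Hypothetical mapping from an extra-field value to a psycopg2 cursor class name.
CURSOR_CLASSES = {
    "dictcursor": "DictCursor",
    "realdictcursor": "RealDictCursor",
    "namedtuplecursor": "NamedTupleCursor",
}


def cursor_class_from_extra(extra: str):
    """Return the cursor class name requested in the connection's extra JSON, if any."""
    name = json.loads(extra or "{}").get("cursor", "")
    return CURSOR_CLASSES.get(name.lower())
```

Because the setting lives on the Connection, every hook built from that connection gets the same cursor, which is why mixing cursor types currently requires separate connections.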
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605082015 ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). +def keep_pipe_full(pipe, exit_event): +import logging + +n = 0 +while True: +if exit_event.is_set(): +break + +req = CallbackRequest(str(dag_filepath)) +try: +logging.debug("Sending CallbackRequests %d", n + 1) +pipe.send(req) +except TypeError: +# This is actually the error you get when the parent pipe +# is closed! Nicely handled, eh? 
+break +except OSError: +break +n += 1 +logging.debug(" Sent %d CallbackRequests", n) + +thread = threading.Thread(target=keep_pipe_full, args=(parent_pipe, exit_event)) + +fake_processors = [] + +def fake_processor_factory(*args, **kwargs): +nonlocal fake_processors +processor = FakeDagFileProcessorRunner._fake_dag_processor_factory(*args, **kwargs) +fake_processors.append(processor) +return processor + +manager = DagFileProcessorManager( +dag_directory=dag_filepath, +dag_ids=[], +# A reasonable large number to ensure that we trigger the deadlock +max_runs=100, +processor_factory=fake_processor_factory, +processor_timeout=timedelta(seconds=5), +signal_conn=child_pipe, +pickle_dags=False, +async_mode=True, +) + +try: +thread.start() + +# If this completes without hanging, then the test is good! +manager._run_parsing_loop() +exit_event.set() +finally: +import logging Review comment: https://github.com/apache/airflow/pull/15112/commits/5e7405f5574b36e8bc9cf8df0d0d788ba7238ad4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605081881 ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). +def keep_pipe_full(pipe, exit_event): +import logging Review comment: https://github.com/apache/airflow/pull/15112/commits/5e7405f5574b36e8bc9cf8df0d0d788ba7238ad4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605081483 ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). +def keep_pipe_full(pipe, exit_event): +import logging + +n = 0 +while True: +if exit_event.is_set(): +break + +req = CallbackRequest(str(dag_filepath)) +try: +logging.debug("Sending CallbackRequests %d", n + 1) +pipe.send(req) +except TypeError: +# This is actually the error you get when the parent pipe +# is closed! Nicely handled, eh? 
+break +except OSError: +break +n += 1 +logging.debug(" Sent %d CallbackRequests", n) + +thread = threading.Thread(target=keep_pipe_full, args=(parent_pipe, exit_event)) + +fake_processors = [] + +def fake_processor_factory(*args, **kwargs): +nonlocal fake_processors +processor = FakeDagFileProcessorRunner._fake_dag_processor_factory(*args, **kwargs) +fake_processors.append(processor) +return processor + +manager = DagFileProcessorManager( +dag_directory=dag_filepath, +dag_ids=[], +# A reasonable large number to ensure that we trigger the deadlock +max_runs=100, +processor_factory=fake_processor_factory, +processor_timeout=timedelta(seconds=5), +signal_conn=child_pipe, +pickle_dags=False, +async_mode=True, +) + +try: +thread.start() + +# If this completes without hanging, then the test is good! +manager._run_parsing_loop() Review comment: https://github.com/apache/airflow/pull/15112/commits/5e7405f5574b36e8bc9cf8df0d0d788ba7238ad4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
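The "pipe full" condition this test exercises is easy to reproduce with a bare OS pipe; a small sketch of the non-blocking idea behind the PR (not the scheduler's actual code):

```python
import os

r, w = os.pipe()
os.set_blocking(w, False)  # a full buffer now raises instead of blocking the writer

sent = 0
try:
    while True:
        sent += os.write(w, b"x" * 1024)  # keep filling the pipe's kernel buffer
except BlockingIOError:
    pass  # buffer is full -- a blocking writer would have hung here forever
print(sent)  # total bytes buffered, commonly 65536 on Linux

os.close(r)
os.close(w)
```

With blocking I/O, two processes that each fill their send buffer while waiting for the other to read are deadlocked; making the writes non-blocking lets each side drain its read end between send attempts.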
[GitHub] [airflow] potiuk edited a comment on issue #14128: Circular package imports in Airflow 2.0
potiuk edited a comment on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811266078 > @potiuk any clue when airflow will get a new release with this issue fixed? It's already fixed in released 2.0.1 https://github.com/apache/airflow/pull/13314 > also looks like the issue is not fixed at least for apache-airflow-providers-cncf-kubernetes It's never been an issue for cncf-kubernetes. This was only an issue for the 4 providers that are preinstalled: 'ftp', 'http', 'imap', 'sqlite'. Airflow has only an "extra" dependency on cncf.kubernetes, but extras do not count as "install-requires", so this is not a problem as far as I know (unless you specifically have some problem that you would like to describe). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #14128: Circular package imports in Airflow 2.0
potiuk commented on issue #14128: URL: https://github.com/apache/airflow/issues/14128#issuecomment-811266078 > @potiuk any clue when airflow will get a new release with this issue fixed? It's already fixed in released 2.0.1 https://github.com/apache/airflow/pull/13314 > also looks like the issue is not fixed at least for apache-airflow-providers-cncf-kubernetes It's never been an issue for cncf-kubernetes. This was only an issue for the 4 providers that are preinstalled: 'ftp', 'http', 'imap', 'sqlite'. Airflow has only an "extra" dependency on cncf.kubernetes, but extras do not count as "install-requires", so this is not a problem as far as I know (unless you specifically have some problem that you would like to describe). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15114: Pin pandas-gbq to <0.15.0
github-actions[bot] commented on pull request #15114: URL: https://github.com/apache/airflow/pull/15114#issuecomment-811266066 [The Workflow run](https://github.com/apache/airflow/actions/runs/705783193) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #15118: Fixed all remaining usage of the old task import
github-actions[bot] commented on pull request #15118: URL: https://github.com/apache/airflow/pull/15118#issuecomment-811264197 The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605077882 ## File path: tests/utils/test_dag_processing.py ## @@ -20,7 +20,9 @@ import os Review comment: ```suggestion import logging import os ``` ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). 
+def keep_pipe_full(pipe, exit_event): +import logging + Review comment: ```suggestion ``` ## File path: tests/utils/test_dag_processing.py ## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self): manager._run_parsing_loop() +result = None while parent_pipe.poll(timeout=None): result = parent_pipe.recv() if isinstance(result, DagParsingStat) and result.done: break # Three files in folder should be processed -assert len(result.file_paths) == 3 +assert result.num_file_paths == 3 with create_session() as session: assert session.query(DagModel).get(dag_id) is not None +@conf_vars({('core', 'load_examples'): 'False'}) +@pytest.mark.backend("mysql", "postgres") +def test_pipe_full_deadlock(self): +dag_filepath = TEST_DAG_FOLDER / "test_scheduler_dags.py" + +child_pipe, parent_pipe = multiprocessing.Pipe() + +# Shrink the buffers to exacerbate the problem! +for fd in (parent_pipe.fileno(),): +sock = socket.socket(fileno=fd) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024) +sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024) +sock.detach() + +exit_event = threading.Event() + +# To test this behaviour we need something that continually fills the +# parent pipe's bufffer (and keeps it full). +def keep_pipe_full(pipe, exit_event): +import logging + +n = 0 +while True: +if exit_event.is_set(): +break + +req = CallbackRequest(str(dag_filepath)) +try: +logging.debug("Sending CallbackRequests %d", n + 1) +pipe.send(req) +except TypeError: +# This is actually the error you get when the parent pipe +# is closed! Nicely handled, eh? 
+break +except OSError: +break +n += 1 +logging.debug(" Sent %d CallbackRequests", n) + +thread = threading.Thread(target=keep_pipe_full, args=(parent_pipe, exit_event)) + +fake_processors = [] + +def fake_processor_factory(*args, **kwargs): +nonlocal fake_processors +processor = FakeDagFileProcessorRunner._fake_dag_processor_factory(*args, **kwargs) +fake_processors.append(processor) +return processor + +manager = DagFileProcessorManager( +dag_directory=dag_filepath, +dag_ids=[], +# A reasonable large number to ensure that we trigger the deadlock +max_runs=100, +processor_factory=fake_processor_factory, +processor_timeout=timedelta(seconds=5), +signal_conn=child_pipe, +pickle_dags=False, +async_mode=True, +) + +try: +thread.start() + +# If this completes without hanging, then the test is good! +manager._run_parsing_loop() +exit_event.set() +finally: +import
[GitHub] [airflow] ashb commented on a change in pull request #15112: Avoid scheduler/parser manager deadlock by using non-blocking IO
ashb commented on a change in pull request #15112: URL: https://github.com/apache/airflow/pull/15112#discussion_r605076144

## File path: tests/utils/test_dag_processing.py

## @@ -521,17 +523,93 @@ def test_dag_with_system_exit(self):

```diff
+        try:
+            thread.start()
+
+            # If this completes without hanging, then the test is good!
+            manager._run_parsing_loop()
```

Review comment:
Oh in tests, yes we probably should.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
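The buffer-shrinking trick in the test above can be isolated into a small self-contained sketch. This is not the Airflow test itself, just an illustration of the same two ideas: on Unix a duplex `multiprocessing.Pipe()` is backed by a socketpair, so its file descriptor can be wrapped in a `socket.socket(fileno=...)` just long enough to set tiny send/receive buffers (then `detach()` so the pipe keeps ownership of the fd), and a reader that drains via `poll()` keeps a writer from wedging on the full buffer. The message count and sizes are arbitrary choices for the demo.

```python
import multiprocessing
import socket
import threading

# Shrink the pipe's socket buffers, as the test does, so the pipe
# fills up after only a handful of messages.
child_pipe, parent_pipe = multiprocessing.Pipe()
sock = socket.socket(fileno=parent_pipe.fileno())
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024)
sock.detach()  # hand the fd back to the pipe without closing it

SENT = 200

def writer():
    # send() blocks once the tiny buffer is full -- this is the thread
    # that "keeps the pipe full" in the deadlock scenario.
    for _ in range(SENT):
        parent_pipe.send(b"x" * 64)

thread = threading.Thread(target=writer)
thread.start()

# The read side drains with poll(), so the writer is always unblocked
# eventually and the loop terminates instead of deadlocking.
received = 0
while received < SENT:
    if child_pipe.poll(timeout=1.0):
        child_pipe.recv()
        received += 1

thread.join()
print(received)  # → 200
```

The key point the test exercises is that the *manager's* loop must not block on writes while the peer's buffer is full; here the drain loop plays the role of the non-blocking IO the PR adds.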
[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)
uranusjr commented on issue #15116: URL: https://github.com/apache/airflow/issues/15116#issuecomment-811258624 As a hack, maybe it would be possible to call something on the hook before gevent patches? The connection object (creation of which triggers `SSLContext` modifications) is cached, so once it’s created, subsequent S3 calls won’t modify `SSLContext` (I think). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
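The reason the hack could work is the caching the comment mentions: the expensive, `SSLContext`-touching setup runs only on the first call, so forcing that first call before the monkeypatch means later calls never touch `SSLContext` again. A minimal stdlib sketch of that pattern (the `lru_cache` stand-in is an illustration, not Airflow's actual hook code):

```python
import functools
import ssl

@functools.lru_cache(maxsize=None)
def get_conn():
    # Stand-in for the hook's cached connection object: the
    # SSLContext creation happens only on this first call.
    return ssl.create_default_context()

ctx_first = get_conn()   # warm the cache *before* any monkey-patching
ctx_again = get_conn()   # later calls reuse the cached object untouched
print(ctx_first is ctx_again)  # → True
```

In the real scenario the warm-up call would be something like creating the S3 connection at worker start-up, before eventlet/gevent patches the `ssl` module; whether your deployment can guarantee that ordering is exactly the open question in this thread.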
[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)
uranusjr commented on issue #15116: URL: https://github.com/apache/airflow/issues/15116#issuecomment-811254578 Seems like this is a known gevent issue: gevent/gevent#1777 and gevent/gevent#1531. Note that both issues have been closed without a fix. Reading the issues, it seems this happens when the monkeypatch runs before an option update, so a fix in Airflow would mean delaying the patch until after all possible context modifications have happened. But since those modifications come from providers, they can literally happen anywhere, so unfortunately I don't see a good way to fix this in Airflow. Hopefully someone can provide a workable solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
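Why does patch order produce "maximum recursion depth exceeded" rather than a clean error? A toy reproduction of the general failure mode (an assumption about the mechanism; this is not gevent's actual code): when a wrapper that delegates to "the original" is installed, and a later patching step re-captures the current method as the new "original", the wrapper ends up delegating to itself.

```python
class Context:
    """Stand-in for an SSLContext-like object being monkey-patched."""
    def set_ciphers(self, v):
        self.ciphers = v

original = Context.set_ciphers

def patched(self, v):
    # Late-bound lookup of the module-global `original`.
    return original(self, v)

Context.set_ciphers = patched
ctx = Context()
ctx.set_ciphers("DEFAULT")        # fine: wrapper -> real method

original = Context.set_ciphers    # a second patch captures the wrapper...
try:
    ctx.set_ciphers("DEFAULT")    # ...so the wrapper now calls itself
    outcome = "no error"
except RecursionError:
    outcome = "maximum recursion depth exceeded"
print(outcome)  # → maximum recursion depth exceeded
```

This is why "patch first, before anything touches the context" is the usual gevent/eventlet guidance, and why provider code modifying `SSLContext` at arbitrary points makes the ordering impossible to guarantee from Airflow's side.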