Taragolis commented on PR #29616: URL: https://github.com/apache/airflow/pull/29616#issuecomment-1437330747
Yeah, as soon as I went off to do my daily routine, it finally failed 🥇 💯 😢

This is quite an interesting case. Some of the points below could be wrong assumptions.

Dag Runs
---

```console
HTTP: GET dags/example_bash_operator/dagRuns
{'dag_runs': [{'conf': {},
               'dag_id': 'example_bash_operator',
               'dag_run_id': 'test_dag_run_id',
               'data_interval_end': '2023-02-20T00:00:00+00:00',
               'data_interval_start': '2023-02-19T00:00:00+00:00',
               'end_date': None,
               'execution_date': '2023-02-20T10:30:00.702880+00:00',
               'external_trigger': True,
               'last_scheduling_decision': None,
               'logical_date': '2023-02-20T10:30:00.702880+00:00',
               'note': None,
               'run_type': 'manual',
               'start_date': None,
               'state': 'queued'}],
 'total_entries': 1}
```

The `example_bash_operator` DAG has a schedule interval, so we should see two DAG runs here: one scheduled and one manual. In this case there is only one, the manual run created during the test.

Scheduler Logs
---

```console
airflow-scheduler_1 |
airflow-scheduler_1 | BACKEND=redis
airflow-scheduler_1 | DB_HOST=redis
airflow-scheduler_1 | DB_PORT=6379
airflow-scheduler_1 |
airflow-scheduler_1 | /home/airflow/.local/lib/python3.7/site-packages/airflow/models/base.py:49 MovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings. Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
airflow-scheduler_1 |   ____________       _____________
airflow-scheduler_1 |  ____    |__( )_________  __/__  /________      __
airflow-scheduler_1 | ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
airflow-scheduler_1 | ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
airflow-scheduler_1 |  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
airflow-scheduler_1 | [2023-02-20T10:29:14.618+0000] {executor_loader.py:114} INFO - Loaded executor: CeleryExecutor
airflow-scheduler_1 | [2023-02-20T10:29:14.664+0000] {scheduler_job.py:724} INFO - Starting the scheduler
airflow-scheduler_1 | [2023-02-20T10:29:14.665+0000] {scheduler_job.py:731} INFO - Processing each file at most -1 times
airflow-scheduler_1 | [2023-02-20T10:29:14.669+0000] {manager.py:164} INFO - Launched DagFileProcessorManager with pid: 33
airflow-scheduler_1 | [2023-02-20T10:29:14.671+0000] {scheduler_job.py:1437} INFO - Resetting orphaned tasks for active dag runs
airflow-scheduler_1 | [2023-02-20T10:29:14.685+0000] {settings.py:61} INFO - Configured default timezone Timezone('UTC')
```

That's all. It seems the scheduler just hangs, while the service reports that it is healthy. Could this be a problem with the recent health check changes in https://github.com/apache/airflow/pull/29408, or maybe a problem with the simple HTTP server in the scheduler?
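To make a hang like this easier to diagnose, the test could dump what the health endpoints return at the moment it fails. Below is a minimal sketch, not the actual test code. The function names `fetch_health` and `dump_health` are hypothetical, and the URLs in `ASSUMED_HEALTH_URLS` are assumptions: the webserver URL follows the `8080` port mapping in the quick-start setup, and `8974` is, as far as I know, the default port of the scheduler health check server, which would have to be reachable from wherever the test runs.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoints, adjust to the real setup. Neither URL comes from the test itself.
ASSUMED_HEALTH_URLS = {
    "webserver": "http://localhost:8080/health",
    "scheduler": "http://localhost:8974/health",
}


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """Fetch one health endpoint and return a report that is safe to log.

    Never raises on network problems: a hung or unreachable service is
    exactly the case we want to record instead of crashing the reporter.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return {"url": url, "status": response.status, "body": body}
    except urllib.error.HTTPError as exc:
        # Non-2xx answers still carry a body that may explain the failure.
        body = exc.read().decode("utf-8", errors="replace")
        return {"url": url, "status": exc.code, "body": body}
    except OSError as exc:
        # Covers connection refused, DNS errors and timeouts.
        return {"url": url, "status": None, "error": repr(exc)}


def dump_health(urls: dict) -> str:
    """Collect reports for all named endpoints as pretty-printed JSON."""
    reports = {name: fetch_health(url) for name, url in urls.items()}
    return json.dumps(reports, indent=2, sort_keys=True)
```

Calling `print(dump_health(ASSUMED_HEALTH_URLS))` from the failure handler of the test would put the raw responses next to the container logs, which should show whether the health check answers while the scheduler loop is stuck.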
I would add the output from the `/health` endpoint in case of failure.

Docker services after test failure
---

```console
$ docker ps
CONTAINER ID   IMAGE                                                                                  COMMAND                  CREATED         STATUS                   PORTS                                       NAMES
8da8ebd97f17   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-triggerer_1
88a829428ce8   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   quick-start_airflow-webserver_1
f3baa9496225   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-scheduler_1
134b3356ed96   ghcr.io/apache/airflow/main/prod/python3.7:a8723aa63be724652809c141714af95493aea68c   "/usr/bin/dumb-init …"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp                                    quick-start_airflow-worker_1
a5f5e8250820   redis:latest                                                                           "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes (healthy)   6379/tcp                                    quick-start_redis_1
de963f245166   postgres:13                                                                            "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes (healthy)   5432/tcp                                    quick-start_postgres_1
```

All healthy, which means the services initially passed their health checks after startup.

Versions
---

```console
$ docker version
Client:
 Version:           20.10.23+azure-2
 API version:       1.41
 Go version:        go1.19.6
 Git commit:        715524332ff91d0f9ec5ab2ec95f051456ed1dba
 Built:             Wed Jan 18 20:42:16 UTC 2023
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.22+azure-1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.9
  Git commit:       42c8b314993e5eb3cc2776da0bbe41d5eb4b707b
  Built:            Thu Dec 15 22:17:04 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.18+azure-1
  GitCommit:        2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version:          1.1.4
  GitCommit:        5fd4c4d144137e991c4acebb2146ab1483a97925
 docker-init:
  Version:          0.19.0
  GitCommit:
```

```console
$ docker-compose version
docker-compose version 1.29.2, build 5becea4c
docker-py version: 5.0.0
CPython version: 3.7.10
OpenSSL version: OpenSSL 1.1.0l 10 Sep 2019
```

That is more interesting. I have seen static checks occasionally fail with this particular Docker version, `20.10.23+azure-2`, and I have not seen that happen with Docker builds without the `azure-X` suffix.

Other strange things
---

The `Prepare Breeze and PROD image` step has a lot of errors which refer to permission denied:

```console
Received 27910740 of 32105044 (86.9%), 26.6 MBs/sec
Received 32105044 of 32105044 (100.0%), 29.8 MBs/sec
Cache Size: ~31 MB (32105044 B)
/usr/bin/tar -xf /home/runner/work/_temp/00fdf96d-139b-4954-ad8c-852b0f051104/cache.tgz -P -C /home/runner/work/airflow/airflow -z
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages/_distutils_hack: Cannot mkdir: No such file or directory
/usr/bin/tar: ../../../../.local: Cannot mkdir: Permission denied
/usr/bin/tar: ../../../../.local/pipx/shared/lib/python3.7/site-packages/_distutils_hack/override.py: Cannot open: No such file or directory
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org