emiliadecaudin opened a new issue, #50802:
URL: https://github.com/apache/airflow/issues/50802
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
apache-airflow==3.0.1
apache-airflow-core==3.0.1
apache-airflow-providers-amazon==9.7.0
apache-airflow-providers-celery==3.10.6
apache-airflow-providers-cncf-kubernetes==10.4.3
apache-airflow-providers-common-compat==1.6.1
apache-airflow-providers-common-io==1.5.4
apache-airflow-providers-common-messaging==1.0.1
apache-airflow-providers-common-sql==1.27.0
apache-airflow-providers-docker==4.3.1
apache-airflow-providers-elasticsearch==6.2.2
apache-airflow-providers-fab==2.0.2
apache-airflow-providers-ftp==3.12.3
apache-airflow-providers-git==0.0.2
apache-airflow-providers-google==15.1.0
apache-airflow-providers-grpc==3.7.3
apache-airflow-providers-hashicorp==4.1.1
apache-airflow-providers-http==5.2.2
apache-airflow-providers-microsoft-azure==12.3.1
apache-airflow-providers-mysql==6.2.2
apache-airflow-providers-odbc==4.9.2
apache-airflow-providers-openlineage==2.2.0
apache-airflow-providers-postgres==6.1.3
apache-airflow-providers-redis==4.0.2
apache-airflow-providers-sendgrid==4.0.1
apache-airflow-providers-sftp==5.2.1
apache-airflow-providers-slack==9.0.5
apache-airflow-providers-smtp==2.0.3
apache-airflow-providers-snowflake==6.3.0
apache-airflow-providers-ssh==4.0.1
apache-airflow-providers-standard==1.1.0
apache-airflow-task-sdk==1.0.1
### Apache Airflow version
3.0.1
### Operating System
Debian GNU/Linux 12 (bookworm)
### Deployment
Docker-Compose
### Deployment details
Docker deployment based on the docker-compose.yaml provided in the Airflow
documentation. A Dockerfile is provided to install a custom provider into the
image. I use the following environment variables (defined in
docker-compose.yaml) to control logging:
```
AIRFLOW__LOGGING__LOGGING_LEVEL: ${AIRFLOW__LOGGING__LOGGING_LEVEL:-INFO}
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "cloudwatch://arn:aws:logs:us-east-1:XXX:log-group:XXX"
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "aws_default"
AIRFLOW__LOGGING__REMOTE_LOGGING: "true"
```
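As a sanity check on the `cloudwatch://` URL format: the base log folder embeds the log-group ARN, from which the region and group name can be recovered. The sketch below is a minimal stdlib-only illustration of that parsing, using made-up account and group values; it mirrors how I understand the Amazon provider resolves the group, but it is an assumption, not the provider's actual code.

```python
from urllib.parse import urlsplit


def parse_cloudwatch_base(remote_base_log_folder: str) -> tuple[str, str]:
    """Extract (region, log group name) from a cloudwatch:// base log folder.

    Assumes the ARN layout arn:aws:logs:<region>:<account>:log-group:<name>.
    This is an illustrative sketch, not the Amazon provider's exact logic.
    """
    arn = urlsplit(remote_base_log_folder).netloc  # the bare log-group ARN
    parts = arn.split(":")
    return parts[3], parts[6]


# Hypothetical values standing in for the redacted XXX placeholders above:
region, group = parse_cloudwatch_base(
    "cloudwatch://arn:aws:logs:us-east-1:123456789012:log-group:airflow-task-logs"
)
# region == "us-east-1", group == "airflow-task-logs"
```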
### What happened
I switched my remote logging destination from an S3 bucket to a CloudWatch
log group. When I run a single-task DAG to test whether logs are written
correctly, the first invocation works, but the second and subsequent
invocations fail to write logs to CloudWatch. The first symptom is a header in
the log panel of the web interface reporting that the specified log stream
doesn't exist (I checked CloudWatch itself and confirmed it was never
created). If I run a DAG with multiple tasks, the first task writes its log
successfully, while the subsequent tasks fail to do so; further invocations of
that DAG result in no tasks uploading their logs to CloudWatch at all. Lastly,
dag_processor log streams are created and updated only sporadically.
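To check for the missing streams directly in CloudWatch, it helps to know the stream name a task attempt should map to. My understanding (an assumption about provider internals, not verified against its source) is that the handler renders the default log filename template and replaces `:` with `_`, since colons are not allowed in stream names. A small helper predicting that name:

```python
def expected_stream_name(dag_id: str, run_id: str, task_id: str, try_number: int) -> str:
    """Predict the CloudWatch log stream name for a task attempt.

    Assumes the default Airflow log filename template
    dag_id=.../run_id=.../task_id=.../attempt=N.log, and that the
    CloudWatch handler replaces ':' with '_'. Both are assumptions
    about provider internals, shown here only as a triage aid.
    """
    rendered = (
        f"dag_id={dag_id}/run_id={run_id}/task_id={task_id}/attempt={try_number}.log"
    )
    return rendered.replace(":", "_")


# Hypothetical identifiers for illustration:
name = expected_stream_name(
    "example_dag", "manual__2025-05-01T12:00:00+00:00", "hello", 1
)
# name == "dag_id=example_dag/run_id=manual__2025-05-01T12_00_00+00_00/task_id=hello/attempt=1.log"
```

Searching the log group for names of this shape is how I confirmed that only the first attempt's stream ever gets created.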
### What you think should happen instead
Logs should be written to CloudWatch after every task completes, and on every
heartbeat of the DAG processor.
### How to reproduce
I was able to reproduce this issue in a separate, minimal environment with the
following docker-compose.yml file:
```
x-airflow-common: &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:3.0.1}
  environment: &airflow-common-env
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__AUTH_MANAGER: airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__DEFAULT_TIMEZONE: "America/New_York"
    AIRFLOW__CORE__EXECUTION_API_SERVER_URL: "http://airflow-apiserver:8080/execution/"
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__LOAD_EXAMPLES: "true"
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__LOGGING__LOGGING_LEVEL: ${AIRFLOW__LOGGING__LOGGING_LEVEL:-INFO}
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "cloudwatch://arn:aws:logs:us-east-1:XXX:log-group:XXX"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "aws_default"
    AIRFLOW__LOGGING__REMOTE_LOGGING: "true"
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: "true"
    AIRFLOW_CONFIG: /opt/airflow/config/airflow.cfg
  volumes:
    - ./config:/opt/airflow/config
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on: &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "airflow" ]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always

  redis:
    # Redis is limited to 7.2-bookworm due to licencing change
    # https://redis.io/blog/redis-adopts-dual-source-available-licensing/
    image: redis:7.2-bookworm
    expose:
      - 6379
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      interval: 10s
      timeout: 30s
      retries: 50
      start_period: 30s
    restart: always

  airflow-apiserver:
    <<: *airflow-common
    command: api-server
    ports:
      - "8080:8080"
    healthcheck:
      test: [ "CMD", "curl", "--fail", "http://localhost:8080/api/v2/version" ]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: [ "CMD", "curl", "--fail", "http://localhost:8974/health" ]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-dag-processor:
    <<: *airflow-common
    command: dag-processor
    healthcheck:
      test: [ "CMD-SHELL", 'airflow jobs check --job-type DagProcessorJob --hostname "$${HOSTNAME}"' ]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      # yamllint disable rule:line-length
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}" || celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    environment:
      <<: *airflow-common-env
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-apiserver:
        condition: service_healthy
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: [ "CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"' ]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        /entrypoint airflow version
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: "true"
      _AIRFLOW_WWW_USER_CREATE: "true"
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    user: "0:0"

volumes:
  postgres-db-volume:
```
### Anything else
I am happy to share DEBUG-level logs from the worker process on request.
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]