GitHub user MrHenryD edited a discussion: DAGProcessor failing to process dag 
files after an unknown period of time

The Airflow DAG processor works fine initially, but after a period of time it 
fails to parse some DAG files, which causes those DAGs to be disabled and no 
longer scheduled. I'm having trouble debugging the issue and would appreciate 
any guidance.

In addition, the time it takes to process these files is unstable. The same DAG 
is sometimes processed within 10s and other times takes 40s, which is odd 
because the processor has been given far more resources than it is currently 
using.
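One way to narrow down the variable processing time is to measure raw file read latency separately from parsing. The following is a minimal sketch, assuming the DAG files are read from a network mount (such as the EFS volume listed under Configuration); the function names and the repeat count are illustrative and not part of Airflow.

```python
import statistics
import sys
import time
from pathlib import Path


def read_latencies(path, repeats=5):
    """Read one file `repeats` times and return the elapsed seconds per read."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        Path(path).read_bytes()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (min, median, max) of a list of timings."""
    return min(samples), statistics.median(samples), max(samples)


# Usage (hypothetical script name): python read_probe.py /path/to/dags
if __name__ == "__main__" and len(sys.argv) > 1:
    for file in sorted(Path(sys.argv[1]).rglob("*.py")):
        low, mid, high = summarize(read_latencies(file))
        print(f"{file}: min={low:.4f}s median={mid:.4f}s max={high:.4f}s")
```

Repeated reads may be served from the OS page cache, so the first sample per file is likely the most representative of the storage itself. A large gap between min and max would point at storage rather than CPU or memory.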

## Configuration
**Python**: 3.11
**Airflow**: 3.0.2
**Executor**: KubernetesExecutor
**Platform**: AWS EKS
**Storage**: EFS

## DAG Processing Stats
<img width="620" height="118" alt="image" 
src="https://github.com/user-attachments/assets/eb313a56-1dc8-4d36-af8e-c4b242efaf95"
 />

I see errors associated with a particular DAG file path, but I don't know how 
to inspect them (I don't know whether there is a command I can run to list the 
errors associated with a DAG bundle or file). My assumption was that the 
processor either lacks resources or is timing out, but based on the metrics and 
settings below, neither should be possible.
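As a rough way to see which file fails and how long each one takes, the files can be imported directly, outside the DAG processor. The following is a minimal sketch, assuming it runs in the same Python environment and against the same mount as the processor; `time_imports` is a hypothetical helper, and a plain import is only an approximation of what the DAG processor does.

```python
import importlib.util
import sys
import time
from pathlib import Path


def time_imports(dag_dir):
    """Import every .py file under dag_dir; return (path, seconds, error) tuples."""
    results = []
    for path in sorted(Path(dag_dir).rglob("*.py")):
        start = time.perf_counter()
        error = None
        try:
            # Load under a throwaway module name so files do not shadow each other.
            spec = importlib.util.spec_from_file_location(f"_probe_{path.stem}", path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
        except Exception as exc:
            # Keep a one-line summary; error is None when the import succeeds.
            error = f"{type(exc).__name__}: {exc}"
        results.append((str(path), time.perf_counter() - start, error))
    return results


# Usage (hypothetical script name): python import_probe.py /path/to/dags
if __name__ == "__main__" and len(sys.argv) > 1:
    for path, seconds, error in time_imports(sys.argv[1]):
        print(f"{seconds:8.2f}s  {path}  {error or 'ok'}")
```

Running this a few times from inside the DAG processor pod would show whether the failures and slow parses are reproducible per file or intermittent. The Airflow CLI also provides `airflow dags list-import-errors`, which may show the errors the processor itself recorded.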

## DAG Processor Metrics
**CPU usage is very low for the DAG processor (it was allocated 4 CPUs)**
<img width="1090" height="230" alt="image" 
src="https://github.com/user-attachments/assets/986a1cd2-f44c-44e5-b98c-a0383996217d"
 />

**Memory usage is also very low (it was allocated 4 GB of RAM)**
<img width="1086" height="227" alt="image" 
src="https://github.com/user-attachments/assets/7e869548-c446-45fa-8bc2-32b0c7d6c512"
 />


## Airflow Settings
The timeout settings and refresh intervals are already quite high.
```
[dag_processor]
bundle_refresh_check_interval = 240
dag_file_processor_timeout = 1800
disable_bundle_versioning = true
min_file_process_interval = 600
parsing_processes = 3
print_stats_interval = 300
refresh_interval = 600
stale_bundle_cleanup_interval = 3600
stale_bundle_cleanup_min_versions = 1
stale_dag_threshold = 1800

[scheduler]
dag_stale_not_seen_duration = 3600
parsing_cleanup_interval = 600
run_duration = 41460
standalone_dag_processor = True
statsd_host = airflow-statsd
statsd_on = True
statsd_port = 9125
statsd_prefix = airflow
```

GitHub link: https://github.com/apache/airflow/discussions/54274
