Hi there,
Issue:
Would love to get pointers on an issue we have been seeing since we upgraded our Airflow installation from 1.8.0 to 1.10.1. The configuration is the same across both versions, but we now see task failures because the available DB connections get used up. The failures occur mainly when the scheduler tries to build a new DAG. The exceptions we see are (sample stack trace attached):

- psycopg2.OperationalError: FATAL: too many connections for role xxx
- sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: remaining connection slots are reserved for non-replication superuser connections

Info:
Below are the settings that seem relevant to this behavior (our config file is also attached):
--------------------
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 3600
sql_alchemy_reconnect_timeout = 300
parallelism = 32
dag_concurrency = 16
dags_are_paused_at_creation = True
non_pooled_task_slot_count = 128
max_active_runs_per_dag = 16
workers = 4
scheduler_zombie_task_threshold = 300
-----------

Setup:
We use Postgres as the DB backend, and the connection limit for the Airflow user is set to 100. The Airflow components are laid out as follows:

Node 1: Worker (8), webserver, scheduler
Node 2: Worker (8), webserver
Node 3: Worker (8)
Node 4: Worker (8)

We could not find anything in the commits, JIRA, or the dev mailing list that would explain why Airflow 1.10.1 would use more connections than Airflow 1.8.0. The only commit that seemed related, in 1.10.2, is https://github.com/apache/airflow/commit/959dd619d19223db3709fa4abcf52e8ee98bc079. Since we don't know the root cause of this behavior, we are not sure whether upgrading to 1.10.2 will help.

Is there a way to estimate the number of connections that will be used based on the configuration and setup? Or, failing that, to identify the settings that most significantly affect it?

Any help is greatly appreciated.

Regards,
Kiran
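P.S. For what it's worth, here is our own rough back-of-envelope estimate. It assumes (we have not verified this against the Airflow source, so please correct us) that every long-running Airflow process in the layout above keeps its own SQLAlchemy pool of up to sql_alchemy_pool_size connections, and it ignores any pool overflow or short-lived task connections:

```python
# Rough steady-state connection estimate, ASSUMING each long-running
# process holds its own SQLAlchemy pool of sql_alchemy_pool_size
# connections (our guess, not confirmed from the Airflow source).
sql_alchemy_pool_size = 5

# Long-running processes, per our layout:
#   Node 1: 8 worker processes + 1 webserver + 1 scheduler
#   Node 2: 8 worker processes + 1 webserver
#   Node 3: 8 worker processes
#   Node 4: 8 worker processes
processes = (8 + 1 + 1) + (8 + 1) + 8 + 8  # 35 processes

steady_state = processes * sql_alchemy_pool_size
print(steady_state)  # 175 -- already above our 100-connection limit
```

If this model is even approximately right, it would explain the exhaustion on its own, but it clearly did not bite us on 1.8.0, which is why we suspect 1.10.1 changed how pools are created per process.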