[ 
https://issues.apache.org/jira/browse/AIRFLOW-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Fernandez updated AIRFLOW-2442:
-----------------------------------------
    Description: 
*Summary*

The "airflow run" command creates a connection to the database and leaves it 
open (until it is killed by SQLAlchemy later). The number of these connections 
can skyrocket whenever hundreds or thousands of tasks are launched 
simultaneously, and can potentially hit the database connection limit.

The problem is that in cli.py, the run() method first calls 
{code:python}
settings.configure_orm(disable_connection_pool=True){code}
which correctly selects a NullPool, but it then parses any custom configs and 
calls
{code:python}
settings.configure_orm(){code}
again, which overrides the desired behavior and uses a QueuePool instead.
The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE and 
SQL_ALCHEMY_POOL_RECYCLE. This means that while the task is running and the 
executor is sending heartbeats, the connection stays open and idle until it is 
killed by SQLAlchemy.
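
The override can be illustrated with a small toy model. This is only a sketch under stated assumptions: it imports neither Airflow nor SQLAlchemy, the configure_orm() below is a hypothetical stand-in that just records which pool class would be chosen, and run_fixed() shows one possible remedy rather than the contents of the attached patch.

```python
# Toy model of the call order described above. The names mirror the ones in
# this report, but every function here is a hypothetical stand-in.

_pool_class = None  # stands in for the pool class chosen for the engine


def configure_orm(disable_connection_pool=False):
    """Hypothetical stand-in for settings.configure_orm()."""
    global _pool_class
    # The last call wins: no earlier choice is remembered.
    _pool_class = "NullPool" if disable_connection_pool else "QueuePool"
    return _pool_class


def run_buggy():
    """Mirrors the sequence described in the report."""
    configure_orm(disable_connection_pool=True)  # NullPool, as intended
    # ... custom configs would be parsed here ...
    return configure_orm()  # default argument switches back to QueuePool


def run_fixed():
    """One possible remedy (an assumption): repeat the flag on the second call."""
    configure_orm(disable_connection_pool=True)
    # ... custom configs would be parsed here ...
    return configure_orm(disable_connection_pool=True)


if __name__ == "__main__":
    print("buggy:", run_buggy())
    print("fixed:", run_fixed())
```

In this model the second call in run_buggy() discards the earlier choice because the flag defaults to False, which is the same shape as the problem reported here.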

This fixes a bug introduced by 
[https://github.com/apache/incubator-airflow/pull/1934] in 
[https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344],
which is present in branches 1-8-stable, 1-9-stable, and 1-10-test.

NOTE: I will create a PR once I've done more testing, since I'm on an older 
branch. For now, I'm attaching a patch file [^AIRFLOW-2442.patch].

  was:
*Summary*

The "airflow run" command creates a connection to the database and leaves it 
open (until it is killed by SQLAlchemy later). The number of these connections 
can skyrocket whenever hundreds or thousands of tasks are launched 
simultaneously, and can potentially hit the database connection limit.

The problem is that in cli.py, the run() method first calls 
{code:python}
settings.configure_orm(disable_connection_pool=True){code}
which correctly selects a NullPool, but it then parses any custom configs and 
calls
{code:python}
settings.configure_orm(){code}
again, which overrides the desired behavior and uses a QueuePool instead.
The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE and 
SQL_ALCHEMY_POOL_RECYCLE. This means that while the task is running and the 
executor is sending heartbeats, the connection stays open and idle until it is 
killed by SQLAlchemy.

This fixes a bug introduced by 
[https://github.com/apache/incubator-airflow/pull/1934] in 
[https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344],
which is present in branches 1-8-stable, 1-9-stable, and 1-10-test.


> Airflow run command leaves database connections open, which can hit the 
> database limit
> --------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2442
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2442
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: Airflow 1.8, 1.8.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>            Priority: Major
>             Fix For: Airflow 2.0
>
>         Attachments: connection_duration_1_hour.png, db_connections.png, 
> fixed_before_and_after.jpg, monthly_db_connections.png, running_tasks.png
>
>
> *Summary*
> The "airflow run" command creates a connection to the database and leaves it 
> open (until it is killed by SQLAlchemy later). The number of these 
> connections can skyrocket whenever hundreds or thousands of tasks are 
> launched simultaneously, and can potentially hit the database connection limit.
> The problem is that in cli.py, the run() method first calls 
> {code:python}
> settings.configure_orm(disable_connection_pool=True){code}
> which correctly selects a NullPool, but it then parses any custom configs and 
> calls
> {code:python}
> settings.configure_orm(){code}
> again, which overrides the desired behavior and uses a QueuePool instead.
> The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE and 
> SQL_ALCHEMY_POOL_RECYCLE. This means that while the task is running and the 
> executor is sending heartbeats, the connection stays open and idle until it 
> is killed by SQLAlchemy.
> This fixes a bug introduced by 
> [https://github.com/apache/incubator-airflow/pull/1934] in 
> [https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344],
> which is present in branches 1-8-stable, 1-9-stable, and 1-10-test.
> NOTE: I will create a PR once I've done more testing, since I'm on an older 
> branch. For now, I'm attaching a patch file [^AIRFLOW-2442.patch].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
