kaxil commented on issue #65010:
URL: https://github.com/apache/airflow/issues/65010#issuecomment-4227785690

   To help narrow down the root cause, could you capture some PostgreSQL 
diagnostics next time the issue occurs?
   
   **1. Connection state snapshot** (most important -- run when connections are 
high):
   ```sql
   SELECT state, count(*) AS cnt
   FROM pg_stat_activity
   WHERE datname = current_database()
   GROUP BY state
   ORDER BY cnt DESC;
   ```
   
   And the detailed view:
   ```sql
   SELECT state, wait_event_type, wait_event,
          left(query, 100) AS query_prefix,
          age(now(), query_start) AS duration
   FROM pg_stat_activity
   WHERE datname = current_database()
   ORDER BY query_start;
   ```
   
   This tells us whether the stuck connections are `active`, `idle`, or `idle 
in transaction`.
   
   **2. Lock contention** -- are connections waiting on each other?
   ```sql
   SELECT blocked.pid, left(blocked.query, 80) AS blocked_query,
          blocking.pid AS blocking_pid, left(blocking.query, 80) AS 
blocking_query
   FROM pg_stat_activity blocked
   JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
   JOIN pg_locks gl ON gl.locktype = bl.locktype AND gl.relation = bl.relation 
AND gl.granted
   JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
   WHERE blocked.datname = current_database();
   ```
   
   **3. Log table size** -- the `log` table gets an INSERT on every task state 
change (new in 3.2.0). If it's large, INSERTs slow down and hold connections 
longer:
   ```sql
   SELECT relname, n_live_tup, n_dead_tup,
          pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
          last_autovacuum
   FROM pg_stat_user_tables
   WHERE relname = 'log';
   ```
   
   **4. Connection growth trend** -- run every 5 minutes for ~30 minutes to 
capture the accumulation pattern:
   ```sql
   SELECT now(), count(*), state
   FROM pg_stat_activity
   WHERE datname = current_database()
   GROUP BY state;
   ```
   
   **5. PostgreSQL timeout settings**:
   ```sql
   SHOW idle_in_transaction_session_timeout;
   SHOW tcp_keepalives_idle;
   ```
   
   The `state` column from #1 is the single most useful data point. If we see 
growing `idle in transaction` connections, that's a session leak. If we see 
`active` connections piling up on `INSERT INTO log` or `SELECT ... FOR UPDATE`, 
that's lock contention from new 3.2.0 query patterns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to