kaxil commented on issue #65010:
URL: https://github.com/apache/airflow/issues/65010#issuecomment-4227785690
To help narrow down the root cause, could you capture some PostgreSQL
diagnostics next time the issue occurs?
**1. Connection state snapshot** (most important -- run when connections are
high):
```sql
SELECT state, count(*) AS cnt
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY cnt DESC;
```
And the detailed view:
```sql
SELECT state, wait_event_type, wait_event,
left(query, 100) AS query_prefix,
age(now(), query_start) AS duration
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start;
```
This tells us whether the stuck connections are `active`, `idle`, or `idle
in transaction`.
**2. Lock contention** -- are connections waiting on each other?
```sql
SELECT blocked.pid, left(blocked.query, 80) AS blocked_query,
blocking.pid AS blocking_pid, left(blocking.query, 80) AS
blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype AND gl.relation = bl.relation
AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.datname = current_database();
```
**3. Log table size** -- the `log` table gets an INSERT on every task state
change (new in 3.2.0). If it's large, INSERTs slow down and hold connections
longer:
```sql
SELECT relname, n_live_tup, n_dead_tup,
pg_size_pretty(pg_total_relation_size(oid)) AS total_size,
last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'log';
```
**4. Connection growth trend** -- run every 5 minutes for ~30 minutes to
capture the accumulation pattern:
```sql
SELECT now(), count(*), state
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;
```
**5. PostgreSQL timeout settings**:
```sql
SHOW idle_in_transaction_session_timeout;
SHOW tcp_keepalives_idle;
```
The `state` column from #1 is the single most useful data point. If we see
growing `idle in transaction` connections, that's a session leak. If we see
`active` connections piling up on `INSERT INTO log` or `SELECT ... FOR UPDATE`,
that's lock contention from new 3.2.0 query patterns.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]