Hello pg hackers!

Today we had to run a `REINDEX table CONCURRENTLY my_table;` in our production 
database due to considerable index bloat. We used to deal with this problem in 
the past by using pg_repack but we stopped using it because our data 
replication tool doesn’t support “re creating” tables in the way that pg_repack 
does it (we are using Fivetran).

The reindex went fine in the primary database and in one of our standby. The 
other standby that we also operate for some reason ended up in a state where 
all transactions were locked by the WAL process and the WAL process was not 
able to make any progress. In order to solve this issue we had to move traffic 
from the “bad” standby to the healthy one and then kill all transactions that 
were running in the “bad” standby. After that, replication was able to resume 
successfully. 

I’m just trying to understand what could have caused this issue. I was not able 
to identify any queries in the standby that would be locking the WAL process. 
Any insight would be more than welcome!

We are running pg12 in our cluster and standbys are replicating using physical 
replication lots.

Best,
Martín

Reply via email to