Hello,

On PG 9.0.8, we just observed a long-running query executing on a replica (~1 
hour), which was effectively blocking replication. I say effectively, as checks 
on streaming replication appeared as if everything was up-to-date (using SELECT 
pg_current_xlog_location() on the primary and SELECT 
pg_last_xlog_receive_location() on the replica). However, when we checked a 
frequently updated table on the replica, it was ~1 hour behind the primary. 

It has been our experience (and configuration) that long running queries that 
block replication get cancelled after at most 10 minutes. These are the 
relevant settings from our postgresql.conf on the replica:

# - Standby Servers -

hot_standby = on                        # "on" allows queries during recovery
                                        # (change requires restart)
max_standby_archive_delay = 600s        # max delay before canceling queries
                                        # when reading WAL from archive;
                                        # -1 allows indefinite delay
max_standby_streaming_delay = 600s      # max delay before canceling queries
                                        # when reading streaming WAL;
                                        # -1 allows indefinite delay

----

It is worth noting that the query was joining a table from the main schema that 
is not updated often with a table from another schema (that may or may not be 
updated often). However, it appears that replication ground to a halt until we 
terminated the query (which triggered crash recovery).

Are we seeing something abnormal or unexpected here? It caught us by surpriseā€¦

Thank you,

Andrew Hannon 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to