The following bug has been logged on the website: Bug reference: 8192 Logged by: Federico Campoli Email address: feder...@brandwatch.com PostgreSQL version: 9.2.4 Operating system: Debian 6.0 Description:
/* Description: It seems on very large tables the concurrent update with vacuum (or autovacuum), when the slave is in hot standby mode, generates long loops in read on a single wal segment during the recovery process. This have two nasty effects. A massive read IO peak and the replay lag increasing as the recovery process hangs for long periods on a pointless loop. PostgreSQL version: PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit Steps to reproduce the error: setup an hot standby server. the error occurs with streaming replication enabled and disabled */ SET client_min_messages='debug2'; SET trace_sort='on'; --create a new empty table DROP TABLE IF EXISTS t_vacuum; CREATE table t_vacuum ( i_id_row integer, ts_time timestamp with time zone default now() ) ; --disable the autovacuum, we need to run it manually ALTER TABLE t_vacuum set (autovacuum_enabled='off'); --insert into the table enough data to have multiple file nodes INSERT INTO t_vacuum ( i_id_row ) SELECT * FROM generate_series(1,40000000) ; /* start iotop on the hot standby monitoring the postgres:startup process and eventually the wal receiver for example iotop -p 31293 wait for all the wal files generated during the update to be applied this does not apply on live servers of course but is useful to trap only the IO activity generated by vacuum If the streaming replication is on, this select will tell when the slave has finished. In any case is useful to check the replication lag during the vacuum. SELECT pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), replay_location)) FROM pg_stat_replication ; */ --in a new session and start an huge table update UPDATE t_vacuum set ts_time=now() WHERE i_id_row<20000000; --then vacuum the table VACUUM VERBOSE t_vacuum; --at some point the startup process will stuck recovering one single wal file and --the DISK READ column will show a huge IO for a while. --if you monitor the wal receiver also you will notice stream will continue without problems --sooner or later the startup process will resume the recovery but meanwhile the DISK READ is not zero --the slave will accumulate replication lag -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs