On Thursday, May 30, 2013, wrote:

> The following bug has been logged on the website:
>
> Bug reference: 8192
> Logged by: Federico Campoli
> Email address: feder...@brandwatch.com
> PostgreSQL version: 9.2.4
> Operating system: Debian 6.0
> Description:
>
> /*
>
> Description:
>
> It seems on very large tables the concurrent update with vacuum (or
> autovacuum), when the slave is in hot standby mode, generates long loops
> in read on a single wal segment during the recovery process.
>
> This have two nasty effects.
> A massive read IO peak and the replay lag increasing as the recovery
> process hangs for long periods on a pointless loop.
>

Are you observing a loop, and if so how are you observing it? What is it
that is looping?

> SET client_min_messages='debug2';
> SET trace_sort='on';
>

Are these settings useful? What are they showing you?

>
> --in a new session and start an huge table update
> UPDATE t_vacuum set ts_time=now() WHERE i_id_row<20000000;
>
> --then vacuum the table
> VACUUM VERBOSE t_vacuum;
>

Are you running the update and vacuum concurrently or serially?

>
> --at some point the startup process will stuck recovering one single wal
> file and
> --the DISK READ column will show a huge IO for a while.
>

What is huge?

I don't know if I can reproduce this or not. I certainly get spiky lag,
but I see no reason to think it is anything other than IO congestion,
occurring during stretches of WAL records where compact records describe a
larger amount of work that needs to be done in terms of poorly-cached IO.

Perhaps the kernel's read-ahead mechanism is not working as well as it
theoretically could be. Also the standby isn't using a ring-buffer
strategy, but I see no reason to think it would help were it to do so.

The DISK READ column is not what I would call huge during this, often
10-15 MB/s, because much of the IO is scattered rather than sequential.
The IO wait % on the other hand is maxed out.

It is hard to consider it as a bug that the performance is not as high as
one might wish it to be. Is this behavior a regression from some earlier
version? What if hot-standby is turned off?

Cheers,

Jeff
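
P.S. To put numbers on the replay lag rather than inferring it from the IO columns, something like the query below could be run repeatedly on the standby while the update and vacuum are replaying. It is only a minimal sketch: it assumes streaming replication (with archive-only recovery the receive location is NULL) and uses the 9.2 function names. The column aliases are my own, and I have not checked it against the setup in this report.

```sql
-- Run on the standby; PostgreSQL 9.2 function names.
-- Column aliases are illustrative, not taken from the bug report.
SELECT pg_is_in_recovery()              AS in_recovery,
       pg_last_xlog_receive_location()  AS received_location,
       pg_last_xlog_replay_location()   AS replayed_location,
       -- bytes of WAL received but not yet replayed
       pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location())
                                        AS replay_backlog_bytes,
       -- time since the last replayed commit; this also grows when
       -- the master is simply idle, so read it alongside the backlog
       now() - pg_last_xact_replay_timestamp()
                                        AS time_since_last_replay;
```

If replayed_location stays fixed for a long stretch while replay_backlog_bytes keeps growing, that would at least show where replay is stalled, though not whether the cause is a loop or plain IO congestion.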