On Wed, Jun 2, 2010 at 8:14 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Indeed, but nothing we do can prevent that, if the slave is just plain
> slower than the master.  You have to assume that the slave is capable of
> keeping up in the absence of query-caused delays, or you're hosed.

I was assuming the walreceiver only requests more WAL in relatively small chunks, and only when replay has caught up and needs more data. I haven't actually read this code, so if that assumption is wrong then I'm off-base.

But if my assumption is right, then it's not merely the master running faster than the slave that can cause you to fall arbitrarily far behind. The "receive time" is delayed by how long the slave waited on a previous lock, so if we wait for a few minutes, then proceed and find we need more WAL data, we'll read data from the master that it could have generated those few minutes ago.

> The sticky point is that once in a blue moon you do have a conflicting
> query sitting on a buffer lock for a long time, or even more likely a
> series of queries keeping the WAL replay process from obtaining buffer
> cleanup lock.

So I think this isn't necessarily such a blue moon event. As I understand it, all it would take is a single long-running report and a vacuum or HOT cleanup occurring on the master. If I want to set max_standby_delay to 60min to allow reports to run for up to an hour, then any random HOT cleanup on the master will propagate to the slave and cause a WAL stall until the transactions which have that xid in their snapshot end.

Even the buffer locks could potentially be blocked for a long time if you happen to run the right kind of query (say a nested loop with the buffer in question on the outside and a large table on the inside). That's a rarer event though; is that what you were thinking of?

-- 
greg
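
P.S. To make the "receive time" concern concrete, here is a toy Python model of the assumption stated above: the standby pulls the next WAL record only after it has finished replaying the previous one. This is a sketch of that assumption only, not the actual walreceiver logic (which may well behave differently), and the names `simulate_standby` and `Lag` are made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Lag(NamedTuple):
    apparent: float  # replay time minus receive time (what a receive-time clock sees)
    true: float      # replay time minus the time the master generated the record


def simulate_standby(records: List[Tuple[float, float]]) -> List[Lag]:
    """Toy model. Each record is (generated_at, lock_wait), in seconds, in WAL order.

    ASSUMPTION (hypothetical, not taken from the PostgreSQL source): the
    standby fetches a record only once replay of the previous one is done,
    so a record cannot be "received" before the standby is ready for it.
    """
    now = 0.0  # standby's clock: when it finished the previous record
    lags = []
    for generated_at, lock_wait in records:
        # The record is received when it exists AND the standby asks for it.
        received_at = max(now, generated_at)
        # Replay is held up by any conflicting lock on the standby.
        replayed_at = received_at + lock_wait
        lags.append(Lag(replayed_at - received_at, replayed_at - generated_at))
        now = replayed_at
    return lags


if __name__ == "__main__":
    # One record stalls for 3 minutes behind a lock; the next two do not.
    wal = [(0.0, 180.0), (1.0, 0.0), (2.0, 0.0)]
    for lag in simulate_standby(wal):
        print(f"apparent={lag.apparent:.0f}s true={lag.true:.0f}s")
```

In this model the second record shows an apparent delay of 0 seconds even though the standby is 179 seconds behind the master when it replays it, because the record was only fetched after the stall ended. A delay limit measured from receive time would not see that lag at all.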