Simon Riggs <si...@2ndquadrant.com> writes:
> OK, here's v4.

I've been trying to stay out of this thread, but with beta2 approaching and no resolution in sight, I'm afraid I have to get involved.

This patch seems to me to be going in fundamentally the wrong direction. It's adding complexity and overhead (far more than is needed), and it's failing utterly to resolve the objections that I raised to start with.

In particular, Simon seems to be basically refusing to do anything about the complaint that the code fails unless master and standby clocks are in close sync. I do not believe that this is acceptable, and since he won't fix it, I guess I'll have to.

The other basic problem that I had with the current code, which this does nothing about, is that it believes that timestamps pulled from WAL archive should be treated on the same footing as timestamps of "live" actions. That might work in certain limited scenarios, but it's going to be a disaster in general.

I believe that the motivation for treating archived timestamps as live is, essentially, to force rapid catchup if a slave falls behind so far that it's reading from archive instead of SR. There are certainly use-cases where that's appropriate (though, again, not all); but even when you do want it, it's a pretty inflexible implementation. For realistic values of max_standby_delay the behavior is going to pretty much be "instant kill whenever we're reading from archive".

I have backed off my original suggestion that we should reduce max_standby_delay to a boolean: that was based on the idea that delays would only occur in response to DDL on the master, but since vacuum and btree page splits can also trigger delays, it seems clear that a "kill queries immediately" policy isn't really very useful in practice. So we need to make max_standby_delay work rather than just get rid of it.

What I think might be a realistic compromise is this:

1. Separate max_standby_delay into two GUCs, say "max_streaming_delay" and "max_archive_delay".

2. When applying WAL that came across SR, use max_streaming_delay and let the time measurement be current time minus time of receipt of the current WAL send chunk.

3. When applying WAL that came from archive, use max_archive_delay and let the time measurement be current time minus time of acquisition of the current WAL segment from the archive.

The current code's behavior in the latter case could effectively be modeled by setting max_archive_delay to zero, but that isn't the only plausible setting. More likely, DBAs would set max_archive_delay to something smaller than max_streaming_delay, but still positive so as not to kill conflicting queries instantly.

An important property of this design is that all relevant timestamps are taken on the slave, so clock skew isn't an issue. (A rough sketch of this rule is included below.)

I'm still inclined to apply the part of Simon's patch that adds a transmit timestamp to each SR send chunk. That would actually be completely unused by the slave given my proposal above, but I think that it is an important step to take to future-proof the SR protocol against possible changes in the slave-side timing logic. I don't however see the value of transmitting "keepalive" records when we'd otherwise not have anything to send. The value of adding timestamps to the SR protocol is really to let the slave determine the current amount of clock skew between it and the master, which is a number that should not change so rapidly that it has to be updated every 100ms even in an idle system. (A sketch of that skew calculation is also included below.)

Comments?

			regards, tom lane
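To make points 2 and 3 concrete, here is a minimal standalone sketch of the proposed delay rule. It is not backend code: the GUC names, the WalSource enum, and standby_delay_exceeded() are placeholders I'm making up for illustration, and the real thing would presumably use TimestampTz rather than time_t. The point it demonstrates is that both reference times are taken on the standby, so master/standby clock skew never enters the calculation.

/*
 * Standalone sketch of the proposed delay rule -- not PostgreSQL source.
 * max_streaming_delay, max_archive_delay, WalSource and
 * standby_delay_exceeded() are all hypothetical names.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef enum { WAL_FROM_STREAM, WAL_FROM_ARCHIVE } WalSource;

/* Hypothetical GUCs, in seconds; -1 means "wait forever". */
static int max_streaming_delay = 30;
static int max_archive_delay = 5;

/*
 * Should a conflicting standby query be cancelled?  reference_time is
 * standby-local: receipt time of the current WAL send chunk (streaming)
 * or restore time of the current WAL segment (archive).
 */
static bool
standby_delay_exceeded(WalSource source, time_t reference_time)
{
	int		limit = (source == WAL_FROM_STREAM) ?
				max_streaming_delay : max_archive_delay;

	if (limit < 0)
		return false;			/* never cancel */
	return difftime(time(NULL), reference_time) > limit;
}

int
main(void)
{
	time_t	got_it = time(NULL) - 42;	/* pretend it arrived 42s ago */

	printf("streaming: cancel = %d\n",
		   standby_delay_exceeded(WAL_FROM_STREAM, got_it));
	printf("archive:   cancel = %d\n",
		   standby_delay_exceeded(WAL_FROM_ARCHIVE, got_it));
	return 0;
}

Setting max_archive_delay to zero in this sketch reproduces the current "instant kill whenever we're reading from archive" behavior, while a small positive value gives the middle ground described above.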
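And a similarly rough sketch of how the slave could turn per-chunk transmit timestamps into a clock-skew estimate. The smoothing factor and names are again just illustrative assumptions; the point is only that the estimate moves slowly, which is why it doesn't need a keepalive every 100ms to stay useful.

/*
 * Illustrative only -- not the SR protocol code.  send_time is the
 * master's transmit timestamp carried in a WAL send chunk; recv_time is
 * the slave's clock at receipt.  Each raw sample understates the skew
 * by the network transit time and jitters with it, so keep a smoothed
 * estimate rather than trusting any single sample.
 */
#include <stdio.h>

static double skew_estimate;	/* seconds; > 0 means master's clock is ahead */
static int    have_estimate;

static void
update_skew_estimate(double send_time, double recv_time)
{
	double	sample = send_time - recv_time;

	if (!have_estimate)
	{
		skew_estimate = sample;
		have_estimate = 1;
	}
	else
		skew_estimate = 0.9 * skew_estimate + 0.1 * sample;
}

int
main(void)
{
	/* master ~2s ahead of the slave, ~10ms of network transit */
	update_skew_estimate(1000.00, 998.01);
	update_skew_estimate(1060.00, 1058.02);
	printf("estimated skew: %.2f s\n", skew_estimate);
	return 0;
}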