Hi,

On Thu, Jul 16, 2009 at 6:00 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> The archive should not normally contain partial XLOG files, only if you
> manually copy one there after primary has crashed. So I don't think
> that's something we need to support.

You are right. And, if the last valid record ends in the middle of the
restored file (e.g. because of an XLOG_SWITCH record), <begin> should
indicate the head of the next file.
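To illustrate, here is a minimal sketch of that calculation. It is not
patch code: I'm simplifying the WAL position to a flat 64-bit byte
offset, and the 16MB constant is just the default segment size; the
function name is made up.

#include <stdint.h>

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024) /* default XLOG segment size */

/*
 * Given the byte position just past the last valid record, return the
 * position <begin> should point at.  After an XLOG_SWITCH the rest of
 * the segment is padding, so round up to the next segment boundary
 * (unless we already sit exactly on one).
 */
static uint64_t
begin_after_xlog_switch(uint64_t end_of_last_record)
{
    if (end_of_last_record % WAL_SEG_SIZE == 0)
        return end_of_last_record;
    return (end_of_last_record / WAL_SEG_SIZE + 1) * WAL_SEG_SIZE;
}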
> Hmm. You only need the timeline history file if the base backup was
> taken in an earlier timeline. That situation would only arise if you
> (manually) take a base backup, restore to a server (which creates a new
> timeline), and then create a slave against that server. At least in the
> 1st phase, I think we can assume that the standby has access to the same
> archive, and will find the history file from there. If not, throw an
> error. We can add more bells and whistles later.

Okay, I'll set the history file problem aside for later consideration.

> As the patch stands, new walsender connections are refused when one is
> active already. What if the walsender connection is in a zombie state?
> For example, it's trying to send WAL to the slave, but the network
> connection is down, and the packets are going to a black hole. It will
> take a while for the TCP layer to declare the connection dead, and close
> the socket. During that time, you can't connect a new slave to the
> master, or the same slave using a better network connection.
>
> The most robust way to fix that is to support multiple walsenders. The
> zombie walsender can take its time to die, while the new walsender
> serves the new connection. You could tweak SO_TIMEOUTs and stuff, but
> even then the standby process could be in some weird hung state.
>
> And of course, when we get around to add support for multiple slaves,
> we'll have to do that anyway. Better get it right to begin with.

Thanks for the detailed description! I was thinking that a new GUC
replication_timeout and some TCP keepalive parameters would be enough to
cope with such trouble (a sketch of the keepalive tuning I had in mind is
in the P.S. below). But I agree that supporting multiple walsenders is
the better solution, so I'll work on that; the P.P.S. has a rough sketch
of the slot layout I'm considering.

> Even in synchronous replication, a backend should only have to wait when
> it commits. You would only see the difference with very large
> transactions that write more WAL than fits in wal_buffers, though, like
> data loading.

That's right.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
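P.S. Here is the sort of keepalive tuning I meant, as a minimal
self-contained sketch. The values are placeholders that would really
come from GUCs, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are
Linux-specific socket options.

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Tighten TCP keepalive on the walsender socket so that a dead peer is
 * detected after roughly idle + count * interval seconds, instead of
 * the system default (often two hours).
 */
static int
set_walsender_keepalive(int sock)
{
    int on = 1;
    int idle = 60;      /* seconds of idle time before the first probe */
    int interval = 10;  /* seconds between probes */
    int count = 3;      /* unanswered probes before the connection dies */

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
        return -1;
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}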
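P.P.S. And this is roughly how I picture supporting multiple walsenders:
a fixed array of slots in shared memory, so a zombie walsender can linger
in its slot while a new connection claims a free one. All the names and
the slot count here are hypothetical, not from the patch.

#include <stdint.h>
#include <sys/types.h>

#define MAX_WAL_SENDERS 4       /* would presumably become a GUC */

typedef struct
{
    pid_t    pid;       /* walsender process, or 0 if the slot is free */
    uint64_t sentPtr;   /* how far this walsender has streamed WAL */
} WalSenderSlot;

typedef struct
{
    /* lives in shared memory; protected by a spinlock */
    WalSenderSlot slots[MAX_WAL_SENDERS];
} WalSendersData;

/*
 * On connection: instead of refusing when one walsender is already
 * active, scan slots[] for a free entry and claim it; refuse only when
 * all MAX_WAL_SENDERS slots are taken.
 */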