Hi,

On Thu, Jul 16, 2009 at 6:00 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> The archive should not normally contain partial XLOG files, only if you
> manually copy one there after primary has crashed. So I don't think
> that's something we need to support.

You are right. And, if the last valid record ends in the middle of the
restored file (e.g. because of an XLOG_SWITCH record), <begin> should
indicate the head of the next file.
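To illustrate, here is a minimal sketch of that calculation. It is not
patch code: I'm simplifying the WAL position to a flat 64-bit byte
offset, and the 16MB constant is just the default segment size; the
function name is made up.

#include <stdint.h>

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024) /* default XLOG segment size */

/*
 * Given the byte position just past the last valid record, return the
 * position <begin> should point at.  After an XLOG_SWITCH the rest of
 * the segment is padding, so round up to the next segment boundary
 * (unless we already sit exactly on one).
 */
static uint64_t
begin_after_xlog_switch(uint64_t end_of_last_record)
{
    if (end_of_last_record % WAL_SEG_SIZE == 0)
        return end_of_last_record;
    return (end_of_last_record / WAL_SEG_SIZE + 1) * WAL_SEG_SIZE;
}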
> Hmm. You only need the timeline history file if the base backup was
> taken in an earlier timeline. That situation would only arise if you
> (manually) take a base backup, restore to a server (which creates a new
> timeline), and then create a slave against that server. At least in the
> 1st phase, I think we can assume that the standby has access to the same
> archive, and will find the history file from there. If not, throw an
> error. We can add more bells and whistles later.

Okay, I'll set the history file problem aside for later consideration.

> As the patch stands, new walsender connections are refused when one is
> active already. What if the walsender connection is in a zombie state?
> For example, it's trying to send WAL to the slave, but the network
> connection is down, and the packets are going to a black hole. It will
> take a while for the TCP layer to declare the connection dead, and close
> the socket. During that time, you can't connect a new slave to the
> master, or the same slave using a better network connection.
>
> The most robust way to fix that is to support multiple walsenders. The
> zombie walsender can take its time to die, while the new walsender
> serves the new connection. You could tweak SO_TIMEOUTs and stuff, but
> even then the standby process could be in some weird hung state.
>
> And of course, when we get around to add support for multiple slaves,
> we'll have to do that anyway. Better get it right to begin with.

Thanks for the detailed description! I was thinking that a new GUC
replication_timeout and some TCP keepalive parameters would be enough to
cope with such trouble (a sketch of the keepalive tuning I had in mind is
in the P.S. below). But I agree that supporting multiple walsenders is
the better solution, so I'll work on that; the P.P.S. has a rough sketch
of the slot layout I'm considering.

> Even in synchronous replication, a backend should only have to wait when
> it commits. You would only see the difference with very large
> transactions that write more WAL than fits in wal_buffers, though, like
> data loading.

That's right.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
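P.S. Here is the sort of keepalive tuning I meant, as a minimal
self-contained sketch. The values are placeholders that would really
come from GUCs, and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are
Linux-specific socket options.

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Tighten TCP keepalive on the walsender socket so that a dead peer is
 * detected after roughly idle + count * interval seconds, instead of
 * the system default (often two hours).
 */
static int
set_walsender_keepalive(int sock)
{
    int on = 1;
    int idle = 60;      /* seconds of idle time before the first probe */
    int interval = 10;  /* seconds between probes */
    int count = 3;      /* unanswered probes before the connection dies */

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
        return -1;
    return setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}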
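P.P.S. And this is roughly how I picture supporting multiple walsenders:
a fixed array of slots in shared memory, so a zombie walsender can linger
in its slot while a new connection claims a free one. All the names and
the slot count here are hypothetical, not from the patch.

#include <stdint.h>
#include <sys/types.h>

#define MAX_WAL_SENDERS 4       /* would presumably become a GUC */

typedef struct
{
    pid_t    pid;       /* walsender process, or 0 if the slot is free */
    uint64_t sentPtr;   /* how far this walsender has streamed WAL */
} WalSenderSlot;

typedef struct
{
    /* lives in shared memory; protected by a spinlock */
    WalSenderSlot slots[MAX_WAL_SENDERS];
} WalSendersData;

/*
 * On connection: instead of refusing when one walsender is already
 * active, scan slots[] for a free entry and claim it; refuse only when
 * all MAX_WAL_SENDERS slots are taken.
 */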