Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

Heikki Linnakangas Mon, 01 Oct 2012 03:39:07 -0700

On 21.09.2012 14:18, Amit kapila wrote:

On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila<[email protected]>  wrote:

Approach-2 :
Provide a variable wal_send_status_interval, such that if this is 0, then
the current behavior would prevail and if it's non-zero then a KeepAlive
message would be sent at most that long after the previous one.
The modified code of WALSendLoop will be as follows:


<snip>

Which way do you think is better, or do you have any other idea to handle this?

I think #2 is better because it's more intuitive to a user.


Please find a patch attached for implementation of Approach-2.

Hmm, I think we need to step back a bit. I've never liked the way replication_timeout works, where it's the user's responsibility to set wal_receiver_status_interval < replication_timeout. It's not very user-friendly. I'd rather not copy that same design to this walreceiver timeout. If there's two different timeouts like that, it's even worse, because it's easy to confuse the two.

So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should figure out itself how often to send keepalive messages so that those timeouts are not reached.

In walsender, after half of walsender_timeout has elapsed and we haven't received anything from the client, the walsender process should send a "ping" message to the client. Whenever the client receives a Ping, it replies. The walreceiver does the same; when half of walreceiver_timeout has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip resets the timer in both ends, regardless of which side initiated it, so if e.g. walsender_timeout < walreceiver_timeout, the client will never have to initiate a Ping message, because walsender will always reach the walsender_timeout/2 point first and initiate the heartbeat message.
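
To make the timer logic above concrete, here is a rough sketch in Python (not PostgreSQL code). The Endpoint class, the simulate() function and the zero-latency round trip are all assumptions made for illustration; only the "ping at timeout/2, give up at timeout, any round trip resets both timers" rule comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """One side of the connection (hypothetical model, not PostgreSQL code)."""
    name: str
    timeout: float             # walsender_timeout or walreceiver_timeout, seconds
    last_heard: float = 0.0    # when we last received anything from the peer
    ping_sent: bool = False    # True while one of our Pings is unanswered
    pings_initiated: int = 0

    def action(self, now: float) -> str:
        idle = now - self.last_heard
        if idle >= self.timeout:
            return "disconnect"          # peer silent for the full timeout
        if idle >= self.timeout / 2 and not self.ping_sent:
            return "ping"                # half the timeout elapsed: ask for a reply
        return "wait"


def simulate(sender_timeout, receiver_timeout, duration, step=1.0, link_up=True):
    """Step both endpoints through time; return (who_disconnected, when, sender, receiver)."""
    sender = Endpoint("walsender", sender_timeout)
    receiver = Endpoint("walreceiver", receiver_timeout)
    now = 0.0
    while now <= duration:
        for me, peer in ((sender, receiver), (receiver, sender)):
            act = me.action(now)
            if act == "disconnect":
                return me.name, now, sender, receiver
            if act == "ping":
                me.pings_initiated += 1
                me.ping_sent = True
                if link_up:
                    # Assumed zero-latency round trip: the peer hears the Ping,
                    # replies at once, and both idle timers are reset.
                    peer.last_heard = now
                    peer.ping_sent = False
                    me.last_heard = now
                    me.ping_sent = False
        now += step
    return None, now, sender, receiver
```

With simulate(60, 120, 600) the walsender initiates every heartbeat and the walreceiver's pings_initiated stays at 0, matching the walsender_timeout < walreceiver_timeout case described above. With link_up=False the walsender sends one unanswered Ping at 30 seconds and disconnects at 60.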

The Ping/Pong messages don't necessarily need to be new message types; we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately.
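
As an illustration of the "additional flag" idea, the following Python sketch appends a one-byte reply-requested flag to a keepalive-style message. The byte layout, the 'k' tag and all function names here are hypothetical, chosen only to show the shape of the idea; they are not taken from the patch or from the existing protocol code.

```python
import struct

# Hypothetical layout: 1-byte tag, 8-byte WAL position, 8-byte send time,
# 1-byte "please reply immediately" flag. Network byte order, no padding.
KEEPALIVE_FMT = "!cqqB"
KEEPALIVE_TAG = b"k"


def encode_keepalive(wal_end: int, send_time_us: int, reply_requested: bool) -> bytes:
    """Build a keepalive message; the flag turns it into a Ping."""
    return struct.pack(KEEPALIVE_FMT, KEEPALIVE_TAG, wal_end, send_time_us,
                       1 if reply_requested else 0)


def decode_keepalive(buf: bytes):
    """Return (wal_end, send_time_us, reply_requested) from a keepalive message."""
    tag, wal_end, send_time_us, flag = struct.unpack(KEEPALIVE_FMT, buf)
    if tag != KEEPALIVE_TAG:
        raise ValueError("not a keepalive message")
    return wal_end, send_time_us, bool(flag)


def handle_keepalive(buf: bytes, send_reply) -> bool:
    """Receiver side: reply right away only when the sender asked for it."""
    _, _, reply_requested = decode_keepalive(buf)
    if reply_requested:
        send_reply()   # the Pong; in practice an ordinary status message
    return reply_requested
```

The point of the design is that an ordinary keepalive (flag 0) and a Ping (flag 1) share one message type, so the receiving side needs only one extra branch rather than a new message handler.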


- Heikki

