On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote:
On Wed, Sep 12, 2012 at 1:54 PM, <amit.kap...@huawei.com> wrote:
>> The following bug has been logged on the website:
>
>> Bug reference: 7534
>> Logged by: Amit Kapila
>> Email address: amit.kap...@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system: Suse 10
>> Description:
>
>> 1. Both master and standby machine are connected normally,
>> 2. then you use the command: ifconfig ip down; make the network card of
>> master and standby down,
>
>> Observation
>> master can detect connect abnormal, but the standby can't detect connect
>> abnormal and show a connected channel long time.
> The master will detect it quicker, because it will get an error when
> it tries to send something.

> But the standby should detect it either when sending the feedback
> message (what's your wal_receiver_status_interval set to?) or when
> the kernel does (have you configured the tcp keepalive on the slave
> somehow?)

wal_receiver_status_interval is 10s (we have not changed it; this is the
default).

We have also tried using TCP keepalive. It might not be able to detect the
failure, because the receiver is still trying to send the receiver status
anyway.

The send() socket call made from XLogWalRcvSendReply() fails only after it
has been called many times. Internally, send() probably keeps accumulating
data until the socket's internal buffer is full, even though the other side
has not received the data.

Also, in walsender the failure is detected because of the
replication_timeout parameter, not because of a send failure.

So in my opinion, the foolproof solution would be to have a mechanism in
walreceiver similar to replication_timeout in walsender.

> Oh, and what do you actually mean by "long time"?

15-20 mins.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
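
To illustrate the two points above, here is a small Python sketch. It is not
PostgreSQL code. It assumes a local socketpair is a fair stand-in for a
connection whose peer has stopped reading, and the ReceiveTimeout class is a
hypothetical name for the kind of receiver-side timeout proposed here. The
first function shows send() accepting data with no reader until the socket
buffer fills; the class shows a timeout based only on when data was last
received.

```python
import socket


def bytes_accepted_without_reader(chunk_size=4096):
    """Send on a socket whose peer never reads.

    Returns how many bytes send() accepted before the buffer filled up.
    """
    sender, silent_peer = socket.socketpair()
    sender.setblocking(False)  # so a full buffer raises instead of blocking
    total = 0
    try:
        while True:
            try:
                # send() succeeds as long as the kernel can buffer the data,
                # regardless of whether the peer ever reads it.
                total += sender.send(b"x" * chunk_size)
            except BlockingIOError:
                break  # buffer is full; only now does the sender notice
    finally:
        sender.close()
        silent_peer.close()
    return total


class ReceiveTimeout:
    """Hypothetical receiver-side timeout, independent of send() results."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_received = now

    def note_received(self, now):
        # Call whenever any data arrives from the sender.
        self.last_received = now

    def expired(self, now):
        # True once nothing has arrived for timeout_s seconds.
        return now - self.last_received >= self.timeout_s
```

With a check like expired() in the receive loop, the standby would give up
after a fixed interval of silence instead of waiting for send() to fail.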