On Thursday, September 13, 2012 10:57 PM Fujii Masao 
On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kap...@huawei.com> wrote:
> On Wednesday, September 12, 2012 10:15 PM Fujii Masao
> On Wed, Sep 12, 2012 at 8:54 PM,  <amit.kap...@huawei.com> wrote:
>>>> The following bug has been logged on the website:
>>>
>>>> Bug reference:      7534
>>>> Logged by:          Amit Kapila
>>>> Email address:      amit.kap...@huawei.com
>>>> PostgreSQL version: 9.2.0
>>>> Operating system:   Suse 10
>>>> Description:
>>
>>>> 1. Both master and standby machine are connected normally,
>>>> 2. then you use the command: ifconfig ip down; make the network card of
>>>> master and standby down,
>>
>>>> Observation
>>>> master can detect connect abnormal, but the standby can't detect connect
>>>> abnormal and show a connected channel long time.
>
>
>>  I would like to implement such a feature for the walreceiver, but I am unsure whether to use
>>  the same configuration parameter (replication_timeout) for the walreceiver as for the master,
>>  or to introduce a new configuration parameter (receiver_replication_timeout).

>I like the latter. I believe some users want to set the different
>timeout values,
>for example, in the case where the master and standby servers are placed in
>the same room, but cascaded standby is placed in other continent.

Thank you for the suggestion. I have implemented it as you suggested, with a separate timeout
parameter for the walreceiver.
The main changes are:
1. Introduce a new configuration parameter, wal_receiver_replication_timeout, for the
walreceiver (0 disables the check).
2. In WalReceiverMain(), if nothing has been received from the master within
wal_receiver_replication_timeout, exit the walreceiver with an error.
    This is the same as the existing walsender functionality.

As this is a new feature, I am submitting the attached patch to the upcoming CommitFest.

Suggestions/Comments?

With Regards,
Amit Kapila.
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***************
*** 62,67 **** walrcv_connect_type walrcv_connect = NULL;
--- 62,69 ----
  walrcv_receive_type walrcv_receive = NULL;
  walrcv_send_type walrcv_send = NULL;
  walrcv_disconnect_type walrcv_disconnect = NULL;
+ int                   wal_receiver_replication_timeout = 60 * 1000;   /* max time (ms) to wait for
+                                                                          * any message from the master */
  
  #define NAPTIME_PER_CYCLE 100 /* max sleep time between cycles (100ms) */
  
***************
*** 174,179 **** WalReceiverMain(void)
--- 176,184 ----
        /* use volatile pointer to prevent code rearrangement */
        volatile WalRcvData *walrcv = WalRcv;
  
+       TimestampTz last_recv_timestamp;
+       TimestampTz timeout = 0;
+ 
        /*
         * WalRcv should be set up already (if we are a backend, we inherit this
         * by fork() or EXEC_BACKEND mechanism from the postmaster).
***************
*** 282,287 **** WalReceiverMain(void)
--- 287,295 ----
        MemSet(&reply_message, 0, sizeof(reply_message));
        MemSet(&feedback_message, 0, sizeof(feedback_message));
  
+       /* Initialize the last recv timestamp */
+       last_recv_timestamp = GetCurrentTimestamp();
+ 
        /* Loop until end-of-streaming or error */
        for (;;)
        {
***************
*** 316,327 **** WalReceiverMain(void)
--- 324,343 ----
                /* Wait a while for data to arrive */
                if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
                {
+                       /* Something was received from the master, so reset the last receive time */
+                       last_recv_timestamp = GetCurrentTimestamp();
+                       
                        /* Accept the received data, and process it */
                        XLogWalRcvProcessMsg(type, buf, len);
  
                        /* Receive any more data we can without sleeping */
                        while (walrcv_receive(0, &type, &buf, &len))
+                       {
+                               /* Something was received from the master, so reset the last receive time */
+                               last_recv_timestamp = GetCurrentTimestamp();
+                               
                                XLogWalRcvProcessMsg(type, buf, len);
+                       }
  
                        /* Let the master know that we received some data. */
                        XLogWalRcvSendReply();
***************
*** 334,339 **** WalReceiverMain(void)
--- 350,369 ----
                }
                else
                {
+                       /* Check whether the time since the last receive from the master has reached the configured limit.
+                        * No need to check if the timeout is disabled by setting it to 0. */
+                       if (wal_receiver_replication_timeout > 0)
+                       {
+                               timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,
+                                                                     wal_receiver_replication_timeout);
+ 
+                               if (GetCurrentTimestamp() >= timeout)
+                               {
+                                       ereport(ERROR,
+                                               (errmsg("could not receive any message from walsender within the configured timeout period")));
+                               }
+                       }
+               
                        /*
                         * We didn't receive anything new, but send a status update to the
                         * master anyway, to report any progress in applying WAL.
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 2382,2387 **** static struct config_int ConfigureNamesInt[] =
--- 2382,2398 ----
                NULL, NULL, NULL
        },
  
+       {
+               {"wal_receiver_replication_timeout", PGC_SIGHUP, REPLICATION_STANDBY,
+                       gettext_noop("Sets the maximum time to wait for data from the master."),
+                       NULL,
+                       GUC_UNIT_MS
+               },
+               &wal_receiver_replication_timeout,
+               60 * 1000, 0, INT_MAX,
+               NULL, NULL, NULL
+       },
+               
        /* End-of-list marker */
        {
                {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 237,242 ****
--- 237,244 ----
                                        # 0 disables
  #hot_standby_feedback = off           # send info from standby to prevent
                                        # query conflicts
+ #wal_receiver_replication_timeout = 60s       # in milliseconds; 0 disables; time the
+                                       # receiver waits for communication from the master
  
  
  
  #------------------------------------------------------------------------------
*** a/src/include/replication/walreceiver.h
--- b/src/include/replication/walreceiver.h
***************
*** 19,24 ****
--- 19,25 ----
  
  extern int    wal_receiver_status_interval;
  extern bool hot_standby_feedback;
+ extern int wal_receiver_replication_timeout;
  
  /*
   * MAXCONNINFO: maximum size of a connection string.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)