From: Sandeep Thakkar <[email protected]>
Subject: [Pgpool-general] primary server cannot be recovered by online recovery
Date: Tue, 4 Oct 2011 23:09:41 -0700 (PDT)
Message-ID: <[email protected]>

> I use pgpool-II 3.0.3 and configured it in Streaming replication mode. My 
> test cases work fine. 
> But sometimes, once in a few days, I see the following error during online 
> recovery:
> 
> DEBUG: send: tos="R", len=41
> DEBUG: recv: tos="r", len=21, data=AuthenticationOK
> DEBUG: send: tos="D", len=6
> DEBUG: recv: tos="e", len=59, data=primary server cannot be recovered by online recovery.
> DEBUG: command failed. reason=primary server cannot be recovered by online recovery.
> DEBUG: send: tos="X", len=4
> BackendError
> 
> I looked into code and found that this error should appear when the 
> master_slave_sub_mode value in pgpool.conf
> is not set to 'stream'. But my pgpool.conf settings are all fine. I just 
> wanted to know if there could be other reason for this error?
>
> ...
> if (MASTER_SLAVE && !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))
>         msg = "primary server cannot be recovered by online recovery.";
> ........

No. The code says the error message should appear when the
master_slave_sub_mode value in pgpool.conf is *set* to 'stream'.

I think the real cause of the problem is in the condition just before
that code segment:

                if ((!REPLICATION &&
                     !(MASTER_SLAVE &&
                       !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))) ||
                    (MASTER_SLAVE &&
                     !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP) &&
Here ---->           node_id == PRIMARY_NODE_ID))
                {
                        int len;
                        char *msg;

                        if (MASTER_SLAVE && !strcmp(pool_config->master_slave_sub_mode, MODE_STREAMREP))
                                msg = "primary server cannot be recovered by online recovery.";
                        else
                                msg = "recovery request is accepted only in replication mode or stereaming replication mode. ";
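
To make the condition easier to follow, here is a minimal standalone
sketch of the same decision. It is not the pgpool-II source: the
mode_flags struct, the recovery_verdict enum and recovery_check() are
hypothetical names introduced for illustration, standing in for the
REPLICATION and MASTER_SLAVE macros and the MODE_STREAMREP comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pgpool-II's mode macros (assumption). */
typedef struct {
    bool replication;   /* plays the role of REPLICATION */
    bool master_slave;  /* plays the role of MASTER_SLAVE */
    bool stream_rep;    /* master_slave_sub_mode equals MODE_STREAMREP */
} mode_flags;

typedef enum {
    RECOVERY_OK,      /* request would be accepted */
    REJECT_PRIMARY,   /* "primary server cannot be recovered ..." */
    REJECT_MODE       /* "recovery request is accepted only in ..." */
} recovery_verdict;

/* Same boolean structure as the quoted condition, with the repeated
 * MASTER_SLAVE && !strcmp(...) test factored into one variable. */
recovery_verdict recovery_check(const mode_flags *m, int node_id,
                                int primary_node_id)
{
    assert(m != NULL);
    bool streaming = m->master_slave && m->stream_rep;

    if ((!m->replication && !streaming) ||
        (streaming && node_id == primary_node_id))
    {
        /* In streaming replication mode the only way to get here is
         * node_id == primary_node_id. */
        return streaming ? REJECT_PRIMARY : REJECT_MODE;
    }
    return RECOVERY_OK;
}
```

With streaming replication configured correctly, the request is
rejected only when node_id equals whatever pgpool currently believes
the primary node ID to be, which is why that value matters here.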

"PRIMARY_NODE_ID" is a macro:

#define PRIMARY_NODE_ID (Req_info->primary_node_id >=0?\
                         Req_info->primary_node_id:REAL_MASTER_NODE_ID)

Req_info->primary_node_id is data in shared memory. It is set by
calling pgpool_walrecrunning(). The function checks whether the WAL
receiver process of PostgreSQL is running on a node. If it is not
running, the node is assumed to be the primary server. Unfortunately,
there is a logic flaw in this. If something goes wrong (for example,
the network connection between primary and standby is broken), the
WAL receiver goes down. In that case pgpool can misjudge which node
is the primary.
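
The flaw can be illustrated with a small simulation. This is only a
sketch under assumptions: node_state, guess_primary() and
primary_node_id() are hypothetical names, and the scan order (first
node without a running WAL receiver wins) is assumed for the example,
not taken from the pgpool-II source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-node view: the only fact the heuristic looks at. */
typedef struct {
    bool walreceiver_running;  /* what a WAL receiver check would report */
} node_state;

/* Heuristic as described above: a node with no WAL receiver is taken
 * to be the primary. Returns -1 if every node runs a WAL receiver.
 * Scanning from node 0 upward is an assumption of this sketch. */
int guess_primary(const node_state *nodes, int n)
{
    assert(nodes != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (!nodes[i].walreceiver_running)
            return i;
    }
    return -1;
}

/* Same shape as the PRIMARY_NODE_ID macro: use the detected ID when it
 * is non-negative, otherwise fall back to the given master node ID. */
int primary_node_id(int detected, int real_master_node_id)
{
    return detected >= 0 ? detected : real_master_node_id;
}
```

For example, with node 0 as standby and node 1 as primary,
guess_primary() picks node 1 while the standby's WAL receiver is
running. Once that WAL receiver stops, neither node runs one, the
sketch picks node 0, and a recovery request for node 0 would be
refused as if it targeted the primary.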

In summary, the WAL receiver process in your system occasionally goes
down, and this triggers the error described above.

Pgpool-II 3.1 changes the logic for detecting the primary to solve
this problem. So my bet is that upgrading to 3.1 will fix it.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
_______________________________________________
Pgpool-general mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-general
