Tatsuo,
I modified the source files for pgpool-II 2.3.2.2.
I have attached the patch (diff).
Regards,
Rumman
________________________________________
From: Tatsuo Ishii [[email protected]]
Sent: Tuesday, February 23, 2010 9:14 AM
To: Brian Maguire
Cc: [email protected]; Gazi Rahman; Ahmed Iftekhar; Arthur Vossberg;
[email protected]
Subject: Re: Pgpool II 2.3.1 Patch Contribution
[Cced to pgpool-hackers]
Brian,
Thanks for your contribution. I will look into the code. I would also
like to hear the pgpool hackers' opinions, if any.
If possible, it would be great if you could generate patches (diff)
against CVS HEAD. Also you should change pool_config.l, rather than
pool_config.c. Another request is, please provide documentation. You
can modify doc/pgpool-en.html.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
> Tatsuo,
>
> I would like to contribute a patch that adds a configuration option for
> killing all child processes rather than waiting for them to disconnect.
> The rationale and the code change are described below and attached.
> Please let me know whether this can be contributed or, even better,
> whether the problem can be avoided altogether some other way.
>
> Cheers,
>
> Brian
>
> We changed the source files of Pgpool II 2.3.1 for the following two purposes:
> 1. To avoid waiting for all sessions to disconnect at the 2nd stage
> of pgpool recovery.
> 2. To avoid disconnecting active sessions on the primary node when the
> secondary node stops communicating with the primary pgpool.
>
> 1. To avoid waiting for all sessions to disconnect at the 2nd stage of
> pgpool recovery
>
> With the default implementation in Pgpool II 2.3.1, pgpool waits at the 2nd
> stage of recovery for all sessions to disconnect. In a 24x7 production
> database it is difficult to guarantee, without stopping the application,
> that no connection to the database exists. If a session is still connected,
> the recovery process waits for it to disconnect, and if that does not happen
> within a specific period, the recovery fails.
> To avoid this recovery failure we added a parameter,
> "kill_all_child_process_before_2nd_stage".
>
> if kill_all_child_process_before_2nd_stage = true/1 then
>     kill all child processes before the start of the 2nd stage, so that
>     there is no need to wait for active sessions to disconnect
> else
>     wait for all active sessions to disconnect; this is the same as
>     Pgpool II 2.3.1
> end if
>
> The default value of kill_all_child_process_before_2nd_stage is false/0.
>
> The following 3 files have been modified:
>
> pool.h
> pool_config.c
> recovery.c
>
>
>
> File : pool.h //added the parameter
>
>
>
> typedef struct
> {
> ....
> int kill_all_child_process_before_2nd_stage;
> } POOL_CONFIG;
>
>
> File: pool_config.c //read the pgpool.conf value set by the pgpool
> administrator
>
>
>
> int pool_init_config()
> {
> ...
> pool_config->kill_all_child_process_before_2nd_stage = 0;
>
> } //int pool_init_config()
>
>
> int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> {
> ....
> //kill_all_child_process_before_2nd_stage
> else if (!strcmp(key, "kill_all_child_process_before_2nd_stage") &&
> CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
> {
> int v = eval_logical(yytext);
>
> if (v < 0)
> {
> pool_error("pool_config: invalid value %s for %s", yytext,
> key);
> return(-1);
> }
> pool_config->kill_all_child_process_before_2nd_stage = v;
> } //kill_all_child_process_before_2nd_stage
> ....
> }//int pool_get_config
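
To make the parsing step above concrete, here is a minimal sketch of how the two boolean options could be read and defaulted. It is not the pgpool code: parse_logical() is a hypothetical stand-in for eval_logical(), the accepted spellings are an assumption, and struct sketch_config and sketch_set_option() exist only for this illustration.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for pgpool's eval_logical(): returns 1 for a true
 * value, 0 for a false value, and -1 for anything else. The accepted
 * spellings are an assumption made for this sketch. */
static int parse_logical(const char *s)
{
    if (!strcasecmp(s, "on") || !strcasecmp(s, "true") || !strcmp(s, "1"))
        return 1;
    if (!strcasecmp(s, "off") || !strcasecmp(s, "false") || !strcmp(s, "0"))
        return 0;
    return -1;
}

/* Illustration-only config struct holding just the two new options. */
struct sketch_config
{
    int kill_all_child_process_before_2nd_stage;
    int restart_all_process_for_same_master_node_in_failover;
};

/* Defaults as described in the mail: false/0 and true/1. */
static void sketch_init_config(struct sketch_config *c)
{
    c->kill_all_child_process_before_2nd_stage = 0;
    c->restart_all_process_for_same_master_node_in_failover = 1;
}

/* Returns 0 on success, -1 for an unknown key or an invalid value.
 * On failure the stored value is left unchanged. */
static int sketch_set_option(struct sketch_config *c,
                             const char *key, const char *value)
{
    int v = parse_logical(value);

    if (v < 0)
        return -1;

    if (!strcmp(key, "kill_all_child_process_before_2nd_stage"))
        c->kill_all_child_process_before_2nd_stage = v;
    else if (!strcmp(key, "restart_all_process_for_same_master_node_in_failover"))
        c->restart_all_process_for_same_master_node_in_failover = v;
    else
        return -1;

    return 0;
}
```

As in the patch, an invalid value is rejected before anything is assigned, so a bad line in pgpool.conf cannot silently flip a setting.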
>
>
> File: recovery.c //added the parameterized functionality in the block
> "modified here"
>
>
>
> int start_recovery(int recovery_node)
> {
>     ...
>     pool_log("1st stage is done");
>
>     /* --------------- modified here ------------------------- */
>     if (pool_config->kill_all_child_process_before_2nd_stage == 1)
>     { /* kill all children processes */
>
>         pool_log("Disconnection of all sessions before 2nd stage started");
>         pool_debug("pool_config =%d",pool_config->num_init_children);
>         int i;
>         for (i = 0; i < pool_config->num_init_children; i++)
>         {
>             pid_t pid = pids[i].pid;
>             if (pid)
>             {
>                 kill(pid, SIGQUIT);
>                 pool_debug("Disconnect session before 2nd stage: kill %d", pid);
>             }//if (pids[i].pid)
>         }//for (i = 0; i < pool_config->num_init_children; i++)
>         pool_log("Killed all active sessions");
>     }// if (pool_config->kill_all_child_process_before_2nd_stage == 1)
>
>     /* ------------ end of modification ------------------------- */
>     pool_log("starting 2nd stage");
>
>     /* 2nd stage */
>     *InRecovery = 1;
>     /* -------- modified here ---------------- */
>     pool_log("---->Waiting for all connection closed <---");
>     /* --------- end of modification ---------- */
>     if (wait_connection_closed() != 0)
>     {
>         PQfinish(conn);
>         pool_error("start_recovery: timeover for waiting connection closed");
>         return 1;
>     }
>     pool_log("all connections from clients have been closed");
>
>     ...
> } //int start_recovery
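
The kill loop in start_recovery() can be sketched in isolation as follows. This is only an illustration under assumptions: signal_children() is a hypothetical helper, and the plain pid_t array stands in for pgpool's pids[] table. Unlike the patch, the sketch checks the return value of kill() and reports how many processes were actually signaled.

```c
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: send `sig` to every non-empty slot of a pid table.
 * A slot holding 0 is treated as unused and skipped, mirroring the
 * `if (pid)` test in the patch. Returns the number of processes for which
 * kill() succeeded. */
static int signal_children(const pid_t *pid_table, int n, int sig)
{
    int i;
    int signaled = 0;

    for (i = 0; i < n; i++)
    {
        if (pid_table[i] == 0)
            continue;               /* unused slot */

        if (kill(pid_table[i], sig) == 0)
            signaled++;             /* process existed and was signaled */
    }
    return signaled;
}
```

With sig set to SIGQUIT this corresponds to the loop above. Passing 0 as the signal makes kill() only check that each process exists, which is a safe way to try the helper out.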
>
> 2. To avoid disconnecting active sessions on the primary node when the
> secondary node stops communicating with the primary pgpool
>
> With the default implementation in Pgpool II 2.3.1, if either node goes down
> pgpool kills all child processes to stop TCP/IP retrying. This can cause a
> problem in our environment.
> Consider a situation where the secondary node is somehow disconnected and
> stops communicating with the primary node. With the default implementation,
> pgpool kills all active calls and other activities of our application.
> This is not desirable: activity on the primary node must not be disrupted
> when the secondary node stops communicating.
> To avoid this, we added a parameter,
> "restart_all_process_for_same_master_node_in_failover".
>
> if restart_all_process_for_same_master_node_in_failover = true/1 then
>     restart all child processes; this is the same as the default in
>     Pgpool II 2.3.1
> else
>     do not restart the child processes when the same master node is selected
> end if
>
> The default value of restart_all_process_for_same_master_node_in_failover is
> true/1.
>
> In effect, we have only re-enabled the old behavior of not restarting child
> processes when the same master node is selected, and made it controllable
> through the parameter restart_all_process_for_same_master_node_in_failover.
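
The decision described above reduces to a single condition. A minimal sketch, assuming a hypothetical helper must_restart_children() whose arguments stand in for the config value, Req_info->master_node_id, new_master and *InRecovery:

```c
/* Hypothetical helper: returns 1 if failover should kill and restart all
 * child processes, 0 if the existing children may be kept.
 *
 * Children are kept only when all three conditions hold:
 *   - the restart option is switched off,
 *   - the newly selected master is the same node as before,
 *   - no online recovery is in progress. */
static int must_restart_children(int restart_all,
                                 int old_master,
                                 int new_master,
                                 int in_recovery)
{
    if (restart_all == 0 && old_master == new_master && in_recovery == 0)
        return 0;

    return 1;
}
```

With the default of true/1 the helper always returns 1, which matches the stock 2.3.1 behavior of restarting every child on any failover.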
>
>
>
> The following 3 files have been changed:
> pool.h
> pool_config.c
> main.c
>
>
>
> File: pool.h //added the parameter
>
>
>
> typedef struct
> {
> ....
> int restart_all_process_for_same_master_node_in_failover;
> } POOL_CONFIG;
>
>
> File: pool_config.c //read the pgpool.conf value set by the pgpool
> administrator
>
>
>
> int pool_init_config()
> {
> ...
> pool_config->restart_all_process_for_same_master_node_in_failover = 1;
> } //int pool_init_config()
>
> int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
> {
> ....
> else if (!strcmp(key, "restart_all_process_for_same_master_node_in_failover")
> &&
> CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
> {
> int v = eval_logical(yytext);
>
> if (v < 0)
> {
> pool_error("pool_config: invalid value %s for %s", yytext,
> key);
> fclose(fd);
> return(-1);
> }
> pool_debug("restart_all_process_for_same_master_node_in_failover: %d", v);
> pool_config->restart_all_process_for_same_master_node_in_failover = v;
> }
> ....
>
> } //int pool_get_config(char *confpath, POOL_CONFIG_CONTEXT context)
>
>
>
>
> File: main.c // added the parameterized functionality; the block "modified
> here" indicates our modification
>
>
>
> ...
> static void failover(void)
> {
>
> ...
> if (new_master == pool_config->backend_desc->num_backends)
> {
> pool_error("failover_handler: no valid DB node found");
> }
>
> /*
> * Before we tried to minimize restarting pgpool to protect existing
> * connections from clients to pgpool children. What we did here was,
> * if children other than master went down, we did not fail over.
> * This is wrong. Think about following scenario. If someone
> * accidentally plugs out the network cable, the TCP/IP stack keeps
> * retrying for long time (typically 2 hours). The only way to stop
> * the retry is restarting the process. Bottom line is, we need to
> * restart all children in any case. See pgpool-general list posting
> * "TCP connections are *not* closed when a backend timeout" on Jul 13
> * 2008 for more details.
> * It has been parameterized, as people may want different behavior
> * depending on their system.
> */
>
> /* ----------- modified here ------------------------ */
> //#ifdef NOT_USED //
> else
> {
> if (pool_config->restart_all_process_for_same_master_node_in_failover == 0 &&
> Req_info->master_node_id == new_master && *InRecovery == 0)
> {
> pool_log("failover_handler: do not restart pgpool. same master node %d was selected", new_master);
> if (Req_info->kind == NODE_UP_REQUEST)
> {
> pool_log("failback done. reconnect host %s(%d)",
> BACKEND_INFO(node_id).backend_hostname,
> BACKEND_INFO(node_id).backend_port);
> }
> else
> {
> pool_log("failover done. shutdown host %s(%d)",
> BACKEND_INFO(node_id).backend_hostname,
> BACKEND_INFO(node_id).backend_port);
> }
>
> /* exec failover_command */
> for (i = 0; i < pool_config->backend_desc->num_backends; i++)
> {
> if (nodes[i])
> trigger_failover_command(i,
> pool_config->failover_command);
> }
>
> pool_semaphore_unlock(REQUEST_INFO_SEM);
> switching = 0;
> kill(pcp_pid, SIGUSR2);
> switching = 0;
> return;
> }
> }
> //#endif //
> /* ----------- end of modification ------------------------ */
>
> /* kill all children */
> for (i = 0; i < pool_config->num_init_children; i++)
> {
> pid_t pid = pids[i].pid;
> if (pid)
> {
> kill(pid, SIGQUIT);
> pool_debug("failover_handler: kill %d", pid);
> }
> }
>
> ...
> } // end of failover()
> ...

diff -aur pgpool-II-2.3.2.2/doc/pgpool-en.html pgpool-II-2.3.2.2.modified/doc/pgpool-en.html
--- pgpool-II-2.3.2.2/doc/pgpool-en.html 2010-02-02 06:30:58.000000000 +0600
+++ pgpool-II-2.3.2.2.modified/doc/pgpool-en.html 2010-03-10 17:17:36.000000000 +0600
@@ -1035,6 +1035,22 @@
client_idle_limit_in_recovery is 0, which means the functionality is turned
off. You need to reload pgpool.conf if you change
client_idle_limit_in_recovery.</p>
+
+ <dt>kill_all_child_process_before_2nd_stage
+ <dd>
+ <p>This parameter takes effect during online recovery. If set to true, pgpool kills all sessions connected to it, whether active or idle,
+ before the 2nd stage of online recovery, and so avoids waiting for the connections to close. If set to false,
+ this operation is omitted. You need to reload pgpool.conf if you change the value of this parameter.</p>
+ <p>Default value is false.
+ </p>
+
+ <dt>restart_all_process_for_same_master_node_in_failover
+ <dd>
+ <p>This parameter takes effect during failover. If set to true, pgpool kills all connections when any node goes down. If set to false,
+ pgpool does not kill connections when a node other than the master node goes down. You need to reload pgpool.conf
+ if you change the value of this parameter.</p>
+ <p>Default value is true.
+ </p>
<dt>lobj_lock_table
<dd>
diff -aur pgpool-II-2.3.2.2/main.c pgpool-II-2.3.2.2.modified/main.c
--- pgpool-II-2.3.2.2/main.c 2010-02-13 17:23:55.000000000 +0600
+++ pgpool-II-2.3.2.2.modified/main.c 2010-03-10 17:17:08.000000000 +0600
@@ -1383,10 +1383,10 @@
* 2008 for more details.
*/
-#ifdef NOT_USED
+//#ifdef NOT_USED
else
{
- if (Req_info->master_node_id == new_master && *InRecovery == 0)
+ if ( pool_config->restart_all_process_for_same_master_node_in_failover==0 && Req_info->master_node_id == new_master && *InRecovery == 0)
{
pool_log("failover_handler: do not restart pgpool. same master node %d was selected", new_master);
if (Req_info->kind == NODE_UP_REQUEST)
@@ -1416,7 +1416,7 @@
return;
}
}
-#endif
+//#endif
/* kill all children */
for (i = 0; i < pool_config->num_init_children; i++)
{
diff -aur pgpool-II-2.3.2.2/pool_config.l pgpool-II-2.3.2.2.modified/pool_config.l
--- pgpool-II-2.3.2.2/pool_config.l 2010-01-31 08:22:24.000000000 +0600
+++ pgpool-II-2.3.2.2.modified/pool_config.l 2010-03-10 17:17:09.000000000 +0600
@@ -182,7 +182,10 @@
pool_config->ssl_key = "";
pool_config->ssl_ca_cert = "";
pool_config->ssl_ca_cert_dir = "";
-
+
+ pool_config->kill_all_child_process_before_2nd_stage = 0;
+ pool_config->restart_all_process_for_same_master_node_in_failover = 1;
+
res = gethostname(localhostname,sizeof(localhostname));
if(res !=0 )
{
@@ -1185,7 +1188,8 @@
(context == RELOAD_CONFIG && (status == CON_UNUSED || status == CON_DOWN)))
strncpy(BACKEND_INFO(slot).backend_data_directory, str, MAX_PATH_LENGTH);
}
- else if (!strcmp(key, "log_statement") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
+
+ else if (!strcmp(key, "log_per_node_statement") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
{
int v = eval_logical(yytext);
@@ -1194,9 +1198,10 @@
pool_error("pool_config: invalid value %s for %s", yytext, key);
return(-1);
}
- pool_config->log_statement = v;
+ pool_config->log_per_node_statement = v;
}
- else if (!strcmp(key, "log_per_node_statement") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
+
+ else if (!strcmp(key, "log_statement") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
{
int v = eval_logical(yytext);
@@ -1205,9 +1210,10 @@
pool_error("pool_config: invalid value %s for %s", yytext, key);
return(-1);
}
- pool_config->log_per_node_statement = v;
+ pool_config->log_statement = v;
}
- else if (!strcmp(key, "log_statement") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
+
+ else if (!strcmp(key, "kill_all_child_process_before_2nd_stage") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
{
int v = eval_logical(yytext);
@@ -1216,8 +1222,21 @@
pool_error("pool_config: invalid value %s for %s", yytext, key);
return(-1);
}
- pool_config->log_statement = v;
- }
+ pool_config->kill_all_child_process_before_2nd_stage = v;
+ } //kill_all_child_process_before_2nd_stage
+
+ else if (!strcmp(key, "restart_all_process_for_same_master_node_in_failover") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
+ {
+ int v = eval_logical(yytext);
+
+ if (v < 0)
+ {
+ pool_error("pool_config: invalid value %s for %s", yytext, key);
+ return(-1);
+ }
+ pool_config->restart_all_process_for_same_master_node_in_failover = v;
+ } //restart_all_process_for_same_master_node_in_failover
+
else if (!strcmp(key, "lobj_lock_table") && CHECK_CONTEXT(INIT_CONFIG|RELOAD_CONFIG, context))
{
diff -aur pgpool-II-2.3.2.2/pool.h pgpool-II-2.3.2.2.modified/pool.h
--- pgpool-II-2.3.2.2/pool.h 2010-02-03 14:11:42.000000000 +0600
+++ pgpool-II-2.3.2.2.modified/pool.h 2010-03-10 17:17:10.000000000 +0600
@@ -223,6 +223,9 @@
char *ssl_key; /* path to ssl key (frontend only) */
char *ssl_ca_cert; /* path to root (CA) certificate */
char *ssl_ca_cert_dir; /* path to directory containing CA certificates */
+
+ int kill_all_child_process_before_2nd_stage;
+ int restart_all_process_for_same_master_node_in_failover;
} POOL_CONFIG;
#define MAX_PASSWORD_SIZE 1024
diff -aur pgpool-II-2.3.2.2/recovery.c pgpool-II-2.3.2.2.modified/recovery.c
--- pgpool-II-2.3.2.2/recovery.c 2009-12-23 15:30:43.000000000 +0600
+++ pgpool-II-2.3.2.2.modified/recovery.c 2010-03-10 17:17:12.000000000 +0600
@@ -90,11 +90,30 @@
}
pool_log("1st stage is done");
-
+
+ if (pool_config->kill_all_child_process_before_2nd_stage == 1)
+ { /* kill all children processes */
+
+ pool_log("Disconnection of all sessions before 2nd stage started");
+ pool_debug("pool_config =%d",pool_config->num_init_children);
+ int i;
+ for (i = 0; i < pool_config->num_init_children; i++)
+ {
+ pid_t pid = pids[i].pid;
+ if (pid)
+ {
+ kill(pid, SIGQUIT);
+ pool_debug("Disconnect session before 2nd stage: kill %d", pid);
+ }//if (pids[i].pid)
+ }//for (i = 0; i < pool_config->num_init_children; i++)
+ pool_log("Killed all active sessions");
+ }// if (pool_config->kill_all_child_process_before_2nd_stage == 1)
+
pool_log("starting 2nd stage");
/* 2nd stage */
*InRecovery = 1;
+ pool_log("---->Waiting for all connection closed <---");
if (wait_connection_closed() != 0)
{
PQfinish(conn);
_______________________________________________
Pgpool-hackers mailing list
[email protected]
http://pgfoundry.org/mailman/listinfo/pgpool-hackers