Re: [Pgpool-general] unexpected EOF on client connection
On Wed, Sep 14, 2011 at 6:00 PM, Tatsuo Ishii wrote:
>> On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii wrote:
>>>>>> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
>>>>>
>>>>> Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:
>>>>>
>>>>> 1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).
>>>>>
>>>>> 2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.
>>>>>
>>>>> For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?
>>>>
>>>> I assume this should be determined by num_init_children (which I've set to 195 in pgpool.conf)? If so, then I currently have 195 processes in either the "wait for connection request" state or the actively connected state.
>>>
>>> No. The pgpool parent process automatically respawns a child process if it dies, so having num_init_children child processes doesn't show anything useful. Record the 195 process IDs and compare them with the current process IDs later. If some of them have changed, we can assume that child processes are dying.
>>
>> Ah, good point. I just diffed the list of PIDs associated with pgpool processes before and after another EOF message in the log, and there were no differences. So I think that rules out any processes dying?
>
> Right.
>
>> One other thing that I just noticed from comparing logs between all of the database servers is that the timestamps for every one of the 'unexpected EOF on client connection' instances are identical. In other words, they are happening at the same time on each server. I think this further suggests that pgpool has to be doing it?
>
> Yes, I think so, unless connection_life_time is set to something other than 0 or the network connection between PostgreSQL and pgpool is unstable.

connection_life_time is currently 0 (since you recommended I change it earlier). I don't have any evidence to suggest that the network connection is unstable. There are no errors of any kind in the ifconfig output.

>
> Let me think about how we can investigate further...

OK, thanks.

___
Pgpool-general mailing list
Pgpool-general@pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general
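For reference, the four directives discussed in this thread (connection_life_time, client_idle_limit, authentication_timeout and child_life_time) are plain name = value lines in pgpool.conf, and the thread treats 0 as "disabled" for each of them. A minimal Python sketch that lists which of them are still non-zero; the helper names are hypothetical, and the parser assumes one simple assignment per line, so it is not a full pgpool.conf parser:

```python
# Hypothetical helper, not part of pgpool: list timeout-style directives in a
# pgpool.conf text that are still set to something other than 0.
import re

# The four directives discussed in this thread; 0 disables each of them.
DIRECTIVES = (
    "connection_life_time",
    "client_idle_limit",
    "authentication_timeout",
    "child_life_time",
)

# Assumes simple "name = value" lines; an optional single-quoted value is unwrapped.
_SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*'?([^'\s]*)'?")


def parse_conf(text):
    """Return {name: value} for each simple assignment, ignoring # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # naive: a '#' inside a quoted value breaks this
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def nonzero_timeouts(text):
    """Return {directive: value} for the DIRECTIVES present and not set to '0'."""
    settings = parse_conf(text)
    return {name: settings[name] for name in DIRECTIVES
            if name in settings and settings[name] != "0"}


if __name__ == "__main__":
    sample = (
        "connection_life_time = 0\n"
        "client_idle_limit = 0\n"
        "authentication_timeout = 0\n"
        "child_life_time = 300   # the value originally reported in this thread\n"
    )
    print(nonzero_timeouts(sample))  # {'child_life_time': '300'}
```

To check a live file, pass the contents of your pgpool.conf to nonzero_timeouts; an empty result means all four are disabled.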
Re: [Pgpool-general] unexpected EOF on client connection
> On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii wrote:
>>>>> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
>>>>
>>>> Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:
>>>>
>>>> 1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).
>>>>
>>>> 2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.
>>>>
>>>> For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?
>>>
>>> I assume this should be determined by num_init_children (which I've set to 195 in pgpool.conf)? If so, then I currently have 195 processes in either the "wait for connection request" state or the actively connected state.
>>
>> No. The pgpool parent process automatically respawns a child process if it dies, so having num_init_children child processes doesn't show anything useful. Record the 195 process IDs and compare them with the current process IDs later. If some of them have changed, we can assume that child processes are dying.
>
> Ah, good point. I just diffed the list of PIDs associated with pgpool processes before and after another EOF message in the log, and there were no differences. So I think that rules out any processes dying?

Right.

> One other thing that I just noticed from comparing logs between all of the database servers is that the timestamps for every one of the 'unexpected EOF on client connection' instances are identical. In other words, they are happening at the same time on each server. I think this further suggests that pgpool has to be doing it?

Yes, I think so, unless connection_life_time is set to something other than 0 or the network connection between PostgreSQL and pgpool is unstable.

Let me think about how we can investigate further...
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [Pgpool-general] unexpected EOF on client connection
On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii wrote:
>>>> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
>>>
>>> Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:
>>>
>>> 1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).
>>>
>>> 2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.
>>>
>>> For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?
>>
>> I assume this should be determined by num_init_children (which I've set to 195 in pgpool.conf)? If so, then I currently have 195 processes in either the "wait for connection request" state or the actively connected state.
>
> No. The pgpool parent process automatically respawns a child process if it dies, so having num_init_children child processes doesn't show anything useful. Record the 195 process IDs and compare them with the current process IDs later. If some of them have changed, we can assume that child processes are dying.

Ah, good point. I just diffed the list of PIDs associated with pgpool processes before and after another EOF message in the log, and there were no differences. So I think that rules out any processes dying?

One other thing that I just noticed from comparing logs between all of the database servers is that the timestamps for every one of the 'unexpected EOF on client connection' instances are identical. In other words, they are happening at the same time on each server. I think this further suggests that pgpool has to be doing it?
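The cross-server comparison described above can be automated. A rough Python sketch, assuming each PostgreSQL log uses a log_line_prefix that begins with the PID and a timestamp, as in the lines quoted in this thread; the server names and PIDs in the example are made up for illustration:

```python
# Sketch: find the "unexpected EOF" timestamps that appear in every server's log.
# Assumes PostgreSQL log lines shaped like the ones quoted in this thread
# (a log_line_prefix that starts with the PID and a timestamp), e.g.
#   26323 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection
import re

EOF_LINE = re.compile(
    r"^\d+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\S+\s+LOG:\s+"
    r"unexpected EOF on client connection"
)


def eof_timestamps(lines):
    """Return the set of timestamp strings of EOF messages in one server's log."""
    stamps = set()
    for line in lines:
        match = EOF_LINE.match(line)
        if match:
            stamps.add(match.group(1))
    return stamps


def common_timestamps(logs_by_server):
    """Given {server: iterable of lines}, return timestamps seen on every server."""
    per_server = [eof_timestamps(lines) for lines in logs_by_server.values()]
    return set.intersection(*per_server) if per_server else set()


if __name__ == "__main__":
    # Hypothetical server names and PIDs, for illustration only.
    logs = {
        "db1": ["26323 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection"],
        "db2": [
            "4411 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection",
            "4412 2011-09-13 09:30:00 PDT LOG: unexpected EOF on client connection",
        ],
    }
    print(sorted(common_timestamps(logs)))  # ['2011-09-13 09:28:19']
```

Because the timestamps only have one-second resolution, a match is suggestive rather than proof that the disconnects share a single cause.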
Re: [Pgpool-general] unexpected EOF on client connection
>>> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
>>
>> Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:
>>
>> 1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).
>>
>> 2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.
>>
>> For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?
>
> I assume this should be determined by num_init_children (which I've set to 195 in pgpool.conf)? If so, then I currently have 195 processes in either the "wait for connection request" state or the actively connected state.

No. The pgpool parent process automatically respawns a child process if it dies, so having num_init_children child processes doesn't show anything useful. Record the 195 process IDs and compare them with the current process IDs later. If some of them have changed, we can assume that child processes are dying.
--
Tatsuo Ishii
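The PID comparison suggested above can be scripted. A minimal Python sketch, assuming a Linux host with procps ps and child processes whose titles begin with "pgpool:" (as in the ps output quoted on this list); the function names are illustrative, not part of pgpool:

```python
# Sketch of the check suggested above: snapshot the PIDs of pgpool child
# processes, wait, snapshot again, and see whether any PID disappeared.
# Assumes a procps-style "ps -eo pid,args" and that children show up with a
# title beginning "pgpool:" (as in the ps output quoted on this list).
import subprocess


def pgpool_child_pids(ps_output):
    """Parse 'ps -eo pid,args' output; return PIDs whose title starts with 'pgpool:'."""
    pids = set()
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].startswith("pgpool:"):
            pids.add(int(fields[0]))
    return pids


def snapshot():
    """Run ps on the local host and return the current pgpool child PIDs."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return pgpool_child_pids(result.stdout)


def diff(before, after):
    """Return (gone, new): PIDs that vanished and PIDs that appeared."""
    return before - after, after - before


# Usage (not run on import):
#   before = snapshot()
#   ... wait for the next "unexpected EOF" line in the PostgreSQL log ...
#   gone, new = diff(before, snapshot())
#   A non-empty "gone" set would suggest children are dying and being respawned.
```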
Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU
Thanks for your reply. I'll do this the next time this happens (which will likely be within a few days, based on history).

On Wed, Sep 14, 2011 at 3:57 PM, Tatsuo Ishii wrote:
> Please use gdb. For example:
>
> become the postgres user (or root)
> gdb pgpool 29191
> bt
> cont
> bt
> cont
> :
> :
> :
>
> This will give us an idea of where it's looping.
>
>> This problem has returned yet again:
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 29191 postgres 20 0 80192 14m 1544 R 89.8 0.2 51:15.91 pgpool
>>
>> postgres 29191 3.4 0.1 80192 14728 ? R Sep13 51:40 pgpool: lfriedman nightly 10.31.96.84(61698) idle
>>
>> I'd really appreciate some input on how to debug this.
>>
>> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman wrote:
>>> No one else has experienced this or has suggestions on how to debug it?
>>>
>>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman wrote:
>>>> Greetings,
>>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load balancer for a three-server postgresql-9.0.4 cluster (1 master, 2 standby). I'm seeing strange behavior where a single pgpool process seems to hang after some period of time and then consume 100% of the CPU. I've seen this behavior happen twice since last Friday (when pgpool was brought online in my production environment). At the moment, the current hung process looks like this in 'ps auxww' output:
>>>>
>>>> postgres 19838 98.7 0.0 68856 2904 ? R Sep06 1027:36 pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>>
>>>> In top, I see:
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>> 19838 postgres 20 0 68856 2904 1072 R 100.0 0.0 1027:29 pgpool
>>>>
>>>> When I connect to the process with strace, there is no output, so I'm guessing the process is stuck spinning somewhere:
>>>> # strace -p 19838
>>>> Process 19838 attached - interrupt to quit
>>>> ...
>>>> ^CProcess 19838 detached
>>>>
>>>> One thing that I'm certain of is that the client IP (10.31.45.20) associated with the hung process has rebooted at least once since that process was spawned. So pgpool seems to be in some confused state, as the client has definitely severed the connection already. I checked the pgpool log, and there are no explicit references to PID 19838. I'm at a loss as to how to debug this further, but clearly something is wrong somewhere, and this isn't normal/expected behavior.
Re: [Pgpool-general] unexpected EOF on client connection
On Wed, Sep 14, 2011 at 3:56 PM, Tatsuo Ishii wrote:
>> On Tue, Sep 13, 2011 at 8:48 PM, Tatsuo Ishii wrote:
>>>> On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman wrote:
>>>>> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii wrote:
>>>>>>>> I couldn't find anything possibly related to your problem at first glance (in theory, client_idle_limit and authentication_timeout are not related, but you might want to change them to see whether anything changes).
>>>>>>>
>>>>>>> OK, I'll give that a try. Should I just try increasing them by 10 or 20s?
>>>>>>
>>>>>> I'd suggest setting them to 0. This disables the features those directives control.
>>>>>>
>>>>>> Also, you have child_life_time set to 300. I don't expect this is related, but could you set it to 0, just in case, and see whether anything changes?
>>>>>
>>>>> OK, I'll make those changes tomorrow (it's late in the day here, and I don't want to introduce potential problems in the middle of the night when no one is closely monitoring the server), and let you know if they have any impact.
>>>>
>>>> client_idle_limit was already 0. I set authentication_timeout=0 and child_life_time=0, and restarted pgpool; however, that had no impact. I'm still seeing:
>>>> 26323 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection
>>>> 3933 2011-09-13 09:36:20 PDT LOG: unexpected EOF on client connection
>>>
>>> Humm. Is it possible that those connections do not come from pgpool processes?
>>
>> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
>
> Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:
>
> 1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).
>
> 2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.
>
> For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?

I assume this should be determined by num_init_children (which I've set to 195 in pgpool.conf)? If so, then I currently have 195 processes in either the "wait for connection request" state or the actively connected state.
Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU
Please use gdb. For example:

become the postgres user (or root)
gdb pgpool 29191
bt
cont
bt
cont
:
:
:

This will give us an idea of where it's looping.
--
Tatsuo Ishii

> This problem has returned yet again:
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 29191 postgres 20 0 80192 14m 1544 R 89.8 0.2 51:15.91 pgpool
>
> postgres 29191 3.4 0.1 80192 14728 ? R Sep13 51:40 pgpool: lfriedman nightly 10.31.96.84(61698) idle
>
> I'd really appreciate some input on how to debug this.
>
> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman wrote:
>> No one else has experienced this or has suggestions on how to debug it?
>>
>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman wrote:
>>> Greetings,
>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load balancer for a three-server postgresql-9.0.4 cluster (1 master, 2 standby). I'm seeing strange behavior where a single pgpool process seems to hang after some period of time and then consume 100% of the CPU. I've seen this behavior happen twice since last Friday (when pgpool was brought online in my production environment). At the moment, the current hung process looks like this in 'ps auxww' output:
>>>
>>> postgres 19838 98.7 0.0 68856 2904 ? R Sep06 1027:36 pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>
>>> In top, I see:
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 19838 postgres 20 0 68856 2904 1072 R 100.0 0.0 1027:29 pgpool
>>>
>>> When I connect to the process with strace, there is no output, so I'm guessing the process is stuck spinning somewhere:
>>> # strace -p 19838
>>> Process 19838 attached - interrupt to quit
>>> ...
>>> ^CProcess 19838 detached
>>>
>>> One thing that I'm certain of is that the client IP (10.31.45.20) associated with the hung process has rebooted at least once since that process was spawned. So pgpool seems to be in some confused state, as the client has definitely severed the connection already. I checked the pgpool log, and there are no explicit references to PID 19838. I'm at a loss as to how to debug this further, but clearly something is wrong somewhere, and this isn't normal/expected behavior.
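The interactive bt/cont session above can also be run non-interactively by attaching several times and printing one backtrace each time. A sketch in Python, assuming gdb is installed and the script runs as the postgres user or root; it uses gdb's -batch, -ex and -p options rather than the "gdb pgpool 29191" form shown above, and the helper names are hypothetical:

```python
# Sketch: a scripted version of the gdb "bt / cont" loop suggested above.
# Each call attaches with gdb in batch mode, prints one backtrace, and detaches
# when gdb exits; several samples a second apart show where the process spins.
# Assumes gdb is installed and the script runs as the postgres user or root.
import subprocess
import time


def gdb_bt_command(pid):
    """Build a non-interactive gdb command that prints one backtrace of `pid`."""
    return ["gdb", "-batch", "-ex", "bt", "-p", str(pid)]


def sample_backtraces(pid, samples=5, interval=1.0):
    """Collect `samples` backtraces of `pid`, `interval` seconds apart."""
    traces = []
    for i in range(samples):
        result = subprocess.run(gdb_bt_command(pid), capture_output=True, text=True)
        traces.append(result.stdout)
        if i < samples - 1:
            time.sleep(interval)
    return traces


# Usage (PID taken from the report above):
#   for trace in sample_backtraces(29191):
#       print(trace)
```

If the same frames show up in every sample, that is most likely where the process is looping. A pgpool binary built without debugging symbols will give much less informative backtraces.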
Re: [Pgpool-general] unexpected EOF on client connection
> On Tue, Sep 13, 2011 at 8:48 PM, Tatsuo Ishii wrote:
>>> On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman wrote:
>>>> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii wrote:
>>>>>>> I couldn't find anything possibly related to your problem at first glance (in theory, client_idle_limit and authentication_timeout are not related, but you might want to change them to see whether anything changes).
>>>>>>
>>>>>> OK, I'll give that a try. Should I just try increasing them by 10 or 20s?
>>>>>
>>>>> I'd suggest setting them to 0. This disables the features those directives control.
>>>>>
>>>>> Also, you have child_life_time set to 300. I don't expect this is related, but could you set it to 0, just in case, and see whether anything changes?
>>>>
>>>> OK, I'll make those changes tomorrow (it's late in the day here, and I don't want to introduce potential problems in the middle of the night when no one is closely monitoring the server), and let you know if they have any impact.
>>>
>>> client_idle_limit was already 0. I set authentication_timeout=0 and child_life_time=0, and restarted pgpool; however, that had no impact. I'm still seeing:
>>> 26323 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection
>>> 3933 2011-09-13 09:36:20 PDT LOG: unexpected EOF on client connection
>>
>> Humm. Is it possible that those connections do not come from pgpool processes?
>
> I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.

Ok. I would like to narrow down the reason why we see the "unexpected EOF on client connection" message so frequently. I think there are currently two possibilities:

1) A pgpool child was killed for some unknown reason (we can omit the segfault case because you don't see it in the pgpool log).

2) A pgpool child disconnects from PostgreSQL in an ungraceful manner.

For 1), I would like to know whether the pgpool child processes have stayed alive since they were spawned. Have you seen any pgpool child process disappear since pgpool started?
--
Tatsuo Ishii
Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU
This problem has returned yet again:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29191 postgres 20 0 80192 14m 1544 R 89.8 0.2 51:15.91 pgpool

postgres 29191 3.4 0.1 80192 14728 ? R Sep13 51:40 pgpool: lfriedman nightly 10.31.96.84(61698) idle

I'd really appreciate some input on how to debug this.

On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman wrote:
> No one else has experienced this or has suggestions on how to debug it?
>
> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman wrote:
>> Greetings,
>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load balancer for a three-server postgresql-9.0.4 cluster (1 master, 2 standby). I'm seeing strange behavior where a single pgpool process seems to hang after some period of time and then consume 100% of the CPU. I've seen this behavior happen twice since last Friday (when pgpool was brought online in my production environment). At the moment, the current hung process looks like this in 'ps auxww' output:
>>
>> postgres 19838 98.7 0.0 68856 2904 ? R Sep06 1027:36 pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>
>> In top, I see:
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 19838 postgres 20 0 68856 2904 1072 R 100.0 0.0 1027:29 pgpool
>>
>> When I connect to the process with strace, there is no output, so I'm guessing the process is stuck spinning somewhere:
>> # strace -p 19838
>> Process 19838 attached - interrupt to quit
>> ...
>> ^CProcess 19838 detached
>>
>> One thing that I'm certain of is that the client IP (10.31.45.20) associated with the hung process has rebooted at least once since that process was spawned. So pgpool seems to be in some confused state, as the client has definitely severed the connection already. I checked the pgpool log, and there are no explicit references to PID 19838. I'm at a loss as to how to debug this further, but clearly something is wrong somewhere, and this isn't normal/expected behavior.
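A hypothetical helper to spot processes like PID 19838 in 'ps auxww' output: it looks for pgpool children whose title says idle but which are in the running (R) state with a high %CPU. This is a sketch in Python, assuming the "pgpool: USER DATABASE HOST(PORT) STATE" title format seen in this thread; the 50% threshold is an arbitrary assumption:

```python
# Sketch: scan 'ps auxww' lines for pgpool children that are running (state R),
# report themselves as idle, and show high %CPU. The 50% threshold is arbitrary.
import re
from typing import NamedTuple

# Child title as seen in this thread: "pgpool: USER DATABASE HOST(PORT) STATE"
CHILD_TITLE = re.compile(r"^pgpool: (\S+) (\S+) ([^\s(]+)\((\d+)\) (.+)$")


class Suspect(NamedTuple):
    pid: int
    cpu: float
    user: str
    database: str
    client_host: str
    client_port: int


def find_suspects(ps_lines, min_cpu=50.0):
    """Return idle-titled, running pgpool children whose %CPU is >= min_cpu."""
    suspects = []
    for line in ps_lines:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        match = CHILD_TITLE.match(fields[10])
        if not match or not fields[7].startswith("R") or match.group(5) != "idle":
            continue
        if float(fields[2]) >= min_cpu:
            suspects.append(Suspect(int(fields[1]), float(fields[2]), match.group(1),
                                    match.group(2), match.group(3), int(match.group(4))))
    return suspects


# Usage:
#   import subprocess
#   out = subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout
#   for s in find_suspects(out.splitlines()):
#       print(s)
```

Note that ps computes %CPU as CPU time divided by the time the process has been running, which is why PID 29191 shows 3.4 in ps but 89.8 in top. A process that only recently started spinning may need a lower min_cpu (or top) to be caught.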
Re: [Pgpool-general] confirm 2b4736d3dbf2f7ccea62d713d3d64985a93c4c1a
I am looking for failover and load balancing in PostgreSQL. My choice is very likely to be pgpool, but I have concerns about it being a SPOF. So I found pgpool-HA, but nowhere is there a description of what it actually does.

I would like to keep all my VMs the same (not have a dedicated DB load balancer), so there would be a pgpool server on every database server, knowing about all the other databases. My goal would be to be able to talk to any of the pgpool instances and get the same result.

The question is: will pgpool-HA keep the information about which servers are available/disconnected synchronized across all pgpool instances, or is it just a hot-standby solution where a new pgpool server takes the place of the old one if it fails?

tl;dr: Is an active:active configuration for pgpool instances possible with pgpool-HA?
Re: [Pgpool-general] unexpected EOF on client connection
On Tue, Sep 13, 2011 at 8:48 PM, Tatsuo Ishii wrote:
>> On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman wrote:
>>> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii wrote:
>>>>>> I couldn't find anything possibly related to your problem at first glance (in theory, client_idle_limit and authentication_timeout are not related, but you might want to change them to see whether anything changes).
>>>>>
>>>>> OK, I'll give that a try. Should I just try increasing them by 10 or 20s?
>>>>
>>>> I'd suggest setting them to 0. This disables the features those directives control.
>>>>
>>>> Also, you have child_life_time set to 300. I don't expect this is related, but could you set it to 0, just in case, and see whether anything changes?
>>>
>>> OK, I'll make those changes tomorrow (it's late in the day here, and I don't want to introduce potential problems in the middle of the night when no one is closely monitoring the server), and let you know if they have any impact.
>>
>> client_idle_limit was already 0. I set authentication_timeout=0 and child_life_time=0, and restarted pgpool; however, that had no impact. I'm still seeing:
>> 26323 2011-09-13 09:28:19 PDT LOG: unexpected EOF on client connection
>> 3933 2011-09-13 09:36:20 PDT LOG: unexpected EOF on client connection
>
> Humm. Is it possible that those connections do not come from pgpool processes?

I'm pretty sure that's not the case, as the messages stop whenever pgpool isn't running, they were not present prior to using pgpool, and pg_hba.conf is set up such that the database servers only accept connections from each other and from the server running pgpool. None of these servers have normal users connected directly to them (such as with ssh), nor are they running anything that would connect to the database as a client. Also, the volume of these messages is such that something significant has to be causing them. Last night, in the span of 5 minutes, there were 117 of these messages.
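To track a rate like the one reported above (117 messages in 5 minutes), a short Python sketch can compute the peak number of EOF messages in any 5-minute window. It assumes the same log line shape as the examples quoted in this thread and ignores the time zone abbreviation; the log path in the usage comment is a placeholder:

```python
# Sketch: the largest number of "unexpected EOF" messages inside any 5-minute
# window of a PostgreSQL log (compare with the 117-in-5-minutes figure above).
# Assumes the same "PID YYYY-MM-DD HH:MM:SS TZ LOG: ..." line shape as quoted.
import re
from collections import deque
from datetime import datetime, timedelta

EOF_LINE = re.compile(
    r"^\d+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\S+\s+LOG:\s+"
    r"unexpected EOF on client connection"
)


def eof_times(lines):
    """Yield a datetime for each EOF message; the zone abbreviation is ignored."""
    for line in lines:
        match = EOF_LINE.match(line)
        if match:
            yield datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def peak_in_window(lines, window=timedelta(minutes=5)):
    """Return the max count of EOF messages less than `window` older than the newest."""
    recent = deque()
    peak = 0
    for t in sorted(eof_times(lines)):
        recent.append(t)
        while t - recent[0] >= window:  # drop messages that fell out of the window
            recent.popleft()
        peak = max(peak, len(recent))
    return peak


# Usage (hypothetical log path):
#   with open("/path/to/postgresql.log") as log:
#       print(peak_in_window(log))
```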
Re: [Pgpool-general] online recovery fails on HPUX
I have set "enable_pool_hba = false" in pgpool.conf and do not use pool_hba.conf. Do we need to enable this? How does client authentication work when pool_hba.conf is disabled?

From: Sandeep Thakkar
To: Sandeep Thakkar ; Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Wednesday, September 14, 2011 4:33 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

I still face this issue, and I wonder why I see the following error in the pgpool log:
2011-09-14 04:48:52 LOG: pid 10268: starting recovering node 1
2011-09-14 04:48:53 ERROR: pid 10268: start_recover: could not connect master node.

Please help!

Thanks

From: Sandeep Thakkar
To: Sandeep Thakkar ; Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Tuesday, September 13, 2011 4:52 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

I mean, pcp_remote_start contains "$PGCTL -w -D $DESTDIR start", i.e. without SSH.

Sandeep.

From: Sandeep Thakkar
To: Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Tuesday, September 13, 2011 4:24 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

Well, actually my database servers and pgpool are running on the same host, so my pcp_remote_start does not contain "$PGCTL -w -D $DESTDIR start", i.e. without SSH. This works fine on Linux, though.

Sandeep.

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Friday, September 2, 2011 12:47 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

If you already have an SSH key defined and it's the same user on both servers, then I am not sure what could be causing the failure, but according to your log it's an SSH auth issue. Have you tried running a telnet test against the port to check that it is listening?

Jose
Autonomy Ops
verificare tua hinc

From: Sandeep Thakkar [mailto:sandee...@yahoo.com]
Sent: Friday, September 02, 2011 12:05 AM
To: Jose Mendoza; pgpool-general@pgfoundry.org
Subject: Re: [Pgpool-general] online recovery fails on HPUX

Sorry, I didn't get you. Actually, both the server instances and pgpool are running on the same host. Could this be a loopback issue?

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Friday, September 2, 2011 12:16 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

SSH trust must be established on both servers for all the accounts involved in the recovery process. I would add the SSH keys for the pgpool user and try again.

Jose

From: Sandeep Thakkar [mailto:sandee...@yahoo.com]
Sent: Thursday, September 01, 2011 10:17 PM
To: Jose Mendoza; pgpool-general@pgfoundry.org
Subject: Re: [Pgpool-general] online recovery fails on HPUX

I can see the following lines in /tmp/pgpool.log:
..
2011-09-01 22:57:34 ERROR: pid 22975: check_replication_time_lag: DB node is valid but no persistent connection
2011-09-01 22:57:34 DEBUG: pid 22936: health_check: 1 th DB node status: 3
2011-09-01 22:57:36 LOG: pid 23019: starting recovering node 1
2011-09-01 22:57:36 ERROR: pid 23019: start_recover: could not connect master node.
2011-09-01 22:57:38 DEBUG: pid 22936: starting health checking
2011-09-01 22:57:38 DEBUG: pid 22936: health_check: 0 th DB node status: 1
2011-09-01 22:57:38 DEBUG: pid 22975: pool_ssl: SSL requested but SSL support is not available
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: auth kind: 0
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: parameter status data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: parameter status data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: backend key data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: transaction state: I
2011-09-01 22:57:39 DEBUG: pid 22936: health_check: 1 th DB node status: 3
2011-09-01 22:57:39 ERROR: pid 22975: connect_inet_domain_socket: connect() failed: Connection refused
2011-09-01 22:57:39 ERROR: pid 22975: make_persistent_db_connection: connection to localhost(5445) failed
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: T
2011-09-01 22:57:39 DEBUG: pid 22975: num_fileds: 1
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: D
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: C
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: Z
2011-09-01 22:57:39 ERROR: pid 22975: check_replication_time_lag: DB node is valid but no persistent connection
2011-09-01 22:57:43 DEBUG: pid 22936: starting health checking
..

From: Sandeep Thakkar
To: Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Friday, September 2, 2011 10:10 AM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

# Logging directory
logdir = '/tmp'

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Wednesday, August 31, 2011 10:28 PM
Subject: Re:
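When reading a pgpool log like the one above, it can help to separate messages by process. A Python sketch, assuming the "timestamp LEVEL: pid N: message" layout shown in this log; messages_by_pid is a hypothetical helper, not a pgpool tool:

```python
# Sketch: group pgpool log lines of one severity by process id, to separate
# what the recovery process reported from what other pgpool processes logged.
# Assumes the "YYYY-MM-DD HH:MM:SS LEVEL: pid N: message" shape shown above.
import re
from collections import defaultdict

PGPOOL_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+): pid (\d+): (.*)$"
)


def messages_by_pid(lines, level="ERROR"):
    """Return {pid: [message, ...]} for lines at `level`, keeping log order."""
    grouped = defaultdict(list)
    for line in lines:
        match = PGPOOL_LINE.match(line.rstrip("\n"))
        if match and match.group(2) == level:
            grouped[int(match.group(3))].append(match.group(4))
    return dict(grouped)


# Usage, with the log location from this thread (logdir = '/tmp'):
#   with open("/tmp/pgpool.log") as log:
#       for pid, messages in messages_by_pid(log).items():
#           print(pid, messages)
```

On the excerpt above, this separates the start_recover error from pid 23019 from the connection-refused errors logged by pid 22975.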
Re: [Pgpool-general] online recovery fails on HPUX
I still face this issue, and I wonder why I see the following error in the pgpool log:
2011-09-14 04:48:52 LOG: pid 10268: starting recovering node 1
2011-09-14 04:48:53 ERROR: pid 10268: start_recover: could not connect master node.

Please help!

Thanks

From: Sandeep Thakkar
To: Sandeep Thakkar ; Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Tuesday, September 13, 2011 4:52 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

I mean, pcp_remote_start contains "$PGCTL -w -D $DESTDIR start", i.e. without SSH.

Sandeep.

From: Sandeep Thakkar
To: Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Tuesday, September 13, 2011 4:24 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

Well, actually my database servers and pgpool are running on the same host, so my pcp_remote_start does not contain "$PGCTL -w -D $DESTDIR start", i.e. without SSH. This works fine on Linux, though.

Sandeep.

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Friday, September 2, 2011 12:47 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

If you already have an SSH key defined and it's the same user on both servers, then I am not sure what could be causing the failure, but according to your log it's an SSH auth issue. Have you tried running a telnet test against the port to check that it is listening?

Jose

From: Sandeep Thakkar [mailto:sandee...@yahoo.com]
Sent: Friday, September 02, 2011 12:05 AM
To: Jose Mendoza; pgpool-general@pgfoundry.org
Subject: Re: [Pgpool-general] online recovery fails on HPUX

Sorry, I didn't get you. Actually, both the server instances and pgpool are running on the same host. Could this be a loopback issue?

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Friday, September 2, 2011 12:16 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

SSH trust must be established on both servers for all the accounts involved in the recovery process. I would add the SSH keys for the pgpool user and try again.

Jose

From: Sandeep Thakkar [mailto:sandee...@yahoo.com]
Sent: Thursday, September 01, 2011 10:17 PM
To: Jose Mendoza; pgpool-general@pgfoundry.org
Subject: Re: [Pgpool-general] online recovery fails on HPUX

I can see the following lines in /tmp/pgpool.log:
..
2011-09-01 22:57:34 ERROR: pid 22975: check_replication_time_lag: DB node is valid but no persistent connection
2011-09-01 22:57:34 DEBUG: pid 22936: health_check: 1 th DB node status: 3
2011-09-01 22:57:36 LOG: pid 23019: starting recovering node 1
2011-09-01 22:57:36 ERROR: pid 23019: start_recover: could not connect master node.
2011-09-01 22:57:38 DEBUG: pid 22936: starting health checking
2011-09-01 22:57:38 DEBUG: pid 22936: health_check: 0 th DB node status: 1
2011-09-01 22:57:38 DEBUG: pid 22975: pool_ssl: SSL requested but SSL support is not available
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: auth kind: 0
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: parameter status data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: parameter status data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: backend key data received
2011-09-01 22:57:38 DEBUG: pid 22975: s_do_auth: transaction state: I
2011-09-01 22:57:39 DEBUG: pid 22936: health_check: 1 th DB node status: 3
2011-09-01 22:57:39 ERROR: pid 22975: connect_inet_domain_socket: connect() failed: Connection refused
2011-09-01 22:57:39 ERROR: pid 22975: make_persistent_db_connection: connection to localhost(5445) failed
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: T
2011-09-01 22:57:39 DEBUG: pid 22975: num_fileds: 1
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: D
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: C
2011-09-01 22:57:39 DEBUG: pid 22975: do_query: kind: Z
2011-09-01 22:57:39 ERROR: pid 22975: check_replication_time_lag: DB node is valid but no persistent connection
2011-09-01 22:57:43 DEBUG: pid 22936: starting health checking
..

From: Sandeep Thakkar
To: Jose Mendoza ; "pgpool-general@pgfoundry.org"
Sent: Friday, September 2, 2011 10:10 AM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

# Logging directory
logdir = '/tmp'

From: Jose Mendoza
To: pgpool-general@pgfoundry.org
Sent: Wednesday, August 31, 2011 10:28 PM
Subject: Re: [Pgpool-general] online recovery fails on HPUX

What does your pgpool.conf say about logging:

# Logging directory
logdir = '/var/log'

Jose

From: pgpool-general-boun...@pgfoundry.org [mailto:pgpool-general-boun...@pgfoundry.org] On Behalf Of Sandeep Thakkar
Sent: Wednesday, August 31, 2011 4:58 AM
To: Sandeep Thakkar; pgpool-general@pgfoundry.org
Subject: Re: [Pgpool-general] online r
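The telnet test suggested in this thread can be scripted as well. A Python sketch that reports whether anything accepts a TCP connection on a given host and port, assuming the localhost/5445 pair from the make_persistent_db_connection error above is the one worth checking; is_listening is a hypothetical helper:

```python
# Sketch: a scripted stand-in for "telnet localhost 5445" to see whether anything
# accepts TCP connections on the port pgpool is failing to reach.
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False


# Usage, with the host/port from the pgpool log above:
#   print(is_listening("localhost", 5445))
```

A False result only confirms that the connection was refused or timed out; it does not say why.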