Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-11-23 Thread Lonni J Friedman
On Wed, Nov 23, 2011 at 10:42 PM, Tatsuo Ishii  wrote:
>> Not wanting to be impatient, but I'm very concerned about this
>> problem, since it's impossible to predict when it will occur.  Is there
>> additional information that I can provide to investigate this further?
>
> I really need to know where pgpool is looping.

OK, how can I capture that information?
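
One common way to see where a spinning process is looping is to attach gdb several times in a row and compare the innermost frames. Below is a minimal sketch in Python, assuming gdb is installed and the pgpool debug symbols are available. The helper names (top_frame, sample_backtraces) are made up for illustration; only the standard gdb options -p, -batch and -ex are used.

```python
import re
import subprocess
import time
from collections import Counter

# Matches the innermost frame of a gdb backtrace, e.g.
# "#0  0x004192c0 in pool_process_query (frontend=..."
FRAME0 = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+ in )?(\w+) \(", re.MULTILINE)


def top_frame(backtrace: str):
    """Return the function name of frame #0, or None if no frame is found."""
    match = FRAME0.search(backtrace)
    return match.group(1) if match else None


def sample_backtraces(pid: int, samples: int = 5, delay: float = 1.0) -> Counter:
    """Attach gdb to `pid` several times and count the innermost function.

    Needs permission to ptrace the process (typically root).
    """
    seen = Counter()
    for _ in range(samples):
        result = subprocess.run(
            ["gdb", "-p", str(pid), "-batch", "-ex", "bt"],
            capture_output=True, text=True, check=False,
        )
        seen[top_frame(result.stdout)] += 1
        time.sleep(delay)
    return seen
```

If the same function dominates the samples, that is most likely where the loop is. Stepping with "next" in an interactive gdb session from that frame then shows which statements repeat.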
___
Pgpool-general mailing list
Pgpool-general@pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgpool-general


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-11-23 Thread Lonni J Friedman
Not wanting to be impatient, but I'm very concerned about this
problem, since it's impossible to predict when it will occur.  Is there
additional information that I can provide to investigate this further?

thanks

On Tue, Nov 22, 2011 at 10:11 AM, Lonni J Friedman  wrote:
> This hadn't reproduced in a long time, but we upgraded to pgpool-3.1 a
> week ago, and this morning I found a pgpool process that was consuming
> 100% CPU, and had been running for a week (although it wasn't consuming
> 100% CPU the entire time).  Something else weird is that it showed an
> active, idle connection from a client system which had only been up
> for the past 21 hours.  Anyway, here's the backtrace from the process
> (gdb hung at the very bottom):
>
> [root ~]# gdb pgpool 31293
> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/pgpool...Reading symbols from
> /usr/lib/debug/usr/sbin/pgpool.debug...done.
> done.
> Attaching to program: /usr/sbin/pgpool, process 31293
> Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libpq.so.5
> Reading symbols from /usr/lib64/libpcp.so.0...Reading symbols from
> /usr/lib/debug/usr/lib64/libpcp.so.0.0.0.debug...done.
> done.
> Loaded symbols for /usr/lib64/libpcp.so.0
> Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libpam.so.0
> Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libssl.so.10
> Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib64/libcrypto.so.10
> Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libcrypt.so.1
> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libresolv.so.2
> Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libnsl.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/libgssapi_krb5.so.2
> Reading symbols from /usr/lib64/libldap_r-2.4.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib64/libldap_r-2.4.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libaudit.so.1...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libaudit.so.1
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libkrb5.so.3
> Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libcom_err.so.2
> Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libk5crypto.so.3
> Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libz.so.1
> Reading symbols from /usr/lib64/libfreebl3.so...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/libfreebl3.so
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libkrb5support.so.0...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/libkrb5support.so.0
> Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libkeyutils.so.1
> Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib64/libl

Re: [Pgpool-general] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

2011-11-22 Thread Lonni J Friedman
*sigh*  I thought that I did, but clearly I did not.  Sorry about the noise.

On Tue, Nov 22, 2011 at 6:16 PM, Tatsuo Ishii  wrote:
> I think you'd better post to pgsql-general since your question is
> not related to pgpool.


Re: [Pgpool-general] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

2011-11-22 Thread Lonni J Friedman
Thanks for your reply.  No, I do not regularly schedule any vacuum
cronjobs, as I was led to believe that this was no longer necessary
from past discussions on this and other mailing lists.  I had been
running vacuum full as frequently as twice/month earlier this year,
but it was basically making the database unusable for over 12 hours,
which wasn't acceptable in a production environment.  That's when I
did some research and saw numerous discussions commenting that
autovacuum is generally sufficient for most use cases.

The official documentation seems to suggest that the 'freeze' option
is deprecated, and that 'full' should only be run under special
circumstances (and not regularly):
http://www.postgresql.org/docs/9.0/static/sql-vacuum.html

Are you basically telling me that there is no way to stop the hung
autovacuum process other than shutting down and restarting the entire
cluster?


On Tue, Nov 22, 2011 at 5:42 PM, Paul Robert Marino  wrote:
> The autovacuum process is supposed to reduce the frequency of needing a
> vacuum full or vacuum freeze, not completely replace it.
> Do you regularly schedule at least a vacuum freeze? If not, this may be your
> problem.
> Also, I've seen this happen when a server went too long between running
> vacuum full and there was a transaction ID wraparound issue, but you would
> see that in the log.
> The first piece of advice is to restart the cluster and immediately do a
> vacuum freeze; this will do everything short of taking an exclusive table
> lock. The next step is to plan a vacuum full on each of the tables. The vacuum
> full will require an exclusive table lock, so you will not be able to query
> or update the tables during this time, but the earlier vacuum freeze will
> have accelerated the process. Note that you may run multiple vacuums
> concurrently on separate tables (but not multiple on the same table at once)
> if downtime is an issue; you'll still run into I/O contention but may be
> better able to take advantage of your CPU and RAM if you do it


[Pgpool-general] autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

2011-11-22 Thread Lonni J Friedman
Greetings,
I'm running PostgreSQL-9.0.4 on a Linux-x86_64 cluster with 1 master,
and two streaming replication slaves.  Since late yesterday, the load
on the server has been noticeably higher (5.00+) than normal (generally
under 1.00).  I investigated, and found that for the past ~18 hours,
there's one autovacuum process that has been running, and not making
any obvious progress:

select procpid,query_start,current_query from pg_stat_activity where
current_query LIKE 'autovacuum%' ;
 procpid |          query_start          |              current_query
---------+-------------------------------+-----------------------------------------
   30188 | 2011-11-21 22:42:26.426315-08 | autovacuum: VACUUM public.nppsmoketests
(1 row)

select c.oid,c.relname,l.pid,l.mode,l.granted from pg_class c join
pg_locks l on c.oid=l.relation where l.pid='30188' order by l.pid;
  oid  |              relname               |  pid  |           mode           | granted
-------+------------------------------------+-------+--------------------------+---------
 72112 | nppsmoketests                      | 30188 | ShareUpdateExclusiveLock | t
 72617 | nppsmoketests_pkey                 | 30188 | RowExclusiveLock         | t
 72619 | nppsmoketests_bug_idx              | 30188 | RowExclusiveLock         | t
 72620 | nppsmoketests_active_idx           | 30188 | RowExclusiveLock         | t
 72621 | nppsmoketests_arch_idx             | 30188 | RowExclusiveLock         | t
 72622 | nppsmoketests_branch_idx           | 30188 | RowExclusiveLock         | t
 72623 | nppsmoketests_current_status_idx   | 30188 | RowExclusiveLock         | t
 72624 | nppsmoketests_build_type_idx       | 30188 | RowExclusiveLock         | t
 72625 | nppsmoketests_gpu_idx              | 30188 | RowExclusiveLock         | t
 72626 | nppsmoketests_os_idx               | 30188 | RowExclusiveLock         | t
 72627 | nppsmoketests_owner_idx            | 30188 | RowExclusiveLock         | t
 72628 | nppsmoketests_regressioncl_idx     | 30188 | RowExclusiveLock         | t
 72629 | nppsmoketests_subtest_idx          | 30188 | RowExclusiveLock         | t
 72630 | nppsmoketests_suiteid_idx          | 30188 | RowExclusiveLock         | t
 72631 | nppsmoketests_suiteid_testname_idx | 30188 | RowExclusiveLock         | t
 72632 | nppsmoketests_testcl_idx           | 30188 | RowExclusiveLock         | t
 72633 | nppsmoketests_testname_idx         | 30188 | RowExclusiveLock         | t
 80749 | nppsmoketests_osversion_idx        | 30188 | RowExclusiveLock         | t
(18 rows)

When I strace PID 30188, I see tons of this scrolling past quickly,
but I'm not really sure what it means beyond a 'Timeout' not looking
good:
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 753901568, SEEK_SET)  = 753901568
read(95, "\202\1\0\0\260\315\250\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
lseek(95, 753917952, SEEK_SET)  = 753917952
read(95, "\202\1\0\0 N\253\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 768606208, SEEK_SET)  = 768606208
read(95, "\204\1\0\0h!~\233\1\0\0\0\230\0\360\20\360\37\4 \0\0\0\0x\237\360\0\0\237\360\0"..., 8192) = 8192
lseek(95, 753934336, SEEK_SET)  = 753934336
read(95, "\202\1\0\0 &\275\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
read(95, "\202\1\0\0\10\33\276\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
lseek(95, 753958912, SEEK_SET)  = 753958912
read(95, "\202\1\0\0x\317\307\245\1\0\0\0\220\0\360\20\360\37\4 \0\0\0\0p\237\0\1\360\236\0\1"..., 8192) = 8192
select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
lseek(95, 768614400, SEEK_SET)  = 768614400

An old thread suggests that this is a stuck spinlock:
http://archives.postgresql.org/pgsql-performance/2009-05/msg00455.php

I'm using the defaults for all the *vacuum* options in
postgresql.conf, except for:
log_autovacuum_min_duration = 2500

At this point, I'm not sure what I can safely do to either terminate
this autovacuum process, or kick it into making progress again?

thanks
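
A note on reading the strace excerpt above: a select() call with no file descriptors and only a timeout is a common way to sleep for a fraction of a second, so the "(Timeout)" result by itself is what a deliberate pause looks like. Whether these pauses are autovacuum's cost-based delay is an assumption here, not something the trace proves. The sketch below (hypothetical helper name, expects one syscall per line, not the wrapped form an email client produces) adds up the time slept and the bytes read, which gives a rough idea of whether the process is progressing slowly or not at all.

```python
import re

# select() used purely as a timer: no fd sets, only {seconds, microseconds}
SLEEP = re.compile(r"^select\(0, NULL, NULL, NULL, \{(\d+), (\d+)\}\)")
# read(fd, "..."..., count) = bytes_returned
READ = re.compile(r"^read\(\d+, .*\) = (\d+)$")


def summarize_strace(lines):
    """Return (seconds slept, bytes read) for an iterable of strace lines."""
    slept = 0.0
    bytes_read = 0
    for line in lines:
        line = line.strip()
        sleep = SLEEP.match(line)
        if sleep:
            slept += int(sleep.group(1)) + int(sleep.group(2)) / 1_000_000
            continue
        read = READ.match(line)
        if read:
            bytes_read += int(read.group(1))
    return slept, bytes_read
```

Feeding it a minute of output captured with "strace -p 30188 -o trace.txt" and comparing bytes read against the table's on-disk size would give a crude estimate of how long a full pass takes at the current pace.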


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-11-22 Thread Lonni J Friedman
ib-2.1.23-8.el6.x86_64
glibc-2.12-1.7.el6.x86_64 keyutils-libs-1.4-1.el6.x86_64
krb5-libs-1.8.2-3.el6.x86_64 libcom_err-1.41.12-3.el6.x86_64
libselinux-2.0.94-2.el6.x86_64
nss-softokn-freebl-3.12.7-1.1.el6.x86_64 openldap-2.4.19-15.el6.x86_64
openssl-1.0.0-4.el6.x86_64 pam-1.1.1-4.el6.x86_64
postgresql-libs-8.4.4-2.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0  0x004192c0 in pool_process_query (frontend=0x2dd8fd0,
backend=0x187d540,
reset_request=) at pool_process_query.c:379
#1  0x0040ae42 in do_child (unix_fd=3, inet_fd=) at child.c:354
#2  0x004054c5 in fork_a_child (unix_fd=3, inet_fd=4, id=152)
at main.c:1072
#3  0x00407b1c in main (argc=,
argv=)
at main.c:549
(gdb) cont
Continuing.

On Tue, Sep 20, 2011 at 9:25 AM, Lonni J Friedman  wrote:
> Never mind, I figured out what I was doing wrong.  Now I just need for
> this to hang again.
>
> On Tue, Sep 20, 2011 at 9:08 AM, Lonni J Friedman  wrote:
>> I tried to do that, but pgpool refuses to start, reporting:
>> -bash: /usr/lib/debug/usr/sbin/pgpool.debug: bad ELF interpreter: No
>> such file or directory
>>
>> I'm puzzled why it fails, as it was built on the same server where I
>> built the (working) release build of pgpool.
>>
>> $ file /usr/lib/debug/usr/sbin/pgpool.debug
>> /usr/lib/debug/usr/sbin/pgpool.debug: ELF 64-bit LSB executable,
>> x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs),
>> for GNU/Linux 2.6.18, not stripped
>> $ file /usr/sbin/pgpool
>> /usr/sbin/pgpool: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
>> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
>>
>>
>>
>>
>> On Mon, Sep 19, 2011 at 7:10 PM, Tatsuo Ishii  wrote:
>>> It's really hard to find the cause of the problem from a stack trace
>>> without symbol tables... Is it possible to reinstall the pgpool binary
>>> with debug symbols?
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> This happened again.  I ran the gdb command that you requested,
>>>> however it occurred to me that the output may not be all that useful
>>>> since I'm not running a debug build of pgpool:
>>>> ###
>>>> # gdb pgpool 2343
>>>> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
>>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>>> License GPLv3+: GNU GPL version 3 or later 
>>>> <http://gnu.org/licenses/gpl.html>
>>>> This is free software: you are free to change and redistribute it.
>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>> and "show warranty" for details.
>>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>>> For bug reporting instructions, please see:
>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>> Reading symbols from /usr/sbin/pgpool...(no debugging symbols 
>>>> found)...done.
>>>> Attaching to program: /usr/sbin/pgpool, process 2343
>>>> Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
>>>> found)...done.
>>>> Loaded symbols for /usr/lib64/libpq.so.5
>>>> Reading symbols from /usr/lib64/libpcp.so.0...(no debugging symbols
>>>> found)...done.
>>>> Loaded symbols for /usr/lib64/libpcp.so.0
>>>> Reading symbols from /lib64/libpam.so.0...(no debugging symbols 
>>>> found)...done.
>>>> Loaded symbols for /lib64/libpam.so.0
>>>> Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
>>>> found)...done.
>>>> Loaded symbols for /usr/lib64/libssl.so.10
>>>> Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
>>>> symbols found)...done.
>>>> Loaded symbols for /usr/lib64/libcrypto.so.10
>>>> Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols 
>>>> found)...done.
>>>> Loaded symbols for /lib64/libcrypt.so.1
>>>> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
>>>> found)...done.
>>>> Loaded symbols for /lib64/libresolv.so.2
>>>> Reading symbols from /lib64/libnsl.so.1...(no debugging symbols 
>>>> found)...done.
>>>> Loaded symbols for /lib64/libnsl.so.1
>>>> Reading symbols from /lib64/libm.so.6...(no debugging symbols 
>>>> found)...done.
>>>> Loaded symbols for /lib64/libm.so.6
>>>

Re: [Pgpool-general] logging the client IP address/hostname

2011-11-20 Thread Lonni J Friedman
On Sun, Nov 20, 2011 at 4:27 AM, Tatsuo Ishii  wrote:
>>>> By enabling "log_connections" you have your client IP and pgpool child
>>>> pid in your log. Since the log for "unable to parse..." includes pgpool
>>>> child pid, you can get client IP by checking pgpool child pid.
>>>>
>>>> LOG:   pid 4327: connection received: host=[local]
>>>> LOG:   pid 4327: SimpleQuery: Unable to parse the query: select select;
>>>
>>> I was hoping there was some way other than enabling log_connections,
>>> as that's going to log every single connection (millions/day), even if
>>> there's no error?
>>
>> Good suggestion. I would like to include it for next release (3.2)
>> unless someone beats me.
>
> Here is the patch I promised. Here are sample log entries.
>
> From TCP/IP client case:
> 2011-11-20 21:15:52 LOG:   pid 23045: SimpleQuery: Unable to parse the query: "select select;" from client 127.0.0.1(33737)
>
> From Unix domain socket client case:
> 2011-11-20 21:14:46 LOG:   pid 23045: SimpleQuery: Unable to parse the query: "select select;" from local client

The sample log entries look good to me, thanks!
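
With entries shaped like the samples quoted above, the offending client can be pulled straight out of the log. This is a rough sketch in Python that assumes the released format matches those two sample lines exactly (it may not); the function name is hypothetical.

```python
import re

# Pattern modelled on the two sample entries from the patch announcement.
PARSE_ERROR = re.compile(
    r'pid (?P<pid>\d+): SimpleQuery: Unable to parse the query: '
    r'"(?P<query>.*)" from '
    r'(?:client (?P<host>\S+)\((?P<port>\d+)\)|local client)$'
)


def parse_error_client(line):
    """Return (pid, query, client) for a parse-error entry, else None.

    client is "host:port" for TCP/IP clients and "local" for
    Unix domain socket clients.
    """
    match = PARSE_ERROR.search(line.rstrip())
    if match is None:
        return None
    if match["host"]:
        client = f"{match['host']}:{match['port']}"
    else:
        client = "local"
    return int(match["pid"]), match["query"], client
```

Running this over the pgpool log and counting results per client would show quickly whether one host is responsible for most of the malformed queries.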


Re: [Pgpool-general] logging the client IP address/hostname

2011-11-06 Thread Lonni J Friedman
On Sun, Nov 6, 2011 at 5:58 PM, Tatsuo Ishii  wrote:
>> Greetings,
>> I'm using pgpool-3.0.4 to load balance queries between 3 PostgreSQL
>> servers.  I'm occasionally seeing some "unable to parse the query"
>> errors in the pgpool log for queries that are malformed from a client.
>> The problem is that pgpool doesn't log the client IP or hostname, so
>> it's rather difficult to debug.  Other than enabling full debug logs
>> for pgpool, is there some way to add the client IP address (or
>> hostname) whenever pgpool creates a log entry?
>
> By enabling "log_connections" you have your client IP and pgpool child
> pid in your log. Since the log for "unable to parse..." includes pgpool
> child pid, you can get client IP by checking pgpool child pid.
>
> LOG:   pid 4327: connection received: host=[local]
> LOG:   pid 4327: SimpleQuery: Unable to parse the query: select select;

I was hoping there was some way other than enabling log_connections,
as that's going to log every single connection (millions/day), even if
there's no error?
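
For anyone who does enable log_connections, the pid matching Tatsuo describes can be automated. A minimal sketch in Python, assuming log lines look like the two quoted above; the function name is hypothetical, and because a pgpool child serves many clients over its lifetime, only the most recent connection line for each pid is used.

```python
import re

CONNECT = re.compile(r"pid (\d+): connection received: host=(\S+)")
PARSE_ERROR = re.compile(
    r"pid (\d+): SimpleQuery: Unable to parse the query: (.*)$"
)


def clients_for_parse_errors(lines):
    """Yield (host, query) for every parse error found in `lines`.

    The host comes from the latest "connection received" entry logged
    by the same child pid; "unknown" is used when none was seen.
    """
    last_host = {}
    for line in lines:
        line = line.rstrip()
        match = CONNECT.search(line)
        if match:
            last_host[int(match.group(1))] = match.group(2)
            continue
        match = PARSE_ERROR.search(line)
        if match:
            pid = int(match.group(1))
            yield last_host.get(pid, "unknown"), match.group(2)
```

This does not address the log volume concern raised above; it only makes the correlation less tedious once the connection lines are there.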


[Pgpool-general] logging the client IP address/hostname

2011-11-06 Thread Lonni J Friedman
Greetings,
I'm using pgpool-3.0.4 to load balance queries between 3 PostgreSQL
servers.  I'm occasionally seeing some "unable to parse the query"
errors in the pgpool log for queries that are malformed from a client.
The problem is that pgpool doesn't log the client IP or hostname, so
it's rather difficult to debug.  Other than enabling full debug logs
for pgpool, is there some way to add the client IP address (or
hostname) whenever pgpool creates a log entry?

thanks!


Re: [Pgpool-general] pgpool segfaulting and dumping core

2011-09-23 Thread Lonni J Friedman
On Fri, Sep 23, 2011 at 4:24 PM, Tatsuo Ishii  wrote:
> No. Just try:
>
> $ psql -h cuda-db0 -U postgres -d postgres -c "select pgpool_walrecrunning();"
>
> If it succeeds, you are done.

Returned true.  thanks for your help!
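
The same check can be repeated against every backend in one go. A small sketch in Python, assuming psql is on the PATH and can authenticate without a password prompt; the host names are the ones used in this thread and the function names are made up for illustration.

```python
import subprocess


def walrec_check_command(host, user="postgres", dbname="postgres"):
    """Build the psql command line that calls pgpool_walrecrunning()."""
    return [
        "psql", "-h", host, "-U", user, "-d", dbname,
        "-At",  # unaligned, tuples only: prints just "t" or "f"
        "-c", "select pgpool_walrecrunning();",
    ]


def check_nodes(hosts):
    """Run the check on each host and return {host: result or error text}."""
    results = {}
    for host in hosts:
        proc = subprocess.run(
            walrec_check_command(host),
            capture_output=True, text=True, check=False,
        )
        if proc.returncode == 0:
            results[host] = proc.stdout.strip()
        else:
            results[host] = "ERROR: " + proc.stderr.strip()
    return results
```

Calling check_nodes(["cuda-db0", "cuda-db1", "cuda-db2"]) should give a value for every node once the function and its shared library are installed everywhere. In this thread the master answered f and the standby cuda-db0 answered true.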


Re: [Pgpool-general] pgpool segfaulting and dumping core

2011-09-23 Thread Lonni J Friedman
On Fri, Sep 23, 2011 at 3:43 PM, Tatsuo Ishii  wrote:
>>> Ok. I think this is the cause of the problem:
>>>
>>>> 2011-09-21 16:23:11 LOG:   pid 23588: find_primary_node:
>>>> pgpool_walrecrunning does not exist
>>>
>>> I think you did not install pgpool_walrecrunning() on DB node 2. In
>>> this case find_primary_node() returns -1 and it is stored in
>>>
>>> Req_info->primary_node_id = find_primary_node();
>>>
>>> It is used as the parameter for the TSTATE macro, which actually accesses
>>> outside the array because of -1.
>>>
>>> 1250                    state = TSTATE(backend,
>>>
>>> So the solution would be installing pgpool_walrecrunning() on DB node 2.
>>
>> Thanks for looking at this.  I'm admittedly rather confused.  Where is
>> pgpool_walrecrunning() documented?  I looked through the official
>> documentation (
>> http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html )
>> and don't see any references.  I also checked the official tutorial (
>> http://pgpool.projects.postgresql.org/pgpool-II/doc/tutorial-en.html )
>> and didn't see any references.
>
> The documents at the URLs above are for the latest stable release of
> pgpool-II, 3.1, and that version no longer requires installing the
> function.
>
> Version-specific docs come in the doc/ directory of the source code.

Ah, ok thanks.  I'm not sure how I missed this when I set up pgpool
over a month ago.

>
>> After googling it sounds like I need to run
>> /usr/share/pgpool-II/sql/walrecrunning/pgpool-walrecrunning.sql
>> against the postgres database (on the master, at which point it will
>> automatically replicate to the standbys)?
>> That seems to work fine:
>> $ psql -h cuda-db2 -U postgres -d postgres -f pgpool-walrecrunning.sql
>> CREATE FUNCTION
>> $ psql -h cuda-db2 -U postgres -d postgres -c "select pgpool_walrecrunning();"
>>  pgpool_walrecrunning
>> ----------------------
>>  f
>> (1 row)
>>
>> Trying to install it on either of the standby servers fails
>> (expectedly, since they're readonly):
>> $ psql -h cuda-db0 -U postgres -d postgres -f pgpool-walrecrunning.sql
>> psql:pgpool-walrecrunning.sql:4: ERROR:  cannot execute CREATE
>> FUNCTION in a read-only transaction
>> $ psql -h cuda-db0 -U postgres -d postgres -c "select pgpool_walrecrunning();"
>> ERROR:  could not access file "$libdir/pgpool-walrecrunning": No such
>> file or directory
>>
>> I restarted pgpool, after running the above.  Please let me know if I
>> need to do anything else to resolve this.
>
> It seems cuda-db0 lacks pgpool-walrecrunning.so. Have you ever installed it?

Nope, I hadn't.  I've done that now.  Does this require a restart of
pgpool or postgresql to take effect, or is simply installing it (with
'make install' + running the SQL script for each DB on the master)
sufficient?
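
The crash Tatsuo describes earlier in this thread comes down to a sentinel value (-1, meaning no primary was found) being used directly as an array index. The sketch below shows the general guard in Python rather than pgpool's C. The names are hypothetical and this is not the actual pgpool fix. In C an index of -1 reads memory before the start of the array; in Python it would silently select the last element, which is just as wrong, so the check matters either way.

```python
from typing import Optional, Sequence

# Sentinel mirroring the -1 that find_primary_node() is said to return
# when it cannot determine a primary.
NO_PRIMARY = -1


def primary_state(states: Sequence[str], primary_node_id: int) -> Optional[str]:
    """Return the state recorded for the primary node.

    Returns None instead of indexing when the id is the sentinel or is
    otherwise outside the valid range of node ids.
    """
    if not 0 <= primary_node_id < len(states):
        return None
    return states[primary_node_id]
```

The caller then has to decide what "no primary" means for the query at hand (reject writes, retry detection, and so on) instead of reading whatever happens to sit next to the array.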


Re: [Pgpool-general] pgpool segfaulting and dumping core

2011-09-23 Thread Lonni J Friedman
On Fri, Sep 23, 2011 at 7:29 AM, Tatsuo Ishii  wrote:
>> On Thu, Sep 22, 2011 at 5:18 PM, Tatsuo Ishii  wrote:
>>>> I checked a second of the coredumps, and it has the same backtrace.
>>>> Let me know if you need anything else.  thanks!
>>>
>>> Thanks. Can you show me pgpool log when DNS failed and pgpool died in
>>> segfaulting?
>
> Ok. I think this is the cause of the problem:
>
>> 2011-09-21 16:23:11 LOG:   pid 23588: find_primary_node:
>> pgpool_walrecrunning does not exist
>
> I think you did not install pgpool_walrecrunning() on DB node 2. In
> this case find_primary_node() returns -1 and it is stored in
>
> Req_info->primary_node_id = find_primary_node();
>
> It is used as the parameter for the TSTATE macro, which actually accesses
> outside the array because of -1.
>
> 1250                    state = TSTATE(backend,
>
> So the solution would be installing pgpool_walrecrunning() on DB node 2.

Thanks for looking at this.  I'm admittedly rather confused.  Where is
pgpool_walrecrunning() documented?  I looked through the official
documentation (
http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html )
and don't see any references.  I also checked the official tutorial (
http://pgpool.projects.postgresql.org/pgpool-II/doc/tutorial-en.html )
and didn't see any references.

After googling it sounds like I need to run
/usr/share/pgpool-II/sql/walrecrunning/pgpool-walrecrunning.sql
against the postgres database (on the master, at which point it will
automatically replicate to the standbys)?
That seems to work fine:
$ psql -h cuda-db2 -U postgres -d postgres -f pgpool-walrecrunning.sql
CREATE FUNCTION
$ psql -h cuda-db2 -U postgres -d postgres -c "select pgpool_walrecrunning();"
 pgpool_walrecrunning
----------------------
 f
(1 row)

Trying to install it on either of the standby servers fails
(expectedly, since they're readonly):
$ psql -h cuda-db0 -U postgres -d postgres -f pgpool-walrecrunning.sql
psql:pgpool-walrecrunning.sql:4: ERROR:  cannot execute CREATE
FUNCTION in a read-only transaction
$ psql -h cuda-db0 -U postgres -d postgres -c "select pgpool_walrecrunning();"
ERROR:  could not access file "$libdir/pgpool-walrecrunning": No such
file or directory

I restarted pgpool, after running the above.  Please let me know if I
need to do anything else to resolve this.

thanks


Re: [Pgpool-general] pgpool segfaulting and dumping core

2011-09-23 Thread Lonni J Friedman
On Thu, Sep 22, 2011 at 5:18 PM, Tatsuo Ishii  wrote:
>> I checked a second of the coredumps, and it has the same backtrace.
>> Let me know if you need anything else.  thanks!
>
> Thanks. Can you show me pgpool log when DNS failed and pgpool died in
> segfaulting?

2011-09-21 16:23:00 ERROR: pid 26556: connect_inet_domain_socket:
gethostbyname() failed: Host name lookup failure host: cuda-db2
2011-09-21 16:23:00 ERROR: pid 26556: connection to cuda-db2(5432) failed
2011-09-21 16:23:00 ERROR: pid 26556: new_connection: create_cp() failed
2011-09-21 16:23:00 LOG:   pid 26556: notice_backend_error: 0 fail
over request from pid 26556
2011-09-21 16:23:00 LOG:   pid 23588: starting degeneration. shutdown
host cuda-db2(5432)
2011-09-21 16:23:00 LOG:   pid 23588: failover_handler: set new master node: 1
2011-09-21 16:23:00 LOG:   pid 23588: failover done. shutdown host
cuda-db2(5432)
2011-09-21 16:23:00 ERROR: pid 23588: connect_inet_domain_socket:
gethostbyname() failed: Host name lookup failure host: cuda-db1
2011-09-21 16:23:00 ERROR: pid 23588: make_persistent_db_connection:
connection to cuda-db1(5432) failed
2011-09-21 16:23:00 ERROR: pid 23588: find_primary_node:
make_persistent_connection failed
2011-09-21 16:23:00 LOG:   pid 23588: find_primary_node: primary node id is 1
2011-09-21 16:23:01 ERROR: pid 18746: connect_inet_domain_socket:
gethostbyname() failed: Host name lookup failure host: cuda-db1
2011-09-21 16:23:01 ERROR: pid 18746: connection to cuda-db1(5432) failed
2011-09-21 16:23:01 ERROR: pid 18746: new_connection: create_cp() failed
2011-09-21 16:23:01 LOG:   pid 18746: notice_backend_error: 1 fail
over request from pid 18746
2011-09-21 16:23:01 LOG:   pid 23588: starting degeneration. shutdown
host cuda-db1(5432)
2011-09-21 16:23:01 LOG:   pid 23588: failover_handler: set new master node: 2
2011-09-21 16:23:01 LOG:   pid 23588: failover done. shutdown host
cuda-db1(5432)
2011-09-21 16:23:11 LOG:   pid 23588: find_primary_node:
pgpool_walrecrunning does not exist
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18884 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18894 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18896 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18901 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18905 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18943 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18949 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18961 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18964 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18971 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 18977 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19001 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19003 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19010 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19011 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19012 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19019 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19024 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19031 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19033 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19034 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19039 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19045 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19046 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19050 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19053 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19063 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19067 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19070 was
terminated by segmentation fault
2011-09-21 16:23:11 ERROR: pid 23588: Child process 19071 was
terminated by segmentation fault

Re: [Pgpool-general] pgpool segfaulting and dumping core

2011-09-22 Thread Lonni J Friedman
On Thu, Sep 22, 2011 at 8:58 AM, Tatsuo Ishii  wrote:
>> Greetings,
>> I'm running pgpool-II-3.0.4 on a Linux-x86_64 server, which is load
>> balancing for a cluster of 3 PostgreSQL-9.0.4 servers (1 master, 2
>> standby).  I'm using pgpool for load balancing only (not managing
>> streaming replication or failover).  Last night, pgpool started
>> segfaulting repeatedly, and dumping core each time.  What triggered
>> this was a DNS outage, which resulted in pgpool being unable to
>> connect to some of the database servers, and write queries failing, as
>> it couldn't send write queries to a 'master'.  While it's obviously not
>> pgpool's fault that the DNS stopped working, I would expect it to not
>> segfault simply because it can't send write queries to a master.
>>
>> Is this a known bug in 3.0.4 that is fixed in 3.1?  I've been holding
>> off on upgrading to 3.1 until there was a compelling reason to
>> upgrade.  Is someone interested in obtaining the core dumps for
>> analysis?
>
> Yes, I'm interested. Please show me a backtrace.

OK, here you go:


$ gdb /usr/sbin/pgpool coredump
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/pgpool...Reading symbols from
/usr/lib/debug/usr/sbin/pgpool.debug...done.
done.
[New Thread 18977]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install
/usr/lib/debug/.build-id/53/01d144075c3c79c94529528e5c39bc1a3c188e
Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libpq.so.5
Reading symbols from /usr/lib64/libpcp.so.0.0.0...Reading symbols from
/usr/lib/debug/usr/lib64/libpcp.so.0.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/libpcp.so.0.0.0
Reading symbols from /lib64/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpam.so.0
Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libssl.so.10
Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /usr/lib64/libldap_r-2.4.so.2...(no debugging
symbols found)...done.
Loaded symbols for /usr/lib64/libldap_r-2.4.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libaudit.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libaudit.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/libfreebl3.so...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/libfreebl3.so
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libkrb5support.so.0...(no debugging
symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging
symbols found)...done.
Loaded symbols for /usr/lib64/liblber-2.4.so.2
Reading symbols from /usr/lib64/libsasl2.so.2...(no debugg

[Pgpool-general] pgpool segfaulting and dumping core

2011-09-22 Thread Lonni J Friedman
Greetings,
I'm running pgpool-II-3.0.4 on a Linux-x86_64 server, which is load
balancing for a cluster of 3 PostgreSQL-9.0.4 servers (1 master, 2
standby).  I'm using pgpool for load balancing only (not managing
streaming replication or failover).  Last night, pgpool started
segfaulting repeatedly, and dumping core each time.  What triggered
this was a DNS outage, which resulted in pgpool being unable to
connect to some of the database servers, and write queries failing, as
it couldn't send write queries to a 'master'.  While it's obviously not
pgpool's fault that the DNS stopped working, I would expect it to not
segfault simply because it can't send write queries to a master.

Is this a known bug in 3.0.4 that is fixed in 3.1?  I've been holding
off on upgrading to 3.1 until there was a compelling reason to
upgrade.  Is someone interested in obtaining the core dumps for
analysis?

thanks


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-20 Thread Lonni J Friedman
Never mind, I figured out what I was doing wrong.  Now I just need
this to hang again.

On Tue, Sep 20, 2011 at 9:08 AM, Lonni J Friedman  wrote:
> I tried to do that, but pgpool refuses to start, reporting:
> -bash: /usr/lib/debug/usr/sbin/pgpool.debug: bad ELF interpreter: No
> such file or directory
>
> I'm puzzled why it fails, as it was built on the same server where I
> built the (working) release build of pgpool.
>
> $ file /usr/lib/debug/usr/sbin/pgpool.debug
> /usr/lib/debug/usr/sbin/pgpool.debug: ELF 64-bit LSB executable,
> x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs),
> for GNU/Linux 2.6.18, not stripped
> $ file /usr/sbin/pgpool
> /usr/sbin/pgpool: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
>
>
>
>
> On Mon, Sep 19, 2011 at 7:10 PM, Tatsuo Ishii  wrote:
>> It's really hard to find the cause of the problem from a stack trace
>> without symbol tables... Is it possible to reinstall the pgpool binary
>> with debug symbols?
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>> This happened again.  I ran the gdb command that you requested,
>>> however it occurred to me that the output may not be all that useful
>>> since I'm not running a debug build of pgpool:
>>> ###
>>> # gdb pgpool 2343
>>> GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6)
>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later 
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-redhat-linux-gnu".
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>...
>>> Reading symbols from /usr/sbin/pgpool...(no debugging symbols found)...done.
>>> Attaching to program: /usr/sbin/pgpool, process 2343
>>> Reading symbols from /usr/lib64/libpq.so.5...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib64/libpq.so.5
>>> Reading symbols from /usr/lib64/libpcp.so.0...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib64/libpcp.so.0
>>> Reading symbols from /lib64/libpam.so.0...(no debugging symbols 
>>> found)...done.
>>> Loaded symbols for /lib64/libpam.so.0
>>> Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /usr/lib64/libssl.so.10
>>> Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging
>>> symbols found)...done.
>>> Loaded symbols for /usr/lib64/libcrypto.so.10
>>> Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols 
>>> found)...done.
>>> Loaded symbols for /lib64/libcrypt.so.1
>>> Reading symbols from /lib64/libresolv.so.2...(no debugging symbols
>>> found)...done.
>>> Loaded symbols for /lib64/libresolv.so.2
>>> Reading symbols from /lib64/libnsl.so.1...(no debugging symbols 
>>> found)...done.
>>> Loaded symbols for /lib64/libnsl.so.1
>>> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib64/libm.so.6
>>> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
>>> Loaded symbols for /lib64/libc.so.6
>>> Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging
>>> symbols found)...done.
>>> Loaded symbols for /lib64/libgssapi_krb5.so.2
>>> Reading symbols from /usr/lib64/libldap_r-2.4.so.2...(no debugging
>>> symbols found)...done.
>>> Loaded symbols for /usr/lib64/libldap_r-2.4.so.2
>>> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
>>> found)...done.
>>> [Thread debugging using libthread_db enabled]
>>> Loaded symbols for /lib64/libpthread.so.0
>>> Reading symbols from /lib64/libaudit.so.1...(no debugging symbols 
>>> found)...done.
>>> Loaded symbols for /lib64/libaudit.so.1
>>> Reading symbols from /lib64/libdl.so.2...(no debugging symbols 
>>> found)...done.
>>> Loaded symbols for /lib64/libdl.so.2
>>> Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols 
>>> found)...done.
>

Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-20 Thread Lonni J Friedman
ging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libfreebl3.so
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /lib64/libkrb5support.so.0...(no debugging
>> symbols found)...done.
>> Loaded symbols for /lib64/libkrb5support.so.0
>> Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libkeyutils.so.1
>> Reading symbols from /usr/lib64/liblber-2.4.so.2...(no debugging
>> symbols found)...done.
>> Loaded symbols for /usr/lib64/liblber-2.4.so.2
>> Reading symbols from /usr/lib64/libsasl2.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /usr/lib64/libsasl2.so.2
>> Reading symbols from /lib64/libselinux.so.1...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libselinux.so.1
>> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libnss_files.so.2
>> Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib64/libnss_dns.so.2
>> 0x000000000044790a in ?? ()
>> Missing separate debuginfos, use: debuginfo-install
>> pgpool-II-3.0.4-1.el6.x86_64
>> (gdb) bt
>> #0  0x000000000044790a in ?? ()
>> #1  0x0000000000414547 in ?? ()
>> #2  0x000000000041762e in ?? ()
>> #3  0x000000000040a4cd in ?? ()
>> #4  0x0000000000405345 in ?? ()
>> #5  0x00000000004068dc in ?? ()
>> #6  0x00000000004076dc in ?? ()
>> #7  0x00000031ae41ec5d in __libc_start_main () from /lib64/libc.so.6
>> #8  0x0000000000403bf9 in ?? ()
>> #9  0x00007fff0663cfc8 in ?? ()
>> #10 0x000000000000001c in ?? ()
>> #11 0x0000000000000004 in ?? ()
>> #12 0x00007fff0663d90b in ?? ()
>> #13 0x00007fff0663dfe6 in ?? ()
>> #14 0x00007fff0663dfe6 in ?? ()
>> #15 0x00007fff0663dfe6 in ?? ()
>> #16 0x0000000000000000 in ?? ()
>> (gdb) cont
>> Continuing.
>>
>> ###
>>
>> The entire session completely hung at the end there.
>>
>>
>> On Wed, Sep 14, 2011 at 3:57 PM, Tatsuo Ishii  wrote:
>>> Please use gdb. For example,
>>>
>>> become postgres user (or root user)
>>> gdb pgpool 29191
>>> bt
>>> cont
>>> bt
>>> cont
>>> :
>>> :
>>> :
>>>
>>> This will give us an idea where it's looping.
>>> --
>>> Tatsuo Ishii
>>> SRA OSS, Inc. Japan
>>> English: http://www.sraoss.co.jp/index_en.php
>>> Japanese: http://www.sraoss.co.jp
>>>
>>>> This problem has returned yet again:
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool
>>>>
>>>> postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40
>>>> pgpool: lfriedman nightly 10.31.96.84(61698) idle
>>>>
>>>>
>>>> I'd really appreciate some input on how to debug this.
>>>>
>>>>
>>>> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman  
>>>> wrote:
>>>>> No one else has experienced this or has suggestions how to debug it?
>>>>>
>>>>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman  
>>>>> wrote:
>>>>>> Greetings,
>>>>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>>>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>>>>> standby).  I'm seeing strange behavior where a single pgpool process
>>>>>> seems to hang after some period of time, and then consume 100% of the
>>>>>> CPU.  I've seen this behavior happen twice since last Friday (when
>>>>>> pgpool was brought online in my production environment).  At the
>>>>>> moment the current hung process looks like this in 'ps auxww' output:
>>>>>>
>>>>>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
>>>>>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>>>>
>>>>>>
>>>>>> In top, I see:
>>>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>>>>>
>>>>>>
>>>>>> When I connect to the process with strace, there is no output, so I'm
>>>>>> guessing the process is stuck spinning somewhere:
>>>>>> # strace -p 19838
>>>>>> Process 19838 attached - interrupt to quit
>>>>>> ...
>>>>>> ^CProcess 19838 detached
>>>>>>
>>>>>> One thing that I'm certain of is that the client IP (10.31.45.20)
>>>>>> associated with the hung process has rebooted at least once since that
>>>>>> process was spawned.  So pgpool seems to be in some confused state, as
>>>>>> the client definitely severed the connection already.  I checked the
>>>>>> pgpool log and there are no explicit references to PID 19838.  I'm at
>>>>>> a loss how to debug this further, but clearly something is wrong
>>>>>> somewhere, and this isn't normal/expected behavior.


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-18 Thread Lonni J Friedman
0b in ?? ()
#13 0x00007fff0663dfe6 in ?? ()
#14 0x00007fff0663dfe6 in ?? ()
#15 0x00007fff0663dfe6 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) cont
Continuing.

###

The entire session completely hung at the end there.


On Wed, Sep 14, 2011 at 3:57 PM, Tatsuo Ishii  wrote:
> Please use gdb. For example,
>
> become postgres user (or root user)
> gdb pgpool 29191
> bt
> cont
> bt
> cont
> :
> :
> :
>
> This will give us an idea where it's looping.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> This problem has returned yet again:
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool
>>
>> postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40
>> pgpool: lfriedman nightly 10.31.96.84(61698) idle
>>
>>
>> I'd really appreciate some input on how to debug this.
>>
>>
>> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman  wrote:
>>> No one else has experienced this or has suggestions how to debug it?
>>>
>>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman  
>>> wrote:
>>>> Greetings,
>>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>>> standby).  I'm seeing strange behavior where a single pgpool process
>>>> seems to hang after some period of time, and then consume 100% of the
>>>> CPU.  I've seen this behavior happen twice since last Friday (when
>>>> pgpool was brought online in my production environment).  At the
>>>> moment the current hung process looks like this in 'ps auxww' output:
>>>>
>>>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
>>>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>>
>>>>
>>>> In top, I see:
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>>>
>>>>
>>>> When I connect to the process with strace, there is no output, so I'm
>>>> guessing the process is stuck spinning somewhere:
>>>> # strace -p 19838
>>>> Process 19838 attached - interrupt to quit
>>>> ...
>>>> ^CProcess 19838 detached
>>>>
>>>> One thing that I'm certain of is that the client IP (10.31.45.20)
>>>> associated with the hung process has rebooted at least once since that
>>>> process was spawned.  So pgpool seems to be in some confused state, as
>>>> the client definitely severed the connection already.  I checked the
>>>> pgpool log and there are no explicit references to PID 19838.  I'm at
>>>> a loss how to debug this further, but clearly something is wrong
>>>> somewhere, and this isn't normal/expected behavior.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-15 Thread Lonni J Friedman
On Thu, Sep 15, 2011 at 8:57 AM, Johnny Tan  wrote:
> On Wed, Sep 14, 2011 at 9:12 PM, Lonni J Friedman  wrote:
>> On Wed, Sep 14, 2011 at 6:00 PM, Tatsuo Ishii  wrote:
>>>> On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii  wrote:
>>>>>>>> I'm pretty sure that's not the case as the messages stop whenever
>>>>>>>> pgpool isn't running, they were not present prior to using pgpool, and
>>>>>>>> pg_hba.conf is setup such that the database servers only accept
>>>>>>>> connections from each other, and the server running pgpool.  None of
>>>>>>>> these servers have normal users connected directly to them (such as
>>>>>>>> with ssh), nor are they running anything that would connect to the
>>>>>>>> database as a client.  Also, the volume of these messages are such
>>>>>>>> that something significant has to be causing them.  Last night, in the
>>>>>>>> span of 5 minutes, there were 117 of these messages.
>>>>>>>
>>>>>>> Ok. I would like to narrow down the reason why we have the "unexpected
>>>>>>> EOF on client connection" message so frequently. I think there are
>>>>>>> currently two possibilities:
>>>>>>>
>>>>>>> 1) a pgpool child was killed for some unknown reason (we can omit
>>>>>>>   the segfault case because you don't see it in the pgpool log)
>>>>>>>
>>>>>>> 2) a pgpool child disconnects from PostgreSQL in an ungraceful manner
>>>>>>>
>>>>>>> For 1) I would like to know if the pgpool child processes have been
>>>>>>> fine since they were spawned. Have you seen any pgpool child process
>>>>>>> disappear since pgpool started?
>>>>>>
>>>>>> I assume this should be determined by num_init_children (which I've
>>>>>> set to 195 in pgpool.conf)?  If so, then I currently have 195
>>>>>> processes in either the "wait for connection request" state or
>>>>>> actively connected state.
>>>>>
>>>>> No. The pgpool parent process automatically respawns a child process
>>>>> if it dies, so having num_init_children child processes does not tell
>>>>> us anything useful. Record the 195 process IDs and compare them with
>>>>> the current process IDs. If some of them have changed, we can assume
>>>>> that child processes are dying.
>>>>
>>>> Ah, good point.  I just diffed the list of PIDs associated with pgpool
>>>> processes before and after another EOF message in the log, and there
>>>> were no differences.  So I think that rules out any processes dying?
>>>
>>> Right.
>>>
>>>> One other thing that I just noticed from comparing logs between all of
>>>> the database servers is that the time stamps for every one of the
>>>> 'unexpected EOF on client connection' instances are identical.  In
>>>> other words, they are happening at the same time on each server.  I
>>>> think this further suggests that pgpool has to be doing it?
>>>
>>> Yes, I think so unless you set connection_life_time to other than 0 or
>>> the network connection between PostgreSQL and pgpool is unstable.
>
> I've used both pgpool and pgbouncer. We also have recurring similar
> client-EOF messages in both pgpool and pgbouncer logs. On pgbouncer,
> the actual error is:
>
> LOG C-0x7db270: mydb/myuser@10.0.0.160:43057 closing because: client
> unexpected eof (age=44005)
>
> On pgpool side, the error is exactly the same as Lonni's.
>
> So, from that, I had concluded that it was from our app side. But I
> have nothing further to substantiate that conclusion. Just adding it
> as another datapoint.
>
> Lonni, given how easy pgbouncer is to setup, it may be worth doing a
> quick proof-of-concept with it and see if you get similar EOF errors
> there. If so, I think that would eliminate the problem stemming from
> the middleware?

Actually I was using pgbouncer prior to switching to pgpool, and there
were never any errors related to unexpected EOF.  This definitely
started after I switched to pgpool.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-14 Thread Lonni J Friedman
On Wed, Sep 14, 2011 at 6:00 PM, Tatsuo Ishii  wrote:
>> On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii  wrote:
>>>>>> I'm pretty sure that's not the case as the messages stop whenever
>>>>>> pgpool isn't running, they were not present prior to using pgpool, and
>>>>>> pg_hba.conf is setup such that the database servers only accept
>>>>>> connections from each other, and the server running pgpool.  None of
>>>>>> these servers have normal users connected directly to them (such as
>>>>>> with ssh), nor are they running anything that would connect to the
>>>>>> database as a client.  Also, the volume of these messages are such
>>>>>> that something significant has to be causing them.  Last night, in the
>>>>>> span of 5 minutes, there were 117 of these messages.
>>>>>
>>>>> Ok. I would like to narrow down the reason why we have the "unexpected
>>>>> EOF on client connection" message so frequently. I think there are
>>>>> currently two possibilities:
>>>>>
>>>>> 1) a pgpool child was killed for some unknown reason (we can omit
>>>>>   the segfault case because you don't see it in the pgpool log)
>>>>>
>>>>> 2) a pgpool child disconnects from PostgreSQL in an ungraceful manner
>>>>>
>>>>> For 1) I would like to know if the pgpool child processes have been
>>>>> fine since they were spawned. Have you seen any pgpool child process
>>>>> disappear since pgpool started?

>>>> I assume this should be determined by num_init_children (which I've
>>>> set to 195 in pgpool.conf)?  If so, then I currently have 195
>>>> processes in either the "wait for connection request" state or
>>>> actively connected state.
>>>
>>> No. The pgpool parent process automatically respawns a child process
>>> if it dies, so having num_init_children child processes does not tell
>>> us anything useful. Record the 195 process IDs and compare them with
>>> the current process IDs. If some of them have changed, we can assume
>>> that child processes are dying.
>>
>> Ah, good point.  I just diffed the list of PIDs associated with pgpool
>> processes before and after another EOF message in the log, and there
>> were no differences.  So I think that rules out any processes dying?
>
> Right.
>
>> One other thing that I just noticed from comparing logs between all of
>> the database servers is that the time stamps for every one of the
>> 'unexpected EOF on client connection' instances are identical.  In
>> other words, they are happening at the same time on each server.  I
>> think this further suggests that pgpool has to be doing it?
>
> Yes, I think so unless you set connection_life_time to other than 0 or
> the network connection between PostgreSQL and pgpool is unstable.

connection_life_time is currently 0 (since you recommended I change
it earlier).  I don't have any evidence to suggest that the network
connection is unstable.  There are 0 errors of any kind in ifconfig
output.

>
> Let me think how we can make further investigation...

ok, thanks.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-14 Thread Lonni J Friedman
On Wed, Sep 14, 2011 at 4:22 PM, Tatsuo Ishii  wrote:
>>>> I'm pretty sure that's not the case as the messages stop whenever
>>>> pgpool isn't running, they were not present prior to using pgpool, and
>>>> pg_hba.conf is setup such that the database servers only accept
>>>> connections from each other, and the server running pgpool.  None of
>>>> these servers have normal users connected directly to them (such as
>>>> with ssh), nor are they running anything that would connect to the
>>>> database as a client.  Also, the volume of these messages are such
>>>> that something significant has to be causing them.  Last night, in the
>>>> span of 5 minutes, there were 117 of these messages.
>>>
>>> Ok. I would like to narrow down the reason why we have the "unexpected
>>> EOF on client connection" message so frequently. I think there are
>>> currently two possibilities:
>>>
>>> 1) a pgpool child was killed for some unknown reason (we can omit
>>>   the segfault case because you don't see it in the pgpool log)
>>>
>>> 2) a pgpool child disconnects from PostgreSQL in an ungraceful manner
>>>
>>> For 1) I would like to know if the pgpool child processes have been
>>> fine since they were spawned. Have you seen any pgpool child process
>>> disappear since pgpool started?
>>
>> I assume this should be determined by num_init_children (which I've
>> set to 195 in pgpool.conf)?  If so, then I currently have 195
>> processes in either the "wait for connection request" state or
>> actively connected state.
>
> No. The pgpool parent process automatically respawns a child process
> if it dies, so having num_init_children child processes does not tell
> us anything useful. Record the 195 process IDs and compare them with
> the current process IDs. If some of them have changed, we can assume
> that child processes are dying.

Ah, good point.  I just diffed the list of PIDs associated with pgpool
processes before and after another EOF message in the log, and there
were no differences.  So I think that rules out any processes dying?
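
The PID comparison described above can be sketched in a few lines of
Python.  This is only an illustration under assumptions: the function
name diff_pids and the sample PID lists are made up, and the two
snapshots are assumed to have been collected separately (for example
from the PID column of ps output) before and after an EOF message
appears in the log.

```python
def diff_pids(before, after):
    """Return (disappeared, appeared) PID sets between two snapshots."""
    before, after = set(before), set(after)
    return before - after, after - before


# Hypothetical snapshots of pgpool child PIDs taken at two points in time.
snapshot_before = [19063, 19067, 19070, 19071]
snapshot_after = [19063, 19067, 19070, 20114]

gone, new = diff_pids(snapshot_before, snapshot_after)
print("disappeared:", sorted(gone))
print("new:", sorted(new))
```

If both sets come back empty, no child was respawned between the two
snapshots, which is the result reported above.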

One other thing that I just noticed from comparing logs between all of
the database servers is that the time stamps for every one of the
'unexpected EOF on client connection' instances are identical.  In
other words, they are happening at the same time on each server.  I
think this further suggests that pgpool has to be doing it?
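
The timestamp comparison above could be automated roughly as follows.
This is a sketch, not a tested tool: the line format is assumed from
the two log excerpts quoted elsewhere in this thread (PID, date, time,
time zone, then "LOG:"), which depends on log_line_prefix, and the
server names and the second server's PIDs are hypothetical.

```python
import re

# Assumed line shape, based on the excerpts in this thread:
#   "26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection"
EOF_LINE = re.compile(
    r"^\d+ (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ LOG:\s+"
    r"unexpected EOF on client connection"
)


def eof_timestamps(lines):
    """Collect the timestamps of all unexpected-EOF log lines."""
    stamps = set()
    for line in lines:
        match = EOF_LINE.match(line)
        if match:
            stamps.add(match.group(1))
    return stamps


def common_timestamps(logs_by_server):
    """Timestamps at which every server logged an unexpected EOF."""
    per_server = [eof_timestamps(lines) for lines in logs_by_server.values()]
    return set.intersection(*per_server) if per_server else set()


# Hypothetical log excerpts from two database servers.
logs = {
    "db1": [
        "26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection",
        "3933 2011-09-13 09:36:20 PDT LOG:  unexpected EOF on client connection",
    ],
    "db2": [
        "1187 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection",
    ],
}
print(sorted(common_timestamps(logs)))
```

Timestamps that appear on every server point at a common cause on the
pgpool side rather than at a single backend.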


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-14 Thread Lonni J Friedman
Thanks for your reply.  I'll do this the next time this happens (which
will likely be within a few days based on history).

On Wed, Sep 14, 2011 at 3:57 PM, Tatsuo Ishii  wrote:
> Please use gdb. For example,
>
> become postgres user (or root user)
> gdb pgpool 29191
> bt
> cont
> bt
> cont
> :
> :
> :
>
> This will give us an idea where it's looping.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
>> This problem has returned yet again:
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool
>>
>> postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40
>> pgpool: lfriedman nightly 10.31.96.84(61698) idle
>>
>>
>> I'd really appreciate some input on how to debug this.
>>
>>
>> On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman  wrote:
>>> No one else has experienced this or has suggestions how to debug it?
>>>
>>> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman  
>>> wrote:
>>>> Greetings,
>>>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>>>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>>>> standby).  I'm seeing strange behavior where a single pgpool process
>>>> seems to hang after some period of time, and then consume 100% of the
>>>> CPU.  I've seen this behavior happen twice since last Friday (when
>>>> pgpool was brought online in my production environment).  At the
>>>> moment the current hung process looks like this in 'ps auxww' output:
>>>>
>>>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
>>>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>>>
>>>>
>>>> In top, I see:
>>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>>>
>>>>
>>>> When I connect to the process with strace, there is no output, so I'm
>>>> guessing the process is stuck spinning somewhere:
>>>> # strace -p 19838
>>>> Process 19838 attached - interrupt to quit
>>>> ...
>>>> ^CProcess 19838 detached
>>>>
>>>> One thing that I'm certain of is that the client IP (10.31.45.20)
>>>> associated with the hung process has rebooted at least once since that
>>>> process was spawned.  So pgpool seems to be in some confused state, as
>>>> the client definitely severed the connection already.  I checked the
>>>> pgpool log and there are no explicit references to PID 19838.  I'm at
>>>> a loss how to debug this further, but clearly something is wrong
>>>> somewhere, and this isn't normal/expected behavior.
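
The attach / bt / cont cycle suggested above can also be approximated
without an interactive session, which may help if the gdb session
itself hangs after "cont".  The following is a minimal sketch under
assumptions: the helper names are invented, gdb must be on the PATH and
run as root or the postgres user, and each sample attaches, prints one
backtrace and exits again (gdb's batch mode) instead of staying
attached.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb call: attach, print a backtrace, exit."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def sample_backtraces(pid, samples=5, interval=1.0):
    """Take several backtraces of a running process, a short pause apart."""
    traces = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            check=False,
        )
        traces.append(result.stdout)
        time.sleep(interval)
    return traces


# Example (not executed here): sample_backtraces(29191, samples=10)
```

Frames that show up in every sample are the likely location of the
loop.  Without debug symbols the frames will still read "?? ()", so
this is only useful once the debuginfo package is installed.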


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-14 Thread Lonni J Friedman
On Wed, Sep 14, 2011 at 3:56 PM, Tatsuo Ishii  wrote:
>> On Tue, Sep 13, 2011 at 8:48 PM, Tatsuo Ishii  wrote:
>>>> On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman  
>>>> wrote:
>>>>> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii  wrote:
>>>>>>>> I couldn't find anything possibly related to your problem at first
>>>>>>>> glance (in theory client_idle_limit and authentication_timeout are not
>>>>>>>> related, but you might want to change them to see whether anything
>>>>>>>> changes).
>>>>>>>
>>>>>>> OK, I'll give that a try.  Should I just try increasing them by 10 or 
>>>>>>> 20s?
>>>>>>
>>>>>> I'd suggest setting them to 0. This disables the functionality
>>>>>> those directives control.
>>>>>>
>>>>>> Also, you have child_life_time set to 300. I don't expect this is
>>>>>> related, but could you set it to 0 and see whether anything changes,
>>>>>> just in case?
>>>>>
>>>>> OK, I'll make those changes tomorrow (it's late in the day here, and I
>>>>> don't want to introduce potential problems in the middle of the night
>>>>> when no one is closely monitoring the server), and let you know if
>>>>> they have any impact.
>>>>
>>>>
>>>> client_idle_limit was already 0.  I set authentication_timeout=0 and
>>>> child_life_time=0, and restarted pgpool, however that had no impact.
>>>> I'm still seeing:
>>>> 26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection
>>>> 3933 2011-09-13 09:36:20 PDT LOG:  unexpected EOF on client connection
>>>
>>> Hmm. Is it possible that those connections do not come from the pgpool
>>> process?
>>
>> I'm pretty sure that's not the case as the messages stop whenever
>> pgpool isn't running, they were not present prior to using pgpool, and
>> pg_hba.conf is setup such that the database servers only accept
>> connections from each other, and the server running pgpool.  None of
>> these servers have normal users connected directly to them (such as
>> with ssh), nor are they running anything that would connect to the
>> database as a client.  Also, the volume of these messages are such
>> that something significant has to be causing them.  Last night, in the
>> span of 5 minutes, there were 117 of these messages.
>
> Ok. I would like to narrow down the reason why we have the "unexpected
> EOF on client connection" message so frequently. I think there are
> currently two possibilities:
>
> 1) a pgpool child was killed for some unknown reason (we can omit
>   the segfault case because you don't see it in the pgpool log)
>
> 2) a pgpool child disconnects from PostgreSQL in an ungraceful manner
>
> For 1) I would like to know if the pgpool child processes have been
> fine since they were spawned. Have you seen any pgpool child process
> disappear since pgpool started?

I assume this should be determined by num_init_children (which I've
set to 195 in pgpool.conf)?  If so, then I currently have 195
processes in either the "wait for connection request" state or
actively connected state.
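
A rough way to answer the "has any child disappeared" question is to take two snapshots of the pgpool child PIDs and compare them. The sketch below is only an illustration: it assumes `ps auxww`-style output with each process on one unwrapped line and child processes carrying a `pgpool:` title, as in the listings in this thread. The function name and the sample PIDs are made up.

```python
from typing import Set


def pgpool_child_pids(ps_output: str) -> Set[int]:
    """Return the PIDs of lines whose command column starts with 'pgpool:'."""
    pids = set()
    for line in ps_output.splitlines():
        # ps auxww prints 10 columns before the command, so split at most 10 times.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[10].startswith("pgpool:") and fields[1].isdigit():
            pids.add(int(fields[1]))
    return pids


# Hypothetical snapshots taken some time apart.
before = pgpool_child_pids(
    "postgres 100 0.0 0.0 68856 2904 ? S Sep13 0:00 pgpool: wait for connection request\n"
    "postgres 101 0.0 0.0 68856 2904 ? S Sep13 0:00 pgpool: wait for connection request\n"
)
after = pgpool_child_pids(
    "postgres 100 0.0 0.0 68856 2904 ? S Sep13 0:00 pgpool: wait for connection request\n"
    "postgres 205 0.0 0.0 68856 2904 ? S Sep13 0:00 pgpool: wait for connection request\n"
)
print(sorted(before - after))  # PIDs that went away between the two snapshots
```

A constant process count alone does not prove that no child died, since a replacement child would keep the total at num_init_children; comparing PIDs is what would show a turnover.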


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-14 Thread Lonni J Friedman
This problem has returned yet again:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29191 postgres  20   0 80192  14m 1544 R 89.8  0.2  51:15.91 pgpool

postgres 29191  3.4  0.1  80192 14728 ?        R    Sep13  51:40
pgpool: lfriedman nightly 10.31.96.84(61698) idle


I'd really appreciate some input on how to debug this.


On Fri, Sep 9, 2011 at 8:11 AM, Lonni J Friedman  wrote:
> No one else has experienced this or has suggestions how to debug it?
>
> On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman  wrote:
>> Greetings,
>> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
>> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
>> standby).  I'm seeing strange behavior where a single pgpool process
>> seems to hang after some period of time, and then consume 100% of the
>> CPU.  I've seen this behavior happen twice since last Friday (when
>> pgpool was brought online in my production environment).  At the
>> moment the current hung process looks like this in 'ps auxww' output:
>>
>> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
>> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>>
>>
>> In top, I see:
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>>
>>
>> When I connect to the process with strace, there is no output, so I'm
>> guessing the process is stuck spinning somewhere:
>> # strace -p 19838
>> Process 19838 attached - interrupt to quit
>> ...
>> ^CProcess 19838 detached
>>
>> One thing that I'm certain of is that the client IP (10.31.45.20)
>> associated with the hung process has rebooted at least once since that
>> process was spawned.  So pgpool seems to be in some confused state, as
>> the client definitely severed the connection already.  I checked the
>> pgpool log and there are no explicit references to PID 19838.  I'm at
>> a loss how to debug this further, but clearly something is wrong
>> somewhere, and this isn't normal/expected behavior.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-14 Thread Lonni J Friedman
On Tue, Sep 13, 2011 at 8:48 PM, Tatsuo Ishii  wrote:
>> On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman  wrote:
>>> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii  wrote:
>>>>>> I couldn't find anything possibly related to your problem at first
>>>>>> glance (in theory client_idle_limit and authentication_timeout are not
>>>>>> related, but you might want to change them to see whether anything
>>>>>> changes).
>>>>>
>>>>> OK, I'll give that a try.  Should I just try increasing them by 10 or 20s?
>>>>
>>>> I'd suggest setting them to 0. This disables the functionality that
>>>> those directives control.
>>>>
>>>> Also, you have child_life_time set to 300. I don't expect this is
>>>> related, but could you set it to 0 and see whether anything changes,
>>>> just in case?
>>>
>>> OK, I'll make those changes tomorrow (it's late in the day here, and I
>>> don't want to introduce potential problems in the middle of the night
>>> when no one is closely monitoring the server), and let you know if
>>> they have any impact.
>>
>>
>> client_idle_limit was already 0.  I set authentication_timeout=0 and
>> child_life_time=0, and restarted pgpool, however that had no impact.
>> I'm still seeing:
>> 26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection
>> 3933 2011-09-13 09:36:20 PDT LOG:  unexpected EOF on client connection
>
> Humm. Is it possible that those connections do not come from pgpool
> process?

I'm pretty sure that's not the case as the messages stop whenever
pgpool isn't running, they were not present prior to using pgpool, and
pg_hba.conf is set up such that the database servers only accept
connections from each other, and the server running pgpool.  None of
these servers have normal users connected directly to them (such as
with ssh), nor are they running anything that would connect to the
database as a client.  Also, the volume of these messages is such
that something significant has to be causing them.  Last night, in the
span of 5 minutes, there were 117 of these messages.
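
To put numbers on the volume, the EOF messages can be bucketed per minute straight from the PostgreSQL log. This is a minimal sketch, assuming log lines shaped exactly like the two samples quoted in this thread (PID, timestamp, time zone, then `LOG:`); the function name is hypothetical, and a different log_line_prefix would need a different pattern.

```python
import re
from collections import Counter

# Matches lines such as:
# 26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection
EOF_LINE = re.compile(
    r"^(?P<pid>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<hm>\d{2}:\d{2}):\d{2} \S+ "
    r"LOG:\s+unexpected EOF on client connection"
)


def eof_counts_per_minute(log_text: str) -> Counter:
    """Count 'unexpected EOF' lines, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EOF_LINE.match(line)
        if m:
            counts[m["date"] + " " + m["hm"]] += 1
    return counts


sample = (
    "26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection\n"
    "3933 2011-09-13 09:36:20 PDT LOG:  unexpected EOF on client connection\n"
)
for minute, n in sorted(eof_counts_per_minute(sample).items()):
    print(minute, n)
```

Lining the busiest minutes up against the pgpool log for the same timestamps might show which pgpool activity coincides with the bursts.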


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-13 Thread Lonni J Friedman
On Mon, Sep 12, 2011 at 6:47 PM, Lonni J Friedman  wrote:
> On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii  wrote:
>>>> I couldn't find anything possibly related to your problem at first
>>>> glance (in theory client_idle_limit and authentication_timeout are not
>>>> related, but you might want to change them to see whether anything
>>>> changes).
>>>
>>> OK, I'll give that a try.  Should I just try increasing them by 10 or 20s?
>>
>> I'd suggest setting them to 0. This disables the functionality that
>> those directives control.
>>
>> Also, you have child_life_time set to 300. I don't expect this is
>> related, but could you set it to 0 and see whether anything changes,
>> just in case?
>
> OK, I'll make those changes tomorrow (it's late in the day here, and I
> don't want to introduce potential problems in the middle of the night
> when no one is closely monitoring the server), and let you know if
> they have any impact.


client_idle_limit was already 0.  I set authentication_timeout=0 and
child_life_time=0, and restarted pgpool, however that had no impact.
I'm still seeing:
26323 2011-09-13 09:28:19 PDT LOG:  unexpected EOF on client connection
3933 2011-09-13 09:36:20 PDT LOG:  unexpected EOF on client connection


>>>> Do you have anything between pgpool and PostgreSQL? It has been
>>>> reported that some firewall hardware/software kills TCP connections if
>>>> they are idle for n seconds.
>>>
>>> Nope, there are no firewalls, or anything else that I'm aware of
>>> sitting between pgpool and the database servers.
>>
>> Ok.
>>
>> Another possibility is that a pgpool child process is dying for an
>> unknown reason. Do you see anything bad (for example, a child dying
>> from a segfault) in the pgpool log?
>
> I don't believe that's happening, but I'll have to check the logs
> tomorrow to verify.  At the moment, the only truly bad behavior that
> I'm experiencing is this:
> http://lists.pgfoundry.org/pipermail/pgpool-general/2011-September/003954.html

I see no explicit segfaults in the pgpool logs.

thanks for your help.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-12 Thread Lonni J Friedman
On Mon, Sep 12, 2011 at 6:39 PM, Tatsuo Ishii  wrote:
>>> I couldn't find anything possibly related to your problem at first
>>> glance (in theory client_idle_limit and authentication_timeout are not
>>> related, but you might want to change them to see whether anything
>>> changes).
>>
>> OK, I'll give that a try.  Should I just try increasing them by 10 or 20s?
>
> I'd suggest setting them to 0. This disables the functionality that
> those directives control.
>
> Also, you have child_life_time set to 300. I don't expect this is
> related, but could you set it to 0 and see whether anything changes,
> just in case?

OK, I'll make those changes tomorrow (it's late in the day here, and I
don't want to introduce potential problems in the middle of the night
when no one is closely monitoring the server), and let you know if
they have any impact.

>
>>> Do you have anything between pgpool and PostgreSQL? It has been
>>> reported that some firewall hardware/software kills TCP connections if
>>> they are idle for n seconds.
>>
>> Nope, there are no firewalls, or anything else that I'm aware of
>> sitting between pgpool and the database servers.
>
> Ok.
>
> Another possibility is that a pgpool child process is dying for an
> unknown reason. Do you see anything bad (for example, a child dying
> from a segfault) in the pgpool log?

I don't believe that's happening, but I'll have to check the logs
tomorrow to verify.  At the moment, the only truly bad behavior that
I'm experiencing is this:
http://lists.pgfoundry.org/pipermail/pgpool-general/2011-September/003954.html


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-12 Thread Lonni J Friedman
On Mon, Sep 12, 2011 at 5:14 PM, Tatsuo Ishii  wrote:
>> On Mon, Sep 12, 2011 at 4:16 PM, Tatsuo Ishii  wrote:
>>>> Yes, all connections defined in pool_hba.conf are trust auth.
>>>> However, I also have
>>>> health_check_period = 0
>>>> in pgpool.conf, so I'd assume that no health checks are being performed?
>>>
>>> Have you changed child_life_time or something from defaults? I would
>>> like to take a look at your pgpool.conf.
>>
>> Nope, I'm using the default of 300.  Anyway, pgpool.conf attached.
>> Thanks for looking at this.
>
> I couldn't find anything possibly related to your problem at first
> glance (in theory client_idle_limit and authentication_timeout are not
> related, but you might want to change them to see whether anything
> changes).

OK, I'll give that a try.  Should I just try increasing them by 10 or 20s?

>
> Do you have anything between pgpool and PostgreSQL? It has been
> reported that some firewall hardware/software kills TCP connections if
> they are idle for n seconds.

Nope, there are no firewalls, or anything else that I'm aware of
sitting between pgpool and the database servers.


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-12 Thread Lonni J Friedman
On Mon, Sep 12, 2011 at 4:16 PM, Tatsuo Ishii  wrote:
>> Yes, all connections defined in pool_hba.conf are trust auth.
>> However, I also have
>> health_check_period = 0
>> in pgpool.conf, so I'd assume that no health checks are being performed?
>
> Have you changed child_life_time or something from defaults? I would
> like to take a look at your pgpool.conf.

Nope, I'm using the default of 300.  Anyway, pgpool.conf attached.
Thanks for looking at this.


pgpool.conf
Description: Binary data


Re: [Pgpool-general] pgpool thinks a backend is down even though its not

2011-09-12 Thread Lonni J Friedman
Thanks for your quick reply!  I figured this out on my own about 2
minutes ago.  All good now.

On Mon, Sep 12, 2011 at 10:24 AM,   wrote:
> You have to use pcp_attach_node command to re-attach that node. Pgpool
> doesn't know if a database that went down is in good shape even if it is
> back online. So, after you perform the synching (it seems like you did),
> call the pcp_attach_node to bring it back to pgpool's pool of databases.
>
> -Daniel
>
>
>> -Original Message-
>> From: pgpool-general-boun...@pgfoundry.org [mailto:pgpool-general-
>> boun...@pgfoundry.org] On Behalf Of Lonni J Friedman
>> Sent: Monday, September 12, 2011 1:14 PM
>> To: pgpool-general@pgfoundry.org
>> Subject: [Pgpool-general] pgpool thinks a backend is down even though
>> its not
>>
>> Greetings,
>> I've got a 3 node postgresql-9.0.4 cluster (1 master, two standbys), all
>> running on Linux-x86_64.  I had a hardware problem on one of the
>> standbys, and had to bring it down to swap out the bad HW.  I got it
>> synced back up with the master successfully, and I can successfully
>> manually run SQL queries from the pgpool server to the standby.
>> However, pgpool is convinced that the standby is still down:
>> read_status_file: 1 th backend is set to down status
>>
>> I'm confused how it's making this determination, or how to fix it,
>> especially since I've set:
>> health_check_period = 0
>>
>> Help?!
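
The pcp_attach_node step described above could be scripted roughly as follows. This is a sketch under assumptions, not a verified recipe: it assumes the positional argument order used by the PCP tools of this pgpool-II generation (timeout, host, port, user, password, node id), the default PCP port 9898, and placeholder credentials. Node id 1 is taken from the "1 th backend" log line; check the pcp_attach_node documentation for the installed version before running it.

```python
from typing import List


def pcp_attach_node_argv(node_id: int,
                         host: str = "localhost",
                         port: int = 9898,          # assumed default PCP port
                         user: str = "pcpuser",     # placeholder credentials
                         password: str = "secret",
                         timeout: int = 10) -> List[str]:
    """Build the argument vector; positional order is an assumption (see above)."""
    return ["pcp_attach_node", str(timeout), host, str(port),
            user, password, str(node_id)]


argv = pcp_attach_node_argv(1)
print(" ".join(argv))
# To actually run it (only on the pgpool host, with real credentials):
# import subprocess; subprocess.run(argv, check=True)
```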


[Pgpool-general] pgpool thinks a backend is down even though its not

2011-09-12 Thread Lonni J Friedman
Greetings,
I've got a 3 node postgresql-9.0.4 cluster (1 master, two standbys), all
running on Linux-x86_64.  I had a hardware problem on one of the
standbys, and had to bring it down to swap out the bad HW.  I got it
synced back up with the master successfully, and I can successfully
manually run SQL queries from the pgpool server to the standby.
However, pgpool is convinced that the standby is still down:
read_status_file: 1 th backend is set to down status

I'm confused how it's making this determination, or how to fix it,
especially since I've set:
health_check_period = 0

Help?!


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-12 Thread Lonni J Friedman
On Mon, Sep 12, 2011 at 3:34 AM, Toshihiro Kitagawa
 wrote:
> Hi,
>
>> However, I'm seeing tons (dozens every minute) of the
>> following in my postgresql server logs:
>> LOG:  unexpected EOF on client connection
>
> I recently found that this occurs if health_check_user is set to a
> role that requires a password.
>
> Does your health_check_user use trust authentication?

Yes, all connections defined in pool_hba.conf are trust auth.
However, I also have
health_check_period = 0
in pgpool.conf, so I'd assume that no health checks are being performed?


Re: [Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-09 Thread Lonni J Friedman
No one else has experienced this or has suggestions how to debug it?

On Wed, Sep 7, 2011 at 12:49 PM, Lonni J Friedman  wrote:
> Greetings,
> I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
> balancer for a three server postgresql-9.0.4 cluster (1 master, 2
> standby).  I'm seeing strange behavior where a single pgpool process
> seems to hang after some period of time, and then consume 100% of the
> CPU.  I've seen this behavior happen twice since last Friday (when
> pgpool was brought online in my production environment).  At the
> moment the current hung process looks like this in 'ps auxww' output:
>
> postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
> pgpool: lfriedman nightly 10.31.45.20(58277) idle
>
>
> In top, I see:
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool
>
>
> When I connect to the process with strace, there is no output, so I'm
> guessing the process is stuck spinning somewhere:
> # strace -p 19838
> Process 19838 attached - interrupt to quit
> ...
> ^CProcess 19838 detached
>
> One thing that I'm certain of is that the client IP (10.31.45.20)
> associated with the hung process has rebooted at least once since that
> process was spawned.  So pgpool seems to be in some confused state, as
> the client definitely severed the connection already.  I checked the
> pgpool log and there are no explicit references to PID 19838.  I'm at
> a loss how to debug this further, but clearly something is wrong
> somewhere, and this isn't normal/expected behavior.
>
> Help?!
>
> thanks


Re: [Pgpool-general] pgpool-II 3.1 released

2011-09-08 Thread Lonni J Friedman
On Thu, Sep 8, 2011 at 5:30 AM, Toshihiro Kitagawa
 wrote:
> - Change the lock method of insert_lock. The previous insert_lock used
>  row locking against the sequence relation, but the current one uses
>  row locking against the pgpool_catalog.insert_lock table. The reason is
>  that PostgreSQL core developers decided to disallow row locking
>  against the sequence relation to avoid an internal error that it can
>  cause. So the insert_lock table must be created beforehand in all
>  databases that are accessed via pgpool-II. If the insert_lock table
>  does not exist, pgpool-II locks the insert target table. This
>  behavior is the same as in the pgpool-II 2.2 and 2.3 series. If you
>  want to use an insert_lock that is compatible with older releases, you
>  can specify the lock method with configure options:
>  --enable-sequence-lock, --enable-table-lock (Kitagawa)

Is there a SQL script somewhere for creating this table with the
correct/expected schema?


[Pgpool-general] seemingly hung pgpool process consuming 100% CPU

2011-09-07 Thread Lonni J Friedman
Greetings,
I'm running pgpool-3.0.4 on a Linux-x86_64 server serving as a load
balancer for a three server postgresql-9.0.4 cluster (1 master, 2
standby).  I'm seeing strange behavior where a single pgpool process
seems to hang after some period of time, and then consume 100% of the
CPU.  I've seen this behavior happen twice since last Friday (when
pgpool was brought online in my production environment).  At the
moment the current hung process looks like this in 'ps auxww' output:

postgres 19838 98.7  0.0  68856  2904 ?        R    Sep06 1027:36
pgpool: lfriedman nightly 10.31.45.20(58277) idle


In top, I see:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19838 postgres  20   0 68856 2904 1072 R 100.0  0.0   1027:29 pgpool


When I connect to the process with strace, there is no output, so I'm
guessing the process is stuck spinning somewhere:
# strace -p 19838
Process 19838 attached - interrupt to quit
...
^CProcess 19838 detached

One thing that I'm certain of is that the client IP (10.31.45.20)
associated with the hung process has rebooted at least once since that
process was spawned.  So pgpool seems to be in some confused state, as
the client definitely severed the connection already.  I checked the
pgpool log and there are no explicit references to PID 19838.  I'm at
a loss how to debug this further, but clearly something is wrong
somewhere, and this isn't normal/expected behavior.

Help?!

thanks
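
Since the symptom is a child that claims to be idle while sitting in the running (R) state, one way to spot it early is to scan `ps auxww` output for exactly that combination. The sketch below is an illustration only: it assumes each process appears on one unwrapped line, that the session state is the last word of the process title as in the listing above, and the function name is made up. It deliberately ignores the %CPU column, because ps reports CPU usage averaged over the process lifetime rather than the current load.

```python
from typing import List


def running_but_idle(ps_output: str) -> List[int]:
    """Return PIDs of pgpool children in state R whose title ends in 'idle'."""
    suspects = []
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) != 11 or not fields[1].isdigit():
            continue  # header or malformed line
        stat, command = fields[7], fields[10].rstrip()
        if command.startswith("pgpool:") and stat.startswith("R") and command.endswith(" idle"):
            suspects.append(int(fields[1]))
    return suspects


sample = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "postgres 19838 98.7 0.0 68856 2904 ? R Sep06 1027:36 "
    "pgpool: lfriedman nightly 10.31.45.20(58277) idle\n"
    "postgres 19840 0.0 0.0 68856 2904 ? S Sep06 0:00 "
    "pgpool: wait for connection request\n"
)
print(running_but_idle(sample))
```

A single hit can be a false positive, since any process is briefly in state R when it wakes up; a PID that shows up in several consecutive samples is the interesting one.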


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-04 Thread Lonni J Friedman
On Sun, Sep 4, 2011 at 5:36 PM, Rosser Schwarz  wrote:
> "Unexpected EOF" doesn't mean a postgres backend is failing; it
> typically means that a client has disconnected from its backend
> without closing out the session.  (E.g., connect via psql, then kill
> the psql process -- not the backend! -- and you'll see an "Unexpected
> EOF".)
>
> As for why you're only seeing them since starting to use pgpool, I
> couldn't say. I'm seeing them, too, but was before we started working
> with the pooler, too, and haven't noticed (but also haven't
> specifically looked for) a change in their frequency.

Weird.  I guess I can ignore it for now, as I haven't noticed any
negative side effects.  It would be nice if there was an effective
means of debugging it though (other than turning on SQL logging, which
is going to fill up my disks in a matter of hours).
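
The difference between a clean close and an "unexpected EOF" can be reproduced without a database. In the PostgreSQL wire protocol, a client ends a session by sending a Terminate message (byte 'X' followed by a 4-byte length of 4) before closing the socket; if the socket simply closes, the server side reads end-of-file instead. The sketch below simulates both cases over a local socket pair. The function names are hypothetical, and the single recv() call assumes the 5-byte message arrives in one piece, which is reasonable for a local pair but not for real network code.

```python
import socket

# PostgreSQL frontend Terminate message: type byte 'X', then Int32 length 4.
TERMINATE = b"X\x00\x00\x00\x04"


def classify_disconnect(server_side: socket.socket) -> str:
    """Report how the peer ended the session, from the server's point of view."""
    data = server_side.recv(5)
    if data == TERMINATE:
        return "clean"
    if data == b"":
        return "unexpected EOF"
    return "other"


def simulate(send_terminate: bool) -> str:
    """Close a client socket with or without sending Terminate first."""
    server_side, client_side = socket.socketpair()
    try:
        if send_terminate:
            client_side.sendall(TERMINATE)
        client_side.close()
        return classify_disconnect(server_side)
    finally:
        server_side.close()


print(simulate(True))
print(simulate(False))
```

Seen this way, any pgpool code path that drops a backend socket without sending Terminate would be expected to produce the log line, whatever the reason for dropping it.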


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-04 Thread Lonni J Friedman
Thanks for your reply.  That doesn't appear to be the problem.  While
logged onto the server where pgpool is running, I can successfully
invoke psql to connect directly to both of the standby servers (using
port 5432).  Also, I'm not seeing any errors in the pgpool log, where
I'd expect to see something if connection attempts were failing?

On Sun, Sep 4, 2011 at 4:43 PM, Richard Diekema  wrote:
> If I recall, unexpected EOF means your connection to one of the postgres
> instances is failing. Check your pg_hba.conf to make sure the pgpool has
> permission to connect, and that you can connect from the server running
> pgpool.
>
> On Sep 4, 2011 7:22 PM, "Lonni J Friedman"  wrote:
>> No one has any ideas or suggestions?
>>
>> On Fri, Sep 2, 2011 at 6:44 PM, Lonni J Friedman 
>> wrote:
>>> Greetings,
>>> I just deployed a postgresql-9.0.4 cluster (1 master, 2 hot
>>> standbys), with pgpool-II-3.0.4 (all running on Linux-x86_64
>>> servers).  I'm currently only using pgpool for load balancing, and it's
>>> working fine.  However, I'm seeing tons (dozens every minute) of the
>>> following in my postgresql server logs:
>>> LOG:  unexpected EOF on client connection
>>>
>>> This was not happening prior to setting up pgpool, so I'm currently
>>> working with the assumption that something in pgpool's load balancing
>>> might be causing it.  Is this a known issue?  If not, how can I debug
>>> this further?


Re: [Pgpool-general] unexpected EOF on client connection

2011-09-04 Thread Lonni J Friedman
No one has any ideas or suggestions?

On Fri, Sep 2, 2011 at 6:44 PM, Lonni J Friedman  wrote:
> Greetings,
> I just deployed a postgresql-9.0.4 cluster (1 master, 2 hot
> standbys), with pgpool-II-3.0.4 (all running on Linux-x86_64
> servers).  I'm currently only using pgpool for load balancing, and it's
> working fine.  However, I'm seeing tons (dozens every minute) of the
> following in my postgresql server logs:
> LOG:  unexpected EOF on client connection
>
> This was not happening prior to setting up pgpool, so I'm currently
> working with the assumption that something in pgpool's load balancing
> might be causing it.  Is this a known issue?  If not, how can I debug
> this further?


[Pgpool-general] unexpected EOF on client connection

2011-09-02 Thread Lonni J Friedman
Greetings,
I just deployed a postgresql-9.0.4 cluster (1 master, 2 hot
standbys), with pgpool-II-3.0.4 (all running on Linux-x86_64
servers).  I'm currently only using pgpool for load balancing, and it's
working fine.  However, I'm seeing tons (dozens every minute) of the
following in my postgresql server logs:
LOG:  unexpected EOF on client connection

This was not happening prior to setting up pgpool, so I'm currently
working with the assumption that something in pgpool's load balancing
might be causing it.  Is this a known issue?  If not, how can I debug
this further?

thanks!


[Pgpool-general] load balancing only setup

2011-08-22 Thread Lonni J Friedman
Greetings,
I have a 3 node PostgreSQL-9.0.4 cluster (1 master, 2 standbys) set up
with streaming replication enabled & verified functional.  I'm in the
process of setting up pgpool-II-3.0.4 (for the first time), where all
I care about is the load balancing functionality, such that all the
read-only queries (SELECT, etc) are equally distributed.  I don't need
pgpool to take care of any failover, or really anything other than
load balancing.  I've gone through the official documentation a few
times, and I'm having difficulty separating out the required options
from the optional options.  I'm honestly not even completely certain
if it's possible to do only load balancing without any other
functionality?

I *think* if I set the following, I'll get what I want, but I'd
appreciate some confirmation or a helpful push in the right direction
if I'm missing something:
master_slave_mode = true
load_balance_mode = true
replication_mode = false
replication_stop_on_mismatch = false
failover_if_affected_tuples_mismatch = false
replicate_select = false
master_slave_sub_mode = 'stream'
failover_command = ''
failback_command = ''
fail_over_on_backend_error = false
parallel_mode = false


thanks!
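
To double-check that a pgpool.conf really carries the values proposed above, a small script can parse the file and report differences. This is a sketch only: the parser handles plain `key = value` lines with optional single quotes and `#` comments, the expected values are simply the ones proposed in this message (not a confirmed recipe for load-balancing-only operation), and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Values taken from the proposal in this message; unverified.
EXPECTED = {
    "master_slave_mode": "true",
    "master_slave_sub_mode": "stream",
    "load_balance_mode": "true",
    "replication_mode": "false",
    "parallel_mode": "false",
}


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; naive about '#' inside quoted values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


def mismatches(settings: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each differing key to (expected, actual); missing keys show '<unset>'."""
    return {
        key: (want, settings.get(key, "<unset>"))
        for key, want in EXPECTED.items()
        if settings.get(key) != want
    }


conf_text = """
master_slave_mode = true
load_balance_mode = true
replication_mode = false
master_slave_sub_mode = 'stream'   # streaming replication
parallel_mode = true
"""
print(mismatches(parse_conf(conf_text)))
```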