I have been able to successfully perform an online recovery now with no problems. I have only done it once so far, but I will be doing a few more shortly.
Thanks, Tatsuo.

Marcelo
Linux/Solaris System Administrator
PostgreSQL DBA
http://www.zeroaccess.org

On Dec 4, 2008, at 9:21 PM, Marcelo Martins wrote:

> That's awesome, thank you.
> I will download the latest CVS version and try it out.
>
> On Dec 4, 2008, at 20:39, Tatsuo Ishii <[EMAIL PROTECTED]> wrote:
>
>> Hi Marcelo,
>>
>> With your help, I was able to find the problem.
>> If you connect to pgpool *before* starting recovery, the timeout
>> parameter to select(2) is set to NULL, which means it waits
>> forever. I have modified pool_process_query.c so that it sets the
>> timeout whenever client_idle_limit_in_recovery > 0. Please grab
>> CVS HEAD and try it out.
>>
>> Thanks for your great help!
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>>
>>> Hi Tatsuo,
>>>
>>> I have also checked the value of fds.
>>>
>>> This condition
>>>
>>>   if (((*InRecovery == 0 && pool_config->client_idle_limit > 0) ||
>>>        (*InRecovery && pool_config->client_idle_limit_in_recovery > 0)) &&
>>>       fds == 0)
>>>
>>> never becomes true unless I run some query inside the psql
>>> connection that I created in step 1.
>>>
>>> 1) Connect to pgpool with psql.
>>>
>>> 2) Run pcp_recovery_node.
>>>
>>> 3) When the log shows that recovery is stuck at the start of stage 2,
>>> attach gdb to the pgpool PID from step 1.
>>>
>>> - 1st GDB backtrace -
>>>
>>> - on frame 2 - in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363     fds = select(num_fds, &readmask, &writemask, &exceptmask, &timeout);
>>>
>>> (gdb) frame 2
>>> #2 0x0805a499 in pool_process_query (frontend=0x8128c18,
>>>     backend=0x8128a10, connection_reuse=0,
>>>     first_ready_for_query_received=0) at pool_process_query.c:365
>>> 365     fds = select(num_fds, &readmask, &writemask, &exceptmask, NULL);
>>> (gdb) print *InRecovery
>>> $1 = 1
>>> (gdb) print pool_config->client_idle_limit
>>> $2 = 0
>>> (gdb) print pool_config->client_idle_limit_in_recovery
>>> $3 = 7
>>> (gdb) print fds
>>> $4 = 135432720
>>>
>>> 4) List the databases ("\l") in the psql session created in step 1.
>>>
>>> 5) Detach gdb from the PID so that "\l" can run, then attach it again.
>>>
>>> - 2nd GDB backtrace -
>>>
>>> - on frame 2 - in pool_process_query (frontend=0x8128c18,
>>> backend=0x8128a10, connection_reuse=0,
>>> first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363     fds = select(num_fds, &readmask, &writemask, &exceptmask, &timeout);
>>>
>>> (gdb) bt
>>> #0 0xb7f69402 in ?? ()
>>> #1 0xb7e810fd in select () from /lib/tls/i686/cmov/libc.so.6
>>> #2 0x0805a463 in pool_process_query (frontend=0x8128c18,
>>>     backend=0x8128a10, connection_reuse=0,
>>>     first_ready_for_query_received=0) at pool_process_query.c:363
>>> #3 0x0804f03e in do_child (unix_fd=3, inet_fd=4) at child.c:428
>>> #4 0x0804bc21 in fork_a_child (unix_fd=3, inet_fd=4, id=3) at main.c:814
>>> #5 0x0804d1e8 in failover () at main.c:1328
>>> #6 0x0804b16b in main (argc=7, argv=0xbfef7c64) at main.c:519
>>> (gdb) frame 2
>>> #2 0x0805a463 in pool_process_query (frontend=0x8128c18,
>>>     backend=0x8128a10, connection_reuse=0,
>>>     first_ready_for_query_received=0) at pool_process_query.c:363
>>> 363     fds = select(num_fds, &readmask, &writemask, &exceptmask, &timeout);
>>> (gdb) print *InRecovery
>>> $1 = 1
>>> (gdb) print pool_config->client_idle_limit
>>> $2 = 0
>>> (gdb) print pool_config->client_idle_limit_in_recovery
>>> $3 = 7
>>> (gdb) print fds
>>> $4 = 0
>>>
>>> Once I reattach to the process, I can see the following line in the
>>> pgpool log file:
>>>
>>> Dec 4 09:41:58 debian-db6 pgpool: 2008-12-04 09:41:58 DEBUG: pid 24697: idle count:1 InRecovery:0 client_idle_limit:7 client_idle_limit_in_recovery:-1074827064
>>>
>>> Then I let gdb continue the process, and recovery proceeds because
>>> the if statement can now evaluate to true.
>>>
>>> Hope that helps.
>>>
>>> If you want to see this happening, let me know and I can set up some
>>> VMs and give you access to them.
>>>
>>> -
>>> Marcelo
>>>
>>> On Dec 4, 2008, at 4:17 AM, Tatsuo Ishii wrote:
>>>
>>>> Thanks!
>>>>
>>>> Can you please print the values of:
>>>>
>>>> *InRecovery
>>>> *pool_config
>>>>
>>>> at frame #2?
>>>> --
>>>> Tatsuo Ishii
>>>> SRA OSS, Inc. Japan
>>>>
>>>>> Hi Tatsuo,
>>>>>
>>>>> Sorry for the delay.
>>>>> I was able to compile the CVS version now, with no problems
>>>>> regarding bison. Thanks.
>>>>>
>>>>> I have also placed this back on the list.
>>>>>>
>>>>>> Thanks. What I want to know is the following:
>>>>>>
>>>>>> 1) connect to pgpool-II using psql
>>>>>
>>>>> OK, connected to pgpool through psql.
>>>>>>
>>>>>> 2) start recovery
>>>>>
>>>>> OK, ran ./pcp_recovery_node 100 localhost 9898 nastpcp nastpcp 1
>>>>>
>>>>>>
>>>>>> 3) pgpool-II gets stuck at the beginning of the 2nd stage (this is
>>>>>> what I couldn't reproduce)
>>>>>
>>>>> OK, it got stuck.
>>>>>
>>>>>>
>>>>>> 4) attach gdb to the pgpool-II child process that psql connected to
>>>>>> in 1)
>>>>>
>>>>> OK, ran gdb pgpool PID
>>>>>
>>>>>>
>>>>>> 5) get a backtrace to find out where pgpool-II is stuck
>>>>>>
>>>>>
>>>>> Attaching to process 23712
>>>>> Reading symbols from /opt/pgpool-cvs.1.117/bin/pgpool...done.
>>>>> Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
>>>>> Reading symbols from /usr/lib/libpq.so.5...done.
>>>>> Loaded symbols for /usr/lib/libpq.so.5
>>>>> Reading symbols from /opt/pgpool-cvs.1.117/lib/libpcp.so.0...done.
>>>>> Loaded symbols for /opt/pgpool-cvs.1.117/lib/libpcp.so.0
>>>>> Reading symbols from /lib/tls/i686/cmov/libresolv.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libresolv.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libnsl.so.1...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libnsl.so.1
>>>>> Reading symbols from /lib/tls/i686/cmov/libm.so.6...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libm.so.6
>>>>> Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libc.so.6
>>>>> Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
>>>>> Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done.
>>>>> Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
>>>>> Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done.
>>>>> Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
>>>>> Reading symbols from /usr/lib/libkrb5.so.3...done.
>>>>> Loaded symbols for /usr/lib/libkrb5.so.3
>>>>> Reading symbols from /lib/libcom_err.so.2...done.
>>>>> Loaded symbols for /lib/libcom_err.so.2
>>>>> Reading symbols from /usr/lib/libgssapi_krb5.so.2...done.
>>>>> Loaded symbols for /usr/lib/libgssapi_krb5.so.2
>>>>> Reading symbols from /usr/lib/libldap_r.so.2...done.
>>>>> Loaded symbols for /usr/lib/libldap_r.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...done.
>>>>> [Thread debugging using libthread_db enabled]
>>>>> [New Thread -1214495040 (LWP 23712)]
>>>>> Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
>>>>> Reading symbols from /lib/ld-linux.so.2...done.
>>>>> Loaded symbols for /lib/ld-linux.so.2
>>>>> Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
>>>>> Reading symbols from /usr/lib/libz.so.1...done.
>>>>> Loaded symbols for /usr/lib/libz.so.1
>>>>> Reading symbols from /usr/lib/libk5crypto.so.3...done.
>>>>> Loaded symbols for /usr/lib/libk5crypto.so.3
>>>>> Reading symbols from /usr/lib/libkrb5support.so.0...done.
>>>>> Loaded symbols for /usr/lib/libkrb5support.so.0
>>>>> Reading symbols from /usr/lib/liblber.so.2...done.
>>>>> Loaded symbols for /usr/lib/liblber.so.2
>>>>> Reading symbols from /usr/lib/libsasl2.so.2...done.
>>>>> Loaded symbols for /usr/lib/libsasl2.so.2
>>>>> Reading symbols from /usr/lib/libgnutls.so.13...done.
>>>>> Loaded symbols for /usr/lib/libgnutls.so.13
>>>>> Reading symbols from /usr/lib/libtasn1.so.3...done.
>>>>> Loaded symbols for /usr/lib/libtasn1.so.3
>>>>> Reading symbols from /usr/lib/libgcrypt.so.11...done.
>>>>> Loaded symbols for /usr/lib/libgcrypt.so.11
>>>>> Reading symbols from /usr/lib/libgpg-error.so.0...done.
>>>>> Loaded symbols for /usr/lib/libgpg-error.so.0
>>>>> Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
>>>>> Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
>>>>> Failed to read a valid object file image from memory.
>>>>> 0xb7f39402 in ?? ()
>>>>>
>>>>> (gdb) bt
>>>>> #0 0xb7f39402 in ?? ()
>>>>> #1 0xb7e510fd in select () from /lib/tls/i686/cmov/libc.so.6
>>>>> #2 0x0805a499 in pool_process_query (frontend=0x8128c18,
>>>>>     backend=0x8128a10, connection_reuse=0,
>>>>>     first_ready_for_query_received=0) at pool_process_query.c:365
>>>>> #3 0x0804f03e in do_child (unix_fd=3, inet_fd=4) at child.c:428
>>>>> #4 0x0804bc21 in fork_a_child (unix_fd=3, inet_fd=4, id=2) at main.c:814
>>>>> #5 0x0804d1e8 in failover () at main.c:1328
>>>>> #6 0x0804b16b in main (argc=7, argv=0xbff10594) at main.c:519
>>>
>
> _______________________________________________
> Pgpool-general mailing list
> [email protected]
> http://pgfoundry.org/mailman/listinfo/pgpool-general
