Stefan Kaltenbrunner wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
>>> one of my new buildfarm boxes (an Debian/Etch based ARM box) is
>>> sometimes failing to stop the database during the regression tests:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=quagga&dt=2007-01-08%2003:03:03
>>> this only seems to happen sometimes and only if --with-tcl is enabled on
>>> quagga.
>>> lionfish (my mipsel box) is able to trigger that on every build if I
>>> enable --with-tcl but it is nearly impossible to debug it there because
>>> of the low amount of memory and diskspace it has.
>> Hm, could pl/tcl somehow be preventing the backend from exiting once
>> it's run any pl/tcl stuff? I have no idea why though, and even less
>> why it wouldn't be repeatable.
>>
>>> After the stopdb failure we still have those processes running:
>>> pgbuild 3488 0.0 2.4 43640 6300 ? Ss 06:15 0:01 postgres: pgbuild pl_regression [local] idle
>> Can you get a stack trace from this process?
>
> (gdb) bt
> #0 0x406b9d80 in __pthread_sigsuspend () from /lib/libpthread.so.0
> #1 0x406b8a7c in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
> #2 0x406b91f8 in pthread_onexit_process () from /lib/libpthread.so.0
> #3 0x40438658 in exit () from /lib/libc.so.6
> #4 0x40438658 in exit () from /lib/libc.so.6
> Previous frame identical to this frame (corrupt stack?)
>
>>> pgbuild 3489 0.0 0.0 0 0 ? Z 06:15 0:00 [postgres] <defunct>
>> This is a bit odd ... if that process is a direct child of the
>> postmaster it should have been reaped promptly. Could it be a child
>> of the other backend? If so, why was it started? Please try the
>> ps again with whatever switch it needs to list parent process ID.
>
> looks you are right - the defunct 3489 seems to be a child of 3488:
>
>  PPID   PID  PGID   SID TTY TPGID STAT  UID  TIME COMMAND
>     1  3389 18341 18341 ?      -1 S    1001  0:03 /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
>  3389  3391  3391  3391 ?      -1 Ss   1001  0:00 postgres: writer process
>  3389  3392  3392  3392 ?      -1 Ss   1001  0:00 postgres: stats collector process
>  3389  3488  3488  3488 ?      -1 Ss   1001  0:01 postgres: pgbuild pl_regression [local] idle
>  3488  3489  3488  3488 ?      -1 Z    1001  0:00 [postgres] <defunct>

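For anyone wanting to check the parent/child relationship in the ps listing above mechanically rather than by eye, here is a minimal sketch in Python. It assumes the column order shown in that listing (PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND); the column spacing in the embedded sample is reconstructed, and the helper names `parse_ps` and `find_zombies` are made up for illustration, not part of any tool.

```python
# Sketch only: locate defunct (zombie) processes in ps output and report
# which process is their parent. The sample rows are the ones quoted in
# this thread; the spacing between columns is reconstructed.

PS_OUTPUT = """\
 PPID   PID  PGID   SID TTY TPGID STAT  UID  TIME COMMAND
    1  3389 18341 18341 ?      -1 S    1001  0:03 /home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D data
 3389  3391  3391  3391 ?      -1 Ss   1001  0:00 postgres: writer process
 3389  3392  3392  3392 ?      -1 Ss   1001  0:00 postgres: stats collector process
 3389  3488  3488  3488 ?      -1 Ss   1001  0:01 postgres: pgbuild pl_regression [local] idle
 3488  3489  3488  3488 ?      -1 Z    1001  0:00 [postgres] <defunct>
"""


def parse_ps(text):
    """Return {pid: row_dict}, using the header line for column names."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    ncols = len(header)
    procs = {}
    for line in lines[1:]:
        # Split on whitespace, but keep COMMAND (the last column) whole.
        fields = line.split(None, ncols - 1)
        row = dict(zip(header, fields))
        row["COMMAND"] = row["COMMAND"].strip()
        procs[int(row["PID"])] = row
    return procs


def find_zombies(procs):
    """Return (pid, ppid, parent_command) for every process in state Z."""
    result = []
    for pid, row in sorted(procs.items()):
        if row["STAT"].startswith("Z"):
            ppid = int(row["PPID"])
            parent = procs.get(ppid)
            result.append((pid, ppid, parent["COMMAND"] if parent else None))
    return result


if __name__ == "__main__":
    for pid, ppid, parent_cmd in find_zombies(parse_ps(PS_OUTPUT)):
        print(f"zombie {pid} is a child of {ppid} ({parent_cmd})")
```

On the sample rows this reports 3489 as a child of 3488 (the idle pl_regression backend), not of the postmaster 3389, which is why the postmaster never reaps it.
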
FWIW - I removed --with-tcl from quagga's configuration about two weeks ago and it has not failed (for that reason) since. So the issue most definitely looks pl/tcl related ...

Stefan