Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Stefan Kaltenbrunner wrote:
> >
> > > well - i now have a core file but it does not seem to be much worth
> > > except to prove that autovacuum seems to be the culprit:
> > >
> > > Core was generated by `postgres: autovacuum worker process
> > > '.
> > > Program terminated with signal 6, Aborted.
> > >
> > > [...]
> > >
> > > #0  0x00000ed9 in ?? ()
> > > warning: GDB can't find the start of the function at 0xed9.
>
> I just noticed an ugly bug in the worker code which I'm fixing.  I think
> this one would also throw SIGSEGV, not SIGABRT.

Nailed it -- this is the actual bug that causes the abort.  But I am
surprised that it doesn't print the error message on Stefan's machine;
here it outputs

TRAP: FailedAssertion("!((((unsigned long)(elem)) > ShmemBase))", File: "/pgsql/source/00head/src/backend/storage/ipc/shmqueue.c", Line: 107)
16496 2007-05-02 11:30:31 CLT DEBUG:  server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  terminating any other active server processes
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16541
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16498
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16500
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16499
16541 2007-05-02 11:30:33 CLT WARNING:  terminating connection because of crash of another server process

Maybe stderr is going somewhere else?  That would be strange, I think.

I'll commit the fix shortly; attached.

-- 
Alvaro Herrera                 http://www.flickr.com/photos/alvherre/
"The first law of live demos is: don't try to use the system.  Write a
script that touches nothing, so as not to cause damage." (Jakob Nielsen)

Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.42
diff -c -p -r1.42 autovacuum.c
*** src/backend/postmaster/autovacuum.c	18 Apr 2007 16:44:18 -0000	1.42
--- src/backend/postmaster/autovacuum.c	2 May 2007 15:25:27 -0000
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1407,1431 ****
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! 	MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 	dbid = MyWorkerInfo->wi_dboid;
! 	MyWorkerInfo->wi_workerpid = MyProcPid;
!
! 	/* insert into the running list */
! 	SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers,
! 						 &MyWorkerInfo->wi_links);
  	/*
! 	 * remove from the "starting" pointer, so that the launcher can start a new
! 	 * worker if required
  	 */
! 	AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 	LWLockRelease(AutovacuumLock);

! 	on_shmem_exit(FreeWorkerInfo, 0);

! 	/* wake up the launcher */
! 	if (AutoVacuumShmem->av_launcherpid != 0)
! 		kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);

  	if (OidIsValid(dbid))
  	{
--- 1407,1442 ----
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
!
  	/*
! 	 * beware of startingWorker being INVALID; this could happen if the
! 	 * launcher thinks we've taking too long to start.
  	 */
! 	if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! 	{
! 		MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 		dbid = MyWorkerInfo->wi_dboid;
! 		MyWorkerInfo->wi_workerpid = MyProcPid;
!
! 		/* insert into the running list */
! 		SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers,
! 							 &MyWorkerInfo->wi_links);
! 		/*
! 		 * remove from the "starting" pointer, so that the launcher can start a new
! 		 * worker if required
! 		 */
! 		AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 		LWLockRelease(AutovacuumLock);

! 		on_shmem_exit(FreeWorkerInfo, 0);

! 		/* wake up the launcher */
! 		if (AutoVacuumShmem->av_launcherpid != 0)
! 			kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
! 	}
! 	else
! 		/* no worker entry for me, go away */
! 		LWLockRelease(AutovacuumLock);

  	if (OidIsValid(dbid))
  	{
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1466,1473 ****
  	}

  	/*
! 	 * FIXME -- we need to notify the launcher when we are gone.  But this
! 	 * should be done after our PGPROC is released, in ProcKill.
  	 */

  	/* All done, go away */
--- 1477,1484 ----
  	}

  	/*
! 	 * The launcher will be notified of my death in ProcKill, *if* we managed
! 	 * to get a worker slot at all
  	 */

  	/* All done, go away */
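
For anyone skimming the patch, the guard it adds can be sketched outside the backend. This is only an illustration under stated assumptions: SharedState, WorkerSlot, slot_from_offset and claim_starting_worker are hypothetical stand-ins, not the real autovacuum.c definitions; INVALID_OFFSET here is a simplified sentinel; and the locking around AutovacuumLock is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real PostgreSQL types. */
#define INVALID_OFFSET ((size_t) -1)
#define MAX_WORKERS 4

typedef struct WorkerSlot
{
	unsigned int dboid;		/* database the worker should process */
	int			workerpid;	/* 0 until a worker claims the slot */
} WorkerSlot;

typedef struct SharedState
{
	size_t		starting_worker;	/* slot being started, or INVALID_OFFSET */
	WorkerSlot	slots[MAX_WORKERS];
} SharedState;

/*
 * Resolve an offset to a slot.  Handing it an invalid offset trips the
 * assertion, loosely mirroring the TRAP shown in the log above.
 */
static WorkerSlot *
slot_from_offset(SharedState *shm, size_t offset)
{
	assert(offset != INVALID_OFFSET && offset < MAX_WORKERS);
	return &shm->slots[offset];
}

/*
 * Claim the slot the launcher prepared for this worker.  Returns the
 * database OID to process, or 0 if the launcher already withdrew the slot
 * (for example because it decided the worker took too long to start).
 */
static unsigned int
claim_starting_worker(SharedState *shm, int mypid)
{
	WorkerSlot *slot;

	if (shm->starting_worker == INVALID_OFFSET)
		return 0;				/* no worker entry for me, go away */

	slot = slot_from_offset(shm, shm->starting_worker);
	slot->workerpid = mypid;

	/* clear the "starting" pointer so the launcher can start another worker */
	shm->starting_worker = INVALID_OFFSET;
	return slot->dboid;
}
```

A return of 0 plays the role of the invalid dbid in the patch: the caller skips the per-database work and simply exits, without ever dereferencing the withdrawn slot.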