Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Stefan Kaltenbrunner wrote:
> > 
> > > well - I now have a core file, but it does not seem to be worth much
> > > except to prove that autovacuum seems to be the culprit:
> > > 
> > > Core was generated by `postgres: autovacuum worker process
> > >                              '.
> > > Program terminated with signal 6, Aborted.
> > > 
> > > [...]
> > > 
> > > #0  0x00000ed9 in ?? ()
> > > warning: GDB can't find the start of the function at 0xed9.
> 
> I just noticed an ugly bug in the worker code which I'm fixing.  I think
> this one would also throw SIGSEGV, not SIGABRT.

Nailed it: this is the actual bug that causes the abort.  But I am
surprised that the error message doesn't get printed on Stefan's machine;
here it outputs:


TRAP: FailedAssertion("!((((unsigned long)(elem)) > ShmemBase))", File: "/pgsql/source/00head/src/backend/storage/ipc/shmqueue.c", Line: 107)
16496 2007-05-02 11:30:31 CLT DEBUG:  server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  server process (PID 16540) was terminated by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  terminating any other active server processes
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16541
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16498
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16500
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16499
16541 2007-05-02 11:30:33 CLT WARNING:  terminating connection because of crash of another server process


Maybe stderr is going somewhere else?  That would be strange, I think.

I'll commit the fix shortly; attached.
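To see why the assertion fires, here is a minimal, self-contained model of the offset-to-pointer scheme. It assumes definitions of MAKE_PTR, INVALID_OFFSET and SHM_PTR_VALID modeled on the 8.3-era shmem.h (written from memory, so treat them as an assumption); av_startingWorker is a plain variable standing in for the shared-memory field, and attach_segment and claim_starting_worker are hypothetical helpers. Because the arithmetic is done on unsigned long, MAKE_PTR(INVALID_OFFSET) wraps around to ShmemBase - 1, which would fail the same "> ShmemBase" test that shows up in the TRAP line.

```c
#include <assert.h>

/*
 * Simplified stand-ins modeled on the 8.3-era shmem.h macros.
 * ASSUMPTION: these mirror the real definitions only approximately.
 */
typedef unsigned long SHMEM_OFFSET;
#define INVALID_OFFSET (-1)

static SHMEM_OFFSET ShmemBase;  /* start address of the shared segment */

#define MAKE_PTR(offs)     (ShmemBase + ((unsigned long) (offs)))
#define SHM_PTR_VALID(ptr) (((unsigned long) (ptr)) > ShmemBase)

/* Hypothetical stand-in for AutoVacuumShmem->av_startingWorker. */
static SHMEM_OFFSET av_startingWorker = INVALID_OFFSET;

/* Hypothetical helper: point the model at a fake shared segment. */
static void attach_segment(void *segment)
{
    ShmemBase = (unsigned long) segment;
}

/*
 * Hypothetical guarded claim, following the shape of the patch: only
 * translate the offset if the launcher has not already reset it.
 * Returns 1 and stores the slot address in *addr on success, or 0 if
 * there is no slot and the worker should simply exit.
 */
static int claim_starting_worker(unsigned long *addr)
{
    if (av_startingWorker == INVALID_OFFSET)
        return 0;                       /* launcher gave up on us: no slot */

    *addr = MAKE_PTR(av_startingWorker);
    assert(SHM_PTR_VALID(*addr));       /* analogous to the shmqueue.c check */
    av_startingWorker = INVALID_OFFSET; /* launcher may start another worker */
    return 1;
}
```

The model leaves out AutovacuumLock. In the patch, the check, the MAKE_PTR call and the reset of av_startingWorker all happen while that lock is held, which presumably is what keeps the launcher from resetting the pointer in between.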

-- 
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"The first law of live demos is: don't try to use the system.
Write a script that doesn't touch anything, so it can't cause damage." (Jakob Nielsen)
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.42
diff -c -p -r1.42 autovacuum.c
*** src/backend/postmaster/autovacuum.c	18 Apr 2007 16:44:18 -0000	1.42
--- src/backend/postmaster/autovacuum.c	2 May 2007 15:25:27 -0000
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1407,1431 ****
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! 	MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 	dbid = MyWorkerInfo->wi_dboid;
! 	MyWorkerInfo->wi_workerpid = MyProcPid;
! 
! 	/* insert into the running list */
! 	SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers, 
! 						 &MyWorkerInfo->wi_links);
  	/*
! 	 * remove from the "starting" pointer, so that the launcher can start a new
! 	 * worker if required
  	 */
! 	AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 	LWLockRelease(AutovacuumLock);
  
! 	on_shmem_exit(FreeWorkerInfo, 0);
  
! 	/* wake up the launcher */
! 	if (AutoVacuumShmem->av_launcherpid != 0)
! 		kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
  
  	if (OidIsValid(dbid))
  	{
--- 1407,1442 ----
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! 
  	/*
! 	 * beware of startingWorker being INVALID; this could happen if the
! 	 * launcher thinks we've taken too long to start.
  	 */
! 	if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! 	{
! 		MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 		dbid = MyWorkerInfo->wi_dboid;
! 		MyWorkerInfo->wi_workerpid = MyProcPid;
! 
! 		/* insert into the running list */
! 		SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers, 
! 							 &MyWorkerInfo->wi_links);
! 		/*
! 		 * remove from the "starting" pointer, so that the launcher can start a new
! 		 * worker if required
! 		 */
! 		AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 		LWLockRelease(AutovacuumLock);
  
! 		on_shmem_exit(FreeWorkerInfo, 0);
  
! 		/* wake up the launcher */
! 		if (AutoVacuumShmem->av_launcherpid != 0)
! 			kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
! 	}
! 	else
! 		/* no worker entry for me, go away */
! 		LWLockRelease(AutovacuumLock);
  
  	if (OidIsValid(dbid))
  	{
*************** AutoVacWorkerMain(int argc, char *argv[]
*** 1466,1473 ****
  	}
  
  	/*
! 	 * FIXME -- we need to notify the launcher when we are gone.  But this
! 	 * should be done after our PGPROC is released, in ProcKill.
  	 */
  
  	/* All done, go away */
--- 1477,1484 ----
  	}
  
  	/*
! 	 * The launcher will be notified of my death in ProcKill, *if* we managed
! 	 * to get a worker slot at all
  	 */
  
  	/* All done, go away */
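The reworded comment at the end of the patch depends on cleanup being registered only once a slot has actually been obtained: on_shmem_exit(FreeWorkerInfo, 0) now sits inside the guarded branch. Below is a minimal sketch of that pattern. It uses a hypothetical callback list (register_exit_cb, run_exit_cbs) in place of on_shmem_exit, and free_worker_slot stands in for FreeWorkerInfo, whose real duties are assumed here rather than shown.

```c
#include <assert.h>

/*
 * Hypothetical miniature of an on_shmem_exit-style callback list.
 * The names are illustrative only, not PostgreSQL's API.
 */
typedef void (*exit_cb)(int arg);

#define MAX_EXIT_CBS 8
static exit_cb exit_cbs[MAX_EXIT_CBS];
static int n_exit_cbs = 0;
static int launcher_notified = 0;   /* times a slot was handed back */

static void register_exit_cb(exit_cb cb)
{
    assert(n_exit_cbs < MAX_EXIT_CBS);
    exit_cbs[n_exit_cbs++] = cb;
}

/* Run registered callbacks newest-first, emptying the list. */
static void run_exit_cbs(void)
{
    while (n_exit_cbs > 0)
        exit_cbs[--n_exit_cbs](0);
}

/* Hypothetical stand-in for FreeWorkerInfo: give the slot back. */
static void free_worker_slot(int arg)
{
    (void) arg;
    launcher_notified++;
}

/* One worker lifetime: register cleanup only if a slot was obtained. */
static void worker_lifetime(int got_slot)
{
    if (got_slot)
        register_exit_cb(free_worker_slot);
    /* ... the actual vacuum work would happen here ... */
    run_exit_cbs();
}
```

A worker that never got a slot runs no release callback at exit, so it cannot hand back, or signal about, a slot it never owned.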