On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> > How about having it sleep for a short while, then try again?
> 
> I could get behind that, but I don't think the delay should be more than
> 100ms or so.  It's important for the postmaster to acquire the lock (or
> not) pretty quickly, or pg_ctl is going to get confused.  If we keep it
> short, we can also dispense with the log spam you were suggesting.
> 
> (Actually, I wonder if this type of scenario isn't going to confuse
> pg_ctl already --- it might think the lockfile belongs to the postmaster
> *it* started, not some pre-existing one.  Does that matter?)

I took Alvaro's approach of sleeping and retrying.  The file test was
already inside a loop that runs up to 100 times.  Basically, if the lock
file exists, this postmaster isn't going to succeed, so I figured there
is no reason to rush the test.  I gave it 5 tries with one second
between attempts.  Either the file is being populated by another
postmaster, or it is stale and empty.
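For anyone who wants the retry idea outside the context of miscinit.c,
here is a minimal, self-contained sketch.  It is not PostgreSQL code:
acquire_lock_file(), LockResult, and the max_tries/delay_sec parameters
are hypothetical names, it assumes a POSIX system, and it simply treats
any non-empty file as "held" without checking whether the PID is alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Result codes for the hypothetical helper below. */
typedef enum
{
	LOCK_ACQUIRED,		/* we created the lock file and wrote our PID */
	LOCK_HELD,			/* file exists and already has content */
	LOCK_EMPTY_STALE,	/* file stayed empty for every attempt */
	LOCK_ERROR			/* unexpected I/O error */
} LockResult;

/*
 * Try to create "path" exclusively.  If it already exists but is empty,
 * assume another process may still be writing it: sleep delay_sec
 * seconds and retry, reading it at most max_tries times before
 * declaring it stale.
 */
static LockResult
acquire_lock_file(const char *path, int max_tries, unsigned int delay_sec)
{
	for (int ntries = 0;; ntries++)
	{
		int			fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

		if (fd >= 0)
		{
			/* We own the file: record our PID in it. */
			char		buf[32];
			int			n = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
			ssize_t		w = write(fd, buf, (size_t) n);

			close(fd);
			return (w == n) ? LOCK_ACQUIRED : LOCK_ERROR;
		}
		if (errno != EEXIST)
			return LOCK_ERROR;

		/* The file exists: see whether anything has been written yet. */
		fd = open(path, O_RDONLY);
		if (fd < 0)
		{
			if (errno == ENOENT)
				continue;		/* removed meanwhile; try creating again */
			return LOCK_ERROR;
		}

		char		buffer[64];
		ssize_t		len = read(fd, buffer, sizeof(buffer) - 1);

		close(fd);
		if (len < 0)
			return LOCK_ERROR;
		if (len > 0)
			return LOCK_HELD;

		/* Empty: the owner may still be starting up, so wait and retry. */
		if (ntries + 1 >= max_tries)
			return LOCK_EMPTY_STALE;
		sleep(delay_sec);
	}
}
```

With max_tries = 5 and delay_sec = 1, this sketch reads the file five
times and waits about four seconds before giving up on an empty file;
the delay is a parameter only so that a test can pass 0.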

I checked pg_ctl, and it has a default wait of 60 seconds, so taking up
to 5 seconds for the postmaster to exit should be fine.

Patch attached.

FYI, I noticed we have a similar 5-second creation time requirement in
pg_ctl:

        /*
         * The postmaster should create postmaster.pid very soon after being
         * started.  If it's not there after we've waited 5 or more seconds,
         * assume startup failed and give up waiting.  (Note this covers both
         * cases where the pidfile was never created, and where it was created
         * and then removed during postmaster exit.)  Also, if there *is* a
         * file there but it appears stale, issue a suitable warning and give
         * up waiting.
         */
        if (i >= 5)

That check covers the case where the file contains an old pid, rather
than the case where it is empty.
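To make the empty-versus-old-pid distinction concrete, the sketch below
sorts lock file contents into empty, unparsable, live pid, or stale pid.
It is illustrative only: classify_lock_file() and LockFileState are
hypothetical names, it assumes a POSIX system, and it uses kill(pid, 0)
as a generic liveness probe.  It is not a description of how pg_ctl
itself decides that a file is stale.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical classification of a lock file's contents. */
typedef enum
{
	LOCKFILE_EMPTY,		/* nothing written yet, or left by a crash */
	LOCKFILE_INVALID,	/* first line is not a positive pid */
	LOCKFILE_LIVE_PID,	/* pid refers to an existing process */
	LOCKFILE_STALE_PID	/* pid refers to no existing process */
} LockFileState;

/*
 * buf holds the file contents (NUL-terminated, len bytes).  Only the
 * leading pid is examined.  kill(pid, 0) sends no signal; it only
 * reports whether the process exists.
 */
static LockFileState
classify_lock_file(const char *buf, size_t len)
{
	if (len == 0)
		return LOCKFILE_EMPTY;

	char	   *end;

	errno = 0;
	long		pid = strtol(buf, &end, 10);

	if (end == buf || errno != 0 || pid <= 0)
		return LOCKFILE_INVALID;

	/* EPERM means the process exists but belongs to another user. */
	if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
		return LOCKFILE_LIVE_PID;

	/* Any other failure (normally ESRCH): no such process. */
	return LOCKFILE_STALE_PID;
}
```

Counting EPERM as "live" is a conservative choice in this sketch; a
caller that knows the lock owner must run as the same user could
reasonably treat EPERM as stale instead.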

FYI, I fixed the filename problem Tom found.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..0309494
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*************** CreateLockFile(const char *filename, boo
*** 766,771 ****
--- 766,793 ----
  							filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			/*
+ 			 *	An empty lock file exists; either it is from another postmaster
+ 			 *	that is still starting up, or it was left over from a crash.
+ 			 *	Retry for five seconds; if it is still empty, it must be from a
+ 			 *	crash, so fail and recommend lock file removal.
+ 			 */
+ 			if (ntries < 5)
+ 			{
+ 				sleep(1);
+ 				continue;
+ 			}
+ 			else
+ 				ereport(FATAL,
+ 						(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 						 errmsg("lock file \"%s\" is empty", filename),
+ 						 errhint(
+ 						"The empty lock file was probably left by an operating system crash\n"
+ 						"during database startup; consider deleting it.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)