Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-29 Thread Bruce Momjian

Applied.

---

On Wed, Aug 29, 2012 at 08:51:40AM -0400, Bruce Momjian wrote:
> On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> > >> It's a pretty strange line wrap you got in this version of the patch.
> > >> Normally we just let the string run past the 78 char limit, without
> > >> cutting it in any way.  And moving the start of the string to the line
> > >> following "errhint(" looks very odd to me.
> > 
> > > OK, updated patch attached.
> > 
> > I agree with Alvaro's complaint that moving the whole string literal to
> > the next line isn't conformant to our usual coding style.  Definitely
> > nitpicky, but why would you do it like that?
> 
> I remember now why I added "\n".  I am used to writing pg_upgrade output
> strings, which are obviously not sent to log files.  Seems I forgot that
> distinction.  As far as moving the string to the next line, I was trying
> to keep the line from getting too long.  
> 
> Updated patch has everyone on the same line.  I am fine with nitpicky. 
> Frankly, I have applied so many patches in the past few weeks, I am glad
> someone is watching.
> 
> -- 
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
> 
>   + It's impossible for everything to be true. +

> diff --git a/src/backend/utils/init/miscinit.c 
> b/src/backend/utils/init/miscinit.c
> new file mode 100644
> index 775d71f..9a0f92c
> *** a/src/backend/utils/init/miscinit.c
> --- b/src/backend/utils/init/miscinit.c
> *** CreateLockFile(const char *filename, boo
> *** 766,771 
> --- 766,779 
>   filename)));
>   close(fd);
>   
> + if (len == 0)
> + {
> + ereport(FATAL,
> + (errcode(ERRCODE_LOCK_FILE_EXISTS),
> +  errmsg("lock file \"%s\" is empty", 
> filename),
> +  errhint("Either another server is 
> starting, or the lock file is the remnant of a previous server startup 
> crash.")));
> + }
> + 
>   buffer[len] = '\0';
>   encoded_pid = atoi(buffer);
>   
> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
> new file mode 100644
> index af8d8b2..81ba39e
> *** a/src/bin/pg_ctl/pg_ctl.c
> --- b/src/bin/pg_ctl/pg_ctl.c
> *** get_pgpid(void)
> *** 292,299 
>   }
>   if (fscanf(pidf, "%ld", &pid) != 1)
>   {
> ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
> !  progname, pid_file);
>   exit(1);
>   }
>   fclose(pidf);
> --- 292,304 
>   }
>   if (fscanf(pidf, "%ld", &pid) != 1)
>   {
> ! /* Is the file empty? */
> ! if (ftell(pidf) == 0 && feof(pidf))
> ! write_stderr(_("%s: the PID file \"%s\" is empty\n"),
> !  progname, pid_file);
> ! else
> ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
> !  progname, pid_file);
>   exit(1);
>   }
>   fclose(pidf);

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-29 Thread Bruce Momjian
On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> >> It's a pretty strange line wrap you got in this version of the patch.
> >> Normally we just let the string run past the 78 char limit, without
> >> cutting it in any way.  And moving the start of the string to the line
> >> following "errhint(" looks very odd to me.
> 
> > OK, updated patch attached.
> 
> I agree with Alvaro's complaint that moving the whole string literal to
> the next line isn't conformant to our usual coding style.  Definitely
> nitpicky, but why would you do it like that?

I remember now why I added "\n".  I am used to writing pg_upgrade output
strings, which are obviously not sent to log files.  Seems I forgot that
distinction.  As far as moving the string to the next line, I was trying
to keep the line from getting too long.  

Updated patch has everyone on the same line.  I am fine with nitpicky. 
Frankly, I have applied so many patches in the past few weeks, I am glad
someone is watching.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..9a0f92c
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,779 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg("lock file \"%s\" is empty", filename),
+ 	 errhint("Either another server is starting, or the lock file is the remnant of a previous server startup crash.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0 && feof(pidf))
! 			write_stderr(_("%s: the PID file \"%s\" is empty\n"),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
>> It's a pretty strange line wrap you got in this version of the patch.
>> Normally we just let the string run past the 78 char limit, without
>> cutting it in any way.  And moving the start of the string to the line
>> following "errhint(" looks very odd to me.

> OK, updated patch attached.

I agree with Alvaro's complaint that moving the whole string literal to
the next line isn't conformant to our usual coding style.  Definitely
nitpicky, but why would you do it like that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Bruce Momjian
On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
> > On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Updated patch attached which just reports the file as empty.  I assume
> > > > we don't want the extra text output for pg_ctl like we do for the
> > > > backend.
> > > 
> > > The backend side of this looks mostly sane to me (but drop the \n,
> > > messages are not supposed to contain those).  But the feof test proposed
> > 
> > Removed.
> 
> It's a pretty strange line wrap you got in this version of the patch.
> Normally we just let the string run past the 78 char limit, without
> cutting it in any way.  And moving the start of the string to the line
> following "errhint(" looks very odd to me.

OK, updated patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..efd5152
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,780 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg("lock file \"%s\" is empty", filename),
+ 	 errhint(
+ 	 "Either another server is starting, or the lock file is the remnant of a previous server startup crash.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0 && feof(pidf))
! 			write_stderr(_("%s: the PID file \"%s\" is empty\n"),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
> On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Updated patch attached which just reports the file as empty.  I assume
> > > we don't want the extra text output for pg_ctl like we do for the
> > > backend.
> > 
> > The backend side of this looks mostly sane to me (but drop the \n,
> > messages are not supposed to contain those).  But the feof test proposed
> 
> Removed.

It's a pretty strange line wrap you got in this version of the patch.
Normally we just let the string run past the 78 char limit, without
cutting it in any way.  And moving the start of the string to the line
following "errhint(" looks very odd to me.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Tom Lane
Bruce Momjian  writes:
> On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
>> The backend side of this looks mostly sane to me (but drop the \n,
>> messages are not supposed to contain those).  But the feof test proposed

> Removed.  I thought we needed to add \n so that strings >80 would wrap
> properly.  How do we handle this?

We don't.  Per the message style guidelines, it's the responsibility of
a client frontend to line-wrap such text if it feels the need to.  The
backend has no business assuming that 80 characters (or any other
number) is where to wrap.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Bruce Momjian
On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Updated patch attached which just reports the file as empty.  I assume
> > we don't want the extra text output for pg_ctl like we do for the
> > backend.
> 
> The backend side of this looks mostly sane to me (but drop the \n,
> messages are not supposed to contain those).  But the feof test proposed

Removed.  I thought we needed to add \n so that strings >80 would wrap
properly.  How do we handle this?

> for pg_ctl is no good: consider a file containing just, say, "-".
> fscanf would eat the "-", then hit eof, and this would complain the file
> is empty.  Possibly checking for ftell(pidf) == 0 would do, though I'm
> not sure whether it's portable to assume fscanf would eat a non-numeric
> character before complaining.

ftell() seems to work fine when combined with feof(), so I used that in
the attached patch.  ftell() alone remains at zero if the file contains
"A", so feof() is also needed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..eadfcbf
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,781 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg("lock file \"%s\" is empty", filename),
+ 	 errhint(
+ 	 "Either another server is starting, or the lock file is the remnant "
+ 	 "of a previous server startup crash.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..81ba39e
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (ftell(pidf) == 0 && feof(pidf))
! 			write_stderr(_("%s: the PID file \"%s\" is empty\n"),
! 		 progname, pid_file);
! 		else
! 			write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
! 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Tom Lane
Bruce Momjian  writes:
> Updated patch attached which just reports the file as empty.  I assume
> we don't want the extra text output for pg_ctl like we do for the
> backend.

The backend side of this looks mostly sane to me (but drop the \n,
messages are not supposed to contain those).  But the feof test proposed
for pg_ctl is no good: consider a file containing just, say, "-".
fscanf would eat the "-", then hit eof, and this would complain the file
is empty.  Possibly checking for ftell(pidf) == 0 would do, though I'm
not sure whether it's portable to assume fscanf would eat a non-numeric
character before complaining.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-28 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 10:17:43PM -0400, Bruce Momjian wrote:
> Seems pg_ctl would also need some cleanup if we change the error
> message and/or timing.
> 
> I am thinking we should just change the error message in the postmaster
> and pg_ctl to say the file is empty, and call it done (no hint message).
> If we do want a hint, say that either the file is stale from a crash or
> another postmaster is starting up, and let the user diagnose it.

Updated patch attached which just reports the file as empty.  I assume
we don't want the extra text output for pg_ctl like we do for the
backend.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..9f0a58a
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,781 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 	 errmsg("lock file \"%s\" is empty", filename),
+ 	 errhint(
+ 	 "Either another server is starting, or the lock file is the remnant\n"
+ 	 "of a previous server startup crash.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
new file mode 100644
index af8d8b2..0be0f2d
*** a/src/bin/pg_ctl/pg_ctl.c
--- b/src/bin/pg_ctl/pg_ctl.c
*** get_pgpid(void)
*** 292,299 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
  	 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);
--- 292,304 
  	}
  	if (fscanf(pidf, "%ld", &pid) != 1)
  	{
! 		/* Is the file empty? */
! 		if (feof(pidf))
! 			write_stderr(_("%s: the PID file \"%s\" is empty\n"),
  	 progname, pid_file);
+ 		else
+ 			write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
+ 		 progname, pid_file);
  		exit(1);
  	}
  	fclose(pidf);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> >> I could get behind that, but I don't think the delay should be more than
> >> 100ms or so.
> 
> > I took Alvaro's approach of a sleep.  The file test was already in a
> > loop that went 100 times.  Basically, if the lock file exists, this
> > postmaster isn't going to succeed, so I figured there is no reason to
> > rush in the testing.  I gave it 5 tries with one second between
> > attempts.  Either the file is being populated, or it is stale and empty.
> 
> How did "100ms" translate to 5 seconds?

That was the "no need to rush, let's just be sure of what we report".

> > I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
> > to exit out of the postmaster should be fine.
> 
> pg_ctl is not the only consideration here.  In particular, there are a
> lot of initscripts out there (all of Red Hat's, for instance) that don't
> use pg_ctl and expect the postmaster to come up (or not) in a couple of
> seconds.
> 
> I don't see a need for more than about one retry with 100ms delay.
> There is no evidence that the case we're worried about has ever occurred
> in the real world anyway, so slowing down error failures to make really
> really really sure there's not a competing postmaster doesn't seem like
> a good tradeoff.
> 
> I'm not terribly impressed with that errhint, either.

I am concerned at 100ms that we can't be sure if it is still being
created, and if we can't be sure, I am not sure there is much point in
trying to clarify the odd error message we omit.

FYI, here is what the code does now with a zero-length pid file, with my
patch:

$ postmaster
[ wait 5 seconds ]
FATAL:  lock file "postmaster.pid" is empty
HINT:  Empty lock file probably left from operating system crash during
database startup;  file deletion suggested.
$ pg_ctl start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"
$ pg_ctl -w start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"

Seems pg_ctl would also need some cleanup if we change the error
message and/or timing.

I am thinking we should just change the error message in the postmaster
and pg_ctl to say the file is empty, and call it done (no hint message).
If we do want a hint, say that either the file is stale from a crash or
another postmaster is starting up, and let the user diagnose it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
>> I could get behind that, but I don't think the delay should be more than
>> 100ms or so.

> I took Alvaro's approach of a sleep.  The file test was already in a
> loop that went 100 times.  Basically, if the lock file exists, this
> postmaster isn't going to succeed, so I figured there is no reason to
> rush in the testing.  I gave it 5 tries with one second between
> attempts.  Either the file is being populated, or it is stale and empty.

How did "100ms" translate to 5 seconds?

> I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
> to exit out of the postmaster should be fine.

pg_ctl is not the only consideration here.  In particular, there are a
lot of initscripts out there (all of Red Hat's, for instance) that don't
use pg_ctl and expect the postmaster to come up (or not) in a couple of
seconds.

I don't see a need for more than about one retry with 100ms delay.
There is no evidence that the case we're worried about has ever occurred
in the real world anyway, so slowing down error failures to make really
really really sure there's not a competing postmaster doesn't seem like
a good tradeoff.

I'm not terribly impressed with that errhint, either.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Bruce Momjian
On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > How about having it sleep for a short while, then try again?
> 
> I could get behind that, but I don't think the delay should be more than
> 100ms or so.  It's important for the postmaster to acquire the lock (or
> not) pretty quickly, or pg_ctl is going to get confused.  If we keep it
> short, we can also dispense with the log spam you were suggesting.
> 
> (Actually, I wonder if this type of scenario isn't going to confuse
> pg_ctl already --- it might think the lockfile belongs to the postmaster
> *it* started, not some pre-existing one.  Does that matter?)

I took Alvaro's approach of a sleep.  The file test was already in a
loop that went 100 times.  Basically, if the lock file exists, this
postmaster isn't going to succeed, so I figured there is no reason to
rush in the testing.  I gave it 5 tries with one second between
attempts.  Either the file is being populated, or it is stale and empty.

I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
to exit out of the postmaster should be fine.

Patch attached.

FYI, I noticed we have a similar 5-second creation time requirement in
pg_ctl:

/*
 * The postmaster should create postmaster.pid very soon after being
 * started.  If it's not there after we've waited 5 or more seconds,
 * assume startup failed and give up waiting.  (Note this covers both
 * cases where the pidfile was never created, and where it was created
 * and then removed during postmaster exit.)  Also, if there *is* a
 * file there but it appears stale, issue a suitable warning and give
 * up waiting.
 */
if (i >= 5)

This is for the case where the file has an old pid, rather than it is
empty.

FYI, I fixed the filename problem Tom found.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..0309494
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 766,771 
--- 766,793 
  			filename)));
  		close(fd);
  
+ 		if (len == 0)
+ 		{
+ 			/*
+ 			 *	An empty lock file exits;  either is it from another postmaster
+ 			 *	that is still starting up, or left from a crash.  Check for
+ 			 *	five seconds, then if it still empty, it must be from a crash,
+ 			 *	so fail and recommend lock file removal.
+ 			 */
+ 			if (ntries < 5)
+ 			{
+ sleep(1);
+ continue;
+ 			}
+ 			else
+ ereport(FATAL,
+ 		(errcode(ERRCODE_LOCK_FILE_EXISTS),
+ 		 errmsg("lock file \"%s\" is empty", filename),
+ 		 errhint(
+ 		"Empty lock file probably left from operating system crash during\n"
+ 		"database startup;  file deletion suggested.")));
+ 		}
+ 
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Tom Lane
Alvaro Herrera  writes:
> How about having it sleep for a short while, then try again?

I could get behind that, but I don't think the delay should be more than
100ms or so.  It's important for the postmaster to acquire the lock (or
not) pretty quickly, or pg_ctl is going to get confused.  If we keep it
short, we can also dispense with the log spam you were suggesting.

(Actually, I wonder if this type of scenario isn't going to confuse
pg_ctl already --- it might think the lockfile belongs to the postmaster
*it* started, not some pre-existing one.  Does that matter?)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane  wrote:
>> Perhaps something like:
>> 
>> FATAL: lock file "foo" is empty
>> HINT: This may mean that another postmaster was starting at the
>> same time.  If not, remove the lock file and try again.

> The problem with this is that it gives the customer only one remedy,
> which they will (if experience is any guide) try whether it is
> actually correct to do so or not.

Which other remedy(s) do you think the hint should suggest?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Alvaro Herrera
Excerpts from Robert Haas's message of lun ago 27 18:02:06 -0400 2012:
> On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane  wrote:
> > Bruce Momjian  writes:
> >> I have developed the attached patch to report a zero-length file, as you
> >> suggested.
> >
> > DIRECTORY_LOCK_FILE is entirely incorrect there.
> >
> > Taking a step back, I don't think this message is much better than the
> > existing behavior of reporting "bogus data".  Either way, it's not
> > obvious to typical users what the problem is or what to do about it.
> > If we're going to emit a special message I think it should be more user
> > friendly than this.
> >
> > Perhaps something like:
> >
> > FATAL: lock file "foo" is empty
> > HINT: This may mean that another postmaster was starting at the
> > same time.  If not, remove the lock file and try again.
> 
> The problem with this is that it gives the customer only one remedy,
> which they will (if experience is any guide) try whether it is
> actually correct to do so or not.

How about having it sleep for a short while, then try again?  I would
expect that it would cause the second postmaster to fail during the
second try, which is okay because the first one is then operational.
The problem, of course, is how long to sleep so that this doesn't fail
when load is high enough that the first postmaster still hasn't written
the file after the sleep.

Maybe

LOG:  lock file "foo" is empty, sleeping to retry
-- sleep 100ms and recheck
LOG:  lock file "foo" is empty, sleeping to retry
-- sleep, dunno, 1s, recheck
LOG:  lock file "foo" is empty, sleeping to retry
-- sleep maybe 5s? recheck
FATAL:  lock file "foo" is empty
HINT:  Is another postmaster running on data directory "bar"?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Robert Haas
On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane  wrote:
> Bruce Momjian  writes:
>> I have developed the attached patch to report a zero-length file, as you
>> suggested.
>
> DIRECTORY_LOCK_FILE is entirely incorrect there.
>
> Taking a step back, I don't think this message is much better than the
> existing behavior of reporting "bogus data".  Either way, it's not
> obvious to typical users what the problem is or what to do about it.
> If we're going to emit a special message I think it should be more user
> friendly than this.
>
> Perhaps something like:
>
> FATAL: lock file "foo" is empty
> HINT: This may mean that another postmaster was starting at the
> same time.  If not, remove the lock file and try again.

The problem with this is that it gives the customer only one remedy,
which they will (if experience is any guide) try whether it is
actually correct to do so or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-27 Thread Tom Lane
Bruce Momjian  writes:
> I have developed the attached patch to report a zero-length file, as you
> suggested.

DIRECTORY_LOCK_FILE is entirely incorrect there.

Taking a step back, I don't think this message is much better than the
existing behavior of reporting "bogus data".  Either way, it's not
obvious to typical users what the problem is or what to do about it.
If we're going to emit a special message I think it should be more user
friendly than this.

Perhaps something like:

FATAL: lock file "foo" is empty
HINT: This may mean that another postmaster was starting at the
same time.  If not, remove the lock file and try again.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-08-26 Thread Bruce Momjian
On Fri, Jan  6, 2012 at 08:28:48AM -0500, Michael Beattie wrote:
> On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander  wrote:
> 
> On Thu, Jan 5, 2012 at 23:19, Tom Lane  wrote:
> > Magnus Hagander  writes:
> >> On Thu, Jan 5, 2012 at 17:13, Tom Lane  wrote:
> >>> I think link(2) would create race conditions of its own.  I'd be
> >>> inclined to suggest that maybe we should just special-case a zero
> length
> >>> postmaster.pid file as meaning "okay to proceed".  In general, garbage
> >
> >> That's pretty much what I meant - but with a warning message.
> >
> > Actually ... wait a minute.  If we allow that, don't we create a race
> > condition between two postmasters starting at almost the same instant?
> > The second one could see the lock file when the first has created but
> > not yet filled it.
> 
> Good point, yeah, it should do that. But I still think it's rare
> enough that just special-casing the error message should be enough...
> 
> 
> 
> so just something that does like
> 
> stat(filename, &st);
> size = st.st_size;
> if (size == 0)
>elog(ERROR, "lock file \"%s\" has 0 length.", filename);
> 
> somewhere in CreateLockFile in miscinit.c? 

I have developed the attached patch to report a zero-length file, as you
suggested.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
new file mode 100644
index 775d71f..7b7c141
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
*** CreateLockFile(const char *filename, boo
*** 765,770 
--- 765,772 
  	 errmsg("could not read lock file \"%s\": %m",
  			filename)));
  		close(fd);
+ 		if (len == 0)
+ 			elog(FATAL, "lock file \"%s\" is empty", DIRECTORY_LOCK_FILE);
  
  		buffer[len] = '\0';
  		encoded_pid = atoi(buffer);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-06 Thread Michael Beattie
On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander  wrote:

> On Thu, Jan 5, 2012 at 23:19, Tom Lane  wrote:
> > Magnus Hagander  writes:
> >> On Thu, Jan 5, 2012 at 17:13, Tom Lane  wrote:
> >>> I think link(2) would create race conditions of its own.  I'd be
> >>> inclined to suggest that maybe we should just special-case a zero
> length
> >>> postmaster.pid file as meaning "okay to proceed".  In general, garbage
> >
> >> That's pretty much what I meant - but with a warning message.
> >
> > Actually ... wait a minute.  If we allow that, don't we create a race
> > condition between two postmasters starting at almost the same instant?
> > The second one could see the lock file when the first has created but
> > not yet filled it.
>
> Good point, yeah, it should do that. But I still think it's rare
> enough that just special-casing the error message should be enough...
>
>
so just something that does like

stat(filename, &st);
size = st.st_size;
if (size == 0)
   elog(ERROR, "lock file \"%s\" has 0 length.", filename);

somewhere in CreateLockFile in miscinit.c?


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-06 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 23:19, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Thu, Jan 5, 2012 at 17:13, Tom Lane  wrote:
>>> I think link(2) would create race conditions of its own.  I'd be
>>> inclined to suggest that maybe we should just special-case a zero length
>>> postmaster.pid file as meaning "okay to proceed".  In general, garbage
>
>> That's pretty much what I meant - but with a warning message.
>
> Actually ... wait a minute.  If we allow that, don't we create a race
> condition between two postmasters starting at almost the same instant?
> The second one could see the lock file when the first has created but
> not yet filled it.

Good point, yeah, it should do that. But I still think it's rare
enough that just special-casing the error message should be enough...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Jan 5, 2012 at 17:13, Tom Lane  wrote:
>> I think link(2) would create race conditions of its own.  I'd be
>> inclined to suggest that maybe we should just special-case a zero length
>> postmaster.pid file as meaning "okay to proceed".  In general, garbage

> That's pretty much what I meant - but with a warning message.

Actually ... wait a minute.  If we allow that, don't we create a race
condition between two postmasters starting at almost the same instant?
The second one could see the lock file when the first has created but
not yet filled it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 17:13, Tom Lane  wrote:
> Heikki Linnakangas  writes:
>> My laptop ran out of battery and turned itself off while I was just
>> starting up postmaster. After plugging in the charger and rebooting, I
>> got the following error when I tried to restart PostgreSQL:
>
>> FATAL:  bogus data in lock file "postmaster.pid": ""
>
>> postmaster.pid file was present in the data directory, but had zero
>> length. Looking at the way the file is created and written, that can
>> happen if you crash after the file is created, but before it's
>> written/fsync'd (my laptop might have write-cache enabled, which would
>> make the window larger).
>
>> I was a bit surprised by that. That's probably not a big deal in
>> practice, but I wonder if there was some easy way to avoid that. First I
>> thought we could create the new postmaster.pid file with a temporary
>> name and rename it in place, but rename(2) will merrily overwrite any
>> existing file which is not what we want. We could use link(2), I guess.
>
> I think link(2) would create race conditions of its own.  I'd be
> inclined to suggest that maybe we should just special-case a zero length
> postmaster.pid file as meaning "okay to proceed".  In general, garbage

That's pretty much what I meant - but with a warning message.

> data in postmaster.pid is something I'm happy to insist on manual
> recovery from, but maybe we could safely make an exception for empty
> files.

+1.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Tom Lane
Heikki Linnakangas  writes:
> My laptop ran out of battery and turned itself off while I was just 
> starting up postmaster. After plugging in the charger and rebooting, I 
> got the following error when I tried to restart PostgreSQL:

> FATAL:  bogus data in lock file "postmaster.pid": ""

> postmaster.pid file was present in the data directory, but had zero 
> length. Looking at the way the file is created and written, that can 
> happen if you crash after the file is created, but before it's 
> written/fsync'd (my laptop might have write-cache enabled, which would 
> make the window larger).

> I was a bit surprised by that. That's probably not a big deal in 
> practice, but I wonder if there was some easy way to avoid that. First I 
> thought we could create the new postmaster.pid file with a temporary 
> name and rename it in place, but rename(2) will merrily overwrite any 
> existing file which is not what we want. We could use link(2), I guess.

I think link(2) would create race conditions of its own.  I'd be
inclined to suggest that maybe we should just special-case a zero length
postmaster.pid file as meaning "okay to proceed".  In general, garbage
data in postmaster.pid is something I'm happy to insist on manual
recovery from, but maybe we could safely make an exception for empty
files.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander  wrote:
> On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
>  wrote:
>> My laptop ran out of battery and turned itself off while I was just starting
>> up postmaster. After plugging in the charger and rebooting, I got the
>> following error when I tried to restart PostgreSQL:
>>
>> FATAL:  bogus data in lock file "postmaster.pid": ""
>>
>> postmaster.pid file was present in the data directory, but had zero length.
>> Looking at the way the file is created and written, that can happen if you
>> crash after the file is created, but before it's written/fsync'd (my laptop
>> might have write-cache enabled, which would make the window larger).
>>
>> I was a bit surprised by that. That's probably not a big deal in practice,
>> but I wonder if there was some easy way to avoid that. First I thought we
>> could create the new postmaster.pid file with a temporary name and rename it
>> in place, but rename(2) will merrily overwrite any existing file which is
>> not what we want. We could use link(2), I guess.
>
> Is this really a problem big enough to spend even that much effort on?
>
> Perhaps a special-case in the error message instead is enough?

On a laptop it's not a big deal, but it sort of sucks if your
production database fails to start automatically after an unexpected
power outage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
 wrote:
> My laptop ran out of battery and turned itself off while I was just starting
> up postmaster. After plugging in the charger and rebooting, I got the
> following error when I tried to restart PostgreSQL:
>
> FATAL:  bogus data in lock file "postmaster.pid": ""
>
> postmaster.pid file was present in the data directory, but had zero length.
> Looking at the way the file is created and written, that can happen if you
> crash after the file is created, but before it's written/fsync'd (my laptop
> might have write-cache enabled, which would make the window larger).
>
> I was a bit surprised by that. That's probably not a big deal in practice,
> but I wonder if there was some easy way to avoid that. First I thought we
> could create the new postmaster.pid file with a temporary name and rename it
> in place, but rename(2) will merrily overwrite any existing file which is
> not what we want. We could use link(2), I guess.

Is this really a problem big enough to spend even that much effort on?

Perhaps a special-case in the error message instead is enough?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""

2012-01-05 Thread Heikki Linnakangas
My laptop ran out of battery and turned itself off while I was just 
starting up postmaster. After plugging in the charger and rebooting, I 
got the following error when I tried to restart PostgreSQL:


FATAL:  bogus data in lock file "postmaster.pid": ""

postmaster.pid file was present in the data directory, but had zero 
length. Looking at the way the file is created and written, that can 
happen if you crash after the file is created, but before it's 
written/fsync'd (my laptop might have write-cache enabled, which would 
make the window larger).


I was a bit surprised by that. That's probably not a big deal in 
practice, but I wonder if there was some easy way to avoid that. First I 
thought we could create the new postmaster.pid file with a temporary 
name and rename it in place, but rename(2) will merrily overwrite any 
existing file which is not what we want. We could use link(2), I guess.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers