Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Applied. --- On Wed, Aug 29, 2012 at 08:51:40AM -0400, Bruce Momjian wrote: > On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote: > > >> It's a pretty strange line wrap you got in this version of the patch. > > >> Normally we just let the string run past the 78 char limit, without > > >> cutting it in any way. And moving the start of the string to the line > > >> following "errhint(" looks very odd to me. > > > > > OK, updated patch attached. > > > > I agree with Alvaro's complaint that moving the whole string literal to > > the next line isn't conformant to our usual coding style. Definitely > > nitpicky, but why would you do it like that? > > I remember now why I added "\n". I am used to writing pg_upgrade output > strings, which are obviously not sent to log files. Seems I forgot that > distinction. As far as moving the string to the next line, I was trying > to keep the line from getting too long. > > Updated patch has everyone on the same line. I am fine with nitpicky. > Frankly, I have applied so many patches in the past few weeks, I am glad > someone is watching. > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + It's impossible for everything to be true. + > diff --git a/src/backend/utils/init/miscinit.c > b/src/backend/utils/init/miscinit.c > new file mode 100644 > index 775d71f..9a0f92c > *** a/src/backend/utils/init/miscinit.c > --- b/src/backend/utils/init/miscinit.c > *** CreateLockFile(const char *filename, boo > *** 766,771 > --- 766,779 > filename))); > close(fd); > > + if (len == 0) > + { > + ereport(FATAL, > + (errcode(ERRCODE_LOCK_FILE_EXISTS), > + errmsg("lock file \"%s\" is empty", > filename), > + errhint("Either another server is > starting, or the lock file is the remnant of a previous server startup > crash."))); > + } > + > buffer[len] = '\0'; > encoded_pid = atoi(buffer); > > diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c > new file mode 100644 > index af8d8b2..81ba39e > *** a/src/bin/pg_ctl/pg_ctl.c > --- b/src/bin/pg_ctl/pg_ctl.c > *** get_pgpid(void) > *** 292,299 > } > if (fscanf(pidf, "%ld", &pid) != 1) > { > ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), > ! progname, pid_file); > exit(1); > } > fclose(pidf); > --- 292,304 > } > if (fscanf(pidf, "%ld", &pid) != 1) > { > ! /* Is the file empty? */ > ! if (ftell(pidf) == 0 && feof(pidf)) > ! write_stderr(_("%s: the PID file \"%s\" is empty\n"), > ! progname, pid_file); > ! else > ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), > ! progname, pid_file); > exit(1); > } > fclose(pidf); > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote: > >> It's a pretty strange line wrap you got in this version of the patch. > >> Normally we just let the string run past the 78 char limit, without > >> cutting it in any way. And moving the start of the string to the line > >> following "errhint(" looks very odd to me. > > > OK, updated patch attached. > > I agree with Alvaro's complaint that moving the whole string literal to > the next line isn't conformant to our usual coding style. Definitely > nitpicky, but why would you do it like that? I remember now why I added "\n". I am used to writing pg_upgrade output strings, which are obviously not sent to log files. Seems I forgot that distinction. As far as moving the string to the next line, I was trying to keep the line from getting too long. Updated patch has everyone on the same line. I am fine with nitpicky. Frankly, I have applied so many patches in the past few weeks, I am glad someone is watching. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..9a0f92c *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 766,771 --- 766,779 filename))); close(fd); + if (len == 0) + { + ereport(FATAL, + (errcode(ERRCODE_LOCK_FILE_EXISTS), + errmsg("lock file \"%s\" is empty", filename), + errhint("Either another server is starting, or the lock file is the remnant of a previous server startup crash."))); + } + buffer[len] = '\0'; encoded_pid = atoi(buffer); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index af8d8b2..81ba39e *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** get_pgpid(void) *** 292,299 } if (fscanf(pidf, "%ld", &pid) != 1) { ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); --- 292,304 } if (fscanf(pidf, "%ld", &pid) != 1) { ! /* Is the file empty? */ ! if (ftell(pidf) == 0 && feof(pidf)) ! write_stderr(_("%s: the PID file \"%s\" is empty\n"), ! progname, pid_file); ! else ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Bruce Momjian writes: > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote: >> It's a pretty strange line wrap you got in this version of the patch. >> Normally we just let the string run past the 78 char limit, without >> cutting it in any way. And moving the start of the string to the line >> following "errhint(" looks very odd to me. > OK, updated patch attached. I agree with Alvaro's complaint that moving the whole string literal to the next line isn't conformant to our usual coding style. Definitely nitpicky, but why would you do it like that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote: > Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012: > > On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > Updated patch attached which just reports the file as empty. I assume > > > > we don't want the extra text output for pg_ctl like we do for the > > > > backend. > > > > > > The backend side of this looks mostly sane to me (but drop the \n, > > > messages are not supposed to contain those). But the feof test proposed > > > > Removed. > > It's a pretty strange line wrap you got in this version of the patch. > Normally we just let the string run past the 78 char limit, without > cutting it in any way. And moving the start of the string to the line > following "errhint(" looks very odd to me. OK, updated patch attached. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..efd5152 *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 766,771 --- 766,780 filename))); close(fd); + if (len == 0) + { + ereport(FATAL, + (errcode(ERRCODE_LOCK_FILE_EXISTS), + errmsg("lock file \"%s\" is empty", filename), + errhint( + "Either another server is starting, or the lock file is the remnant of a previous server startup crash."))); + } + buffer[len] = '\0'; encoded_pid = atoi(buffer); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index af8d8b2..81ba39e *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** get_pgpid(void) *** 292,299 } if (fscanf(pidf, "%ld", &pid) != 1) { ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); --- 292,304 } if (fscanf(pidf, "%ld", &pid) != 1) { ! /* Is the file empty? */ ! if (ftell(pidf) == 0 && feof(pidf)) ! write_stderr(_("%s: the PID file \"%s\" is empty\n"), ! progname, pid_file); ! else ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012: > On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > Updated patch attached which just reports the file as empty. I assume > > > we don't want the extra text output for pg_ctl like we do for the > > > backend. > > > > The backend side of this looks mostly sane to me (but drop the \n, > > messages are not supposed to contain those). But the feof test proposed > > Removed. It's a pretty strange line wrap you got in this version of the patch. Normally we just let the string run past the 78 char limit, without cutting it in any way. And moving the start of the string to the line following "errhint(" looks very odd to me. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Bruce Momjian writes: > On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote: >> The backend side of this looks mostly sane to me (but drop the \n, >> messages are not supposed to contain those). But the feof test proposed > Removed. I thought we needed to add \n so that strings >80 would wrap > properly. How do we handle this? We don't. Per the message style guidelines, it's the responsibility of a client frontend to line-wrap such text if it feels the need to. The backend has no business assuming that 80 characters (or any other number) is where to wrap. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > Updated patch attached which just reports the file as empty. I assume > > we don't want the extra text output for pg_ctl like we do for the > > backend. > > The backend side of this looks mostly sane to me (but drop the \n, > messages are not supposed to contain those). But the feof test proposed Removed. I thought we needed to add \n so that strings >80 would wrap properly. How do we handle this? > for pg_ctl is no good: consider a file containing just, say, "-". > fscanf would eat the "-", then hit eof, and this would complain the file > is empty. Possibly checking for ftell(pidf) == 0 would do, though I'm > not sure whether it's portable to assume fscanf would eat a non-numeric > character before complaining. ftell() seems to work fine when combined with feof(), so I used that in the attached patch. ftell() alone remains at zero if the file contains "A", so feof() is also needed. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..eadfcbf *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 766,771 --- 766,781 filename))); close(fd); + if (len == 0) + { + ereport(FATAL, + (errcode(ERRCODE_LOCK_FILE_EXISTS), + errmsg("lock file \"%s\" is empty", filename), + errhint( + "Either another server is starting, or the lock file is the remnant " + "of a previous server startup crash."))); + } + buffer[len] = '\0'; encoded_pid = atoi(buffer); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index af8d8b2..81ba39e *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** get_pgpid(void) *** 292,299 } if (fscanf(pidf, "%ld", &pid) != 1) { ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); --- 292,304 } if (fscanf(pidf, "%ld", &pid) != 1) { ! /* Is the file empty? */ ! if (ftell(pidf) == 0 && feof(pidf)) ! write_stderr(_("%s: the PID file \"%s\" is empty\n"), ! progname, pid_file); ! else ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), ! progname, pid_file); exit(1); } fclose(pidf); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Bruce Momjian writes: > Updated patch attached which just reports the file as empty. I assume > we don't want the extra text output for pg_ctl like we do for the > backend. The backend side of this looks mostly sane to me (but drop the \n, messages are not supposed to contain those). But the feof test proposed for pg_ctl is no good: consider a file containing just, say, "-". fscanf would eat the "-", then hit eof, and this would complain the file is empty. Possibly checking for ftell(pidf) == 0 would do, though I'm not sure whether it's portable to assume fscanf would eat a non-numeric character before complaining. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Mon, Aug 27, 2012 at 10:17:43PM -0400, Bruce Momjian wrote: > Seems pg_ctl would also need some cleanup if we change the error > message and/or timing. > > I am thinking we should just change the error message in the postmaster > and pg_ctl to say the file is empty, and call it done (no hint message). > If we do want a hint, say that either the file is stale from a crash or > another postmaster is starting up, and let the user diagnose it. Updated patch attached which just reports the file as empty. I assume we don't want the extra text output for pg_ctl like we do for the backend. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..9f0a58a *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 766,771 --- 766,781 filename))); close(fd); + if (len == 0) + { + ereport(FATAL, + (errcode(ERRCODE_LOCK_FILE_EXISTS), + errmsg("lock file \"%s\" is empty", filename), + errhint( + "Either another server is starting, or the lock file is the remnant\n" + "of a previous server startup crash."))); + } + buffer[len] = '\0'; encoded_pid = atoi(buffer); diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index af8d8b2..0be0f2d *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** get_pgpid(void) *** 292,299 } if (fscanf(pidf, "%ld", &pid) != 1) { ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"), progname, pid_file); exit(1); } fclose(pidf); --- 292,304 } if (fscanf(pidf, "%ld", &pid) != 1) { ! /* Is the file empty? */ ! if (feof(pidf)) ! write_stderr(_("%s: the PID file \"%s\" is empty\n"), progname, pid_file); + else + write_stderr(_("%s: invalid data in PID file \"%s\"\n"), + progname, pid_file); exit(1); } fclose(pidf); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote: > >> I could get behind that, but I don't think the delay should be more than > >> 100ms or so. > > > I took Alvaro's approach of a sleep. The file test was already in a > > loop that went 100 times. Basically, if the lock file exists, this > > postmaster isn't going to succeed, so I figured there is no reason to > > rush in the testing. I gave it 5 tries with one second between > > attempts. Either the file is being populated, or it is stale and empty. > > How did "100ms" translate to 5 seconds? That was the "no need to rush, let's just be sure of what we report". > > I checked pg_ctl and that has a default wait of 60 second, so 5 seconds > > to exit out of the postmaster should be fine. > > pg_ctl is not the only consideration here. In particular, there are a > lot of initscripts out there (all of Red Hat's, for instance) that don't > use pg_ctl and expect the postmaster to come up (or not) in a couple of > seconds. > > I don't see a need for more than about one retry with 100ms delay. > There is no evidence that the case we're worried about has ever occurred > in the real world anyway, so slowing down error failures to make really > really really sure there's not a competing postmaster doesn't seem like > a good tradeoff. > > I'm not terribly impressed with that errhint, either. I am concerned at 100ms that we can't be sure if it is still being created, and if we can't be sure, I am not sure there is much point in trying to clarify the odd error message we omit. FYI, here is what the code does now with a zero-length pid file, with my patch: $ postmaster [ wait 5 seconds ] FATAL: lock file "postmaster.pid" is empty HINT: Empty lock file probably left from operating system crash during database startup; file deletion suggested. $ pg_ctl start pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid" $ pg_ctl -w start pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid" Seems pg_ctl would also need some cleanup if we change the error message and/or timing. I am thinking we should just change the error message in the postmaster and pg_ctl to say the file is empty, and call it done (no hint message). If we do want a hint, say that either the file is stale from a crash or another postmaster is starting up, and let the user diagnose it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Bruce Momjian writes: > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote: >> I could get behind that, but I don't think the delay should be more than >> 100ms or so. > I took Alvaro's approach of a sleep. The file test was already in a > loop that went 100 times. Basically, if the lock file exists, this > postmaster isn't going to succeed, so I figured there is no reason to > rush in the testing. I gave it 5 tries with one second between > attempts. Either the file is being populated, or it is stale and empty. How did "100ms" translate to 5 seconds? > I checked pg_ctl and that has a default wait of 60 second, so 5 seconds > to exit out of the postmaster should be fine. pg_ctl is not the only consideration here. In particular, there are a lot of initscripts out there (all of Red Hat's, for instance) that don't use pg_ctl and expect the postmaster to come up (or not) in a couple of seconds. I don't see a need for more than about one retry with 100ms delay. There is no evidence that the case we're worried about has ever occurred in the real world anyway, so slowing down error failures to make really really really sure there's not a competing postmaster doesn't seem like a good tradeoff. I'm not terribly impressed with that errhint, either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote: > Alvaro Herrera writes: > > How about having it sleep for a short while, then try again? > > I could get behind that, but I don't think the delay should be more than > 100ms or so. It's important for the postmaster to acquire the lock (or > not) pretty quickly, or pg_ctl is going to get confused. If we keep it > short, we can also dispense with the log spam you were suggesting. > > (Actually, I wonder if this type of scenario isn't going to confuse > pg_ctl already --- it might think the lockfile belongs to the postmaster > *it* started, not some pre-existing one. Does that matter?) I took Alvaro's approach of a sleep. The file test was already in a loop that went 100 times. Basically, if the lock file exists, this postmaster isn't going to succeed, so I figured there is no reason to rush in the testing. I gave it 5 tries with one second between attempts. Either the file is being populated, or it is stale and empty. I checked pg_ctl and that has a default wait of 60 second, so 5 seconds to exit out of the postmaster should be fine. Patch attached. FYI, I noticed we have a similar 5-second creation time requirement in pg_ctl: /* * The postmaster should create postmaster.pid very soon after being * started. If it's not there after we've waited 5 or more seconds, * assume startup failed and give up waiting. (Note this covers both * cases where the pidfile was never created, and where it was created * and then removed during postmaster exit.) Also, if there *is* a * file there but it appears stale, issue a suitable warning and give * up waiting. */ if (i >= 5) This is for the case where the file has an old pid, rather than it is empty. FYI, I fixed the filename problem Tom found. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..0309494 *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 766,771 --- 766,793 filename))); close(fd); + if (len == 0) + { + /* + * An empty lock file exits; either is it from another postmaster + * that is still starting up, or left from a crash. Check for + * five seconds, then if it still empty, it must be from a crash, + * so fail and recommend lock file removal. + */ + if (ntries < 5) + { + sleep(1); + continue; + } + else + ereport(FATAL, + (errcode(ERRCODE_LOCK_FILE_EXISTS), + errmsg("lock file \"%s\" is empty", filename), + errhint( + "Empty lock file probably left from operating system crash during\n" + "database startup; file deletion suggested."))); + } + buffer[len] = '\0'; encoded_pid = atoi(buffer); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Alvaro Herrera writes: > How about having it sleep for a short while, then try again? I could get behind that, but I don't think the delay should be more than 100ms or so. It's important for the postmaster to acquire the lock (or not) pretty quickly, or pg_ctl is going to get confused. If we keep it short, we can also dispense with the log spam you were suggesting. (Actually, I wonder if this type of scenario isn't going to confuse pg_ctl already --- it might think the lockfile belongs to the postmaster *it* started, not some pre-existing one. Does that matter?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Robert Haas writes: > On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane wrote: >> Perhaps something like: >> >> FATAL: lock file "foo" is empty >> HINT: This may mean that another postmaster was starting at the >> same time. If not, remove the lock file and try again. > The problem with this is that it gives the customer only one remedy, > which they will (if experience is any guide) try whether it is > actually correct to do so or not. Which other remedy(s) do you think the hint should suggest? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Excerpts from Robert Haas's message of lun ago 27 18:02:06 -0400 2012: > On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane wrote: > > Bruce Momjian writes: > >> I have developed the attached patch to report a zero-length file, as you > >> suggested. > > > > DIRECTORY_LOCK_FILE is entirely incorrect there. > > > > Taking a step back, I don't think this message is much better than the > > existing behavior of reporting "bogus data". Either way, it's not > > obvious to typical users what the problem is or what to do about it. > > If we're going to emit a special message I think it should be more user > > friendly than this. > > > > Perhaps something like: > > > > FATAL: lock file "foo" is empty > > HINT: This may mean that another postmaster was starting at the > > same time. If not, remove the lock file and try again. > > The problem with this is that it gives the customer only one remedy, > which they will (if experience is any guide) try whether it is > actually correct to do so or not. How about having it sleep for a short while, then try again? I would expect that it would cause the second postmaster to fail during the second try, which is okay because the first one is then operational. The problem, of course, is how long to sleep so that this doesn't fail when load is high enough that the first postmaster still hasn't written the file after the sleep. Maybe LOG: lock file "foo" is empty, sleeping to retry -- sleep 100ms and recheck LOG: lock file "foo" is empty, sleeping to retry -- sleep, dunno, 1s, recheck LOG: lock file "foo" is empty, sleeping to retry -- sleep maybe 5s? recheck FATAL: lock file "foo" is empty HINT: Is another postmaster running on data directory "bar"? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane wrote: > Bruce Momjian writes: >> I have developed the attached patch to report a zero-length file, as you >> suggested. > > DIRECTORY_LOCK_FILE is entirely incorrect there. > > Taking a step back, I don't think this message is much better than the > existing behavior of reporting "bogus data". Either way, it's not > obvious to typical users what the problem is or what to do about it. > If we're going to emit a special message I think it should be more user > friendly than this. > > Perhaps something like: > > FATAL: lock file "foo" is empty > HINT: This may mean that another postmaster was starting at the > same time. If not, remove the lock file and try again. The problem with this is that it gives the customer only one remedy, which they will (if experience is any guide) try whether it is actually correct to do so or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Bruce Momjian writes: > I have developed the attached patch to report a zero-length file, as you > suggested. DIRECTORY_LOCK_FILE is entirely incorrect there. Taking a step back, I don't think this message is much better than the existing behavior of reporting "bogus data". Either way, it's not obvious to typical users what the problem is or what to do about it. If we're going to emit a special message I think it should be more user friendly than this. Perhaps something like: FATAL: lock file "foo" is empty HINT: This may mean that another postmaster was starting at the same time. If not, remove the lock file and try again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Fri, Jan 6, 2012 at 08:28:48AM -0500, Michael Beattie wrote: > On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander wrote: > > On Thu, Jan 5, 2012 at 23:19, Tom Lane wrote: > > Magnus Hagander writes: > >> On Thu, Jan 5, 2012 at 17:13, Tom Lane wrote: > >>> I think link(2) would create race conditions of its own. I'd be > >>> inclined to suggest that maybe we should just special-case a zero > length > >>> postmaster.pid file as meaning "okay to proceed". In general, garbage > > > >> That's pretty much what I meant - but with a warning message. > > > > Actually ... wait a minute. If we allow that, don't we create a race > > condition between two postmasters starting at almost the same instant? > > The second one could see the lock file when the first has created but > > not yet filled it. > > Good point, yeah, it should do that. But I still think it's rare > enough that just special-casing the error message should be enough... > > > > so just something that does like > > stat(filename, &st); > size = st.st_size; > if (size == 0) >elog(ERROR, "lock file \"%s\" has 0 length.", filename); > > somewhere in CreateLockFile in miscinit.c? I have developed the attached patch to report a zero-length file, as you suggested. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c new file mode 100644 index 775d71f..7b7c141 *** a/src/backend/utils/init/miscinit.c --- b/src/backend/utils/init/miscinit.c *** CreateLockFile(const char *filename, boo *** 765,770 --- 765,772 errmsg("could not read lock file \"%s\": %m", filename))); close(fd); + if (len == 0) + elog(FATAL, "lock file \"%s\" is empty", DIRECTORY_LOCK_FILE); buffer[len] = '\0'; encoded_pid = atoi(buffer); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander wrote: > On Thu, Jan 5, 2012 at 23:19, Tom Lane wrote: > > Magnus Hagander writes: > >> On Thu, Jan 5, 2012 at 17:13, Tom Lane wrote: > >>> I think link(2) would create race conditions of its own. I'd be > >>> inclined to suggest that maybe we should just special-case a zero > length > >>> postmaster.pid file as meaning "okay to proceed". In general, garbage > > > >> That's pretty much what I meant - but with a warning message. > > > > Actually ... wait a minute. If we allow that, don't we create a race > > condition between two postmasters starting at almost the same instant? > > The second one could see the lock file when the first has created but > > not yet filled it. > > Good point, yeah, it should do that. But I still think it's rare > enough that just special-casing the error message should be enough... > > so just something that does like stat(filename, &st); size = st.st_size; if (size == 0) elog(ERROR, "lock file \"%s\" has 0 length.", filename); somewhere in CreateLockFile in miscinit.c?
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Thu, Jan 5, 2012 at 23:19, Tom Lane wrote: > Magnus Hagander writes: >> On Thu, Jan 5, 2012 at 17:13, Tom Lane wrote: >>> I think link(2) would create race conditions of its own. I'd be >>> inclined to suggest that maybe we should just special-case a zero length >>> postmaster.pid file as meaning "okay to proceed". In general, garbage > >> That's pretty much what I meant - but with a warning message. > > Actually ... wait a minute. If we allow that, don't we create a race > condition between two postmasters starting at almost the same instant? > The second one could see the lock file when the first has created but > not yet filled it. Good point, yeah, it should do that. But I still think it's rare enough that just special-casing the error message should be enough... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Magnus Hagander writes: > On Thu, Jan 5, 2012 at 17:13, Tom Lane wrote: >> I think link(2) would create race conditions of its own. I'd be >> inclined to suggest that maybe we should just special-case a zero length >> postmaster.pid file as meaning "okay to proceed". In general, garbage > That's pretty much what I meant - but with a warning message. Actually ... wait a minute. If we allow that, don't we create a race condition between two postmasters starting at almost the same instant? The second one could see the lock file when the first has created but not yet filled it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Thu, Jan 5, 2012 at 17:13, Tom Lane wrote: > Heikki Linnakangas writes: >> My laptop ran out of battery and turned itself off while I was just >> starting up postmaster. After plugging in the charger and rebooting, I >> got the following error when I tried to restart PostgreSQL: > >> FATAL: bogus data in lock file "postmaster.pid": "" > >> postmaster.pid file was present in the data directory, but had zero >> length. Looking at the way the file is created and written, that can >> happen if you crash after the file is created, but before it's >> written/fsync'd (my laptop might have write-cache enabled, which would >> make the window larger). > >> I was a bit surprised by that. That's probably not a big deal in >> practice, but I wonder if there was some easy way to avoid that. First I >> thought we could create the new postmaster.pid file with a temporary >> name and rename it in place, but rename(2) will merrily overwrite any >> existing file which is not what we want. We could use link(2), I guess. > > I think link(2) would create race conditions of its own. I'd be > inclined to suggest that maybe we should just special-case a zero length > postmaster.pid file as meaning "okay to proceed". In general, garbage That's pretty much what I meant - but with a warning message. > data in postmaster.pid is something I'm happy to insist on manual > recovery from, but maybe we could safely make an exception for empty > files. +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
Heikki Linnakangas writes: > My laptop ran out of battery and turned itself off while I was just > starting up postmaster. After plugging in the charger and rebooting, I > got the following error when I tried to restart PostgreSQL: > FATAL: bogus data in lock file "postmaster.pid": "" > postmaster.pid file was present in the data directory, but had zero > length. Looking at the way the file is created and written, that can > happen if you crash after the file is created, but before it's > written/fsync'd (my laptop might have write-cache enabled, which would > make the window larger). > I was a bit surprised by that. That's probably not a big deal in > practice, but I wonder if there was some easy way to avoid that. First I > thought we could create the new postmaster.pid file with a temporary > name and rename it in place, but rename(2) will merrily overwrite any > existing file which is not what we want. We could use link(2), I guess. I think link(2) would create race conditions of its own. I'd be inclined to suggest that maybe we should just special-case a zero length postmaster.pid file as meaning "okay to proceed". In general, garbage data in postmaster.pid is something I'm happy to insist on manual recovery from, but maybe we could safely make an exception for empty files. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander wrote: > On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas > wrote: >> My laptop ran out of battery and turned itself off while I was just starting >> up postmaster. After plugging in the charger and rebooting, I got the >> following error when I tried to restart PostgreSQL: >> >> FATAL: bogus data in lock file "postmaster.pid": "" >> >> postmaster.pid file was present in the data directory, but had zero length. >> Looking at the way the file is created and written, that can happen if you >> crash after the file is created, but before it's written/fsync'd (my laptop >> might have write-cache enabled, which would make the window larger). >> >> I was a bit surprised by that. That's probably not a big deal in practice, >> but I wonder if there was some easy way to avoid that. First I thought we >> could create the new postmaster.pid file with a temporary name and rename it >> in place, but rename(2) will merrily overwrite any existing file which is >> not what we want. We could use link(2), I guess. > > Is this really a problem big enough to spend even that much effort on? > > Perhaps a special-case in the error message instead is enough? On a laptop it's not a big deal, but it sort of sucks if your production database fails to start automatically after an unexpected power outage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas wrote: > My laptop ran out of battery and turned itself off while I was just starting > up postmaster. After plugging in the charger and rebooting, I got the > following error when I tried to restart PostgreSQL: > > FATAL: bogus data in lock file "postmaster.pid": "" > > postmaster.pid file was present in the data directory, but had zero length. > Looking at the way the file is created and written, that can happen if you > crash after the file is created, but before it's written/fsync'd (my laptop > might have write-cache enabled, which would make the window larger). > > I was a bit surprised by that. That's probably not a big deal in practice, > but I wonder if there was some easy way to avoid that. First I thought we > could create the new postmaster.pid file with a temporary name and rename it > in place, but rename(2) will merrily overwrite any existing file which is > not what we want. We could use link(2), I guess. Is this really a problem big enough to spend even that much effort on? Perhaps a special-case in the error message instead is enough? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] FATAL: bogus data in lock file "postmaster.pid": ""
My laptop ran out of battery and turned itself off while I was just starting up postmaster. After plugging in the charger and rebooting, I got the following error when I tried to restart PostgreSQL: FATAL: bogus data in lock file "postmaster.pid": "" postmaster.pid file was present in the data directory, but had zero length. Looking at the way the file is created and written, that can happen if you crash after the file is created, but before it's written/fsync'd (my laptop might have write-cache enabled, which would make the window larger). I was a bit surprised by that. That's probably not a big deal in practice, but I wonder if there was some easy way to avoid that. First I thought we could create the new postmaster.pid file with a temporary name and rename it in place, but rename(2) will merrily overwrite any existing file which is not what we want. We could use link(2), I guess. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers