On Jan 20, 2006, at 6:02 PM, Tom Lane wrote:

Rick Gigger <[EMAIL PROTECTED]> writes:
Postgres version 7.3.4

You realize of course that that's pretty old ...

That is right now.  Right after it started up it went up to 0292.

So it was the latest file eh?  I thought maybe you had some problem
with a corrupted XID leading to trying to touch a clog file
out-of-order, but that seems ruled out.

2006-01-20 11:50:51 PANIC:  creation of file /var/lib/pgsql/data/pg_clog/0292 failed: File exists

Digging in the 7.3 sources, it seems that error message could only have
come from here:

    fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        if (errno != ENOENT)
            elog(PANIC, "open of %s failed: %m", path);
        /* window: another process can create the file right here */
        fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
                           S_IRUSR | S_IWUSR);
        if (fd < 0)
            elog(PANIC, "creation of file %s failed: %m", path);
    }

AFAICS, it is simply not possible for the second open() to fail with
that errno, *unless* someone else created the same file in the
microseconds between the two open calls.
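
As a standalone illustration (plain POSIX calls, not PostgreSQL code; the
scratch path is made up), here are two processes racing through the same
open-then-create sequence.  The window is only microseconds wide, so the
demo will rarely actually lose the race, which is consistent with how
rarely this PANIC has been seen:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static void open_or_create(const char *path)
    {
        int fd = open(path, O_RDWR);    /* first try: file already exists */

        if (fd < 0)
        {
            if (errno != ENOENT)
            {
                perror("open");
                exit(1);
            }
            /* window: the other process may create the file right here */
            fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
            if (fd < 0)
            {
                /* losing the race with O_EXCL set shows up as EEXIST */
                fprintf(stderr, "creation of %s failed: %s\n",
                        path, strerror(errno));
                exit(1);
            }
        }
        close(fd);
    }

    int main(void)
    {
        const char *path = "/tmp/clog-race-demo";   /* made-up scratch file */
        pid_t       pid;

        unlink(path);
        pid = fork();
        if (pid < 0)
        {
            perror("fork");
            return 1;
        }
        open_or_create(path);           /* parent and child both run this */
        if (pid > 0)
            waitpid(pid, NULL, 0);
        return 0;
    }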

The code doing this has a lock on the particular clog buffer it's trying
to write out, so no-one else could be trying to write the same buffer;
but now that I look at it, it's entirely legal for someone else to be
trying to write a different clog buffer.  This leads to the following
theory:

1. The clog page that would be first in the 0292 segment got created in
clog buffers, but there was no reason to write it out for a while.  (In
normal operation, only a checkpoint would be cause to write out the
frontmost page of clog.)

2. More than 2K transactions elapsed, so the page that would be second
in the 0292 segment also got set up in clog buffers. (Rick, is the load
on your machine such that several thousand transactions might have
elapsed between checkpoints?)  Perhaps there were even enough
transactions so that more than two pages were dirty and pending write
in the clog buffers, yet the file hadn't been created yet.

So what I think is killing me right now is the disk writes, so I was thinking of moving away from the default checkpoint settings. My load is going to continue to go up, so 2000+ transactions between checkpoints will start happening pretty fast. I have lots of disk space, so I was going to increase the time between checkpoints. Will that increase the chances of this happening again, or was this such a freak coincidence that it can't really happen again? Also, I've decided to upgrade all the way to the latest 8.1 code.
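
For reference, checkpoint spacing in 8.1 is controlled by checkpoint_segments
and checkpoint_timeout in postgresql.conf.  A hypothetical starting point
(illustrative values only, to be tuned against the actual workload):

    # postgresql.conf (8.1) -- illustrative values, not a recommendation
    checkpoint_segments = 16    # default 3; max WAL segments between checkpoints
    checkpoint_timeout = 600    # default 300; max seconds between checkpoints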

3. Two different backends decided to try to write different clog pages
concurrently.  Probably one was writing the frontmost page because it
was doing a checkpoint, and another needed to read in an older clog page
so it had to swap out one of the other dirty buffers.

4. Kaboom.
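
Spelled out, the interleaving this theory requires is:

    backend A (checkpoint)                backend B (swapping out a dirty buffer)
    ----------------------                ---------------------------------------
    open(0292) => ENOENT
                                          open(0292) => ENOENT
    open(0292, O_CREAT|O_EXCL) => OK
                                          open(0292, O_CREAT|O_EXCL) => EEXIST
                                          PANIC: creation of file ... failed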

If this theory is correct, the bug has been there since the clog code
was first written.  But the conditions for having it happen are narrow
enough that it's not too surprising we haven't seen it before.

I think that a sufficient fix might just be to remove the O_EXCL flag
from the second open() call --- ie, if someone else creates the file
in this narrow window, it should be considered OK.  Comments?
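
Concretely, that would leave the code reading (a sketch of the proposed
change, not a committed patch):

    fd = BasicOpenFile(path, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        if (errno != ENOENT)
            elog(PANIC, "open of %s failed: %m", path);
        /* no O_EXCL: if someone else created the file just now, open theirs */
        fd = BasicOpenFile(path, O_RDWR | O_CREAT | PG_BINARY,
                           S_IRUSR | S_IWUSR);
        if (fd < 0)
            elog(PANIC, "creation of file %s failed: %m", path);
    }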

                        regards, tom lane
