Re: [HACKERS] [GENERAL] 8.1.4 - problem with PITR - .backup.done / backup.ready version of the same file at the same time.

2006-05-30 Tom Lane
Rafael Martinez, Guerrero [EMAIL PROTECTED] writes:
> The problem was that 000100080010.0006D5E8.backup was
> already archived, but under pg_xlog/archive_status/ there were two
> files:
> -
> 000100080010.0006D5E8.backup.done
> 000100080010.0006D5E8.backup.ready
> -
>
> This situation should not happen; has anyone seen this problem before?

No, it shouldn't.  What I suspect is that XLogArchiveIsDone() got
confused and created a duplicate .ready file.  It basically assumes
that the only way its stat() calls can fail is ENOENT, i.e., file not
there ... but I wonder if they failed for some other reason instead.
What sort of platform and filesystem is this on?
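To make the suspected failure mode concrete, here is a minimal C sketch of a status-file probe that does not assume ENOENT is the only way stat() can fail. The enum, the name probe_status_file(), and the stderr message are illustrative assumptions, not the actual code of XLogArchiveIsDone(); the point is only that "file is missing" and "could not check" are kept apart.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical three-way result for probing an archive_status file. */
typedef enum { STATUS_PRESENT, STATUS_ABSENT, STATUS_ERROR } status_probe;

/*
 * Probe one status file (e.g. ".../archive_status/<segment>.done").
 * Only ENOENT is treated as "file not there"; any other stat() failure
 * (EACCES, EIO, ENOMEM, ENOTDIR, ...) is reported separately so the
 * caller cannot mistake it for absence.
 */
static status_probe probe_status_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return STATUS_PRESENT;
    if (errno == ENOENT)
        return STATUS_ABSENT;

    /* strerror(errno) is evaluated before fprintf() can change errno. */
    fprintf(stderr, "could not stat file \"%s\": %s\n", path, strerror(errno));
    return STATUS_ERROR;
}
```

A caller that gets STATUS_ERROR would need to retry or give up, rather than go on to create a fresh .ready file.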

Did you happen to make note of the mod times of the two files before
deleting them?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


2006-05-30 Tom Lane
Rafael Martinez [EMAIL PROTECTED] writes:
> What happens if we have a race condition and the archiver creates
> a .done file between the last check for the .done file and the creation
> of the .ready file by XLogArchiveNotify?

That can't happen; the archiver creates the .done file by rename()ing
the previous .ready file, which is (supposed to be) an atomic action.
If the .ready file isn't there, and then after that we see that the
.done file isn't there, then either neither of them are there or the
filesystem is seriously broken.
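The archiver side of that protocol can be sketched as follows. This assumes a hypothetical helper mark_segment_done() taking the archive_status directory and a segment name; it is modeled on the behaviour described here, not copied from pgarch.c. The argument rests on the single rename(): if the filesystem performs it atomically, there is no instant at which neither the .ready nor the .done name exists, so a process that finds .ready gone will find .done on its next check (assuming nothing else removes it).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: after a segment has been archived, turn its
 * "<segment>.ready" marker into "<segment>.done" with one rename().
 * Returns 0 on success, -1 (with a warning on stderr) on failure.
 */
static int mark_segment_done(const char *status_dir, const char *segment)
{
    char ready[1024];
    char done[1024];

    snprintf(ready, sizeof ready, "%s/%s.ready", status_dir, segment);
    snprintf(done, sizeof done, "%s/%s.done", status_dir, segment);

    if (rename(ready, done) < 0)
    {
        /* Wording follows the pgarch_archiveDone warning quoted in this thread. */
        fprintf(stderr, "WARNING: could not rename file \"%s\" to \"%s\": %s\n",
                ready, done, strerror(errno));
        return -1;
    }
    return 0;
}
```

If the rename fails, the .ready file is simply left in place, so this path on its own leaves one marker, not two.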

My thought is that the stat()s on the .done file failed for some obscure
reason, perhaps insufficient kernel resources, even though the file was 
actually there.
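The suspected sequence can be written as a decision rule. In this hypothetical sketch (the enum and function names are assumptions), each stat() outcome is one of "found", "ENOENT", or "some other error". A rule that treats any failure as absence creates a new .ready file when .ready is genuinely gone (already renamed) but the stat() on .done fails spuriously, which produces exactly the .done/.ready pair reported. A rule that requires a definite ENOENT for both files does not.

```c
#include <stdbool.h>

/* Hypothetical classification of a single stat() call on a status file. */
typedef enum { PROBE_FOUND, PROBE_ENOENT, PROBE_OTHER_ERROR } probe_result;

/* Naive rule: any stat() failure is taken to mean "file not there". */
static bool naive_should_create_ready(probe_result ready, probe_result done)
{
    return ready != PROBE_FOUND && done != PROBE_FOUND;
}

/* Stricter rule: create .ready only when both files are definitely absent. */
static bool strict_should_create_ready(probe_result ready, probe_result done)
{
    return ready == PROBE_ENOENT && done == PROBE_ENOENT;
}
```

With ready = PROBE_ENOENT and done = PROBE_OTHER_ERROR, the naive rule says "create .ready" while the strict rule declines.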

If you have postmaster log output for the interval in which this
happened, it would be interesting to look for occurrences of this
warning message from pgarch_archiveDone:

    if (rename(rlogready, rlogdone) < 0)
        ereport(WARNING,
                (errcode_for_file_access(),
                 errmsg("could not rename file \"%s\" to \"%s\": %m",
                        rlogready, rlogdone)));

If you find any then we might need a different theory ...
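Checking the log for that warning can be done with grep, or with a small C sketch like the one below. It assumes plain-text postmaster output with one message per line, and the function name is hypothetical. Because other code paths can also report "could not rename file", it only counts lines that also mention a .ready name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Count log lines that look like the pgarch_archiveDone rename warning.
 * Assumption: one message per line, each shorter than the buffer; longer
 * lines are read in pieces by fgets() and could be miscounted.
 */
static long count_rename_warnings(FILE *log)
{
    char line[8192];
    long hits = 0;

    while (fgets(line, sizeof line, log) != NULL)
    {
        if (strstr(line, "could not rename file") != NULL &&
            strstr(line, ".ready") != NULL)
            hits++;
    }
    return hits;
}
```

A non-zero count for the interval in question would argue against the stat() theory.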

regards, tom lane
