Re: [HACKERS] WAL replay failure after file truncation(?)

Hans-Jürgen Schönig Fri, 27 May 2005 07:05:02 -0700

Tom Lane wrote:

Hans-Jürgen Schönig <[EMAIL PROTECTED]> writes:
My question is: What happens if the system is killed inside rebuild_relation or inside swap_relfilenodes which is called by rebuild_relation?
Nothing at all, because the system catalog updates aren't committed yet,
and we haven't done anything to the relation's old physical file.



This is actually what I expected.
I have gone through the code and it looks correct.

TRUNCATE is the only command in this application which can potentially cause the problem (it is very unlikely that INSERT removes a file).

If I were you I'd be looking into whether your disk hardware honors
write ordering properly.  This sounds like something allowed the
directory change to reach disk before the transaction commit WAL record
did; which is impossible if fsync is doing what it's supposed to.

                        regards, tom lane
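
The ordering described above (commit record durable in the log first, directory change second) can be sketched roughly as follows. This is only an illustration and not PostgreSQL's actual code: commit_then_unlink, demo, and the file names are hypothetical, and the sketch assumes os.fsync really forces data to stable storage, which is exactly the property in doubt when the hardware reorders writes.

```python
import os
import tempfile


def commit_then_unlink(wal_path, old_file_path, commit_record):
    """Hypothetical helper: make the commit record durable, then drop the old file."""
    steps = []

    # 1. Append the commit record to the log file.
    fd = os.open(wal_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, commit_record)
        steps.append("wal_write")

        # 2. Force the record to stable storage before touching anything else.
        os.fsync(fd)
        steps.append("wal_fsync")
    finally:
        os.close(fd)

    # 3. Only now perform the directory change (removing the old physical file).
    os.unlink(old_file_path)
    steps.append("unlink")
    return steps


def demo():
    """Run the sketch in a scratch directory and report what happened."""
    with tempfile.TemporaryDirectory() as d:
        wal = os.path.join(d, "wal.log")        # hypothetical log file
        old = os.path.join(d, "old_relfile")    # hypothetical old relation file
        with open(old, "wb") as f:
            f.write(b"old data")
        steps = commit_then_unlink(wal, old, b"COMMIT\n")
        with open(wal, "rb") as f:
            logged = f.read()
        return steps, logged, os.path.exists(old)
```

If the drive or controller acknowledges the fsync before the data is really on disk, step 3 can reach the platter ahead of step 2, and a crash in between leaves a removed file with no commit record to explain it.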

We are on a Sun Solaris (x86) box here. I am not sure what Sun has corrupted to make this error happen. Obviously it happens only once per 1,000,000 tries ...

I am just trying to figure out whether the bug could potentially be inside PostgreSQL. I would have been surprised if somebody had overlooked a problem like that.


        many thanks and best regards,

                Hans


--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at

