The following bug has been logged online:

Bug reference:      5038
Logged by:          Luke Koops
Email address:      luke.ko...@entrust.com
PostgreSQL version: 8.3.7
Operating system:   Windows 2003 Server Enterprise Edition
Description:        WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Details:

On my system, one of the WAL files is pending deletion. The handle is being held by one of the postgres backend processes, but that is another potential bug. At first, the unlink worked, and the .ready and .done files were deleted. But the WAL file still shows up in the pg_xlog directory listing. Note: the WAL file did get archived properly. There was no error reported at the time.

When it comes time to recycle the log files, RemoveOldXLogFiles() calls ReadDir() to get the list of files, then it calls XLogArchiveCheckDone() which, if it cannot find a .done or a .ready file, calls XLogArchiveNotify(). XLogArchiveNotify() creates the .ready file again. This causes the archiver to run the archive command on the old WAL file that is pending deletion. The copy command fails, and every subsequent archive attempt retries the same old WAL file. At this point, none of the WAL files will get shipped and the pg_xlog folder will start filling up.

Before calling XLogArchiveCheckDone(), RemoveOldXLogFiles() makes a number of tests to make sure the name is for a legitimate XLOG. This would be a good time to make sure the file is real, not pending deletion. That would prevent the creation of the .ready file, and WAL archiving would continue to work. It might be a good idea to log something at the DEBUG level if a directory entry is encountered that matches the naming conventions but is not a real file.

You could probably reproduce this behaviour by changing the permissions on a WAL file, although you wouldn't be able to test a fix in the same way.

I have not reliably reproduced the WAL file handle "leak" in the postgres back end. I believe it may be related to statements timing out. My system currently has statement_timeout=1min, but that will be removed. I will report the "leak" when I have a better handle (no pun) on the situation.
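
A rough sketch of the kind of check suggested above for RemoveOldXLogFiles(). This is not PostgreSQL source: the helper names looks_like_wal_name() and wal_file_is_real() are hypothetical, and it assumes that stat() fails on a file that is pending deletion, which would need to be verified on Windows. The name test assumes WAL segment names are 24 uppercase hex digits.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Assumption: a WAL segment name is exactly 24 uppercase hex digits. */
#define WAL_NAME_LEN 24

/* True if the directory entry name has the shape of a WAL segment name. */
static bool
looks_like_wal_name(const char *name)
{
    return strlen(name) == WAL_NAME_LEN &&
           strspn(name, "0123456789ABCDEF") == WAL_NAME_LEN;
}

/*
 * Hypothetical helper: true only if dir/name has a WAL-style name AND
 * can be stat()ed as a regular file.  The intent is that an entry which
 * still appears in the directory listing but cannot be opened or stat()ed
 * (for example, one that is pending deletion) is skipped, so no new
 * .ready file gets created for it.
 */
static bool
wal_file_is_real(const char *dir, const char *name)
{
    char        path[1024];
    struct stat st;
    int         n;

    if (!looks_like_wal_name(name))
        return false;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;           /* path did not fit in the buffer */

    if (stat(path, &st) != 0)
        return false;           /* not accessible: treat as "not real" */

    return S_ISREG(st.st_mode);
}
```

The caller would invoke wal_file_is_real() just before XLogArchiveCheckDone() and, when it returns false for a name that otherwise matches, log the entry at DEBUG level and move on to the next directory entry.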

-Luke

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs