On 23.09.2011 11:48, Florian Pflug wrote:
On Sep23, 2011, at 10:41 , Heikki Linnakangas wrote:
On 23.09.2011 11:02, Linas Virbalas wrote:
On 9/22/11 6:59 PM, "Euler Taveira de Oliveira"<eu...@timbira.com> wrote:
If needed, I could do that, if I had the exact procedure... Currently,
during the start of the backup I take the following information:
Just show us the output of pg_start_backup and part of the standby log with
the following message 'redo starts at' and the subsequent messages up to the
failure.
Unfortunately, it's impossible, because the error message "Could not read
from file "pg_clog/0001" at offset 32768: Success" is shown (and startup
aborted) before the turn for "redo starts at" message arrives.
It looks to me that pg_clog/0001 exists, but it shorter than recovery expects.
Which shouldn't happen, of course, because the start-backup checkpoint should
flush all the clog that's needed by recovery to disk before the backup
procedure begins to them.
Yeah. What confuses me though is that we fail while trying to read from the
clog. When do we do that during normal (non-standby) recovery?
I believe the OP is setting up a standby. He didn't say if it's a hot
standby, though. That changes the startup sequence a bit.
At the end of recovery, we read the "last" clog page, containing the
next XID that will be assigned to a transaction, and zero out the future
part of that clog file (StartupCLOG). In hot standby, we do that
earlier, after reading the checkpoint record but before we start
replaying records.
One other possibility would be that the problem is somehow triggered by vacuum
running while the start-backup checkpoint is commencing. That'd explain why the
problem goes away with immediate checkpoints, because those make the windows
mach smaller. But I haven't got a concrete theory of whats happening..
Hmm, that's a good theory. The logic in SimpleLruPhysicakReadPage()
tolerates non-existent files, because the old clog files might've been
truncated away by a vacuum after the backup started. That's OK, because
they will be recreated, and later deleted again, during the WAL replay.
But what if something like this happens:
0. pg_start_backup().
1. rsync starts. It gets a list of all files. It notes that pg_clog/0001
exists.
2. vacuum runs, and deletes pg_clog/0001
3. rsync starts to copy pg_clog/0001. Oops, it's gone. Instead of
leaving it out altogether, it includes it as an empty file.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers