On Thu, Jun 11, 2015 at 5:48 PM, Fujii Masao <masao.fu...@gmail.com> wrote: > On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier > <michael.paqu...@gmail.com> wrote: >> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao <masao.fu...@gmail.com> wrote: >>> Shouldn't pg_rewind ignore that failure of operation? If the file is not >>> found in source server, the file doesn't need to be copied to destination >>> server obviously. So ISTM that pg_rewind safely can skip copying that file. >>> Thought? >> >> I think that you should fail. Let's imagine that the master to be >> rewound has removed a relation file before being stopped cleanly after >> its standby has been promoted that was here at the last checkpoint >> before forking, and that the standby still has the relation file after >> promotion. You should be able to copy it to be able to replay WAL on >> it. If the standby has removed a file in the file map after taking the >> file map, I guess that the best thing to do is fail because the file >> that should be here for the rewound node cannot be fetched. > > In this case, why do you think that the file should exist in the old master? > Even if it doesn't exist, ISTM that the old master can safely replay the WAL > records related to the file when it restarts. So what's the problem > if the file doesn't exist in the old master?
Well, some user may want to rewind the master down to the point where WAL forked, and then recover it immediately when a consistent point is reached just at restart instead of replugging it into the cluster. In this case I think that you need the relation file of the dropped relation to get a consistent state. That's still cheaper than recreating a node from a fresh base backup in some cases, particularly if the last base backup taken is far in the past for this cluster. >> Documentation should be made clearer about that with a better error >> message... > > I'm wondering how we can recover (or rewind again) the old master from > that error. This also would need to be documented if we decide not to > fix any code regarding this problem... FWIW, here is a scenario able to trigger the error with 1 master (port 5432, data at ~/data/5432) and 1 standby (port 5433, data at ~/data/5433). $ psql -c 'create table aa as select generate_series(1,1000000)' # Promote standby $ pg_ctl promote -D ~/data/5433/ # Drop table on master $ psql -c 'drop table aa' DROP TABLE $ pg_ctl stop -D ~/data/5432/ At this point there is no more relation file on master for 'aa', it is still present on standby. Running pg_rewind at this point will work, the relation file would be copied from the promoted standby to master. $ lldb -- pg_rewind -D 5432 --source-server="port=5433 dbname=postgres" Breakpoint pg_rewind after fetchSourceFileList() and before replaying the changes from the block map, drop table 'aa' on standby and checkpoint it, then the source file list is inconsistent and pg_rewind will fail. This can just happen with --source-server, with --source-pgdata Adding a sleep() of a couple of seconds in pg_rewind may be better to trigger directly the error ;), with DROP DATABASE for example. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers