On Mon, Mar 9, 2015 at 7:32 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote: > > Attached is a new patch version, fixing all the little things you listed. I believe this is pretty much ready for commit. I'm going to read it through myself one more time before committing, but I don't have anything mind now that needs fixing anymore. I just pushed the change to split dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that seems like a nice-to-have anyway. >
Few assorted comments: 1. + <step> + <para> + Copy all those changed blocks from the new cluster to the old cluster. + </para> + </step> Isn't it possible incase of async replication that old cluster has some blocks which new cluster doesn't have, what will it do in such a case? I have tried to test some form of such a case and it seems to be failing with below error: pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1 The servers diverged at WAL position 0/16DE858 on timeline 1. Rewinding from last common checkpoint at 0/16B8A70 on timeline 1 could not open file "..\..\Data\/base/12706/16391" for truncation: No such file or directory Failure, exiting Exact scenario is: Node -1 (master): Step-1 Create table t1(c1 int, c2 char(500)) with (fillfactor=10); insert into t1 values(generate_series(1,110),'aaaa'); Stop Node-1 (pg_ctl stop ..) Step-2 Copy manually the data-directory (it contains WAL log as well) to new location say Database1 Node-2 (standby) Step-3 Change settings to make it stand-by (recovery.conf and change postgresql.conf) Start Node and verify all data exists. Step-4 use pg_ctl promote to make Node-2 as master Step-5 Start Node-1 insert few more records insert into t1 values(generate_series(110,115),'aaaa'); Step-6 Node-2 Now insert one more records in table t1 insert into t1 values(116,'aaaa'); Stop both the nodes. Now if I run pg_rewind on old-master(Node-1), it will lead to above error. I think above scenario can be possible in async replication. If I insert more records (greater than what I inserted in Step-5) in Step-6, then pg_rewind works fine. 2. diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm +# To run a test, the test script (in t/ subdirectory) calls the functions What do you mean by t/ subdirectory? 3. + <application>pg_rewind</> was run, and thereforce could not be copied by typo /thereforce 4. +static void +sanityChecks(void) +{ + /* Check that there's no backup_label in either cluster */ I could not see such a check in code. Am I missing anything? 5. + /* + * TODO: move old file out of the way, if any. And perhaps create the + * file with temporary name first and rename in place after it's done. + */ + snprintf(BackupLabelFilePath, MAXPGPATH, + "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target); + There are couple of other TODO's in the patch, are these for future? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com