Hi,

Thanks for the feedback. Btw, pg_rewind is not part of Postgres core, as a contrib module or otherwise, so could you send your feedback and any issues you find directly on GitHub instead? The URL of the project is https://github.com/vmware/pg_rewind.
Either way, here are some comments below...

On Wed, Oct 23, 2013 at 6:07 PM, Samrat Revagade <revagade.sam...@gmail.com> wrote:
> While testing pg_rewind I encountered following problem.
> I used following process to do the testing, Please correct me if I am doing
> it in wrong way.
>
> Problem-1:
> pg_rewind gives error (target master must be shut down cleanly.) when
> master crashed unexpectedly.
>
> 1. Setup Streaming Replication (stand alone machine : master server port
> -5432, standby server port-5433 )
> 2. Do some operation on master server:
> postgres=# create table test(id int);
> 3. Crash the Postgres process of master:
> kill -9 [pid of postgres process of master server]
> 4. Promote standby server
> 5. Run pg_rewind:
> $ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
> /samrat/master-data/ --source-server='host=localhost port=5433
> dbname=postgres' -v
> connected to remote server
> fetched file "global/pg_control", length 8192
> target master must be shut down cleanly.
> 6. Check masters control information:
> $ /samrat/postgresql/install/bin/pg_controldata
> /samrat/master-data/ | grep "Database cluster state"
> Database cluster state: in production
>
> IIUC It is because pg_rewind does some checks before resynchronizing the
> PostgreSQL data directories.
> But In real time scenarios, for example due to hardware failure if master
> crashed and its controldata shows the state "in production" then pg_rewind
> will fail to pass this check.

Yeah, you could call that a limitation of this module. When I looked at its code some time ago, I had in mind adding an option along the lines of --force that would attempt to resynchronize a master even if it did not shut down cleanly.

>
> Problem-2:
> For zero length WAL record pf_rewind gives error.
>
> 1. Setup Streaming Replication (stand alone machine : master server port
> -5432, standby server port-5433 )
> 2. Cleanly shutdown master (Do not add any data on master)
> 3. Promote standby server
> 4. Create table on new master (promoted standby)
> postgres=# create table test(id int);
> 5. Run pg_rewind:
> $ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
> /samrat/master-data/ --source-server='host=localhost port=5433
> connected to remote server
> connected to remote server
> fetched file "global/pg_control", length 8192
> fetched file "pg_xlog/00000002.history", length 41
> Last common WAL position: 0/4000090 on timeline 1
> could not previous WAL record at 0/4000090: record with zero length
> at 0/4000090

This is rather interesting. When I tested it I did not find this error.

> Also it as you already listed in README of pg_rewind the it has a problem of
> tablespace support.
>
> I will continue with testing it further to help in improving it :)

Thanks!
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
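
P.S. On Problem-1: until something like a --force option exists, a wrapper script could check the cluster state before calling pg_rewind at all. Below is a minimal sketch of that check, not anything taken from the pg_rewind sources. The function names are hypothetical, and it assumes that "shut down" is the state pg_rewind accepts, which is only inferred from the error message quoted above.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Read pg_controldata output on stdin and print the value of the
# "Database cluster state" line with the label and padding removed.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Succeed only if the reported state is exactly "shut down".
# Assumption: this is the state the "target master must be shut down
# cleanly" check expects; any other state (e.g. "in production") fails.
is_cleanly_shut_down() {
    [ "$(cluster_state)" = "shut down" ]
}
```

It could be used as `pg_controldata /samrat/master-data/ | is_cleanly_shut_down || echo "master was not shut down cleanly"`. In the crash scenario from Problem-1 the state is "in production", so the check fails, matching what pg_rewind reports.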