On Fri, Jun 14, 2013 at 10:11 AM, Samrat Revagade <revagade.sam...@gmail.com> wrote:
> Hello,
>
> We have already started a discussion on pgsql-hackers about the problem of
> taking a fresh backup during the failback operation; here is the link:
>
> http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com
>
> Let me again summarize the problem we are trying to address.
>
> When the master fails, the last few WAL files may not reach the standby. But
> the master may have gone ahead and made changes to its local file system
> after flushing WAL to local storage. So the master contains some file
> system level changes that the standby does not have. At this point, the data
> directory of the master is ahead of the standby's data directory.
>
> Subsequently, the standby will be promoted to be the new master. Later, when
> the old master wants to become a standby of the new master, it can't just
> join the setup, since the two servers are inconsistent. We need to take a
> fresh backup from the new master. This can happen with both synchronous and
> asynchronous replication.
>
> A fresh backup is also needed in the case of a clean switch-over, because in
> the current HEAD the master does not wait for the standby to receive all the
> WAL up to the shutdown checkpoint record before shutting down the
> connection. Fujii Masao has already submitted a patch to handle the clean
> switch-over case, but the problem still remains for the failback case.
>
> Taking a fresh backup is very time-consuming when databases are very large,
> say several TBs, and when the servers are connected over a relatively slow
> link. This would break the service level agreement of a disaster recovery
> system. So there is a need to improve the process of disaster recovery in
> PostgreSQL. One way to achieve this is to maintain consistency between
> master and standby, which would avoid the need for a fresh backup.
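To make the divergence concrete, here is a toy model in Python. It is not PostgreSQL code: the `Server` class, the integer LSNs, and the `wait_for_standby` switch are hypothetical simplifications. It also ignores that PostgreSQL buffers data pages and writes them later, treating each change as reaching the data files immediately. It shows that when the master updates its data files as soon as WAL is flushed locally, a crash after only part of the WAL was streamed leaves the old master's data directory ahead of the promoted standby; if data-file changes are held back until the standby has the WAL, the old master stays at or behind the promotion point.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Toy server model (hypothetical): WAL positions are plain integer LSNs."""
    name: str
    wal_flushed: int = 0   # highest WAL record durably flushed on this server
    data_applied: int = 0  # highest WAL record whose change reached the data files


def master_write(master, standby, n_records, standby_received_upto, wait_for_standby):
    """Generate n_records WAL records on the master, then the master "crashes".

    Only records with LSN <= standby_received_upto reach the standby.
    If wait_for_standby is True, the master changes its data files only for
    records the standby already has (the consistency rule under discussion);
    a real server would block here, the toy simply leaves the change unapplied.
    """
    start = master.wal_flushed
    for lsn in range(start + 1, start + n_records + 1):
        master.wal_flushed = lsn              # WAL is flushed locally first
        if lsn <= standby_received_upto:
            standby.wal_flushed = lsn         # record streamed to the standby
            standby.data_applied = lsn        # standby replays it
        if not wait_for_standby or lsn <= standby.wal_flushed:
            master.data_applied = lsn         # master modifies its data files


def can_rejoin_without_backup(old_master, promoted):
    """The old master may follow the promoted standby only if its data files
    contain no change beyond the WAL the promoted standby received."""
    return old_master.data_applied <= promoted.wal_flushed


def failover(wait_for_standby):
    master, standby = Server("master"), Server("standby")
    # 10 WAL records are generated; only the first 7 reach the standby.
    master_write(master, standby, n_records=10, standby_received_upto=7,
                 wait_for_standby=wait_for_standby)
    return master, standby, can_rejoin_without_backup(master, standby)


m, s, ok = failover(wait_for_standby=False)
print(m.data_applied, s.wal_flushed, ok)  # 10 7 False: fresh backup needed

m, s, ok = failover(wait_for_standby=True)
print(m.data_applied, s.wal_flushed, ok)  # 7 7 True: old master is not ahead
```

In the first run the old master has applied records 8 to 10 to its data files although the promoted standby never received them, so it cannot simply rejoin. In the second run it still has records 8 to 10 in its local WAL but not in its data files. The cost of that mode is that the master must wait for the standby before touching its data files.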
>
> So our proposal for this problem is that we must ensure that the master does
> not make any file system level changes without confirming that the
> corresponding WAL record has been replicated to the standby.

An alternative proposal (which will probably just reveal my lack of
understanding about what is or isn't possible with WAL): provide a way to
restart the master so that it rolls back the WAL changes that the slave
hasn't seen.

> There are many suggestions and objections on pgsql-hackers about this
> problem. The brief summary is as follows:
>