Re: [GENERAL] Weird error when setting up streaming replication
I get the same "weird" error (WAL file is from different database system) with Ubuntu and PostgreSQL 9.3 when setting up a slave using rsync.

1. I installed PostgreSQL on the slave, which automatically runs initdb:
       sudo apt-get install postgresql-9.3
2. I modified my postgresql.conf (/etc/postgresql/9.3/main/postgresql.conf) to make the server a slave, and added my replication user to pg_hba.conf.
3. I stopped both the master and the slave.
4. I rsynced from the master to the slave, excluding pg_xlog (which left the existing pg_xlog contents on the slave intact).

Then I get the same error (WAL file is from different database system).

If I delete everything from the data directory on the slave, including the pg_xlog directory, and then rsync excluding pg_xlog, the cluster won't start because the pg_xlog directory is missing. But if I rsync including pg_xlog, I no longer get any messages in the log file, whether I left the data directory from the installation in place or deleted everything in it before the rsync.

So it seems that with PostgreSQL 9.3 on Ubuntu, you should NOT exclude pg_xlog when rsyncing everything over.

-- 
View this message in context: http://postgresql.1045698.n5.nabble.com/Weird-error-when-setting-up-streaming-replication-tp5766888p5808923.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
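A minimal sketch of an rsync invocation that avoids both failure modes described in this message, assuming it runs on the stopped standby and pulls from the stopped master. The host name and data directory path are hypothetical placeholders. The anchored pattern `/pg_xlog/*` copies the pg_xlog directory itself but not its contents, and `--delete-excluded` also removes excluded files already present on the standby, so no WAL from the old initdb cluster survives the copy. The sketch only builds the argument list; it does not run rsync.

```python
import shlex


def build_standby_rsync_cmd(master_host: str, data_dir: str) -> list[str]:
    """Build an rsync argv that pulls a stopped master's data directory.

    Hypothetical helper: assumes it runs on the standby, both clusters are
    stopped, and data_dir is the same path on both machines.
    """
    base = data_dir.rstrip("/")
    src = f"{master_host}:{base}/"  # trailing slash: copy the directory's contents
    dest = f"{base}/"
    return [
        "rsync",
        "-a",                        # recursive, preserve permissions/owners/times
        "--delete",                  # drop standby files the master does not have
        "--delete-excluded",         # ...including excluded ones, i.e. stale WAL
        "--exclude", "/pg_xlog/*",   # keep the pg_xlog directory, skip its contents
        src,
        dest,
    ]


if __name__ == "__main__":
    print(shlex.join(build_standby_rsync_cmd("master.example.com",
                                             "/var/lib/postgresql/9.3/main")))
```

The pattern is anchored with a leading `/` so it matches only the top-level pg_xlog of the transfer, not any other directory that happens to share the name.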
Re: [GENERAL] Weird error when setting up streaming replication
On Fri, Aug 9, 2013 at 9:54 AM, Quentin Hartman wrote:
> OK, figured this out. I had it start copying the pg_xlog directory as well
> when doing the initial sync. I realized this is also the first time I've
> set up replication from scratch using 9.2. All my other 9.2 pairs were set up
> on either 9.0 or 9.1, and have been upgraded from there with replication
> already in place. Previously, and still according to that article in the
> wiki, the pg_xlog directory was specifically excluded.

You exclude pg_xlog from the rsync so as not to restore those files, because they are not needed and can cause confusion. But you don't want an old copy of pg_xlog from a previous cluster sitting around either, which is the situation you had. By including pg_xlog in the sync, you were overwriting the old files from the previous cluster (which are toxic) with ones from the master, which are useless but at least not generally toxic.

I think one problem with the wiki is step 3:

    3. Edit recovery.conf and postgresql.conf on the standby to start up replication and hot standby. First, in postgresql.conf, change this line

It doesn't tell you how you got those files in the first place, in order to edit them. You apparently got them from an initdb. What you probably want to do instead is get them by copying them from the master.

> Does anyone know why
> this behavior may have changed?

I don't think it has changed. I think your interpretation of the instructions has changed, so you did something different under 9.0 and 9.1.

Cheers,

Jeff
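Jeff's point is that excluding pg_xlog is safe only if the standby's own pg_xlog holds nothing from an earlier cluster. A minimal Python sketch of that cleanup, assuming it runs on the stopped standby before the copy; `clear_pg_xlog` is a hypothetical helper, not a PostgreSQL tool. It empties pg_xlog (and subdirectories such as archive_status) but keeps the directories themselves, since, as reported earlier in the thread, the server will not start without pg_xlog. The demo runs against a throwaway temporary directory, not a real cluster.

```python
import os
import tempfile


def clear_pg_xlog(data_dir: str) -> list[str]:
    """Empty <data_dir>/pg_xlog but keep the directory and its subdirectories.

    Hypothetical helper: run it only on a STOPPED standby, never on the master.
    Returns the names of the top-level files that were removed.
    """
    xlog_dir = os.path.join(data_dir, "pg_xlog")
    removed = []
    for name in sorted(os.listdir(xlog_dir)):
        path = os.path.join(xlog_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            # Keep subdirectories such as archive_status/, but drop their
            # status files; assumes they contain only plain files.
            for child in os.listdir(path):
                os.remove(os.path.join(path, child))
        else:
            # WAL segments and timeline history files left by the old cluster.
            os.remove(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics a freshly initdb'd standby.
    with tempfile.TemporaryDirectory() as data_dir:
        status_dir = os.path.join(data_dir, "pg_xlog", "archive_status")
        os.makedirs(status_dir)
        open(os.path.join(data_dir, "pg_xlog", "000000010000000000000001"), "w").close()
        open(os.path.join(status_dir, "000000010000000000000001.done"), "w").close()
        print(clear_pg_xlog(data_dir))                         # ['000000010000000000000001']
        print(os.listdir(os.path.join(data_dir, "pg_xlog")))   # ['archive_status']
```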
Re: [GENERAL] Weird error when setting up streaming replication
On Fri, Aug 9, 2013 at 8:33 AM, Quentin Hartman wrote:
> This pair of servers aren't replacing anything, they are new, empty servers.

That should be 'empty server', singular.

> Before starting the slave at all, I'm copying the entire data file structure
> over to it via rsync. I'm doing almost exactly what is described here:
> http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps
> The only difference is that I've tweaked the paths on the rsync to be
> appropriate to my system layout. I've even gone so far as to delete
> everything in the data dir except for the pg_xlog directory before syncing
> everything over to make sure it wasn't caused by something not getting
> overwritten when it was supposed to.

So then, you *are* replacing the slave server. If you were not, there would be nothing in its data dir to delete, and nothing there to get overwritten (or not get overwritten).

Also, not deleting the pg_xlog directory (or at least the contents of that directory) is exactly the problem.

Cheers,

Jeff
Re: [GENERAL] Weird error when setting up streaming replication
OK, figured this out. I had it start copying the pg_xlog directory as well when doing the initial sync. I realized this is also the first time I've set up replication from scratch using 9.2. All my other 9.2 pairs were set up on either 9.0 or 9.1, and have been upgraded from there with replication already in place. Previously, and still according to that article in the wiki, the pg_xlog directory was specifically excluded. Does anyone know why this behavior may have changed?

On Fri, Aug 9, 2013 at 9:33 AM, Quentin Hartman <qhart...@direwolfdigital.com> wrote:

> This pair of servers aren't replacing anything, they are new, empty
> servers. Before starting the slave at all, I'm copying the entire data
> file structure over to it via rsync. I'm doing almost exactly what is
> described here:
> http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps
> The only difference is that I've tweaked the paths on the rsync to be
> appropriate to my system layout. I've even gone so far as to delete
> everything in the data dir except for the pg_xlog directory before syncing
> everything over to make sure it wasn't caused by something not getting
> overwritten when it was supposed to.
>
> On Thu, Aug 8, 2013 at 6:23 PM, Michael Paquier wrote:
>
>> On Fri, Aug 9, 2013 at 8:55 AM, Quentin Hartman wrote:
>> > 2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
>> > 2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is
>> > 5909892614333033983, pg_control database system identifier is
>> > 5909892824786287231.
>> It looks like you are not able to detect valid checkpoint records when
>> replaying WAL because your new system has been initialized with a
>> fresh initdb, as indicated by the errors above. You should build your
>> new node using a base backup or a snapshot of the data folder of the
>> node you are trying to replace.
>> --
>> Michael
Re: [GENERAL] Weird error when setting up streaming replication
This pair of servers aren't replacing anything, they are new, empty servers. Before starting the slave at all, I'm copying the entire data file structure over to it via rsync. I'm doing almost exactly what is described here: http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps

The only difference is that I've tweaked the paths on the rsync to be appropriate to my system layout. I've even gone so far as to delete everything in the data dir except for the pg_xlog directory before syncing everything over to make sure it wasn't caused by something not getting overwritten when it was supposed to.

On Thu, Aug 8, 2013 at 6:23 PM, Michael Paquier wrote:
> On Fri, Aug 9, 2013 at 8:55 AM, Quentin Hartman wrote:
> > 2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
> > 2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is
> > 5909892614333033983, pg_control database system identifier is
> > 5909892824786287231.
> It looks like you are not able to detect valid checkpoint records when
> replaying WAL because your new system has been initialized with a
> fresh initdb, as indicated by the errors above. You should build your
> new node using a base backup or a snapshot of the data folder of the
> node you are trying to replace.
> --
> Michael
Re: [GENERAL] Weird error when setting up streaming replication
On Fri, Aug 9, 2013 at 8:55 AM, Quentin Hartman wrote:
> 2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
> 2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is
> 5909892614333033983, pg_control database system identifier is
> 5909892824786287231.

It looks like you are not able to detect valid checkpoint records when replaying WAL because your new system has been initialized with a fresh initdb, as indicated by the errors above. You should build your new node using a base backup or a snapshot of the data folder of the node you are trying to replace.

--
Michael
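Michael's suggestion of building the new node from a base backup can be done with pg_basebackup, which ships with PostgreSQL 9.1 and later. A minimal sketch that only assembles the command line; the host name, replication role, and data directory are hypothetical placeholders. pg_basebackup refuses to write into a non-empty target directory, which also guarantees that no pg_xlog files from an earlier initdb are left behind. `-X stream` requires 9.2 or later and `-R` (write recovery.conf) requires 9.3 or later, so on the 9.2 setup in this thread you would drop `-R` and write recovery.conf by hand.

```python
import shlex


def build_basebackup_cmd(master_host: str, repl_user: str, data_dir: str,
                         write_recovery_conf: bool = True) -> list[str]:
    """Assemble a pg_basebackup argv that clones the master into data_dir.

    All arguments are placeholders for illustration. data_dir must be empty
    or not exist yet.
    """
    cmd = [
        "pg_basebackup",
        "-h", master_host,
        "-U", repl_user,   # role with REPLICATION, allowed by the master's pg_hba.conf
        "-D", data_dir,    # empty target: nothing from an old cluster survives
        "-X", "stream",    # stream the WAL needed for a consistent copy (9.2+)
        "-P",              # show progress
    ]
    if write_recovery_conf:
        cmd.append("-R")   # write a minimal recovery.conf (9.3+)
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_basebackup_cmd("master.example.com", "replicator",
                                          "/var/lib/postgresql/9.3/main")))
```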
[GENERAL] Weird error when setting up streaming replication
I'm going through all my usual steps for setting up streaming replication on a new pair of servers: modify configs as appropriate, rsync data from master to slave, etc. I have this all automated with Chef, and it has been pretty bulletproof for a while. However, today I ran into this when starting the slave on this new pair:

     * Starting PostgreSQL 9.2 database server
     * The PostgreSQL server failed to start. Please check the log output:
    2013-08-08 23:47:30 GMT LOG: database system was interrupted; last known up at 2013-08-08 23:22:40 GMT
    2013-08-08 23:47:30 GMT LOG: entering standby mode
    2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
    2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is 5909892614333033983, pg_control database system identifier is 5909892824786287231.
    2013-08-08 23:47:30 GMT LOG: invalid primary checkpoint record
    2013-08-08 23:47:30 GMT LOG: invalid secondary checkpoint record
    2013-08-08 23:47:30 GMT PANIC: could not locate a valid checkpoint record
    2013-08-08 23:47:30 GMT LOG: startup process (PID 10600) was terminated by signal 6: Aborted
    2013-08-08 23:47:30 GMT LOG: aborting startup due to startup process failure

And I've been stumped. I've completely nuked my data dirs and started over, and gotten the same result, but with different identifier numbers (as I would expect).

Any ideas?

Thanks!

QH
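The DETAIL line is the key clue: the WAL found in pg_xlog and the pg_control file come from two different clusters. A minimal Python sketch that pulls both identifiers out of a startup log, assuming the message wording matches the 9.2 output quoted above (other versions may phrase it differently); `find_sysid_mismatch` is a hypothetical helper, not part of PostgreSQL.

```python
import re

# Wording as it appears in the 9.2 log above; assumed, not guaranteed, for other versions.
SYSID_DETAIL = re.compile(
    r"WAL file database system identifier is (\d+), "
    r"pg_control database system identifier is (\d+)"
)


def find_sysid_mismatch(log_text: str):
    """Return (wal_sysid, pg_control_sysid) if the log reports a mismatch, else None."""
    # Collapse whitespace so the message still matches if it was line-wrapped.
    flat = " ".join(log_text.split())
    match = SYSID_DETAIL.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    log = """2013-08-08 23:47:30 GMT LOG: WAL file is from different database system
2013-08-08 23:47:30 GMT DETAIL: WAL file database system identifier is
5909892614333033983, pg_control database system identifier is
5909892824786287231."""
    wal_id, control_id = find_sysid_mismatch(log)
    print(f"WAL in pg_xlog belongs to cluster {wal_id}, pg_control to {control_id}")
```

If the two numbers differ, the files under pg_xlog were not produced by the cluster whose pg_control is in place, which is the leftover-WAL situation discussed in the replies.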