Re: [GENERAL] Weird error when setting up streaming replication

2014-06-24 Thread pgdude
I also get the same "weird" error (WAL file is from different database
system) on Ubuntu with PostgreSQL 9.3 when setting up a slave using rsync.

1. I installed postgresql on the slave (which automatically does the
initdb):
   sudo apt-get install postgresql-9.3

2. Modified my postgresql.conf file
(/etc/postgresql/9.3/main/postgresql.conf) to make it a slave.  Did the same
thing for pg_hba.conf adding my replication user in there.

3. Stopped both master and slave.

4. Did the rsync from the master to the slave excluding pg_xlog (thereby
leaving the existing pg_xlog contents on the slave intact).

Then I get the same errors (WAL file is from different database system).

Now if I delete everything from the data directory on the slave, including
the pg_xlog directory, and then do the rsync excluding the pg_xlog
directory, the cluster won't start because the pg_xlog directory is not
there.

But if I rsync with the pg_xlog directory, then I do not get any more
messages in the log file, whether I had the installation data directory in
place, or I deleted everything from the data directory before the rsync.


So it seems that with PostgreSQL 9.3 on Ubuntu, you should NOT exclude
pg_xlog when rsyncing the data directory over.


--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Weird-error-when-setting-up-streaming-replication-tp5766888p5808923.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Weird error when setting up streaming replication

2013-08-09 Thread Jeff Janes
On Fri, Aug 9, 2013 at 9:54 AM, Quentin Hartman wrote:
> OK, figured this out. I had it start copying the pg_xlog directory as well
> when doing the initial sync. I realized this is also the first time I've
> set up replication from scratch using 9.2. All my other 9.2 pairs were set up
> on either 9.0 or 9.1, and have been upgraded from there with replication
> already in place. Previously, and still according to that article in the
> wiki, the pg_xlog directory was specifically excluded.

You exclude pg_xlog from the rsync so that the master's WAL files are not
copied over, because they are not needed and can cause confusion.  But you
don't want an old copy of pg_xlog from a previous cluster sitting around
either, which is the case you were having.

By including pg_xlog in the sync, what you were doing is overwriting
the old files from a previous cluster (which are toxic) with ones from
the master, which are useless, but at least not generally toxic.
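
For illustration, the kind of rsync invocation being discussed could be
assembled roughly like this. It is only a sketch: the function name and the
paths in the usage example are hypothetical, and the flags are ordinary rsync
options rather than anything taken from the wiki article. Note that an
rsync exclude also protects matching files on the receiving side from
--delete, so whatever is already in the standby's pg_xlog stays there unless
it is cleared separately.

```python
from typing import List


def build_rsync_command(master_data_dir: str, standby_target: str) -> List[str]:
    """Build an rsync argv that copies a data directory without its WAL files.

    Hypothetical helper for illustration; it only builds the command list
    and does not run anything.
    """
    # A trailing slash makes rsync copy the directory's contents rather than
    # the directory itself.
    source = master_data_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-a",                   # archive mode: recurse, keep perms/owners/times
        "--delete",             # drop standby files that the master does not have
        "--exclude=pg_xlog/*",  # skip WAL files, but still create pg_xlog itself
        source,
        standby_target,
    ]


if __name__ == "__main__":
    # Paths below are assumptions, not taken from the thread.
    cmd = build_rsync_command(
        "/var/lib/postgresql/9.2/main",
        "standby:/var/lib/postgresql/9.2/main/",
    )
    print(" ".join(cmd))
```

The list form is meant to be passed to something like subprocess.run without
a shell, so the * in the exclude pattern reaches rsync unexpanded.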

I think one problem from the wiki is step 3:

3. Edit recovery.conf and postgresql.conf on the standby to start up
replication and hot standby. First, in postgresql.conf, change this
line

It doesn't tell you how you got those files in the first place, in
order to edit them.  You apparently got them from an initdb.  What you
probably want to do instead is get them by copying them from the
master.

> Does anyone know why
> this behavior may have changed?

I don't think it has changed.  I think your interpretation of the
instructions has changed, so you did something different under 9.0 and
9.1.

Cheers,

Jeff


Re: [GENERAL] Weird error when setting up streaming replication

2013-08-09 Thread Jeff Janes
On Fri, Aug 9, 2013 at 8:33 AM, Quentin Hartman wrote:
> This pair of servers aren't replacing anything, they are new, empty servers.

That should be 'empty server', singular.

> Before starting the slave at all, I'm copying the entire data file structure
> over to it via rsync. I'm doing almost exactly what is described here:
> http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps
> The only difference is that I've tweaked the paths on the rsync to be
> appropriate to my system layout. I've even gone so far as to delete
> everything in the data dir except for the pg_xlog directory before syncing
> everything over to make sure it wasn't caused by something not getting
> overwritten when it was supposed to.

So then, you *are* replacing the slave server.  If you were not, there
would be nothing in its data dir to delete, and nothing there to get
overwritten (or not get overwritten).  Also, not deleting the pg_xlog
directory (or at least the contents of that directory) is exactly the
problem.
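
A minimal sketch of that cleanup step, assuming it runs on the standby as the
postgres OS user while the server is stopped. The helper name is
hypothetical. It empties pg_xlog but leaves the directory itself in place,
since a data directory with no pg_xlog at all will not start either.

```python
import os
import shutil


def reset_wal_dir(data_dir: str, wal_subdir: str = "pg_xlog") -> int:
    """Make sure data_dir/pg_xlog exists and is empty.

    Returns the number of top-level entries that were removed.
    Hypothetical helper; run only against a stopped standby.
    """
    wal_dir = os.path.join(data_dir, wal_subdir)
    # Create the directory if an earlier wipe removed it entirely.
    os.makedirs(wal_dir, exist_ok=True)

    removed = 0
    for name in os.listdir(wal_dir):
        path = os.path.join(wal_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # e.g. a leftover archive_status directory
        else:
            os.remove(path)      # stale WAL segment files from the old initdb
        removed += 1
    return removed
```

Calling reset_wal_dir on the standby's data directory before the rsync would
get rid of the WAL files left behind by the package's automatic initdb.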

Cheers,

Jeff


Re: [GENERAL] Weird error when setting up streaming replication

2013-08-09 Thread Quentin Hartman
OK, figured this out. I had it start copying the pg_xlog directory as well
when doing the initial sync. I realized this is also the first time I've
set up replication from scratch using 9.2. All my other 9.2 pairs were set up
on either 9.0 or 9.1, and have been upgraded from there with replication
already in place. Previously, and still according to that article in the
wiki, the pg_xlog directory was specifically excluded. Does anyone know why
this behavior may have changed?


Re: [GENERAL] Weird error when setting up streaming replication

2013-08-09 Thread Quentin Hartman
This pair of servers aren't replacing anything, they are new, empty
servers. Before starting the slave at all, I'm copying the entire data
file structure over to it via rsync. I'm doing almost exactly what is
described here:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps
The only difference is that I've tweaked the paths on the rsync to be
appropriate to my system layout. I've even gone so far as to delete
everything in the data dir except for the pg_xlog directory before syncing
everything over to make sure it wasn't caused by something not getting
overwritten when it was supposed to.


Re: [GENERAL] Weird error when setting up streaming replication

2013-08-08 Thread Michael Paquier
On Fri, Aug 9, 2013 at 8:55 AM, Quentin Hartman wrote:
> 2013-08-08 23:47:30 GMT LOG:  WAL file is from different database system
> 2013-08-08 23:47:30 GMT DETAIL:  WAL file database system identifier is
> 5909892614333033983, pg_control database system identifier is
> 5909892824786287231.
It looks like the server cannot find a valid checkpoint record when
replaying WAL because your new node was initialized with a fresh
initdb, as the errors above indicate. You should build your new node
from a base backup or a snapshot of the data folder of the node you
are trying to replace.
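
As a rough illustration of the mismatch being described, the two identifiers
can be pulled out of the DETAIL message and compared. The function and
variable names here are hypothetical; the log text is the one quoted above.

```python
import re
from typing import Optional, Tuple

# Matches the DETAIL message once wrapped lines have been joined together.
DETAIL_RE = re.compile(
    r"WAL file database system identifier is\s+(\d+),\s+"
    r"pg_control database system identifier is\s+(\d+)"
)


def parse_identifier_mismatch(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (wal_id, pg_control_id) from the DETAIL message, or None."""
    # Log lines may be wrapped, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    match = DETAIL_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log = """2013-08-08 23:47:30 GMT LOG:  WAL file is from different database system
2013-08-08 23:47:30 GMT DETAIL:  WAL file database system identifier is
5909892614333033983, pg_control database system identifier is
5909892824786287231."""

wal_id, control_id = parse_identifier_mismatch(log)
# The WAL files and pg_control came from two different initdb runs.
print(wal_id != control_id)  # True
```

Different numbers mean the files in pg_xlog and the pg_control file were
produced by different clusters, which is what the startup process refuses to
replay.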
-- 
Michael


[GENERAL] Weird error when setting up streaming replication

2013-08-08 Thread Quentin Hartman
I'm going through all my usual steps for setting up streaming replication
on a new pair of servers: modify configs as appropriate, rsync data from
master to slave, etc. I have this all automated with chef, and it has been
pretty bulletproof for a while. However, today I ran into this when
starting the slave on this new pair:

 * Starting PostgreSQL 9.2 database server
 * The PostgreSQL server failed to start. Please check the log output:
2013-08-08 23:47:30 GMT LOG:  database system was interrupted; last known up at 2013-08-08 23:22:40 GMT
2013-08-08 23:47:30 GMT LOG:  entering standby mode
2013-08-08 23:47:30 GMT LOG:  WAL file is from different database system
2013-08-08 23:47:30 GMT DETAIL:  WAL file database system identifier is 5909892614333033983, pg_control database system identifier is 5909892824786287231.
2013-08-08 23:47:30 GMT LOG:  invalid primary checkpoint record
2013-08-08 23:47:30 GMT LOG:  invalid secondary checkpoint record
2013-08-08 23:47:30 GMT PANIC:  could not locate a valid checkpoint record
2013-08-08 23:47:30 GMT LOG:  startup process (PID 10600) was terminated by signal 6: Aborted
2013-08-08 23:47:30 GMT LOG:  aborting startup due to startup process failure


And I'm stumped. I've completely nuked my data dirs and started over
and gotten the same result, but with different identifier numbers (as I
would expect).

Any ideas?

Thanks!

QH