Re: [GENERAL] streaming replication: one problem & several questions
On Thu, Aug 18, 2011 at 4:26 AM, Lonni J Friedman wrote: > I wish I knew. All the documentation out there always focuses on > setting up a restore command, as if there would be a huge disaster if > it wasn't done. Is it safe to simply make wal_keep_segments really > large, and skip the restore_command altogether? There are pros and cons of replication setting NOT using restore_command. Please evaluate whether it's safe or not according to them. Pros; * You don't need to prepare the archive area shared between the master and standby. Don't need to purchase new server for that. * If you use restore_command and have the shared archive area, archive_command is a bit more likely to fail because it copies WAL files via network. Failure of archive_command might fill up the pg_xlog directory on the master, which might cause PANIC error. So you need to consider how to handle this failure case. OTOH, you don't need to do that if you don't use restore_command. Cons; * When setting up the standby, if the backup takes very long because the database is quite large, some WAL files required to the backup might be deleted from the master during the backup. If this happens, the standby starting from that backup will fail to start replication. To avoid such an unexpected deletion of WAL files from the master, you need to increase wal_keep_segments enough. But it might not be easy to determine the appropriate value of it. * You need to prepare large disk space for pg_xlog directory if wal_keep_segments is large. Because, in that case, a large number of WAL files can accumulate in pg_xlog. * When replication connection is terminated, no WAL data is streamed to the standby, so the standby cannot advance recovery at all. OTOH, if you set restore_command on the standby and have the shared archive area, the standby can read new WAL file from it by using restore_command and advance recovery. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication: one problem & several questions
On Mon, Aug 15, 2011 at 9:34 PM, Fujii Masao wrote: > On Thu, Aug 11, 2011 at 7:19 AM, Lonni J Friedman wrote: >> First the problem. On *only* one of the two standby servers, I'm >> seeing errors like the following whenever I issue any SQL commands on >> the master which write (insert, update, etc) to the database: >> LOG: invalid record length at 8/7A20 >> FATAL: terminating walreceiver process due to administrator command >> LOG: invalid record length at 8/7AB0 >> LOG: streaming replication successfully connected to primary >> LOG: invalid record length at 8/7B20 >> FATAL: terminating walreceiver process due to administrator command >> LOG: record with zero length at 8/7BB0 >> LOG: streaming replication successfully connected to primary >> LOG: record with incorrect prev-link 8/7958 at 8/7DB0 >> LOG: streaming replication successfully connected to primary > > Did you use gcc4.6 or later to build PostgreSQL9.0? If yes, you would > face the same problem reported before; > http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php > > This problem was fixed, and the fix will be included in next minor update > (i.e., 9.0.5). > http://archives.postgresql.org/pgsql-committers/2011-06/msg00101.php Yes, that was the issue. I thought that I had replied earlier to someone else speculating that this was the issue, but perhaps I had not. >> 1) Both of the wiki links above comment that the restore_command may >> not be necessary if wal_keep_segments is large enough (mine is set to >> 128). I was going to setup the restore_command anyway, as I'm not yet >> confident enough about streaming replication and failover with >> postgresql to take chances, although the fact that i have two standby >> servers makes this setup a bit more complex. However, can anyone >> comment about whether its ever truly safe 100% of the time to run >> without a restore_command ? > > Specifically, what problem are you concerned about? I wish I knew. All the documentation out there always focuses on setting up a restore command, as if there would be a huge disaster if it wasn't done. Is it safe to simply make wal_keep_segments really large, and skip the restore_command altogether? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication: one problem & several questions
On Thu, Aug 11, 2011 at 7:19 AM, Lonni J Friedman wrote: > First the problem. On *only* one of the two standby servers, I'm > seeing errors like the following whenever I issue any SQL commands on > the master which write (insert, update, etc) to the database: > LOG: invalid record length at 8/7A20 > FATAL: terminating walreceiver process due to administrator command > LOG: invalid record length at 8/7AB0 > LOG: streaming replication successfully connected to primary > LOG: invalid record length at 8/7B20 > FATAL: terminating walreceiver process due to administrator command > LOG: record with zero length at 8/7BB0 > LOG: streaming replication successfully connected to primary > LOG: record with incorrect prev-link 8/7958 at 8/7DB0 > LOG: streaming replication successfully connected to primary Did you use gcc4.6 or later to build PostgreSQL9.0? If yes, you would face the same problem reported before; http://archives.postgresql.org/pgsql-hackers/2011-06/msg00661.php This problem was fixed, and the fix will be included in next minor update (i.e., 9.0.5). http://archives.postgresql.org/pgsql-committers/2011-06/msg00101.php Of course, you can avoid the problem by building PostgreSQL with pre-4.6 gcc. > 0) I've successfully setup the WAL archiving on the master, and set > archive_timeout=61. However, what I'm seeing is that new files are > not getting generated every 61 seconds, but instead only when some > kind of SQL is invoked which writes to the database, or every 305 > seconds (whichever comes first). First of all, you don't need to set archive_timeout. Without archive_timeout, streaming replication transfers WAL records from the master to the standby in almost real time. archive_timeout doesn't always generate new WAL file for each timeout. If there is no write workload, WAL file generation by archive_timeout is skipped. OTOH, checkout generates write workload, so archive_timeout after checkpoint always creates new WAL file. Since (I guess) you set checkpoint_timeout to 5min, you observed WAL file generation for each about 5min. > 1) Both of the wiki links above comment that the restore_command may > not be necessary if wal_keep_segments is large enough (mine is set to > 128). I was going to setup the restore_command anyway, as I'm not yet > confident enough about streaming replication and failover with > postgresql to take chances, although the fact that i have two standby > servers makes this setup a bit more complex. However, can anyone > comment about whether its ever truly safe 100% of the time to run > without a restore_command ? Specifically, what problem are you concerned about? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication: one problem & several questions
Hi. I've experienced very similar behavior and log message. I do not know what is the problem in detail but my guess is that standby system has data which is incompatible to accept streaming replication of the master. The way I solved is the following on Ubuntu11.04. Suppose postgresql.conf and recover.conf (on standby) are ready. 0. Stop postgresql on standby standby# /etc/init.d/postgresql stop 1. invoke pg_start_backup() on master master# sudo -u postgres psql -c "SELECT pg_start_backup('2011-08-15_04:49)" 2. remove data files on standby standby# cd /var/lib/postgresql/9.0/main standby# /bin/rm -rf base pg_* standby# mkdir pg_xlog; chown postgres.postgres pg_xlog; chmod 700 pg_xlog 3. copy data files master# rsync -av --delete /var/lib/postgresql/9.0/main --exclude=pg_xlog --exclude=postmaster.pid --exclude=server.crt --exclude=server.key /path/to/standby/data/directory 4. invoke pg_stop_backup() on master master# sudo -u postgres psql -c "SELECT pg_stop_backup()" 5. start postgresql on standby standby# /etc/init.d/postgresql start Then, I found streaming replication just started to work. -- View this message in context: http://postgresql.1045698.n5.nabble.com/streaming-replication-one-problem-several-questions-tp4687602p4698911.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication: one problem & several questions
On Thu, Aug 11, 2011 at 8:17 AM, Pedro Sam wrote: > Do your machines have the same architecture? (64 bit vs 32 bit) Yes, they're all Fedora15-x86_64. -- ~ L. Friedman netll...@gmail.com LlamaLand https://netllama.linux-sxs.org -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication: one problem & several questions
Do your machines have the same architecture? (64 bit vs 32 bit) - This transmission (including any attachments) may contain confidential information, privileged material (including material protected by the solicitor-client or other applicable privileges), or constitute non-public information. Any use of this information by anyone other than the intended recipient is prohibited. If you have received this transmission in error, please immediately reply to the sender and delete this information from your system. Use, dissemination, distribution, or reproduction of this transmission by unintended recipients is not authorized and may be unlawful. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general