Re: [ADMIN] Steps to switch from Master to standby mode :
I think that you need to create a new base backup from the new master to make the old master a standby server. I usually do this using rsync, so that it takes as fast as possible, but you could also use a tool like http://www.repmgr.org/ Regards, Strahinja On Mon, May 13, 2013 at 9:23 AM, prakhar jauhari prak...@gmail.com wrote: Hi all, I have been trying to setup Postgres 9.2 in HA using streaming replication and base backup. There is no problem in switching from: Standby - Master using the trigger file mechanism provided by postgres. The problem comes when switching from: Master - Stanbdy : I try to set up Streaming replication from the new standby to the new Master, but replication doesn't start, rather i find the following error in postgres logs FATAL: timeline 2 of the primary does not match recovery target timeline 1. Is there any way the timeline can be bumped up to the correct number on the new standby, without taking base backup. And is it safe to use the method described in http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-timeline-switch-of-slave-node-without-archives/ to bump up the timeline. regards, Prakhar
Re: [ADMIN] Some replication-related notes and questions
I can answer your first question. The way I check the replication delay is by running this query on the replication server: *SELECT now() - pg_last_xact_replay_timestamp();* Of course you need to configure hot standby replication, which you should if you are not. Regards, Strahinja On Tue, Apr 30, 2013 at 9:42 PM, Scott Whitney sc...@journyx.com wrote: We recently moved to PG 9.2.4 (from 8.4.4) to take advantage of replication, and I have to say it's pretty awesome. I ran into some things that I was hoping someone could clarify. a) There appears to be no way to tell how far behind my standby servers are. That is, I can find a checkpoint with pg_controldata. I can find the sending/receiving WAL processes via ps or replay/receive_location(). However there seems to be no correlation to real-world times or dates as in how many seconds/minutes/hours behind are my standby servers. b) This segues nicely into the archive_status/*.done files. The _only_ files in my archive_status directory on my standby servers are .done files. Will these clean up on their own? Can I stat the last one to know the maximum discrepancy between my master and slave? If they do NOT clean up on their own, is it safe to remove them? I couldn't find any information on what that directory specifically does or whether .done files are required. c) It would appear that a FULL vacuum can hose replication (in the case of a long-standing hanging transaction). Has anyone else run into this? Specifically, my standby was up and running and replicating. Sunday morning (I do full vacuums on Saturday due to an old bug in an old database) my standby was trying to replay a WAL file that didn't exist because I had hanging transactions that began before my oldest WAL segment. Since the replication was up and running on Saturday, I was wondering whether full vacuum could have been to blame for the servers getting out ot sync. Thanks, Scott Whitney
Re: [ADMIN] archive falling behind
What does your shell script do that you need a script for archiving? The archive command is usually a cp/scp/rsync command, you usually don't need more than that. Regards, Strahinja On Fri, Apr 26, 2013 at 3:55 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote: German Becker wrote: Here is the archive part of the config: archive_mode = on # allows archiving to be done # (change requires restart) archive_command = '/var/lib/postgresql/scripts/archive_copy.sh %p %f' # command to use to archive a logfile segment #archive_timeout = 0# force a logfile segment switch after this # number of seconds; 0 disables So the problem might be in that script. The archive coommand makes a local copy and then it copies to the backup server via ssh. Both copies are md5-checked and retried up to 3 times in case of failure. archive_command should not retry the operation, but rather return a non-zero return code. I have seen under heavy load that some WALs are skipped, some have less size, some are corrupted (i,e, the loop fails 3 times). I'm not sure about the return value (checking it). What is the expected behaviour of the archiver? Will it retry de archive if archive command returns differnt than 0? Will it retain the WAL segment until it is succesfuly archived? See http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL archive_command should exit with zero only if the WAL segment was archived successfully. PostgreSQL will retry and retain the WAL segment until archival succeeds. Yours, Laurenz Albe
Re: [ADMIN] archive falling behind
How can the archive process fall behind? Postgres will never reuse WAL files which are not yet archived. Regarding your question about slowing down WAL generation, that is not possible to do, unless you slow down the application which is doing the writing into the database. Regards, Strahinja On Mon, Apr 22, 2013 at 6:03 PM, German Becker german.bec...@gmail.comwrote: Hi I am working with version 9.1.9. The dabase is in archiving mode. The archives are then sended to a hotstandby replication sever. I have seen that under heavy load, the archive process will fall behind the WAL generation and thuse some WAL segments won't get archived so the replication stops. To recover from this I need to do a new cold backup and send it to the replication server. Is there a way to limit the WAL generation (i.e slowing down insterts and deletes) so as to prevent the archive from falling behind? Thanks!!
Re: [ADMIN] postgres 9.0 - unable to bind to localhost.
Sorry for a late reply, but I had the exact same problem and it was a bug in the Red Hat RPM package upgrade script of the sudo package. This basically means the user running Postgres cannot resolve hostname localhost. Have you tried logging in as the user running Postgres and trying to resolve localhost? In RHEL/Centos you would do this with: *su - postgres* *nslookup localhost* If this returns 127.0.0.1, then something else is a problem, but if it says the hostname cannot be resolved, then you need to check permissions of /etc/hosts and /etc/nsswitch.conf, all users should be able to read those two files. You can just run: *chmod a+r /etc/hosts* *chmod a+r /etc/nsswitch.conf* Another workaround would be to replace 'localhost' in postgresql.conf with '127.0.0.1'. Regards, Strahinja On Mon, Dec 10, 2012 at 9:57 AM, Rajagopalan, Jayashree jayashree.rajagopa...@emc.com wrote: Hi all: ** ** Getting this error in the postgres logs: 2012-11-29 09:47:39.335 CST - SessionId:50b7839b.4791 - TransId:0 - LOG: could not translate host name localhost, service 5435 to address: Name or service not known 2012-11-29 09:47:39.336 CST - SessionId:50b7839b.4791 - TransId:0 - WARNING: could not create listen socket for localhost 2012-11-29 09:47:39.343 CST - SessionId:50b7839b.4791 - TransId:0 - LOG: could not resolve localhost: Name or service not known ** ** Also the netstat command reveals the port(5435) is not bound to 127.0.0.1. Checked the pg_hba.conf, /etc/hosts, nsswitch.conf, resolv.conf do not see any issues there. ** ** the listen_addresses entry is like this: listen_addresses = 'localhost,host_ip' The port is bound to the host_ip but not to localhost. How does postgres resolve the local host? The entry is proper for localhost in /etc/hosts, and nsswitch.conf points hosts to files, dns. host localhost resolves to 127.0.0.1 in command line. what else to check? ** ** Regards J