Re: [ADMIN] Steps to switch from Master to standby mode :

2013-05-14 Thread Strahinja Kustudić
I think you need to create a new base backup from the new master to make
the old master a standby server. I usually do this using rsync, so that it
goes as fast as possible, but you could also use a tool like
http://www.repmgr.org/
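
For example, a rough sketch of the rsync approach (the host name
"newmaster" and the data directory paths are just placeholders), run on
the old master while Postgres is stopped there, could look like this:

    # 'newmaster' and the data directory are placeholders; adjust to your setup.
    # Pull the data directory from the new master, skipping WAL and the pid file:
    rsync -av --delete --exclude=pg_xlog --exclude=postmaster.pid \
        newmaster:/var/lib/pgsql/9.2/data/ /var/lib/pgsql/9.2/data/
    # then create recovery.conf pointing at the new master and start Postgres

Alternatively, pg_basebackup can take the base backup for you, e.g. into
an empty data directory:

    pg_basebackup -h newmaster -D /var/lib/pgsql/9.2/data -X stream -P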

Regards,
Strahinja


On Mon, May 13, 2013 at 9:23 AM, prakhar jauhari prak...@gmail.com wrote:

 Hi all,

 I have been trying to setup Postgres 9.2 in HA using streaming replication
 and base backup.
 There is no problem in switching from:
 Standby -> Master, using the trigger file mechanism provided by Postgres.

 The problem comes when switching from:
 Master -> Standby: I try to set up streaming replication from the new
 standby to the new master, but replication doesn't start; instead I find
 the following error in the Postgres logs:

 FATAL:  timeline 2 of the primary does not match recovery target timeline 1.

 Is there any way the timeline can be bumped up to the correct number on
 the new standby without taking a base backup? And is it safe to use the
 method described in
 http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-timeline-switch-of-slave-node-without-archives/
 to bump up the timeline?

 regards,
 Prakhar



Re: [ADMIN] Some replication-related notes and questions

2013-04-30 Thread Strahinja Kustudić
I can answer your first question. The way I check the replication delay is
by running this query on the replication server:

*SELECT now() - pg_last_xact_replay_timestamp();*

Of course you need to have hot standby configured, which you should do if
you haven't already.
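
If you want to check this from the shell, a minimal sketch of a monitoring
script (the 5 minute threshold and the default psql connection settings
are just examples) could be:

    #!/bin/sh
    # Seconds since the last replayed transaction; on a busy master this
    # approximates the replication delay. The 300 second threshold is an example.
    LAG=$(psql -At -c "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0);")
    if [ "${LAG%.*}" -gt 300 ]; then
        echo "standby is ${LAG} seconds behind"
    fi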

Regards,
Strahinja


On Tue, Apr 30, 2013 at 9:42 PM, Scott Whitney sc...@journyx.com wrote:

 We recently moved to PG 9.2.4 (from 8.4.4) to take advantage of
 replication, and I have to say it's pretty awesome.

 I ran into some things that I was hoping someone could clarify.

 a) There appears to be no way to tell how far behind my standby servers
 are. That is, I can find a checkpoint with pg_controldata. I can find the
 sending/receiving WAL processes via ps or replay/receive_location().
 However, there seems to be no correlation to real-world times or dates, i.e.
 how many seconds/minutes/hours behind my standby servers are.

 b) This segues nicely into the archive_status/*.done files. The _only_
 files in my archive_status directory on my standby servers are .done files.
 Will these clean up on their own? Can I stat the last one to know the
 maximum discrepancy between my master and slave? If they do NOT clean up on
 their own, is it safe to remove them? I couldn't find any information on
 what that directory specifically does or whether .done files are required.

 c) It would appear that a FULL vacuum can hose replication (in the case of
 a long-standing hanging transaction). Has anyone else run into this?
 Specifically, my standby was up and running and replicating. Sunday morning
 (I do full vacuums on Saturday due to an old bug in an old database) my
 standby was trying to replay a WAL file that didn't exist because I had
 hanging transactions that began before my oldest WAL segment. Since the
 replication was up and running on Saturday, I was wondering whether full
 vacuum could have been to blame for the servers getting out of sync.

 Thanks,
 Scott Whitney




Re: [ADMIN] archive falling behind

2013-04-26 Thread Strahinja Kustudić
What does your shell script do that you need a script for archiving? The
archive command is usually just a cp/scp/rsync command; you usually don't
need more than that.
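
For reference, a typical minimal archive_command (the archive directory is
of course only an example) is just the one from the PostgreSQL docs:

    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

and for a remote archive you can usually get away with a plain scp or rsync
of %p to the backup host in the same way.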

Regards,
Strahinja


On Fri, Apr 26, 2013 at 3:55 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 German Becker wrote:
  Here is the archive part of the config:
 
  archive_mode = on       # allows archiving to be done
                          # (change requires restart)
  archive_command = '/var/lib/postgresql/scripts/archive_copy.sh %p %f'
                          # command to use to archive a logfile segment
  #archive_timeout = 0    # force a logfile segment switch after this
                          # number of seconds; 0 disables

 So the problem might be in that script.

  The archive command makes a local copy and then copies it to the backup
  server via ssh. Both copies are md5-checked and retried up to 3 times in
  case of failure.

 archive_command should not retry the operation, but rather
 return a non-zero return code.

  I have seen under heavy load that some WALs are skipped, some are smaller
  than expected, and some are corrupted (i.e. the loop fails 3 times).
  I'm not sure about the return value (I am checking it). What is the expected
  behaviour of the archiver? Will it retry the archive if the archive command
  returns something other than 0? Will it retain the WAL segment until it is
  successfully archived?

 See
 http://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL

 archive_command should exit with zero only if the
 WAL segment was archived successfully.
 PostgreSQL will retry and retain the WAL segment until
 archival succeeds.
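
 A minimal sketch of such a script (the archive directory and the backup
 host below are only placeholders) would copy the segment once, let
 PostgreSQL do the retrying, and exit non-zero on any failure:

     #!/bin/sh
     # called by archive_command as: archive_copy.sh %p %f
     # ARCHIVE_DIR and the 'backup' host are placeholders; adjust to your setup.
     WAL_PATH="$1"
     WAL_FILE="$2"
     ARCHIVE_DIR=/var/lib/postgresql/archive

     # never overwrite an already archived segment
     test ! -f "$ARCHIVE_DIR/$WAL_FILE" || exit 1
     cp "$WAL_PATH" "$ARCHIVE_DIR/$WAL_FILE" || exit 1
     scp "$ARCHIVE_DIR/$WAL_FILE" backup:/archive/ || exit 1
     exit 0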

 Yours,
 Laurenz Albe



Re: [ADMIN] archive falling behind

2013-04-25 Thread Strahinja Kustudić
How can the archive process fall behind? Postgres will never reuse WAL
files which are not yet archived.

Regarding your question about slowing down WAL generation, that is not
possible, other than by slowing down the application that is writing to
the database.
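
If you want to see how far behind the archiver actually is, one simple
check (the data directory path is just an example) is to count the WAL
segments that are still waiting to be archived:

    # every .ready file is a WAL segment not yet archived; adjust the path
    ls /var/lib/pgsql/9.1/data/pg_xlog/archive_status/*.ready 2>/dev/null | wc -l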

Regards,
Strahinja

On Mon, Apr 22, 2013 at 6:03 PM, German Becker german.bec...@gmail.com wrote:

 Hi, I am working with version 9.1.9. The database is in archiving mode. The
 archives are then sent to a hot standby replication server.
 I have seen that under heavy load, the archive process will fall behind
 the WAL generation and thus some WAL segments won't get archived, so the
 replication stops. To recover from this I need to do a new cold backup and
 send it to the replication server. Is there a way to limit the WAL
 generation (i.e. slowing down inserts and deletes) so as to prevent the
 archive from falling behind?

 Thanks!!



Re: [ADMIN] postgres 9.0 - unable to bind to localhost.

2012-12-23 Thread Strahinja Kustudić
Sorry for the late reply, but I had the exact same problem and it was a bug
in the Red Hat RPM upgrade script of the sudo package. It basically means
the user running Postgres cannot resolve the hostname localhost. Have you
tried logging in as the user running Postgres and trying to resolve
localhost? In RHEL/CentOS you would do this with:

*su - postgres*
*nslookup localhost*

If this returns 127.0.0.1, then the problem is something else, but if it says
the hostname cannot be resolved, then you need to check the permissions of
/etc/hosts and /etc/nsswitch.conf; all users should be able to read those
two files. You can just run:

*chmod a+r /etc/hosts*
*chmod a+r /etc/nsswitch.conf*

Another workaround would be to replace 'localhost' in postgresql.conf with
'127.0.0.1'.
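
That is, something along these lines in postgresql.conf (host_ip stands for
the machine's real address, as in your current setting):

    listen_addresses = '127.0.0.1,host_ip'    # instead of 'localhost,host_ip'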

Regards,
Strahinja


On Mon, Dec 10, 2012 at 9:57 AM, Rajagopalan, Jayashree 
jayashree.rajagopa...@emc.com wrote:

 Hi all:


 Getting this error in the postgres logs:

 2012-11-29 09:47:39.335 CST - SessionId:50b7839b.4791 - TransId:0 - LOG:  could not translate host name localhost, service 5435 to address: Name or service not known

 2012-11-29 09:47:39.336 CST - SessionId:50b7839b.4791 - TransId:0 - WARNING:  could not create listen socket for localhost

 2012-11-29 09:47:39.343 CST - SessionId:50b7839b.4791 - TransId:0 - LOG:  could not resolve localhost: Name or service not known


 Also, the netstat command reveals the port (5435) is not bound to 127.0.0.1.
 I checked pg_hba.conf, /etc/hosts, nsswitch.conf and resolv.conf and do not
 see any issues there.


 The listen_addresses entry is like this:

 listen_addresses = 'localhost,host_ip'

 The port is bound to the host_ip but not to localhost.

 How does Postgres resolve localhost?

 The entry for localhost in /etc/hosts is correct, and nsswitch.conf points
 hosts to files, dns.

 host localhost resolves to 127.0.0.1 on the command line.

 What else should I check?


 Regards

 J