Hello, and thanks for your comments; they are very helpful. If you have any draft-level documentation of the upgrade procedure available, I would be very interested in seeing it, and perhaps trying it out as well.
Meanwhile I did some more testing in my environment with repmgr 3 and noticed an issue when promoting a standby node. Here is roughly what I did:

1. Installed the repmgr 3.1.2 RPM on all nodes as an upgrade to the previous 2.0.2.
2. Took the repmgr release upgrade SQL scripts from the GitHub 3.1-stable branch and ran them on the master (all three scripts, in order).
3. Stopped the postgresql service on the master.
4. Ran "repmgr standby promote" on the standby, which does some things and then hangs forever.

This standby promote was working fine before the repmgr upgrade. There is a COMMIT command visible with ps:

 3324 ?        Ss     0:00 postgres: repmgr repmgr <new master IP>(43666) COMMIT waiting for 2/4E000548

What I see on the console is here:

[2016-08-08 10:29:03] [NOTICE] using configuration file "/var/lib/pgsql/repmgr/repmgr.conf"
[2016-08-08 10:29:03] [INFO] connecting to standby database
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [INFO] connected to standby, checking its state
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] retrieving node list for cluster 'sensorlog'
[2016-08-08 10:29:03] [DEBUG] get_master_connection(): SELECT id, conninfo, CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority FROM "repmgr_sensorlog".repl_nodes WHERE cluster = 'sensorlog' AND type != 'witness' ORDER BY active DESC, type_priority, priority, id
[2016-08-08 10:29:03] [INFO] checking role of cluster node '1'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [ERROR] connection to database failed: could not connect to server: Connection refused
        Is the server running on host "<old master IP>" and accepting TCP/IP connections on port 5432?
[2016-08-08 10:29:03] [INFO] checking role of cluster node '2'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [INFO] checking role of cluster node '3'
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<old master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:03] [NOTICE] promoting standby
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): SELECT name, setting FROM pg_settings WHERE name = 'data_directory'
[2016-08-08 10:29:03] [DEBUG] get_pg_setting(): returned value is "/var/lib/pgsql/data"
[2016-08-08 10:29:03] [NOTICE] promoting server using '/usr/bin/pg_ctl -D /var/lib/pgsql/data promote'
server promoting
[2016-08-08 10:29:03] [INFO] reconnecting to promoted server
[2016-08-08 10:29:03] [DEBUG] connecting to: 'host=<new master IP> user=repmgr dbname=repmgr fallback_application_name='repmgr''
[2016-08-08 10:29:03] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] is_standby(): SELECT pg_catalog.pg_is_in_recovery()
[2016-08-08 10:29:05] [DEBUG] setting node 2 as master and marking existing master as failed
[2016-08-08 10:29:05] [DEBUG] begin_transaction()
[2016-08-08 10:29:05] [DEBUG] commit_transaction()

The system is left in a strange state after this. If I start PostgreSQL again on the old master node and run "repmgr cluster show", it lists both nodes as masters. In this experiment I did not take slot-based replication into use at all; everything is left just as it was before, apart from the repmgr upgrade.

br,
Pekka

2016-08-04 16:19 GMT+03:00 Ian Barwick <i...@2ndquadrant.com>:

> Hi
>
> On 08/04/2016 05:57 PM, Pekka Rinne wrote:
>
>> hi!
>>
>> I have been using the PostgreSQL 9.4 and repmgr 2.0 combination and been
>> doing replication (hot standby). Now I'd like to start doing slot-based
>> replication, and have installed repmgr 3, executed the provided SQL
>> scripts, and also added use_replication_slots=1 to repmgr.conf.
>>
>> The question is: what is the correct procedure for switching to
>> slots (max_replication_slots) in this case, as the system has already
>> been set up without them? Do I have to unregister and re-register all
>> the standbys? Hopefully a re-clone can be avoided.
>>
>
> No reclone needed.
>
>> What I tried was that I configured max_replication_slots=5, restarted
>> the master, created some slots using SELECT * FROM
>> pg_create_physical_replication_slot(<name>), and configured one of the
>> created slots in recovery.conf on the slave. What I noticed was that
>> replication still seemed to be working after this, but in the repl_nodes
>> table slot_name remained empty. Then I did a standby re-register with
>> force, and slot_name was filled with the value repmgr_slot_2, which is
>> not the name I gave the slot. I think repmgr invented this name, but
>> repmgr_slot_2 does not exist in the pg_replication_slots table. There is
>> only the slot I created myself (active=t). So I guess this approach is
>> not quite right.
>>
>> What if I just skip the re-register? Does it matter if slot_name
>> remains empty in repl_nodes?
>>
>
> This won't affect replication; however, if you attempt any failover actions
> using repmgr (e.g. following a new master), it will probably cause problems
> when attempting to create a replication slot on the new master.
>
> As a workaround you can manually update the slot name in the repl_nodes
> table to match the one you've chosen. We'll update repmgr to handle this
> kind of situation better. I don't think we've had this particular use-case
> before, so I'll add some notes to the documentation on how best to handle it.
>
> Regards
>
> Ian Barwick
>
> --
>  Ian Barwick                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
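PS: to make sure I apply the suggested workaround correctly, here is my understanding of it as SQL, to be run on the master in the repmgr database. This is only a sketch: the slot name 'node2_slot' is a placeholder for whatever name was actually passed to pg_create_physical_replication_slot(), and the "repmgr_sensorlog" schema, 'sensorlog' cluster name, and node id 2 are from my setup above; adjust as needed.

```sql
-- Placeholder 'node2_slot': replace with the slot name actually created
-- with pg_create_physical_replication_slot() and used in recovery.conf.
UPDATE "repmgr_sensorlog".repl_nodes
   SET slot_name = 'node2_slot'
 WHERE cluster = 'sensorlog'
   AND id = 2;

-- Sanity check: the recorded name should match an existing, active slot.
SELECT slot_name, active
  FROM pg_replication_slots;
```

Does that look right, or is there more to it than updating repl_nodes?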