Re: [GENERAL] repmgr won't update witness after failover
Hey,

Thanks for the reply, this helped me very much.

Kind regards,
Aviel Buskila

On 17 Aug 2015 at 08:49, Jony Cohen jony.cohe...@gmail.com wrote:
> Hi,
>
> The clone command just clones the data from node2 to node1; you also need to register it with the `force` option to override the old record (as if you were building a new replica node).
>
> See: https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby
>
> Regards,
> - Jony
Re: [GENERAL] repmgr won't update witness after failover
Hey,

I have tried to set up the configuration all over again. The status of 'repl_nodes' before the failover is now:

 id | type    | upstream_node_id | cluster      | name  | conninfo                                       | priority | active
----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
  1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
  2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
  3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t

repmgrd is started on node2 and node3 (standby and witness).

Now, when I kill the postgres master process, I see the following messages in the repmgrd log:

[WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision

Then, when it tries to elect node2 for promotion, it shows the following messages:

[DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
[WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
[ERROR] unable to determine a valid master node, terminating...
[INFO] repmgrd terminating..

What am I doing wrong?

> On 14/08/15 at 04:14, Aviel Buskila wrote:
>> Hey,
>> yes I did .. and still it won't fail back..
>
> Can you send over the output of repmgr cluster show before and after the failover process?
>
> The output of SELECT * FROM repmgr_schema.repl_nodes; after the failover (you need to replace repmgr_schema with the schema name you have configured).
>
> Also, which version of repmgr are you running?
>
> Regards,
> -- 
> Martín Marqués http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training Services
Re: [GENERAL] repmgr won't update witness after failover
Hey,

I think I know what the problem is. After the first failover, when I clone the old master to be a standby with the 'repmgr standby clone' command, nothing seems to update the repl_nodes table with the new standby in my cluster, so on the next failover repmgrd fails to find a standby to fail over to. I confirmed this by manually updating the repl_nodes table after the clone so that the old master is now recorded as a standby.

Now my question is: at what point is repl_nodes supposed to be updated with the new standby server after I issue 'repmgr standby clone'?

Best regards,
Aviel Buskila

2015-08-16 12:11 GMT+03:00 Aviel Buskila avie...@gmail.com:
> Hey,
>
> I have tried to set up the configuration all over again. The status of 'repl_nodes' before the failover is now:
>
>  id | type    | upstream_node_id | cluster      | name  | conninfo                                       | priority | active
> ----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
>   1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
>   2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
>   3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t
>
> repmgrd is started on node2 and node3 (standby and witness).
>
> Now, when I kill the postgres master process, I see the following messages in the repmgrd log:
>
> [WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
> [WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
>
> Then, when it tries to elect node2 for promotion, it shows the following messages:
>
> [DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
> [WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
> [ERROR] unable to determine a valid master node, terminating...
> [INFO] repmgrd terminating..
>
> What am I doing wrong?
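One way to check the diagnosis above (no standby registered before the second failover) is to count the active standby rows in repl_nodes. This is only a sketch: the helper name `count_active_standbys` is made up for illustration and is not part of repmgr, and it assumes the column order shown in the table in this thread (type in the second column, active in the last) and unaligned, tuples-only `psql -At` output.

```shell
#!/bin/sh
# Hypothetical helper (not part of repmgr): counts the repl_nodes rows that
# describe an active standby. Input on stdin is unaligned psql output
# (psql -At): one row per line, '|' between columns, in the column order
# shown in this thread:
#   id|type|upstream_node_id|cluster|name|conninfo|priority|active
count_active_standbys() {
    awk -F'|' '$2 == "standby" && $NF == "t" { n++ } END { print n + 0 }'
}

# The rows reported in this thread before the first failover.
before='1|master||cluster_name|node1|host=node1 dbname=repmgr port=5432 user=repmgr|100|t
2|standby|1|cluster_name|node2|host=node2 dbname=repmgr port=5432 user=repmgr|100|t
3|witness||cluster_name|node3|host=node3 dbname=repmgr port=5499 user=repmgr|100|t'

printf '%s\n' "$before" | count_active_standbys
```

Against a live cluster the same helper could be fed from something like `psql -At -h node2 -U repmgr -d repmgr -c 'SELECT * FROM repmgr_schema.repl_nodes' | count_active_standbys`, with repmgr_schema replaced by the configured schema. A result of 0 would match the symptom described here: repmgrd has no standby it could promote.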
Re: [GENERAL] repmgr won't update witness after failover
Hi,

The clone command just clones the data from node2 to node1; you also need to register it with the `force` option to override the old record (as if you were building a new replica node).

See: https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby

Regards,
- Jony

On Sun, Aug 16, 2015 at 3:19 PM, Aviel Buskila avie...@gmail.com wrote:
> Hey,
>
> I think I know what the problem is. After the first failover, when I clone the old master to be a standby with the 'repmgr standby clone' command, nothing seems to update the repl_nodes table with the new standby in my cluster, so on the next failover repmgrd fails to find a standby to fail over to. I confirmed this by manually updating the repl_nodes table after the clone so that the old master is now recorded as a standby.
>
> Now my question is: at what point is repl_nodes supposed to be updated with the new standby server after I issue 'repmgr standby clone'?
>
> Best regards,
> Aviel Buskila
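The clone-then-register sequence described in this reply can be sketched as a dry run that only prints the commands. Everything specific here is an assumption for illustration: the helper name `rejoin_as_standby`, the config path, the data directory and the connection options. The repmgr options are written as recalled from the repmgr 3.x README and may differ between versions, so check them against `repmgr --help` and the README section linked above before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for turning a failed master back into a
# registered standby instead of executing them. Config path, data directory
# and connection options are assumptions; verify the repmgr options against
# `repmgr --help` for the installed version.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"

rejoin_as_standby() {
    new_master="$1"   # node that is currently the master, e.g. node2
    data_dir="$2"     # data directory of the failed master (here: on node1)

    # 1. Re-clone the data from the new master over the old data directory.
    echo "repmgr -f $REPMGR_CONF -D $data_dir -d repmgr -U repmgr --force standby clone $new_master"
    # 2. Start PostgreSQL on the re-cloned node.
    echo "pg_ctl -D $data_dir start"
    # 3. Re-register the node so its stale 'master' row in repl_nodes is replaced.
    echo "repmgr -f $REPMGR_CONF --force standby register"
}

rejoin_as_standby node2 /var/lib/pgsql/9.4/data
```

The third step is the one missing from the procedure reported earlier in the thread: without it, the node's old record stays in repl_nodes as a master.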
Re: [GENERAL] repmgr won't update witness after failover
Hey,

Yes, I did, and it still won't fail back.

2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen jony.cohe...@gmail.com:
> Hi,
>
> Did you make the old master follow the new one using repmgr? It doesn't update itself automatically...
> From the looks of it, repmgr thinks you have 2 masters: the old one offline and the new one online.
>
> Regards,
> Jony
>
> On 13 Aug 2015, at 15:43, Aviel Buskila avie...@gmail.com wrote:
>> Hey,
>>
>> I have just tried to start repmgrd on the new standby after I set it up as a standby, and it still goes the same way. From the message in the repmgrd log on the witness server, it seems that it is not able to elect a new master because it can't see anyone.
>>
>> I have checked the repl_nodes table on the witness and it shows:
>>
>>   witness  node3
>>   master   node2
>>   master   node1
>>
>> Is there a way to update the witness after the first failover?
>>
>> 2015-08-13 15:06 GMT+03:00 Jony Cohen jony.cohe...@gmail.com:
>>> Hi Aviel,
>>>
>>> You can use the 'repmgr cluster show' command to see the repmgr state before you do the 2nd failover. Make sure that node1 is indeed marked as a replica.
>>> After a failover the old master doesn't automatically attach to the new master; you need to point it at the new master as a standby (standby follow, if possible...).
>>> Did you start repmgrd on node1 after making it a replica of the new master? (It needs 2 daemons to decide what to promote.)
>>>
>>> Regards,
>>> - Jony
Re: [GENERAL] repmgr won't update witness after failover
On 14/08/15 at 04:14, Aviel Buskila wrote:
> Hey,
> yes I did .. and still it won't fail back..

Can you send over the output of repmgr cluster show before and after the failover process?

The output of SELECT * FROM repmgr_schema.repl_nodes; after the failover (you need to replace repmgr_schema with the schema name you have configured).

Also, which version of repmgr are you running?

> 2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen jony.cohe...@gmail.com:
>> Hi,
>> did you make the old master follow the new one using repmgr? It doesn't update itself automatically...
>> From the looks of it, repmgr thinks you have 2 masters: the old one offline and the new one online.

Regards,
-- 
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
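For anyone collecting the same diagnostics, the three requests in this message can be sketched as a dry run that prints the commands to execute before and after the failover. The helper name `show_state_cmds`, the config path, the host and the connection options are assumptions for illustration. The schema name `repmgr_cluster_name` is also an assumption, based on repmgr 3.x naming its schema after the cluster; substitute whatever schema is actually configured.

```shell
#!/bin/sh
# Dry-run sketch: prints the diagnostic commands instead of executing them.
# Config path, host, user and database are assumptions for illustration.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr/repmgr.conf}"

show_state_cmds() {
    host="$1"     # node whose repmgr database is queried, e.g. the witness
    schema="$2"   # repmgr schema name as configured for the cluster

    # Cluster overview as seen by repmgr.
    echo "repmgr -f $REPMGR_CONF cluster show"
    # Raw node records, which is where a stale 'master' row would show up.
    echo "psql -h $host -U repmgr -d repmgr -c 'SELECT * FROM ${schema}.repl_nodes;'"
    # Installed repmgr version.
    echo "repmgr --version"
}

show_state_cmds node3 repmgr_cluster_name
```

Running the printed commands once before and once after the failover, and comparing the two outputs, shows whether a node's record changed role.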
Re: [GENERAL] repmgr won't update witness after failover
Hi Aviel,

You can use the 'repmgr cluster show' command to see the repmgr state before you do the 2nd failover. Make sure that node1 is indeed marked as a replica.
After a failover the old master doesn't automatically attach to the new master; you need to point it at the new master as a standby (standby follow, if possible...).
Did you start repmgrd on node1 after making it a replica of the new master? (It needs 2 daemons to decide what to promote.)

Regards,
- Jony

On Thu, Aug 13, 2015 at 1:29 PM, Aviel Buskila avie...@gmail.com wrote:
> Hey,
>
> I have set up three nodes of PostgreSQL 9.4 with repmgr in this way:
>
> 1. master - node1
> 2. standby - node2
> 3. witness - node3
>
> I have set up the replication and the witness as described here: https://github.com/2ndQuadrant/repmgr/blob/master/FAILOVER.rst
>
> Now, when I run 'kill -9 $(pidof postmaster)', the witness detects that something went wrong and fails over from node1 to node2.
> But when I then set up replication from node2 to node1 and kill the PostgreSQL process, it doesn't fail over, and the repmgrd log shows the following message:
>
> unable to determine a valid master server; waiting 10 seconds to retry...
>
> It seems that the witness doesn't know about the new standby server.
>
> Does anyone have any idea what I am doing wrong here?
>
> Best regards,
> Aviel Buskila