Re: [GENERAL] repmgr won't update witness after failover

2015-08-20 Thread Aviel Buskila
Hey,
Thanks for the reply, this helped me very much.

Kind Regards,
Aviel Buskila.
On 17 Aug 2015 08:49, Jony Cohen jony.cohe...@gmail.com wrote:

 Hi,
 The clone command just clones the data from node2 to node1; you also need
 to register it with the `force` option to override the old record (as if
 you were building a new replica node).
 See:

 https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby

 Regards,
  - Jony



Re: [GENERAL] repmgr won't update witness after failover

2015-08-16 Thread Aviel Buskila
hey,

I set up the configuration all over again. The state of 'repl_nodes'
before the failover is now:

 id |  type   | upstream_node_id |   cluster    | name  |                    conninfo                    | priority | active
----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
  1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
  2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
  3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t


repmgr is started on node2 and node3 (standby and witness). Now, when I kill
the postgres master process, I can see the following messages in the
repmgrd log:

[WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
[WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision


and then, when it tried to elect node2 for promotion, it showed the
following messages:

[DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
[WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
[ERROR] unable to determine a valid master node, terminating...
[INFO] repmgrd terminating..



what am I doing wrong?




Re: [GENERAL] repmgr won't update witness after failover

2015-08-16 Thread Aviel Buskila
Hey,
I think I know what the problem is.
After the first failover, when I clone the old master to become a standby
with the 'repmgr standby clone' command, nothing seems to update the
repl_nodes table with the new standby in my cluster, so on the next
failover repmgrd fails to find a standby to fail over to.

I confirmed this by manually updating the repl_nodes table after the clone
so that the old master is now recorded as a standby.

Now my question is:
Where is the repl_nodes table supposed to be updated with the new standby
server after I issue 'repmgr standby clone'?

Best regards,
Aviel Buskila






Re: [GENERAL] repmgr won't update witness after failover

2015-08-16 Thread Jony Cohen
Hi,
The clone command just clones the data from node2 to node1; you also need
to register it with the `force` option to override the old record (as if
you were building a new replica node).
See:
https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby
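
A minimal sketch of the two steps described above (re-clone, then re-register with force), written as a small Python helper that only prints the commands and runs nothing. The config path, the data directory, and the `rejoin_commands` helper are assumptions for illustration. The flags used (`-f`, `-D`, `-h`, `-d`, `-U`, `--force`) should be checked against the README of the installed repmgr version before anything is executed.

```python
import shlex


def rejoin_commands(new_master, data_dir, conf="/etc/repmgr/repmgr.conf"):
    """Build the repmgr invocations for turning a failed old master into a
    registered standby. Nothing is executed; the caller decides what to run.

    All paths and host names are placeholders (assumptions), not values
    taken from the thread.
    """
    clone = [
        "repmgr", "-f", conf,
        "-D", data_dir,            # data directory of the old master
        "-h", new_master,          # node to clone from (the current master)
        "-d", "repmgr", "-U", "repmgr",
        "--force",                 # allow cloning into the existing data directory
        "standby", "clone",
    ]
    register = [
        "repmgr", "-f", conf,
        "--force",                 # override the stale record in repl_nodes
        "standby", "register",
    ]
    return [clone, register]


# Print the commands for review instead of running them.
for argv in rejoin_commands("node2", "/path/to/node1/data"):
    print(shlex.join(argv))
```

The register step is the one the clone alone does not perform: without it the old master keeps its previous row in repl_nodes.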

Regards,
 - Jony






Re: [GENERAL] repmgr won't update witness after failover

2015-08-14 Thread Aviel Buskila
Hey,
Yes, I did, and it still won't fail back.

2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen jony.cohe...@gmail.com:

 Hi, did you make the old master follow the new one using repmgr?

 It doesn't update itself automatically...
 From the looks of it repmgr thinks you have 2 masters - the old one
 offline and the new one online.

 Regards,
  Jony

 On 13 Aug 2015, at 15:43, Aviel Buskila avie...@gmail.com wrote:

 Hey,
 I have just tried starting repmgrd on the new standby after setting it up
 as a standby, and it still behaves the same way.

 From the messages in the repmgrd log on the witness server, it seems that
 it is unable to elect a new master because it can't see any other node.

 I checked the repl_nodes table on the witness and it shows:
 witness node3
 master  node2
 master  node1
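
The listing above shows the inconsistency directly: two nodes recorded as master and none as standby. A small sketch of that sanity check follows, assuming the rows have already been fetched as (name, type) pairs. `check_roles` is a hypothetical helper, not part of repmgr, and it ignores the `active` column.

```python
from collections import Counter


def check_roles(nodes):
    """Return a list of problems found in repl_nodes-style (name, type) rows.

    Simplifying assumption: a healthy cluster has exactly one master and
    at least one standby; the 'active' flag is not considered.
    """
    counts = Counter(node_type for _, node_type in nodes)
    problems = []
    if counts["master"] != 1:
        problems.append(f"expected exactly 1 master, found {counts['master']}")
    if counts["standby"] == 0:
        problems.append("no standby registered, so there is no promotion candidate")
    return problems


# Rows as reported from the witness after the first failover.
witness_view = [("node3", "witness"), ("node2", "master"), ("node1", "master")]
for problem in check_roles(witness_view):
    print(problem)
```

With the rows reported here, both checks fail, which matches the "unable to determine a valid master" symptom on the second failover.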

 Is there a way to update the witness after the first failover?






Re: [GENERAL] repmgr won't update witness after failover

2015-08-14 Thread Martín Marqués
On 14/08/15 at 04:14, Aviel Buskila wrote:
 Hey,
 Yes, I did, and it still won't fail back.

Can you send over the output of repmgr cluster show before and after
the failover process?

Also the output of SELECT * FROM repmgr_schema.repl_nodes; after the
failover (replace repmgr_schema with the schema you have configured).
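
For reference, a sketch of building that query for a given cluster. It assumes the repmgr 3.x convention of naming the schema `repmgr_` plus the cluster name from repmgr.conf, which is worth verifying with `\dn` in psql. `repl_nodes_query` is a hypothetical helper, and the column list is taken from the table output posted in this thread.

```python
def repl_nodes_query(cluster_name):
    """Return a query listing node roles for the given repmgr cluster.

    Assumption: the metadata schema is named 'repmgr_<cluster_name>'.
    """
    schema = "repmgr_" + cluster_name
    # Quote the schema identifier, doubling embedded quotes, so unusual
    # cluster names cannot break the statement.
    quoted = '"' + schema.replace('"', '""') + '"'
    return (
        "SELECT id, type, upstream_node_id, name, active "
        f"FROM {quoted}.repl_nodes ORDER BY id;"
    )


print(repl_nodes_query("cluster_name"))
```

Running the resulting statement on the master, the standby, and the witness makes it easy to compare what each node believes about the others.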

Also, which version of repmgr are you running?

 2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen jony.cohe...@gmail.com:
 
 Hi, did you make the old master follow the new one using repmgr?

 It doesn't update itself automatically...
 From the looks of it repmgr thinks you have 2 masters - the old one
 offline and the new one online.

Regards,

-- 
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] repmgr won't update witness after failover

2015-08-13 Thread Jony Cohen
Hi Aviel,
You can use the 'cluster show' command to see the repmgr state before you
do the 2nd failover; make sure node1 is indeed marked as a replica.
After a failover, the old master doesn't automatically attach to the new
master; you need to point it at the new master as a standby ('standby
follow', if possible).
Did you start repmgrd on node1 after making it a replica of the new
master? (It needs 2 daemons to decide what to promote.)

Regards,
 - Jony

On Thu, Aug 13, 2015 at 1:29 PM, Aviel Buskila avie...@gmail.com wrote:

 Hey,
 I have set up three nodes of postgresql 9.4 with repmgr in this way:
 1. master - node1
 2. standby - node2
 3. witness - node3

 I set up the replication and the witness as described here:
 https://github.com/2ndQuadrant/repmgr/blob/master/FAILOVER.rst

 Now when I run 'kill -9 $(pidof postmaster)', the witness detects that
 something went wrong and fails over from node1 to node2.
 But when I then set up replication from node2 to node1 and kill the
 PostgreSQL process, it doesn't fail over, and the repmgrd log shows the
 following message:
 unable to determine a valid master server; waiting 10 seconds to retry...

 It seems that the witness doesn't know about the new standby server.

 Does anyone have any idea what I am doing wrong here?


 Best regards,
 Aviel Buskila