[ClusterLabs] Re: The slave does not promote to master

2018-05-07 Thread
Thank you, Klaus. According to the requirements, there is no fencing device in our network. Is there any other way to configure the cluster to make it work?


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: May 7, 2018 14:40
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; 范国腾 <fanguot...@highgo.com>
Subject: Re: [ClusterLabs] The slave does not promote to master

On 05/07/2018 07:39 AM, 范国腾 wrote:

Hi,



We have a two-node cluster using PAF to manage PostgreSQL. Node2 is the master:

 Master/Slave Set: pgsql-ha [pgsqld]
     Master: [ sds2 ]
     Slaves: [ sds1 ]



On the master node (sds2), I removed the PostgreSQL data directory. I expected the master node (sds2) to stop and the slave node (sds1) to be promoted to master.

The sds2 log shows that it executes monitor->notify->demote->notify->stop. The sds1 log also shows "Promote pgsqld:0#011(Slave -> Master sds1)". But "pcs status" shows the status below. Could you please help check what prevents the promotion from happening on sds1? What should I do to recover the system?

Didn't check all the details, but it looks as if stopping the resource failed, so the cluster doesn't know the state on sds2 and thus can't promote on sds1.
If you had enabled fencing, this would lead to sds2 being fenced so that sds1 could take over.

As digimer would say: "use fencing!"

Regards,
Klaus
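
For reference, a minimal fencing setup of the kind Klaus describes could look like the sketch below (the IPMI addresses, credentials and device names are placeholders, not details from this thread; adapt them to the actual hardware):

pcs stonith create fence_sds1 fence_ipmilan ipaddr="192.168.100.1" login="ADMIN" passwd="ADMIN" pcmk_host_list="sds1"
pcs stonith create fence_sds2 fence_ipmilan ipaddr="192.168.100.2" login="ADMIN" passwd="ADMIN" pcmk_host_list="sds2"
pcs property set stonith-enabled=true

With such devices in place, a failed stop on sds2 leads to sds2 being fenced instead of the resource staying blocked, and sds1 can then be promoted.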







2 nodes configured

3 resources configured

Online: [ sds1 sds2 ]

Full list of resources:

 Master/Slave Set: pgsql-ha [pgsqld]

 pgsqld (ocf::heartbeat:pgsqlms):   FAILED Master sds2 (blocked)

 Slaves: [ sds1 ]

 Resource Group: mastergroup

 master-vip (ocf::heartbeat:IPaddr2):   Started sds2

Failed Actions:

* pgsqld_stop_0 on sds2 'invalid parameter' (2): call=42, status=complete, 
exitreason='PGDATA "/home/highgo/highgo/database/4.3.1/data" does not exists',

last-rc-change='Mon May  7 00:39:06 2018', queued=1ms, exec=72ms







Here is the sds2 log:

May  7 00:38:46 node2 pgsqlms(pgsqld)[14000]: INFO: Execute action monitor and 
the result 8

May  7 00:38:56 node2 pgsqlms(pgsqld)[14077]: INFO: Execute action monitor and 
the result 8

May  7 00:39:06 node2 pgsqlms(pgsqld)[14152]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_monitor_1:14152:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_1:36 [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14162]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14162:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_monitor_1:36 [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14172]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_demote_0:14172:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of demote operation for 
pgsqld on sds2: 2 (invalid parameter)

May  7 00:39:06 node2 crmd[1129]:  notice: sds2-pgsqld_demote_0:39 [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists\n ]

May  7 00:39:06 node2 pgsqlms(pgsqld)[14182]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14182:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 pgsqlms(pgsqld)[14192]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_notify_0:14192:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06 node2 crmd[1129]:  notice: Result of notify operation for 
pgsqld on sds2: 0 (ok)

May  7 00:39:06 node2 pgsqlms(pgsqld)[14202]: ERROR: PGDATA 
"/home/highgo/highgo/database/4.3.1/data" does not exists

May  7 00:39:06 node2 lrmd[1126]:  notice: pgsqld_stop_0:14202:stderr [ 
ocf-exit-reason:PGDATA "/home/highgo/highgo/database/4.3.1/data" does not 
exists ]

May  7 00:39:06

[ClusterLabs] How to change the "pcs constraint colocation set"

2018-05-14 Thread
Hi,

We have two VIP resources, and we use the following command to keep them on different nodes.

pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 setoptions 
score=-1000

Now we have added a new node to the cluster along with a new VIP. We want the colocation constraint set to become:
pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 pgsql-slave-ip3 setoptions score=-1000
 
How should we change the constraint set?

Thanks


[ClusterLabs] Re: How to change the "pcs constraint colocation set"

2018-05-15 Thread
Thank you, Tomas. I know how to remove a colocation constraint with "pcs constraint colocation remove <source resource> <target resource>". Is there a command to delete a constraint colocation set?

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Tomas Jelinek
Sent: May 15, 2018 15:42
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] How to change the "pcs constraint colocation set"

On 15.5.2018 at 05:25, 范国腾 wrote:
> Hi,
> 
> We have two VIP resources, and we use the following command to keep them on
> different nodes.
> 
> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 
> setoptions score=-1000
> 
> Now we add a new node into the cluster and we add a new VIP too. We want the 
> constraint colocation set to change to be:
> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 
> pgsql-slave-ip3 setoptions score=-1000
>   
> How should we change the constraint set?
> 
> Thanks

Hi,

pcs provides no commands for editing existing constraints. You can create a new 
constraint and remove the old one. If you want to do it as a single change from 
pacemaker's point of view, follow this procedure:

[root@node1:~]# pcs cluster cib cib1.xml
[root@node1:~]# cp cib1.xml cib2.xml
[root@node1:~]# pcs -f cib2.xml constraint list --full
Location Constraints:
Ordering Constraints:
Colocation Constraints:
  Resource Sets:
    set pgsql-slave-ip1 pgsql-slave-ip2 (id:pcs_rsc_set_pgsql-slave-ip1_pgsql-slave-ip2) setoptions score=-1000 (id:pcs_rsc_colocation_set_pgsql-slave-ip1_pgsql-slave-ip2)
Ticket Constraints:
[root@node1:~]# pcs -f cib2.xml constraint remove pcs_rsc_colocation_set_pgsql-slave-ip1_pgsql-slave-ip2
[root@node1:~]# pcs -f cib2.xml constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 pgsql-slave-ip3 setoptions score=-1000
[root@node1:~]# pcs cluster cib-push cib2.xml diff-against=cib1.xml
CIB updated


Pcs older than 0.9.156 does not support the diff-against option; in that case you can do it like this:

[root@node1:~]# pcs cluster cib cib.xml
[root@node1:~]# pcs -f cib.xml constraint list --full
Location Constraints:
Ordering Constraints:
Colocation Constraints:
  Resource Sets:
    set pgsql-slave-ip1 pgsql-slave-ip2 (id:pcs_rsc_set_pgsql-slave-ip1_pgsql-slave-ip2) setoptions score=-1000 (id:pcs_rsc_colocation_set_pgsql-slave-ip1_pgsql-slave-ip2)
Ticket Constraints:
[root@node1:~]# pcs -f cib.xml constraint remove pcs_rsc_colocation_set_pgsql-slave-ip1_pgsql-slave-ip2
[root@node1:~]# pcs -f cib.xml constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 pgsql-slave-ip3 setoptions score=-1000
[root@node1:~]# pcs cluster cib-push cib.xml
CIB updated


Regards,
Tomas


[ClusterLabs] "pcs cluster stop --all" hangs

2018-05-10 Thread
Hi,

When I run "pcs cluster stop --all", it sometimes hangs with no response at all. The log is below. Can we tell from the log why it hangs, and how can we make the cluster stop right away?
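
One possible way to narrow this down (a suggestion, not an answer from the list): pacemaker cannot finish a graceful shutdown until every resource it still runs has stopped, so it may help to check which resources are still active and stop them explicitly before stopping the nodes, for example:

crm_mon -1                          # show which resources are still active and which actions are pending
pcs resource disable mastergroup    # stop the remaining resource group first
pcs cluster stop sds2               # then stop the nodes one at a time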

[root@node2 pg_log]# pcs status
Cluster name: hgpurog
Stack: corosync
Current DC: sds1 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Fri May 11 01:11:26 2018
Last change: Fri May 11 01:09:24 2018 by hacluster via crmd on sds1

2 nodes configured
3 resources configured

Online: [ sds1 sds2 ]

Full list of resources:

 Master/Slave Set: pgsql-ha [pgsqld]
 Stopped: [ sds1 sds2 ]
 Resource Group: mastergroup
 master-vip (ocf::heartbeat:IPaddr2):   Started sds1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@node2 pg_log]# pcs cluster stop --all


The /var/log/messages output is as below:
May 11 01:07:50 node2 crmd[5365]:  notice: State transition S_PENDING -> 
S_NOT_DC
May 11 01:07:50 node2 crmd[5365]:  notice: State transition S_NOT_DC -> 
S_PENDING
May 11 01:07:50 node2 crmd[5365]:  notice: State transition S_PENDING -> 
S_NOT_DC
May 11 01:07:51 node2 pgsqlms(pgsqld)[5371]: INFO: Execute action monitor and 
the result 7
May 11 01:07:51 node2 pgsqlms(undef)[5408]: INFO: Execute action meta-data and 
the result 0
May 11 01:07:51 node2 crmd[5365]:  notice: Result of probe operation for pgsqld 
on sds2: 7 (not running)
May 11 01:07:51 node2 crmd[5365]:  notice: sds2-pgsqld_monitor_0:6 [ /tmp:5866 
- no response\n ]
May 11 01:07:51 node2 crmd[5365]:  notice: Result of probe operation for 
master-vip on sds2: 7 (not running)
May 11 01:10:02 node2 systemd: Started Session 16 of user root.
May 11 01:10:02 node2 systemd: Starting Session 16 of user root.
May 11 01:11:33 node2 pacemakerd[5357]:  notice: Caught 'Terminated' signal
May 11 01:11:33 node2 systemd: Stopping Pacemaker High Availability Cluster 
Manager...
May 11 01:11:33 node2 pacemakerd[5357]:  notice: Shutting down Pacemaker
May 11 01:11:33 node2 pacemakerd[5357]:  notice: Stopping crmd
May 11 01:11:33 node2 crmd[5365]:  notice: Caught 'Terminated' signal
May 11 01:11:33 node2 crmd[5365]:  notice: Shutting down cluster resource 
manager
May 11 01:12:49 node2 systemd: Started Session 17 of user root.
May 11 01:12:49 node2 systemd-logind: New session 17 of user root.
May 11 01:12:49 node2 gdm-launch-environment]: AccountsService: ActUserManager: 
user (null) has no username (object path: /org/freedesktop/Accounts/User0, uid: 
0)
May 11 01:12:49 node2 journal: ActUserManager: user (null) has no username 
(object path: /org/freedesktop/Accounts/User0, uid: 0)
May 11 01:12:49 node2 systemd: Starting Session 17 of user root.
May 11 01:12:49 node2 dbus[648]: [system] Activating service 
name='org.freedesktop.problems' (using servicehelper)
May 11 01:12:49 node2 dbus-daemon: dbus[648]: [system] Activating service 
name='org.freedesktop.problems' (using servicehelper)
May 11 01:12:49 node2 dbus[648]: [system] Successfully activated service 
'org.freedesktop.problems'
May 11 01:12:49 node2 dbus-daemon: dbus[648]: [system] Successfully activated 
service 'org.freedesktop.problems'
May 11 01:12:49 node2 journal: g_dbus_interface_skeleton_unexport: assertion 
'interface_->priv->connections != NULL' failed

Here is the log on the peer node:
May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: WARNING: No secondary connected 
to the master
May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: WARNING: "sds2" is not connected 
to the primary
May 11 01:09:08 node1 pgsqlms(pgsqld)[28599]: INFO: Execute action monitor and 
the result 8
May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: WARNING: No secondary connected 
to the master
May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: WARNING: "sds2" is not connected 
to the primary
May 11 01:09:18 node1 pgsqlms(pgsqld)[28679]: INFO: Execute action monitor and 
the result 8
May 11 01:09:24 node1 crmd[]:  notice: sds1-pgsqld_monitor_1:19 [ 
/tmp:5866 - accepting connections\n ]
May 11 01:09:24 node1 crmd[]:  notice: Transition aborted by deletion of 
lrm_resource[@id='pgsqld']: Resource state removal
May 11 01:10:02 node1 systemd: Started Session 17 of user root.
May 11 01:10:02 node1 systemd: Starting Session 17 of user root.
May 11 01:11:33 node1 pacemakerd[1042]:  notice: Caught 'Terminated' signal
May 11 01:11:33 node1 systemd: Stopping Pacemaker High Availability Cluster 
Manager...
May 11 01:11:33 node1 pacemakerd[1042]:  notice: Shutting down Pacemaker
May 11 01:11:33 node1 pacemakerd[1042]:  notice: Stopping crmd
May 11 01:11:33 node1 crmd[]:  notice: Caught 'Terminated' signal
May 11 01:11:33 node1 crmd[]:  notice: Shutting down cluster resource 
manager
May 11 01:11:33 node1 crmd[]: warning: Input I_SHUTDOWN received in state 
S_TRANSITION_ENGINE from crm_shutdown


[ClusterLabs] Re: Re: How to change the "pcs constraint colocation set"

2018-05-15 Thread
pcs could not find the id of the constraint set:

[root@node1 ~]# pcs constraint colocation --full
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY) 
(id:colocation-clvmd-clone-dlm-clone-INFINITY)
  pgsql-master-ip with pgsql-ha (score:INFINITY) (rsc-role:Started) 
(with-rsc-role:Master) (id:colocation-pgsql-master-ip-pgsql-ha-INFINITY)
  pgsql-slave-ip2 with pgsql-ha (score:INFINITY) (rsc-role:Started) 
(with-rsc-role:Slave) (id:colocation-pgsql-slave-ip2-pgsql-ha-INFINITY)
  pgsql-slave-ip3 with pgsql-ha (score:INFINITY) (rsc-role:Started) 
(with-rsc-role:Slave) (id:colocation-pgsql-slave-ip3-pgsql-ha-INFINITY)
  Resource Sets:
set pgsql-slave-ip2 (id:pcs_rsc_set_pgsql-slave-ip2) setoptions score=-1000 
(id:pcs_rsc_colocation_set_pgsql-slave-ip2)
set pgsql-slave-ip2 pgsql-slave-ip3 
(id:pcs_rsc_set_pgsql-slave-ip2_pgsql-slave-ip3) setoptions score=-1000 
(id:pcs_rsc_colocation_set_pgsql-slave-ip2_pgsql-slave-ip3)
set pgsql-slave-ip2 pgsql-slave-ip3 
(id:pcs_rsc_set_pgsql-slave-ip2_pgsql-slave-ip3-1) setoptions score=-INFINITY 
(id:pcs_rsc_colocation_set_pgsql-slave-ip2_pgsql-slave-ip3-1)
[root@node1 ~]# pcs constraint remove pcs_rsc_set_pgsql-slave-ip2
Error: Unable to find constraint - 'pcs_rsc_set_pgsql-slave-ip2'
[root@node1 ~]# pcs constraint remove 
pcs_rsc_set_pgsql-slave-ip2_pgsql-slave-ip3
Error: Unable to find constraint - 'pcs_rsc_set_pgsql-slave-ip2_pgsql-slave-ip3'
[root@node1 ~]#

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Tomas Jelinek
Sent: May 15, 2018 16:12
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Re: How to change the "pcs constraint colocation set"

On 15.5.2018 at 10:02, 范国腾 wrote:
> Thank you, Tomas. I know how to remove a colocation constraint with
> "pcs constraint colocation remove <source resource> <target resource>".
> Is there a command to delete a constraint colocation set?

There is "pcs constraint remove ". To get a constraint id, run 
"pcs constraint colocation --full" and find the constraint you want to remove.



[ClusterLabs] Re: Re: How to change the "pcs constraint colocation set"

2018-05-15 Thread
Sorry, my mistake. I should use the second id. It is ok now. Thanks Tomas.


[ClusterLabs] Could not start only one node in pacemaker

2018-05-01 Thread
Hi,
The cluster has three nodes: one master and two slaves. We run "pcs cluster stop --all" to stop all of the nodes, and then run "pcs cluster start" on the master node. We find that it is not able to start: the stonith resource cannot be started, so none of the other resources can be started either.

We tested this case on two cluster systems and the result is the same:

- If we start all three nodes, the stonith resource can be started. If we stop one node after the cluster is up, the stonith resource is migrated to another node and the cluster still works.

- If we start only one or two nodes, the stonith resource cannot be started.


(1)   We create the stonith resource using this method in one system:
pcs stonith create ipmi_node1 fence_ipmilan ipaddr="192.168.100.202" 
login="ADMIN" passwd="ADMIN" pcmk_host_list="node1"
pcs stonith create ipmi_node2 fence_ipmilan ipaddr="192.168.100.203" 
login="ADMIN" passwd="ADMIN" pcmk_host_list="node2"
pcs stonith create ipmi_node3 fence_ipmilan ipaddr="192.168.100.204" 
login="ADMIN" passwd="ADMIN" pcmk_host_list="node3"


(2)   We create the stonith resource using this method in another system:

pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence 
pcmk_monitor_action=metadata pcmk_reboot_action=off pcmk_host_list="node1 node2 
node3 node4" meta provides=unfencing;


The log is in the attachment.
What prevents the stonith resource from being started if we start only some of the nodes?

Thanks




Desktop.rar
Description: Desktop.rar


[ClusterLabs] Re: Could not start only one node in pacemaker

2018-05-01 Thread
Andrei,

We set "pcs property set no-quorum-policy=freeze;" If we want to keep this 
"freeze" value, could you please tell what quorum parameter we should set?

Thanks


-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Andrei Borzenkov
Sent: May 2, 2018 12:20
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Could not start only one node in pacemaker

On 02.05.2018 05:52, 范国腾 wrote:
> Hi,
> The cluster has three nodes: one is master and two are slave. Now we run “pcs 
> cluster stop --all” to stop all of the nodes. Then we run “pcs cluster start” 
> in the master node. We find it not able to started. The cause is that the 
> stonith resource could not be started so all of the other resource could not 
> be started.
> 
> We test this case in two cluster system and the result is same:
> 
> l  If we start all of the three nodes, the stonith resource could be started. 
> If we stop one node after it starts, the stonith resource could be migrated 
> to another node and the cluster still work.
> 
> l  If we start only one or only two nodes, the stonith resource could not be 
> started.
> 
> 
> (1)   We create the stonith resource using this method in one system:
> pcs stonith create ipmi_node1 fence_ipmilan ipaddr="192.168.100.202" 
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node1"
> pcs stonith create ipmi_node2 fence_ipmilan ipaddr="192.168.100.203" 
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node2"
> pcs stonith create ipmi_node3 fence_ipmilan ipaddr="192.168.100.204" 
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node3"
> 
> 
> (2)   We create the stonith resource using this method in another system:
> 
> pcs stonith create scsi-stonith-device fence_scsi 
> devices=/dev/mapper/fence pcmk_monitor_action=metadata 
> pcmk_reboot_action=off pcmk_host_list="node1 node2 node3 node4" meta 
> provides=unfencing;
> 
> 
> The log is in the attachment.
> What prevents the stonith resource to be started if we only started part of 
> the nodes?

It says quite clearly

May  1 22:02:09 node3 pengine[17997]:  notice: Cannot fence unclean nodes until 
quorum is attained (or no-quorum-policy is set to ignore) 
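
In other words, with a three-node cluster at least two nodes must be up before the partition has quorum, so a single started node stays blocked. A quick way to check this state (a sketch, not part of the original reply):

corosync-quorumtool -s               # shows quorate state, expected votes and total votes
pcs property show no-quorum-policy   # shows the current policy (freeze in this setup)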


[ClusterLabs] Re: the PAF switchover does not happen if the VIP resource is stopped

2018-04-26 Thread
1. There is no failure in the initial status; sds1 is master.

[screenshot omitted]

2. ifdown the sds1 VIP network card.

[screenshot omitted]

3. ifup the sds1 VIP network card, then ifdown the sds2 VIP network card.

[screenshot omitted]





-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 26, 2018 15:07
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; 李梦怡 <limen...@highgo.com>
Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped

On Thu, 26 Apr 2018 02:53:33 +
范国腾 <fanguot...@highgo.com> wrote:

> Hi Rorthais,
>
> Thank you for your help.
>
> The replication works at that time.
>
> I try again today.
> (1) If I run "ifup enp0s3" in node2, then run "ifdown enp0s3" in node1,
> the switchover issue can be reproduced.
> (2) But if I run "ifup enp0s3" in node2, run "pcs resource cleanup mastergroup"
> to clean the VIP resource, and there is no Failed Actions in "pcs status",
> then run "ifdown enp0s3" in node1, it works. The switchover happens again.
>
> Is there any parameter to control this behavior so that I don't need
> to execute the "pcs cleanup" command every time?

Check the failcounts for each resource on each node (pcs resource failcount [...]).

Check the scores as well (crm_simulate -sL).

> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: April 25, 2018 18:39
> To: 范国腾 <fanguot...@highgo.com>
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>; 李梦怡 <limen...@highgo.com>
> Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP
> resource is stopped
>
> On Wed, 25 Apr 2018 08:58:34 +
> 范国腾 <fanguot...@highgo.com> wrote:
>
> > Our lab has two resources: (1) PAF (master/slave) (2) VIP (bound to the
> > master PAF node). The configuration is in the attachment.
> >
> > Each node has two network cards: one (enp0s8) is for the pacemaker
> > heartbeat on the internal network, the other (enp0s3) is for the master
> > VIP on the external network.
> >
> > We are testing the following case: if the master VIP network card is
> > down, the master postgres and VIP should switch to another node.
> >
> > 1. At first, node2 is master. I run "ifdown enp0s3" in node2, then
> > node1 becomes the master; that is ok.
> >
> > 2. Then I run "ifup enp0s3" in node2, wait for 60 seconds,
>
> Did you check PostgreSQL instances were replicating again?
>
> > then run "ifdown enp0s3" in node1, but node1 is still master.
> > Why doesn't the switchover happen? How to recover to make the system work?


[ClusterLabs] Re: the PAF switchover does not happen if the VIP resource is stopped

2018-04-25 Thread
Hi Rorthais,

Thank you for your help. 

The replication works at that time. 

I try again today.
(1) If I run "ifup enp0s3" in node2, then run "ifdown enp0s3" in node1, the switchover issue can be reproduced.
(2) But if I run "ifup enp0s3" in node2, run "pcs resource cleanup mastergroup" to clean the VIP resource, and there is no Failed Actions in "pcs status", then run "ifdown enp0s3" in node1, it works. The switchover happens again.


Is there any parameter to control this behavior so that I don't need to execute the "pcs cleanup" command every time?

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 25, 2018 18:39
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; 李梦怡 <limen...@highgo.com>
Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped


On Wed, 25 Apr 2018 08:58:34 +
范国腾 <fanguot...@highgo.com> wrote:

> 
> Our lab has two resources: (1) PAF (master/slave) (2) VIP (bound to the
> master PAF node). The configuration is in the attachment.
> 
> Each node has two network cards: one (enp0s8) is for the pacemaker
> heartbeat on the internal network, the other (enp0s3) is for the master VIP
> on the external network.
> 
> We are testing the following case: if the master VIP network card is
> down, the master postgres and VIP should switch to another node.
> 
> 1. At first, node2 is master. I run "ifdown enp0s3" in node2, then
> node1 becomes the master; that is ok.
> 
> 2. Then I run "ifup enp0s3" in node2, wait for 60 seconds,

Did you check PostgreSQL instances were replicating again?

> then run "ifdown enp0s3" in node1, but node1 is still master. Why
> doesn't the switchover happen? How to recover to make the system work?


info.rar
Description: info.rar


[ClusterLabs] Re: the PAF switchover does not happen if the VIP resource is stopped

2018-04-26 Thread

Does it mean that if a node has ever had a resource failure, it can no longer be promoted to master unless I run pcs cleanup to clear the failcount?

I am testing the case where the VIP resource goes down for some reason and the cluster should still keep working. So I only ifdown the VIP network card (enp0s3), not the heartbeat network card (enp0s8).
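
As the reply below points out, the blocked promotion can be traced through the failcounts and the placement scores; a short checklist (resource names here are just illustrative):

pcs resource failcount show pgsqld   # per-node failure counters
crm_simulate -sL                     # placement scores, look for -INFINITY entries
pcs resource cleanup mastergroup     # clear the failures once the cause is fixed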

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 26, 2018 16:02
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; 李梦怡 <limen...@highgo.com>
Subject: Re: [ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped

On Thu, 26 Apr 2018 07:53:07 +
范国腾 <fanguot...@highgo.com> wrote:

> 1. There is no failure in initial status. sds1 is master
> 
> [screenshot omitted]

yes.

> 2. ifdown the sds1 VIP network card.
> 
> [screenshot omitted]

ok, failcount and -inf score appears.

> 3. ifup the sds1 VIP network card and then ifdown sds2 VIP network card
> 
> [screenshot omitted]

Now failcount and -inf score everywhere.

I'm not sure I understand your mail, do you have a question ?



-- 
Jehan-Guillaume de Rorthais
Dalibo


[ClusterLabs] Re: Antw: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread
Ulrich,

Thank you very much for the help. When we run the performance test, our application (pgsql-ha) starts more than 500 processes to handle the client requests. Could that be the cause of this issue?

Is there any workaround or way to keep pacemaker from restarting the resource in this situation? Right now the system cannot work when the client sends a high call load, and we cannot control the client's behavior.

Thanks


-----Original Message-----
From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
Sent: January 10, 2018 18:20
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: pacemaker reports monitor timeout while CPU is high

Hi!

I can only speak for myself: in former times with HP-UX we had severe performance problems when the load was in the range of 8 to 14 (I/O waits not included, averaged over all logical CPUs), while on Linux we only get problems with a load above 40 or so (I/O included, summed over all logical CPUs, which are 24 here). Also, I/O waits cause cluster timeouts before CPU load actually matters (for us).
So with a load above 400 (not knowing your number of CPUs) it should not be that unusual. What is the number of threads on your system at that time?
It might be worth the effort to bind the cluster processes to specific CPUs and keep other tasks away from those, but I don't have experience with that.
I guess the "High CPU load detected" message triggers some internal suspend in the cluster engine (assuming the cluster engine caused the high load). Of course, for "external" load that measure won't help...

Regards,
Ulrich
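
Regarding the CPU-binding idea above, on a systemd-based system one untested way to try it is a drop-in such as the following (the drop-in path and CPU numbers are placeholders):

# /etc/systemd/system/pacemaker.service.d/cpuaffinity.conf
[Service]
CPUAffinity=0 1

followed by "systemctl daemon-reload" and a restart of the service, so that the cluster daemons are pinned to CPUs the PostgreSQL backends are kept away from.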


>>> 范国腾 wrote on 10.01.2018 at 10:40 in message
<4dc98a5d9be144a78fb9a18721743...@ex01.highgo.com>:
> Hello,
> 
> This issue only appears when we run performance test and the CPU is high. 
> The cluster and log is as below. The Pacemaker will restart the Slave 
> Side pgsql-ha resource about every two minutes.
> 
> Take the following scenario for example:(when the pgsqlms RA is 
> called, we print the log “execute the command start (command)”. When 
> the command is

> returned, we print the log “execute the command stop (Command)
(result)”)
> 
> 1. We could see that pacemaker call “pgsqlms monitor” about every 15

> seconds. And it return $OCF_SUCCESS
> 
> 2. In calls monitor command again at 13:56:16, and then it reports 
> timeout error error 13:56:18. It is only 2 seconds but it reports 
> “timeout=1ms”
> 
> 3. In other logs, sometimes after 15 minutes, there is no “execute the

> command start monitor” printed and it reports timeout error directly.
> 
> Could you please tell how to debug or resolve such issue?
> 
> The log:
> 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> start

> monitor
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role start 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role stop 0 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> stop monitor 0 Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: 
> execute the command start

> monitor
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role start 
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop 0 
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command 
> stop monitor 0 Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU 
> load detected:
> 426.77
> Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command 
> start

> monitor
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000 
> process (PID

> 5606) timed out
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606 - 
> timed

> out after 1ms
> Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation for 
> pgsqld on db2: Timed Out | call=102 key=pgsqld_monitor_16000
timeout=1ms
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: 
> db2-pgsqld_monitor_16000:102 [
> /tmp:5432 - accepting connections\n ]
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Processing failed op monitor for pgsqld:0 on db2: unknown 
> error (1) Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing 
> failed op start for

> pgsqld:1 on db1: unknown error (1)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away 
> from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away 
> from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover 
> pgsqld:0#011(Slave
> db2)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Calculated transition 
> 37, saving inputs in /var/lib/pacemaker/pengine/pe-input-1251.bz2
> 
> 
> The Cluster Configuration:
> 2 nodes and 13 resources configured
> 
> Online: [ db1 db2 ]
> 
> Full list of resources:
> 
> Clone Set: dlm-clone 

[ClusterLabs] Re: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread
Thank you, Ken.

We have set the timeout to 10 seconds, but it reports a timeout after only 2 seconds, so it seems that setting higher timeouts would not help. Our application, which is managed by pacemaker, starts more than 500 processes when running the performance test. Does that affect the result? Which log could help us analyze this?

> monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: January 11, 2018 0:54
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Subject: Re: [ClusterLabs] pacemaker reports monitor timeout while CPU is high
主题: Re: [ClusterLabs] pacemaker reports monitor timeout while CPU is high

On Wed, 2018-01-10 at 09:40 +, 范国腾 wrote:
> Hello,
>  
> This issue only appears when we run performance test and the CPU is 
> high. The cluster and log is as below. The Pacemaker will restart the 
> Slave Side pgsql-ha resource about every two minutes.
>  
> Take the following scenario for example:(when the pgsqlms RA is 
> called, we print the log “execute the command start (command)”. When 
> the command is returned, we print the log “execute the command stop
> (Command) (result)”)
> 1. We could see that pacemaker call “pgsqlms monitor” about every
> 15 seconds. And it return $OCF_SUCCESS 2. In calls monitor command 
> again at 13:56:16, and then it reports timeout error error 13:56:18. 
> It is only 2 seconds but it reports “timeout=1ms”
> 3. In other logs, sometimes after 15 minutes, there is no “execute 
> the command start monitor” printed and it reports timeout error 
> directly.
>  
> Could you please tell how to debug or resolve such issue?
>  
> The log:
>  
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> start monitor Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: 
> _confirm_role start Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: 
> _confirm_role stop
> 0
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> stop monitor 0 Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: 
> execute the command start monitor Jan 10 13:55:52 sds2 
> pgsqlms(pgsqld)[5477]: INFO: _confirm_role start Jan 10 13:55:52 sds2 
> pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop
> 0
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command 
> stop monitor 0 Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU 
> load detected:
> 426.77
> Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command 
> start monitor Jan 10 13:56:18 sds2 lrmd[26093]: warning: 
> pgsqld_monitor_16000 process (PID 5606) timed out

There's something more going on than in this log snippet. Notice the process 
that timed out (5606) is not one of the processes that logged above (5240 and 
5477).

Generally, once load gets that high, it's very difficult to maintain 
responsiveness, and the expectation is that another node will fence it.
But it can often be worked around with high timeouts, and/or you can use rules 
to set higher timeouts or maintenance mode during times when high load is 
expected.

> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606
> - timed out after 1ms
> Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation 
> for pgsqld on db2: Timed Out | call=102
> key=pgsqld_monitor_16000 timeout=1ms Jan 10 13:56:18 sds2 
> crmd[26096]:  notice: db2-
> pgsqld_monitor_16000:102 [ /tmp:5432 - accepting connections\n ] Jan 
> 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Processing failed op monitor for pgsqld:0 on db2: unknown 
> error (1) Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing 
> failed op start for pgsqld:1 on db1: unknown error (1) Jan 10 13:56:19 
> sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 after 
> 100 failures (max=100) Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Forcing pgsql-ha away from db1 after 100 failures 
> (max=100) Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover 
> pgsqld:0#011(Slave db2) Jan 10 13:56:19 sds2 pengine[26095]:  notice: 
> Calculated transition 37, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-1251.bz2
>  
>  
> The Cluster Configuration:
> 2 nodes and 13 resources configured
>  
> Online: [ db1 db2 ]
>  
> Full list of resources:
>  
> Clone Set: dlm-clone [dlm]
>  Started: [ db1 db2 ]
> Clone Set: clvmd-clone [clvmd]
>  Started: [ db1 db2 ]
> ipmi_node1 (stonith:fence_ipmilan):    Started db2
> ipmi_node2 (stonith:fence_ipmilan):    Started db1 Clone Set: 
> clusterfs-clone [clusterfs]
>  Started: [ db1 db2 ]

[ClusterLabs] pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread
Hello,

This issue only appears when we run the performance test and the CPU load is high. The cluster configuration and log are below. Pacemaker restarts the slave-side pgsql-ha resource about every two minutes.

Take the following scenario for example (when the pgsqlms RA is called, we print the log "execute the command start (command)"; when the command returns, we print the log "execute the command stop (Command) (result)"):

1. We can see that pacemaker calls "pgsqlms monitor" about every 15 seconds, and it returns $OCF_SUCCESS.

2. It calls the monitor command again at 13:56:16, and then it reports a timeout error at 13:56:18. That is only 2 seconds, but it reports "timeout=1ms".

3. In other logs, sometimes after 15 minutes there is no "execute the command start monitor" printed and it reports a timeout error directly.

Could you please tell us how to debug or resolve such an issue?

The log:

Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command start 
monitor
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role start
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role stop 0
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command stop 
monitor 0
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command start 
monitor
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role start
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop 0
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command stop 
monitor 0
Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU load detected: 426.77
Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command start 
monitor
Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000 process (PID 
5606) timed out
Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606 - timed 
out after 1ms
Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation for 
pgsqld on db2: Timed Out | call=102 key=pgsqld_monitor_16000 timeout=1ms
Jan 10 13:56:18 sds2 crmd[26096]:  notice: db2-pgsqld_monitor_16000:102 [ 
/tmp:5432 - accepting connections\n ]
Jan 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph
Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op monitor for 
pgsqld:0 on db2: unknown error (1)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op start for 
pgsqld:1 on db1: unknown error (1)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 
after 100 failures (max=100)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 
after 100 failures (max=100)
Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover pgsqld:0#011(Slave db2)
Jan 10 13:56:19 sds2 pengine[26095]:  notice: Calculated transition 37, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-1251.bz2


The Cluster Configuration:
2 nodes and 13 resources configured

Online: [ db1 db2 ]

Full list of resources:

Clone Set: dlm-clone [dlm]
 Started: [ db1 db2 ]
Clone Set: clvmd-clone [clvmd]
 Started: [ db1 db2 ]
ipmi_node1 (stonith:fence_ipmilan):Started db2
ipmi_node2 (stonith:fence_ipmilan):Started db1
Clone Set: clusterfs-clone [clusterfs]
 Started: [ db1 db2 ]
 Master/Slave Set: pgsql-ha [pgsqld]
     Masters: [ db1 ]
     Slaves: [ db2 ]
Resource Group: mastergroup
 db1-vip(ocf::heartbeat:IPaddr2):   Started
 rep-vip(ocf::heartbeat:IPaddr2):   Started
Resource Group: slavegroup
 db2-vip(ocf::heartbeat:IPaddr2):   Started


pcs resource show pgsql-ha
Master: pgsql-ha
  Meta Attrs: interleave=true notify=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/local/pgsql/bin pgdata=/home/postgres/data
   Operations: start interval=0s timeout=160s (pgsqld-start-interval-0s)
   stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
   promote interval=0s timeout=130s (pgsqld-promote-interval-0s)
   demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
   monitor interval=15s role=Master timeout=10s 
(pgsqld-monitor-interval-15s)
   monitor interval=16s role=Slave timeout=10s 
(pgsqld-monitor-interval-16s)
   notify interval=0s timeout=60s (pgsqld-notify-interval-0s)


[ClusterLabs] “pcs --debug” does not work

2018-01-22 Thread
Hello,
The help of "pcs --debug" says " Print all network traffic and external 
commands run." But when I run the "pcs --debug", it still print the help 
information. How to trigger it to print the network traffic?

Thanks
Steven

 [root@db3 ~]# pcs --debug

Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.

Options:
-h, --help Display usage and exit.
-f filePerform actions on file instead of active CIB.
--debugPrint all network traffic and external commands run.
--version  Print pcs version information.
--request-timeout  Timeout for each outgoing request to another node in
   seconds. Default is 60s.

Commands:
cluster Configure cluster options and nodes.
resourceManage cluster resources.
stonith Manage fence devices.
constraint  Manage resource constraints.
propertyManage pacemaker properties.
acl Manage pacemaker access control lists.
qdevice Manage quorum device provider on the local host.
quorum  Manage cluster quorum settings.
booth   Manage booth (cluster ticket manager).
status  View cluster status.
config  View and manage cluster configuration.
pcsdManage pcs daemon.
nodeManage cluster nodes.
alert   Manage pacemaker alerts.
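
For what it is worth (this is my reading of the help text, not a confirmed answer from the list): --debug is a global modifier rather than a command of its own, so it only produces output when combined with an actual command, for example:

pcs --debug status
pcs --debug resource show pgsql-ha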



[ClusterLabs] Re: Re: pacemaker reports monitor timeout while CPU is high

2018-01-11 Thread
Thank you very much, Ken. I will set the high timeout and try.

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: January 11, 2018 23:48
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
Cc: 王亮 <wangli...@highgo.com>
Subject: Re: [ClusterLabs] Re: pacemaker reports monitor timeout while CPU is high

On Thu, 2018-01-11 at 03:50 +, 范国腾 wrote:
> Thank you, Ken.
> 
> We have set the timeout to be 10 seconds, but it reports timeout only 
> after 2 seconds. So it seems not work if I set higher timeouts.
> Our application which is managed by pacemaker will start more than
> 500 process to run when running performance test. Does it affect the 
> result? Which log could help us to analyze?
> 
> > monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-
> > interval-16s)

It's not timing out after 2 seconds. The message:

  sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command start monitor

indicates that the monitor's process ID is 5240, but the message:

  sds2 lrmd[26093]: warning: pgsqld_monitor_16000 process (PID 5606) timed out

indicates that the monitor that timed out had process ID 5606. That means that 
there were two separate monitors in progress. I'm not sure why; I wouldn't 
expect the second one to be started until after the first one had timed out. 
But it's possible with the high load that the log messages were simply written 
to the log out of order, since they were written by different processes.

I would just raise the timeout higher than 10s during the test.


[ClusterLabs] How to create the stonith resource in virtualbox

2018-02-07 Thread
Hello,

I set up the pacemaker cluster using virtualbox. There are three nodes. The OS 
is centos7, and /dev/sdb is the shared storage (the three nodes use the same disk 
file).

(1) At first, I created the stonith resource using this command: 
pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence 
pcmk_monitor_action=metadata pcmk_reboot_action=off pcmk_host_list="db7-1 db7-2 
db7-3" meta provides=unfencing;

I know the VM does not have /dev/mapper/fence. But sometimes the stonith 
resource is able to start and sometimes not. I don't know why; it is not stable.

(2) Then I used the following command to set up stonith using the shared disk 
/dev/sdb: 
pcs stonith create scsi-shooter fence_scsi 
devices=/dev/disk/by-id/ata-VBOX_HARDDISK_VBc833e6c6-af12c936 meta 
provides=unfencing 
  
But the stonith resource always stays stopped and the log shows:
Feb  7 15:45:53 db7-1 stonith-ng[8166]: warning: fence_scsi[8197] stderr: [ 
Failed: nodename or key is required ]

Could anyone tell me the correct command to set up stonith in a VM on centos? 
Is there any document introducing this that I could study?


Thanks


Here is the cluster status:
[root@db7-1 ~]# pcs status
Cluster name: cluster_pgsql
Stack: corosync
Current DC: db7-2 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Wed Feb  7 16:27:13 2018
Last change: Wed Feb  7 15:42:38 2018 by root via cibadmin on db7-1

3 nodes configured
1 resource configured

Online: [ db7-1 db7-2 db7-3 ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Stopped

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


[ClusterLabs] Re: How to create the stonith resource in virtualbox

2018-02-08 Thread
Thanks, Klaus,

The information is very helpful. I will try to study fence_vbox and 
fence_sbd.

In our test lab, we use ipmi as the stonith. But I want to set up a simulated 
environment on my laptop, so I just need the stonith resource to be in the started 
state so that I can create the dlm and clvm resources. I don't need it to really work. 
Does anybody have another suggestion?


-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Klaus Wenninger
Sent: February 9, 2018 1:11
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] How to create the stonith resource in virtualbox

On 02/08/2018 02:05 PM, Andrei Borzenkov wrote:
> On Thu, Feb 8, 2018 at 5:51 AM, 范国腾 <fanguot...@highgo.com> wrote:
>> Hello,
>>
>> I setup the pacemaker cluster using virtualbox. There are three nodes. The 
>> OS is centos7, the /dev/sdb is the shared storage(three nodes use the same 
>> disk file).
>>
>> (1) At first, I create the stonith using this command:
>> pcs stonith create scsi-stonith-device fence_scsi 
>> devices=/dev/mapper/fence pcmk_monitor_action=metadata 
>> pcmk_reboot_action=off pcmk_host_list="db7-1 db7-2 db7-3" meta 
>> provides=unfencing;
>>
>> I know the VM not have the /dev/mapper/fence. But sometimes the stonith 
>> resource able to start, sometimes not. Don't know why. It is not stable.
>>
> It probably tries to check resource and fails. State of stonith 
> resource is irrelevant for actual fencing operation (this resource is 
> only used for periodical check, not for fencing itself).
>
>> (2) Then I use the following command to setup stonith using the shared disk 
>> /dev/sdb:
>> pcs stonith create scsi-shooter fence_scsi 
>> devices=/dev/disk/by-id/ata-VBOX_HARDDISK_VBc833e6c6-af12c936 meta 
>> provides=unfencing
>>
>> But the stonith always be stopped and the log show:
>> Feb  7 15:45:53 db7-1 stonith-ng[8166]: warning: fence_scsi[8197] 
>> stderr: [ Failed: nodename or key is required ]
>>
> Well, you need to provide what is missing - your command did not 
> specify any host.
>
>> Could anyone help tell what is the correct command to setup the stonith in 
>> VM and centos? Is there any document to introduce this so that I could study 
>> it?

I personally don't have any experience setting up a pacemaker-cluster in vbox.

Thus I'm limited to giving rather general advice.

What you might have to check when using fence_scsi is whether the scsi emulation 
vbox offers lives up to the requirements of fence_scsi.
I've read about troubles in a posting back from 2015. The guy then went for 
using scsi via iSCSI.

Otherwise you could look for alternatives to fence_scsi.

One might be fence_vbox. It doesn't come with centos so far iirc but the 
upstream repo on github has it.
Fencing via the hypervisor is in general not a bad idea when it comes to 
clusters running in VMs (If you can live with the boundary conditions like 
giving certain credentials to the VMs that allow communication with the 
hypervisor.).
There was some discussion about fence_vbox on the clusterlabs-list a couple of 
months ago. iirc there had been issues with using windows as a host for vbox - 
but I guess they were fixed in the course of this discussion.

Another way of doing fencing via a shared disk is fence_sbd (available in 
centos) - although quite different from how fence_scsi is using the disk. One 
difference that might be helpful here is that it has less requirements on which 
disk-infrastructure is emulated.
On the other hand it is strongly advised for sbd in general to use a good 
watchdog device (one that brings down your machine - virtual or physical - in a 
very reliable manner). And afaik the only watchdog-device available inside a 
vbox VM is softdog that doesn't meet this requirement too well as it relies on 
the kernel running in the VM to be at least partially functional.
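For completeness, a rough sketch of that shared-disk variant (assuming the /dev/sdb disk 
from earlier in this thread; the parameter names should be checked against 
"pcs stonith describe fence_sbd"):

sbd -d /dev/sdb create                                   # initialize the sbd slots on the shared disk
pcs stonith create sbd-fence fence_sbd devices=/dev/sdb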

Sorry for not being able to help in a more specific way but I would be 
interested in which ways of fencing people are using when it comes to clusters 
based on vbox VMs myself ;-)

Regards,
Klaus
>>
>>
>> Thanks
>>
>>
>> Here is the cluster status:
>> [root@db7-1 ~]# pcs status
>> Cluster name: cluster_pgsql
>> Stack: corosync
>> Current DC: db7-2 (version 1.1.16-12.el7_4.7-94ff4df) - partition 
>> with quorum Last updated: Wed Feb  7 16:27:13 2018 Last change: Wed 
>> Feb  7 15:42:38 2018 by root via cibadmin on db7-1
>>
>> 3 nodes configured
>> 1 resource configured
>>
>> Online: [ db7-1 db7-2 db7-3 ]
>>
>> Full list of resources:
>>
>>  scsi-shooter   (stonith:fence_scsi):   Stopped
>>
>> Daemon Status:
>>   corosync: active/disabled
>>   pacemaker: active/disabled
>>   pcsd: active/enabled
>> _

[ClusterLabs] Re: Re: How to create the stonith resource in virtualbox

2018-02-10 Thread
Marek,

Thank you very much for your help. I added the “pcmk_monitor_action=metadata” option 
and the stonith works now.

Thanks


From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Marek Grac
Sent: February 9, 2018 16:38
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: How to create the stonith resource in virtualbox

Hi,

For fence_vbox, take a look at my older blog post: 
https://ox.sk/howto-fence-vbox-cdd3da374ecd

If all you need is to have fencing in a state where dlm works, and you promise 
that you will never have real data on it, there is an easy hack. It really does 
not matter which fence agent you use; all we care about is whether the 'monitor' 
action works, so add the option:

pcmk_monitor_action=metadata

It means that instead of the monitor action, you will use the 'metadata' action, 
which just prints the XML metadata and succeeds.
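Applied to the scsi-shooter device from the first mail of this thread, that would be 
something like:

pcs stonith update scsi-shooter pcmk_monitor_action=metadata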

m,

On Fri, Feb 9, 2018 at 6:33 AM, 范国腾 
<fanguot...@highgo.com<mailto:fanguot...@highgo.com>> wrote:
Thank Klaus,

The information is very helpful. I try to study the fence_vbox and the 
fence_sdb.

In our test lab, we use ipmi as the stonith. But I want to setup a simulator 
environment in my laptop. So I just need the stonith resource in start state so 
that I could create dlm and clvm resource.And I don't need it relally work. Do 
anybody have other suggestion?


-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of 
Klaus Wenninger
Sent: February 9, 2018 1:11
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] How to create the stonith resource in virtualbox

On 02/08/2018 02:05 PM, Andrei Borzenkov wrote:
> On Thu, Feb 8, 2018 at 5:51 AM, 范国腾 
> <fanguot...@highgo.com<mailto:fanguot...@highgo.com>> wrote:
>> Hello,
>>
>> I setup the pacemaker cluster using virtualbox. There are three nodes. The 
>> OS is centos7, the /dev/sdb is the shared storage(three nodes use the same 
>> disk file).
>>
>> (1) At first, I create the stonith using this command:
>> pcs stonith create scsi-stonith-device fence_scsi
>> devices=/dev/mapper/fence pcmk_monitor_action=metadata
>> pcmk_reboot_action=off pcmk_host_list="db7-1 db7-2 db7-3" meta
>> provides=unfencing;
>>
>> I know the VM not have the /dev/mapper/fence. But sometimes the stonith 
>> resource able to start, sometimes not. Don't know why. It is not stable.
>>
> It probably tries to check resource and fails. State of stonith
> resource is irrelevant for actual fencing operation (this resource is
> only used for periodical check, not for fencing itself).
>
>> (2) Then I use the following command to setup stonith using the shared disk 
>> /dev/sdb:
>> pcs stonith create scsi-shooter fence_scsi
>> devices=/dev/disk/by-id/ata-VBOX_HARDDISK_VBc833e6c6-af12c936 meta
>> provides=unfencing
>>
>> But the stonith always be stopped and the log show:
>> Feb  7 15:45:53 db7-1 stonith-ng[8166]: warning: fence_scsi[8197]
>> stderr: [ Failed: nodename or key is required ]
>>
> Well, you need to provide what is missing - your command did not
> specify any host.
>
>> Could anyone help tell what is the correct command to setup the stonith in 
>> VM and centos? Is there any document to introduce this so that I could study 
>> it?

I personally don't have any experience setting up a pacemaker-cluster in vbox.

Thus I'm limited to giving rather general advice.

What you might have to assure together with fence_scsi is if the scsi-emulation 
vbox offers lives up to the requirements of fence_scsi.
I've read about troubles in a posting back from 2015. The guy then went for 
using scsi via iSCSI.

Otherwise you could look for alternatives to fence_scsi.

One might be fence_vbox. It doesn't come with centos so far iirc but the 
upstream repo on github has it.
Fencing via the hypervisor is in general not a bad idea when it comes to 
clusters running in VMs (If you can live with the boundary conditions like 
giving certain credentials to the VMs that allow communication with the 
hypervisor.).
There was some discussion about fence_vbox on the clusterlabs-list a couple of 
months ago. iirc there had been issues with using windows as a host for vbox - 
but I guess they were fixed in the course of this discussion.

Another way of doing fencing via a shared disk is fence_sbd (available in 
centos) - although quite different from how fence_scsi is using the disk. One 
difference that might be helpful here is that it has less requirements on which 
disk-infrastructure is emulated.
On the other hand it is strongly advised for sbd in general to use a good 
watchdog device (one that brings down your machine - virtual or physical - in a 
very reliable manner). And afaik the only watchdog-device avail

[ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP

2018-02-23 Thread
Thank you, Ken,

So I could use the following command: pcs constraint colocation set 
pgsql-slave-ip1 pgsql-slave-ip2 pgsql-slave-ip3 setoptions score=-1000


-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Ken Gaillot
Sent: February 23, 2018 23:14
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: Re: How to configure to make each slave resource has 
one VIP

On Fri, 2018-02-23 at 12:45 +0000, 范国腾 wrote:
> Thank you very much, Tomas.
> This resolves my problem.
> 
> -邮件原件-
> 发件人: Users [mailto:users-boun...@clusterlabs.org] 代表 Tomas Jelinek
> 发送时间: 2018年2月23日 17:37
> 收件人: users@clusterlabs.org
> 主题: Re: [ClusterLabs] 答复: How to configure to make each slave resource 
> has one VIP
> 
> Dne 23.2.2018 v 10:16 范国腾 napsal(a):
> > Tomas,
> > 
> > Thank you very much. I do the change according to your suggestion 
> > and it works.

One thing to keep in mind: a score of -INFINITY means the IPs will
*never* run on the same node, even if one or more nodes go down. If that's what 
you want, of course, that's good. If you want the IPs to stay on different 
nodes normally, but be able to run on the same node in case of node outage, use 
a finite negative score.

> > 
> > There is a question: If there are too much nodes (e.g.  total 10 
> > slave nodes ), I need run "pcs constraint colocation add pgsql- 
> > slave-ipx with pgsql-slave-ipy -INFINITY" many times. Is there a 
> > simple command to do this?
> 
> I think colocation set does the trick:
> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2
> pgsql-slave-ip3 setoptions score=-INFINITY You may specify as many 
> resources as you need in this command.
> 
> Tomas
> 
> > 
> > Master/Slave Set: pgsql-ha [pgsqld]
> >   Masters: [ node1 ]
> >   Slaves: [ node2 node3 ]
> >   pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started
> > node1
> >   pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started
> > node3
> >   pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started
> > node2
> > 
> > Thanks
> > Steven
> > 
> > -邮件原件-
> > 发件人: Users [mailto:users-boun...@clusterlabs.org] 代表 Tomas Jelinek
> > 发送时间: 2018年2月23日 17:02
> > 收件人: users@clusterlabs.org
> > 主题: Re: [ClusterLabs] How to configure to make each slave resource 
> > has one VIP
> > 
> > Dne 23.2.2018 v 08:17 范国腾 napsal(a):
> > > Hi,
> > > 
> > > Our system manages the database (one master and multiple slave).
> > > We
> > > use one VIP for multiple Slave resources firstly.
> > > 
> > > Now I want to change the configuration that each slave resource 
> > > has a separate VIP. For example, I have 3 slave nodes and my VIP 
> > > group has
> > > 2 vip; The 2 vips binds to node1 and node2 now; When the node2 
> > > fails, the vip could move to the node3.
> > > 
> > > 
> > > I use the following command to add the VIP
> > > 
> > > /      pcs resource group add pgsql-slave-group pgsql-slave-ip1 
> > > pgsql-slave-ip2/
> > > 
> > > /      pcs constraint colocation add pgsql-slave-group with slave 
> > > pgsql-ha INFINITY/
> > > 
> > > But now the two VIPs are the same nodes:
> > > 
> > > /Master/Slave Set: pgsql-ha [pgsqld]/
> > > 
> > > / Masters: [ node1 ]/
> > > 
> > > / Slaves: [ node2 node3 ]/
> > > 
> > > /pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started 
> > > node1/
> > > 
> > > /Resource Group: pgsql-slave-group/
> > > 
> > > */ pgsql-slave-ip1    (ocf::heartbeat:IPaddr2):   Started
> > > node2/*
> > > 
> > > */ pgsql-slave-ip2    (ocf::heartbeat:IPaddr2):   Started
> > > node2/*
> > > 
> > > Could anyone tell how to configure to make each slave node has a 
> > > VIP?
> > 
> > Resources in a group always run on the same node. You want the ip 
> > resources to run on different nodes so you cannot put them into a 
> > group.
> > 
> > This will take the resources out of the group:
> > pcs resource ungroup pgsql-slave-group
> > 
> > Then you can set colocation constraints for them:
> > pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha 
> > pcs constraint colocation add pgsql-slave-ip2 with slave pgsql-ha
> > 
> > You may also need to tell pacemaker not to put both ips on the same
> > node:
> > pcs constrain

[ClusterLabs] How to configure to make each slave resource has one VIP

2018-02-22 Thread
Hi,

Our system manages the database (one master and multiple slaves). At first we used 
one VIP for all of the slave resources.
Now I want to change the configuration so that each slave resource has a separate 
VIP. For example, I have 3 slave nodes and my VIP group has 2 VIPs; the 2 VIPs 
bind to node1 and node2 now; when node2 fails, its VIP should move to 
node3.


I use the following command to add the VIP

  pcs resource group add pgsql-slave-group pgsql-slave-ip1 pgsql-slave-ip2
  pcs constraint colocation add pgsql-slave-group with slave pgsql-ha 
INFINITY

But now the two VIPs are on the same node:

Master/Slave Set: pgsql-ha [pgsqld]
 Masters: [ node1 ]
 Slaves: [ node2 node3 ]
pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node1
Resource Group: pgsql-slave-group
 pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started node2
 pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started node2

Could anyone tell me how to configure this so that each slave node has a VIP?

Thanks



[ClusterLabs] Re: Re: How to create the stonith resource in virtualbox

2018-02-26 Thread
Hi Marek and all,

I use the following command to create a stonith resource in 
virtualbox (centos7), which has no /dev/mapper/fence:
pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence 
pcmk_monitor_action=metadata pcmk_reboot_action=off pcmk_host_list="node1 
node2" meta provides=unfencing;


The stonith resource status could be “started” in 2017, and I think that is 
because I use the metadata action. But when I installed a new VM this year and created 
the stonith again using the same command, it is always in stopped status.

I tried many times and here is the current situation:

(1)   I created a cluster in 2017 in VM using the following command in CENTOS7. 
The stonith status is started until now.

(2)   I created a cluster today in VM using the following command in CENTOS7. 
The stonith status is always stopped.

(3)   I created a cluster today in VM using the following command in REDHAT7. 
The stonith status could be started.

I compared the /usr/sbin/fence_scsi file on the different nodes and it has no logic 
change.

Why does the stonith resource start in some cases but not in others? How should I debug it?
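(A side note that is not part of the original mail: two read-only commands that can 
help narrow this kind of start failure down are

pcs stonith describe fence_scsi
pcs stonith show scsi-stonith-device

the first lists the parameters the agent actually accepts, the second shows the options 
stored in the cluster configuration for this device.)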

Here is my command:
systemctl stop firewalld;chkconfig firewalld off;

yum install -y corosync pacemaker pcs gfs2-utils lvm2-cluster *scsi* 
python-clufter;

pcs cluster auth node1 node2 node3 -u hacluster;pcs cluster setup --name 
cluster_pgsql node1 node2 node3;pcs cluster start --all;pcs property set 
no-quorum-policy=freeze;pcs property set stonith-enabled=true;

pcs stonith create scsi-stonith-device fence_scsi devices=/dev/mapper/fence 
pcmk_monitor_action=metadata pcmk_reboot_action=off pcmk_host_list="node1 
node2" meta provides=unfencing;


Here is the log:
Feb 26 03:55:08 db1 crmd[2215]:  notice: Requesting fencing (on) of node db1
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: Client crmd.2215.c5d11cbe wants 
to fence (on) 'db2' with device '(any)'
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: Requesting peer fencing (on) of 
db2
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: Client crmd.2215.c5d11cbe wants 
to fence (on) 'db1' with device '(any)'
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: Requesting peer fencing (on) of 
db1
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: scsi-stonith-device can fence 
(on) db1: static-list
Feb 26 03:55:08 db1 stonith-ng[2211]:  notice: scsi-stonith-device can fence 
(on) db1: static-list
Feb 26 03:55:08 db1 fence_scsi: Failed: device "/dev/mapper/fence" does not 
exist
Feb 26 03:55:08 db1 fence_scsi: Please use '-h' for usage
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [ 
WARNING:root:Parse error: Ignoring unknown option 'port=db1' ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [  ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [ 
ERROR:root:Failed: device "/dev/mapper/fence" does not exist ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [  ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [ 
Failed: device "/dev/mapper/fence" does not exist ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [  ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [ 
ERROR:root:Please use '-h' for usage ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [  ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [ 
Please use '-h' for usage ]
Feb 26 03:55:08 db1 stonith-ng[2211]: warning: fence_scsi[9072] stderr: [  ]
Feb 26 03:59:11 db1 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" 
x-pid="627" x-info="http://www.rsyslog.com"] start
Feb 26 03:59:12 db1 rsyslogd-2027: imjournal: fscanf on state file 
`/var/lib/rsyslog/imjournal.state' failed
[try http://www.rsyslog.com/e/2027 ]

From: 范国腾
Sent: February 11, 2018 15:43
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: How to create the stonith resource in virtualbox

Marek,

Thank you very much for your help. I add the “pcmk_monitor_action=metadata”and 
the stonith could work now.

Thanks


From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Marek Grac
Sent: February 9, 2018 16:38
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: How to create the stonith resource in virtualbox

Hi,

for fence_vbox take a look at my older blogpost> 
https://ox.sk/howto-fence-vbox-cdd3da374ecd

if all you need is to have fencing in a state when dlm works and you promise 
that you will never have real data on it. There is an easy hack, it really does 
not matter which fence agent you use. All we care about is if action 'monitor' 
works, so add option>

pcmk_monitor_action=metadata

It means that instead of monitor action, you will use action 'metadata' 

[ClusterLabs] Re: How to configure to make each slave resource has one VIP

2018-02-23 Thread
Tomas,

Thank you very much. I made the change according to your suggestion and it works.

There is a question: if there are many nodes (e.g. 10 slave nodes in total), 
I need to run "pcs constraint colocation add pgsql-slave-ipx with pgsql-slave-ipy 
-INFINITY" many times. Is there a simpler command to do this?

Master/Slave Set: pgsql-ha [pgsqld]
 Masters: [ node1 ]
 Slaves: [ node2 node3 ]
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node1
 pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started node3
 pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started node2

Thanks
Steven

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Tomas Jelinek
Sent: February 23, 2018 17:02
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] How to configure to make each slave resource has one VIP

On 23.2.2018 at 08:17, 范国腾 wrote:
> Hi,
> 
> Our system manages the database (one master and multiple slave). We 
> use one VIP for multiple Slave resources firstly.
> 
> Now I want to change the configuration that each slave resource has a 
> separate VIP. For example, I have 3 slave nodes and my VIP group has 2 
> vip; The 2 vips binds to node1 and node2 now; When the node2 fails, 
> the vip could move to the node3.
> 
> 
> I use the following command to add the VIP
> 
> /      pcs resource group add pgsql-slave-group pgsql-slave-ip1 
> pgsql-slave-ip2/
> 
> /      pcs constraint colocation add pgsql-slave-group with slave 
> pgsql-ha INFINITY/
> 
> But now the two VIPs are the same nodes:
> 
> /Master/Slave Set: pgsql-ha [pgsqld]/
> 
> / Masters: [ node1 ]/
> 
> / Slaves: [ node2 node3 ]/
> 
> /pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started node1/
> 
> /Resource Group: pgsql-slave-group/
> 
> */ pgsql-slave-ip1    (ocf::heartbeat:IPaddr2):   Started 
> node2/*
> 
> */ pgsql-slave-ip2    (ocf::heartbeat:IPaddr2):   Started 
> node2/*
> 
> Could anyone tell how to configure to make each slave node has a VIP?

Resources in a group always run on the same node. You want the ip resources to 
run on different nodes so you cannot put them into a group.

This will take the resources out of the group:
pcs resource ungroup pgsql-slave-group

Then you can set colocation constraints for them:
pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha
pcs constraint colocation add pgsql-slave-ip2 with slave pgsql-ha

You may also need to tell pacemaker not to put both ips on the same node:
pcs constraint colocation add pgsql-slave-ip1 with pgsql-slave-ip2 -INFINITY


Regards,
Tomas

> 
> Thanks
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


[ClusterLabs] Re: Re: How to configure to make each slave resource has one VIP

2018-02-23 Thread
Thank you very much, Tomas. 
This resolves my problem.

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Tomas Jelinek
Sent: February 23, 2018 17:37
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Re: How to configure to make each slave resource has one 
VIP

On 23.2.2018 at 10:16, 范国腾 wrote:
> Tomas,
> 
> Thank you very much. I do the change according to your suggestion and it 
> works.
> 
> There is a question: If there are too much nodes (e.g.  total 10 slave nodes 
> ), I need run "pcs constraint colocation add pgsql-slave-ipx with 
> pgsql-slave-ipy -INFINITY" many times. Is there a simple command to do this?

I think colocation set does the trick:
pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2
pgsql-slave-ip3 setoptions score=-INFINITY You may specify as many resources as 
you need in this command.

Tomas

> 
> Master/Slave Set: pgsql-ha [pgsqld]
>   Masters: [ node1 ]
>   Slaves: [ node2 node3 ]
>   pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node1
>   pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started node3
>   pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started node2
> 
> Thanks
> Steven
> 
> -邮件原件-
> 发件人: Users [mailto:users-boun...@clusterlabs.org] 代表 Tomas Jelinek
> 发送时间: 2018年2月23日 17:02
> 收件人: users@clusterlabs.org
> 主题: Re: [ClusterLabs] How to configure to make each slave resource has 
> one VIP
> 
> Dne 23.2.2018 v 08:17 范国腾 napsal(a):
>> Hi,
>>
>> Our system manages the database (one master and multiple slave). We 
>> use one VIP for multiple Slave resources firstly.
>>
>> Now I want to change the configuration that each slave resource has a 
>> separate VIP. For example, I have 3 slave nodes and my VIP group has 
>> 2 vip; The 2 vips binds to node1 and node2 now; When the node2 fails, 
>> the vip could move to the node3.
>>
>>
>> I use the following command to add the VIP
>>
>> /      pcs resource group add pgsql-slave-group pgsql-slave-ip1 
>> pgsql-slave-ip2/
>>
>> /      pcs constraint colocation add pgsql-slave-group with slave 
>> pgsql-ha INFINITY/
>>
>> But now the two VIPs are the same nodes:
>>
>> /Master/Slave Set: pgsql-ha [pgsqld]/
>>
>> / Masters: [ node1 ]/
>>
>> / Slaves: [ node2 node3 ]/
>>
>> /pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started 
>> node1/
>>
>> /Resource Group: pgsql-slave-group/
>>
>> */ pgsql-slave-ip1    (ocf::heartbeat:IPaddr2):   Started
>> node2/*
>>
>> */ pgsql-slave-ip2    (ocf::heartbeat:IPaddr2):   Started
>> node2/*
>>
>> Could anyone tell how to configure to make each slave node has a VIP?
> 
> Resources in a group always run on the same node. You want the ip resources 
> to run on different nodes so you cannot put them into a group.
> 
> This will take the resources out of the group:
> pcs resource ungroup pgsql-slave-group
> 
> Then you can set colocation constraints for them:
> pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha pcs 
> constraint colocation add pgsql-slave-ip2 with slave pgsql-ha
> 
> You may also need to tell pacemaker not to put both ips on the same node:
> pcs constraint colocation add pgsql-slave-ip1 with pgsql-slave-ip2 
> -INFINITY
> 
> 
> Regards,
> Tomas
> 
>>
>> Thanks
>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


[ClusterLabs] Re: Pacemaker Master restarts when Slave is added to the cluster

2017-12-27 Thread
Andrei,

I set interleave=true and it does not restart any more. Thank you very 
much.
One word from you resolved a problem that had been confusing me for several days.
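For reference, a minimal sketch of how that meta attribute can be set with pcs (clone 
names taken from the status output quoted below; apply it to whichever clones the 
master/slave resource depends on):

pcs resource meta dlm-clone interleave=true
pcs resource meta clvmd-clone interleave=true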


-----Original Message-----
From: Andrei Borzenkov [mailto:arvidj...@gmail.com]
Sent: December 27, 2017 19:06
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Pacemaker Master restarts when Slave is added to the 
cluster

Usual suspect - interleave=false on clone resource.

On Wed, Dec 27, 2017 at 10:49 AM, 范国腾 <fanguot...@highgo.com> wrote:
> Hello,
>
>
>
> In my test environment, I meet one issue about the pacemaker: when a 
> new node is added in the cluster, the master node restart. This issue 
> will lead to the system out of service for a while when adding a new 
> node because there is no master node. Could you please help tell how to debug 
> such issue?
>
>
>
> I have a pacemaker master/slave cluster as below. pgsql-ha is a 
> resource. I copy the script from 
> /usr/lib/ocf/resource.d/heartbeat/Dumy and add some simple codes to make it 
> support promote/demote.
>
> Now when I run “pcs cluster stop” on db1,the db1 is stopped status and 
> db2 is still master.
>
> The problem is: when I run “pcs cluster start” on db1.The db2 status 
> changes as below: master -> slave->stop->slave->master. Why does db2 restart?
>
>
>
> CENTOS7:
>
> ==
>
> 2 nodes and 7 resources configured
>
>
>
> Online: [ db1 db2 ]
>
>
>
> Full list of resources:
>
>
>
> Clone Set: dlm-clone [dlm]
>
>  Started: [ db1 db2 ]
>
> Clone Set: clvmd-clone [clvmd]
>
>  Started: [ db1 db2 ]
>
> scsi-stonith-device(stonith:fence_scsi):   Started db2
>
> Master/Slave Set: pgsql-ha [pgsqld]
>
>  Masters: [ db2 ]
>
>  Slaves: [ db1 ]
>
>
>
> Daemon Status:
>
>   corosync: active/enabled
>
>   pacemaker: active/enabled
>
>   pcsd: active/enabled
>
> [root@db1 heartbeat]#
>
> ==
>
> /var/log/messages:
>
> Dec 27 00:52:50 db2 cib[3290]:  notice: Purged 1 peers with id=1 
> and/or
> uname=db1 from the membership cache
>
> Dec 27 00:52:51 db2 kernel: dlm: closing connection to node 1
>
> Dec 27 00:52:51 db2 corosync[3268]: [TOTEM ] A new membership
> (192.168.199.199:372) was formed. Members left: 1
>
> Dec 27 00:52:51 db2 corosync[3268]: [QUORUM] Members[1]: 2
>
> Dec 27 00:52:51 db2 corosync[3268]: [MAIN  ] Completed service 
> synchronization, ready to provide service.
>
> Dec 27 00:52:51 db2 crmd[3295]:  notice: Node db1 state is now lost
>
> Dec 27 00:52:51 db2 crmd[3295]:  notice: do_shutdown of peer db1 is 
> complete
>
> Dec 27 00:52:51 db2 pacemakerd[3289]:  notice: Node db1 state is now 
> lost
>
> Dec 27 00:52:57 db2 Doctor(pgsqld)[6671]: INFO: pgsqld monitor : 8
>
> Dec 27 00:53:12 db2 Doctor(pgsqld)[6681]: INFO: pgsqld monitor : 8
>
> Dec 27 00:53:27 db2 Doctor(pgsqld)[6746]: INFO: pgsqld monitor : 8
>
> Dec 27 00:53:33 db2 corosync[3268]: [TOTEM ] A new membership
> (192.168.199.197:376) was formed. Members joined: 1
>
> Dec 27 00:53:33 db2 corosync[3268]: [QUORUM] Members[2]: 1 2
>
> Dec 27 00:53:33 db2 corosync[3268]: [MAIN  ] Completed service 
> synchronization, ready to provide service.
>
> Dec 27 00:53:33 db2 crmd[3295]:  notice: Node db1 state is now member
>
> Dec 27 00:53:33 db2 pacemakerd[3289]:  notice: Node db1 state is now 
> member
>
> Dec 27 00:53:33 db2 crmd[3295]:  notice: do_shutdown of peer db1 is 
> complete
>
> Dec 27 00:53:33 db2 crmd[3295]:  notice: State transition S_IDLE -> 
> S_INTEGRATION
>
> Dec 27 00:53:33 db2 pengine[3294]:  notice: Calculated transition 17, 
> saving inputs in /var/lib/pacemaker/pengine/pe-input-116.bz2
>
> Dec 27 00:53:33 db2 crmd[3295]:  notice: Transition 17 (Complete=0, 
> Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Complete
>
> Dec 27 00:53:33 db2 crmd[3295]:  notice: State transition 
> S_TRANSITION_ENGINE -> S_IDLE
>
> Dec 27 00:53:33 db2 stonith-ng[3291]:  notice: Node db1 state is now 
> member
>
> Dec 27 00:53:33 db2 attrd[3293]:  notice: Node db1 state is now member
>
> Dec 27 00:53:33 db2 cib[3290]:  notice: Node db1 state is now member
>
> Dec 27 00:53:34 db2 crmd[3295]:  notice: State transition S_IDLE -> 
> S_INTEGRATION
>
> Dec 27 00:53:37 db2 crmd[3295]: warning: No reason to expect node 2 to 
> be down
>
> Dec 27 00:53:38 db2 pengine[3294]:  notice: Unfencing db1: node 
> discovery
>
> Dec 27 00:53:38 db2 pengi

[ClusterLabs] Re: Re: How to configure to make each slave resource has one VIP

2018-02-24 Thread
Hello,

If all of the slave nodes crash, none of the slave VIPs can work. 

Do we have any way to make all of the slave VIPs bind to the master node if 
there are no slave nodes left in the system?

That way the user client will not notice that the system has a problem.

Thanks

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Tomas Jelinek
Sent: February 23, 2018 17:37
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Re: How to configure to make each slave resource has one 
VIP

On 23.2.2018 at 10:16, 范国腾 wrote:
> Tomas,
> 
> Thank you very much. I do the change according to your suggestion and it 
> works.
> 
> There is a question: If there are too much nodes (e.g.  total 10 slave nodes 
> ), I need run "pcs constraint colocation add pgsql-slave-ipx with 
> pgsql-slave-ipy -INFINITY" many times. Is there a simple command to do this?

I think colocation set does the trick:
pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2
pgsql-slave-ip3 setoptions score=-INFINITY You may specify as many resources as 
you need in this command.

Tomas

> 
> Master/Slave Set: pgsql-ha [pgsqld]
>   Masters: [ node1 ]
>   Slaves: [ node2 node3 ]
>   pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node1
>   pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started node3
>   pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started node2
> 
> Thanks
> Steven
> 
> -邮件原件-
> 发件人: Users [mailto:users-boun...@clusterlabs.org] 代表 Tomas Jelinek
> 发送时间: 2018年2月23日 17:02
> 收件人: users@clusterlabs.org
> 主题: Re: [ClusterLabs] How to configure to make each slave resource has 
> one VIP
> 
> Dne 23.2.2018 v 08:17 范国腾 napsal(a):
>> Hi,
>>
>> Our system manages the database (one master and multiple slave). We 
>> use one VIP for multiple Slave resources firstly.
>>
>> Now I want to change the configuration that each slave resource has a 
>> separate VIP. For example, I have 3 slave nodes and my VIP group has 
>> 2 vip; The 2 vips binds to node1 and node2 now; When the node2 fails, 
>> the vip could move to the node3.
>>
>>
>> I use the following command to add the VIP
>>
>> /      pcs resource group add pgsql-slave-group pgsql-slave-ip1 
>> pgsql-slave-ip2/
>>
>> /      pcs constraint colocation add pgsql-slave-group with slave 
>> pgsql-ha INFINITY/
>>
>> But now the two VIPs are the same nodes:
>>
>> /Master/Slave Set: pgsql-ha [pgsqld]/
>>
>> / Masters: [ node1 ]/
>>
>> / Slaves: [ node2 node3 ]/
>>
>> /pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started 
>> node1/
>>
>> /Resource Group: pgsql-slave-group/
>>
>> */ pgsql-slave-ip1    (ocf::heartbeat:IPaddr2):   Started
>> node2/*
>>
>> */ pgsql-slave-ip2    (ocf::heartbeat:IPaddr2):   Started
>> node2/*
>>
>> Could anyone tell how to configure to make each slave node has a VIP?
> 
> Resources in a group always run on the same node. You want the ip resources 
> to run on different nodes so you cannot put them into a group.
> 
> This will take the resources out of the group:
> pcs resource ungroup pgsql-slave-group
> 
> Then you can set colocation constraints for them:
> pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha pcs 
> constraint colocation add pgsql-slave-ip2 with slave pgsql-ha
> 
> You may also need to tell pacemaker not to put both ips on the same node:
> pcs constraint colocation add pgsql-slave-ip1 with pgsql-slave-ip2 
> -INFINITY
> 
> 
> Regards,
> Tomas
> 
>>
>> Thanks
>>
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org Getting started:
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> ___
> Users mailing list: Users@clusterlabs.org 
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

[ClusterLabs] Re: Re: Re: Re: How to configure to make each slave resource has one VIP

2018-03-06 Thread
Thank you, Rorthais,

I read the link and it is very helpful.

There are some issues that I have met when I installed the cluster:
1. “pcs cluster stop” sometimes could not stop the cluster.
2. When I upgrade PAF, I just replace the pgsqlms file. When I 
upgrade postgres, I just replace /usr/local/pgsql/.
3. If the cluster does not stop normally, the pg_controldata status is not 
"SHUTDOWN", and then PAF will not start postgresql any more, so I normally 
change pgsqlms as below after installing PAF.

elsif ( $pgisready_rc == 2 ) {
# The instance is not listening.
# We check the process status using pg_ctl status and check
# if it was propertly shut down using pg_controldata.
ocf_log( 'debug', 'pgsql_monitor: instance "%s" is not listening',
$OCF_RESOURCE_INSTANCE );
return _confirm_stopped();  ### remove this line
return $OCF_NOT_RUNNING;### add this line 
}


-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: March 6, 2018 17:08
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource 
has one VIP

Hi guys,

Few month ago, I started a new chapter about this exact subject for "PAF - 
Cluster administration under CentOS" ( see:
https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html)

Please, find attach my draft.

All feedback, fix, comments and intensive tests are welcome!



[ClusterLabs] Re: Re: Re: Re: How to configure to make each slave resource has one VIP

2018-03-07 Thread
Sorry, Rorthais, yesterday I thought that the link and the attachment were the same 
document.
I have just read the attachment and it is exactly what I asked about originally.

I have two questions about the following two commands:
# pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 10
Q: Does the score 10 mean "move to the master if there is no standby 
alive"?

# pcs constraint order start pgsql-ha then start pgsql-ip-stby1 kind=Mandatory
Q: I did not set the order and have not seen any issue so far. Should I add 
this constraint? What will happen if I leave it out?

Here is what I did now:
pcs resource create pgsql-slave-ip1 ocf:heartbeat:IPaddr2 ip=192.168.199.186 
nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s;
pcs resource create pgsql-slave-ip2 ocf:heartbeat:IPaddr2 ip=192.168.199.187 
nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s;
pcs constraint colocation add pgsql-slave-ip1 with pgsql-ha
pcs constraint colocation add pgsql-slave-ip2 with pgsql-ha
pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 pgsql-master-ip 
setoptions score=-1000

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: March 7, 2018 16:29
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource 
has one VIP

On Wed, 7 Mar 2018 01:27:16 +
范国腾 <fanguot...@highgo.com> wrote:

> Thank you, Rorthais,
> 
> I read the link and it is very helpful.

Did you read the draft I attached to the email? It was the main purpose of my
answer: helping you with IP on slaves. It seems to me your mail is reporting 
different issues than the original subject.

> There are some issues that I have met when I installed the cluster.

I suppose this is another subject and we should open a new thread with the 
appropriate subject.

> 1. “pcs cluster stop” could not stop the cluster in some times.

You would have to give some more details about the context where "pcs cluster 
stop" timed out.

> 2. when I upgrade the PAF, I could just replace the pgsqlms file. When 
> I upgrade the postgres, I just replace the /usr/local/pgsql/.

I believe both actions are documented with best practices in this links I gave 
you.

> 3.  If the cluster does not stop normally, the pgcontroldata status is 
> not "SHUTDOWN",then the PAF would not start the postgresql any more, 
> so I normally change the pgsqlms as below after installing the PAF.
> [...]

This should be discussed to understand the exact context before considering 
your patch.

At a first glance, your patch seems quite dangerous as it bypass the sanity 
checks.

Please, could you start a new thread with proper subject and add extensive 
informations about this issue? You could open a new issue on PAF repository as
well: https://github.com/ClusterLabs/PAF/issues

Regards,


[ClusterLabs] The node and resource status is different when the node powers off

2018-03-15 Thread
Hello,

There are three nodes in our cluster (redhat7). When we run "reboot" on one 
node, "pcs status" shows the node status as offline and the resource status 
as Stopped. That is fine. But when we power off the node directly, the node 
status is "UNCLEAN (offline)" and the resource status is "Started (UNCLEAN)".

Why is the status different when a node is shut down in different ways? Is there 
any way to make the resource status change from "Started node1 (UNCLEAN)" 
to "Stopped" when we power off the node?


1. The normal status:
scsi-shooter   (stonith:fence_scsi):   Started node1
 Clone Set: dlm-clone [dlm]
 Started: [ node1 node2 node3 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ node1 node2 node3 ]
 Clone Set: clusterfs-clone [clusterfs]
 Started: [ node1 node2 node3 ]
 Master/Slave Set: pgsql-ha [pgsqld]
 Masters: [ node3 ]
 Slaves: [ node1 node2 ]
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node3

2. When executing "reboot" in one node:
Online: [ node2 node3 ]
OFFLINE: [ node1 ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started node2
 Clone Set: dlm-clone [dlm]
 Started: [ node2 node3 ]
 Stopped: [ node1 ]
 Clone Set: clvmd-clone [clvmd]
 Started: [ node2 node3 ]
 Stopped: [ node1 ]
 Clone Set: clusterfs-clone [clusterfs]
 Started: [ node2 node3 ]
 Stopped: [ node1 ]
 Master/Slave Set: pgsql-ha [pgsqld]
 Masters: [ node3 ]
 Slaves: [ node2 ]
 Stopped: [ node1 ]
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node3

3. When power off the node:

Node node1: UNCLEAN (offline)
Online: [ node2 node3 ]

Full list of resources:

 scsi-shooter   (stonith:fence_scsi):   Started[ node1 node2 ]
 Clone Set: dlm-clone [dlm]
 dlm(ocf::pacemaker:controld):  Started node1 (UNCLEAN)
 Started: [ node2 node3 ]
 Clone Set: clvmd-clone [clvmd]
 clvmd  (ocf::heartbeat:clvm):  Started node1 (UNCLEAN)
 Started: [ node2 node3 ]
 Clone Set: clusterfs-clone [clusterfs]
 clusterfs  (ocf::heartbeat:Filesystem):Started node1 (UNCLEAN)
 Started: [ node2 node3 ]
 Master/Slave Set: pgsql-ha [pgsqld]
 pgsqld (ocf::heartbeat:pgsqlms):   Slave node1 (UNCLEAN)
 Masters: [ node3 ]
 Slaves: [ node2 ]
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started node3





[ClusterLabs] Re: The node and resource status is different when the node powers off

2018-03-15 Thread
Thank you, Andrei and Ulrich. Yes, I am using a fake stonith now. I will test with a 
working stonith device and see if it happens again.

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Andrei Borzenkov
Sent: March 15, 2018 16:06
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Cc: 李晓飞 <lixiao...@highgo.com>; 祁华鹏 <qihuap...@highgo.com>
Subject: Re: [ClusterLabs] The node and resource status is different when the node 
powers off

On Thu, Mar 15, 2018 at 10:42 AM, 范国腾 <fanguot...@highgo.com> wrote:
> Hello,
>
> There are three nodes in our cluster (redhat7). When we run "reboot" in one 
> node, the "pcs status" show the node status is offline and the resource 
> status is Stopped. That is fine. But when we power off the node directly, the 
> node status is " UNCLEAN (offline)" and the resource status is " 
> Started(UNCLEAN) ".
>
> Why is the status different when one node shutdown in different way?  Could 
> we have any way to make the resource status change from " Started node1 
> (UNCLEAN)" to "Stopped" when we poweroff the node computer?

You must have a working stonith agent. Then, when a node unexpectedly goes away, 
the other nodes will invoke stonith, which confirms that the UNCLEAN node is down. After 
that, pacemaker will change the status to offline and will proceed with restarting 
the resources that were running on the powered-off node.
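(Not part of the original reply: if fencing really cannot be configured and you are 
absolutely certain the powered-off node is down, the UNCLEAN state can also be cleared 
by hand, at your own risk, with

pcs stonith confirm node1

but a working fence device is the proper fix.)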


[ClusterLabs] Re: Re: Re: Re: How to configure to make each slave resource has one VIP

2018-03-08 Thread

Thanks Rorthais, got it. The following commands make sure that the IP moves to 
the master if there is no standby alive:

pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 100
pcs constraint colocation add pgsql-ip-stby1 with pgsql-ha 50

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: March 8, 2018 17:41
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: Re: Re: How to configure to make each slave resource 
has one VIP

On Thu, 8 Mar 2018 01:45:43 +
范国腾 <fanguot...@highgo.com> wrote:

> Sorry, Rorthais, I have thought that the link and the attachment was 
> the same document yesterday.

No problem.

For your information, I merged the draft in the official documentation 
yesterday.

> I just read the attachment and that is exactly what I ask originally.

Excellent! Glad it could helped.

> I have two questions on the following two command:
> # pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 10
> Q: Does the score 10 means that " move to the master if there is no 
> standby alive "?

Kind of. It actually says nothing about moving to the master. It just says the 
slave IP should prefer to be located with a slave. If the slave nodes are down or in 
standby, the IP "can" move to the master as nothing forbids it.

In fact, while writing this sentence, I realize there's nothing to push the 
slave IPs onto the master if the other nodes are up but the pgsql-ha slaves are 
stopped or banned. The configuration I provided is incomplete.

1. I added the missing constraints in the doc online.
2. Notice I raised all the scores so they are higher than the stickiness.

See:
https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html#adding-ips-on-slaves-nodes
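For context (not part of the original reply), the stickiness referred to here is usually 
the cluster-wide resource default, set with something like the following, where the value 
is only an example:

pcs resource defaults resource-stickiness=10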

Sorry for this :/

> # pcs constraint order start pgsql-ha then start pgsql-ip-stby1 
> kind=Mandatory
> Q: I did not set the order and I did not find the issue until now. So 
> I add this constraint? What will happen if I miss it?

The IP address can start before PostgreSQL is up on the node. You will have 
client connections being rejected with the error "PostgreSQL is not listening on 
host [...]".

> Here is what I did now:
> pcs resource create pgsql-slave-ip1 ocf:heartbeat:IPaddr2 ip=192.168.199.186
>   nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s; pcs resource 
> create pgsql-slave-ip2 ocf:heartbeat:IPaddr2 ip=192.168.199.187
>   nic=enp3s0f0 cidr_netmask=24 op monitor interval=10s; pcs constraint 
> colocation add pgsql-slave-ip1 with pgsql-ha

It misses the score and the role. Without a role specification, it can colocate 
with the Master or a Slave with no preference.

> pcs constraint colocation add pgsql-slave-ip2 with pgsql-ha

Same, it misses the score and the role.

> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2
>   pgsql-master-ip setoptions score=-1000

The score seems too high in my opinion, compared to the other ones.

You should probably remove all the colocation constraints and try with the one 
I pushed online.
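For illustration only, a role-aware version of those two "colocation add" commands would 
look something like this (the score 100 is a placeholder; the recommended values are on 
the cookbook page linked above):

pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha 100
pcs constraint colocation add pgsql-slave-ip2 with slave pgsql-ha 100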

Regards,

> -邮件原件-
> 发件人: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> 发送时间: 2018年3月7日 16:29
> 收件人: 范国腾 <fanguot...@highgo.com>
> 抄送: Cluster Labs - All topics related to open-source clustering 
> welcomed <users@clusterlabs.org> 主题: Re: [ClusterLabs] 答复: 答复: 答复: How 
> to configure to make each slave resource has one VIP
> 
> On Wed, 7 Mar 2018 01:27:16 +
> 范国腾 <fanguot...@highgo.com> wrote:
> 
> > Thank you, Rorthais,
> > 
> > I read the link and it is very helpful.  
> 
> Did you read the draft I attached to the email? It was the main 
> purpose of my
> answer: helping you with IP on slaves. It seems to me your mail is 
> reporting different issues than the original subject.
> 
> > There are some issues that I have met when I installed the cluster.  
> 
> I suppose this is another subject and we should open a new thread with 
> the appropriate subject.
> 
> > 1. “pcs cluster stop” could not stop the cluster in some times.  
> 
> You would have to give some more details about the context where "pcs 
> cluster stop" timed out.
> 
> > 2. when I upgrade the PAF, I could just replace the pgsqlms file. 
> > When I upgrade the postgres, I just replace the /usr/local/pgsql/.
> 
> I believe both actions are documented with best practices in this 
> links I gave you.
> 
> > 3.  If the cluster does not stop normally, the pgcontroldata status 
> > is not "SHUTDOWN",then the PAF would not start the postgresql any 
> > more, so I normally change the pgsqlms as below after installing the PAF.
> > [...]
> 
> This should be discussed to understand the exac

[ClusterLabs] Re: Trouble starting up PAF cluster for first time

2018-04-06 Thread
Hi,
I am using PAF too. You can read the 
/usr/lib/ocf/resource.d/heartbeat/pgsqlms file to find out which pgsql command is 
called.

For example, pacemaker start -> pg_ctl start, pacemaker monitor -> pg_isready.

Thanks
Steven

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] on behalf of Casey & Gina
Sent: April 7, 2018 6:46
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: Re: [ClusterLabs] Trouble starting up PAF cluster for first time

It looks like the main problem was that I needed to add 
pghost="/var/run/postgresql" to the postgresql-10-main resource.  I'm not sure 
why I have to do that, but it makes things work.
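For the record, and assuming the resource name used above, such a parameter can be added 
to an existing resource with something like:

pcs resource update postgresql-10-main pghost=/var/run/postgresql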

For both this and my last E-mail to the list that was also a problem with the 
command being run to start the instance up, I'd like to understand how to 
diagnose what's happening better myself instead of resorting to guesswork.

How can I tell exactly what the command is that Pacemaker ends up calling to 
start PostgreSQL?  I don't see it in corosync.log.  If I could see exactly what 
was being tried, I could try running it by hand and determine the problem 
myself a lot more effectively.
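One option, not mentioned in the original thread, is to run the operations by hand 
through pcs (if the installed pcs version has these subcommands), which prints what the 
agent is doing:

pcs resource debug-start postgresql-10-main --full
pcs resource debug-monitor postgresql-10-main --full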

Best wishes,
--
Casey


[ClusterLabs] Re: No slave is promoted to be master

2018-04-12 Thread
Hello,

We use the following commands to create the cluster. Node2 is always the master 
when the cluster starts. Why does pacemaker not select node1 as the default 
master?
How should we configure the cluster if we want node1 to be the default master?

pcs cluster setup --name cluster_pgsql node1 node2

pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir=/usr/local/pgsql/bin \
  pgdata=/home/postgres/data op start timeout=600s op stop timeout=60s \
  op promote timeout=300s op demote timeout=120s \
  op monitor interval=15s timeout=100s role="Master" \
  op monitor interval=16s timeout=100s role="Slave" op notify timeout=60s

pcs resource master pgsql-ha pgsqld notify=true interleave=true
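
A hedged sketch of one way such a node preference is usually expressed with pcs
(the score of 50 is illustrative; whether this interacts well with the master
scores set by the resource agent is something to verify in a test cluster):

  # prefer promoting pgsql-ha on node1; an advisory score, not mandatory
  pcs constraint location pgsql-ha rule role=master score=50 \#uname eq node1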


Sometimes it reports the following error; how should we configure the cluster to avoid it?
[screenshot of the error attached]



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Re: No slave is promoted to be master

2018-04-15 Thread
Thank you, Rorthais. I see now.

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 13, 2018 17:17
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] No slave is promoted to be master

OK, I know what happened.

It seems like your standbys were not replicating when the master "crashed";
you can find tons of messages like this in the log files:

  WARNING: No secondary connected to the master
  WARNING: "db2" is not connected to the primary
  WARNING: "db3" is not connected to the primary

When a standby is not replicating, the master sets a negative master score on it
to forbid promotion there, as it is probably lagging by some
undefined amount of time.

The following command shows the scores just before the simulated master crash:

  $ crm_simulate -x pe-input-2039.bz2 -s|grep -E 'date|promotion'
  Using the original execution date of: 2018-04-11 16:23:07Z
  pgsqld:0 promotion score on db1: 1001
  pgsqld:1 promotion score on db2: -1000
  pgsqld:2 promotion score on db3: -1000

"1001" score design the master. Streaming standbies always have a positive 
master score between 1000 and 1000-N*10 where N is the number of connected 
standbies.
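
A hedged way to double-check this outside Pacemaker is to ask the primary which
standbys are actually streaming (run it as a database user allowed to connect,
e.g. postgres):

  # one row per connected standby; an empty result means nothing is replicating
  psql -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"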



On Fri, 13 Apr 2018 01:37:54 +
范国腾 <fanguot...@highgo.com> wrote:

> The log is in the attachment.
> 
> We introduced a bug in the PG code on the master node so that it cannot be
> restarted any more, in order to test the following scenario: one slave
> should be promoted when the master crashes.
> 
> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: April 12, 2018 17:39
> To: 范国腾 <fanguot...@highgo.com>
> Cc: Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] No slave is promoted to be master
> 
> Hi,
> On Thu, 12 Apr 2018 08:31:39 +
> 范国腾 <fanguot...@highgo.com> wrote:
> 
> > Thank you very much for helping to check this issue. The information is in
> > the attachment.
> > 
> > I have restarted the cluster after I sent my first email. Not sure
> > if that affects the check of the result of "crm_simulate -sL".
> 
> It does...
> 
> Could you please provide files
> from /var/lib/pacemaker/pengine/pe-input-2039.bz2 to  pe-input-2065.bz2 ?
> 
> [...]
> > Then the master is restarted and it could not start (that is OK and
> > we know the reason).
> 
> Why couldn't it start ?



--
Jehan-Guillaume de Rorthais
Dalibo
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Re: Postgres PAF setup

2018-04-24 Thread
I have met a similar issue when Postgres is not stopped normally.

You could run pg_controldata to check whether your Postgres cluster state is
"shut down" or "shut down in recovery".
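
For illustration, a minimal sketch of that check (the PGDATA path is an
example; use your own):

  # "shut down" or "shut down in recovery" means the last stop was clean
  pg_controldata /home/postgres/data | grep 'Database cluster state'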

I change the /usr/lib/ocf/resource.d/heartbeat/pgsqlms to avoid this problem:

    elsif ( $pgisready_rc == 2 ) {
        # The instance is not listening.
        # We check the process status using pg_ctl status and check
        # if it was properly shut down using pg_controldata.
        ocf_log( 'debug', 'pgsql_monitor: instance "%s" is not listening',
            $OCF_RESOURCE_INSTANCE );
        # return _confirm_stopped();   # remove this line
        return $OCF_NOT_RUNNING;
    }


-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Adrien Nayrat
Sent: April 24, 2018 16:16
To: Andrew Edenburn ; pgsql-gene...@postgresql.org;
users@clusterlabs.org
Subject: Re: [ClusterLabs] Postgres PAF setup

On 04/23/2018 08:09 PM, Andrew Edenburn wrote:
> I am having issues with my PAF setup.  I am new to Postgres and have 
> setup the cluster as seen below.
> 
> I am getting this error when trying to start my cluster resources.
> 
>  
> 
> Master/Slave Set: pgsql-ha [pgsqld]
> 
>  pgsqld (ocf::heartbeat:pgsqlms):   FAILED dcmilphlum224 
> (unmanaged)
> 
>  pgsqld (ocf::heartbeat:pgsqlms):   FAILED dcmilphlum223 
> (unmanaged)
> 
> pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started 
> dcmilphlum223
> 
>  
> 
> Failed Actions:
> 
> * pgsqld_stop_0 on dcmilphlum224 'unknown error' (1): call=239, 
> status=complete, exitreason='Unexpected state for instance "pgsqld" 
> (returned 1)',
> 
>     last-rc-change='Mon Apr 23 13:11:17 2018', queued=0ms, exec=95ms
> 
> * pgsqld_stop_0 on dcmilphlum223 'unknown error' (1): call=248, 
> status=complete, exitreason='Unexpected state for instance "pgsqld" 
> (returned 1)',
> 
>     last-rc-change='Mon Apr 23 13:11:17 2018', queued=0ms, exec=89ms
> 
>  
> 
> Cleanup and clear are not fixing any issues and I am not seeing
> anything in the logs.  Any help would be greatly appreciated.
> 
>  

Hello Andrew,

Could you enable debug logs in Pacemaker?

With CentOS you have to edit the PCMK_debug variable in /etc/sysconfig/pacemaker:

PCMK_debug=crmd,pengine,lrmd

This should give you more information in the logs. The monitor action in PAF
should report why the cluster doesn't start:
https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L1525
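
A hedged sketch of the full sequence on CentOS 7 (file locations are the usual
defaults; note that restarting pacemaker on a node will move its resources, so
do it in a maintenance window):

  # enable debug logging for the relevant daemons
  echo 'PCMK_debug=crmd,pengine,lrmd' >> /etc/sysconfig/pacemaker
  systemctl restart pacemaker

  # then watch the detailed log
  tail -f /var/log/cluster/corosync.log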

Regards,

--
Adrien NAYRAT

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] the PAF switchover does not happen if the VIP resource is stopped

2018-04-25 Thread
Hi,



Our lab has two resources: (1) PAF (master/slave), (2) a VIP (bound to the master
PAF node). The configuration is in the attachment.

Each node has two network cards: one (enp0s8) is for the pacemaker heartbeat on
the internal network, the other (enp0s3) is for the master VIP on the external
network.



We are testing the following case: if the master VIP network card goes down, the
master postgres and the VIP should switch to the other node.



1. At first, node2 is the master. I run "ifdown enp0s3" on node2, then node1 becomes
the master; that is OK.



[screenshots attached]



2. Then I run "ifup enp0s3" in node2, wait for 60 seconds, then run "ifdown 
enp0s3" in node1, but the node1 still be master. Why does switchover doesn't 
happened? How to recover to make system work?

[screenshot attached]



The log is in the attachment. Node1 reports the following warning:



Apr 25 04:49:27 node1 crmd[24678]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE

Apr 25 04:49:27 node1 pengine[24677]: warning: Processing failed op start for 
master-vip on sds2: unknown error (1)

Apr 25 04:49:27 node1 pengine[24677]: warning: Processing failed op start for 
master-vip on sds1: unknown error (1)

Apr 25 04:49:27 node1 pengine[24677]: warning: Forcing master-vip away from 
sds1 after 100 failures (max=100)

Apr 25 04:49:27 node1 pengine[24677]: warning: Forcing master-vip away from 
sds2 after 100 failures (max=100)

Apr 25 04:49:27 node1 pengine[24677]:  notice: Calculated transition 14, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-59.bz2

Apr 25 04:49:27 node1 crmd[24678]:  notice: Transition 14 (Complete=0, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-59.bz2): Complete
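
Those "Forcing master-vip away ... after ... failures" lines suggest the IP
resource has reached its migration threshold on both nodes. A hedged sketch of
how that is usually inspected and cleared (resource name taken from the log
above):

  # show the accumulated failures per node
  pcs resource failcount show master-vip

  # clear them so the resource may run again once the NIC is back up
  pcs resource cleanup master-vip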






info.rar
Description: info.rar
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Re: No slave is promoted to be master

2018-04-16 Thread
I checked the status again. It is not that it is never promoted; it is promoted
about 15 minutes after the cluster starts.

I tried this in three labs and the results are the same: the promotion happens 15
minutes after the cluster starts.

Why is there an approximately 15-minute delay every time?


Apr 16 22:08:32 node1 attrd[16618]:  notice: Node sds1 state is now member
Apr 16 22:08:32 node1 attrd[16618]:  notice: Node sds2 state is now member

..

Apr 16 22:21:36 node1 pgsqlms(pgsqld)[18230]: INFO: Execute action monitor and 
the result 0
Apr 16 22:21:52 node1 pgsqlms(pgsqld)[18257]: INFO: Execute action monitor and 
the result 0
Apr 16 22:22:09 node1 pgsqlms(pgsqld)[18296]: INFO: Execute action monitor and 
the result 0
Apr 16 22:22:25 node1 pgsqlms(pgsqld)[18315]: INFO: Execute action monitor and 
the result 0
Apr 16 22:22:41 node1 pgsqlms(pgsqld)[18343]: INFO: Execute action monitor and 
the result 0
Apr 16 22:22:57 node1 pgsqlms(pgsqld)[18362]: INFO: Execute action monitor and 
the result 0
Apr 16 22:23:13 node1 pgsqlms(pgsqld)[18402]: INFO: Execute action monitor and 
the result 0
Apr 16 22:23:29 node1 pgsqlms(pgsqld)[18421]: INFO: Execute action monitor and 
the result 0
Apr 16 22:23:45 node1 pgsqlms(pgsqld)[18449]: INFO: Execute action monitor and 
the result 0
Apr 16 22:23:57 node1 crmd[16620]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE
Apr 16 22:23:57 node1 pengine[16619]:  notice: Promote pgsqld:0#011(Slave -> 
Master sds1)
Apr 16 22:23:57 node1 pengine[16619]:  notice: Start   master-vip#011(sds1)
Apr 16 22:23:57 node1 pengine[16619]:  notice: Start   pgsql-master-ip#011(sds1)
Apr 16 22:23:57 node1 pengine[16619]:  notice: Calculated transition 1, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-18.bz2
Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating cancel operation 
pgsqld_monitor_16000 locally on sds1
Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating notify operation 
pgsqld_pre_notify_promote_0 locally on sds1
Apr 16 22:23:57 node1 crmd[16620]:  notice: Initiating notify operation 
pgsqld_pre_notify_promote_0 on sds2
Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Promoting instance on node 
"sds1"
Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Current node TL#LSN: 
4#117440512
Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18467]: INFO: Execute action notify and 
the result 0
Apr 16 22:23:58 node1 crmd[16620]:  notice: Result of notify operation for 
pgsqld on sds1: 0 (ok)
Apr 16 22:23:58 node1 crmd[16620]:  notice: Initiating promote operation 
pgsqld_promote_0 locally on sds1
Apr 16 22:23:58 node1 pgsqlms(pgsqld)[18499]: INFO: Waiting for the promote to 
complete
Apr 16 22:23:59 node1 pgsqlms(pgsqld)[18499]: INFO: Promote complete



[root@node1 ~]# crm_simulate -sL

Current cluster status:
Online: [ sds1 sds2 ]

 Master/Slave Set: pgsql-ha [pgsqld]
 Masters: [ sds1 ]
 Slaves: [ sds2 ]
 Resource Group: mastergroup
 master-vip (ocf::heartbeat:IPaddr2):   Started sds1
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started sds1

Allocation scores:
clone_color: pgsql-ha allocation score on sds1: 1
clone_color: pgsql-ha allocation score on sds2: 1
clone_color: pgsqld:0 allocation score on sds1: 1003
clone_color: pgsqld:0 allocation score on sds2: 1
clone_color: pgsqld:1 allocation score on sds1: 1
clone_color: pgsqld:1 allocation score on sds2: 1002
native_color: pgsqld:0 allocation score on sds1: 1003
native_color: pgsqld:0 allocation score on sds2: 1
native_color: pgsqld:1 allocation score on sds1: -INFINITY
native_color: pgsqld:1 allocation score on sds2: 1002
pgsqld:0 promotion score on sds1: 1002
pgsqld:1 promotion score on sds2: 1001
group_color: mastergroup allocation score on sds1: 0
group_color: mastergroup allocation score on sds2: 0
group_color: master-vip allocation score on sds1: 0
group_color: master-vip allocation score on sds2: 0
native_color: master-vip allocation score on sds1: 1003
native_color: master-vip allocation score on sds2: -INFINITY
native_color: pgsql-master-ip allocation score on sds1: 1003
native_color: pgsql-master-ip allocation score on sds2: -INFINITY

Transition Summary:
[root@node1 ~]#

You can reproduce the issue on two nodes by executing the following commands,
then running "pcs cluster stop --all" and "pcs cluster start --all".

pcs resource create pgsqld ocf:heartbeat:pgsqlms \
  bindir=/home/highgo/highgo/database/4.3.1/bin \
  pgdata=/home/highgo/highgo/database/4.3.1/data \
  op start timeout=600s op stop timeout=60s \
  op promote timeout=300s op demote timeout=120s \
  op monitor interval=10s timeout=100s role="Master" \
  op monitor interval=16s timeout=100s role="Slave" op notify timeout=60s

pcs resource master pgsql-ha pgsqld notify=true interleave=true





-----Original Message-----
From: 范国腾
Sent: April 17, 2018 10:25
To: 'Jehan-Guillaume de Rorthais' <j...@dalibo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: [ClusterLabs

[ClusterLabs] Re: No slave is promoted to be master

2018-04-17 Thread
Thank you very much, Rorthais,



I see now. I have two more questions.



1. If I change the "cluster-recheck-interval" parameter from the default 15 
minutes to 10 seconds, is there any bad impact? Could this be a workaround?
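
For reference, a hedged sketch of how that property is read and changed (values
are illustrative; a very small interval makes the policy engine run almost
constantly, so something in the range of one to a few minutes is a more common
compromise):

  pcs property show cluster-recheck-interval
  pcs property set cluster-recheck-interval=60s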



2. This issue happens only in the following configuration.

[screenshot of the first configuration attached]

But it does not happen in the following configuration. Why is the behavior
different?

[screenshot of the second configuration attached]



-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 17, 2018 17:47
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] No slave is promoted to be master



On Tue, 17 Apr 2018 04:16:38 +
范国腾 <fanguot...@highgo.com> wrote:

> I check the status again. It is not not promoted but it promoted about
> 15 minutes after the cluster starts.
>
> I try in three labs and the results are same: The promotion happens 15
> minutes after the cluster starts.
>
> Why is there about 15 minutes delay every time?



This was a bug in Pacemaker up to 1.1.17. I reported it last August
and Ken Gaillot fixed it a few days later in 1.1.18. See:

https://lists.clusterlabs.org/pipermail/developers/2017-August/001110.html
https://lists.clusterlabs.org/pipermail/developers/2017-September/001113.html



I wonder if disabling the pgsql resource before shutting down the cluster might
be a simpler and safer workaround. E.g.:

  pcs resource disable pgsql-ha --wait
  pcs cluster stop --all

and

  pcs cluster start --all
  pcs resource enable pgsql-ha



Another fix would be to force a master score on one node **if needed** using:

  crm_master -N <nodename> -r <resource name> -l forever -v 1


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] No slave is promoted to be master

2018-04-16 Thread
Hi,

We installed a new lab which only has the postgres resource and the VIP
resource. After the cluster is installed, the status is OK: one node is master
and the other is slave. Then I run "pcs cluster stop --all" to close the
cluster and "pcs cluster start --all" to start it again. All
of the pgsql instances are in slave status and they can no longer be promoted to
master, like this:

Master/Slave Set: pgsql-ha [pgsqld]
 Slaves: [ sds1 sds2 ] 


There is no error in the log, and "crm_simulate -sL" shows the following; it
seems that the scores are OK too. The detailed log and config are in the
attachment.

[root@node1 ~]# crm_simulate -sL

Current cluster status:
Online: [ sds1 sds2 ]

 Master/Slave Set: pgsql-ha [pgsqld]
 Slaves: [ sds1 sds2 ]
 Resource Group: mastergroup
 master-vip (ocf::heartbeat:IPaddr2):   Stopped
 pgsql-master-ip(ocf::heartbeat:IPaddr2):   Stopped

Allocation scores:
clone_color: pgsql-ha allocation score on sds1: 1
clone_color: pgsql-ha allocation score on sds2: 1
clone_color: pgsqld:0 allocation score on sds1: 1003
clone_color: pgsqld:0 allocation score on sds2: 1
clone_color: pgsqld:1 allocation score on sds1: 1
clone_color: pgsqld:1 allocation score on sds2: 1002
native_color: pgsqld:0 allocation score on sds1: 1003
native_color: pgsqld:0 allocation score on sds2: 1
native_color: pgsqld:1 allocation score on sds1: -INFINITY
native_color: pgsqld:1 allocation score on sds2: 1002
pgsqld:0 promotion score on sds1: 1002
pgsqld:1 promotion score on sds2: 1001
group_color: mastergroup allocation score on sds1: 0
group_color: mastergroup allocation score on sds2: 0
group_color: master-vip allocation score on sds1: 0
group_color: master-vip allocation score on sds2: 0
native_color: master-vip allocation score on sds1: 1003
native_color: master-vip allocation score on sds2: -INFINITY
native_color: pgsql-master-ip allocation score on sds1: 1003
native_color: pgsql-master-ip allocation score on sds2: -INFINITY

Transition Summary:
 * Promote pgsqld:0 (Slave -> Master sds1)
 * Start   master-vip   (sds1)
 * Start   pgsql-master-ip  (sds1)


log.rar
Description: log.rar
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] How to setup a simple master/slave cluster in two nodes without stonith resource

2018-04-02 Thread
Hello,

I want to set up a cluster on two nodes: one master and one slave. I
don’t need a fencing device because my internal network is stable. I use the
following commands to create the resources, but both nodes stay slave
and the cluster doesn’t promote either of them to master. Could you please help
check whether there is anything wrong with my configuration?

pcs property set stonith-enabled=false

pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir=/usr/local/pgsql/bin \
  pgdata=/home/postgres/data op start timeout=600s op stop timeout=60s \
  op promote timeout=300s op demote timeout=120s \
  op monitor interval=15s timeout=100s role="Master" \
  op monitor interval=16s timeout=100s role="Slave" op notify timeout=60s

pcs resource master pgsql-ha pgsqld notify=true interleave=true

The status is as below:

[root@node1 ~]# pcs status
Cluster name: cluster_pgsql
Stack: corosync
Current DC: node2-1 (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Mon Apr  2 21:51:57 2018  Last change: Mon Apr  2 
21:32:22 2018 by hacluster via crmd on node2-1

2 nodes and 3 resources configured

Online: [ node1-1 node2-1 ]

Full list of resources:

Master/Slave Set: pgsql-ha [pgsqld]
 Slaves: [ node1-1 node2-1 ]
pgsql-master-ip(ocf::heartbeat:IPaddr2):   Stopped

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

When I execute "pcs resource cleanup" on one node, one node always prints
the following warning messages in /var/log/messages, but the other node’s log
shows no error. The resource agent log (pgsqlms) shows the monitor action
returns 0, so why does the crmd log show a failure?

Apr  2 21:53:09 node2 crmd[2425]: warning: No reason to expect node 1 to be down
Apr  2 21:53:09 node2 crmd[2425]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph
Apr  2 21:53:09 node2 crmd[2425]: warning: No reason to expect node 2 to be down
Apr  2 21:53:09 node2 pengine[2424]:  notice: Start   pgsqld:0#011(node1-1)
Apr  2 21:53:09 node2 pengine[2424]:  notice: Start   pgsqld:1#011(node2-1)
Apr  2 21:53:09 node2 pengine[2424]:  notice: Calculated transition 4, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-6.bz2
Apr  2 21:53:09 node2 crmd[2425]:  notice: Initiating monitor operation 
pgsqld:0_monitor_0 on node1-1 | action 2
Apr  2 21:53:09 node2 crmd[2425]:  notice: Initiating monitor operation 
pgsqld:1_monitor_0 locally on node2-1 | action 3
Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3644]: INFO: Action is monitor
Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3644]: INFO: pgsql_monitor: monitor is a 
probe
Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3644]: INFO: pgsql_monitor: instance 
"pgsqld" is listening
Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3644]: INFO: Action result is 0
Apr  2 21:53:09 node2 crmd[2425]:  notice: Result of probe operation for pgsqld 
on node2-1: 0 (ok) | call=33 key=pgsqld_monitor_0 confirmed=true cib-update=62
Apr  2 21:53:09 node2 crmd[2425]: warning: Action 3 (pgsqld:1_monitor_0) on 
node2-1 failed (target: 7 vs. rc: 0): Error
Apr  2 21:53:09 node2 crmd[2425]:  notice: Transition aborted by operation 
pgsqld_monitor_0 'create' on node2-1: Event failed | 
magic=0:0;3:4:7:3a132f28-d8b9-4948-bb6b-736edc221664 cib=0.28.2 
source=match_graph_event:310 complete=false
Apr  2 21:53:09 node2 crmd[2425]: warning: Action 3 (pgsqld:1_monitor_0) on 
node2-1 failed (target: 7 vs. rc: 0): Error
Apr  2 21:53:09 node2 crmd[2425]: warning: Action 2 (pgsqld:0_monitor_0) on 
node1-1 failed (target: 7 vs. rc: 0): Error
Apr  2 21:53:09 node2 crmd[2425]: warning: Action 2 (pgsqld:0_monitor_0) on 
node1-1 failed (target: 7 vs. rc: 0): Error
Apr  2 21:53:09 node2 crmd[2425]:  notice: Transition 4 (Complete=4, Pending=0, 
Fired=0, Skipped=0, Incomplete=10, 
Source=/var/lib/pacemaker/pengine/pe-input-6.bz2): Complete
Apr  2 21:53:09 node2 pengine[2424]:  notice: Calculated transition 5, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-7.bz2
Apr  2 21:53:09 node2 crmd[2425]:  notice: Initiating monitor operation 
pgsqld_monitor_16000 locally on node2-1 | action 4
Apr  2 21:53:09 node2 crmd[2425]:  notice: Initiating monitor operation 
pgsqld_monitor_16000 on node1-1 | action 7
Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3663]: INFO: Action is monitor
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Re: How to setup a simple master/slave cluster in two nodes without stonith resource

2018-04-02 Thread
Yes, my resources are started and they are in slave status. So I run the "pcs resource
cleanup pgsql-ha" command. The log shows the error when I run this command.

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Andrei Borzenkov
Sent: April 3, 2018 12:00
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] How to setup a simple master/slave cluster in two nodes
without stonith resource

03.04.2018 05:07, 范国腾 wrote:
> Hello,
> 
> I want to setup a cluster in two nodes. One is master and the other is slave. 
> I don’t need the fencing device because my internal network is stable.  I use 
> the following command to create the resource, but all of the two nodes are 
> slave and cluster don’t promote it to master. Could you please help check if 
> there is anything wrong with my configuration?
> 
> pcs property set stonith-enabled=false; pcs resource create pgsqld 
> ocf:heartbeat:pgsqlms bindir=/usr/local/pgsql/bin 
> pgdata=/home/postgres/data op start timeout=600s op stop timeout=60s 
> op promote timeout=300s op demote timeout=120s op monitor interval=15s 
> timeout=100s role="Master" op monitor interval=16s timeout=100s 
> role="Slave" op notify timeout=60s;pcs resource master pgsql-ha pgsqld 
> notify=true interleave=true;
> 
> The status is as below:
> 
> [root@node1 ~]# pcs status
> Cluster name: cluster_pgsql
> Stack: corosync
> Current DC: node2-1 (version 1.1.15-11.el7-e174ec8) - partition with quorum
> Last updated: Mon Apr  2 21:51:57 2018  Last change: Mon Apr  2 
> 21:32:22 2018 by hacluster via crmd on node2-1
> 
> 2 nodes and 3 resources configured
> 
> Online: [ node1-1 node2-1 ]
> 
> Full list of resources:
> 
> Master/Slave Set: pgsql-ha [pgsqld]
>  Slaves: [ node1-1 node2-1 ]
> pgsql-master-ip(ocf::heartbeat:IPaddr2):   Stopped
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> 
> When I execute pcs resource cleanup in one node, there is always one node 
> print the following waring message in the /var/log/messages. But the other 
> nodes’ log show no error. The resource log(pgsqlms) show the monitor action 
> could return 0 but why the crmd log show failed?
> 
> Apr  2 21:53:09 node2 crmd[2425]: warning: No reason to expect node 1 
> to be down Apr  2 21:53:09 node2 crmd[2425]:  notice: State transition 
> S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph Apr  2 21:53:09 node2 crmd[2425]: warning: No 
> reason to expect node 2 to be down
> Apr  2 21:53:09 node2 pengine[2424]:  notice: Start   pgsqld:0#011(node1-1)
> Apr  2 21:53:09 node2 pengine[2424]:  notice: Start   pgsqld:1#011(node2-1)
> Apr  2 21:53:09 node2 pengine[2424]:  notice: Calculated transition 4, 
> saving inputs in /var/lib/pacemaker/pengine/pe-input-6.bz2
> Apr  2 21:53:09 node2 crmd[2425]:  notice: Initiating monitor 
> operation pgsqld:0_monitor_0 on node1-1 | action 2 Apr  2 21:53:09 
> node2 crmd[2425]:  notice: Initiating monitor operation 
> pgsqld:1_monitor_0 locally on node2-1 | action 3 Apr  2 21:53:09 node2 
> pgsqlms(pgsqld)[3644]: INFO: Action is monitor Apr  2 21:53:09 node2 
> pgsqlms(pgsqld)[3644]: INFO: pgsql_monitor: monitor is a probe Apr  2 
> 21:53:09 node2 pgsqlms(pgsqld)[3644]: INFO: pgsql_monitor: instance 
> "pgsqld" is listening Apr  2 21:53:09 node2 pgsqlms(pgsqld)[3644]: 
> INFO: Action result is 0 Apr  2 21:53:09 node2 crmd[2425]:  notice: 
> Result of probe operation for pgsqld on node2-1: 0 (ok) | call=33 
> key=pgsqld_monitor_0 confirmed=true cib-update=62 Apr  2 21:53:09 
> node2 crmd[2425]: warning: Action 3 (pgsqld:1_monitor_0) on node2-1 
> failed (target: 7 vs. rc: 0): Error Apr  2 21:53:09 node2 crmd[2425]:  
> notice: Transition aborted by operation pgsqld_monitor_0 'create' on 
> node2-1: Event failed | 
> magic=0:0;3:4:7:3a132f28-d8b9-4948-bb6b-736edc221664 cib=0.28.2 
> source=match_graph_event:310 complete=false Apr  2 21:53:09 node2 
> crmd[2425]: warning: Action 3 (pgsqld:1_monitor_0) on node2-1 failed 
> (target: 7 vs. rc: 0): Error Apr  2 21:53:09 node2 crmd[2425]: 
> warning: Action 2 (pgsqld:0_monitor_0) on node1-1 failed (target: 7 
> vs. rc: 0): Error Apr  2 21:53:09 node2 crmd[2425]: warning: Action 2 
> (pgsqld:0_monitor_0) on node1-1 failed (target: 7 vs. rc: 0): Error

Apparently your applications are already started on both nodes at the time you 
start pacemaker. Pacemaker expects resources to be in inactive state initially.
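
In that situation, a commonly suggested sequence (sketch only; the data
directory is taken from the commands earlier in this thread) is to stop
PostgreSQL outside the cluster and let Pacemaker re-probe:

  # on each node where PostgreSQL was started by hand
  pg_ctl -D /home/postgres/data stop -m fast

  # then tell Pacemaker to forget the stale probe results
  pcs resource cleanup pgsqld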

> Apr  2 21:53:09 node2 crmd[2425]:  notice: Transition 4 (Complete=4, 
> Pending=0, Fired=0, Skipped=0, Incomplete=10, 
> Source=/var/lib/pacemaker/pengine/pe-input-6.bz2): Complete Apr  2 
> 21:53:09 node2 pengine[2424]:  noti

[ClusterLabs] Re: How to setup a simple master/slave cluster in two nodes without stonith resource

2018-04-03 Thread
Rorthais,

Thank you very much for your help. I did as you suggested and the
cluster status is OK now.

I want to ask two more questions:

1. This line of code in PAF prevents the score from being set. Why does PAF require
that prev_state must be "shut down"? Could I just set the score if it is not set?

if ( $prev_state eq "shut down" and not _master_score_exists() )

2. The log shows "Transition aborted by operation pgsqld_monitor_0 'create' on
node2-1: Event failed". How can we see from this log that the score is not
set?

Thanks

-----Original Message-----
From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
Sent: April 3, 2018 21:02
To: 范国腾 <fanguot...@highgo.com>
Cc: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] How to setup a simple master/slave cluster in two nodes
without stonith resource

On Tue, 3 Apr 2018 14:41:56 +0200
Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote:

> On Tue, 3 Apr 2018 02:07:50 +
> 范国腾 <fanguot...@highgo.com> wrote:
> 
> > Hello,
> > 
> > I want to setup a cluster in two nodes. One is master and the other 
> > is slave. I don’t need the fencing device because my internal network is
> > stable.   
> 
> How stable is it really? This assumption is frequently wrong.
> 
> See: https://aphyr.com/posts/288-the-network-is-reliable

Plus, if you really don't want to set up node fencing, at least set up a watchdog:
https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html#setting-up-a-watchdog
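
A hedged sketch of what a watchdog-only (diskless SBD) setup roughly looks like
on CentOS 7, following the cookbook linked above; the device name and timeouts
are examples, and the cookbook has the full details:

  # set in /etc/sysconfig/sbd on every node (shown here as appends for brevity)
  echo 'SBD_WATCHDOG_DEV=/dev/watchdog' >> /etc/sysconfig/sbd
  echo 'SBD_WATCHDOG_TIMEOUT=5' >> /etc/sysconfig/sbd

  # enable sbd on every node, then restart the cluster stack
  systemctl enable sbd
  pcs property set stonith-watchdog-timeout=10s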
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Re: Re: Re: How to configure to make each slave resource has one VIP

2018-03-05 Thread
Thank you, Ken. Got it :)

-----Original Message-----
From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Ken Gaillot
Sent: March 6, 2018 7:18
To: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Re: Re: How to configure to make each slave resource has
one VIP

On Sun, 2018-02-25 at 02:24 +0000, 范国腾 wrote:
> Hello,
> 
> If all of the slave nodes crash, none of the slave VIPs can work.
> 
> Do we have any way to make all of the slave VIPs bind to the master 
> node if there are no slave nodes left in the system?
> 
> That way the user client will not notice that the system has a problem.
> 
> Thanks

Hi,

If you colocate all the slave IPs "with pgsql-ha" instead of "with slave 
pgsql-ha", then they can run on either master or slave nodes.

Including the master IP in the anti-colocation set will keep them apart 
normally.
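
A hedged sketch of what that could look like with pcs, using the resource names
from earlier in this thread (the colocation set keeps the master and slave IPs
apart while still letting the slave IPs fall back to the master node):

  pcs constraint colocation add pgsql-slave-ip1 with pgsql-ha INFINITY
  pcs constraint colocation add pgsql-slave-ip2 with pgsql-ha INFINITY
  pcs constraint colocation set pgsql-master-ip pgsql-slave-ip1 pgsql-slave-ip2 setoptions score=-INFINITY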

> 
> -----Original Message-----
> From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Tomas Jelinek
> Sent: February 23, 2018 17:37
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Re: How to configure to make each slave resource
> has one VIP
> 
> On 23.2.2018 at 10:16, 范国腾 wrote:
> > Tomas,
> > 
> > Thank you very much. I made the change according to your suggestion
> > and it works.
> > 
> > There is a question: if there are many nodes (e.g. 10
> > slave nodes in total), I need to run "pcs constraint colocation add
> > pgsql-slave-ipx with pgsql-slave-ipy -INFINITY" many times. Is there a
> > simpler command to do this?
> 
> I think colocation set does the trick:
> pcs constraint colocation set pgsql-slave-ip1 pgsql-slave-ip2 \
>   pgsql-slave-ip3 setoptions score=-INFINITY
> You may specify as many resources as you need in this command.
> 
> Tomas
> 
> > 
> > Master/Slave Set: pgsql-ha [pgsqld]
> >   Masters: [ node1 ]
> >   Slaves: [ node2 node3 ]
> >   pgsql-master-ip(ocf::heartbeat:IPaddr2):   Started
> > node1
> >   pgsql-slave-ip1(ocf::heartbeat:IPaddr2):   Started
> > node3
> >   pgsql-slave-ip2(ocf::heartbeat:IPaddr2):   Started
> > node2
> > 
> > Thanks
> > Steven
> > 
> > -----Original Message-----
> > From: Users [mailto:users-boun...@clusterlabs.org] On Behalf Of Tomas Jelinek
> > Sent: February 23, 2018 17:02
> > To: users@clusterlabs.org
> > Subject: Re: [ClusterLabs] How to configure to make each slave resource
> > has one VIP
> > 
> > On 23.2.2018 at 08:17, 范国腾 wrote:
> > > Hi,
> > > 
> > > Our system manages the database (one master and multiple slaves).
> > > At first we used one VIP for multiple slave resources.
> > > 
> > > Now I want to change the configuration so that each slave resource
> > > has a separate VIP. For example, I have 3 slave nodes and my VIP
> > > group has 2 VIPs; the 2 VIPs bind to node1 and node2 now; when node2
> > > fails, the VIP should move to node3.
> > > 
> > > 
> > > I use the following commands to add the VIPs:
> > > 
> > >       pcs resource group add pgsql-slave-group pgsql-slave-ip1 pgsql-slave-ip2
> > > 
> > >       pcs constraint colocation add pgsql-slave-group with slave pgsql-ha INFINITY
> > > 
> > > But now the two VIPs are on the same node:
> > > 
> > > Master/Slave Set: pgsql-ha [pgsqld]
> > >  Masters: [ node1 ]
> > >  Slaves: [ node2 node3 ]
> > > pgsql-master-ip    (ocf::heartbeat:IPaddr2):   Started node1
> > > 
> > > Resource Group: pgsql-slave-group
> > >  pgsql-slave-ip1    (ocf::heartbeat:IPaddr2):   Started node2
> > >  pgsql-slave-ip2    (ocf::heartbeat:IPaddr2):   Started node2
> > > 
> > > Could anyone tell me how to configure it so that each slave node has a
> > > VIP?
> > 
> > Resources in a group always run on the same node. You want the ip 
> > resources to run on different nodes so you cannot put them into a 
> > group.
> > 
> > This will take the resources out of the group:
> > pcs resource ungroup pgsql-slave-group
> > 
> > Then you can set colocation constraints for them:
> > pcs constraint colocation add pgsql-slave-ip1 with slave pgsql-ha 
> > pcs constraint colocation add pgsql-slave-ip2 with slave pgsql-ha
> > 
> > You may also need to tell pacemaker not to put both ips on the 

[ClusterLabs] How to make PAF use psql to login with password

2019-03-06 Thread
Hi,

We use PAF (https://dalibo.github.io/PAF/?) to manage PostgreSQL.


According to the user's requirements, we cannot use trust mode in the
pg_hba.conf file. So when running psql, it will ask for the password
and we have to type it in manually.


So the pcs status show the following error:

* pgsqld_stop_0 on node1-pri 'unknown error' (1): call=34, status=complete, 
exitreason='Unexpected state for instance "pgsqld" (returned 1)',
last-rc-change='Wed Mar  6 09:09:46 2019', queued=0ms, exec=504ms



The cause of the error is that PAF
(/usr/lib/ocf/resource.d/heartbeat/pgsqlms) will ask for the password and
we cannot pass the password to the psql command in the PAF script.


exec $PGPSQL, '--set', 'ON_ERROR_STOP=1', '-qXAtf', $tmpfile,
'-R', $RS, '-F', $FS, '--port', $pgport, '--host', 
$pghost,'--username','sysdba',

Is there any way for us to pass the password to the psql command in the PAF 
script?

We have tried adding "export PGPASSWORD=123456" to /etc/profile and it
does not work.
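
One commonly used alternative to PGPASSWORD (a sketch only, assuming the
resource agent runs as root and connects as the "sysdba" user shown above) is a
~/.pgpass file, which libpq reads automatically:

  # format: hostname:port:database:username:password
  echo '*:*:*:sysdba:123456' > /root/.pgpass
  chmod 0600 /root/.pgpass    # libpq ignores the file unless it is 0600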

thanks






___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org