Re: [ClusterLabs] pacemaker pingd with ms drbd = double masters short time when disconnected networks.

2017-12-19 Thread emmanuel segura
You need to configure the stonith and drbd stonith handler

2017-12-19 8:19 GMT+01:00 Прокопов Павел :

> Hello!
>
> pacemaker pingd with ms drbd = double masters short time when disconnected
> networks.
>
> My crm config:
>
> node 168885811: pp-pacemaker1.heliosoft.ru
> node 168885812: pp-pacemaker2.heliosoft.ru
> primitive drbd1 ocf:linbit:drbd \
> params drbd_resource=drbd1 \
> op monitor interval=60s \
> op start interval=15 timeout=240s \
> op stop interval=15 timeout=240s \
> op monitor role=Master interval=30s \
> op monitor role=Slave interval=60s
> primitive fs_drbd1 Filesystem \
> params device="/dev/drbd1" directory="/mnt/drbd1" fstype=ext4
> options=noatime
> primitive pinger ocf:pacemaker:ping \
> params host_list=10.16.4.1 multiplier=100 \
> op monitor interval=15s \
> op start interval=0 timeout=5s \
> op stop interval=0
> primitive vip IPaddr2 \
> params ip=10.16.5.227 nic=eth0 \
> op monitor interval=10s
> primitive vip2 IPaddr2 \
> params ip=10.16.254.50 nic=eth1 \
> op monitor interval=10s
> group group_master fs_drbd1 vip vip2
> ms ms_drbd1 drbd1 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
> clone pingerclone pinger \
> meta globally-unique=false
> colocation colocation_master inf: ms_drbd1:Master group_master
> location location_master_ms_drbd1 ms_drbd1 \
> rule $role=Master -inf: not_defined pingd or pingd lte 0
> order main_order Mandatory: pingerclone:start ms_drbd1:promote
> group_master:start
> property cib-bootstrap-options: \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> default-resource-stickiness=500 \
> cluster-name=pp1
>
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:48:10 2017
> Last change: Fri Dec 15 13:46:38 2017 by root via cibadmin on
> pp-pacemaker1.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker1.heliosoft.ru pp-pacemaker2.heliosoft.ru ]
>
> Active resources:
>
>  Resource Group: group_master
>  fs_drbd1(ocf::heartbeat:Filesystem):Started
> pp-pacemaker1.heliosoft.ru
>  vip(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker1.heliosoft.ru
>  vip2(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker1.heliosoft.ru
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Masters: [ pp-pacemaker1.heliosoft.ru ]
>  Slaves: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker1.heliosoft.ru pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> When I disconnect pp-pacemaker2 from all networks, I have:
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:53:15 2017
> Last change: Fri Dec 15 13:53:00 2017 by root via cibadmin on
> pp-pacemaker2.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker2.heliosoft.ru ]
> OFFLINE: [ pp-pacemaker1.heliosoft.ru ]
>
> Active resources:
>
>  Resource Group: group_master
>  fs_drbd1(ocf::heartbeat:Filesystem):Started
> pp-pacemaker2.heliosoft.ru
>  vip(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker2.heliosoft.ru
>  vip2(ocf::heartbeat:IPaddr2):Started
> pp-pacemaker2.heliosoft.ru
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Masters: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> Wait 5 seconds.
>
> root@pp-pacemaker2:~# crm_mon -1
> Stack: corosync
> Current DC: pp-pacemaker2.heliosoft.ru (version 1.1.16-94ff4df) -
> partition with quorum
> Last updated: Fri Dec 15 13:48:10 2017
> Last change: Fri Dec 15 13:46:38 2017 by root via cibadmin on
> pp-pacemaker1.heliosoft.ru
>
> 2 nodes configured
> 7 resources configured
>
> Online: [ pp-pacemaker2.heliosoft.ru ]
> OFFLINE: [ pp-pacemaker1.heliosoft.ru ]
>
> Active resources:
>
>  Master/Slave Set: ms_drbd1 [drbd1]
>  Slaves: [ pp-pacemaker2.heliosoft.ru ]
>  Clone Set: pingerclone [pinger]
>  Started: [ pp-pacemaker2.heliosoft.ru ]
> #end crm_mon
>
> Why does pp-pacemaker2 become master first? It breaks DRBD.
>
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>





[ClusterLabs] corosync.conf token configuration

2017-12-19 Thread Adrián Gómez
Hello,

I was wondering if someone has a description of the parameters token, 
token_retransmit, token_retransmits_before_loss_const, and consensus. I have 
read about them in the corosync.conf man page, but while trying out some 
configurations of the cluster I realized that I could not control when the new 
membership configuration or the stonith was going to happen. I have tried 
corosync 1.x and 2.x on several virtual servers (Debian 9).

corosync.conf:

# How long before declaring a token lost (ms)
token: 20000

# Consensus, time before token lost to stonith the server (ms)
# consensus: 6

# Interval between tokens (ms)
# token_retransmit: 1

# How many token retransmited before forming a new configuration
token_retransmits_before_loss_const: 20

I expected the token to be declared lost 20s after the processor failed (for 
example, when the connection to the servers is lost); then 
"token_retransmits_before_loss_const" should act (I don't know exactly how it 
works), and the stonith occurs 24s before the "new configuration" message 
(default consensus = 1.2 * token -> 1.2 * 20s = 24s). In brief, the cluster is 
barely 44s without connection before the reboot (stonith) is done. In contrast, 
I expected the parameter "token_retransmits_before_loss_const: 20" to delay the 
token loss while it is trying to reconnect.
Am I right?

On the other hand, if I use the consensus parameter, I can calculate exactly 
when the stonith is going to happen.
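The arithmetic above can be sketched in a few lines. This is only my reading of
the corosync 2.x behaviour, not an authoritative description: retransmits happen
inside the token window, consensus defaults to 1.2 * token, and
membership_change_delay_ms is a hypothetical helper name.

```python
# Sketch of corosync 2.x failure-detection timing (assumptions noted above).
# - token: ms before a lost token is declared
# - consensus: ms allowed to agree on a new membership; defaults to
#   1.2 * token when not set explicitly
# - token_retransmits_before_loss_const only spaces the retransmits
#   *within* the token window; it does not lengthen the window itself.
def membership_change_delay_ms(token_ms, consensus_ms=None):
    if consensus_ms is None:
        consensus_ms = int(1.2 * token_ms)
    return token_ms + consensus_ms

print(membership_change_delay_ms(20000))  # 20s token -> 44000 (about 44s)
```

That matches the roughly 44s you observed, and it also explains your last point:
an explicit consensus value replaces the 1.2 * token default, which is why
setting it lets you calculate the stonith time exactly.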

Please, if someone knows the answer I will appreciate any help.
Thank you




Re: [ClusterLabs] Add SSH as a resource on pacemaker cluster - RHEL 7.4

2017-12-19 Thread Ken Gaillot
On Tue, 2017-12-19 at 15:27 +1100, Sreenath Reddy wrote:
> Hi There,
> 
> I am trying to add SSH as a resource within pacemaker cluster running
> on RHEL 7.4 systems.
> This is a 2 node cluster (Active/Passive) with simple FTP resources
> and a cluster IP.
> 
> We have 2 SSH daemons. Admins are using a different SSH port (0)
> for remote access (ssh-admin is the service).
> 
> I want to add the "default SSH service (sshd.service) running on port 22"
> so that it is started as part of the pacemaker cluster, and this service
> will be constrained to the clusterIP (floating IP). In other words, the
> SSH service will be active only on the active node (inactive on the
> second node), and if a cluster failover happens, the SSH service will be
> started on the second node and stopped on the first node. This way SSH
> will act as a probing service which helps keep the floating IP active on
> the node which has the cluster IP assigned. Our SDN probes on port 22 and
> activates the cluster IP.
> 
> I wanted to use nginx for this probing, but the client wants to use the
> default SSH.
> 
> When I tried to add the SSH resource using the standard pcs resource
> create command, it failed with the error below:
>
> pcs resource create SSHservice ocf:heartbeat:sshd
> configfile=/etc/ssh/sshd_config op monitor interval=30s
> Error: Agent 'ocf:heartbeat:ssh' is not installed or does not
> provide valid metadata: Metadata query for ocf:heartbeat:ssh failed:
> -5, use --force to override
> 
> I would appreciate your help in resolving this issue.
> 
> Thanks in advance!
> 
> -- 
> Regards
> Sreenath
> 

Pacemaker supports several different resource types.

The OCF standard is a cluster-specific script API (similar to old init
scripts with some extensions). The standard OCF agents come with the
resource-agents package. As far as I know, there is no OCF agent for
sshd, which is why you get that message.

Pacemaker also supports OS-launched services, which is probably what
you want. For example, if you're using systemd, you can run "systemctl
disable --now sshd" on all your nodes, and add a systemd:sshd resource
to your cluster.
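A sketch of what that could look like with pcs. The resource and constraint
names here are illustrative, and "ClusterIP" stands in for whatever your
floating-IP resource is actually called:

```shell
# On every node: stop sshd and remove it from systemd's control,
# so that only Pacemaker starts it.
systemctl disable --now sshd

# Add sshd as a systemd-class resource, colocate it with the floating
# IP, and start it only after the IP is up.
pcs resource create SSHservice systemd:sshd op monitor interval=30s
pcs constraint colocation add SSHservice with ClusterIP INFINITY
pcs constraint order ClusterIP then SSHservice
```

Since the stock sshd is then disabled on all nodes, port-22 access follows the
floating IP; the separate ssh-admin daemon is what keeps you from locking
yourself out of the passive node.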
-- 
Ken Gaillot 



Re: [ClusterLabs] pacemaker pingd with ms drbd = double masters short time when disconnected networks.

2017-12-19 Thread Прокопов Павел

Stonith kills the other node, but I need something else:
a node should not be able to become master if it is not connected to the network.
I solved my problem by increasing the token parameter in corosync.conf:
the default token value is 3000; the new value is 30000.
With that, the rule (location location_master_ms_drbd1 ms_drbd1 rule 
$role=Master -inf: not_defined pingd or pingd lte 0) has time to take 
effect correctly.
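For reference, the change corresponds to a totem fragment along these lines (a
sketch only; 30000 ms is my reading of the increase described above, and the
cluster name is taken from the crm config):

```
totem {
    version: 2
    cluster_name: pp1
    # Time (ms) before declaring the token lost; raised from the
    # 3000 ms default so the pingd location rule can demote/fence
    # decisions before a disconnected node promotes itself.
    token: 30000
}
```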


Thanks to Adrián Gómez for his question, which turned out to be an 
answer to mine.



On 19.12.2017 13:00, emmanuel segura wrote:

You need to configure the stonith and drbd stonith handler
