[ClusterLabs] Cluster stops working after upgrade to 1.1.15-rc1

2016-05-24 Thread Andrey Rogovsky
Hi!
I have 3 nodes cluster in this config:

# crm configure show
node 1084754433: a \
attributes pgsql-data-status="STREAMING|ASYNC" maintenance=off standby=off
node 1084754434: b \
attributes pgsql-data-status=LATEST maintenance=off standby=off
node 1084754435: c \
attributes pgsql-data-status="STREAMING|ASYNC" maintenance=off
primitive apache apache \
params configfile="/etc/apache2/apache2.conf" port=8000 statusurl="http://localhost:8000/server-status" \
op monitor interval=2min on-fail=restart \
op monitor interval=30s on-fail=restart timeout=160s \
meta is-managed=true
primitive pgsql-master-ip IPaddr2 \
params ip=192.168.10.200 nic=peervpn0 \
op start interval=0s on-fail=restart timeout=60s \
op monitor interval=10s on-fail=restart timeout=60s \
op stop interval=0s on-fail=block timeout=60s \
meta target-role=Started
primitive pgsqld pgsqlms \
params bindir="/usr/lib/postgresql/9.4/bin" pgdata="/var/lib/postgresql/9.4/main" \
pghost="/var/run/postgresql" recovery_template="/etc/postgresql/9.4/main/recovery.conf.pcmk" \
start_opts="-c config_file=/etc/postgresql/9.4/main/postgresql.conf" primary_node=a \
op start interval=0 on-fail=restart timeout=60s \
op monitor interval=40s on-fail=restart timeout=60s \
op promote interval=0s on-fail=restart timeout=60s \
op demote interval=0s on-fail=stop timeout=120s \
op monitor interval=30s on-fail=restart role=Master timeout=60s \
op stop interval=0 on-fail=block timeout=60s \
op notify interval=0s timeout=60s
group master pgsql-master-ip
ms msPostgresql pgsqld \
meta master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
target-role=Master notify=true is-managed=true
clone WebFarm apache \
meta target-role=Started
location pgsql-prefers-a msPostgresql role=Master 1000: a
colocation set_ip inf: master msPostgresql:Master
order ip_down 0: msPostgresql:demote master:stop symmetrical=false
order ip_up 0: msPostgresql:promote master:start symmetrical=false
property cib-bootstrap-options: \
dc-version=1.1.15-88ac26d \
cluster-infrastructure=corosync \
expected-quorum-votes=3 \
stonith-enabled=false \
crmd-transition-delay=0 \
last-lrm-refresh=1461400181 \
dc-deadtime=2min \
default-action-timeout=2min \
have-watchdog=true
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=10

So, I have the master IP 192.168.10.200, which should be assigned to the new
msPostgresql master.
Everything worked fine until I updated the software to 1.1.15-rc1.
Now the IP 192.168.10.200 is no longer assigned to the new msPostgresql master:


May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
lsn_location ()
May 24 10:25:45 b attrd[1798]:   notice: Sent delete 914043:
node=1084754434, attr=lsn_location, id=, set=(null), section=status
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
nodes ()
May 24 10:25:45 b attrd[1798]:   notice: Sent delete 914046:
node=1084754434, attr=nodes, id=, set=(null), section=status
May 24 10:25:45 b pgsqlms(pgsqld)[2533]: INFO: pgsql_notify: current node
LSN: 37B/876EC668
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
lsn_location (37B/876EC668)
May 24 10:25:45 b attrd[1798]:   notice: Sent update 914050:
lsn_location=37B/876EC668
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
nodes (c b)
May 24 10:25:45 b attrd[1798]:   notice: Sent update 914052: nodes=c b
May 24 10:25:45 b crmd[1800]:   notice: Operation pgsqld_notify_0: ok
(node=b, call=68635, rc=0, cib-update=0, confirmed=true)
May 24 10:25:45 b pgsqlms(pgsqld)[2554]: INFO: pgsql_notify: promoting
instance on node "b"
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
lsn_location ()
May 24 10:25:45 b attrd[1798]:   notice: Sent delete 914056:
node=1084754434, attr=lsn_location, id=, set=(null), section=status
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
nodes ()
May 24 10:25:45 b attrd[1798]:   notice: Sent delete 914060:
node=1084754434, attr=nodes, id=, set=(null), section=status
May 24 10:25:45 b pgsqlms(pgsqld)[2554]: INFO: pgsql_notify: current node
LSN: 37B/876EC668
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
lsn_location (37B/876EC668)
May 24 10:25:45 b attrd[1798]:   notice: Sent update 914063:
lsn_location=37B/876EC668
May 24 10:25:45 b attrd[1798]:   notice: Sending flush op to all hosts for:
nodes (b c)
May 24 10:25:45 b attrd[1798]:   notice: Sent update 914065: nodes=b c
May 24 10:25:45 b crmd[1800]:   notice: Operation pgsqld_notify_0: ok
(node=b, call=68636, rc=0, cib-update=0, confirmed=true)
May 24 10:25:45 b postgres[2280]: [2-1] 2016-05-24 10:25:45 MSK FATAL:
 could not connect to the primary server: could not connect to server: No
route to host
May 24 10:25:45 b postgres[2280]: [2-2] #011#011Is the server running on
host "192.168.10.200" and accepting
May 24 10:25:45 b postgres[2280]: [2-3] #011#011TCP/IP connections on port
5432?
May 24 10:25:45 b postgres[2280]: [2-4]
May 24 10:25:45 b pgsqlms(pgsqld)[

Re: [ClusterLabs] Antw: Using pacemaker for manual failover only?

2016-05-24 Thread Jehan-Guillaume de Rorthais
On Tue, 24 May 2016 07:49:16 +0200,
"Ulrich Windl"  wrote:

> >>> "Stephano-Shachter, Dylan"  schrieb am
> >>> 23.05.2016 um
> 21:03 in Nachricht
> :
> 
> [...]
>  I would like for the cluster to do nothing when a node fails unexpectedly.
> [...]
> So this means you only want the cluster to do something if the node fails as
> part of planned maintenance? Then you need no cluster at all! (MHO)

I can see the use case for this. I have already faced situations where customers
wanted a one-step procedure to fail over manually to the other side. I call this
the big-red-button failover.

Producing your own custom shell script to fail over a resource is actually really
complex. There are so many ways it could fail and only one way to do it properly.
And of course, no two architectures are the same, so we quickly end up with
custom scripts everywhere. And we are not even speaking of fencing yet...

Being able to set up and test a cluster to deal with all the machinery needed to
move your resources correctly is much more comfortable and safe.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Using pacemaker for manual failover only?

2016-05-24 Thread Jehan-Guillaume de Rorthais
On Tue, 24 May 2016 01:53:22 -0400,
Digimer  wrote:

> On 23/05/16 03:03 PM, Stephano-Shachter, Dylan wrote:
> > Hello,
> > 
> > I am using pacemaker 1.1.14 with pcs 0.9.149. I have successfully
> > configured pacemaker for highly available nfs with drbd. Pacemaker
> > allows me to easily failover without interrupting nfs connections. I,
> > however, am only interested in failing over manually (currently I use
> > "pcs resource move   --master"). I would like for
> > the cluster to do nothing when a node fails unexpectedly.
> > 
> > Right now the solution I am going with is to run 
> > "pcs property set is-managed-default=no"
> > until I need to failover, at which point I set is-managed-default=yes,
> > then failover, then set it back to no.
> > 
> > While this method works for me, it can be unpredictable if people run
> > move commands at the wrong time.
> > 
> > Is there a way to disable automatic failover permanently while still
> > allowing manual failover (with "pcs resource move" or with something else)?

Try to set up your cluster without the "interval" parameter on the monitor
action? The resource will be probed during the target-action (start/promote I
suppose), but then it should not get monitored anymore.
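
A rough sketch of what that could look like in crm shell syntax (the resource
name, agent and parameter values below are only placeholders, untested): since
no recurring "op monitor interval=..." is defined, the resource is probed once
but never re-monitored afterwards.

  primitive p_nfsserver ocf:heartbeat:nfsserver \
    params nfs_shared_infodir="/srv/nfs/info" \
    op start timeout=90s \
    op stop timeout=120s

Note that pcs may add a default monitor operation when creating a resource, so
check the resulting operations with "pcs resource show <resource id>".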



Re: [ClusterLabs] Using pacemaker for manual failover only?

2016-05-24 Thread Klaus Wenninger
On 05/24/2016 09:50 AM, Jehan-Guillaume de Rorthais wrote:
> On Tue, 24 May 2016 01:53:22 -0400,
> Digimer  wrote:
>
>> On 23/05/16 03:03 PM, Stephano-Shachter, Dylan wrote:
>>> Hello,
>>>
>>> I am using pacemaker 1.1.14 with pcs 0.9.149. I have successfully
>>> configured pacemaker for highly available nfs with drbd. Pacemaker
>>> allows me to easily failover without interrupting nfs connections. I,
>>> however, am only interested in failing over manually (currently I use
>>> "pcs resource move   --master"). I would like for
>>> the cluster to do nothing when a node fails unexpectedly.
>>>
>>> Right now the solution I am going with is to run 
>>> "pcs property set is-managed-default=no"
>>> until I need to failover, at which point I set is-managed-default=yes,
>>> then failover, then set it back to no.
>>>
>>> While this method works for me, it can be unpredictable if people run
>>> move commands at the wrong time.
>>>
>>> Is there a way to disable automatic failover permanently while still
>>> allowing manual failover (with "pcs resource move" or with something else)?
> Try to set up your cluster without the "interval" parameter on the monitor
> action? The resource will be probed during the target-action (start/promote I
> suppose), but then it should not get monitored anymore.

Ignoring the general cluster yes/no question, a simple solution would
be to bind the master role to a node attribute that you move around
manually.
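
One rough way to sketch that (the attribute, constraint and resource names
below are invented, untested): a location rule prefers the Master role on
whichever node carries a given node attribute, and you move that attribute by
hand with crm_attribute when you want to fail over.

  # prefer the Master role where the "manual-master" node attribute is set
  location master-follows-flag ms-drbd0 \
    rule $role=Master 1000: manual-master eq 1

  # manual failover to node2: set the attribute there, remove it from node1
  crm_attribute --node node2 --name manual-master --update 1
  crm_attribute --node node1 --name manual-master --delete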





Re: [ClusterLabs] Using pacemaker for manual failover only?

2016-05-24 Thread Ken Gaillot
On 05/24/2016 04:13 AM, Klaus Wenninger wrote:
> On 05/24/2016 09:50 AM, Jehan-Guillaume de Rorthais wrote:
>> On Tue, 24 May 2016 01:53:22 -0400,
>> Digimer  wrote:
>>
>>> On 23/05/16 03:03 PM, Stephano-Shachter, Dylan wrote:
 Hello,

 I am using pacemaker 1.1.14 with pcs 0.9.149. I have successfully
 configured pacemaker for highly available nfs with drbd. Pacemaker
 allows me to easily failover without interrupting nfs connections. I,
 however, am only interested in failing over manually (currently I use
 "pcs resource move   --master"). I would like for
 the cluster to do nothing when a node fails unexpectedly.

 Right now the solution I am going with is to run 
 "pcs property set is-managed-default=no"
 until I need to failover, at which point I set is-managed-default=yes,
 then failover, then set it back to no.

 While this method works for me, it can be unpredictable if people run
 move commands at the wrong time.

 Is there a way to disable automatic failover permanently while still
 allowing manual failover (with "pcs resource move" or with something else)?
>> Try to set up your cluster without the "interval" parameter on the monitor
>> action? The resource will be probed during the target-action (start/promote I
>> suppose), but then it should not get monitored anymore.
> 
> Ignoring the general cluster yes/no question, a simple solution would
> be to bind the master role to a node attribute that you move around
> manually.

This is the right track. There are a number of ways you could do it, but
the basic idea is to use constraints to only allow the resources to run
on one node. When you want to fail over, flip the constraints.

I'd colocate everything with one (most basic) resource, so then all you
need is one constraint for that resource to flip. It could be as simple
as a -INFINITY location constraint on the node you don't want to run on.
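
As a rough pcs sketch (resource and node names here are made up, untested):

  # colocate everything with one base resource, here the floating IP
  pcs constraint colocation add nfs-server with virtual-ip INFINITY

  # keep the base resource off the standby node
  # (pcs resource ban creates a -INFINITY location constraint)
  pcs resource ban virtual-ip node2

  # manual failover: lift the old ban and ban the node you are leaving
  pcs resource clear virtual-ip node2
  pcs resource ban virtual-ip node1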



Re: [ClusterLabs] cluster stops randomly

2016-05-24 Thread H Yavari
Hi,
Thanks for the reply. Sure, I'll send the logs next time.
The issue: for example, the cluster runs fine for 2 days with all nodes active
and online, but then at some random point, when I check the cluster status on
one of the nodes, I notice that the cluster is stopped, and it is the same on
all the other nodes. So I have to run "pcs cluster start --all".
Regards, H.Yavari


  From: Jan Pokorný 
 To: users@clusterlabs.org 
On 21/05/16 04:46 +, H Yavari wrote:
> I have a cluster and it works well, but I see that sometimes the cluster is
> stopped on all nodes and I have to start it manually. The pcsd service is
> running but the cluster is stopped. I looked at the pacemaker log but I
> couldn't find any warning or error. What is the issue?
> (stonith is disabled.)

- stonith disabled / fencing not set up means high risk rather than high
  availability in the majority of cases

- is "the cluster was started and then stopped inadvertently" what you mean?

- please provide the part of the log around the moment the cluster ceased
  to work properly, plus the cluster's configuration (we are not good at
  telepathic remote access yet)

-- 
Jan (Poki)

