Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Gao,Yan

On 11/30/2017 01:41 PM, Ulrich Windl wrote:




"Gao,Yan"  schrieb am 30.11.2017 um 11:48 in Nachricht

:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker


SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually the host is not
declared OFFLINE until 120s have passed). But the VM reboots lightning fast
and is up and running long before the timeout expires.


As msgwait was intended for the message to arrive, and not for the reboot time (I guess), 

The msgwait timer on the sender starts only after a successful write.
The recipient will either eat the pill or be killed by its watchdog within
the watchdog timeout. As mentioned in the sbd man page, msgwait should be
twice the watchdog timeout, so that the sender can safely assume the target
is dead once the msgwait timer pops.
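
For reference, a rough sketch of how these two timeouts end up on the shared
device (the device path is only an example, and 'create' re-initializes the
SBD metadata, so treat this as illustration rather than a recipe):

  # show the timeouts currently stored in the SBD header
  sbd -d /dev/disk/by-id/my-shared-vmdk dump

  # (re)initialize with watchdog=60s and msgwait=120s, i.e. msgwait = 2 x watchdog
  sbd -d /dev/disk/by-id/my-shared-vmdk -1 60 -4 120 create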


Regards,
  Yan



this just shows a fundamental problem in SBD design: receipt of the fencing
command is not confirmed (other than by seeing the consequences of its
execution).

So the fencing node will see the other host is down (on the network), but it
won't believe it until SBD msgwait is over. OTOH, if your msgwait is very low
and the storage has a problem (exceeding msgwait), the node will assume a
successful fencing when in fact the fencing didn't complete.

So maybe there should be two timeouts: one for the command to be delivered
(no confirmation needed, but a confirmation could shorten the wait), and
another for executing the command (how long it takes from receipt of the
command until the host is definitely down). Again, a confirmation could stop
the wait before the timeout is reached.

Regards,
Ulrich




I think I have seen similar report already. Is it something that can
be fixed by SBD/pacemaker tuning?

SBD_DELAY_START=yes in /etc/sysconfig/sbd is the solution.
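
For clarity, the relevant bit of /etc/sysconfig/sbd looks like this (minimal
excerpt; the rest of the file is left as shipped):

  # /etc/sysconfig/sbd (excerpt)
  # Delay the sbd/cluster start after boot, so a freshly fenced node that
  # reboots quickly cannot rejoin before the fencer's msgwait has expired.
  SBD_DELAY_START=yes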

Regards,
Yan



I can provide full logs tomorrow if needed.

TIA

-andrei

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Gao,Yan

On 11/30/2017 06:48 PM, Andrei Borzenkov wrote:

On 30.11.2017 16:11, Klaus Wenninger wrote:

On 11/30/2017 01:41 PM, Ulrich Windl wrote:



"Gao,Yan"  schrieb am 30.11.2017 um 11:48 in Nachricht

:

On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:

SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
VM on VSphere using shared VMDK as SBD. During basic tests by killing
corosync and forcing STONITH pacemaker was not started after reboot.
In logs I see during boot

Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
just fenced by sapprod01p for sapprod01p
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
process (3151) can no longer be respawned,
Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker

SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
stonith with SBD always takes msgwait (at least, visually the host is not
declared OFFLINE until 120s have passed). But the VM reboots lightning fast
and is up and running long before the timeout expires.

As msgwait was intended for the message to arrive, and not for the reboot time
(I guess), this just shows a fundamental problem in SBD design: receipt of the
fencing command is not confirmed (other than by seeing the consequences of its
execution).


The 2 x msgwait is not for confirmations but for writing the poison pill
and for having it read by the target side.


Yes, of course, but that's not what Ulrich likely intended to say.
msgwait must account for worst-case storage path latency, while in
normal cases the write happens much faster. If the fenced node could
acknowledge having been killed after it reboots, the stonith agent could
return success much earlier.

How could a living man be sure he died before? ;)

Regards,
  Yan



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pcs create master/slave resource doesn't work (Ken Gaillot)

2017-12-01 Thread Ken Gaillot
On Fri, 2017-12-01 at 09:36 +0800, Hui Xiang wrote:
> Hi all,
> 
>   I am using the ovndb-servers ocf agent[1], which is a kind of multi-
> state resource. When I am creating it (please see my previous email),
> the monitor is called only once, and the start operation is never
> called. According to the description below, the one monitor call
> returned OCF_NOT_RUNNING; should Pacemaker decide to execute the start
> action based on this return code? Is there any way to

Before Pacemaker does anything with a resource, it first calls a one-
time monitor (called a "probe") to find out the current status of the
resource across the cluster. This allows it to discover if the service
is already running somewhere.

So, you will see those probes for every resource when the cluster
starts, or when the resource is added to the configuration, or when the
resource is cleaned up.
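
As an aside, if you want the cluster to run those probes again after changing
the agent, cleaning up the resource discards its operation history and
triggers fresh probes (the resource name below is a placeholder for your
multi-state resource):

  # force fresh probes of the resource on all nodes
  pcs resource cleanup ovndb-servers

  # equivalent lower-level command
  crm_resource --cleanup --resource ovndb-servers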

> check what the next action will be? Currently in my environment
> nothing happens, and I have tried almost every way I know to debug it,
> with no luck. Could anyone help out? Thank you very much.
> 
> Monitor Return Code   Description
> OCF_NOT_RUNNING       Stopped
> OCF_SUCCESS           Running (Slave)
> OCF_RUNNING_MASTER    Running (Master)
> OCF_FAILED_MASTER     Failed (Master)
> Other                 Failed (Slave)
> 
> 
> [1] https://github.com/openvswitch/ovs/blob/master/ovn/utilities/ovnd
> b-servers.ocf
> Hui.
> 
> 
> 
> On Thu, Nov 30, 2017 at 6:39 PM, Hui Xiang 
> wrote:
> > The really weird thing is that the monitor is only called once
> > rather than repeatedly as expected. Where should I check for that?
> > 
> > On Thu, Nov 30, 2017 at 4:14 PM, Hui Xiang 
> > wrote:
> > > Thanks Ken very much for your helpful information.
> > > 
> > > I am now blocked because I can't see the pacemaker DC take any further
> > > start/promote etc. action on my resource agents, and I have found no
> > > helpful logs.

Each time the DC decides what to do, there will be a line like "...
saving inputs in ..." with a file name. The log messages just before
that may give some useful information.

Otherwise, you can take that file, and simulate what the cluster
decided at that point:

  crm_simulate -Sx $FILENAME

It will first show the status of the cluster at the start of the
decision-making, then a "Transition Summary" with the actions that are
required, then a simulated execution of those actions, and then what
the resulting status would be if those actions succeeded.

That may give you some more information. You can make it more verbose
by using "-Ssx", or by adding "-", but it's not very user-friendly
output.
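
For example, assuming the default location for saved inputs on most
installations (the file number is a placeholder; use the name from the
"saving inputs in" message):

  # replay the decisions pacemaker made for that transition
  crm_simulate -Sx /var/lib/pacemaker/pengine/pe-input-42.bz2

  # same, but also show the allocation scores
  crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-42.bz2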

> > > 
> > > So my first question is that in what kind of situation DC will
> > > decide do call start action?  does the monitor operation need to
> > > be return OCF_SUCCESS? in my case, it will return
> > > OCF_NOT_RUNNING, and the monitor operation is not being called
> > > any more, which should be wrong as I felt that it should be
> > > called intervally. 

The DC will ask for a start if the configuration and current status
require it. For example, if the resource's current status is stopped,
and the configuration calls for a target role of started (the default),
then it will start it. On the other hand, if the current status is
started, then it doesn't need to do anything -- or, if location
constraints ban all the nodes from running the resource, then it can't
do anything.

So, it's all based on what the current status is (based on the last
monitor result), and what the configuration requires.
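
As a concrete illustration (the resource name is a placeholder), the target
role is what the higher-level tools manipulate:

  # sets the meta attribute target-role=Stopped; the DC schedules stops
  pcs resource disable ovndb-servers

  # clears the stopped target role, so the DC can schedule starts again
  pcs resource enable ovndb-servers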

> > > 
> > > The resource agent monitor logic:
> > > In the xx_monitor function it will call xx_update, and it always
> > > hits "$CRM_MASTER -D;;". What does that usually mean? Will it stop
> > > the start operation from being called?

Each master/slave resource has a special node attribute with a "master
score" for that node. The node with the highest master score will be
promoted to master. It's up to the resource agent to set this
attribute. The "-D" call you see deletes that attribute (presumably
before updating it later).

The master score has no effect on starting/stopping.
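
For illustration, this is roughly what the agent's $CRM_MASTER wrapper does;
crm_master is meant to be called from inside the agent, where
OCF_RESOURCE_INSTANCE identifies the resource, and the values here are
arbitrary examples:

  # set this node's master score; the highest score wins promotion
  crm_master -l reboot -v 100

  # delete this node's master score (what the "-D" call does)
  crm_master -l reboot -D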

> > > 
> > > ovsdb_server_master_update() {
> > >     ocf_log info "ovsdb_server_master_update: $1}"
> > > 
> > >     case $1 in
> > >         $OCF_SUCCESS)
> > >         $CRM_MASTER -v ${slave_score};;
> > >         $OCF_RUNNING_MASTER)
> > >             $CRM_MASTER -v ${master_score};;
> > >         #*) $CRM_MASTER -D;;
> > >     esac
> > >     ocf_log info "ovsdb_server_master_update end}"
> > > }
> > > 
> > > ovsdb_server_monitor() {
> > >     ocf_log info "ovsdb_server_monitor"
> > >     ovsdb_server_check_status
> > >     rc=$?
> > > 
> > >     ovsdb_server_master_update $rc
> > >     ocf_log info "monitor is going to return $rc"
> > >     return $rc
> > > }
> > > 
> > > 
> > > Below is my cluster configuration:
> > > 
> > > 1. First I have an vip set.
> > > [root@node-1 ~]# pcs resource show
> > >  vip__management_old  (ocf::es:ns_IPaddr2):   Started
> > > node-1.domain.tld
> > 

Re: [ClusterLabs] questions about startup fencing

2017-12-01 Thread Ken Gaillot
On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote:
> Ken Gaillot  wrote:
> > On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> > > Hi all,
> > > 
> > > A colleague has been valiantly trying to help me belatedly learn
> > > about
> > > the intricacies of startup fencing, but I'm still not fully
> > > understanding some of the finer points of the behaviour.
> > > 
> > > The documentation on the "startup-fencing" option[0] says
> > > 
> > > Advanced Use Only: Should the cluster shoot unseen nodes? Not
> > > using the default is very unsafe!
> > > 
> > > and that it defaults to TRUE, but doesn't elaborate any further:
> > > 
> > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
> > > 
> > > Let's imagine the following scenario:
> > > 
> > > - We have a 5-node cluster, with all nodes running cleanly.
> > > 
> > > - The whole cluster is shut down cleanly.
> > > 
> > > - The whole cluster is then started up again.  (Side question:
> > > what
> > >   happens if the last node to shut down is not the first to start
> > > up?
> > >   How will the cluster ensure it has the most recent version of
> > > the
> > >   CIB?  Without that, how would it know whether the last man
> > > standing
> > >   was shut down cleanly or not?)
> > 
> > Of course, the cluster can't know what CIB version nodes it doesn't
> > see
> > have, so if a set of nodes is started with an older version, it
> > will go
> > with that.
> 
> Right, that's what I expected.
> 
> > However, a node can't do much without quorum, so it would be
> > difficult
> > to get in a situation where CIB changes were made with quorum
> > before
> > shutdown, but none of those nodes are present at the next start-up
> > with
> > quorum.
> > 
> > In any case, when a new node joins a cluster, the nodes do compare
> > CIB
> > versions. If the new node has a newer CIB, the cluster will use it.
> > If
> > other changes have been made since then, the newest CIB wins, so
> > one or
> > the other's changes will be lost.
> 
> Ahh, that's interesting.  Based on reading
> 
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties
> 
> whichever node has the highest (admin_epoch, epoch, num_updates)
> tuple
> will win, so normally in this scenario it would be the epoch which
> decides it, i.e. whichever node had the most changes since the last
> time the conflicting nodes shared the same config - right?

Correct ... assuming the code for that is working properly, which I
haven't confirmed :)

> 
> And if that would choose the wrong node, admin_epoch can be set
> manually to override that decision?

Correct again, with same caveat

> 
> > Whether missing nodes were shut down cleanly or not relates to your
> > next question ...
> > 
> > > - 4 of the nodes boot up fine and rejoin the cluster within the
> > >   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> > > 
> > > IIUC, with startup-fencing enabled, this will result in that 5th
> > > node
> > > automatically being fenced.  If I'm right, is that really
> > > *always*
> > > necessary?
> > 
> > It's always safe. :-) As you mentioned, if the missing node was the
> > last one alive in the previous run, the cluster can't know whether
> > it
> > shut down cleanly or not. Even if the node was known to shut down
> > cleanly in the last run, the cluster still can't know whether the
> > node
> > was started since then and is now merely unreachable. So, fencing
> > is
> > necessary to ensure it's not accessing resources.
> 
> I get that, but I was questioning the "necessary to ensure it's not
> accessing resources" part of this statement.  My point is that
> sometimes this might be overkill, because sometimes we might be able
> to
> discern through other methods that there are no resources we need to
> worry about potentially conflicting with what we want to run.  That's
> why I gave the stateless clones example.
> 
> > The same scenario is why a single node can't have quorum at start-
> > up in
> > a cluster with "two_node" set. Both nodes have to see each other at
> > least once before they can assume it's safe to do anything.
> 
> Yep.
> 
> > > Let's suppose further that the cluster configuration is such that
> > > no
> > > stateful resources which could potentially conflict with other
> > > nodes
> > > will ever get launched on that 5th node.  For example it might
> > > only
> > > host stateless clones, or resources with require=nothing set, or
> > > it
> > > might not even host any resources at all due to some temporary
> > > constraints which have been applied.
> > > 
> > > In those cases, what is to be gained from fencing?  The only
> > > thing I
> > > can think of is that using (say) IPMI to power-cycle the node
> > > *might*
> > > fix whatever issue was preventing it from joining the
> > > cluster.  Are
> > > there any other reasons for fencing in this case?  It wouldn't
> > > h


Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-01 Thread Ken Gaillot
On Thu, 2017-11-30 at 07:55 +0100, Ulrich Windl wrote:
> 
> 
> > Kristoffer Gronlund  wrote:
> > > Adam Spiers  writes:
> > > 
> > > > - The whole cluster is shut down cleanly.
> > > > 
> > > > - The whole cluster is then started up again.  (Side question:
> > > > what
> > > >   happens if the last node to shut down is not the first to
> > > > start up?
> > > >   How will the cluster ensure it has the most recent version of
> > > > the
> > > >   CIB?  Without that, how would it know whether the last man
> > > > standing
> > > >   was shut down cleanly or not?)
> > > 
> > > This is my opinion, I don't really know what the "official"
> > > pacemaker
> > > stance is: There is no such thing as shutting down a cluster
> > > cleanly. A
> > > cluster is a process stretching over multiple nodes - if they all
> > > shut
> > > down, the process is gone. When you start up again, you
> > > effectively have
> > > a completely new cluster.
> > 
> > Sorry, I don't follow you at all here.  When you start the cluster
> > up
> > again, the cluster config from before the shutdown is still there.
> > That's very far from being a completely new cluster :-)
> 
> The problem is you cannot "start the cluster" in pacemaker; you can
> only "start nodes". The nodes will come up one by one. As opposed (as
> I had said) to HP Service Guard, where there is a "cluster formation
> timeout". That is, the nodes wait for the specified time for the
> cluster to "form". Then the cluster starts as a whole. Of course that
> only applies if the whole cluster was down, not if a single node was
> down.

I'm not sure what that would specifically entail, but I'm guessing we
have some of the pieces already:

- Corosync has a wait_for_all option if you want the cluster to be
unable to have quorum at start-up until every node has joined (see the
corosync.conf sketch after this list). I don't think you can set a
timeout that cancels it, though.

- Pacemaker will wait dc-deadtime for the first DC election to
complete. (if I understand it correctly ...)

- Higher-level tools can start or stop all nodes together (e.g. pcs has
pcs cluster start/stop --all).
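
For reference, a minimal corosync.conf sketch of that first option (excerpt
only; adjust to your cluster):

  # corosync.conf, quorum section (excerpt)
  quorum {
      provider: corosync_votequorum
      # no quorum at start-up until every node has been seen at least once
      wait_for_all: 1
  }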

> > 
> > > When starting up, how is the cluster, at any point, to know if
> > > the
> > > cluster it has knowledge of is the "latest" cluster?
> > 
> > That was exactly my question.
> > 
> > > The next node could have a newer version of the CIB which adds
> > > yet
> > > more nodes to the cluster.
> > 
> > Yes, exactly.  If the first node to start up was not the last man
> > standing, the CIB history is effectively being forked.  So how is
> > this
> > issue avoided?
> 
> Quorum? "Cluster formation delay"?
> 
> > 
> > > The only way to bring up a cluster from being completely stopped
> > > is to
> > > treat it as creating a completely new cluster. The first node to
> > > start
> > > "creates" the cluster and later nodes join that cluster.
> > 
> > That's ignoring the cluster config, which persists even when the
> > cluster's down.
> > 
> > But to be clear, you picked a small side question from my original
> > post and answered that.  The main questions I had were about
> > startup
> > fencing :-)
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_node_start_state=standby sometimes does not work

2017-12-01 Thread Ken Gaillot
On Tue, 2017-11-28 at 09:36 +, 井上 和徳 wrote:
> Hi,
> 
> Sometimes a node with 'PCMK_node_start_state=standby' will start up
> Online.
> 
> [ reproduction scenario ]
>  * Set 'PCMK_node_start_state=standby' to /etc/sysconfig/pacemaker.
>  * Delete cib (/var/lib/pacemaker/cib/*).
>  * Start pacemaker at the same time on 2 nodes.
>   # for i in rhel74-1 rhel74-3 ; do ssh -f $i systemctl start pacemaker ; done
> 
> [ actual result ]
>  * crm_mon
>   Stack: corosync
>   Current DC: rhel74-3 (version 1.1.18-2b07d5c) - partition with
> quorum
>   Last change: Wed Nov 22 06:22:50 2017 by hacluster via crmd on
> rhel74-3
> 
>   2 nodes configured
>   0 resources configured
> 
>   Node rhel74-3: standby
>   Online: [ rhel74-1 ]
> 
>  * cib.xml (the XML was stripped by the list archive; the surviving
>    fragment shows a standby nvpair with value="on" under only one of the
>    two node entries)
> 
>  * pacemaker.log
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: (cib_native.c:462 ) warning: cib_native_perform_op_delegate:  Call failed: No such device or address
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 ) info: update_attr_delegate:  Update <node id="3232261507">
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 ) info: update_attr_delegate:  Update   <instance_attributes id="nodes-3232261507">
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 ) info: update_attr_delegate:  Update     <nvpair id="nodes-3232261507-standby" name="standby" value="on"/>
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 ) info: update_attr_delegate:  Update   </instance_attributes>
>   Nov 22 06:22:50 [20755] rhel74-1   crmd: ( cib_attrs.c:320 ) info: update_attr_delegate:  Update </node>
> 
>  * I attached crm_report to GitHub (too big to attach to this email),
> so look at it.
>    https://github.com/inouekazu/pcmk_report/blob/master/pcmk-Wed-22-Nov-2017.tar.bz2
> 
> 
> I think that the relative timing of *1 and *2 is the cause (the XML
> examples were stripped by the list archive; *2 was the standby nvpair
> with value="on").
> 
> I expect this to be fixed, but if that's difficult, I have two questions.
> 1) Does this only occur if there is no cib.xml (in other words, there
> is no <nodes> element)?

I believe so. I think this is the key message:

Nov 22 06:22:50 [20750] rhel74-1 cib: ( callbacks.c:1101  ) warning: cib_process_request: Completed cib_modify operation for section nodes: No such device or address (rc=-6, origin=rhel74-1/crmd/12, version=0.3.0)

PCMK_node_start_state works by setting the "standby" node attribute in
the CIB. However, it does this via a "modify" command that assumes the
<nodes> tag already exists.

If there is no CIB, pacemaker will quickly create one -- but in this
case, the node tries to set the attribute before that's happened.

Hopefully we can come up with a fix. If you want, you can file a bug
report at bugs.clusterlabs.org, to track the progress.

> 2) Is there any workaround other than "Do not start at the same
> time"?
> 
> Best Regards

Before starting pacemaker, if /var/lib/pacemaker/cib is empty, you can
create a skeleton CIB with:

 cibadmin --empty > /var/lib/pacemaker/cib/cib.xml

That will include an empty <nodes> tag, and the modify command should
work when pacemaker starts.
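
A minimal pre-start sketch of that workaround, assuming the standard CIB
location and the usual hacluster:haclient ownership (verify both on your
installation):

  # seed a skeleton CIB before the first pacemaker start, so <nodes> exists
  if [ ! -s /var/lib/pacemaker/cib/cib.xml ]; then
      cibadmin --empty > /var/lib/pacemaker/cib/cib.xml
      chown hacluster:haclient /var/lib/pacemaker/cib/cib.xml
      chmod 600 /var/lib/pacemaker/cib/cib.xml
  fi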
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] building from source

2017-12-01 Thread Aaron Cody
Unfortunately just upgrading to the latest RHEL is problematic for us as we
are working in a highly qualified environment (based around RHEL 7.2), so I
was hoping to get the latest HA stuff by just building it from source...
I looked at the available RPM packages in the RH HA repo but they don't
seem to be the 'latest' ... could you let me know which versions of what I
would need to download and build to get the latest and greatest?
thanks

On Wed, Nov 29, 2017 at 7:03 AM, Ken Gaillot  wrote:

> On Tue, 2017-11-28 at 11:23 -0800, Aaron Cody wrote:
> > I'm trying to build all of the pacemaker/corosync components from
> > source instead of using the redhat rpms - I have a few questions.
> >
> > I'm building on redhat 7.2 and so far I have been able to build:
> >
> > libqb 1.0.2
> > pacemaker 1.1.18
> > corosync 2.4.3
> > resource-agents 4.0.1
> >
> > however I have not been able to build pcs yet, i'm getting ruby
> > errors:
> >
> > sudo make install_pcsd
> > which: no python3 in (/sbin:/bin:/usr/sbin:/usr/bin)
> > make -C pcsd build_gems
> > make[1]: Entering directory `/home/whacuser/pcs/pcsd'
> > bundle package
> > `ruby_22` is not a valid platform. The available options are: [:ruby,
> > :ruby_18, :ruby_19, :ruby_20, :ruby_21, :mri, :mri_18, :mri_19,
> > :mri_20, :mri_21, :rbx, :jruby,
> > :jruby_18, :jruby_19, :mswin, :mingw, :mingw_18, :mingw_19,
> > :mingw_20, :mingw_21, :x64_mingw, :x64_mingw_20, :x64_mingw_21]
> > make[1]: *** [get_gems] Error 4
> > make[1]: Leaving directory `/home/whacuser/pcs/pcsd'
> > make: *** [install_pcsd] Error 2
> >
> >
> > Q1: Is this the complete set of components I need to build?
>
> Not considering pcs, yes.
>
> > Q2: do I need cluster-glue?
>
> It's only used now to be able to use heartbeat-style fence agents. If
> you have what you need in Red Hat's fence agent packages, you don't
> need it.
>
> > Q3: any idea how I can get past the build error with pcsd?
> > Q4: if I use the pcs rpm instead of building pcs from source, I see
> > an error when my cluster starts up 'unable to get cib'. This didn't
> > happen when I was using the redhat rpms, so i'm wondering what i'm
> > missing...
> >
> > thanks
>
> pcs development is closely tied to Red Hat releases, so it's hit-or-
> miss mixing and matching pcs and RHEL versions. Upgrading to RHEL 7.4
> would get you recent versions of everything, though, so that would be
> easiest if it's an option.
> --
> Ken Gaillot 
>
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Why "Stop" action isn't called during failover?

2017-12-01 Thread Ken Gaillot
On Tue, 2017-11-21 at 12:58 +0200, Euronas Support wrote:
> Thanks for the answer Ken,
> The constraints are:
> 
> colocation vmgi_with_filesystem1 inf: vmgi filesystem1
> colocation vmgi_with_libvirtd inf: vmgi cl_libvirtd
> order vmgi_after_filesystem1 inf: filesystem1 vmgi
> order vmgi_after_libvirtd inf: cl_libvirtd vmgi

Those look good as far as ordering vmgi relative to the filesystem, but
I see below that it's vm_lomem1 that's left running. Is vmgi a group
containing vm_lomem1?

> On 20.11.2017 16:44:00 Ken Gaillot wrote:
> > On Fri, 2017-11-10 at 11:15 +0200, Klecho wrote:
> > > Hi List,
> > > 
> > > I have a VM, which is constraint dependant on its storage
> > > resource.
> > > 
> > > When the storage resource goes down, I'm observing the following:
> > > 
> > > (pacemaker 1.1.16 & corosync 2.4.2)
> > > 
> > > Nov 10 10:04:36 [1202] NODE-2   pengine: info: LogActions:   Leave   vm_lomem1   (Started NODE-2)
> > > 
> > > Filesystem(p_AA_Filesystem_Drive16)[2097324]: 2017/11/10_10:04:37 INFO: sending signal TERM to: libvirt+ 1160142   1  0 09:01 ?        Sl 0:07 qemu-system-x86_64
> > > 
> > > 
> > > The VM (VirtualDomain RA) gets killed without the "stop" RA action
> > > being called.
> > > 
> > > Isn't the proper way to call "stop" for all related resources in
> > > such cases?
> > 
> > Above, it's not Pacemaker that's killing the VM, it's the
> > Filesystem
> > resource itself.
> > 
> > When the Filesystem agent gets a stop request, if it's unable to
> > unmount the filesystem, it can try further action according to its
> > force_unmount option: "This option allows specifying how to handle
> > processes that are currently accessing the mount directory ...
> > Default
> > value, kill processes accessing mount point".
> > 
> > What does the configuration for the resources and constraints look
> > like? Based on what you described, Pacemaker shouldn't try to stop
> > the
> > Filesystem resource before successfully stopping the VM first.

-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: pacemaker with sbd fails to start if node reboots too fast.

2017-12-01 Thread Andrei Borzenkov
On 01.12.2017 22:36, Gao,Yan wrote:
> On 11/30/2017 06:48 PM, Andrei Borzenkov wrote:
>> On 30.11.2017 16:11, Klaus Wenninger wrote:
>>> On 11/30/2017 01:41 PM, Ulrich Windl wrote:

>>> "Gao,Yan" wrote on 30.11.2017 at 11:48 in message:
> On 11/22/2017 08:01 PM, Andrei Borzenkov wrote:
>> SLES12 SP2 with pacemaker 1.1.15-21.1-e174ec8; two node cluster with
>> VM on VSphere using shared VMDK as SBD. During basic tests by killing
>> corosync and forcing STONITH pacemaker was not started after reboot.
>> In logs I see during boot
>>
>> Nov 22 16:04:56 sapprod01s crmd[3151]: crit: We were allegedly
>> just fenced by sapprod01p for sapprod01p
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:  warning: The crmd
>> process (3151) can no longer be respawned,
>> Nov 22 16:04:56 sapprod01s pacemakerd[3137]:   notice: Shutting down Pacemaker
>> SBD timeouts are 60s for watchdog and 120s for msgwait. It seems that
>> stonith with SBD always takes msgwait (at least, visually the host is not
>> declared OFFLINE until 120s have passed). But the VM reboots lightning fast
>> and is up and running long before the timeout expires.
 As msgwait was intended for the message to arrive, and not for the
 reboot time (I guess), this just shows a fundamental problem in SBD
 design: receipt of the fencing command is not confirmed (other than
 by seeing the consequences of its execution).
>>>
>>> The 2 x msgwait is not for confirmations but for writing the poison pill
>>> and for having it read by the target side.
>>
>> Yes, of course, but that's not what Ulrich likely intended to say.
>> msgwait must account for worst-case storage path latency, while in
>> normal cases the write happens much faster. If the fenced node could
>> acknowledge having been killed after it reboots, the stonith agent could
>> return success much earlier.
> How could a living man be sure he died before? ;)
> 

It does not need to. It simply needs to write something on startup to
indicate it is back.

Actually, the fenced side already does it - it clears the pending message
when sbd is started. It is the fencing side that simply sleeps
unconditionally for msgwait:

        /* write the poison pill into the victim's slot */
        if (mbox_write_verify(st, mbox, s_mbox) < -1) {
                rc = -1; goto out;
        }
        /* ... then wait the full msgwait unconditionally */
        if (strcasecmp(cmd, "exit") != 0) {
                cl_log(LOG_INFO, "Messaging delay: %d",
                                (int)timeout_msgwait);
                sleep(timeout_msgwait);
        }

What if, instead of sleeping, we periodically checked the slot for an
acknowledgement for up to the msgwait timeout? Then we could return earlier.
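
For anyone who wants to watch this happening, a quick sketch with the sbd
command-line tool (the device path is an example; the node name is the
victim's slot name):

  SBD_DEV=/dev/disk/by-id/my-shared-vmdk

  # show each node's slot and any pending message (e.g. "reset" while unread)
  sbd -d "$SBD_DEV" list

  # clear a slot by hand -- effectively what the rebooted node does when
  # its sbd daemon starts up again
  sbd -d "$SBD_DEV" message sapprod01s clear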

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org