Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-28 Thread Seth Reid
I will try to install updated packages from Ubuntu 16.10 or newer. It can't
get worse than not working.

Can you think of any logs that might help? I've enabled debug logging in
corosync, but it doesn't show anything other than corosync exiting.
Any diagnostic tools you can recommend?
Any diagnostic tools you can recommend?

---
Seth Reid
System Operations Engineer
Vendini, Inc.
415.349.7736
sr...@vendini.com
www.vendini.com


On Mon, Mar 27, 2017 at 3:10 PM, Ken Gaillot wrote:

> On 03/27/2017 03:54 PM, Seth Reid wrote:
> >
> >
> >
> > On Fri, Mar 24, 2017 at 2:10 PM, Ken Gaillot wrote:
> >
> > On 03/24/2017 03:52 PM, Digimer wrote:
> > > On 24/03/17 04:44 PM, Seth Reid wrote:
> > >> I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not in
> > >> production yet because I'm having a problem during fencing. When I
> > >> disable the network interface of any one machine, the disabled machine
> > >> is properly fenced, leaving me, briefly, with a two node cluster. A
> > >> second node is then fenced off immediately, and the remaining node
> > >> appears to try to fence itself off. This leaves two nodes with
> > >> corosync/pacemaker stopped, and the remaining machine still in the
> > >> cluster but showing an offline node and an UNCLEAN node. What could
> > >> be causing this behavior?
> > >
> > > It looks like the fence attempt failed, leaving the cluster hung. When
> > > you say all nodes were fenced, did all nodes actually reboot? Or did the
> > > two surviving nodes just lock up? If the latter, then that is the proper
> > > response to a failed fence (DLM stays blocked).
> >
> > See comments inline ...
> >
> > >
> > >> Each machine has a dedicated network interface for the cluster, and
> > >> there is a VLAN on the switch devoted to just these interfaces.
> > >> In the following, I disabled the interface on node id 2 (b014). Node 1
> > >> (b013) is fenced as well. Node 2 (b015) is still up.
> > >>
> > >> Logs from b013:
> > >> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
> > >> Mar 24 16:35:13 b013 corosync[2134]: notice  [TOTEM ] A processor failed, forming new configuration.
> > >> Mar 24 16:35:13 b013 corosync[2134]:  [TOTEM ] A processor failed, forming new configuration.
> > >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> > >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] Failed to receive the leave message. failed: 2
> > >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] A new membership (192.168.100.13:576) was formed. Members left: 2
> > >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] Failed to receive the leave message. failed: 2
> > >> Mar 24 16:35:17 b013 attrd[2223]:   notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> > >> Mar 24 16:35:17 b013 cib[2220]:   notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> > >> Mar 24 16:35:17 b013 cib[2220]:   notice: Removing b014-cl/2 from the membership list
> > >> Mar 24 16:35:17 b013 cib[2220]:   notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> > >> Mar 24 16:35:17 b013 pacemakerd[2187]:   notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> > >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Removing b014-cl/2 from the membership list
> > >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> > >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: crm_update_peer_proc: Node b014-cl[2] - state is now lost (was member)
> > >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Removing b014-cl/2 from the membership list
> > >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Purged 1 peers with id=2 and/or uname=b014-cl from the membership cache
> > >> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid 19223 nodedown time 1490387717 fence_all dlm_stonith
> > >> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing connection to node 2
> > >> Mar 24 16:35:17 b013 crmd[2227]:   notice: crm_reap_unseen_nodes: Node b014-cl[2] - state is now lost (was member)
> > >> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0

Re: [ClusterLabs] cloned resource not deployed on all matching nodes

2017-03-28 Thread Radoslaw Garbacz
Thanks,

On Tue, Mar 28, 2017 at 2:37 PM, Ken Gaillot wrote:

> On 03/28/2017 01:26 PM, Radoslaw Garbacz wrote:
> > Hi,
> >
> > I have a situation where a cloned resource is being deployed only on some
> > of the nodes, even though this resource is similar to others, which are
> > being deployed properly according to their location rules.
> >
> > Please take a look at the configuration below and let me know if there
> > is anything I can do to make the resource "dbx_nfs_mounts_datas" (which is
> > a primitive of "dbx_nfs_mounts_datas-clone") be deployed on all 4
> > nodes matching its location rules.
>
> Look in your logs for "pengine:" messages. They will list the decisions
> made about where to start resources, then have a message about
> "Calculated transition ... saving inputs in ..." with a file name.
>
> You can run crm_simulate on that file to see why the decisions were
> made. The output is somewhat difficult to follow, but "crm_simulate -Ssx
> $FILENAME" will show every score that went into the decision.
>
> >
> >
> > Thanks in advance,
> >
> >
> >
> > * Configuration:
> > ** Nodes:
> > 
> >   
> > 
> >   
> >   
> >   
> > 
> >   
> >   
> > 
> >   
> >   
> >   
> > 
> >   
> >   
> > 
> >   
> >   
> >   
> > 
> >   
> >   
> > 
> >   
> >   
> >   
> > 
> >   
> >   
> > 
> >   
> >   
> >   
> > 
> >   
> > 
> >
> >
> >
> > ** Resource in question:
> >   
> > type="dbx_mounts.ocf.sh" class="ocf" provider="dbxcl">
> >> id="dbx_nfs_mounts_datas-instance_attributes">
> >  ...
> >   
> >   
> >  ...
> >   
> > 
> > 
> >> id="dbx_nfs_mounts_datas-meta_attributes-target-role"/>
> >> id="dbx_nfs_mounts_datas-meta_attributes-clone-max"/>
> > 
> >   
> >
> >
> >
> > ** Resource location
> >> rsc="dbx_nfs_mounts_datas">
> >  > id="on_nodes_dbx_nfs_mounts_datas-INFINITY" boolean-op="and">
> >> id="on_nodes_dbx_nfs_mounts_datas-INFINITY-0-expr" value="Active"/>
> >> id="on_nodes_dbx_nfs_mounts_datas-INFINITY-1-expr" value="AD"/>
> > 
> >  > id="on_nodes_dbx_nfs_mounts_datas--INFINITY" boolean-op="or">
> >> id="on_nodes_dbx_nfs_mounts_datas--INFINITY-0-expr" value="Active"/>
> >> id="on_nodes_dbx_nfs_mounts_datas--INFINITY-1-expr" value="AD"/>
> > 
> >   
> >
> >
> >
> > ** Status on properly deployed node:
> >> type="dbx_mounts.ocf.sh " class="ocf"
> > provider="dbxcl">
> >  > operation_key="dbx_nfs_mounts_datas_start_0" operation="start"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > transition-key="156:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > transition-magic="0:0;156:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > on_node="ip-10-180-227-53" call-id="85" rc-code="0" op-status="0"
> > interval="0" last-run="1490720995" last-rc-change="1490720995"
> > exec-time="733" queue-time="0"
> > op-digest="e95785e3e2d043b0bda24c5bd4655317" op-force-restart=""
> > op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> >  > operation_key="dbx_nfs_mounts_datas_monitor_137000" operation="monitor"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > transition-key="157:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > transition-magic="0:0;157:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > on_node="ip-10-180-227-53" call-id="86" rc-code="0" op-status="0"
> > interval="137000" last-rc-change="1490720995" exec-time="172"
> > queue-time="0" op-digest="a992d78564e6b3942742da0859d8c734"/>
> >   
> >
> >
> >
> > ** Status on not properly deployed node:
> >> type="dbx_mounts.ocf.sh " class="ocf"
> > provider="dbxcl">
> >  > operation_key="dbx_nfs_mounts_datas_monitor_0" operation="monitor"
> > crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> > transition-key="73:0:7:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > transition-magic="0:7;73:0:7:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> > on_node="ip-10-183-39-69" call-id="39" rc-code="7" op-status="0"
> > interval="0" last-run="1490720950" last-rc-change="1490720950"
> > exec-time="172" queue-time="0"
> > op-digest="e95785e3e2d043b0bda24c5bd4655317" op-force-restart=""
> > op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> >   
> >
> >
> >
> > --
> > Best Regards,
> >
> > Radoslaw Garbacz
> > XtremeData Incorporated
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: 

[ClusterLabs] cloned resource not deployed on all matching nodes

2017-03-28 Thread Radoslaw Garbacz
Hi,

I have a situation where a cloned resource is being deployed only on some of
the nodes, even though this resource is similar to others, which are being
deployed properly according to their location rules.

Please take a look at the configuration below and let me know if there is
anything I can do to make the resource "dbx_nfs_mounts_datas" (which is a
primitive of "dbx_nfs_mounts_datas-clone") be deployed on all 4 nodes
matching its location rules.


Thanks in advance,



* Configuration:
** Nodes:

  

  
  
  

  
  

  
  
  

  
  

  
  
  

  
  

  
  
  

  
  

  
  
  

  




** Resource in question:
  

  
 ...
  
  
 ...
  


  
  

  



** Resource location
  

  
  


  
  

  



** Status on properly deployed node:
  


  



** Status on not properly deployed node:
  

  



-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporated
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Dejan Muhamedagic
On Tue, Mar 28, 2017 at 04:20:12PM +0300, Alexander Markov wrote:
> Hello, Dejan,
> 
> >Why? I don't have a test system right now, but for instance this
> >should work:
> >
> >$ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
> >$ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}
> 
> Ah, I see. Everything (including stonith methods, fencing and failover)
> works just fine under normal circumstances. Sorry if I wasn't clear about
> that. The problem occurs only when I have one datacenter (i.e. one IBM
> machine and one HMC) lost due to power outage.
> 
> For example:
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
> info: ibmhmc device OK.
> 39
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
> info: ibmhmc device OK.
> 39
> 
> As I said, the stonith device can see and manage all the cluster nodes.

That's great :)

> >If so, then your configuration does not appear to be correct. If
> >both are capable of managing all nodes then you should tell
> >pacemaker about it.
> 
> Thanks for the hint. But if the stonith device returns a node list, isn't it
> obvious to the cluster that it can manage those nodes?

Did you try that? Just drop the location constraints and see if
it works. Pacemaker should keep track of which (stonith) resources
are capable of managing each node.
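
Something like this should do it with crmsh (the constraint id below is just a
placeholder -- take the real one from the output of the first command):

$ crm configure show | grep location
$ crm configure delete <location-constraint-id>

Then check whether the stonith resources still get started where you expect.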

> Could you please be more
> specific about what you are referring to? I have currently changed the
> configuration to two fencing levels (one per HMC) but still don't think I get the idea here.
> 
> >The surviving node, running the stonith resource for the dead node, tries to
> >contact the ipmi device (which is also dead). How does the cluster understand
> >that the lost node is really dead and it's not just a network issue?
> >
> >It cannot.
> 
> How do people then actually solve the problem of a two-node metro cluster?

That depends, but if you have a communication channel for stonith
devices which is _independent_ of the cluster communication then
you should be OK. Of course, a fencing device which goes down
together with its node is of no use, but that doesn't seem to be
the case here.

> I mean, I know one option: stonith-enabled=false, but it doesn't seem right
> to me.

Certainly not.

Thanks,

Dejan

> 
> Thank you.
> 
> Regards,
> Alexander Markov
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Ken Gaillot
On 03/28/2017 08:20 AM, Alexander Markov wrote:
> Hello, Dejan,
> 
>> Why? I don't have a test system right now, but for instance this
>> should work:
>>
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}
> 
> Ah, I see. Everything (including stonith methods, fencing and failover)
> works just fine under normal circumstances. Sorry if I wasn't clear
> about that. The problem occurs only when I have one datacenter (i.e. one
> IBM machine and one HMC) lost due to power outage.

If the datacenters are completely separate, you might want to take a
look at booth. With booth, you set up a separate cluster at each
datacenter, and booth coordinates which one can host resources. Each
datacenter must have its own self-sufficient cluster with its own
fencing, but one site does not need to be able to fence the other.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683855002656
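
A minimal booth.conf sketch, just to show the shape of it (all of the
addresses and the ticket name below are placeholders):

  transport = UDP
  port = 9929
  arbitrator = 10.1.3.100
  site = 10.1.1.100
  site = 10.1.2.100
  ticket = "ticket-A"

Each site runs its own pacemaker cluster, resources are tied to the ticket
with an rsc_ticket constraint, and the arbitrator (a small third box) breaks
ties when the two sites cannot see each other.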

> 
> For example:
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
> info: ibmhmc device OK.
> 39
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
> info: ibmhmc device OK.
> 39
> 
> As I said, the stonith device can see and manage all the cluster nodes.
> 
>> If so, then your configuration does not appear to be correct. If
>> both are capable of managing all nodes then you should tell
>> pacemaker about it.
> 
> Thanks for the hint. But if the stonith device returns a node list, isn't it
> obvious to the cluster that it can manage those nodes? Could you please be
> more specific about what you are referring to? I have currently changed the
> configuration to two fencing levels (one per HMC) but still don't think I
> get the idea here.

I believe Dejan is referring to fencing topology (levels). That would be
preferable to booth if the datacenters are physically close, and even if
one fence device fails, the other can still function.

In this case you'd probably want level 1 = the main fence device, and
level 2 = the fence device to use if the main device fails.

A common implementation (which Digimer uses to great effect) is to use
IPMI as level 1 and an intelligent power switch as level 2. If your
second device can function regardless of what hosts are up or down, you
can do something similar.
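
With pcs, that would look roughly like this (the node name and the stonith
resource names are placeholders; repeat per node):

$ pcs stonith level add 1 node-a fence_hmc_1
$ pcs stonith level add 2 node-a fence_hmc_2
$ pcs stonith level

Level 1 is tried first; level 2 is used only if every device in level 1 fails.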

> 
>> The surviving node, running the stonith resource for the dead node, tries to
>> contact the ipmi device (which is also dead). How does the cluster understand
>> that the lost node is really dead and it's not just a network issue?
>>
>> It cannot.

And it will be unable to recover resources that were running on the
questionable partition.

> 
> How do people then actually solve the problem of a two-node metro cluster?
> I mean, I know one option: stonith-enabled=false, but it doesn't seem
> right to me.
> 
> Thank you.
> 
> Regards,
> Alexander Markov

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Alexander Markov

Hello, Dejan,


Why? I don't have a test system right now, but for instance this
should work:

$ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
$ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}


Ah, I see. Everything (including stonith methods, fencing and failover) 
works just fine under normal circumstances. Sorry if I wasn't clear 
about that. The problem occurs only when I have one datacenter (i.e. one 
IBM machine and one HMC) lost due to power outage.


For example:
test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
info: ibmhmc device OK.
39
test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
info: ibmhmc device OK.
39

As I said, the stonith device can see and manage all the cluster nodes.


If so, then your configuration does not appear to be correct. If
both are capable of managing all nodes then you should tell
pacemaker about it.


Thanks for the hint. But if the stonith device returns a node list, isn't it
obvious to the cluster that it can manage those nodes? Could you please be
more specific about what you are referring to? I have currently changed the
configuration to two fencing levels (one per HMC) but still don't think I
get the idea here.



The surviving node, running the stonith resource for the dead node, tries to
contact the ipmi device (which is also dead). How does the cluster understand
that the lost node is really dead and it's not just a network issue?

It cannot.


How do people then actually solve the problem of a two-node metro cluster?
I mean, I know one option: stonith-enabled=false, but it doesn't seem
right to me.


Thank you.

Regards,
Alexander Markov


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Dejan Muhamedagic
On Mon, Mar 27, 2017 at 01:17:31PM +0300, Alexander Markov wrote:
> Hello, Dejan,
> 
> 
> >The first thing I'd try is making sure you can fence each node from the
> >command line by manually running the fence agent. I'm not sure how to do
> >that for the "stonith:" type agents.
> >
> >There's a program stonith(8). It's easy to replicate the
> >configuration on the command line.
> 
> Unfortunately, it is not.

Why? I don't have a test system right now, but for instance this
should work:

$ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
$ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}

Read the examples in the man page:

$ man stonith

Check also the documentation of your agent:

$ stonith -t ibmhmc -h
$ stonith -t ibmhmc -n

> The landscape I refer to is similar to VMware. We use the cluster for virtual
> machines (LPARs) and everything works OK, but the real pain occurs when the
> whole host system is down. Keeping in mind that it's actually in production
> now, I just can't afford to turn it off just for testing.

Yes, I understand. However, I was just talking about how to use
the stonith agents and how to do the testing outside of
pacemaker.

> >Stonith agents are to be queried for the list of nodes they can
> >manage. It's part of the interface. Some agents can figure that
> >out by themself and some need a parameter defining the node list.
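
For what it's worth, you can also ask the agent directly which hosts it thinks
it controls, e.g. (same parameters as in your -lS test):

$ stonith -t ibmhmc ipaddr=10.1.2.9 -l

If both devices list all of the nodes there, the node-list side is fine and it
is only a question of how the stonith resources are configured in pacemaker.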
> 
> And this is just where I'm stuck. I've got two stonith devices (ibmhmc)
> for redundancy. Both of them are capable of managing every node.

If so, then your configuration does not appear to be correct. If
both are capable of managing all nodes then you should tell
pacemaker about it. Digimer has fairly extensive documentation
on how to configure complex fencing configurations. You can also
check your vendor's documentation.

> The problem starts when
> 
> 1) one stonith device is completely lost and inaccessible (due to a power
> outage in the datacenter)
> 2) the surviving stonith device can access neither the cluster node nor the
> hosting system (in VMware terms) for this cluster node, because both of them
> are also lost due to the power outage.

Both lost? What remained? Why do you mention VMware? I thought
that your nodes were LPARs.

> What is the correct solution for this situation?
> 
> >Well, this used to be a standard way to configure one kind of
> >stonith resources, one common representative being ipmi, and
> >served exactly the purpose of restricting the stonith resource
> >from being enabled ("running") on a node which this resource
> >manages.
> 
> Unfortunately, there's no such thing as ipmi in IBM Power boxes.

I mentioned ipmi as an example, not that it has anything to do
with your setup.

> But it
> triggers an interesting question for me: if both a node and its complementary
> ipmi device are lost (due to a power outage), what happens to the
> cluster?

The cluster gets stuck trying to fence the node. Typically this
would render your cluster unusable. There are some IPMI devices
which have a battery to allow for some extra time to manage the
host.
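
If an operator has verified out-of-band that the lost node really is powered
off, the pending fence can be acknowledged manually with something like:

$ stonith_admin --confirm {nodename}

Use that with extreme care, though: confirming a node that is in fact still
running is a good way to corrupt shared data.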

> The surviving node, running the stonith resource for the dead node, tries to
> contact the ipmi device (which is also dead). How does the cluster understand
> that the lost node is really dead and it's not just a network issue?

It cannot.

Thanks,

Dejan

> 
> Thank you.
> 
> -- 
> Regards,
> Alexander Markov
> +79104531955
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org