[ClusterLabs] Resource stop when another resource run on that node

2015-06-30 Thread John Gogu
Hello,
I would like to ask if you have any idea how to accomplish the
following scenario:

2 cluster nodes (node01, node02)
2 different resources (e.g. IP1 runs on node01 and IP2 runs on node02)

I would like to set up a constraint (or another idea) that will shut down
resource IP2 on node02 when IP1 is moved to node02, manually or
automatically by pacemaker.


​Thank you,​

*John Gogu*

M:   +49 (0) 152 ​569 485 26
Skype: ionut.gogu
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] SCSI persistent reservation fencing

2015-06-30 Thread Vladimir-M. Obelic
Hi,

I'm trialling SLES 12 w/ HAE to run a file server cluster consisting of
two nodes. The idea was to use SCSI persistent reservation as a fencing
method using the fence_scsi script from the stonith fence agents. Two
nodes (a, b) are connected via FC to the same LUN, which is then
exported via NFS from the active node only.

The issue is with fence_scsi: crm fails/complains that nodename/key
isn't supplied.

primitive storage-fence stonith:fence_scsi \
    params action=off devices="/dev/mapper/mpath_test" \
    op monitor interval=60s timeout=0s

I end up with:

storage-fence_start_0 on fs009a 'unknown error' (1): call=18,
status=Error, last-rc-change='Wed Jun 17 00:51:40 2015', queued=0ms,
exec=1093ms
storage-fence_start_0 on fs009b 'unknown error' (1): call=18,
status=Error, last-rc-change='Wed Jun 17 00:56:42 2015', queued=0ms,
exec=1101ms

and

2015-06-17T01:34:29.156751+02:00 fs009a stonithd[25547]:  warning:
log_operation: storage-fence:25670 [ ERROR:root:Failed: nodename or
key is required ]
2015-06-17T01:34:29.156988+02:00 fs009a stonithd[25547]:  warning:
log_operation: storage-fence:25670 [  ]
2015-06-17T01:34:29.157234+02:00 fs009a stonithd[25547]:  warning:
log_operation: storage-fence:25670 [ ERROR:root:Please use '-h' for
usage ]
2015-06-17T01:34:29.157460+02:00 fs009a stonithd[25547]:  warning:
log_operation: storage-fence:25670 [  ]

Now if nodename is supplied, it doesn't complain. But then I don't
understand the fencing configuration. Should I set up two
stonith:fence_scsi resources, each "stickied" to one of the two nodes?
How and when is the stonith resource run when fencing should occur,
and what action parameter should be configured for fence_scsi?
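
For illustration, the per-node variant being asked about here might look roughly like this in crm shell syntax. This is only a sketch: the node name fs009a is taken from the logs above, and whether a per-node layout is the right approach is exactly the open question.

  # sketch: one fence_scsi resource per node to be fenced, with nodename supplied
  # (the "nodename or key is required" error goes away once nodename is given)
  primitive storage-fence-a stonith:fence_scsi \
      params action=off devices="/dev/mapper/mpath_test" nodename=fs009a \
      op monitor interval=60s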

According to the fence_scsi resource info page, only the 'action'
parameter is obligatory, while the 'nodename' and 'key' parameters
aren't. Yet without those, it fails. Seems to me this is a bug?
I've seen a similar issue in RHEL: https://access.redhat.com/solutions/1421063

This is an example from RHEL that takes care of the whole thing with no
additional constraints (and it works!):

 pcs stonith create my-scsi-shooter fence_scsi devices=/dev/sda meta provides=unfencing

(https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/s1-unfence-HAAR.html)

Note that SLES12 still uses crm while RHEL uses pcs. Also in SLES the
meta attribute 'provides' doesn't exist. Is there a way to translate
the RHEL pcs command to SLES?


Here is the complete config:
crm config http://pastebin.com/mqxge6jm
corosync.conf http://pastebin.com/M5sr7htC

corosync 2.3.3
pacemaker 1.1.12

Any help appreciated!

Regards,
Vladimir



Re: [ClusterLabs] Resource stop when another resource run on that node

2015-06-30 Thread Andrei Borzenkov
On Tue, Jun 30, 2015 at 2:50 PM, John Gogu  wrote:
> Hello,
> I would like to ask if you have any idea how to accomplish the
> following scenario:
>
> 2 cluster nodes (node01, node02)
> 2 different resources (e.g. IP1 runs on node01 and IP2 runs on node02)
>
> I would like to set up a constraint (or another idea) that will shut down
> resource IP2 on node02 when IP1 is moved to node02, manually or
> automatically by pacemaker.
>

This is called a colocation constraint. See
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-colocation.html#_colocation_properties
for details.
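
A minimal sketch of such a constraint in crm shell syntax, assuming the resources are simply named IP1 and IP2 as in the question (the constraint id is made up for illustration):

  # IP2 must never run on the node where IP1 is running,
  # so IP2 is stopped if IP1 moves onto its node
  colocation ip2-not-with-ip1 -inf: IP2 IP1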



Re: [ClusterLabs] SCSI persistent reservation fencing

2015-06-30 Thread Kristoffer Grönlund
"Vladimir-M. Obelic"  writes:

> Hi,
>
> I'm trialling SLES 12 w/ HAE to run a file server cluster consisting of
> two nodes. The idea was to use SCSI persistent reservation as a fencing
> method using the fence_scsi script from the stonith fence agents. Two
> nodes (a, b) are connected via FC to the same LUN, which is then
> exported via NFS from the active node only.
>
> According to fence_scsi resource info page, only the 'action'
> parameter is obligatory, while the 'nodename' and 'key' parameters
> aren't.
> Yet without those, it fails. Seems to me this is a bug?
> I've seen a similar issue in RHEL: https://access.redhat.com/solutions/1421063

Yes, it looks like you are encountering this precise issue. Please file
an issue with SUSE about this!

>
> This is an example from RHEL that takes care of the whole thing, no
> additional constraints (and it works!)
>
>  pcs stonith create my-scsi-shooter fence_scsi devices=/dev/sda meta
> provides=unfencing
>
> (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/s1-unfence-HAAR.html)
>
> Note that SLES12 still uses crm while RHEL uses pcs. Also in SLES the
> meta attribute 'provides' doesn't exist. Is there a way to translate
> the RHEL pcs command to SLES?

I would recommend against following guides for RHEL when configuring
SLES. While the core software is the same, the versions are different,
they are patched differently and the tools around pacemaker are
different.

The pcs command creates pretty much the same resource as the crm
command. There are two problems, however: the first is that you are
encountering the above bug, and the second is that the "provides" meta
attribute is not known by crmsh.

The meta attribute can still be created, but you will get a
warning.
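
For reference, a rough crm shell equivalent of the pcs command quoted above might look like this; it is only a sketch, and as noted crmsh will warn about the unrecognized "provides" meta attribute:

  primitive my-scsi-shooter stonith:fence_scsi \
      params devices="/dev/sda" \
      meta provides=unfencing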

// Kristoffer

>
>
> Here the complete config:
> crm config http://pastebin.com/mqxge6jm
> corosync.conf http://pastebin.com/M5sr7htC
>
> corosync 2.3.3
> pacemaker 1.1.12
>
> Any help appreciated!
>
> Regards,
> Vladimir
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Which node initiates fencing?

2015-06-30 Thread Jonathan Vargas
Thanks Ken,

We will do our tests.



*Jonathan Vargas Rodríguez*
Founder and Solution Engineer
Alkaid  | Open Source Software

* mail *  jonathan.var...@alkaid.cr
 telf   +506 4001 6259 Ext. 01
 mobi   +506 4001 6259 Ext. 51



   


2015-06-25 8:57 GMT-06:00 Ken Gaillot :

> On 06/24/2015 06:39 PM, Jonathan Vargas wrote:
> > Thanks Ken.
> >
> > It's weird, because we did tests and that did not happen.
> >
> > There is a node (named Z) without stonith/sbd resources assigned at all,
> > but it was the node that sent the fencing request to a crashed node (X).
> >
> > But this error appeared in its logs: "No route to host".
> >
> > It's obvious to us that if SBD isn't running on Z and there is no network
> > access to that crashed node (X), then based on your answer, node Y, which
> > really had access to X via SBD, had to initiate the fencing request. But
> > this did not happen.
> >
> > In addition to this answer, I wonder if I could tell the cluster to avoid
> > sending fencing requests from specific nodes, or the other way around: tell
> > the cluster which nodes are authorized to send fencing requests.
> >
> > Any idea?
>
> Yes, that's exactly what you have to do.
>
> By default, a cluster will be "opt-out" -- any resource can run on any
> node unless you tell it otherwise. (You can change that to "opt-in", but
> for simplicity I'll assume you're using the default.)
>
> The node that "runs" the fencing resource will monitor it, so if only
> certain nodes can monitor the device, you need location constraints. How
> you configure that depends on what tools you are using (pcs, crm or
> low-level), but it's simple: you just say "this resource has this score
> on this node". A score of -INFINITY means "never run this resource on
> this node".
>
> For fencing resources, the cluster also needs to know which hosts the
> device can fence. By default the cluster will ask the fence agent by
> running its "list" command. If that's not sufficient, you can configure
> a static list of hosts that the device can fence. For details see:
>
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_special_treatment_of_stonith_resources
>
>
> > On Jun 24, 2015 1:56 PM, "Ken Gaillot"  wrote:
> >
> >> On 06/24/2015 12:20 PM, Jonathan Vargas wrote:
> >>> Hi there,
> >>>
> >>> We have a 3-node cluster for OCFS2.
> >>>
> >>> When one of the nodes fails, it should be fenced. I noticed sometimes one
> >>> of them is the one who sends the fencing message to the failing node, and
> >>> sometimes it's the other.
> >>>
> >>> How does the cluster decide which of the remaining active nodes will be
> >>> the one to tell the failed node to fence itself?
> >>>
> >>> Thanks.
> >>
> >> Fencing resources are assigned to a node like any other resource, even
> >> though they don't really "run" anywhere. Assuming you've configured a
> >> recurring monitor operation on the resource, that node will monitor the
> >> device to ensure it's available.
> >>
> >> Because that node has "verified" (monitored) access to the device, the
> >> cluster will prefer that node to execute the fencing if possible. So
> >> it's just whatever node happened to be assigned the fencing resource.
> >>
> >> If for any reason the node with verified access can't do it, the cluster
> >> will fall back to any other capable node.
> >>
> >>> *Jonathan Vargas Rodríguez*
> >>> Founder and Solution Engineer
> >>> Alkaid  | Open Source Software
> >>>
> >>> * mail *  jonathan.var...@alkaid.cr
> >>>  telf   +506 4001 6259 Ext. 01
> >>>  mobi   +506 4001 6259 Ext. 51
> >>>
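
For concreteness, the kind of configuration Ken describes above (a location constraint restricting where a fencing resource may run, plus a static list of hosts it can fence) might look roughly like this in crm shell syntax. The agent, resource and node names here are hypothetical:

  # sbd-based fencing device that may fence nodeX and nodeY ...
  primitive fence-sbd stonith:external/sbd \
      params pcmk_host_list="nodeX nodeY"
  # ... but must never run on nodeZ
  location fence-sbd-not-on-z fence-sbd -inf: nodeZ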


[ClusterLabs] Cluster node getting stopped from other node (resending mail)

2015-06-30 Thread Arjun Pandey
Hi,

I am running a 2-node cluster with this config on CentOS 6.5/6.6:

Master/Slave Set: foo-master [foo]
     Masters: [ messi ]
     Stopped: [ ronaldo ]
 eth1-CP    (ocf::pw:IPaddr):   Started messi
 eth2-UP    (ocf::pw:IPaddr):   Started messi
 eth3-UPCP  (ocf::pw:IPaddr):   Started messi

where I have a multi-state resource foo running in master/slave mode, and
the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
colocation constraint for the IP addresses to be colocated with the master.

Sometimes when I set up the cluster, I find that one of the nodes (the
second node that joins) gets stopped, and I find this log:

2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker
Cluster Manager
2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]:   notice:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]:   notice:
do_state_transition: State transition S_PENDING -> S_NOT_DC [
input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]:   notice:
attrd_local_callback: Sending full refresh (origin=crmd)
2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]:   notice:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
*** This looks to be the likely reason ***
2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]:    error:
handle_request: We didn't ask to be shut down, yet our DC is telling us too.
***

2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]:   notice:
do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP
cause=C_HA_MESSAGE
 origin=route_message ]
2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]:   notice:
lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown...
waiting (2 ops remaining)

Based on the logs, pacemaker on the active node was stopping the secondary
cloud every time it joined the cluster. This issue seems similar to
http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error

Packages used:
pacemaker-1.1.12-4.el6.x86_64
pacemaker-libs-1.1.12-4.el6.x86_64
pacemaker-cli-1.1.12-4.el6.x86_64
pacemaker-cluster-libs-1.1.12-4.el6.x86_64
pacemaker-debuginfo-1.1.10-14.el6.x86_64
pcsc-lite-libs-1.5.2-13.el6_4.x86_64
pcs-0.9.90-2.el6.centos.2.noarch
pcsc-lite-1.5.2-13.el6_4.x86_64
pcsc-lite-openct-0.6.19-4.el6.x86_64
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64



Thanks in advance for your help

Regards
Arjun


[ClusterLabs] Resource stop when another resource run on that node

2015-06-30 Thread John Gogu
​Hello,
this is what I have set up, but it is not working 100%:

Online: [ node01hb0 node02hb0 ]
Full list of resources:
 IP1_Vir    (ocf::heartbeat:IPaddr):    Started node01hb0
 IP2_Vir    (ocf::heartbeat:IPaddr):    Started node02hb0


 default-resource-stickiness: 2000


​Location Constraints:
  Resource: IP1_Vir
    Enabled on: node01hb0 (score:1000)

  Resource: IP2_Vir
    Disabled on: node01hb0 (score:-INFINITY)

Colocation Constraints:
  IP2_Vir with IP1_Vir (score:-INFINITY)

When I move the resource IP1_Vir manually from node01hb0 to node02hb0, all is
fine: IP2_Vir is stopped.
When I crash node node01hb0 or stop pacemaker, both resources are stopped.
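
For reference, the constraints listed above could be written in crm shell syntax roughly as follows (a sketch only, reusing the resource and node names shown):

  location ip1-prefers-node01 IP1_Vir 1000: node01hb0
  location ip2-never-on-node01 IP2_Vir -inf: node01hb0
  colocation ip2-not-with-ip1 -inf: IP2_Vir IP1_Vir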


​Regards,
John Gogu​