[ClusterLabs] Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-20 Thread Ulrich Windl
>>> Ken Gaillot wrote on 19.07.2016 at 16:17 in message:

[...]
> You're right -- if not told otherwise, Pacemaker will query the device
> for the target list. In this case, the output of "stonith_admin -l"

In SLES11 SP4 I see the following (surprising) output:
"stonith_admin -l" shows the usage message.
"stonith_admin -l any" shows the configured devices, regardless of whether the 
given name is part of the cluster or not. Even if that host does not exist at 
all, the same list is displayed:
 prm_stonith_sbd:0
 prm_stonith_sbd

Is that the way it's meant to be?
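
(For comparison, the two query forms I would have expected, with an
illustrative node name:

    stonith_admin --list <node>        # same as -l: devices able to fence <node>
    stonith_admin --list-registered    # same as -L: all devices registered on this node

so printing usage for a bare "-l" makes sense, but getting the identical list
for an arbitrary name is the surprising part.)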

> suggests it's not returning the desired information. I'm not familiar
> with the external agents, so I don't know why that would be. I
> mistakenly assumed it worked similarly to fence_ipmilan ...

Regards,
Ulrich



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Zhu Lingshan

Hi Jason,

tcm_node is in a package called lio-utils. If you are on SUSE, you can try 
"zypper in lio-utils".
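
A minimal check after installing (assuming the stock lio-utils packaging):

    zypper install lio-utils
    rpm -ql lio-utils | grep tcm_node   # confirm the package really ships the binary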



Thanks,
BR
Zhu Lingshan

On 07/20/2016 11:08 PM, Jason A Ramsey wrote:

I have been struggling getting a HA iSCSI Target cluster in place for literally 
weeks. I cannot, for whatever reason, get pacemaker to create an 
iSCSILogicalUnit resource. The error message that I’m seeing leads me to 
believe that I’m missing something on the systems (“tcm_node”). Here are my 
setup commands leading up to seeing this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget 
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit 
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 
implementation="lio" op monitor interval=15s


Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
status=complete, exitreason='Setup problem: couldn't find command: tcm_node',
 last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please please please…any ideas are appreciated. I’ve exhausted all avenues of 
investigation at this point and don’t know what to do. Thank you!

--
  
[ jR ]

@: ja...@eramsey.org
  
   there is no path to greatness; greatness is the path


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Andrew Beekhof
On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers  wrote:
> Ken Gaillot  wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case),

Puppet creates the same kinds of issues.
Both seem designed for a magical world full of unrelated servers that
require no co-ordination to update -- particularly when the timing of an
update to some central store (cib, database, whatever) needs to be
carefully ordered.

When you say "restart" though, is that a traditional stop/start cycle
in Pacemaker that also results in all the dependencies being stopped?
I'm guessing you really want the "atomic reload" kind, where nothing
else is affected, because we already have the other style covered by
crm_resource --restart.

I propose that we introduce a --force-restart option for crm_resource which:

1. disables any recurring monitor operations
2. calls a native restart action directly on the resource if it
exists, otherwise calls the native stop+start actions
3. re-enables the recurring monitor operations regardless of whether
the reload succeeds, fails, or times out, etc

No maintenance mode required, and whatever state the resource ends up
in is re-detected by the cluster in step 3.
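
A hypothetical invocation, purely to illustrate the intent (the option does
not exist yet, and the resource/node names are made up):

    crm_resource --resource my-clone --force-restart --node node1

i.e. act on that one instance only, leave all dependencies alone, and let the
re-enabled monitor tell the cluster whatever state it ended up in.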

> when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
>   
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61
>
> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
>
> Cheers,
> Adam
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Ken Gaillot
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot  wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
> 
> [snipped]
> 
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
> 
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
> 
>   
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You
can already restart an individual clone instance, by unmanaging the
resource and disabling any monitors on it, then using crm_resource
--force-* on the desired node.
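
In command form, that sequence looks roughly like this (a sketch only; the
resource name is illustrative, and as noted below, is-managed applies to the
whole resource cluster-wide rather than per node):

    crm_resource -r my-clone --meta -p is-managed -v false   # unmanage (and separately disable any recurring monitors)
    crm_resource -r my-clone --force-stop                    # run the agent's stop directly on this node
    crm_resource -r my-clone --force-start                   # run the agent's start directly on this node
    crm_resource -r my-clone --meta -p is-managed -v true    # manage it again; the cluster re-detects the state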

The problem (for your use case) is that is-managed is cluster-wide for
the given resource. I suspect coming up with a per-node
interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads,
since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
> 
> Cheers,
> Adam
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Ken Gaillot
On 07/20/2016 12:02 PM, Martin Schlegel wrote:
> Thank you Andrei, Ken & Klaus - much appreciated !
> 
> I am now including pcmk_host_list and pcmk_host_check=static-list. 
> 
> The command stonith_admin -l <node> is now showing the right stonith device
> - the one matching the requested <node>, i.e. stonith_admin -l pg1 would
> show only the registered device p_ston_pg1.
> 
> However, could you please have another look - I'd like to understand what I am
> seeing ?
> 
> 1) Why does pg3 have stonith devices registered even though none of the 
> stonith
> resources (p_ston_pg1, p_ston_pg2 or p_ston_pg3) were started on pg3 according
> to the crm_mon output ?
> 2) Why does pg2 have p_ston_pg3 registered although it only runs p_ston_pg1
> according to the crm_mon output ?

Where a fence device is running does not limit what targets it can
fence, or what nodes can execute fencing using the device.

A fence device may be used by any cluster node, regardless of where the
device is running, or even whether it is running at all -- unless you've
explicitly disabled the device in the configuration.

To pacemaker, having a fence device "running" on a node simply means
that the node runs the recurring monitor for the device (if one is
configured). That gives the node "verified" access to the device, and it
will be preferred to execute the fencing, if it's available -- but
another node can execute the fencing if necessary.
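
(For example, to take a device out of use entirely, stop it -- a sketch in
crm shell syntax, which simply sets target-role=Stopped:

    crm resource stop p_ston_pg1

a stopped device is not used for fencing at all, whereas where it happens to
be "running" does not restrict anything.)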

> (see also the detailed output for stonith_admin further below)
> 
> Cheers,
> Martin
> 
> __
> 
> [...]
> primitive p_ston_pg1 stonith:external/ipmi \
> params hostname=pg1 pcmk_host_list=pg1 pcmk_host_check=static-list
> ipaddr=10.148.128.35 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg2 stonith:external/ipmi \
> params hostname=pg2 pcmk_host_list=pg2 pcmk_host_check=static-list
> ipaddr=10.148.128.19 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> 
> primitive p_ston_pg3 stonith:external/ipmi \
> params hostname=pg3 pcmk_host_list=pg3 pcmk_host_check=static-list
> ipaddr=10.148.128.59 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
> [...]
> 
> 
> root@dsvt0-resiliency-test-7:~# crm_mon -1rR
> Last updated: Wed Jul 20 14:36:13 2016 Last change: Wed Jul 20 14:24:19 2016 
> by
> root via cibadmin on pg2
> Stack: corosync
> Current DC: pg2 (2) (version 1.1.14-70404b0) - partition with quorum
> 3 nodes and 25 resources configured
> 
> Online: [ pg1 (1) pg2 (2) pg3 (3) ]
> 
> Full list of resources:
> 
> p_ston_pg1 (stonith:external/ipmi): Started pg2
> p_ston_pg2 (stonith:external/ipmi): Started pg1
> p_ston_pg3 (stonith:external/ipmi): Started pg1
> 
> [...]
> 
> 
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en
> $xnode'\n==\n\n' ; for node in pg{1..3}; do echo -en 'Fence node '\$node'
> with:\n' ; stonith_admin -l \$node ; echo '--' ; done"; done
> pg1
> ==
> 
> Fence node pg1 with:
> No devices found
> --
> Fence node pg2 with:
> 1 devices found
> p_ston_pg2
> --
> Fence node pg3 with:
> 1 devices found
> p_ston_pg3
> --
> pg2
> ==
> 
> Fence node pg1 with:
> 1 devices found
> p_ston_pg1
> --
> Fence node pg2 with:
> No devices found
> --
> Fence node pg3 with:
> 1 devices found
> p_ston_pg3
> --
> pg3
> ==
> 
> Fence node pg1 with:
> 1 devices found
> p_ston_pg1
> --
> Fence node pg2 with:
> 1 devices found
> p_ston_pg2
> --
> Fence node pg3 with:
> No devices found
> --
> 
> 
> 
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en
> $xnode'\n==\n\n' ; stonith_admin -L; echo "; done
> pg1
> ==
> 
> 2 devices found
> p_ston_pg3
> p_ston_pg2
> 
> pg2
> ==
> 
> 2 devices found
> p_ston_pg3
> p_ston_pg1
> 
> pg3
> ==
> 
> 2 devices found
> p_ston_pg1
> p_ston_pg2
> 
> 
> 
>> Andrei Borzenkov  hat am 20. Juli 2016 um 08:26
>> geschrieben:
>>
>> On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel  wrote:
> [...]
>
> primitive p_ston_pg1 stonith:external/ipmi \
> params hostname=pg1 ipaddr=10.148.128.35 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg2 stonith:external/ipmi \
> params hostname=pg2 ipaddr=10.148.128.19 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg3 stonith:external/ipmi \
> params hostname=pg3 ipaddr=10.148.128.59 userid=root
> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
> passwd_method=file interface=lan priv=OPERATOR
>
> location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
> resource-discovery=exclusi

Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Jason A Ramsey
Actually, according to http://linux-iscsi.org/wiki/Lio-utils lio-utils has been 
deprecated and replaced by targetcli.

--
 
[ jR ]
@: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov"  wrote:

On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling getting a HA iSCSI Target cluster in place for 
literally weeks. I cannot, for whatever reason, get pacemaker to create an 
iSCSILogicalUnit resource. The error message that I’m seeing leads me to 
believe that I’m missing something on the systems (“tcm_node”). Here are my 
setup commands leading up to seeing this error message:
> 
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget 
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
> 
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit 
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" 
path=/dev/drbd1 implementation="lio" op monitor interval=15s
> 
> 
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
> 
> This is with the following installed:
> 
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
> 
> Please please please…any ideas are appreciated. I’ve exhausted all 
avenues of investigation at this point and don’t know what to do. Thank you!
> 
> --
>  
> [ jR ]
> @: ja...@eramsey.org
>  
>   there is no path to greatness; greatness is the path
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Jason A Ramsey
Hmm. lio-utils is replaced with targetcli in rhel7. Is there any way to make 
pacemaker use targetcli? I’m starting to wonder if it’s even possible to make 
an HA iSCSI Target with RedHat...
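
(If the installed resource-agents already ship a targetcli-based backend, the
knob would presumably be the agent's implementation parameter -- a guess I
have not verified on el7:

# pcs resource update hdcvbnas_lun0 implementation="lio-t"

where "lio-t" would select targetcli instead of the lio-utils tooling.)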

--
 
[ jR ]
@: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov"  wrote:

On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling getting a HA iSCSI Target cluster in place for 
literally weeks. I cannot, for whatever reason, get pacemaker to create an 
iSCSILogicalUnit resource. The error message that I’m seeing leads me to 
believe that I’m missing something on the systems (“tcm_node”). Here are my 
setup commands leading up to seeing this error message:
> 
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget 
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
> 
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit 
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" 
path=/dev/drbd1 implementation="lio" op monitor interval=15s
> 
> 
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
> 
> This is with the following installed:
> 
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
> 
> Please please please…any ideas are appreciated. I’ve exhausted all 
avenues of investigation at this point and don’t know what to do. Thank you!
> 
> --
>  
> [ jR ]
> @: ja...@eramsey.org
>  
>   there is no path to greatness; greatness is the path
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Martin Schlegel
Thank you Andrei, Ken & Klaus - much appreciated !

I am now including pcmk_host_list and pcmk_host_check=static-list. 

The command stonith_admin -l <node> is now showing the right stonith device
- the one matching the requested <node>, i.e. stonith_admin -l pg1 would
show only the registered device p_ston_pg1.

However, could you please have another look - I'd like to understand what I am
seeing ?

1) Why does pg3 have stonith devices registered even though none of the stonith
resources (p_ston_pg1, p_ston_pg2 or p_ston_pg3) were started on pg3 according
to the crm_mon output ?
2) Why does pg2 have p_ston_pg3 registered although it only runs p_ston_pg1
according to the crm_mon output ?

(see also the detailed output for stonith_admin further below)

Cheers,
Martin

__

[...]
primitive p_ston_pg1 stonith:external/ipmi \
params hostname=pg1 pcmk_host_list=pg1 pcmk_host_check=static-list
ipaddr=10.148.128.35 userid=root
passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
passwd_method=file interface=lan priv=OPERATOR

primitive p_ston_pg2 stonith:external/ipmi \
params hostname=pg2 pcmk_host_list=pg2 pcmk_host_check=static-list
ipaddr=10.148.128.19 userid=root
passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
passwd_method=file interface=lan priv=OPERATOR

primitive p_ston_pg3 stonith:external/ipmi \
params hostname=pg3 pcmk_host_list=pg3 pcmk_host_check=static-list
ipaddr=10.148.128.59 userid=root
passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
passwd_method=file interface=lan priv=OPERATOR
[...]


root@dsvt0-resiliency-test-7:~# crm_mon -1rR
Last updated: Wed Jul 20 14:36:13 2016 Last change: Wed Jul 20 14:24:19 2016 by
root via cibadmin on pg2
Stack: corosync
Current DC: pg2 (2) (version 1.1.14-70404b0) - partition with quorum
3 nodes and 25 resources configured

Online: [ pg1 (1) pg2 (2) pg3 (3) ]

Full list of resources:

p_ston_pg1 (stonith:external/ipmi): Started pg2
p_ston_pg2 (stonith:external/ipmi): Started pg1
p_ston_pg3 (stonith:external/ipmi): Started pg1

[...]


root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en
$xnode'\n==\n\n' ; for node in pg{1..3}; do echo -en 'Fence node '\$node'
with:\n' ; stonith_admin -l \$node ; echo '--' ; done"; done
pg1
==

Fence node pg1 with:
No devices found
--
Fence node pg2 with:
1 devices found
p_ston_pg2
--
Fence node pg3 with:
1 devices found
p_ston_pg3
--
pg2
==

Fence node pg1 with:
1 devices found
p_ston_pg1
--
Fence node pg2 with:
No devices found
--
Fence node pg3 with:
1 devices found
p_ston_pg3
--
pg3
==

Fence node pg1 with:
1 devices found
p_ston_pg1
--
Fence node pg2 with:
1 devices found
p_ston_pg2
--
Fence node pg3 with:
No devices found
--



root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en
$xnode'\n==\n\n' ; stonith_admin -L; echo "; done
pg1
==

2 devices found
p_ston_pg3
p_ston_pg2

pg2
==

2 devices found
p_ston_pg3
p_ston_pg1

pg3
==

2 devices found
p_ston_pg1
p_ston_pg2



> Andrei Borzenkov  hat am 20. Juli 2016 um 08:26
> geschrieben:
> 
> On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel  wrote:
> >> > [...]
> >> >
> >> > primitive p_ston_pg1 stonith:external/ipmi \
> >> > params hostname=pg1 ipaddr=10.148.128.35 userid=root
> >> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
> >> > passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > primitive p_ston_pg2 stonith:external/ipmi \
> >> > params hostname=pg2 ipaddr=10.148.128.19 userid=root
> >> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
> >> > passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > primitive p_ston_pg3 stonith:external/ipmi \
> >> > params hostname=pg3 ipaddr=10.148.128.59 userid=root
> >> > passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
> >> > passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
> >> > resource-discovery=exclusive \
> >> > rule #uname eq pg1 \
> >> > rule #uname eq pg2 \
> >> > rule #uname eq pg3
> >> >
> >> > location l_ston_pg1 p_ston_pg1 -inf: pg1
> >> > location l_ston_pg2 p_ston_pg2 -inf: pg2
> >> > location l_ston_pg3 p_ston_pg3 -inf: pg3
> >>
> >> These constraints prevent each device from running on its intended
> >> target, but they don't limit which nodes each device can fence. For
> >> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
> >> example:
> >>
> >> primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
> >>
> >> Use pcmk_host_list if the fence device needs the node name as known to
> >> the cluster, and pcmk_host_map if you need to translate a node name to
> >> an address the device understands.
> 
> > We used the parameter "hostname". What does it do if not that ?
> 
> hostname is a resource parameter. From pacemaker's point of view it is an
> opaque string and only the resource agent knows how to interpret it.
> 
> 

Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Greg Woods
On Wed, Jul 20, 2016 at 10:09 AM, Andrei Borzenkov 
wrote:

> tcm_node is part of lio-utils. I am not familiar with RedHat packages,
> but I presume that searching for "lio" should reveal something.
>

I checked on both Fedora and CentOS, and there is no such package and no
package provides a file called "tcm_node".  I also looked at rpmfind.net
and the only RPMs I found are for various versions of OpenSUSE. Looks like
something slipped in that is SuSE-specific.
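
(The search, for anyone repeating it -- yum on CentOS 7, dnf on recent
Fedora -- would be along the lines of:

    yum whatprovides '*/tcm_node'
    dnf provides '*/tcm_node'

and, as above, it comes back empty from the stock repositories.)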

--Greg
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Doing reload right

2016-07-20 Thread Adam Spiers
Ken Gaillot  wrote:
> Hello all,
> 
> I've been meaning to address the implementation of "reload" in Pacemaker
> for a while now, and I think the next release will be a good time, as it
> seems to be coming up more frequently.

[snipped]

I don't want to comment directly on any of the excellent points which
have been raised in this thread, but it seems like a good time to make
a plea for easier reload / restart of individual instances of cloned
services, one node at a time.  Currently, if nodes are all managed by
a configuration management system (such as Chef in our case), when the
system wants to perform a configuration run on that node (e.g. when
updating a service's configuration file from a template), it is
necessary to place the entire node in maintenance mode before
reloading or restarting that service on that node.  It works OK, but
can result in ugly effects such as the node getting stuck in
maintenance mode if the chef-client run failed, without any easy way
to track down the original cause.

I went through several design iterations before settling on this
approach, and they are detailed in a lengthy comment here, which may
help you better understand the challenges we encountered:

  
https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Similar challenges are posed during upgrade of Pacemaker-managed
OpenStack infrastructure.

Cheers,
Adam

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Andrei Borzenkov
On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling getting a HA iSCSI Target cluster in place for 
> literally weeks. I cannot, for whatever reason, get pacemaker to create an 
> iSCSILogicalUnit resource. The error message that I’m seeing leads me to 
> believe that I’m missing something on the systems (“tcm_node”). Here are my 
> setup commands leading up to seeing this error message:
> 
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget 
> iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
> 
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit 
> target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" 
> path=/dev/drbd1 implementation="lio" op monitor interval=15s
> 
> 
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
> status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages,
but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
> 
> This is with the following installed:
> 
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
> 
> Please please please…any ideas are appreciated. I’ve exhausted all avenues of 
> investigation at this point and don’t know what to do. Thank you!
> 
> --
>  
> [ jR ]
> @: ja...@eramsey.org
>  
>   there is no path to greatness; greatness is the path
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Setup problem: couldn't find command: tcm_node

2016-07-20 Thread Jason A Ramsey
I have been struggling getting a HA iSCSI Target cluster in place for literally 
weeks. I cannot, for whatever reason, get pacemaker to create an 
iSCSILogicalUnit resource. The error message that I’m seeing leads me to 
believe that I’m missing something on the systems (“tcm_node”). Here are my 
setup commands leading up to seeing this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget 
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit 
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" 
path=/dev/drbd1 implementation="lio" op monitor interval=15s


Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, 
status=complete, exitreason='Setup problem: couldn't find command: tcm_node',
last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please please please…any ideas are appreciated. I’ve exhausted all avenues of 
investigation at this point and don’t know what to do. Thank you!

--
 
[ jR ]
@: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker not always selecting the right stonith device

2016-07-20 Thread Klaus Wenninger
On 07/19/2016 06:54 PM, Andrei Borzenkov wrote:
> On 19.07.2016 19:01, Andrei Borzenkov wrote:
>> On 19.07.2016 18:24, Klaus Wenninger wrote:
>>> On 07/19/2016 04:17 PM, Ken Gaillot wrote:
 On 07/19/2016 09:00 AM, Andrei Borzenkov wrote:
> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot  wrote:
> ...
>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>  params hostname=pg1 ipaddr=10.148.128.35 userid=root
>>> passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>>> passwd_method=file interface=lan priv=OPERATOR
>>>
> ...
>> These constraints prevent each device from running on its intended
>> target, but they don't limit which nodes each device can fence. For
>> that, each device needs a pcmk_host_list or pcmk_host_map entry, for
>> example:
>>
>>primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>
>> Use pcmk_host_list if the fence device needs the node name as known to
>> the cluster, and pcmk_host_map if you need to translate a node name to
>> an address the device understands.
>>
> Is not pacemaker expected by default to query stonith agent instance
> (sorry I do not know proper name for it) for a list of hosts it can
> manage? And external/ipmi should return value of "hostname" patameter
> here? So the question is why it does not work?
 You're right -- if not told otherwise, Pacemaker will query the device
 for the target list. In this case, the output of "stonith_admin -l"
 suggests it's not returning the desired information. I'm not familiar
 with the external agents, so I don't know why that would be. I
 mistakenly assumed it worked similarly to fence_ipmilan ...
>>> guess it worked at the times when pacemaker did fencing via
>>> cluster-glue-code...
>>> A grep for "gethosts" doesn't return much for current pacemaker-sources
>>> apart
>>> from some leftovers in cts.
>> Oh oh ... this sounds like a bug, no?
>>
> Apparently of all cluster-glue agents only ec2 supports both old and new
> variants
>
> gethosts|hostlist|list)
> # List of names we know about
>
> all others use gethosts. Not sure whether it is something to fix in
> pacemaker or cluster-glue.
Haven't dealt with legacy-fencing for a while so degradation of in-memory
information + development in pacemaker create a portion of uncertainty
in what I'm saying ;-)
What you could try is adding "" to
/usr/sbin/fence_legacy
to convince pacemaker to even try asking the external Linux-HA stonith
plugin.
Unfortunately I currently don't have a setup (no cluster-glue stuff) I could
quickly experiment with legacy-fencing.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-20 Thread Da Shi Cao
Thank you all for the information about dlm_controld. I will give it a try using 
https://git.fedorahosted.org/cgit/dlm.git/log/ .

Dashi Cao
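
(Once dlm_controld from that tree is in place, a typical configuration would
be a cloned controld resource -- a sketch in crm shell syntax, names
illustrative:

    primitive dlm ocf:pacemaker:controld op monitor interval=60s timeout=60s
    clone cl_dlm dlm meta interleave=true
)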


From: Jan Pokorný 
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

> On 18/07/16 07:59, Da Shi Cao wrote:
>> dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when the CMAN logic is _not_ around.
(In that limited use case, CMAN's handle-all initscript is triggered
from pacemaker's proper one and, moreover, takes care of dlm_controld
management on its own, so any subsequent attempts to do the same would
be ineffective.)  See:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)

>> I have built a cluster purely with
>> pacemaker+corosync+fence_sanlock. But if agent
>> ocf:pacemaker:controld is desired, dlm_controld must exist! I can
>> only find it in cman.
>> Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:
> There should be a package called 'dlm' that has a dlm_controld suitable
> for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:
> DLM upstream hosted here:
>   https://git.fedorahosted.org/cgit/dlm.git/log/
>
> The name of DLM on openSUSE is libdlm.

--
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org