[ClusterLabs] Re: Pacemaker not always selecting the right stonith device
>>> Ken Gaillot wrote on 19.07.2016 at 16:17 in message: [...]
> You're right -- if not told otherwise, Pacemaker will query the device
> for the target list. In this case, the output of "stonith_admin -l"

In SLES 11 SP4 I see the following (surprising) output:

"stonith_admin -l" shows the usage message.
"stonith_admin -l any" shows the configured devices, regardless of whether the given name is part of the cluster or not. Even if that host does not exist at all, the same list is displayed:

prm_stonith_sbd:0
prm_stonith_sbd

Is that the way it's meant to be?

> suggests it's not returning the desired information. I'm not familiar
> with the external agents, so I don't know why that would be. I
> mistakenly assumed it worked similarly to fence_ipmilan ...

Regards,
Ulrich
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
Hi Jason,

tcm_node is in a package called lio-utils. If this is SUSE, you can try "zypper in lio-utils".

Thanks,
BR
Zhu Lingsan

On 07/20/2016 11:08 PM, Jason A Ramsey wrote:

I have been struggling to get an HA iSCSI target cluster in place for literally weeks. I cannot, for whatever reason, get Pacemaker to create an iSCSILogicalUnit resource. The error message that I'm seeing leads me to believe that I'm missing something on the systems ("tcm_node"). Here are my setup commands leading up to seeing this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s

Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node', last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please, please, please... any ideas are appreciated. I've exhausted all avenues of investigation at this point and don't know what to do. Thank you!

--
[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path
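When a resource agent complains about a missing helper binary such as tcm_node, a quick first check on any distribution is to ask the package manager which package, if any, provides it. A minimal sketch; the exact query syntax depends on the distribution and configured repositories:

    # SUSE / openSUSE
    zypper what-provides tcm_node

    # RHEL / CentOS 7
    yum provides '*/tcm_node'

    # Fedora
    dnf provides '*/tcm_node'

If nothing is returned, the tooling the resource agent expects is simply not packaged for that distribution (see the follow-ups later in this thread).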
Re: [ClusterLabs] Doing reload right
On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers wrote:
> Ken Gaillot wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time. Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case),

Puppet creates the same kinds of issues. Both seem designed for a magical world full of unrelated servers that require no coordination to update -- particularly when the timing of an update to some central store (CIB, database, whatever) needs to be carefully ordered.

When you say "restart", though, is that a traditional stop/start cycle in Pacemaker that also results in all the dependencies being stopped too? I'm guessing you really want the "atomic reload" kind, where nothing else is affected, because we already have the other style covered by crm_resource --restart.

I propose that we introduce a --force-restart option for crm_resource which:

1. disables any recurring monitor operations
2. calls a native restart action directly on the resource if it exists, otherwise calls the native stop+start actions
3. re-enables the recurring monitor operations regardless of whether the reload succeeds, fails, times out, etc.

No maintenance mode required, and whatever state the resource ends up in is re-detected by the cluster in step 3. (A hypothetical invocation is sketched after this message.)

> when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node. It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61
>
> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
>
> Cheers,
> Adam
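If the proposed --force-restart option were implemented as described, the user-visible interface would presumably be a single invocation run on the node hosting the instance (hypothetical usage only; this option does not exist in released crm_resource at the time of this discussion):

    # Hypothetical: restart one resource instance in place on the local node,
    # leaving recurring monitors and dependent resources untouched afterwards.
    crm_resource --resource my-rsc --force-restart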
Re: [ClusterLabs] Doing reload right
On 07/20/2016 11:47 AM, Adam Spiers wrote:
> Ken Gaillot wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time. Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case), when the
> system wants to perform a configuration run on that node (e.g. when
> updating a service's configuration file from a template), it is
> necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node. It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
> https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Wow, that is a lot of hard-earned wisdom. :-)

I don't think the problem is restarting individual clone instances. You can already restart an individual clone instance by unmanaging the resource and disabling any monitors on it, then using crm_resource --force-* on the desired node (a rough sketch follows at the end of this message). The problem (for your use case) is that is-managed is cluster-wide for the given resource. I suspect coming up with a per-node interface/implementation for is-managed would be difficult.

If we implement --force-reload, there won't be a problem with reloads, since unmanaging shouldn't be necessary.

FYI, maintenance mode is supported for Pacemaker Remote nodes as of 1.1.13.

> Similar challenges are posed during upgrade of Pacemaker-managed
> OpenStack infrastructure.
>
> Cheers,
> Adam
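A rough sketch of the manual sequence described above, run on the node hosting the clone instance to be restarted. The resource name is a placeholder, pcs is assumed as the management tool, and for a clone the underlying primitive name may be needed for the --force-* calls:

    # 1. Stop the cluster from reacting to state changes of this resource
    pcs resource unmanage my-rsc
    #    (optionally also disable its recurring monitor, e.g. by setting
    #     enabled=false on the monitor operation)

    # 2. Restart the local instance directly through the resource agent
    crm_resource --resource my-rsc --force-stop
    crm_resource --resource my-rsc --force-start

    # 3. Hand control back to the cluster and let it re-detect the state
    pcs resource manage my-rsc
    pcs resource cleanup my-rsc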
Re: [ClusterLabs] Pacemaker not always selecting the right stonith device
On 07/20/2016 12:02 PM, Martin Schlegel wrote:
> Thank you Andrei, Ken & Klaus - much appreciated !
>
> I am now including pcmk_host_list and pcmk_host_check=static-list.
>
> The command stonith_admin -l <node> is now showing the right stonith device
> - the one matching the requested <node>, i.e. stonith_admin -l pg1 would
> show only the registered device p_ston_pg1.
>
> However, could you please have another look - I'd like to understand what I am
> seeing ?
>
> 1) Why does pg3 have stonith devices registered even though none of the stonith
> resources (p_ston_pg1, p_ston_pg2 or p_ston_pg3) were started on pg3 according
> to the crm_mon output ?
> 2) Why does pg2 have p_ston_pg3 registered although it only runs p_ston_pg1
> according to the crm_mon output ?

Where a fence device is running does not limit what targets it can fence, or what nodes can execute fencing using the device. A fence device may be used by any cluster node, regardless of where the device is running, or even whether it is running at all -- unless you've explicitly disabled the device in the configuration.

To Pacemaker, having a fence device "running" on a node simply means that the node runs the recurring monitor for the device (if one is configured). That gives the node "verified" access to the device, and it will be preferred to execute the fencing, if it's available -- but another node can execute the fencing if necessary.

> (see also the detailed output for stonith_admin further below)
>
> Cheers,
> Martin
>
> __
>
> [...]
> primitive p_ston_pg1 stonith:external/ipmi \
>     params hostname=pg1 pcmk_host_list=pg1 pcmk_host_check=static-list
>     ipaddr=10.148.128.35 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg2 stonith:external/ipmi \
>     params hostname=pg2 pcmk_host_list=pg2 pcmk_host_check=static-list
>     ipaddr=10.148.128.19 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg3 stonith:external/ipmi \
>     params hostname=pg3 pcmk_host_list=pg3 pcmk_host_check=static-list
>     ipaddr=10.148.128.59 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
> [...]
>
> root@dsvt0-resiliency-test-7:~# crm_mon -1rR
> Last updated: Wed Jul 20 14:36:13 2016    Last change: Wed Jul 20 14:24:19 2016 by root via cibadmin on pg2
> Stack: corosync
> Current DC: pg2 (2) (version 1.1.14-70404b0) - partition with quorum
> 3 nodes and 25 resources configured
>
> Online: [ pg1 (1) pg2 (2) pg3 (3) ]
>
> Full list of resources:
>
> p_ston_pg1 (stonith:external/ipmi): Started pg2
> p_ston_pg2 (stonith:external/ipmi): Started pg1
> p_ston_pg3 (stonith:external/ipmi): Started pg1
>
> [...]
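To see which node actually ended up executing a fencing action, regardless of where the device resource was shown as "running", the fencing history can be queried (assuming a stonith_admin version that supports the history option):

    # Show the recorded fencing operations targeting node pg1
    stonith_admin --history pg1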
>
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n==\n\n' ; for node in pg{1..3}; do echo -en 'Fence node '\$node' with:\n' ; stonith_admin -l \$node ; echo '--' ; done"; done
> pg1
> ==
> Fence node pg1 with: No devices found
> --
> Fence node pg2 with: 1 devices found  p_ston_pg2
> --
> Fence node pg3 with: 1 devices found  p_ston_pg3
> --
> pg2
> ==
> Fence node pg1 with: 1 devices found  p_ston_pg1
> --
> Fence node pg2 with: No devices found
> --
> Fence node pg3 with: 1 devices found  p_ston_pg3
> --
> pg3
> ==
> Fence node pg1 with: 1 devices found  p_ston_pg1
> --
> Fence node pg2 with: 1 devices found  p_ston_pg2
> --
> Fence node pg3 with: No devices found
> --
>
> root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n==\n\n' ; stonith_admin -L; echo "; done
> pg1
> ==
> 2 devices found
> p_ston_pg3
> p_ston_pg2
>
> pg2
> ==
> 2 devices found
> p_ston_pg3
> p_ston_pg1
>
> pg3
> ==
> 2 devices found
> p_ston_pg1
> p_ston_pg2
>
>> Andrei Borzenkov wrote on 20 July 2016 at 08:26:
>>
>> On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel wrote:
> [...]
>
> primitive p_ston_pg1 stonith:external/ipmi \
>     params hostname=pg1 ipaddr=10.148.128.35 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg2 stonith:external/ipmi \
>     params hostname=pg2 ipaddr=10.148.128.19 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
>
> primitive p_ston_pg3 stonith:external/ipmi \
>     params hostname=pg3 ipaddr=10.148.128.59 userid=root
>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
>     passwd_method=file interface=lan priv=OPERATOR
>
> location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
> resource-discovery=exclusive
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
Actually, according to http://linux-iscsi.org/wiki/Lio-utils, lio-utils has been deprecated and replaced by targetcli.

--
[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov" wrote:

On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling to get an HA iSCSI target cluster in place for literally weeks. I cannot, for whatever reason, get Pacemaker to create an iSCSILogicalUnit resource. The error message that I'm seeing leads me to believe that I'm missing something on the systems ("tcm_node"). Here are my setup commands leading up to seeing this error message:
>
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
>
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s
>
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages, but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
>
> This is with the following installed:
>
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
>
> Please, please, please... any ideas are appreciated. I've exhausted all avenues of investigation at this point and don't know what to do. Thank you!
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
Hmm. lio-utils is replaced with targetcli in RHEL 7. Is there any way to make Pacemaker use targetcli? (A possible approach is sketched after this message.) I'm starting to wonder if it's even possible to build an HA iSCSI target with Red Hat...

--
[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path

On 7/20/16, 12:09 PM, "Andrei Borzenkov" wrote:

On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling to get an HA iSCSI target cluster in place for literally weeks. I cannot, for whatever reason, get Pacemaker to create an iSCSILogicalUnit resource. The error message that I'm seeing leads me to believe that I'm missing something on the systems ("tcm_node"). Here are my setup commands leading up to seeing this error message:
>
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
>
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s
>
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages, but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
>
> This is with the following installed:
>
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
>
> Please, please, please... any ideas are appreciated. I've exhausted all avenues of investigation at this point and don't know what to do. Thank you!
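One avenue worth checking, although not confirmed in this thread: newer versions of the resource-agents package accept implementation="lio-t" in the iSCSITarget/iSCSILogicalUnit agents, which drives the kernel LIO target through targetcli instead of the retired lio-utils tools. If the installed agents support it, the resource definition from the original post would change roughly like this (a sketch, reusing the original IQN and backing device):

    # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit \
        target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" \
        lun="0" path=/dev/drbd1 implementation="lio-t" \
        op monitor interval=15s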
Re: [ClusterLabs] Pacemaker not always selecting the right stonith device
Thank you Andrei, Ken & Klaus - much appreciated!

I am now including pcmk_host_list and pcmk_host_check=static-list.

The command stonith_admin -l <node> is now showing the right stonith device - the one matching the requested <node>, i.e. stonith_admin -l pg1 would show only the registered device p_ston_pg1.

However, could you please have another look - I'd like to understand what I am seeing?

1) Why does pg3 have stonith devices registered even though none of the stonith resources (p_ston_pg1, p_ston_pg2 or p_ston_pg3) were started on pg3 according to the crm_mon output?
2) Why does pg2 have p_ston_pg3 registered although it only runs p_ston_pg1 according to the crm_mon output?

(see also the detailed output for stonith_admin further below)

Cheers,
Martin

__

[...]
primitive p_ston_pg1 stonith:external/ipmi \
    params hostname=pg1 pcmk_host_list=pg1 pcmk_host_check=static-list
    ipaddr=10.148.128.35 userid=root
    passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
    passwd_method=file interface=lan priv=OPERATOR

primitive p_ston_pg2 stonith:external/ipmi \
    params hostname=pg2 pcmk_host_list=pg2 pcmk_host_check=static-list
    ipaddr=10.148.128.19 userid=root
    passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
    passwd_method=file interface=lan priv=OPERATOR

primitive p_ston_pg3 stonith:external/ipmi \
    params hostname=pg3 pcmk_host_list=pg3 pcmk_host_check=static-list
    ipaddr=10.148.128.59 userid=root
    passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
    passwd_method=file interface=lan priv=OPERATOR
[...]

root@dsvt0-resiliency-test-7:~# crm_mon -1rR
Last updated: Wed Jul 20 14:36:13 2016    Last change: Wed Jul 20 14:24:19 2016 by root via cibadmin on pg2
Stack: corosync
Current DC: pg2 (2) (version 1.1.14-70404b0) - partition with quorum
3 nodes and 25 resources configured

Online: [ pg1 (1) pg2 (2) pg3 (3) ]

Full list of resources:

p_ston_pg1 (stonith:external/ipmi): Started pg2
p_ston_pg2 (stonith:external/ipmi): Started pg1
p_ston_pg3 (stonith:external/ipmi): Started pg1

[...]

root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n==\n\n' ; for node in pg{1..3}; do echo -en 'Fence node '\$node' with:\n' ; stonith_admin -l \$node ; echo '--' ; done"; done
pg1
==
Fence node pg1 with: No devices found
--
Fence node pg2 with: 1 devices found  p_ston_pg2
--
Fence node pg3 with: 1 devices found  p_ston_pg3
--
pg2
==
Fence node pg1 with: 1 devices found  p_ston_pg1
--
Fence node pg2 with: No devices found
--
Fence node pg3 with: 1 devices found  p_ston_pg3
--
pg3
==
Fence node pg1 with: 1 devices found  p_ston_pg1
--
Fence node pg2 with: 1 devices found  p_ston_pg2
--
Fence node pg3 with: No devices found
--

root@test123:~# for xnode in pg{1..3}; do ssh -q $xnode "echo -en $xnode'\n==\n\n' ; stonith_admin -L; echo "; done
pg1
==
2 devices found
p_ston_pg3
p_ston_pg2

pg2
==
2 devices found
p_ston_pg3
p_ston_pg1

pg3
==
2 devices found
p_ston_pg1
p_ston_pg2

> Andrei Borzenkov wrote on 20 July 2016 at 08:26:
>
> On Tue, Jul 19, 2016 at 6:33 PM, Martin Schlegel wrote:
> >> > [...]
> >> >
> >> > primitive p_ston_pg1 stonith:external/ipmi \
> >> >     params hostname=pg1 ipaddr=10.148.128.35 userid=root
> >> >     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
> >> >     passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > primitive p_ston_pg2 stonith:external/ipmi \
> >> >     params hostname=pg2 ipaddr=10.148.128.19 userid=root
> >> >     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG2-ipmipass"
> >> >     passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > primitive p_ston_pg3 stonith:external/ipmi \
> >> >     params hostname=pg3 ipaddr=10.148.128.59 userid=root
> >> >     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG3-ipmipass"
> >> >     passwd_method=file interface=lan priv=OPERATOR
> >> >
> >> > location l_pgs_resources { otherstuff p_ston_pg1 p_ston_pg2 p_ston_pg3 }
> >> >     resource-discovery=exclusive \
> >> >     rule #uname eq pg1 \
> >> >     rule #uname eq pg2 \
> >> >     rule #uname eq pg3
> >> >
> >> > location l_ston_pg1 p_ston_pg1 -inf: pg1
> >> > location l_ston_pg2 p_ston_pg2 -inf: pg2
> >> > location l_ston_pg3 p_ston_pg3 -inf: pg3
> >>
> >> These constraints prevent each device from running on its intended target, but they don't limit which nodes each device can fence. For that, each device needs a pcmk_host_list or pcmk_host_map entry, for example:
> >>
> >>    primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
> >>
> >> Use pcmk_host_list if the fence device needs the node name as known to the cluster, and pcmk_host_map if you need to translate a node name to an address the device understands.
> >
> > We used the parameter "hostname". What does it do if not that?
>
> hostname is a resource parameter. From Pacemaker's point of view this is an opaque string, and only the resource agent knows how to interpret it.
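To make the pcmk_host_map suggestion quoted above concrete, one of the primitives could look roughly like this (a sketch only; pg1-ipmi.example.com stands in for whatever address the node's IPMI interface actually answers to, and the remaining parameters are copied from the configuration above):

    primitive p_ston_pg1 stonith:external/ipmi \
        params hostname=pg1 \
        pcmk_host_map="pg1:pg1-ipmi.example.com" pcmk_host_check=static-list \
        ipaddr=10.148.128.35 userid=root \
        passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass" \
        passwd_method=file interface=lan priv=OPERATOR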
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
On Wed, Jul 20, 2016 at 10:09 AM, Andrei Borzenkov wrote:
> tcm_node is part of lio-utils. I am not familiar with RedHat packages,
> but I presume that searching for "lio" should reveal something.

I checked on both Fedora and CentOS, and there is no such package, and no package provides a file called "tcm_node". I also looked at rpmfind.net, and the only RPMs I found are for various versions of openSUSE. Looks like something slipped in that is SUSE-specific.

--Greg
Re: [ClusterLabs] Doing reload right
Ken Gaillot wrote:
> Hello all,
>
> I've been meaning to address the implementation of "reload" in Pacemaker
> for a while now, and I think the next release will be a good time, as it
> seems to be coming up more frequently.

[snipped]

I don't want to comment directly on any of the excellent points which have been raised in this thread, but it seems like a good time to make a plea for easier reload / restart of individual instances of cloned services, one node at a time.

Currently, if nodes are all managed by a configuration management system (such as Chef in our case), when the system wants to perform a configuration run on that node (e.g. when updating a service's configuration file from a template), it is necessary to place the entire node in maintenance mode before reloading or restarting that service on that node. It works OK, but can result in ugly effects such as the node getting stuck in maintenance mode if the chef-client run failed, without any easy way to track down the original cause.

I went through several design iterations before settling on this approach, and they are detailed in a lengthy comment here, which may help you better understand the challenges we encountered:

https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61

Similar challenges are posed during upgrade of Pacemaker-managed OpenStack infrastructure.

Cheers,
Adam
Re: [ClusterLabs] Setup problem: couldn't find command: tcm_node
On 20.07.2016 18:08, Jason A Ramsey wrote:
> I have been struggling to get an HA iSCSI target cluster in place for
> literally weeks. I cannot, for whatever reason, get Pacemaker to create an
> iSCSILogicalUnit resource. The error message that I'm seeing leads me to
> believe that I'm missing something on the systems ("tcm_node"). Here are my
> setup commands leading up to seeing this error message:
>
> # pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s
>
> # pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s
>
> Failed Actions:
> * hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321,
> status=complete, exitreason='Setup problem: couldn't find command: tcm_node',

tcm_node is part of lio-utils. I am not familiar with RedHat packages, but I presume that searching for "lio" should reveal something.

> last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms
>
> This is with the following installed:
>
> pacemaker-cli-1.1.13-10.el7.x86_64
> pacemaker-1.1.13-10.el7.x86_64
> pacemaker-libs-1.1.13-10.el7.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7.x86_64
> corosynclib-2.3.4-7.el7.x86_64
> corosync-2.3.4-7.el7.x86_64
>
> Please, please, please... any ideas are appreciated. I've exhausted all avenues of
> investigation at this point and don't know what to do. Thank you!
>
> --
>
> [ jR ]
> @: ja...@eramsey.org
>
> there is no path to greatness; greatness is the path
[ClusterLabs] Setup problem: couldn't find command: tcm_node
I have been struggling to get an HA iSCSI target cluster in place for literally weeks. I cannot, for whatever reason, get Pacemaker to create an iSCSILogicalUnit resource. The error message that I'm seeing leads me to believe that I'm missing something on the systems ("tcm_node"). Here are my setup commands leading up to seeing this error message:

# pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

# pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" path=/dev/drbd1 implementation="lio" op monitor interval=15s

Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=321, status=complete, exitreason='Setup problem: couldn't find command: tcm_node', last-rc-change='Wed Jul 20 10:51:15 2016', queued=0ms, exec=32ms

This is with the following installed:

pacemaker-cli-1.1.13-10.el7.x86_64
pacemaker-1.1.13-10.el7.x86_64
pacemaker-libs-1.1.13-10.el7.x86_64
pacemaker-cluster-libs-1.1.13-10.el7.x86_64
corosynclib-2.3.4-7.el7.x86_64
corosync-2.3.4-7.el7.x86_64

Please, please, please... any ideas are appreciated. I've exhausted all avenues of investigation at this point and don't know what to do. Thank you!

--

[ jR ]
@: ja...@eramsey.org

there is no path to greatness; greatness is the path
Re: [ClusterLabs] Pacemaker not always selecting the right stonith device
On 07/19/2016 06:54 PM, Andrei Borzenkov wrote:
> 19.07.2016 19:01, Andrei Borzenkov wrote:
>> 19.07.2016 18:24, Klaus Wenninger wrote:
>>> On 07/19/2016 04:17 PM, Ken Gaillot wrote:

On 07/19/2016 09:00 AM, Andrei Borzenkov wrote:
> On Tue, Jul 19, 2016 at 4:52 PM, Ken Gaillot wrote:
> ...
>>> primitive p_ston_pg1 stonith:external/ipmi \
>>>     params hostname=pg1 ipaddr=10.148.128.35 userid=root
>>>     passwd="/var/vcap/data/packages/pacemaker/ra-tmp/stonith/PG1-ipmipass"
>>>     passwd_method=file interface=lan priv=OPERATOR
>>>
> ...
>> These constraints prevent each device from running on its intended target, but they don't limit which nodes each device can fence. For that, each device needs a pcmk_host_list or pcmk_host_map entry, for example:
>>
>>    primitive p_ston_pg1 ... pcmk_host_map=pg1:pg1.ipmi.example.com
>>
>> Use pcmk_host_list if the fence device needs the node name as known to the cluster, and pcmk_host_map if you need to translate a node name to an address the device understands.
>>
> Is not Pacemaker expected by default to query the stonith agent instance (sorry, I do not know the proper name for it) for a list of hosts it can manage? And external/ipmi should return the value of the "hostname" parameter here? So the question is why it does not work?

You're right -- if not told otherwise, Pacemaker will query the device for the target list. In this case, the output of "stonith_admin -l" suggests it's not returning the desired information. I'm not familiar with the external agents, so I don't know why that would be. I mistakenly assumed it worked similarly to fence_ipmilan ...

>>> guess it worked at the times when pacemaker did fencing via cluster-glue code...
>>> A grep for "gethosts" doesn't return much for current pacemaker sources apart from some leftovers in cts.

>> Oh oh ... this sounds like a bug, no?

> Apparently, of all cluster-glue agents, only ec2 supports both old and new variants:
>
>     gethosts|hostlist|list)
>     # List of names we know about
>
> all others use gethosts. Not sure whether it is something to fix in pacemaker or cluster-glue.

Haven't dealt with legacy fencing for a while, so degradation of in-memory information plus development in Pacemaker creates a portion of uncertainty in what I'm saying ;-) What you could try is adding "" to /usr/sbin/fence_legacy to convince Pacemaker to even try asking the external Linux-HA stonith plugin. Unfortunately I currently don't have a setup (no cluster-glue stuff) with which I could quickly experiment with legacy fencing.
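For readers unfamiliar with the cluster-glue interface being discussed: external/* stonith plugins are shell scripts that receive the requested action as their first argument, and the quoted ec2 pattern simply accepts both the old and the new action names in one case branch. A minimal sketch of that dispatch, with everything except the host-listing branch stubbed out (illustrative only, not taken from any particular plugin):

    #!/bin/sh
    # $hostname is the plugin parameter passed in via the environment.
    case "$1" in
        gethosts|hostlist|list)
            # Report which node(s) this plugin instance can fence.
            echo "$hostname"
            exit 0
            ;;
        on|off|reset)
            # ... actual power control (e.g. via ipmitool) would go here ...
            exit 0
            ;;
        *)
            exit 1
            ;;
    esac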
Re: [ClusterLabs] agent ocf:pacemaker:controld
Thank you all for the information about dlm_controld. I will give https://git.fedorahosted.org/cgit/dlm.git/log/ a try.

Dashi Cao

From: Jan Pokorný
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

> On 18/07/16 07:59, Da Shi Cao wrote:
>> dlm_controld is very tightly coupled with cman.

Wrong assumption. In fact, support for shipping ocf:pacemaker:controld has been explicitly restricted to cases when CMAN logic is _not_ around (specifically, the respective handle-all initscript that is, in that limited use case, triggered from pacemaker's proper one and, moreover, takes care of dlm_controld management on its own, so any subsequent attempts to do the same would be ineffective):

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)

>> I have built a cluster purely with pacemaker+corosync+fence_sanlock. But if agent
>> ocf:pacemaker:controld is desired, dlm_controld must exist! I can only find it in cman.
>> Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:
> There should be a package called 'dlm' that has a dlm_controld suitable
> for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:
> DLM upstream hosted here:
> https://git.fedorahosted.org/cgit/dlm.git/log/
>
> The name of DLM on openSUSE is libdlm.

--
Jan (Poki)
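Before building from source, it may be worth checking whether the distribution already ships a standalone dlm_controld: the quoted replies mention a 'dlm' package (RHEL/CentOS) and 'libdlm' (openSUSE). A quick check, assuming one of those distributions:

    yum info dlm          # RHEL / CentOS
    zypper info libdlm    # openSUSE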