Re: [ClusterLabs] Previous DC fenced prior to integration

2016-07-22 Thread Andrei Borzenkov
23.07.2016 01:37, Nate Clark wrote:
> Hello,
> 
> I am running pacemaker 1.1.13 with corosync and think I may have
> encountered a startup timing issue on a two-node cluster. I didn't
> notice anything similar to this in the changelogs for 1.1.14 or 1.1.15,
> or in the open bugs.
> 
> The rough outline of what happened:
> 
> Module 1 and 2 running
> Module 1 is DC
> Module 2 shuts down
> Module 1 updates node attributes used by resources
> Module 1 shuts down
> Module 2 starts up
> Module 2 votes itself as DC
> Module 1 starts up
> Module 2 sees module 1 in corosync and notices it has quorum
> Module 2 enters policy engine state.
> Module 2 policy engine decides to fence 1
> Module 2 then continues and starts resources on itself based upon the old state
> 
> For some reason, integration never occurred and module 2 started to
> perform actions based on stale state.
> 
> Here are the full logs:
> Jul 20 16:29:06.376805 module-2 crmd[21969]:   notice: Connecting to
> cluster infrastructure: corosync
> Jul 20 16:29:06.386853 module-2 crmd[21969]:   notice: Could not
> obtain a node name for corosync nodeid 2
> Jul 20 16:29:06.392795 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.403611 module-2 crmd[21969]:   notice: Quorum lost
> Jul 20 16:29:06.409237 module-2 stonith-ng[21965]:   notice: Watching
> for stonith topology changes
> Jul 20 16:29:06.409474 module-2 stonith-ng[21965]:   notice: Added
> 'watchdog' to the device list (1 active devices)
> Jul 20 16:29:06.413589 module-2 stonith-ng[21965]:   notice: Relying
> on watchdog integration for fencing
> Jul 20 16:29:06.416905 module-2 cib[21964]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.417044 module-2 crmd[21969]:   notice:
> pcmk_quorum_notification: Node module-2[2] - state is now member (was
> (null))
> Jul 20 16:29:06.421821 module-2 crmd[21969]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:06.422121 module-2 crmd[21969]:   notice: Notifications disabled
> Jul 20 16:29:06.422149 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:06.422286 module-2 crmd[21969]:   notice: The local CRM
> is operational
> Jul 20 16:29:06.422312 module-2 crmd[21969]:   notice: State
> transition S_STARTING -> S_PENDING [ input=I_PENDING
> cause=C_FSA_INTERNAL origin=do_started ]
> Jul 20 16:29:07.416871 module-2 stonith-ng[21965]:   notice: Added
> 'fence_sbd' to the device list (2 active devices)
> Jul 20 16:29:08.418567 module-2 stonith-ng[21965]:   notice: Added
> 'ipmi-1' to the device list (3 active devices)
> Jul 20 16:29:27.423578 module-2 crmd[21969]:  warning: FSA: Input
> I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Jul 20 16:29:27.424298 module-2 crmd[21969]:   notice: State
> transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
> cause=C_TIMER_POPPED origin=election_timeout_popped ]
> Jul 20 16:29:27.460834 module-2 crmd[21969]:  warning: FSA: Input
> I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> Jul 20 16:29:27.463794 module-2 crmd[21969]:   notice: Notifications disabled
> Jul 20 16:29:27.463824 module-2 crmd[21969]:   notice: Watchdog
> enabled but stonith-watchdog-timeout is disabled
> Jul 20 16:29:27.473285 module-2 attrd[21967]:   notice: Defaulting to
> uname -n for the local corosync node name
> Jul 20 16:29:27.498464 module-2 pengine[21968]:   notice: Relying on
> watchdog integration for fencing
> Jul 20 16:29:27.498536 module-2 pengine[21968]:   notice: We do not
> have quorum - fencing and resource management disabled
> Jul 20 16:29:27.502272 module-2 pengine[21968]:  warning: Node
> module-1 is unclean!
> Jul 20 16:29:27.502287 module-2 pengine[21968]:   notice: Cannot fence
> unclean nodes until quorum is attained (or no-quorum-policy is set to
> ignore)
> Jul 20 16:29:27.503521 module-2 pengine[21968]:   notice: Start
> fence_sbd(module-2 - blocked)
> Jul 20 16:29:27.503539 module-2 pengine[21968]:   notice: Start
> ipmi-1(module-2 - blocked)
> Jul 20 16:29:27.503559 module-2 pengine[21968]:   notice: Start
> SlaveIP(module-2 - blocked)
> Jul 20 16:29:27.503582 module-2 pengine[21968]:   notice: Start
> postgres:0(module-2 - blocked)
> Jul 20 16:29:27.503597 module-2 pengine[21968]:   notice: Start
> ethmonitor:0(module-2 - blocked)
> Jul 20 16:29:27.503618 module-2 pengine[21968]:   notice: Start
> tomcat-instance:0(module-2 - blocked)
> Jul 20 16:29:27.503629 module-2 pengine[21968]:   notice: Start
> ClusterMonitor:0(module-2 - blocked)
> Jul 20 16:29:27.506945 module-2 pengine[21968]:  warning: Calculated
> Transition 0: /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Jul 20 16:29:27.507976 module-2 crmd[21969]:   notice: Initiating
> action 4: monitor fence_sbd_monitor_0 on module-2 (local)
> Jul 20 16:29:27.509282 module-2 crmd[21969]:   noti

Re: [ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread Andrei Borzenkov
23.07.2016 00:07, TEG AMJG wrote:
...

>  Master: kamailioetcclone
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
> notify=true
>   Resource: kamailioetc (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=kamailioetc
>Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
>promote interval=0s timeout=90
> (kamailioetc-promote-interval-0s)
>demote interval=0s timeout=90
> (kamailioetc-demote-interval-0s)
>stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
>monitor interval=10s (kamailioetc-monitor-interval-10s)
...

> 
>  The problem is that when I have only one node online in corosync and start
> the other node to rejoin the cluster, all my resources restart and
> sometimes even migrate to the other node 

Try adding interleave=true to your clone resource.
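
For example (a rough sketch only, using the resource names from the config
you posted; verify against your setup), with pcs that would be something like:

    pcs resource meta kamailioetcclone interleave=true

Your fence_xvm clones already carry interleave=true; the DRBD master/clone
shown above does not.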

> (starting with the promotion changing which node is master and which is
> slave) even though the first node is healthy and I use
> resource-stickiness=200 as a default for all resources inside the cluster.
> 
> I do believe it has something to do with the constraint of promotion that
> happens with DRBD.
> 
> Thank you very much in advance.
> 
> Regards.
> 
> Alejandro
> 
> 
> 


[ClusterLabs] Previous DC fenced prior to integration

2016-07-22 Thread Nate Clark
Hello,

I am running pacemaker 1.1.13 with corosync and think I may have
encountered a startup timing issue on a two-node cluster. I didn't
notice anything similar to this in the changelogs for 1.1.14 or 1.1.15,
or in the open bugs.

The rough outline of what happened:

Module 1 and 2 running
Module 1 is DC
Module 2 shuts down
Module 1 updates node attributes used by resources
Module 1 shuts down
Module 2 starts up
Module 2 votes itself as DC
Module 1 starts up
Module 2 sees module 1 in corosync and notices it has quorum
Module 2 enters policy engine state.
Module 2 policy engine decides to fence 1
Module 2 then continues and starts resources on itself based upon the old state

For some reason, integration never occurred and module 2 started to
perform actions based on stale state.

Here are the full logs:
Jul 20 16:29:06.376805 module-2 crmd[21969]:   notice: Connecting to
cluster infrastructure: corosync
Jul 20 16:29:06.386853 module-2 crmd[21969]:   notice: Could not
obtain a node name for corosync nodeid 2
Jul 20 16:29:06.392795 module-2 crmd[21969]:   notice: Defaulting to
uname -n for the local corosync node name
Jul 20 16:29:06.403611 module-2 crmd[21969]:   notice: Quorum lost
Jul 20 16:29:06.409237 module-2 stonith-ng[21965]:   notice: Watching
for stonith topology changes
Jul 20 16:29:06.409474 module-2 stonith-ng[21965]:   notice: Added
'watchdog' to the device list (1 active devices)
Jul 20 16:29:06.413589 module-2 stonith-ng[21965]:   notice: Relying
on watchdog integration for fencing
Jul 20 16:29:06.416905 module-2 cib[21964]:   notice: Defaulting to
uname -n for the local corosync node name
Jul 20 16:29:06.417044 module-2 crmd[21969]:   notice:
pcmk_quorum_notification: Node module-2[2] - state is now member (was
(null))
Jul 20 16:29:06.421821 module-2 crmd[21969]:   notice: Defaulting to
uname -n for the local corosync node name
Jul 20 16:29:06.422121 module-2 crmd[21969]:   notice: Notifications disabled
Jul 20 16:29:06.422149 module-2 crmd[21969]:   notice: Watchdog
enabled but stonith-watchdog-timeout is disabled
Jul 20 16:29:06.422286 module-2 crmd[21969]:   notice: The local CRM
is operational
Jul 20 16:29:06.422312 module-2 crmd[21969]:   notice: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Jul 20 16:29:07.416871 module-2 stonith-ng[21965]:   notice: Added
'fence_sbd' to the device list (2 active devices)
Jul 20 16:29:08.418567 module-2 stonith-ng[21965]:   notice: Added
'ipmi-1' to the device list (3 active devices)
Jul 20 16:29:27.423578 module-2 crmd[21969]:  warning: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jul 20 16:29:27.424298 module-2 crmd[21969]:   notice: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_TIMER_POPPED origin=election_timeout_popped ]
Jul 20 16:29:27.460834 module-2 crmd[21969]:  warning: FSA: Input
I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
Jul 20 16:29:27.463794 module-2 crmd[21969]:   notice: Notifications disabled
Jul 20 16:29:27.463824 module-2 crmd[21969]:   notice: Watchdog
enabled but stonith-watchdog-timeout is disabled
Jul 20 16:29:27.473285 module-2 attrd[21967]:   notice: Defaulting to
uname -n for the local corosync node name
Jul 20 16:29:27.498464 module-2 pengine[21968]:   notice: Relying on
watchdog integration for fencing
Jul 20 16:29:27.498536 module-2 pengine[21968]:   notice: We do not
have quorum - fencing and resource management disabled
Jul 20 16:29:27.502272 module-2 pengine[21968]:  warning: Node
module-1 is unclean!
Jul 20 16:29:27.502287 module-2 pengine[21968]:   notice: Cannot fence
unclean nodes until quorum is attained (or no-quorum-policy is set to
ignore)
Jul 20 16:29:27.503521 module-2 pengine[21968]:   notice: Start
fence_sbd(module-2 - blocked)
Jul 20 16:29:27.503539 module-2 pengine[21968]:   notice: Start
ipmi-1(module-2 - blocked)
Jul 20 16:29:27.503559 module-2 pengine[21968]:   notice: Start
SlaveIP(module-2 - blocked)
Jul 20 16:29:27.503582 module-2 pengine[21968]:   notice: Start
postgres:0(module-2 - blocked)
Jul 20 16:29:27.503597 module-2 pengine[21968]:   notice: Start
ethmonitor:0(module-2 - blocked)
Jul 20 16:29:27.503618 module-2 pengine[21968]:   notice: Start
tomcat-instance:0(module-2 - blocked)
Jul 20 16:29:27.503629 module-2 pengine[21968]:   notice: Start
ClusterMonitor:0(module-2 - blocked)
Jul 20 16:29:27.506945 module-2 pengine[21968]:  warning: Calculated
Transition 0: /var/lib/pacemaker/pengine/pe-warn-0.bz2
Jul 20 16:29:27.507976 module-2 crmd[21969]:   notice: Initiating
action 4: monitor fence_sbd_monitor_0 on module-2 (local)
Jul 20 16:29:27.509282 module-2 crmd[21969]:   notice: Initiating
action 5: monitor ipmi-1_monitor_0 on module-2 (local)
Jul 20 16:29:27.511839 module-2 crmd[21969]:   notice: Initiating
action 6: monitor ipmi-2_monitor_0 on module-2 (local)
Jul 20 16:29:27.512629 module-2 crmd[2196

Re: [ClusterLabs] Doing reload right

2016-07-22 Thread Ken Gaillot
On 07/21/2016 07:46 PM, Andrew Beekhof wrote:
> On Fri, Jul 22, 2016 at 1:48 AM, Adam Spiers  wrote:
>> Ken Gaillot  wrote:
>>> On 07/20/2016 07:32 PM, Andrew Beekhof wrote:
 On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers  wrote:
> Ken Gaillot  wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload / restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case),

 Puppet creates the same kinds of issues.
 Both seem designed for a magical world full of unrelated servers that
 require no coordination to update.
 Particularly when the timing of an update to some central store (cib,
 database, whatever) needs to be carefully ordered.

 When you say "restart" though, is that a traditional stop/start cycle
 in Pacemaker that also results in all the dependencies being stopped
 too?
>>
>> No, just the service reload or restart without causing any cascading
>> effects in Pacemaker.
>>
 I'm guessing you really want the "atomic reload" kind where nothing
 else is affected because we already have the other style covered by
 crm_resource --restart.
>>>
>>> crm_resource --restart isn't sufficient for his use case because it
>>> affects all clone instances cluster-wide, whereas he needs to reload or
>>> restart (depending on the service) the local instance only.
> 
> Isn't that what I said?  That --restart does a version that he doesn't want?
> 
>> Exactly.
>>
 I propose that we introduce a --force-restart option for crm_resource 
 which:

 1. disables any recurring monitor operations
>>>
>>> None of the other --force-* options disable monitors, so for
>>> consistency, I think we should leave this to the user (or add it for
>>> other --force-*).
> 
> No.  There is no other way to reliably achieve a restart than to
> disable the monitors first so that they don't detect a transient
> state.  Especially if the resource doesn't advertise a restart
> command.

I see your point, --force-{stop,demote,promote} can still complete with
monitors running (even if the cluster reverses it immediately after),
but a stop-start cycle might not even complete before being disrupted.

>>>
 2. calls a native restart action directly on the resource if it
 exists, otherwise calls the native stop+start actions
>>>
>>> What do you mean by native restart action? Systemd restart?
> 
> Whatever the agent supports.

Are you suggesting that pacemaker start checking whether the agent
metadata advertises a "restart" action? Or just assume that certain
resource classes support restart (e.g. systemd) and others don't (e.g. ocf)?

>>>
 3. re-enables the recurring monitor operations regardless of whether
 the reload succeeds, fails, or times out, etc

 No maintenance mode required, and whatever state the resource ends up
 in is re-detected by the cluster in step 3.
>>>
>>> If you're lucky :-)
>>>
>>> The cluster may still mess with the resource even without monitors, e.g.
>>> a dependency fails or a preferred node comes online.
> 
> Can you explain how neither of those results in a restart of the service?

Unless the resource is unmanaged, the cluster could do something like
move it to a different node, disrupting the local force-restart.

Ideally, we'd be able to disable monitors and unmanage the resource for
the duration of the force-restart, but only on the local node.
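
As a rough sketch of the closest thing available today (assuming a
hypothetical resource named my-rsc; note that this unmanages it
cluster-wide, which is exactly the limitation discussed above):

    # unmanage so the cluster does not react (cluster-wide, unfortunately)
    crm_resource --resource my-rsc --meta --set-parameter is-managed --parameter-value false
    # restart the local instance directly via the agent
    crm_resource --resource my-rsc --force-stop
    crm_resource --resource my-rsc --force-start
    # hand control back; the next monitor re-detects the actual state
    crm_resource --resource my-rsc --meta --set-parameter is-managed --parameter-value true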

>>> Maintenance
>>> mode/unmanaging would still be safer (though no --force-* option is
>>> completely safe, besides check).
>>
>> I'm happy with whatever you gurus come up with ;-)  I'm just hoping
>> that it can be made possible to pinpoint an individual resource on an
>> individual node, rather than having to toggle maintenance flag(s)
>> across a whole set of clones, or a whole node.
> 
> Yep.



[ClusterLabs] Active/Passive Cluster restarting resources on healthy node and DRBD issues

2016-07-22 Thread TEG AMJG
Hi

I am having a problem with a very simple Active/Passive cluster using DRBD.

This is my configuration:

Cluster Name: kamcluster
Corosync Nodes:
 kam1vs3 kam2vs3
Pacemaker Nodes:
 kam1vs3 kam2vs3

Resources:
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.206 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  monitor interval=10s (ClusterIP-monitor-interval-10s)
 Resource: ClusterIP2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.0.1.207 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ClusterIP2-start-interval-0s)
  stop interval=0s timeout=20s (ClusterIP2-stop-interval-0s)
  monitor interval=10s (ClusterIP2-monitor-interval-10s)
 Resource: rtpproxycluster (class=systemd type=rtpproxy)
  Operations: monitor interval=10s (rtpproxycluster-monitor-interval-10s)
  stop interval=0s on-fail=fence
(rtpproxycluster-stop-interval-0s)
 Resource: kamailioetcfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/etc/kamailio fstype=ext4
  Operations: start interval=0s timeout=60 (kamailioetcfs-start-interval-0s)
  monitor interval=10s on-fail=fence
(kamailioetcfs-monitor-interval-10s)
  stop interval=0s on-fail=fence
(kamailioetcfs-stop-interval-0s)
 Clone: fence_kam2_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam2_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam2 pcmk_host_list=kam2vs3
   Operations: monitor interval=60s (fence_kam2_xvm-monitor-interval-60s)
 Master: kamailioetcclone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true
  Resource: kamailioetc (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=kamailioetc
   Operations: start interval=0s timeout=240 (kamailioetc-start-interval-0s)
   promote interval=0s timeout=90
(kamailioetc-promote-interval-0s)
   demote interval=0s timeout=90
(kamailioetc-demote-interval-0s)
   stop interval=0s timeout=100 (kamailioetc-stop-interval-0s)
   monitor interval=10s (kamailioetc-monitor-interval-10s)
 Resource: kamailiocluster (class=ocf provider=heartbeat type=kamailio)
  Attributes: listen_address=10.0.1.206 conffile=/etc/kamailio/kamailio.cfg
pidfile=/var/run/kamailio.pid monitoring_ip=10.0.1.206
monitoring_ip2=10.0.1.207 port=5060 proto=udp
kamctlrc=/etc/kamailio/kamctlrc
  Operations: start interval=0s timeout=60
(kamailiocluster-start-interval-0s)
  stop interval=0s on-fail=fence
(kamailiocluster-stop-interval-0s)
  monitor interval=5s (kamailiocluster-monitor-interval-5s)
 Clone: fence_kam1_xvm-clone
  Meta Attrs: interleave=true clone-max=2 clone-node-max=1
  Resource: fence_kam1_xvm (class=stonith type=fence_xvm)
   Attributes: port=tegamjg_kam1 pcmk_host_list=kam1vs3
   Operations: monitor interval=60s (fence_kam1_xvm-monitor-interval-60s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: kamailiocluster
Enabled on: kam1vs3 (score:INFINITY) (role: Started)
(id:cli-prefer-kamailiocluster)
Ordering Constraints:
  start ClusterIP then start ClusterIP2 (kind:Mandatory)
(id:order-ClusterIP-ClusterIP2-mandatory)
  start ClusterIP2 then start rtpproxycluster (kind:Mandatory)
(id:order-ClusterIP2-rtpproxycluster-mandatory)
  start fence_kam2_xvm-clone then promote kamailioetcclone (kind:Mandatory)
(id:order-fence_kam2_xvm-clone-kamailioetcclone-mandatory)
  promote kamailioetcclone then start kamailioetcfs (kind:Mandatory)
(id:order-kamailioetcclone-kamailioetcfs-mandatory)
  start kamailioetcfs then start ClusterIP (kind:Mandatory)
(id:order-kamailioetcfs-ClusterIP-mandatory)
  start rtpproxycluster then start kamailiocluster (kind:Mandatory)
(id:order-rtpproxycluster-kamailiocluster-mandatory)
  start fence_kam1_xvm-clone then start fence_kam2_xvm-clone
(kind:Mandatory)
(id:order-fence_kam1_xvm-clone-fence_kam2_xvm-clone-mandatory)
Colocation Constraints:
  rtpproxycluster with ClusterIP2 (score:INFINITY)
(id:colocation-rtpproxycluster-ClusterIP2-INFINITY)
  ClusterIP2 with ClusterIP (score:INFINITY)
(id:colocation-ClusterIP2-ClusterIP-INFINITY)
  ClusterIP with kamailioetcfs (score:INFINITY)
(id:colocation-ClusterIP-kamailioetcfs-INFINITY)
  kamailioetcfs with kamailioetcclone (score:INFINITY)
(with-rsc-role:Master)
(id:colocation-kamailioetcfs-kamailioetcclone-INFINITY)
  kamailioetcclone with fence_kam2_xvm-clone (score:INFINITY)
(id:colocation-kamailioetcclone-fence_kam2_xvm-clone-INFINITY)
  kamailiocluster with rtpproxycluster (score:INFINITY)
(id:colocation-kamailiocluster-rtpproxycluster-INFINITY)
  fence_kam2_xvm-clone with fence_kam1_xvm-clone (score:INFINITY)
(id:colocation-fence_kam2_xvm-clone-fence_kam1_xvm-clone-INFINITY)

Resources Defaults:
 migration-thre

Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
Great! Thanks for the pointer! Any ideas on the other things I was asking about
(i.e., how to use any backstore other than block with Pacemaker)?

--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/22/16, 12:24 PM, "Andrei Borzenkov"  wrote:

22.07.2016 18:29, Jason A Ramsey wrote:
> From the command line parameters for the pcs resource create or is it
> something internal (not exposed to the user)? If the former, what
> parameter?
> 


http://www.linux-ha.org/doc/dev-guides/_literal_ocf_resource_instance_literal.html

> --
> 
> [ jR ] @: ja...@eramsey.org
> 
> there is no path to greatness; greatness is the path
> 
> On 7/22/16, 11:08 AM, "Andrei Borzenkov" 
> wrote:
> 
> 22.07.2016 17:43, Jason A Ramsey wrote:
>> Additionally (and this is just a failing on my part), I’m unclear
>> as to where the resource agent is fed the value for 
>> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters
>> one is permitted to supply with “pcs resource create…”
>> 
> 
> It is supplied automatically by pacemaker.
> 
> 





[ClusterLabs] [Announce] clufter v0.59.0 released

2016-07-22 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.0
released and published (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


Changelog highlights for v0.59.0:
- this is a feature extension and bug fix release
- bug fixes:
  . previously, pcs2pcscmd* commands would attempt to have quorum
device configured using "pcs quorum add" whereas the correct syntax
is "pcs quorum device add"
  . with {cib,pcs}2pcscmd* commands, clufter no longer chokes on
validation failures (unless --nocheck provided) due to source CIB
file using newer "validate-with" validation version specification
than supported so far, such as with pacemaker-2.5 introducing the
alert handlers stanza in CIB, because the support has been extended
up to that very version (only affects deployments that do not borrow
the schemas from the installed pacemaker on-the-fly during a build
stage, which is not the case when building RPMs using the upstream
specfile)
- feature extensions:
  . {cib,pcs}2pcscmd* commands are now aware of configured alert
handlers in CIB and able to emit respective configuration
commands using pcs tool
- functional changes:
  . due to too many moving targets (corosync, pacemaker, pcs) with
features being gradually added, clufter as of this release
relies on the specified distribution target (which basically boils
down to snapshot of the supported features, as opposed to passing
zillion extra parameters expressing the same) stronger than ever;
this has several implications: do not expect that one sequence
of pcs commands at the clufter's output is portable to completely
different environment, and your distribution/setup may not be
supported (I try to cover Fedora, RHEL+derivates, Debian and Ubuntu
directly) in which case facts.py (where everything is tracked)
needs to be patched

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Andrei Borzenkov
22.07.2016 18:29, Jason A Ramsey wrote:
> From the command line parameters for the pcs resource create or is it
> something internal (not exposed to the user)? If the former, what
> parameter?
> 

http://www.linux-ha.org/doc/dev-guides/_literal_ocf_resource_instance_literal.html

> --
> 
> [ jR ] @: ja...@eramsey.org
> 
> there is no path to greatness; greatness is the path
> 
> On 7/22/16, 11:08 AM, "Andrei Borzenkov" 
> wrote:
> 
> 22.07.2016 17:43, Jason A Ramsey wrote:
>> Additionally (and this is just a failing on my part), I’m unclear
>> as to where the resource agent is fed the value for 
>> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters
>> one is permitted to supply with “pcs resource create…”
>> 
> 
> It is supplied automatically by pacemaker.
> 
> 




Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
From the command line parameters for the pcs resource create or is it something 
internal (not exposed to the user)? If the former, what parameter?

--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path

On 7/22/16, 11:08 AM, "Andrei Borzenkov"  wrote:

22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”
>

It is supplied automatically by pacemaker.




Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker not always selecting the right stonith device

2016-07-22 Thread Andrei Borzenkov
22.07.2016 09:52, Ulrich Windl wrote:
> That could be. Should there be a node list to configure, or can't the agent
> find out itself (for SBD)?
> 

It apparently does it already


gethosts)
    echo `sbd -d $sbd_device list | cut -f2 | sort | uniq`
    exit 0




Re: [ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Andrei Borzenkov
22.07.2016 17:43, Jason A Ramsey wrote:
> Additionally (and this is just a failing on my part), I’m
> unclear as to where the resource agent is fed the value for
> “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters one
> is permitted to supply with “pcs resource create…”
>

It is supplied automatically by pacemaker.
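
For illustration (a sketch only; the resource name and parameter values
here are made up): if you create the resource as

    pcs resource create lun1 ocf:heartbeat:iSCSILogicalUnit \
        target_iqn=iqn.2016-07.org.example:tgt1 lun=1 path=/dev/drbd1

then pacemaker exports OCF_RESOURCE_INSTANCE=lun1 into the agent's
environment, i.e. it is simply the resource id you chose.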



[ClusterLabs] Resource Agent ocf:heartbeat:iSCSILogicalUnit

2016-07-22 Thread Jason A Ramsey
I’m struggling to understand how to fully exploit the capabilities of targetcli 
using the Pacemaker resource agent for iSCSILogicalUnit. From this block of 
code:

lio-t)
    # For lio, we first have to create a target device, then
    # add it to the Target Portal Group as an LU.
    ocf_run targetcli /backstores/block create name=${OCF_RESOURCE_INSTANCE} dev=${OCF_RESKEY_path} || exit $OCF_ERR_GENERIC
    if [ -n "${OCF_RESKEY_scsi_sn}" ]; then
        echo ${OCF_RESKEY_scsi_sn} > /sys/kernel/config/target/core/iblock_${OCF_RESKEY_lio_iblock}/${OCF_RESOURCE_INSTANCE}/wwn/vpd_unit_serial
    fi
    ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/luns create /backstores/block/${OCF_RESOURCE_INSTANCE} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC

    if [ -n "${OCF_RESKEY_allowed_initiators}" ]; then
        for initiator in ${OCF_RESKEY_allowed_initiators}; do
            ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls create ${initiator} add_mapped_luns=False || exit $OCF_ERR_GENERIC
            ocf_run targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/acls/${initiator} create ${OCF_RESKEY_lun} ${OCF_RESKEY_lun} || exit $OCF_ERR_GENERIC
        done
    fi
    ;;

it looks like I’m only permitted to create a block backstore. Critically
missing, in this scenario, is the ability to create fileio backstores on,
say, mounted filesystems backed by drbd. Additionally (and this is just a
failing on my part), I’m unclear as to where the resource agent is fed the
value for “${OCF_RESOURCE_INSTANCE}” given the limited number of parameters
one is permitted to supply with “pcs resource create…”
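
For reference, what I have in mind is roughly the fileio equivalent of the
block calls above (just a sketch; exact targetcli parameter names may differ
between versions):

    targetcli /backstores/fileio create name=${OCF_RESOURCE_INSTANCE} file_or_dev=${OCF_RESKEY_path}
    targetcli /iscsi/${OCF_RESKEY_target_iqn}/tpg1/luns create /backstores/fileio/${OCF_RESOURCE_INSTANCE} ${OCF_RESKEY_lun}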

Can anyone provide any insight please? Thank you in advance!


--
 
[ jR ]
  @: ja...@eramsey.org
 
  there is no path to greatness; greatness is the path



Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-22 Thread Eric Ren

Hello,

On 07/22/2016 02:14 PM, Da Shi Cao wrote:

The manual "Pacemaker 1.1 Clusters from Scratch" gives the false impression that gfs2 
relies only on dlm, but I cannot make it work without gfs_controld. Again this little daemon is 
heavily coupled with cman. I think it is quite hard to use gfs2 in a cluster build only using 
"pacemaker+corosync"! Am I wrong?


A bigger question: why build everything from scratch rather than trying
a recent OS release? Whatever the reason, a distribution that supports
an HA solution is a good starting point. Trust me, it's very easy to set
up a cluster filesystem (gfs2 or ocfs2) on openSUSE or Fedora.
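
For reference, the usual pacemaker+corosync arrangement for gfs2 looks
roughly like this (a sketch only; device, mount point and resource names
are placeholders):

    pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s clone interleave=true ordered=true
    pcs resource create gfs2fs ocf:heartbeat:Filesystem device=/dev/vg_cluster/lv_gfs2 \
        directory=/mnt/gfs2 fstype=gfs2 clone interleave=true
    pcs constraint order start dlm-clone then gfs2fs-clone
    pcs constraint colocation add gfs2fs-clone with dlm-clone

No gfs_controld involved.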


Cheers,
Eric



Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Thursday, July 21, 2016 9:31:51 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

I've built the dlm_tool suite using the source from 
https://git.fedorahosted.org/cgit/dlm.git/log/. The resource using
ocf:pacemaker:controld will always fail to start because of timeout, even if 
start timeout is set to 120s! But if dlm_controld is first started outside the 
cluster management,  then the resource will show up and stay well!

Another question is: what's the difference between dlm_controld and gfs_controld?
Must they both be present if a cluster gfs file system is mounted?

Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Wednesday, July 20, 2016 4:47:31 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

Thank you all for the information about dlm_controld. I will make a try using 
https://git.fedorahosted.org/cgit/dlm.git/log/ .

Dashi Cao


From: Jan Pokorný 
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld


On 18/07/16 07:59, Da Shi Cao wrote:

dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when CMAN logic (specifically the
respective handle-all initscript that is in turn, in that limited use
case, triggered from pacemaker's proper one and, moreover, takes
care of dlm_controld management on its own so any subsequent attempts
to do the same would be ineffective) is _not_ around:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)


I have built a cluster purely with
pacemaker+corosync+fence_sanlock. But if agent
ocf:pacemaker:controld is desired, dlm_controld must exist! I can
only find it in cman.
Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:

There should be a package called 'dlm' that has a dlm_controld suitable
for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:

DLM upstream hosted here:
   https://git.fedorahosted.org/cgit/dlm.git/log/

The name of DLM on openSUSE is libdlm.

--
Jan (Poki)



Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-22 Thread Eric Ren

Hello,

On 07/21/2016 09:31 PM, Da Shi Cao wrote:

I've built the dlm_tool suite using the source from 
https://git.fedorahosted.org/cgit/dlm.git/log/. The resource using
ocf:pacemaker:controld will always fail to start because of timeout, even if 
start timeout is set to 120s! But if dlm_controld is first started outside the 
cluster management,  then the resource will show up and stay well!
1. Why do you suppose it's because of a timeout? Any logs from when the DLM RA
failed to start?
"ocf:pacemaker:controld" is a bash script
(/usr/lib/ocf/resource.d/pacemaker/controld).
If you take a look at this script, you'll find it assumes that
dlm_controld is installed in a certain place (/usr/sbin/dlm_controld for
openSUSE). So, how would the dlm RA find your dlm daemon?
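
A quick way to check (the paths below are the usual ones; adjust for your
build):

    which dlm_controld
    ls -l /usr/sbin/dlm_controld
    grep -n dlm_controld /usr/lib/ocf/resource.d/pacemaker/controld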

Another question is: what's the difference between dlm_controld and gfs_controld?
Must they both be present if a cluster gfs file system is mounted?
2. dlm_controld is a userland daemon for the dlm kernel module, while
gfs2_controld is for gfs2, I think. However, on recent releases
(Red Hat and SUSE, AFAIK), gfs_controld is no longer needed. But I don't
know much of the history behind this change. Hope someone can elaborate
on this a bit more ;-)


Cheers,
Eric



Thanks a lot!
Dashi Cao

From: Da Shi Cao 
Sent: Wednesday, July 20, 2016 4:47:31 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld

Thank you all for the information about dlm_controld. I will make a try using 
https://git.fedorahosted.org/cgit/dlm.git/log/ .

Dashi Cao


From: Jan Pokorný 
Sent: Monday, July 18, 2016 8:47:50 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] agent ocf:pacemaker:controld


On 18/07/16 07:59, Da Shi Cao wrote:

dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when CMAN logic (specifically the
respective handle-all initscript that is in turn, in that limited use
case, triggered from pacemaker's proper one and, moreover, takes
care of dlm_controld management on its own so any subsequent attempts
to do the same would be ineffective) is _not_ around:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)


I have built a cluster purely with
pacemaker+corosync+fence_sanlock. But if agent
ocf:pacemaker:controld is desired, dlm_controld must exist! I can
only find it in cman.
Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:

There should be a package called 'dlm' that has a dlm_controld suitable
for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:

DLM upstream hosted here:
   https://git.fedorahosted.org/cgit/dlm.git/log/

The name of DLM on openSUSE is libdlm.

--
Jan (Poki)



Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-22 Thread Jan Pokorný
On 21/07/16 21:51 +0200, Jan Pokorný wrote:
> Yes, it's counterintuitive to have this asymmetry and it could be
> made to work with some added effort on the pcs side with
> the original, disapproved, sequence as-is, but that's perhaps
> a matter for the future, per the referenced pcs bug.
> So take this idiom as a rule of thumb not to be questioned
> any time soon.

...at least until something better is around:
https://bugzilla.redhat.com/1359057 (open for comments)

-- 
Jan (Poki)

