[ClusterLabs] [Announce] clufter-0.57.0 (+0.56.3) released

2016-07-01 Thread Jan Pokorný
I am happy to announce that clufter-0.57.0, a tool/library for
transforming/analyzing cluster configuration formats, has been
released and published (incl. a signature using my 60BCBB4F5CD7F9EF key,
whose expiration was extended just a few days ago, so you may want to
consult the key servers first):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


[interpolate the same for v0.56.3 that arrived just a little bit earlier
and is detailed below as well for completeness]

Changelog highlights for v0.56.3:
- this is a bug fix release
- bug fixes:
  . with *2pcscmd* commands, clufter no longer suggests
"pcs cluster cib <file> --config", which currently does not work with
the subsequent local-modification pcs commands (their very purpose in
this context, together with the sequence-crowning cib-push), so plain
"pcs cluster cib <file>" is suggested instead (see the sketch after
this list)
  . clufter no longer fails on validation (performed unless --nocheck is
provided) when the source CIB file uses a newer "validate-with"
validation version specification than the only one supported so far
(pacemaker-1.2.rng), or possibly a syntax not compatible with it; the
2.0, 2.3 and 2.4 schema versions are now also supported, and the
specfile is ready to borrow the schemas on the fly from the installed
pacemaker during the build stage
[resolves: rhbz#1328078]
  . with [cp]cs2pcscmd commands, clufter no longer suggests
"pcs cluster start --all --wait=-1" as part of the emitted command
sequence (the last option decides, through a failure, whether pcs
accepts a numeric argument there, which then lets the rest of the
sequence use this recent, more elegant pcs feature instead of "sleep")
without suppressing both standard and error outputs, so as to prevent
unnecessary clutter with newer, compatible versions of pcs
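
For illustration, the overall shape of a *2pcscmd-emitted sequence is
roughly the following sketch (the temporary file name and the resource
line are placeholders, not literal clufter output):

  pcs cluster cib tmp-cib.xml
  pcs -f tmp-cib.xml resource create vip ocf:heartbeat:IPaddr2 ip=192.168.100.50
  pcs cluster cib-push tmp-cib.xml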

Changelog highlights for v0.57.0:
- this is a feature extension and bug fix release
- bug fixes:
  . with *2pcscmd* commands, clufter would previously emit a doubled
"pcs" at the beginning of the command defining a simple order
constraint
  . with *2pcscmd* commands, clufter would previously omit the and/or
logic operators between each pair of atomic expressions forming
a rule of a location constraint
  . with *2pcscmd* commands, clufter would previously disregard
master/slave roles correctly encoded with a capitalized first
letter in the CIB for colocation and location constraints
- feature extensions:
  . with *2pcscmd* commands, clufter now supports resource sets
for colocation and order constraints (see the sketch below)
  . with *2pcscmd* commands, clufter now supports ticket constraints
(incl. resource sets)
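
For illustration, constraints of these kinds map onto pcs resource-set
commands roughly as sketched below (resource and ticket names are
placeholders, and the ticket-set form in particular is an assumption
about the pcs syntax rather than verified clufter output):

  pcs constraint order set rA rB sequential=true
  pcs constraint colocation set rA rB setoptions score=INFINITY
  pcs constraint ticket set rA rB setoptions ticket=ticketA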

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic archives by GitHub preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,

(rather than ).


Happy clustering/high-availing :)

-- 
Jan (Poki)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] corosync just listening on 127.0.0.1

2016-07-01 Thread Lentes, Bernd
Hi,

I'm currently setting up a two-node cluster and playing around with it.
I have two nodes, and both have a bond device. It is intended for DRBD, MySQL
replication and the inter-cluster communication.
Each bond has a private IP address (192.168.100.xxx).

This is my setup:

sunhb58820-2:/etc/corosync # ifconfig
bond1 Link encap:Ethernet  HWaddr 28:80:23:A3:F0:23
  inet addr:192.168.100.20  Bcast:192.168.100.255  Mask:255.255.255.0
  inet6 addr: fe80::2a80:23ff:fea3:f023/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
  RX packets:56053 errors:0 dropped:0 overruns:0 frame:0
  TX packets:53217 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:4589704 (4.3 Mb)  TX bytes:4219706 (4.0 Mb)

eth0  Link encap:Ethernet  HWaddr 68:B5:99:C2:4A:37
  inet addr:146.107.235.161  Bcast:146.107.235.255  Mask:255.255.255.0
  inet6 addr: fe80::6ab5:99ff:fec2:4a37/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:5433800 errors:0 dropped:723509 overruns:0 frame:0
  TX packets:26076 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:509817106 (486.1 Mb)  TX bytes:6132823 (5.8 Mb)
  Interrupt:17


sunhb58820-2:/etc/corosync # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric RefUse Iface
0.0.0.0 146.107.235.1   0.0.0.0 UG0  00 eth0
127.0.0.0   0.0.0.0 255.0.0.0   U 0  00 lo
146.107.235.0   0.0.0.0 255.255.255.0   U 0  00 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0  00 eth0
192.168.100.0   0.0.0.0 255.255.255.0   U 0  00 bond1


The bond devices of the two nodes are directly connected to each other.
Replication and DRBD are already working fine.

This is my corosync.conf:

...
interface {

ringnumber: 0
bindnetaddr: 192.168.1.0
mcastport: 5406
# mcastaddr: 239.255.1.1

member {
memberaddr: 192.168.100.10
}
member {
memberaddr: 192.168.100.20
}
}
transport: udpu

# interface {
#   bindnetaddr: 192.168.100.0
#   # mcastaddr: 225.94.1.1
#   broadcast: yes
#   mcastport: 5406
#   ringnumber: 0
# }
...
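
For comparison, a hedged sketch of the same interface section with
bindnetaddr matching the 192.168.100.0/24 network that the bond actually
carries (the value is inferred from the ifconfig/route output above and
is an assumption, not a verified fix):

interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastport: 5406

        member {
                memberaddr: 192.168.100.10
        }
        member {
                memberaddr: 192.168.100.20
        }
}
transport: udpu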


sunhb58820-2:/etc/corosync # netstat -anp|grep 540
udp0  0 127.0.0.1:5406  0.0.0.0:*   
9023/corosync
unix  3  [ ] STREAM CONNECTED 16540  6465/master


Corosync is only listening on the localhost device, not on the bond.
The same happens on the other node. The ports in the firewall are open:

sunhb58820-2:/etc/corosync # iptables -nvL|grep 540
0 0 LOGtcp  --  *  *   0.0.0.0/00.0.0.0/0   
 limit: avg 3/min burst 5 tcp dpt:5404 flags:0x17/0x02 LOG flags 6 
level 4 prefix "SFW2-INext-ACC-TCP "
0 0 ACCEPT tcp  --  *  *   0.0.0.0/00.0.0.0/0   
 tcp dpt:5404
0 0 LOGtcp  --  *  *   0.0.0.0/00.0.0.0/0   
 limit: avg 3/min burst 5 tcp dpt:5405 flags:0x17/0x02 LOG flags 6 
level 4 prefix "SFW2-INext-ACC-TCP "
0 0 ACCEPT tcp  --  *  *   0.0.0.0/00.0.0.0/0   
 tcp dpt:5405
0 0 LOGtcp  --  *  *   0.0.0.0/00.0.0.0/0   
 limit: avg 3/min burst 5 tcp dpt:5406 flags:0x17/0x02 LOG flags 6 
level 4 prefix "SFW2-INext-ACC-TCP "
0 0 ACCEPT tcp  --  *  *   0.0.0.0/00.0.0.0/0   
 tcp dpt:5406


sunhb58820-2:/etc/corosync # corosync-cfgtool -s
Printing ring status.
Local node ID 2130706433
Could not get the ring status, the error is: 6




Any ideas?

Thanks.


Bernd



-- 
Bernd Lentes 

Systemadministration 
institute of developmental genetics 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum München 
bernd.len...@helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 

Those who believe that project managers manage projects
also believe that brimstone butterflies (Zitronenfalter)
fold lemons
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Dr. Alfons Enhsen, Renate Schlusen 
(komm.)
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671



Re: [ClusterLabs] Sticky resource not sticky after unplugging network cable

2016-07-01 Thread Ken Gaillot
On 07/01/2016 02:13 AM, Auer, Jens wrote:
> Hi,
> 
> I have an active/passive cluster configuration and I am trying to make a
> virtual IP resource sticky such that it does not move back to a node
> after a fail-over. In my setup, I have a location preference for the
> virtual IP to the "primary" node:
> pcs resource show --full
>  Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
>   Meta Attrs: stickiniess=201
>   Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
>   stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
>   monitor interval=30s (mda-ip-monitor-interval-30s)
>  Master: drbd1_sync
>   Meta Attrs: master-max=1 master-node-max=1 clone-max=2
> clone-node-max=1 notify=true
>   Resource: drbd1 (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=shared_fs
>Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
>promote interval=0s timeout=90 (drbd1-promote-interval-0s)
>demote interval=0s timeout=90 (drbd1-demote-interval-0s)
>stop interval=0s timeout=100 (drbd1-stop-interval-0s)
>monitor interval=60s (drbd1-monitor-interval-60s)
>  Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
>   Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
>   stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
>   monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)
>  Resource: PF-PEP (class=ocf provider=pfpep type=pfpep_clusterSwitch)
>   Operations: start interval=0s timeout=20 (PF-PEP-start-interval-0s)
>   stop interval=0s timeout=20 (PF-PEP-stop-interval-0s)
>   monitor interval=10 timeout=20 (PF-PEP-monitor-interval-10)
>  Clone: supervisor-clone
>   Resource: supervisor (class=ocf provider=pfpep type=pfpep_supervisor)
>Operations: start interval=0s timeout=20 (supervisor-start-interval-0s)
>stop interval=0s timeout=20 (supervisor-stop-interval-0s)
>monitor interval=10 timeout=20
> (supervisor-monitor-interval-10)
>  Clone: snmpAgent-clone
>   Resource: snmpAgent (class=ocf provider=pfpep type=pfpep_snmpAgent)
>Operations: start interval=0s timeout=20 (snmpAgent-start-interval-0s)
>stop interval=0s timeout=20 (snmpAgent-stop-interval-0s)
>monitor interval=10 timeout=20
> (snmpAgent-monitor-interval-10)
> 
> Location Constraints:
>   Resource: mda-ip
> Enabled on: MDA1PFP (score:50) (id:location-mda-ip-MDA1PFP-50)
> Ordering Constraints:
>   promote drbd1_sync then start shared_fs (kind:Mandatory)
> (id:order-drbd1_sync-shared_fs-mandatory)
>   start shared_fs then start PF-PEP (kind:Mandatory)
> (id:order-shared_fs-PF-PEP-mandatory)
>   start snmpAgent-clone then start supervisor-clone (kind:Optional)
> (id:order-snmpAgent-clone-supervisor-clone-Optional)
>   start shared_fs then start snmpAgent-clone (kind:Optional)
> (id:order-shared_fs-snmpAgent-clone-Optional)
> Colocation Constraints:
>   mda-ip with drbd1_sync (score:INFINITY) (with-rsc-role:Master)
> (id:colocation-mda-ip-drbd1_sync-INFINITY)
>   shared_fs with drbd1_sync (score:INFINITY) (with-rsc-role:Master)
> (id:colocation-shared_fs-drbd1_sync-INFINITY)
>   PF-PEP with mda-ip (score:INFINITY) (id:colocation-PF-PEP-mda-ip-INFINITY)
> 
> pcs resource defaults
> resource-stickiness: 100
> 
> I use the virtual IP as a master resource and colocate everything else
> with it. The resource prefers one node with a score of 50, and the
> stickiness is 100 so I expect that after switching to the passive node
> and activating the primary node again the resource stays on the passive
> node. This works fine if I manually stop the primary node with pcs
> cluster stop. However, when I try to force a fail-over by unplugging the
> network cables of the primary node and then, after waiting, plug the
> cables back in, the resource moves back to the primary node.
> 
> I tried larger stickiness values, and also setting a meta
> resource-stickiness property on the resource itself, but it did not
> change anything. How do I configure this?

Your "mda-ip with drbd1_sync master" colocation constraint has a score
of INFINITY, so it takes precedence over stickiness. Once drbd1_sync is
promoted on a node, mda-ip will move to it regardless of stickiness.
Perhaps what you want is the location preference to refer to drbd1_sync
master instead of mda-ip.
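
For illustration, a hedged sketch of that change with pcs (the
constraint id is taken from the listing above; the rule syntax follows
pcs 0.9.x and may differ in other versions):

  pcs constraint remove location-mda-ip-MDA1PFP-50
  pcs constraint location drbd1_sync rule role=master score=50 '#uname' eq MDA1PFP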

> Best wishes,
>   Jens
> 
> --
> *Jens Auer *| CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> _jens.auer@cgi.com_ 
> Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie
> unter _de.cgi.com/pflichtangaben_ .
>  
> CONFIDENTIALITY NOTICE: 

Re: [ClusterLabs] Antw: Doing reload right

2016-07-01 Thread Ken Gaillot
On 07/01/2016 04:48 AM, Jan Pokorný wrote:
> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
> Ken Gaillot wrote on 30.06.2016 at 18:58 in message
>> <57754f9f.8070...@redhat.com>:
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>>
>>> In the current implementation, Pacemaker considers a resource parameter
>>> "reloadable" if the resource agent supports the "reload" action, and the
>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>> parameters get changed in the resource's pacemaker configuration,
>>> pacemaker will call the agent's reload action rather than the
>>> stop-then-start it usually does for parameter changes.
>>>
>>> This is completely broken for two reasons:
>>
>> I agree ;-)
>>
>>>
>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>> originally intended (and is widely used by existing resource agents) as
>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>> instance. That is, two resource instances should never have the same
>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>> that (for example) the path to a binary command would have unique=0 --
>>> multiple resource instances could (and likely would) use the same
>>> binary. However, such a parameter could never be reloadable.
>>
>> I thought unique=0 was reloadable (unique=1 was not)...

Correct. By "could never be reloadable", I mean that if someone changes
the location of the daemon binary, there's no way the agent could change
that with anything other than a full restart. So using unique=0 to
indicate reloadable doesn't make sense.
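
As a purely illustrative sketch (the "reloadable" attribute is only
proposed in this thread and is not part of any released OCF schema; the
parameter names are made up), the agent metadata might then read:

  <parameter name="binary" unique="0" reloadable="0">
    <shortdesc lang="en">Path to the daemon binary</shortdesc>
    <content type="string" default="/usr/sbin/somed"/>
  </parameter>
  <parameter name="loglevel" unique="0" reloadable="1">
    <shortdesc lang="en">Verbosity, applicable without a restart</shortdesc>
    <content type="string" default="info"/>
  </parameter>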

> I see a doubly-distorted picture here:
> - actually "unique=1" on a RA parameter (together with this RA supporting
>   "reload") currently leads to reload-on-change
> - also, the provided example shows why reload for "unique=0" is wrong,
>   but since the opposite applies in the current state, it's not an argument
>   for why something is broken
> 
> See also:
> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

Nope, unique=1 is used for the *restart* list -- the non-reloadable
parameters.

>>> 2. Every known resource agent that implements a reload action does so
>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>> *pacemaker* configuration, while all known RAs use reload for a
>>> service's native reload capability of its own configuration file. As an
>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>> action, which will have zero effect on any pacemaker-configured
>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>> UI sense, and none of those parameters are actually reloadable.
> 
> (per the last subclause, applicable also, after mentioned inversion, for
> "unique=1", such as a pid file path, which cannot be reloadable for
> apparent reason)
> 
>> Maybe LSB confusion...
> 
> That's not an entirely fair vindication: when you have to do some
> extra handling of parameters in the LSB-aliased "start" action of the
> RA, you should do the same for "reload".

I think the point is that "reload" for an LSB init script or systemd
unit always reloads the native service configuration, so it's natural
for administrators and developers to think of that when they see "reload".

>>> My proposed solution is:
>>>
>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>> indicate reloadable parameters. Pacemaker would use this instead of
>>> "unique".
>>
>> No objections if you change the XML metadata version number this time ;-)
> 
> Good point, but I guess everyone's a bit scared to open this Pandora's
> box as there's so much technical debt connected to that (unifying FA/RA
> metadata if possible, adding new UI-oriented annotations, pacemaker's
> silent additions like the "private" parameter).
> I'd imagine an established authority for OCF matters (also maintaining
> https://github.com/ClusterLabs/OCF-spec) and an at least partly
> formalized process inspired by Python PEPs for coordinated development:
> https://www.python.org/dev/peps/pep-0001/
> 
> 

An update to the OCF spec is long overdue. I wouldn't mind those wheels
starting to turn, but I think this reload change could proceed
independently (though of course coordinated at the appropriate time).

>>> * Add a new "reload-options" RA action for the ability to reload
>>> Pacemaker-configured options. Pacemaker would call this instead of "reload".
>>
>> Why not "reload-parameters"?
> 
> That came to my mind as well.  Or not wasting time/space on too many
> letters, just "reload-params", perhaps.
> 
> 

Just being a lazy typist :)

You're right, "parameters" or "params" would be more consistent with
existing usage. "Instance attributes" is probably the most 

Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-07-01 Thread Tomas Jelinek

On 30.6.2016 at 18:31, Digimer wrote:

On 30/06/16 12:28 PM, Jan Friesse wrote:

I am pleased to announce the latest release of Corosync
2.4.0 available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

This release is mostly about the long-awaited QDevice feature and a few
rather small fixes.


Woot! :D


The pcs CLI supports qdevice starting with upstream version pcs-0.9.153,
which is available as of now.





Qdevice is a complete rewrite of the old cman qdisk, using a network
arbiter (instead of a disk), so it's very similar to the Linux HA quorum
daemon/HP Serviceguard Quorum Server/...

Qdevice currently consists of two daemons:

- corosync-qdevice is a daemon running on each node of a cluster. It
provides a configured number of votes to the quorum subsystem based on
a third-party arbitrator's decision. Its primary use is to allow a
cluster to sustain more node failures than standard quorum rules allow.
It is recommended for clusters with an even number of nodes and highly
recommended for 2-node clusters.

- corosync-qnetd is a daemon running outside of the cluster with the
purpose of providing a vote to the corosync-qdevice model net. It's
designed to support multiple clusters and to be almost configuration and
state free. New clusters are handled dynamically and no configuration
file exists. It's also able to run as a non-root user, which is
recommended. The connection between corosync-qnetd and the
corosync-qdevice model net client can optionally be configured with TLS
client certificate checking. The communication protocol between server
and client is designed to be very simple and to allow backwards
compatibility.

To compile corosync-qdevice/corosync-qnetd, configure.sh has to be
invoked with --enable-qdevices/--enable-qnetd switches.

To find out how to configure qdevice/qnetd, take a look at the man pages
corosync-qdevice(8) and corosync-qnetd(8).
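
As a purely illustrative sketch (the qnetd host name is a placeholder
and the option values are assumptions for a typical 2-node setup), the
corosync.conf side of a qdevice deployment might look like:

  quorum {
      provider: corosync_votequorum
      device {
          model: net
          votes: 1
          net {
              host: qnetd.example.com
              algorithm: ffsplit
          }
      }
  }

With pcs 0.9.153+, something along the lines of
"pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit"
should produce an equivalent configuration, but consult the pcs
documentation for the exact syntax.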

Please note that because of required changes in votequorum,
libvotequorum is no longer binary compatible. This is the reason for the
version bump.

Starting with this release, the 2.3 branch becomes unsupported and the
officially supported Needle branch is 2.4. Just a note: this doesn't
affect support of Flatiron, where nothing changes and the 1.4 branch is
still supported.

Changelog for fixes for 2.4.0 (qdevices commits not included):

Ferenc Wágner (8):
   cmap_track_add.3.in: fix typo: bellow -> below
   Fix typo: funtion -> function
   Fix typo: interger -> integer
   Fix typo: Uknown -> Unknown
   Fix typo: aquire -> acquire
   Fix typo: retrive -> retrieve
   Fix typo: alocated -> allocated
   Fix typo: Diabled -> disabled

Jan Friesse (1):
   config: get_cluster_mcast_addr error is not fatal

bliu (1):
   low:typo fix in sam.h

Upgrade is (more than usually) highly recommended.

Thanks/congratulations to all people that contributed to achieve this
great milestone.




[ClusterLabs] pcs resource delete

2016-07-01 Thread philipp . achmueller
hi,

How can I delete a resource from the pacemaker configuration without
stopping the underlying service itself?

I remember using "pcs resource delete --force <resource>", but last
time my resource was being stopped.

thank you!
regards
Philipp


Re: [ClusterLabs] Antw: Doing reload right

2016-07-01 Thread Jan Pokorný
On 01/07/16 09:23 +0200, Ulrich Windl wrote:
 Ken Gaillot wrote on 30.06.2016 at 18:58 in message
> <57754f9f.8070...@redhat.com>:
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>> 
>> In the current implementation, Pacemaker considers a resource parameter
>> "reloadable" if the resource agent supports the "reload" action, and the
>> agent's metadata marks the parameter with "unique=0". If (only) such
>> parameters get changed in the resource's pacemaker configuration,
>> pacemaker will call the agent's reload action rather than the
>> stop-then-start it usually does for parameter changes.
>> 
>> This is completely broken for two reasons:
> 
> I agree ;-)
> 
>> 
>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>> originally intended (and is widely used by existing resource agents) as
>> a hint to UIs to indicate which parameters uniquely determine a resource
>> instance. That is, two resource instances should never have the same
>> value of a "unique" parameter. For this purpose, it makes perfect sense
>> that (for example) the path to a binary command would have unique=0 --
>> multiple resource instances could (and likely would) use the same
>> binary. However, such a parameter could never be reloadable.
> 
> I thought unique=0 was reloadable (unique=1 was not)...

I see a doubly-distorted picture here:
- actually "unique=1" on a RA parameter (together with this RA supporting
  "reload") currently leads to reload-on-change
- also, the provided example shows why reload for "unique=0" is wrong,
  but since the opposite applies in the current state, it's not an argument
  for why something is broken

See also:
https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

>> 2. Every known resource agent that implements a reload action does so
>> incorrectly. Pacemaker uses reload for changes in the resource's
>> *pacemaker* configuration, while all known RAs use reload for a
>> service's native reload capability of its own configuration file. As an
>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>> action, which will have zero effect on any pacemaker-configured
>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>> UI sense, and none of those parameters are actually reloadable.

(per the last subclause, applicable also, after mentioned inversion, for
"unique=1", such as a pid file path, which cannot be reloadable for
apparent reason)

> Maybe LSB confusion...

That's not an entirely fair vindication: when you have to do some
extra handling of parameters in the LSB-aliased "start" action of the
RA, you should do the same for "reload".

>> My proposed solution is:
>> 
>> * Add a new "reloadable" attribute for resource agent metadata, to
>> indicate reloadable parameters. Pacemaker would use this instead of
>> "unique".
> 
> No objections if you change the XML metadata version number this time ;-)

Good point, but I guess everyone's a bit scared to open this Pandora's
box as there's so much technical debt connected to that (unifying FA/RA
metadata if possible, adding new UI-oriented annotations, pacemaker's
silent additions like the "private" parameter).
I'd imagine an established authority for OCF matters (also maintaining
https://github.com/ClusterLabs/OCF-spec) and an at least partly
formalized process inspired by Python PEPs for coordinated development:
https://www.python.org/dev/peps/pep-0001/



>> * Add a new "reload-options" RA action for the ability to reload
>> Pacemaker-configured options. Pacemaker would call this instead of "reload".
> 
> Why not "reload-parameters"?

That came to my mind as well.  Or not wasting time/space on too many
letters, just "reload-params", perhaps.



>> * Formalize that "reload" means reload the service's own configuration,
>> legitimizing the most common existing RA implementations. (Pacemaker
>> itself will not use this, but tools such as crm_resource might.)
> 
> Maybe be precise about what your "reload-options" is expected to do,
> compared to the "reload" action.  I'm still a bit confused. Maybe a
> working example...

IIUIC, reload-options should first reflect the parameters the same way
"start" does when invoked, and then delegate the responsibility to
something that triggers as native a reload as possible (which, as
mentioned, is commonly [and problematically] implemented directly in the
current "reload" actions of common RAs).

>> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
>> use unique, reloadable, reload, and reload-options properly.
>> 
>> The downside is that this breaks backward compatibility. Any RA that
>> actually implements unique and reload so that reload works will lose
>> reload capability until it is updated to the new style.
> 
> Maybe there's a solution that is even simpler: Keep 

[ClusterLabs] fence_vmware_soap: fail to shutdown VMs

2016-07-01 Thread Kevin THIERRY

Hello !

I'm trying to fence my nodes using fence_vmware_soap, but it fails to
shut down or reboot my VMs. I can get the list of the VMs on a host or
query the status of a specific VM without any problem:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure 
-4 -n laa-billing-backup -o status
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: 
InsecureRequestWarning:
Unverified HTTPS request is being made. Adding certificate verification 
is strongly advised. See: 
https://urllib3.readthedocs.org/en/latest/security.html

  InsecureRequestWarning)
Status: ON

However, trying to shut down or reboot a VM fails:

# fence_vmware_soap -a 10.5.200.20 -l root -p "**" -z --ssl-insecure 
-4 -n laa-billing-backup -o reboot
/usr/lib/python2.7/site-packages/urllib3/connectionpool.py:769: 
InsecureRequestWarning: Unverified HTTPS request is being made. Adding 
certificate verification is strongly advised. See: 
https://urllib3.readthedocs.org/en/latest/security.html

  InsecureRequestWarning)
Failed: Timed out waiting to power OFF

On the ESXi host I get the following logs in /var/log/hostd.log:

[LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND 
(2/0):

Accepted password for user root from 10.5.200.12
2016-07-01T08:49:50.911Z info hostd[34380B70] [Originator@6876 
sub=Vimsvc.ha-eventmgr opID=47defdf1] Event 190 : User root@10.5.200.12 
logged in as python-requests/2.6.0 CPython/2.7.5 
Linux/3.10.0-327.18.2.el7.x86_64
2016-07-01T08:49:50.998Z info hostd[32F80B70] [Originator@6876 
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Created : 
haTask--vim.SearchIndex.findByUuid-2513
2016-07-01T08:49:50.999Z info hostd[32F80B70] [Originator@6876 
sub=Vimsvc.TaskManager opID=47defdf4 user=root] Task Completed : 
haTask--vim.SearchIndex.findByUuid-2513 Status success
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Activation 
[N5Vmomi10ActivationE:0x34603c28] : Invoke done [powerOff] on 
[vim.VirtualMachine:3]
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Throw vim.fault.RestrictedVersion
2016-07-01T08:49:51.009Z info hostd[32F80B70] [Originator@6876 
sub=Solo.Vmomi opID=47defdf6 user=root] Result:

--> (vim.fault.RestrictedVersion) {
-->faultCause = (vmodl.MethodFault) null,
-->msg = ""
--> }
2016-07-01T08:49:51.027Z info hostd[34380B70] [Originator@6876 
sub=Vimsvc.ha-eventmgr opID=47defdf7 user=root] Event 191 : User 
root@10.5.200.12 logged out (login time: Friday, 01 July, 2016 08:49:50, 
number of API invocations: 0, user agent: python-requests/2.6.0 
CPython/2.7.5 Linux/3.10.0-327.18.2.el7.x86_64)



I am wondering if there is some kind of compatibility issue. I am using 
fence-agents-vmware-soap 4.0.11 on CentOS 7.2.1511 and ESXi 6.0.0 Build 
2494585.

Any ideas about that issue?

Best regards,

--
Kevin THIERRY
IT System Engineer

CIT Lao Ltd. – A.T.M.
PO Box 10082
Vientiane Capital – Lao P.D.R.
Cell : +856 (0)20 2221 8623
kevin.thierry.cit...@gmail.com



[ClusterLabs] Antw: Doing reload right

2016-07-01 Thread Ulrich Windl
>>> Ken Gaillot wrote on 30.06.2016 at 18:58 in message
<57754f9f.8070...@redhat.com>:
> Hello all,
> 
> I've been meaning to address the implementation of "reload" in Pacemaker
> for a while now, and I think the next release will be a good time, as it
> seems to be coming up more frequently.
> 
> In the current implementation, Pacemaker considers a resource parameter
> "reloadable" if the resource agent supports the "reload" action, and the
> agent's metadata marks the parameter with "unique=0". If (only) such
> parameters get changed in the resource's pacemaker configuration,
> pacemaker will call the agent's reload action rather than the
> stop-then-start it usually does for parameter changes.
> 
> This is completely broken for two reasons:

I agree ;-)

> 
> 1. It relies on "unique=0" to determine reloadability. "unique" was
> originally intended (and is widely used by existing resource agents) as
> a hint to UIs to indicate which parameters uniquely determine a resource
> instance. That is, two resource instances should never have the same
> value of a "unique" parameter. For this purpose, it makes perfect sense
> that (for example) the path to a binary command would have unique=0 --
> multiple resource instances could (and likely would) use the same
> binary. However, such a parameter could never be reloadable.

I thought unique=0 was reloadable (unique=1 was not)...


> 
> 2. Every known resource agent that implements a reload action does so
> incorrectly. Pacemaker uses reload for changes in the resource's
> *pacemaker* configuration, while all known RAs use reload for a
> service's native reload capability of its own configuration file. As an
> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
> action, which will have zero effect on any pacemaker-configured
> parameters -- and on top of that, the RA uses "unique=0" in its correct
> UI sense, and none of those parameters are actually reloadable.

Maybe LSB confusion...

> 
> My proposed solution is:
> 
> * Add a new "reloadable" attribute for resource agent metadata, to
> indicate reloadable parameters. Pacemaker would use this instead of
> "unique".

No objections if you change the XML metadata version number this time ;-)

> 
> * Add a new "reload-options" RA action for the ability to reload
> Pacemaker-configured options. Pacemaker would call this instead of "reload".

Why not "reload-parameters"?

> 
> * Formalize that "reload" means reload the service's own configuration,
> legitimizing the most common existing RA implementations. (Pacemaker
> itself will not use this, but tools such as crm_resource might.)

Maybe be precise about what your "reload-options" is expected to do, compared to
the "reload" action.
I'm still a bit confused. Maybe a working example...


> 
> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
> use unique, reloadable, reload, and reload-options properly.
> 
> The downside is that this breaks backward compatibility. Any RA that
> actually implements unique and reload so that reload works will lose
> reload capability until it is updated to the new style.

Maybe there's a solution that is even simpler: keep the action name, but don't
use "reload" unless the RA indicates at least one "reloadable" parameter.
Naturally, old RAs don't do it. And once an author touches the RA to fix things,
he/she should do so (not just add "reloadable" to the metadata).

> 
> While we usually go to great lengths to preserve backward compatibility,
> I think it is OK to break it in this case, because most RAs that
> implement reload do so wrongly: some implement it as a service reload, a
> few advertise reload but don't actually implement it, and others map
> reload to start, which might theoretically work in some cases (I'm not
> familiar enough with iSCSILogicalUnit and iSCSITarget to be sure), but
> typically won't, as the previous service options are not reverted (for
> example, I think Route would incorrectly leave the old route in the old
> table).
> 
> So, I think breaking backward compatibility is actually a good thing
> here, since the most reload can do with existing RAs is trigger bad
> behavior.

See my comment above.

> 
> The opposing view would be that we shouldn't punish any RA writer who
> implemented this correctly. However, there's no solution that preserves

The software could give a warning if the RA provides "reload" but does not
define any reloadable parameter, so as to notify users and developers that some
change is needed.

> backward compatibility with both UI usage of unique and reload usage of
> unique. Plus, the worst that would happen is that the RA would stop
> being reloadable -- not as bad as the current possibilities from
> mis-implemented reload.

Agree on that.

> 
> My questions are:
> 
> Does anyone know of an RA that uses reload correctly? Dummy doesn't
> count ;-)

Revalidation by the RA authors (and adding "reloadable" attribute to