[ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-10-11 Thread Jean-Marc Saffroy
Hi,

I am caught by surprise with this behaviour of DLM:
- I have 5 nodes (test VMs)
- 3 of them have 1 vote for the corosync quorum (they are "voters")
- 2 of them have 0 vote ("non-voters")

So the corosync quorum is 2.

On the non-voters, I run DLM and an application that uses it. In DLM,
fencing is disabled.
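
(Roughly, the relevant pieces look like the sketch below -- not my exact
files, but it captures the zero-vote nodes and the fencing setting; option
names as in votequorum(5) and dlm.conf(5).)

# corosync.conf, quorum-related parts only; one voter and one
# non-voter shown, the other nodes follow the same pattern
nodelist {
  node {
    nodeid: 1
    ring0_addr: voter1
    # no quorum_votes line, so it defaults to 1 vote
  }
  node {
    nodeid: 4
    ring0_addr: nonvoter1
    quorum_votes: 0
  }
}
quorum {
  provider: corosync_votequorum
}

# /etc/dlm/dlm.conf on the non-voters
enable_fencing=0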

Now, if I stop corosync on 2 of the voters:
- as expected, corosync says "Activity blocked"
- but to my surprise, DLM seems happy to give more locks

Shouldn't DLM block lock requests in this situation?


Cheers,
JM

-- 

[root@vm4 ~]# corosync-quorumtool 
Quorum information
------------------
Date: Wed Oct 11 20:29:52 2017
Quorum provider:  corosync_votequorum
Nodes:3
Node ID:  5
Ring ID:  3/24
Quorate:  No

Votequorum information
------------------
Expected votes:   3
Highest expected: 3
Total votes:  1
Quorum:   2 Activity blocked
Flags:

Membership information
------------------
Nodeid  Votes Name
 3  1 172.16.2.33
 4  0 172.16.3.33
 5  0 172.16.4.33 (local)

[root@vm4 ~]# dlm_tool status
cluster nodeid 5 quorate 0 ring seq 24 24
daemon now 6908 fence_pid 0 
node 4 M add 4912 rem 0 fail 0 fence 0 at 0 0
node 5 M add 4912 rem 0 fail 0 fence 0 at 0 0

[root@vm4 ~]# corosync-cpgtool 
Group Name PID Node ID
dlm:ls:XYZ\x00
   971   4 (172.16.3.33)
 10095   5 (172.16.4.33)
dlm:controld\x00
   971   4 (172.16.3.33)
 10095   5 (172.16.4.33)

[root@vm4 ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 

[root@vm4 ~]# rpm -q corosync dlm
corosync-2.4.0-9.el7_4.2.x86_64
dlm-4.0.7-1.el7.x86_64

-- 
saff...@gmail.com

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ClusterMon mail notification - does not work

2017-10-11 Thread Ken Gaillot
On Wed, 2017-10-11 at 09:12 +0200, Ferenc Wágner wrote:
> Donat Zenichev  writes:
> 
> > then the resource is stopped, but nothing arrived at the e-mail
> > destination.
> > Where did I go wrong?
> 
> Please note that ClusterMon notifications are becoming deprecated
> (they
> should still work, but I've got no experience with them).  Try using
> alerts instead, as documented at
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html

Alerts were introduced in Pacemaker 1.1.15 -- I believe Ubuntu 16.04
has 1.1.14.

Donat: if you can upgrade to a newer Ubuntu, you should be able to get
a version with alerts, which is a better implementation for sending
e-mails than ClusterMon.
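
For reference, setting up a mail alert with a recent pcs looks roughly like
this (a sketch only -- the agent path and addresses are examples, and pcs
syntax can vary a bit between versions):

# Pacemaker ships sample alert agents, including an SMTP one
cp /usr/share/pacemaker/alerts/alert_smtp.sh.sample /var/lib/pacemaker/alert_smtp.sh
chmod +x /var/lib/pacemaker/alert_smtp.sh

# register the agent and add a recipient
pcs alert create path=/var/lib/pacemaker/alert_smtp.sh id=smtp_alert
pcs alert recipient add smtp_alert value=admin@example.com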

If you can't, or you still want to use ClusterMon for the HTML status
updates, my first suggestion would be to make sure that your version of
crm_mon supports the mail-* arguments. It's a compile-time option, and
I don't know if Ubuntu enabled it. Simply do "man crm_mon", and if it
shows the mail-* options, then you have the capability.
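
A quick way to check (assuming the man page is installed) is something
like:

man crm_mon | grep -- '--mail'
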
-- 
Ken Gaillot 



[ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-11 Thread Jonathan Davies

Hi ClusterLabs,

I'm seeing a race condition in corosync where votequorum can have 
incorrect membership info when a node joins the cluster then leaves very 
soon after.


I'm on corosync-2.3.4 plus my patch 
https://github.com/corosync/corosync/pull/248. That patch makes the 
problem readily reproducible but the bug was already present.


Here's the scenario. I have two hosts, cluster1 and cluster2. The 
corosync.conf on cluster2 is:


totem {
  version: 2
  cluster_name: test
  config_version: 2
  transport: udpu
}
nodelist {
  node {
nodeid: 1
ring0_addr: cluster1
  }
  node {
nodeid: 2
ring0_addr: cluster2
  }
}
quorum {
  provider: corosync_votequorum
  auto_tie_breaker: 1
}
logging {
  to_syslog: yes
}

The corosync.conf on cluster1 is the same except with "config_version: 1".

I start corosync on cluster2. When I start corosync on cluster1, it 
joins and then immediately leaves due to the lower config_version.
(Previously corosync on cluster2 would also exit but with 
https://github.com/corosync/corosync/pull/248 it remains alive.)
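
(For completeness, the reproduction boils down to roughly this, assuming
systemd units -- how you start the service will of course depend on the
distribution:)

# on cluster2 (config_version: 2)
systemctl start corosync

# on cluster1 (config_version: 1) -- joins, then exits by itself
systemctl start corosync

# immediately afterwards, on cluster2
corosync-quorumtool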


But often at this point, cluster1's disappearance is not reflected in 
the votequorum info on cluster2:


Quorum information
------------------
Date: Tue Oct 10 16:43:50 2017
Quorum provider:  corosync_votequorum
Nodes:1
Node ID:  2
Ring ID:  700
Quorate:  Yes

Votequorum information
------------------
Expected votes:   2
Highest expected: 2
Total votes:  2
Quorum:   2
Flags:Quorate AutoTieBreaker

Membership information
------------------
Nodeid  Votes Name
 2  1 cluster2 (local)

The logs on cluster1 show:

Oct 10 16:43:37 cluster1 corosync[15750]:  [CMAP  ] Received config 
version (2) is different than my config version (1)! Exiting


The logs on cluster2 show:

Oct 10 16:43:37 cluster2 corosync[5102]:  [TOTEM ] A new membership 
(10.71.218.17:588) was formed. Members joined: 1
Oct 10 16:43:37 cluster2 corosync[5102]:  [QUORUM] This node is 
within the primary component and will provide service.

Oct 10 16:43:37 cluster2 corosync[5102]:  [QUORUM] Members[1]: 2
Oct 10 16:43:37 cluster2 corosync[5102]:  [TOTEM ] A new membership 
(10.71.218.18:592) was formed. Members left: 1

Oct 10 16:43:37 cluster2 corosync[5102]:  [QUORUM] Members[1]: 2
Oct 10 16:43:37 cluster2 corosync[5102]:  [MAIN  ] Completed 
service synchronization, ready to provide service.


It looks like QUORUM has seen cluster1's arrival but not its departure!

When it works as expected, the state is left consistent:

Quorum information
------------------
Date: Tue Oct 10 16:58:14 2017
Quorum provider:  corosync_votequorum
Nodes:1
Node ID:  2
Ring ID:  604
Quorate:  No

Votequorum information
------------------
Expected votes:   2
Highest expected: 2
Total votes:  1
Quorum:   2 Activity blocked
Flags:AutoTieBreaker

Membership information
------------------
Nodeid  Votes Name
 2  1 cluster2 (local)

Logs on cluster1:

Oct 10 16:58:01 cluster1 corosync[16430]:  [CMAP  ] Received config 
version (2) is different than my config version (1)! Exiting


Logs on cluster2 are either:

Oct 10 16:58:01 cluster2 corosync[17835]:  [TOTEM ] A new 
membership (10.71.218.17:600) was formed. Members joined: 1
Oct 10 16:58:01 cluster2 corosync[17835]:  [QUORUM] This node is 
within the primary component and will provide service.

Oct 10 16:58:01 cluster2 corosync[17835]:  [QUORUM] Members[1]: 2
Oct 10 16:58:01 cluster2 corosync[17835]:  [CMAP  ] Highest config 
version (2) and my config version (2)
Oct 10 16:58:01 cluster2 corosync[17835]:  [TOTEM ] A new 
membership (10.71.218.18:604) was formed. Members left: 1
Oct 10 16:58:01 cluster2 corosync[17835]:  [QUORUM] This node is 
within the non-primary component and will NOT provide any services.

Oct 10 16:58:01 cluster2 corosync[17835]:  [QUORUM] Members[1]: 2
Oct 10 16:58:01 cluster2 corosync[17835]:  [MAIN  ] Completed 
service synchronization, ready to provide service.


... in which it looks like QUORUM has seen cluster1's arrival *and* its 
departure,


or:

Oct 10 16:59:03 cluster2 corosync[18841]:  [TOTEM ] A new 
membership (10.71.218.17:632) was formed. Members joined: 1
Oct 10 16:59:03 cluster2 corosync[18841]:  [CMAP  ] Highest config 
version (2) and my config version (2)
Oct 10 16:59:03 cluster2 corosync[18841]:  [TOTEM ] A new 
membership (10.71.218.18:636) was formed. Members left: 1

Oct 10 16:59:03 cluster2 corosync[18841]:  [QUORUM] Members[1]: 2
Oct 10 16:59:03 cluster2 corosync[18841]:  [MAIN  ] Completed 

Re: [ClusterLabs] if resourceA starts @nodeA then start resource[xy] @node[xy]

2017-10-11 Thread Ken Gaillot
On Tue, 2017-10-10 at 12:06 +0100, lejeczek wrote:
> 
> On 26/09/17 13:15, Klaus Wenninger wrote:
> > On 09/26/2017 02:06 PM, lejeczek wrote:
> > > hi fellas
> > > 
> > > can something like in the subject pacemaker do? And if yes then
> > > how to
> > > do it?
> > 
> > You could bind ResourceA to nodeA and resource[xy] to node[xy] via
> > location constraints.
> > Afterwards you could make resource[xy] depend on ResourceA -
> > without
> > collocation.
> > The actual commands (based on crmsh or pcs) to create these rules
> > depend on thedistribution you are using.
> > 
> > Regards,
> > Klaus
> 
> thanks,
> I am probably hoping for too much(?) - without man-made
> constraints, I'd like the cluster logic itself to make these
> decisions: wherever it decided to start/run resourceA,
> resourceB would have to run on different (all or remaining)
> cluster nodes.
> 
> I'd only have to tell it something like: if you started
> resourceA on nodeA, then the remaining (or maybe specific)
> resources start on all nodes but nodeA.

Sure, that's simply a colocation constraint with a negative score.

For details, see
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-colocation
(and/or the help for whatever higher-level tools you're using)
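
For example, with placeholder resource names (adapt to your own; pcs and
crmsh shown, use whichever applies):

# pcs: never run resourceB on the node where resourceA is running
pcs constraint colocation add resourceB with resourceA -INFINITY

# crmsh equivalent
crm configure colocation resourceB-not-with-resourceA -inf: resourceB resourceA

A score of -INFINITY makes it a hard rule; a finite negative score is only
a preference.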

> 
> > > I'm looking into docs but before I'd gone through it all I hoped
> > > an
> > > expert could tell.
> > > 
> > > many thanks, L.
-- 
Ken Gaillot 



Re: [ClusterLabs] trouble with IPaddr2

2017-10-11 Thread Valentin Vidic
On Wed, Oct 11, 2017 at 01:29:40PM +0200, Stefan Krueger wrote:
> ohh damn.. thanks a lot for this hint.. I deleted all the IPs on enp4s0f0, and
> then it works..
> but could you please explain why it works now? Why did it have a problem with
> these IPs?

AFAICT, it found a better interface with that subnet and tried
to use it instead of the one specified in the parameters :)

But maybe IPaddr2 should just skip interface auto-detection
if an explicit interface was given in the parameters?
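
(By "in the parameters" I mean nic= given inside params, e.g. a sketch in
crmsh syntax -- note that in your config nic=bond0 sits on the monitor op
line instead:)

primitive HA_IP-Serv1 IPaddr2 \
params ip=172.16.101.70 cidr_netmask=16 nic=bond0 \
op monitor interval=20 timeout=30 on-fail=restart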

-- 
Valentin



[ClusterLabs] can't move/migrate ressource

2017-10-11 Thread Stefan Krueger
Hello,

when I try to migrate a resource from one server to another (for example for
maintenance), it doesn't work.
A single resource works fine; after that I created a group with 2 resources
and tried to move that.

my config is:
crm conf show
node 739272007: zfs-serv1
node 739272008: zfs-serv2
primitive HA_IP-Serv1 IPaddr2 \
params ip=172.16.101.70 cidr_netmask=16 \
op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
meta target-role=Started
primitive HA_IP-Serv2 IPaddr2 \
params ip=172.16.101.74 cidr_netmask=16 \
op monitor interval=10s nic=bond0
primitive nc_storage ZFS \
params pool=nc_storage importargs="-d /dev/disk/by-partlabel/"
group compl_zfs-serv1 nc_storage HA_IP-Serv1
location cli-prefer-HA_IP-Serv1 compl_zfs-serv1 role=Started inf: zfs-serv1
location cli-prefer-HA_IP-Serv2 HA_IP-Serv2 role=Started inf: zfs-serv2
location cli-prefer-compl_zfs-serv1 compl_zfs-serv1 role=Started inf: zfs-serv2
location cli-prefer-nc_storage compl_zfs-serv1 role=Started inf: zfs-serv1
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.16-94ff4df \
cluster-infrastructure=corosync \
cluster-name=debian \
no-quorum-policy=ignore \
default-resource-stickiness=100 \
stonith-enabled=false \
last-lrm-refresh=1507702403


command:
crm resource move compl_zfs-serv1 zfs-serv2


pacemaker log from zfs-serv2:
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  Diff: 
--- 0.106.0 2
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  Diff: 
+++ 0.107.0 cc224b15d0a796e040b026b7c2965770
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  -- 
/cib/configuration/constraints/rsc_location[@id='cli-prefer-compl_zfs-serv1']
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  +  
/cib:  @epoch=107
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_process_request: 
Completed cib_delete operation for section constraints: OK (rc=0, 
origin=zfs-serv1/crm_resource/3, version=0.107.0)
Oct 11 13:55:58 [3561] zfs-serv2   crmd: info: abort_transition_graph:  
Transition aborted by deletion of 
rsc_location[@id='cli-prefer-compl_zfs-serv1']: Configuration change | 
cib=0.107.0 source=te_update_diff:444 
path=/cib/configuration/constraints/rsc_location[@id='cli-prefer-compl_zfs-serv1']
 complete=true
Oct 11 13:55:58 [3561] zfs-serv2   crmd:   notice: do_state_transition: 
State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  Diff: 
--- 0.107.0 2
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  Diff: 
+++ 0.108.0 (null)
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  +  
/cib:  @epoch=108
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_perform_op:  ++ 
/cib/configuration/constraints:  
Oct 11 13:55:58 [3556] zfs-serv2cib: info: cib_process_request: 
Completed cib_modify operation for section constraints: OK (rc=0, 
origin=zfs-serv1/crm_resource/4, version=0.108.0)
Oct 11 13:55:58 [3561] zfs-serv2   crmd: info: abort_transition_graph:  
Transition aborted by rsc_location.cli-prefer-compl_zfs-serv1 'create': 
Configuration change | cib=0.108.0 source=te_update_diff:444 
path=/cib/configuration/constraints complete=true
Oct 11 13:55:58 [3560] zfs-serv2pengine:   notice: unpack_config:   On loss 
of CCM Quorum: Ignore
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: determine_online_status: 
Node zfs-serv2 is online
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: determine_online_status: 
Node zfs-serv1 is online
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: determine_op_status: 
Operation monitor found resource nc_storage active on zfs-serv2
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: native_print:
HA_IP-Serv2 (ocf::heartbeat:IPaddr2):   Started zfs-serv2
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: group_print:  
Resource Group: compl_zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: native_print: 
nc_storage (ocf::heartbeat:ZFS):   Started zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: native_print: 
HA_IP-Serv1(ocf::heartbeat:IPaddr2):   Started zfs-serv1
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: LogActions:  Leave   
HA_IP-Serv2 (Started zfs-serv2)
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: LogActions:  Leave   
nc_storage  (Started zfs-serv1)
Oct 11 13:55:58 [3560] zfs-serv2pengine: info: LogActions:  Leave   
HA_IP-Serv1 (Started zfs-serv1)
Oct 11 13:55:58 [3560] zfs-serv2pengine:   notice: process_pe_message:  
Calculated transition 8, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-1348.bz2

Re: [ClusterLabs] trouble with IPaddr2

2017-10-11 Thread Stefan Krueger
Hello Valentin,
thanks for you help

> Can you share more info on the network of zfs-serv2, for example: ip a?
ip a s
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
   valid_lft forever preferred_lft forever
2: enp4s0f0:  mtu 1500 qdisc mq state UP group 
default qlen 1000
link/ether ac:1f:6b:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 172.16.22.126/16 brd 172.16.255.255 scope global enp4s0f0
   valid_lft forever preferred_lft forever
inet 172.16.101.74/16 brd 172.16.255.255 scope global secondary enp4s0f0
   valid_lft forever preferred_lft forever
inet6 fe80::ae1f:6bff::/64 scope link
   valid_lft forever preferred_lft forever
3: enp4s0f1:  mtu 1500 qdisc noop state DOWN group default 
qlen 1000
link/ether ac:1f:6b:xx:xx:xx brd ff:ff:ff:ff:ff:ff
4: ens2f0:  mtu 1500 qdisc mq master 
bond0 state UP group default qlen 1000
link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
5: ens2f1:  mtu 1500 qdisc mq master 
bond0 state UP group default qlen 1000
link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
6: bond0:  mtu 1500 qdisc noqueue state 
UP group default qlen 1000
link/ether 3c:fd:fe:xx:xx:xx brd ff:ff:ff:ff:ff:ff
inet 172.16.101.72/16 brd 172.16.255.255 scope global bond0
   valid_lft forever preferred_lft forever
inet 172.16.101.74/16 brd 172.16.255.255 scope global secondary bond0
   valid_lft forever preferred_lft forever
inet6 fe80::3efd:feff::/64 scope link
   valid_lft forever preferred_lft forever


ohh damn.. thanks a lot for this hint.. I deleted all the IPs on enp4s0f0, and
then it works..
but could you please explain why it works now? Why did it have a problem with
these IPs?

best regards



[ClusterLabs] pcs 0.9.160 released

2017-10-11 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.9.160.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.160.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.160.zip

Be aware that support for CMAN clusters has been deprecated in this
release. This does not mean it will be removed in the very next release.
Instead, once we get to overhauling the commands supporting CMAN clusters
to the new architecture, the support will be removed.


Complete change log for this release:
### Added
- Configurable pcsd port ([rhbz#1415197])
- Description of the `--force` option added to man page and help
  ([rhbz#1491631])

### Fixed
- Fixed some crashes when pcs encounters a non-ascii character in
  environment variables, command line arguments and so on
  ([rhbz#1435697])
- Fixed detecting if systemd is in use ([ghissue#118])
- Upgrade CIB schema version when `resource-discovery` option is used in
  location constraints ([rhbz#1420437])
- Fixed error messages in `pcs cluster report` ([rhbz#1388783])
- Increase request timeout when starting a cluster with large number of
  nodes to prevent timeouts ([rhbz#1463327])
- Fixed "Unable to update cib" error caused by invalid resource
  operation IDs
- `pcs resource op defaults` now fails on an invalid option
  ([rhbz#1341582])
- Fixed behaviour of `pcs cluster verify` command when entered with the
  filename argument ([rhbz#1213946])

### Changed
- CIB changes are now pushed to pacemaker as a diff in commands
  overhauled to the new architecture (previously the whole CIB was
  pushed). This resolves race conditions and ACLs related errors when
  pushing CIB. ([rhbz#1441673])
- All actions / operations defined in resource agent's metadata (except
  meta-data, status and validate-all) are now copied to the CIB when
  creating a resource. ([rhbz#1418199], [ghissue#132])
- Improve documentation of the `pcs stonith confirm` command
  ([rhbz#1489682])

### Deprecated
- This is the last version fully supporting CMAN clusters and python
  2.6. Support for these will be gradually dropped.


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Jan Pokorný, Ondrej Mular, Tomas Jelinek and
Valentin Vidic.

Cheers,
Tomas


[ghissue#118]: https://github.com/ClusterLabs/pcs/issues/118
[ghissue#132]: https://github.com/ClusterLabs/pcs/issues/132
[rhbz#1213946]: https://bugzilla.redhat.com/show_bug.cgi?id=1213946
[rhbz#1341582]: https://bugzilla.redhat.com/show_bug.cgi?id=1341582
[rhbz#1388783]: https://bugzilla.redhat.com/show_bug.cgi?id=1388783
[rhbz#1415197]: https://bugzilla.redhat.com/show_bug.cgi?id=1415197
[rhbz#1418199]: https://bugzilla.redhat.com/show_bug.cgi?id=1418199
[rhbz#1420437]: https://bugzilla.redhat.com/show_bug.cgi?id=1420437
[rhbz#1435697]: https://bugzilla.redhat.com/show_bug.cgi?id=1435697
[rhbz#1441673]: https://bugzilla.redhat.com/show_bug.cgi?id=1441673
[rhbz#1463327]: https://bugzilla.redhat.com/show_bug.cgi?id=1463327
[rhbz#1489682]: https://bugzilla.redhat.com/show_bug.cgi?id=1489682
[rhbz#1491631]: https://bugzilla.redhat.com/show_bug.cgi?id=1491631



Re: [ClusterLabs] trouble with IPaddr2

2017-10-11 Thread Valentin Vidic
On Wed, Oct 11, 2017 at 10:51:04AM +0200, Stefan Krueger wrote:
> primitive HA_IP-Serv1 IPaddr2 \
> params ip=172.16.101.70 cidr_netmask=16 \
> op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
> meta target-role=Started

There might be something wrong with the network setup because enp4s0f0
gets used instead of bond0:

> Oct 11 08:19:32 zfs-serv2 IPaddr2(HA_IP-Serv1)[27672]: INFO: Adding inet 
> address 172.16.101.70/16 with broadcast address 172.16.255.255 to device 
> enp4s0f0

Can you share more info on the network of zfs-serv2, for example: ip a?

-- 
Valentin



Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach  writes:

> On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
>
>> Václav Mach  writes:
>>
>>> allow-hotplug eth0
>>> iface eth0 inet dhcp
>>
>> Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
>> before network-online.target, which excludes allow-hotplug interfaces.
>> That means allow-hotplug interfaces are not waited for before corosync
>> is started during boot.
>
> That did the trick for network config using DHCP. Thanks for clarification.
>
> Do you know the reason why allow-hotplug interfaces are
> excluded? It's obvious that if ifup (according to its man page) is run as
> 'ifup -a' it ignores them, but I don't get why allow-hotplug
> interfaces should be ignored by the init system.

Allow-hotplug interfaces aren't assumed to be present all the time, but
rather to be plugged in and out arbitrarily.  They are handled by udev,
asynchronously when the system is running.  Waiting for them during
bootup would be strange if you ask me.
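
(In /etc/network/interfaces terms the suggested change is simply

auto eth0
iface eth0 inet dhcp

that is, "auto" instead of "allow-hotplug".)
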
-- 
Regards,
Feri



[ClusterLabs] trouble with IPaddr2

2017-10-11 Thread Stefan Krueger
Hello,

I have a simple setup with just 3 resources (at the moment); the ZFS resource
also works fine. BUT my IPaddr2 doesn't work, and I don't know why or how to
resolve that.

my config:
conf sh
node 739272007: zfs-serv1
node 739272008: zfs-serv2
primitive HA_IP-Serv1 IPaddr2 \
params ip=172.16.101.70 cidr_netmask=16 \
op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
meta target-role=Started
primitive HA_IP-Serv2 IPaddr2 \
params ip=172.16.101.74 cidr_netmask=16 \
op monitor interval=10s nic=bond0
primitive nc_storage ZFS \
params pool=nc_storage importargs="-d /dev/disk/by-partlabel/"
location cli-prefer-HA_IP-Serv1 HA_IP-Serv1 role=Started inf: zfs-serv1
location cli-prefer-HA_IP-Serv2 HA_IP-Serv2 role=Started inf: zfs-serv2
location cli-prefer-nc_storage nc_storage role=Started inf: zfs-serv2
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.16-94ff4df \
cluster-infrastructure=corosync \
cluster-name=debian \
no-quorum-policy=ignore \
default-resource-stickiness=100 \
stonith-enabled=false \
last-lrm-refresh=1507702403


command:
resource move HA_IP-Serv1 zfs-serv2


pacemaker log from zfs-serv2:
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_process_request:
Completed cib_delete operation for section constraints: OK (rc=0, 
origin=zfs-serv1/crm_resource/3, version=0.82.44)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
--- 0.82.44 2
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
+++ 0.83.0 (null)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib:  @epoch=83, @num_updates=0
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib/configuration/constraints/rsc_location[@id='cli-prefer-HA_IP-Serv1']:  
@node=zfs-serv2
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_process_request:
Completed cib_modify operation for section constraints: OK (rc=0, 
origin=zfs-serv1/crm_resource/4, version=0.83.0)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
--- 0.83.0 2
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
+++ 0.83.1 (null)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib:  @num_updates=1
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib/status/node_state[@id='739272007']/lrm[@id='739272007']/lrm_resources/lrm_resource[@id='HA_IP-Serv1']/lrm_rsc_op[@id='HA_IP-Serv1_last_0']:
  @operation_key=HA_IP-Serv1_stop_0, @operation=stop, 
@crm-debug-origin=do_update_resource, 
@transition-key=8:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, 
@transition-magic=0:0;8:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, 
@call-id=55, @last-run=1507702772, @last-rc-change=1507702772, @exec
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0, 
origin=zfs-serv1/crmd/1853, version=0.83.1)
Oct 11 08:19:32 [23938] zfs-serv2   crmd: info: do_lrm_rsc_op:  
Performing key=9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f 
op=HA_IP-Serv1_start_0
Oct 11 08:19:32 [23935] zfs-serv2   lrmd: info: log_execute:
executing - rsc:HA_IP-Serv1 action:start call_id:17
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_file_backup:
Archived previous version as /var/lib/pacemaker/cib/cib-39.raw
Oct 11 08:19:32 [23935] zfs-serv2   lrmd: info: log_finished:   
finished - rsc:HA_IP-Serv1 action:start call_id:17 pid:27672 exit-code:0 
exec-time:48ms queue-time:0ms
Oct 11 08:19:32 [23938] zfs-serv2   crmd: info: action_synced_wait: 
Managed IPaddr2_meta-data_0 process 27735 exited with rc=0
Oct 11 08:19:32 [23938] zfs-serv2   crmd:   notice: process_lrm_event:  
Result of start operation for HA_IP-Serv1 on zfs-serv2: 0 (ok) | call=17 
key=HA_IP-Serv1_start_0 confirmed=true cib-update=15
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_process_request:
Forwarding cib_modify operation for section status to all (origin=local/crmd/15)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
--- 0.83.1 2
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: Diff: 
+++ 0.83.2 (null)
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib:  @num_updates=2
Oct 11 08:19:32 [23933] zfs-serv2cib: info: cib_perform_op: +  
/cib/status/node_state[@id='739272008']/lrm[@id='739272008']/lrm_resources/lrm_resource[@id='HA_IP-Serv1']/lrm_rsc_op[@id='HA_IP-Serv1_last_0']:
  @operation_key=HA_IP-Serv1_start_0, @operation=start, 
@transition-key=9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, 
@transition-magic=0:0;9:1574:0:d4b03c3c-1a4e-4609-86ca-675fa4a2ec8f, 
@call-id=17, @rc-code=0, @last-run=1507702772, 

Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Václav Mach



On 10/11/2017 09:00 AM, Ferenc Wágner wrote:

Václav Mach  writes:


allow-hotplug eth0
iface eth0 inet dhcp


Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
before network-online.target, which excludes allow-hotplug interfaces.
That means allow-hotplug interfaces are not waited for before corosync
is started during boot.



That did the trick for network config using DHCP. Thanks for clarification.

Do you know the reason why allow-hotplug interfaces are
excluded? It's obvious that if ifup (according to its man page) is run as
'ifup -a' it ignores them, but I don't get why allow-hotplug
interfaces should be ignored by the init system.


--
Václav Mach
CESNET, z.s.p.o.
www.cesnet.cz





Re: [ClusterLabs] ClusterMon mail notification - does not work

2017-10-11 Thread Ferenc Wágner
Donat Zenichev  writes:

> then the resource is stopped, but nothing arrived at the e-mail destination.
> Where did I go wrong?

Please note that ClusterMon notifications are becoming deprecated (they
should still work, but I've got no experience with them).  Try using
alerts instead, as documented at
https://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch07.html
-- 
Regards,
Feri



Re: [ClusterLabs] corosync service not automatically started

2017-10-11 Thread Ferenc Wágner
Václav Mach  writes:

> allow-hotplug eth0
> iface eth0 inet dhcp

Try replacing allow-hotplug with auto.  Ifupdown simply runs ifup -a
before network-online.target, which excludes allow-hotplug interfaces.
That means allow-hotplug interfaces are not waited for before corosync
is started during boot.
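
(To see what the corosync unit itself is ordered after on your system,
something like this helps:

systemctl show corosync.service --property=After,Wants

It should list network-online.target, which is what ifup -a is run before.)
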
-- 
Regards,
Feri
