[ClusterLabs] Pacemaker 1.1.18 Release Candidate 4

2017-11-02 Thread Ken Gaillot
I decided to do another release candidate because we had a large
number of changes since rc3. The fourth release candidate for Pacemaker
version 1.1.18 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4

The big changes are numerous scalability improvements and bundle fixes.
We're starting to test Pacemaker with as many as 1,500 bundles (Docker
containers) spread across 20 guest nodes, which in turn run on three
56-core physical cluster nodes.

For details on the changes in this release, see the ChangeLog.

This is likely to be the last release candidate before the final
release next week. Any testing you can do is very welcome.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] drbd clone not becoming master

2017-11-02 Thread Dennis Jacobfeuerborn
On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
> Hi,
> I'm setting up a redundant NFS server for some experiments but almost
> immediately ran into a strange issue. The drbd clone resource never
> promotes either of the two clones to the Master state.
> 
> The state says this:
> 
>  Master/Slave Set: drbd-clone [drbd]
>  Slaves: [ nfsserver1 nfsserver2 ]
>  metadata-fs  (ocf::heartbeat:Filesystem): Stopped
> 
> The resource configuration looks like this:
> 
> Resources:
>  Master: drbd-clone
>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
>   Resource: drbd (class=ocf provider=linbit type=drbd)
>    Attributes: drbd_resource=r0
>    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>                monitor interval=60s (drbd-monitor-interval-60s)
>                promote interval=0s timeout=90 (drbd-promote-interval-0s)
>                start interval=0s timeout=240 (drbd-start-interval-0s)
>                stop interval=0s timeout=100 (drbd-stop-interval-0s)
>  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
>   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
>               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
> 
> Location Constraints:
> Ordering Constraints:
>   promote drbd-clone then start metadata-fs (kind:Mandatory)
> Colocation Constraints:
>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
> 
> Shouldn't one of the clones be promoted to the Master state automatically?

I think the source of the issue is this:

Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called
/usr/sbin/crm_master -Q -l reboot -v 1
Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
Nov  2 23:12:03 nfsserver1 lrmd[2163]:  notice:
drbd_monitor_6:4673:stderr [ Error signing on to the CIB service:
Transport endpoint is not connected ]

It seems the drbd resource agent tries to use crm_master to record a
promotion score for the clone but fails because it cannot "sign on to
the CIB service". Does anybody know what that means?
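
For context, crm_master just records a promotion score as a node
attribute in the CIB (here with lifetime "reboot", i.e. transient), so
the call can only work while the agent can reach the CIB. Using the
node and resource names from the status above (the attribute name may
carry a clone-instance suffix depending on the version), the call
amounts to roughly:

  # what the agent's crm_master call boils down to (sketch)
  crm_attribute -N nfsserver1 -n master-drbd -l reboot -v 1
  # query the score the cluster currently sees
  crm_attribute -N nfsserver1 -n master-drbd -l reboot -G
  # or show all node attributes at once
  crm_mon -A1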

Regards,
  Dennis






[ClusterLabs] drbd clone not becoming master

2017-11-02 Thread Dennis Jacobfeuerborn
Hi,
I'm setting up a redundant NFS server for some experiments but almost
immediately ran into a strange issue. The drbd clone resource never
promotes either of the two clones to the Master state.

The state says this:

 Master/Slave Set: drbd-clone [drbd]
 Slaves: [ nfsserver1 nfsserver2 ]
 metadata-fs  (ocf::heartbeat:Filesystem): Stopped

The resource configuration looks like this:

Resources:
 Master: drbd-clone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: drbd (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=r0
   Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
               monitor interval=60s (drbd-monitor-interval-60s)
               promote interval=0s timeout=90 (drbd-promote-interval-0s)
               start interval=0s timeout=240 (drbd-start-interval-0s)
               stop interval=0s timeout=100 (drbd-stop-interval-0s)
 Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
  Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
              start interval=0s timeout=60 (metadata-fs-start-interval-0s)
              stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)

Location Constraints:
Ordering Constraints:
  promote drbd-clone then start metadata-fs (kind:Mandatory)
Colocation Constraints:
  metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)

Shouldn't one of the clones be promoted to the Master state automatically?
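
For reference, a master/slave resource and constraints like the above
are typically created with pcs along these lines (names taken from the
config above; exact syntax depends on the pcs version):

  pcs resource master drbd-clone drbd master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true
  pcs constraint order promote drbd-clone then start metadata-fs
  pcs constraint colocation add metadata-fs with master drbd-clone INFINITY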

Regards,
  Dennis



Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-11-02 Thread Dennis Jacobfeuerborn
On 31.10.2017 12:58, Ferenc Wágner wrote:
> Dennis Jacobfeuerborn  writes:
> 
>> if I create a new unit file for the new file the services would not
>> depend on it so it wouldn't get automatically mounted when they start.
> 
> Put the new unit file under /etc/systemd/system/x.service.requires to
> have x.service require it.  I don't get the full picture, but this trick
> may help you puzzle it together.
> 
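
For illustration, that suggestion amounts to something like the
following (unit names and the alternate mountpoint are only examples).
Create /etc/systemd/system/run-rpc_pipefs.mount (the unit name must
match the escaped mount path, see systemd-escape -p /run/rpc_pipefs):

  [Unit]
  Description=RPC pipe filesystem (alternate mountpoint)

  [Mount]
  What=sunrpc
  Where=/run/rpc_pipefs
  Type=rpc_pipefs

and make nfs-server.service pull it in via its .requires directory:

  mkdir -p /etc/systemd/system/nfs-server.service.requires
  ln -s /etc/systemd/system/run-rpc_pipefs.mount \
        /etc/systemd/system/nfs-server.service.requires/
  systemctl daemon-reload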

It seems the nfsserver resource agent isn't compatible with
RHEL/CentOS 7. These systems always mount /var/lib/nfs/rpc_pipefs on
boot, but the resource agent checks whether "/var/lib/nfs" appears in
/proc/mounts; if it does, the agent refuses to start the NFS server,
which prevents the fail-over.
I honestly have no good idea how to solve this, as the mounting of
/var/lib/nfs/rpc_pipefs is basically hard-coded into the RHEL/CentOS
service files.
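
The conflict is easy to see on a stock CentOS 7 system: nothing is
mounted on /var/lib/nfs itself, yet a check for that string in
/proc/mounts still matches the rpc_pipefs line:

  grep /var/lib/nfs /proc/mounts
  # typically prints something like:
  # sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0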

Regards,
  Dennis




Re: [ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-11-02 Thread Jean-Marc Saffroy
Replying to myself:

On Wed, 11 Oct 2017, Jean-Marc Saffroy wrote:

> I am caught by surprise with this behaviour of DLM:
> - I have 5 nodes (test VMs)
> - 3 of them have 1 vote for the corosync quorum (they are "voters")
> - 2 of them have 0 vote ("non-voters")
> 
> So the corosync quorum is 2.
> 
> On the non-voters, I run DLM and an application that uses it. In DLM,
> fencing is disabled.
> 
> Now, if I stop corosync on 2 of the voters:
> - as expected, corosync says "Activity blocked"
> - but to my surprise, DLM seems happy to give more locks
> 
> Shouldn't DLM block lock requests in this situation?
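
For reference, a vote split like the one above is normally expressed
with per-node quorum_votes in corosync.conf, roughly like this (node
names are placeholders):

  quorum {
      provider: corosync_votequorum
  }

  nodelist {
      node {
          ring0_addr: voter1
          nodeid: 1
          quorum_votes: 1
      }
      # ... two more voters like the one above ...
      node {
          ring0_addr: nonvoter1
          nodeid: 4
          quorum_votes: 0
      }
      # ... and a second non-voter with quorum_votes: 0 ...
  }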

Apparently DLM does not care about changes in quorum until there are 
changes in membership of the process groups it is part of. In my test, the 
"voters" do not run DLM, and therefore (I suppose?) DLM does not react to 
their absence.

DLM does block lock requests when quorum is lost AND THEN there is a 
change in membership for the DLM participants, because quorum is required 
for lockspace operations.
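
One way to watch this from one of the non-voters is to compare the
quorum state with what dlm_controld reports, for example:

  corosync-quorumtool -s   # vote counts and whether the cluster is quorate
  dlm_tool ls              # lockspaces this node is a member of
  dlm_tool status          # dlm_controld's view of the members

(Exact options may vary between versions.)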

Does that make sense?


Cheers,
JM

-- 
saff...@gmail.com



[ClusterLabs] Antw: Re: Colocation rule with vip and ms master

2017-11-02 Thread Ulrich Windl
>>> Ferenc Wágner  wrote on 27.10.2017 at 07:41 in message
<87o9otxi4e@lant.ki.iif.hu>:
> Norberto Lopes  writes:
> 
>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
>> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>>
>> Basically what's occurring in my cluster is that the first rule stops the
>> Sync node from being promoted if the Master ever dies. The second doesn't
>> but I can't quite follow why.
> 
> Getting a score of -inf means that the resource won't run.  On the other
> hand, (+)inf just means "strongest" preference.

Interesting: I thought -inf with colocation means "never colocate" (with
postgresMS:Master).

> -- 
> Regards,
> Feri
> 



