Re: [ClusterLabs] drbd clone not becoming master
On 03.11.2017 15:49, Ken Gaillot wrote:
> On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
>> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm setting up a redundant NFS server for some experiments but almost
>>> immediately ran into a strange issue. The drbd clone resource never
>>> promotes either of the two clones to the Master state.
>>>
>>> The state says this:
>>>
>>>  Master/Slave Set: drbd-clone [drbd]
>>>      Slaves: [ nfsserver1 nfsserver2 ]
>>>  metadata-fs (ocf::heartbeat:Filesystem): Stopped
>>>
>>> The resource configuration looks like this:
>>>
>>> Resources:
>>>  Master: drbd-clone
>>>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
>>>   Resource: drbd (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=r0
>>>    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>>>                monitor interval=60s (drbd-monitor-interval-60s)
>>>                promote interval=0s timeout=90 (drbd-promote-interval-0s)
>>>                start interval=0s timeout=240 (drbd-start-interval-0s)
>>>                stop interval=0s timeout=100 (drbd-stop-interval-0s)
>>>  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
>>>   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
>>>               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>>>               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbd-clone then start metadata-fs (kind:Mandatory)
>>> Colocation Constraints:
>>>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
>>>
>>> Shouldn't one of the clones be promoted to the Master state automatically?
>> I think the source of the issue is this:
>>
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 1
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
>> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
>> Nov 2 23:12:03 nfsserver1 lrmd[2163]: notice: drbd_monitor_6:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>>
>> It seems the drbd resource agent tries to use crm_master to promote the
>> clone but fails because it cannot "sign on to the CIB service". Does
>> anybody know what that means?
>>
>> Regards,
>> Dennis
>
> That's odd, it should only happen if the cluster is not running, but
> then the agent wouldn't have been called.
>
> The CIB is one of the core daemons of pacemaker; it manages the cluster
> configuration and status. If it's not running, the cluster can't do
> anything.
>
> Perhaps the CIB is crashing, or something is blocking the communication
> between the agent and the CIB.

SELinux was the culprit. After disabling it the problem went away.

Regards,
Dennis

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
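For anyone hitting the same symptom: a common way to confirm that SELinux is blocking the agent's connection to the CIB is to check for AVC denials and temporarily switch to permissive mode. A minimal sketch, assuming a RHEL/CentOS-style node with auditd running (these commands need root on a cluster node):

```shell
# Is SELinux currently enforcing on this node?
getenforce

# Look for recent AVC denials mentioning pacemaker components
ausearch -m avc -ts recent 2>/dev/null | grep -i -e cib -e crm -e pacemaker

# Temporarily switch to permissive mode to confirm SELinux is the cause;
# denials are still logged but no longer enforced
setenforce 0

# To make the change persistent, set SELINUX=permissive (or disabled)
# in /etc/selinux/config and reboot.
```

Permissive mode is usually preferable to disabling SELinux outright, since the logged denials can then be turned into a proper policy module.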
Re: [ClusterLabs] Prevent resources from being restarted when activating placement-strategy
On Fri, 2017-11-03 at 09:55 +0100, Philipp Achmüller wrote:
> Hi,
>
> i have a 4 node cluster with several VirtualDomain resources.
> (SLES12.2, pacemaker 1.1.15)
>
> i set up required resource capacities, and provided capacity per node.
>
> when activating placement-strategy the first time, e.g. to "balanced",
> all VMs get restarted once by the cluster.
> afterwards i can switch/remove the placement-strategy without any
> impact on running resources.
>
> - why is this happening?
> - is there a way to prevent this behaviour the first time?
>
> thank you!
> regards
> Philipp

Good question, I didn't realize that. crm_simulate is a good tool for
exploring that sort of "why", but it's rather arcane. If you have a
pe-input file from the transition with the restart, I can take a look.

--
Ken Gaillot
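For reference, replaying a saved pe-input file with crm_simulate typically looks like this (the file name and path are illustrative; actual pe-input files live under the pengine state directory on the node that was DC at the time):

```shell
# Replay a saved policy-engine input to see what actions the cluster
# decided on for that transition (file name illustrative)
crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-42.bz2

# Add allocation scores to see why resources were placed (or restarted)
# where they were
crm_simulate --simulate --show-scores \
    --xml-file /var/lib/pacemaker/pengine/pe-input-42.bz2
```

The score output is verbose, but it is usually the quickest way to see which constraint or utilization change forced a move.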
Re: [ClusterLabs] drbd clone not becoming master
On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
> > Hi,
> > I'm setting up a redundant NFS server for some experiments but almost
> > immediately ran into a strange issue. The drbd clone resource never
> > promotes either of the two clones to the Master state.
> >
> > The state says this:
> >
> >  Master/Slave Set: drbd-clone [drbd]
> >      Slaves: [ nfsserver1 nfsserver2 ]
> >  metadata-fs (ocf::heartbeat:Filesystem): Stopped
> >
> > The resource configuration looks like this:
> >
> > Resources:
> >  Master: drbd-clone
> >   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
> >   Resource: drbd (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=r0
> >    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
> >                monitor interval=60s (drbd-monitor-interval-60s)
> >                promote interval=0s timeout=90 (drbd-promote-interval-0s)
> >                start interval=0s timeout=240 (drbd-start-interval-0s)
> >                stop interval=0s timeout=100 (drbd-stop-interval-0s)
> >  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
> >   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
> >   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
> >               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
> >               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
> >
> > Location Constraints:
> > Ordering Constraints:
> >   promote drbd-clone then start metadata-fs (kind:Mandatory)
> > Colocation Constraints:
> >   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
> >
> > Shouldn't one of the clones be promoted to the Master state automatically?
> I think the source of the issue is this:
>
> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 1
> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
> Nov 2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
> Nov 2 23:12:03 nfsserver1 lrmd[2163]: notice: drbd_monitor_6:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>
> It seems the drbd resource agent tries to use crm_master to promote the
> clone but fails because it cannot "sign on to the CIB service". Does
> anybody know what that means?
>
> Regards,
> Dennis

That's odd, it should only happen if the cluster is not running, but
then the agent wouldn't have been called.

The CIB is one of the core daemons of pacemaker; it manages the cluster
configuration and status. If it's not running, the cluster can't do
anything.

Perhaps the CIB is crashing, or something is blocking the communication
between the agent and the CIB.

--
Ken Gaillot
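A small aside on that exit code: 107 happens to be the Linux errno ENOTCONN, whose message is exactly the "Transport endpoint is not connected" string that lrmd logged — consistent with the agent failing to establish a connection to the CIB. A quick, cluster-independent check:

```shell
# 107 is ENOTCONN on Linux; its message matches what lrmd logged
python3 -c 'import errno, os; print(errno.ENOTCONN, os.strerror(errno.ENOTCONN))'
# -> 107 Transport endpoint is not connected

# On an actual cluster node, you could also confirm CIB connectivity
# directly (requires a running cluster):
#   cibadmin --query > /dev/null && echo "CIB reachable"
```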
Re: [ClusterLabs] [ClusterLabs Developers] announcement: schedule for resource-agents release 4.1.0
I forgot to update the number of issues and PRs: there are currently 63
issues and 45 pull requests open.

On 03/11/17 12:26 +0100, Oyvind Albrigtsen wrote:
> Hi,
>
> This is a tentative schedule for resource-agents v4.1.0:
>
> 4.1.0-rc1: November 14.
> 4.1.0: November 21.
>
> I modified the corresponding milestones at
> https://github.com/ClusterLabs/resource-agents/milestones
>
> If there's anything you think should be part of the release, please
> open an issue, a pull request, or a bugzilla, as you see fit.
>
> If there's anything that hasn't received due attention, please let us
> know.
>
> Finally, if you can help with resolving issues, consider yourself
> invited to do so.
>
> There are currently 49 issues and 38 pull requests still open.
>
> Cheers,
> Oyvind Albrigtsen

___
Developers mailing list
develop...@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/developers
[ClusterLabs] announcement: schedule for resource-agents release 4.1.0
Hi,

This is a tentative schedule for resource-agents v4.1.0:

4.1.0-rc1: November 14.
4.1.0: November 21.

I modified the corresponding milestones at
https://github.com/ClusterLabs/resource-agents/milestones

If there's anything you think should be part of the release, please
open an issue, a pull request, or a bugzilla, as you see fit.

If there's anything that hasn't received due attention, please let us
know.

Finally, if you can help with resolving issues, consider yourself
invited to do so.

There are currently 49 issues and 38 pull requests still open.

Cheers,
Oyvind Albrigtsen
[ClusterLabs] Prevent resources from being restarted when activating placement-strategy
Hi,

i have a 4 node cluster with several VirtualDomain resources.
(SLES12.2, pacemaker 1.1.15)

i set up required resource capacities, and provided capacity per node.

when activating placement-strategy the first time, e.g. to "balanced",
all VMs get restarted once by the cluster.
afterwards i can switch/remove the placement-strategy without any
impact on running resources.

- why is this happening?
- is there a way to prevent this behaviour the first time?

thank you!
regards
Philipp
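For anyone reproducing this setup: on SLES, utilization attributes and the strategy are typically configured via crmsh, roughly as below. Node names, resource names, attribute names, and values here are all illustrative, not taken from the poster's cluster:

```shell
# Declare the capacity each node provides (names/values illustrative)
crm node utilization node1 set cpu 8
crm node utilization node1 set hv_memory 16384

# Declare what each VirtualDomain resource consumes
crm resource utilization vm1 set cpu 2
crm resource utilization vm1 set hv_memory 4096

# Enable the placement strategy -- the step that triggered the
# one-time restarts described above
crm configure property placement-strategy=balanced
```

Setting the utilization attributes first and flipping placement-strategy last keeps the change to a single transition, which makes the resulting pe-input file easier to analyze.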
[ClusterLabs] pcs 0.9.161 released
I am happy to announce the latest release of pcs, version 0.9.161.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.161.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.161.zip

Complete change log for this release:

### Added
- List of pcs and pcsd capabilities ([rhbz#1230919])

### Fixed
- Fixed `pcs cluster auth` when already authenticated and using a
  different port ([rhbz#1415197])
- It is now possible to restart a bundle resource on one node
  ([rhbz#1501274])
- `resource update` no longer exits with an error when the `remote-node`
  meta attribute is set to the same value that it already has
  ([rhbz#1502715], [ghissue#145])
- Listing and describing resource and stonith agents no longer crashes
  when agents' metadata contain non-ascii characters ([rhbz#1503110],
  [ghissue#151])

Thanks / congratulations to everyone who contributed to this release,
including Bruno Travouillon, Ivan Devat, Ondrej Mular, Tomas Jelinek
and Valentin Vidic.

Cheers,
Tomas

[ghissue#145]: https://github.com/ClusterLabs/pcs/issues/145
[ghissue#151]: https://github.com/ClusterLabs/pcs/issues/151
[rhbz#1230919]: https://bugzilla.redhat.com/show_bug.cgi?id=1230919
[rhbz#1415197]: https://bugzilla.redhat.com/show_bug.cgi?id=1415197
[rhbz#1501274]: https://bugzilla.redhat.com/show_bug.cgi?id=1501274
[rhbz#1502715]: https://bugzilla.redhat.com/show_bug.cgi?id=1502715
[rhbz#1503110]: https://bugzilla.redhat.com/show_bug.cgi?id=1503110
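As a usage sketch of the bundle fix above (the bundle and node names are illustrative):

```shell
# With pcs 0.9.161, a bundle resource can be restarted on a single
# node rather than cluster-wide (names illustrative)
pcs resource restart my-bundle node1
```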
Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4
Ken Gaillot writes:
> I decided to do another release candidate, because we had a large
> number of changes since rc3. The fourth release candidate for Pacemaker
> version 1.1.18 is now available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-rc4
>
> The big changes are numerous scalability improvements and bundle fixes.
> We're starting to test Pacemaker with as many as 1,500 bundles (Docker
> containers) running on 20 guest nodes running on three 56-core physical
> cluster nodes.

Hi Ken,

That's really cool. What's the size of the CIB with that kind of
configuration? I guess it would compress pretty well, but still.

Cheers,
Kristoffer

> For details on the changes in this release, see the ChangeLog.
>
> This is likely to be the last release candidate before the final
> release next week. Any testing you can do is very welcome.
> --
> Ken Gaillot

--
// Kristoffer Grönlund
// kgronl...@suse.com
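On measuring that: the live CIB can be sized from any cluster node, and a quick bzip2 pass gives a feel for how well it compresses (these commands assume a running Pacemaker cluster):

```shell
# Uncompressed size of the live CIB in bytes
cibadmin --query | wc -c

# Approximate compressed size; Pacemaker uses bzip2 for large messages,
# so this is a reasonable proxy for on-the-wire size
cibadmin --query | bzip2 -c | wc -c
```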