Re: [ClusterLabs] drbd clone not becoming master

2017-11-03 Thread Dennis Jacobfeuerborn
On 03.11.2017 15:49, Ken Gaillot wrote:
> On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
>> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm setting up a redundant NFS server for some experiments but
>>> almost
>>> immediately ran into a strange issue. The drbd clone resource never
>>> promotes either of the two clones to the Master state.
>>>
>>> The state says this:
>>>
>>>  Master/Slave Set: drbd-clone [drbd]
>>>      Slaves: [ nfsserver1 nfsserver2 ]
>>>  metadata-fs   (ocf::heartbeat:Filesystem):   Stopped
>>>
>>> The resource configuration looks like this:
>>>
>>> Resources:
>>>  Master: drbd-clone
>>>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1
>>>               clone-node-max=1
>>>   Resource: drbd (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=r0
>>>    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>>>                monitor interval=60s (drbd-monitor-interval-60s)
>>>                promote interval=0s timeout=90 (drbd-promote-interval-0s)
>>>                start interval=0s timeout=240 (drbd-start-interval-0s)
>>>                stop interval=0s timeout=100 (drbd-stop-interval-0s)
>>>  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>>>   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared
>>>               fstype=ext4 options=noatime
>>>   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
>>>               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>>>               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbd-clone then start metadata-fs (kind:Mandatory)
>>> Colocation Constraints:
>>>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
>>>
>>> Shouldn't one of the clones be promoted to the Master state
>>> automatically?
>>
>> I think the source of the issue is this:
>>
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 1
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
>> Nov  2 23:12:03 nfsserver1 lrmd[2163]:  notice: drbd_monitor_6:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
>>
>> It seems the drbd resource agent tries to use crm_master to promote
>> the
>> clone but fails because it cannot "sign on to the CIB service". Does
>> anybody know what that means?
>>
>> Regards,
>>   Dennis
>>
> 
> That's odd, it should only happen if the cluster is not running, but
> then the agent wouldn't have been called.
> 
> The CIB is one of the core daemons of pacemaker; it manages the cluster
> configuration and status. If it's not running, the cluster can't do
> anything.
> 
> Perhaps the CIB is crashing, or something is blocking the communication
> between the agent and the CIB.

SELinux was the culprit. After disabling it the problem went away.

Regards,
  Dennis
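
A less drastic alternative to disabling SELinux outright is to confirm the
denials and run in permissive mode while a local policy module is built. A
minimal sketch, assuming the audit and policycoreutils tools are installed:

  # Check the current mode and look for recent AVC denials involving pacemaker
  getenforce
  ausearch -m avc -ts recent | grep -i -e cib -e pacemaker -e drbd

  # Switch to permissive mode: denials are logged but no longer enforced
  setenforce 0

  # Optionally build and load a local policy module from the logged denials
  ausearch -m avc -ts recent | audit2allow -M pacemaker-local
  semodule -i pacemaker-local.pp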




Re: [ClusterLabs] Prevent resources from being restarted when activating placement-strategy

2017-11-03 Thread Ken Gaillot
On Fri, 2017-11-03 at 09:55 +0100, Philipp Achmüller wrote:
> Hi, 
> 
> I have a 4-node cluster with several VirtualDomain resources
> (SLES 12.2, pacemaker 1.1.15).
> 
> I set up the required resource capacities and provided the capacity
> per node.
> 
> When activating placement-strategy for the first time (e.g. to
> balanced), all VMs get restarted once by the cluster.
> Afterwards I can switch or remove the placement-strategy without any
> impact on running resources.
> 
> - Why is this happening?
> - Is there a way to prevent this behaviour the first time?
> 
> thank you! 
> regards 
> Philipp
> 

Good question, I didn't realize that. crm_simulate is a good tool for
exploring that sort of "why", but it's rather arcane. If you have a
pe-input file from the transition with the restart, I can take a look.
-- 
Ken Gaillot 
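
For anyone who wants to explore such a transition themselves, a rough sketch
of the kind of crm_simulate run mentioned above (the file name is made up;
pe-input files are normally kept under /var/lib/pacemaker/pengine/ on the
node that was DC at the time):

  # Replay the policy-engine input that scheduled the restarts and list the planned actions
  crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-123.bz2

  # Add allocation scores to see why each resource was placed (or moved)
  crm_simulate -S -s -x /var/lib/pacemaker/pengine/pe-input-123.bz2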



Re: [ClusterLabs] drbd clone not becoming master

2017-11-03 Thread Ken Gaillot
On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
> > Hi,
> > I'm setting up a redundant NFS server for some experiments but
> > almost
> > immediately ran into a strange issue. The drbd clone resource never
> > promotes either of the two clones to the Master state.
> > 
> > The state says this:
> > 
> >  Master/Slave Set: drbd-clone [drbd]
> >      Slaves: [ nfsserver1 nfsserver2 ]
> >  metadata-fs   (ocf::heartbeat:Filesystem):   Stopped
> > 
> > The resource configuration looks like this:
> > 
> > Resources:
> >  Master: drbd-clone
> >   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1
> >               clone-node-max=1
> >   Resource: drbd (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=r0
> >    Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
> >                monitor interval=60s (drbd-monitor-interval-60s)
> >                promote interval=0s timeout=90 (drbd-promote-interval-0s)
> >                start interval=0s timeout=240 (drbd-start-interval-0s)
> >                stop interval=0s timeout=100 (drbd-stop-interval-0s)
> >  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
> >   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared
> >               fstype=ext4 options=noatime
> >   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
> >               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
> >               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
> > 
> > Location Constraints:
> > Ordering Constraints:
> >   promote drbd-clone then start metadata-fs (kind:Mandatory)
> > Colocation Constraints:
> >   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
> > 
> > Shouldn't one of the clones be promoted to the Master state
> > automatically?
> 
> I think the source of the issue is this:
> 
> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called /usr/sbin/crm_master -Q -l reboot -v 1
> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
> Nov  2 23:12:03 nfsserver1 lrmd[2163]:  notice: drbd_monitor_6:4673:stderr [ Error signing on to the CIB service: Transport endpoint is not connected ]
> 
> It seems the drbd resource agent tries to use crm_master to promote
> the
> clone but fails because it cannot "sign on to the CIB service". Does
> anybody know what that means?
> 
> Regards,
>   Dennis
> 

That's odd, it should only happen if the cluster is not running, but
then the agent wouldn't have been called.

The CIB is one of the core daemons of pacemaker; it manages the cluster
configuration and status. If it's not running, the cluster can't do
anything.

Perhaps the CIB is crashing, or something is blocking the communication
between the agent and the CIB.
-- 
Ken Gaillot 
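
For readers hitting the same symptom, a few quick checks can help distinguish
a crashed cib daemon from blocked IPC between the agent and the CIB (a
minimal sketch; log locations vary by distribution):

  # Is the cib daemon running at all?
  pgrep -af cib

  # Can a local client sign on to it?
  cibadmin --query > /dev/null && echo "CIB reachable"
  crm_mon -1

  # Look for cib daemon errors, exits, or respawns in the system log
  grep -iE 'cib.*(error|exit|respawn)' /var/log/messages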



Re: [ClusterLabs] [ClusterLabs Developers] announcement: schedule for resource-agents release 4.1.0

2017-11-03 Thread Oyvind Albrigtsen

I forgot to update the number of issues and PRs.

There are currently 63 issues and 45 pull requests open.

On 03/11/17 12:26 +0100, Oyvind Albrigtsen wrote:

Hi,

This is a tentative schedule for resource-agents v4.1.0:
4.1.0-rc1: November 14.
4.1.0: November 21.

I modified the corresponding milestones at
https://github.com/ClusterLabs/resource-agents/milestones

If there's anything you think should be part of the release
please open an issue, a pull request, or a bugzilla, as you see
fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues consider yourself
invited to do so. There are currently 49 issues and 38 pull
requests still open.


Cheers,
Oyvind Albrigtsen



[ClusterLabs] announcement: schedule for resource-agents release 4.1.0

2017-11-03 Thread Oyvind Albrigtsen

Hi,

This is a tentative schedule for resource-agents v4.1.0:
4.1.0-rc1: November 14.
4.1.0: November 21.

I modified the corresponding milestones at
https://github.com/ClusterLabs/resource-agents/milestones

If there's anything you think should be part of the release
please open an issue, a pull request, or a bugzilla, as you see
fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues consider yourself
invited to do so. There are currently 49 issues and 38 pull
requests still open.


Cheers,
Oyvind Albrigtsen



[ClusterLabs] Prevent resources from being restarted when activating placement-strategy

2017-11-03 Thread Philipp Achmüller
Hi,

I have a 4-node cluster with several VirtualDomain resources (SLES 12.2,
pacemaker 1.1.15).

I set up the required resource capacities and provided the capacity per node.

When activating placement-strategy for the first time (e.g. to balanced), all
VMs get restarted once by the cluster.
Afterwards I can switch or remove the placement-strategy without any impact
on running resources.

- Why is this happening?
- Is there a way to prevent this behaviour the first time?

thank you!
regards
Philipp
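
For context, a minimal sketch of the kind of configuration being described,
using crmsh as shipped with SLES 12 (node names, resource names, capacity
values and the utilization attribute names are all assumptions for
illustration):

  # Advertise how much capacity each node provides
  crm configure node node1 utilization cpu=16 memory=65536
  crm configure node node2 utilization cpu=16 memory=65536

  # Declare what each VM consumes
  crm configure primitive vm1 ocf:heartbeat:VirtualDomain \
      params config=/etc/libvirt/qemu/vm1.xml \
      utilization cpu=2 memory=4096

  # Enabling this for the first time is the step that triggered the restarts
  crm configure property placement-strategy=balanced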


[ClusterLabs] pcs 0.9.161 released

2017-11-03 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.9.161.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.161.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.161.zip


Complete change log for this release:
### Added
- List of pcs and pcsd capabilities ([rhbz#1230919])

### Fixed
- Fixed `pcs cluster auth` when already authenticated and using
  different port ([rhbz#1415197])
- It is now possible to restart a bundle resource on one node; a usage
  sketch follows this list ([rhbz#1501274])
- `resource update` no longer exits with an error when the `remote-node`
  meta attribute is set to the same value that it already has
  ([rhbz#1502715], [ghissue#145])
- Listing and describing resource and stonith agents no longer crashes
  when agents' metadata contain non-ascii characters ([rhbz#1503110],
  [ghissue#151])
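
As a usage note for the bundle fix above, the per-node restart looks roughly
like this (the bundle and node names are made up for illustration):

  # Restart only the instance of my-bundle running on node1
  pcs resource restart my-bundle node1

  # Optionally wait up to 60 seconds for the restart to finish
  pcs resource restart my-bundle node1 --wait=60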


Thanks / congratulations to everyone who contributed to this release,
including Bruno Travouillon, Ivan Devat, Ondrej Mular, Tomas Jelinek and
Valentin Vidic.

Cheers,
Tomas


[ghissue#145]: https://github.com/ClusterLabs/pcs/issues/145
[ghissue#151]: https://github.com/ClusterLabs/pcs/issues/151
[rhbz#1230919]: https://bugzilla.redhat.com/show_bug.cgi?id=1230919
[rhbz#1415197]: https://bugzilla.redhat.com/show_bug.cgi?id=1415197
[rhbz#1501274]: https://bugzilla.redhat.com/show_bug.cgi?id=1501274
[rhbz#1502715]: https://bugzilla.redhat.com/show_bug.cgi?id=1502715
[rhbz#1503110]: https://bugzilla.redhat.com/show_bug.cgi?id=1503110



Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4

2017-11-03 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> I decided to do another release candidate, because we had a large
> number of changes since rc3. The fourth release candidate for Pacemaker
> version 1.1.18 is now available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-
> rc4
>
> The big changes are numerous scalability improvements and bundle fixes.
> We're starting to test Pacemaker with as many as 1,500 bundles (Docker
> containers) running on 20 guest nodes running on three 56-core physical
> cluster nodes.

Hi Ken,

That's really cool. What's the size of the CIB with that kind of
configuration? I guess it would compress pretty well, but still.

Cheers,
Kristoffer

>
> For details on the changes in this release, see the ChangeLog.
>
> This is likely to be the last release candidate before the final
> release next week. Any testing you can do is very welcome.
> -- 
> Ken Gaillot 
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com
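
For anyone curious about the same question, a rough way to measure it on a
running cluster (a sketch; the on-disk path shown is the default for current
Pacemaker builds):

  # Size of the live CIB as XML, raw and compressed
  cibadmin --query | wc -c
  cibadmin --query | gzip -c | wc -c

  # The on-disk copy kept by Pacemaker
  ls -lh /var/lib/pacemaker/cib/cib.xml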
