Re: [ClusterLabs] Adding HAProxy as a Resource

2019-07-11 Thread Kristoffer Grönlund

On 2019-07-11 09:31, Somanath Jeeva wrote:

Hi All,

I am using HAProxy in my environment, which I plan to add to Pacemaker
as a resource. I see no RA available for it in resource-agents.

Should I write a new RA, or is there a way to add it to Pacemaker as
a systemd service?


Hello,

haproxy works well as a plain systemd service, so you can add it as
systemd:haproxy - that is, instead of an ocf: prefix, just put
systemd:.

If you want the cluster to manage multiple, differently configured
instances of haproxy, you might have to either create custom systemd
service scripts for each one, or create an agent with parameters.
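For a single instance, something like this should work (a sketch; the
monitor interval is illustrative, and the unit name must match the
haproxy service on your distribution):

```shell
# crmsh: manage the distribution's haproxy.service as a cluster resource
crm configure primitive haproxy systemd:haproxy \
    op monitor interval=10s

# or, equivalently, with pcs
pcs resource create haproxy systemd:haproxy op monitor interval=10s
```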

Cheers,
Kristoffer





With Regards
Somanath Thilak J


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] Why do clusters have a name?

2019-03-27 Thread Kristoffer Grönlund
On Wed, 2019-03-27 at 12:25 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 27 Mar 2019 10:20:21 +0100
> Kristoffer Grönlund  wrote:
> 
> > On Wed, 2019-03-27 at 10:13 +0100, Jehan-Guillaume de Rorthais
> > wrote:
> > > On Wed, 27 Mar 2019 09:59:16 +0100
> > > Kristoffer Grönlund  wrote:
> > >   
> > > > On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát  wrote:  
> > > > > On 26. 03. 19 21:12, Brian Reichert wrote:
> > > > > > This will sound like a dumb question:
> > > > > > 
> > > > > > The manpage for pcs(8) implies that to set up a cluster,
> > > > > > one
> > > > > > needs
> > > > > > to provide a name.
> > > > > > 
> > > > > > Why do clusters have names?
> > > > > > 
> > > > > > Is there a use case wherein there would be multiple
> > > > > > clusters
> > > > > > visible
> > > > > > in an administrative UI, such that they'd need to be
> > > > > > differentiated?
> > > > > > 
> > > > > 
> > > > > For example, the pcs web UI has a page showing multiple
> > > > > clusters.
> > > > > 
> > > > 
> > > > We use cluster names and rules to apply the same exact CIB to
> > > > multiple
> > > > clusters, particularly when configuring geo clusters.  
> > > 
> > > I'm not sure I understand. Is it possible to have multiple
> > > Pacemaker daemon instances on the same servers?
> > > 
> > > Or do you mean it is possible to have multiple namespaces in which
> > > resources are isolated, with one Pacemaker daemon managing them?
> > >   
> > 
> > I am not sure what you mean by the second, but I am fairly sure I
> > don't
> > mean either of those :) I'm talking about having multiple actual,
> > distinct clusters
> 
> distinct clusters of Pacemaker/corosync daemons on the same servers, or
> distinct clusters of servers?
> 

Distinct clusters of servers:

Cluster "Tokyo" consisting of node A, B, C
Cluster "Stockholm" consisting of node D, E, F
Cluster "New York" consisting of node G, H, I

All with the same CIB XML document.

Using tickets, resources can then be moved from one cluster to the
other, or cloned across multiple clusters. A cluster of clusters, if
you will.
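As a sketch of how one CIB can serve all three clusters: rules can key
off the built-in #cluster-name node attribute, and a ticket constraint
ties a resource to whichever site holds the ticket. All names and
addresses below are illustrative, and the exact crmsh syntax may vary
by version:

```shell
# per-site parameter values selected by rules on the cluster name
crm configure primitive vip IPaddr2 \
    params rule '#cluster-name' eq Tokyo ip=10.0.1.10 \
    params rule '#cluster-name' eq Stockholm ip=10.0.2.10 \
    params rule '#cluster-name' eq "New York" ip=10.0.3.10

# the resource may only run where the "web" ticket has been granted
crm configure rsc_ticket web-dep web: vip
```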

Cheers,
Kristoffer

> > and sharing the same configuration across all of
> > them,
> 
> Same configuration as in the same file, or the same content across
> different files?
> 
> Sorry for being bold...I just don't get it :/
> 
> 

Re: [ClusterLabs] Why do clusters have a name?

2019-03-27 Thread Kristoffer Grönlund
On Wed, 2019-03-27 at 10:13 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 27 Mar 2019 09:59:16 +0100
> Kristoffer Grönlund  wrote:
> 
> > On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát  wrote:
> > > On 26. 03. 19 21:12, Brian Reichert wrote:  
> > > > This will sound like a dumb question:
> > > > 
> > > > The manpage for pcs(8) implies that to set up a cluster, one
> > > > needs
> > > > to provide a name.
> > > > 
> > > > Why do clusters have names?
> > > > 
> > > > Is there a use case wherein there would be multiple clusters
> > > > visible
> > > > in an administrative UI, such that they'd need to be
> > > > differentiated?
> > > >   
> > > 
> > > For example, the pcs web UI has a page showing multiple clusters.
> > >   
> > 
> > We use cluster names and rules to apply the same exact CIB to
> > multiple
> > clusters, particularly when configuring geo clusters.
> 
> I'm not sure I understand. Is it possible to have multiple Pacemaker
> daemon instances on the same servers?
> 
> Or do you mean it is possible to have multiple namespaces in which
> resources are isolated, with one Pacemaker daemon managing them?
> 

I am not sure what you mean by the second, but I am fairly sure I don't
mean either of those :) I'm talking about having multiple actual,
distinct clusters and sharing the same configuration across all of
them, using rules to separate the cases where the configurations
differ.

Cheers,
Kristoffer



Re: [ClusterLabs] Why do clusters have a name?

2019-03-27 Thread Kristoffer Grönlund
On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát  wrote:
> On 26. 03. 19 21:12, Brian Reichert wrote:
> > This will sound like a dumb question:
> > 
> > The manpage for pcs(8) implies that to set up a cluster, one needs
> > to provide a name.
> > 
> > Why do clusters have names?
> > 
> > Is there a use case wherein there would be multiple clusters
> > visible
> > in an administrative UI, such that they'd need to be
> > differentiated?
> > 
> 
> For example, the pcs web UI has a page showing multiple clusters.
> 

We use cluster names and rules to apply the same exact CIB to multiple
clusters, particularly when configuring geo clusters.

Cheers,
Kristoffer

> Ivan

Re: [ClusterLabs] Antw: Announcing hawk-apiserver, now in ClusterLabs

2019-02-13 Thread Kristoffer Grönlund
Ulrich Windl   writes:

> Hello!
>
> I'd like to comment as an "old" SuSE customer:
> I'm amazed that lighttpd is dropped in favor of some new go application:
> SuSE now has a base system that needs (correct me if I'm wrong): shell, perl,
> python, java, go, ruby, ...?
>

Oh, that list is a lot longer, and this is not the first go project to
make it into SLE.

> Maybe each programmer has his favorite. Personally I also learned quite a lot
> of languages (and even editors), but most being equivalent, you have to
> decide whether it makes sense to start using yet another language (Go in
> this case). Especially, I'm afraid of single-vendor languages...

TBH I am more sceptical about languages designed by committee ;)

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>>>> Kristoffer Grönlund wrote on 2019-02-12 at 20:00 in
> message <87mun0g7c9@suse.com>:
>> Hello everyone,
>> 
>> I just wanted to send out an email about the hawk-apiserver project
>> which was moved into the ClusterLabs organization on Github today. This
>> project is used by us at SUSE for Hawk in our latest releases already,
>> and is also available in openSUSE for use with Hawk. However, I am
>> hoping that it can prove to be useful more generally, not just for Hawk
>> but for other projects that may want to integrate with Pacemaker using
>> the C API, and also to show what is possible when using the API.
>> 
>> To describe the hawk-apiserver briefly, I'll start by describing the use
>> case it was designed to cover: Previously, we were using lighttpd as the
>> web server for Hawk (a Ruby on Rails application), but a while ago the
>> maintainers of lighttpd decided that since Hawk was the only user of
>> this project in SLE, they would like to remove it from the next
>> release. This left Apache as the web server available to us, which has
>> some interesting issues for Hawk: Mainly, we expect people to run apache
>> as a resource in the cluster which might result in a confusing mix of
>> processes on the systems.
>> 
>> At the same time, I had started looking at Go and discovered how easy it
>> was to write a basic proxying web server in Go. So, as an experiment I
>> decided to see if I could replace the use of lighttpd with a custom web
>> server written in Go. Turns out the answer was yes! Once we had our own
>> web server, I discovered new things we could do with it. So here are
>> some of the other unique features in hawk-apiserver now:
>> 
>> * SSL certificate termination, and automatic detection and redirection
>>   from HTTP to HTTPS *on the same port*: Hawk runs on port 7630, and if
>>   someone accesses that port via HTTP, they will get a redirect to the
>>   same port but on HTTPS. It's magic.
>> 
>> * Persistent connection to Pacemaker via the C API, enabling instant
>>   change notification to the web frontend. From the point of view of the
>>   web frontend, this is a long-lived connection which completes when
>>   something changes in the CIB. On the backend side, it uses goroutines
>>   to enable thousands of such long-lived connections with minimal
>>   overhead.
>> 
>> * Optional exposure of the CIB as a REST API. Right now this is somewhat
>>   primitive, but we are working on making this a more fully featured
>>   API.
>> 
>> * Configurable static file serving routes (serve images on /img from
>>   /srv/http/images for example).
>> 
>> * Configurable proxying of subroutes to other web applications.
>> 
>> The URL to the project is https://github.com/ClusterLabs/hawk-apiserver,
>> I hope you will find it useful. Comments, issues and contributions are
>> of course more than welcome.
>> 
>> One final note: hawk-apiserver uses a project called go-pacemaker
>> located at https://github.com/krig/go-pacemaker. I intend to transfer
>> this to ClusterLabs as well. go-pacemaker is still somewhat rough around
>> the edges, and our plan is to work on the C API of pacemaker to make
>> using and exposing it via Go easier, as well as moving functionality
>> from crm_mon into the C API so that status information can be made
>> available in a more convenient format via the API as well.
>> 
>> -- 
>> // Kristoffer Grönlund
>> // kgronl...@suse.com 

[ClusterLabs] Announcing hawk-apiserver, now in ClusterLabs

2019-02-12 Thread Kristoffer Grönlund
Hello everyone,

I just wanted to send out an email about the hawk-apiserver project
which was moved into the ClusterLabs organization on Github today. This
project is used by us at SUSE for Hawk in our latest releases already,
and is also available in openSUSE for use with Hawk. However, I am
hoping that it can prove to be useful more generally, not just for Hawk
but for other projects that may want to integrate with Pacemaker using
the C API, and also to show what is possible when using the API.

To describe the hawk-apiserver briefly, I'll start by describing the use
case it was designed to cover: Previously, we were using lighttpd as the
web server for Hawk (a Ruby on Rails application), but a while ago the
maintainers of lighttpd decided that since Hawk was the only user of
this project in SLE, they would like to remove it from the next
release. This left Apache as the web server available to us, which has
some interesting issues for Hawk: Mainly, we expect people to run apache
as a resource in the cluster which might result in a confusing mix of
processes on the systems.

At the same time, I had started looking at Go and discovered how easy it
was to write a basic proxying web server in Go. So, as an experiment I
decided to see if I could replace the use of lighttpd with a custom web
server written in Go. Turns out the answer was yes! Once we had our own
web server, I discovered new things we could do with it. So here are
some of the other unique features in hawk-apiserver now:

* SSL certificate termination, and automatic detection and redirection
  from HTTP to HTTPS *on the same port*: Hawk runs on port 7630, and if
  someone accesses that port via HTTP, they will get a redirect to the
  same port but on HTTPS. It's magic.

* Persistent connection to Pacemaker via the C API, enabling instant
  change notification to the web frontend. From the point of view of the
  web frontend, this is a long-lived connection which completes when
  something changes in the CIB. On the backend side, it uses goroutines
  to enable thousands of such long-lived connections with minimal
  overhead.

* Optional exposure of the CIB as a REST API. Right now this is somewhat
  primitive, but we are working on making this a more fully featured
  API.

* Configurable static file serving routes (serve images on /img from
  /srv/http/images for example).

* Configurable proxying of subroutes to other web applications.

The URL to the project is https://github.com/ClusterLabs/hawk-apiserver,
I hope you will find it useful. Comments, issues and contributions are
of course more than welcome.

One final note: hawk-apiserver uses a project called go-pacemaker
located at https://github.com/krig/go-pacemaker. I intend to transfer
this to ClusterLabs as well. go-pacemaker is still somewhat rough around
the edges, and our plan is to work on the C API of pacemaker to make
using and exposing it via Go easier, as well as moving functionality
from crm_mon into the C API so that status information can be made
available in a more convenient format via the API as well.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Kristoffer Grönlund
On Tue, 2019-01-08 at 10:07 -0600, Ken Gaillot wrote:
> On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
> > On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> > > 
> > Having all the tools able to produce XML output like cibadmin and
> > crm_mon would be good in general, I think. So that seems like a
> > good
> > proposal to me.
> > 
> > In the case of an error, at least in my experience just getting a
> > return code and stderr output is enough to make sense of it -
> > getting
> > XML on stderr in the case of an error wouldn't seem like something
> > that
> > would add much value to me.
> 
> There are two benefits: it can give extended information (such as the
> text string that corresponds to a numeric exit status), and because
> it
> would also be used by any future REST API (which won't have stderr),
> API/CLI output could be parsed identically.
> 

Hm, am I understanding you correctly?

My sort-of vision for implementing a REST API has been to move all of
the core functionality out of the command line tools and into the C
libraries (I think we discussed something like a libpacemakerclient
before) - the idea is that the XML output would be generated on that
level?

If so, that is something that I am all for :)

Right now, we are experimenting with a REST API based on taking what we
use in Hawk and moving that into an API server written in Go, and just
calling crm_mon --as-xml to get status information that can be exposed
via the API. Having that available in C directly and not having to call
out to command line tools would be great and a lot cleaner:

https://github.com/krig/hawk-apiserver
https://github.com/hawk-ui/hawk-web-client
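The status call mentioned above is simply the following (note that
newer Pacemaker releases spell the flag differently):

```shell
# dump cluster status as XML for the API server to parse
crm_mon --as-xml

# in more recent Pacemaker versions the same output is requested with:
crm_mon --output-as=xml
```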

Cheers,
Kristoffer



Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Kristoffer Grönlund
On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> There has been some discussion in the past about generating more
> machine-friendly output from pacemaker CLI tools for scripting and
> high-level interfaces, as well as possibly adding a pacemaker REST
> API.
> 
> I've filed an RFE BZ
> 
>  https://bugs.clusterlabs.org/show_bug.cgi?id=5376
> 
> to design an output interface that would suit these goals. An actual
> REST API is not planned at this point, but this would provide a key
> component of any future implementation.

Having all the tools able to produce XML output like cibadmin and
crm_mon would be good in general, I think. So that seems like a good
proposal to me.

In the case of an error, at least in my experience just getting a
return code and stderr output is enough to make sense of it - getting
XML on stderr in the case of an error wouldn't seem like something that
would add much value to me.

Cheers,
Kristoffer

> 
> The question is what machine-friendly output should look like. The
> basic idea is: for commands like "crm_resource --constraints" or
> "stonith_admin --history", what output format would be most useful
> for
> a GUI or other program to parse?
> 
> Suggestions welcome here and/or on the bz ...


Re: [ClusterLabs] Fwd: After failover Pacemaker moves resource back when dead node become up

2019-01-04 Thread Kristoffer Grönlund
On Fri, 2019-01-04 at 15:27 +0300, Özkan Göksu  wrote:
> Hello.
> 
> I'm using Pacemaker & Corosync for my cluster. When a node dies,
> Pacemaker moves my resources to another online node. Everything is OK
> here. But when the dead node comes back, Pacemaker moves the resource
> back. I don't have any "location" line in my config, and I also tried
> the "unmove" command, but nothing changed.
> The corosync & pacemaker services are enabled and start at boot. If I
> start them manually instead, the resources do not fail back.
> 
> How can I keep the resource from moving back if it is running normally?

Configuring a positive resource-stickiness should take care of this for
you, so there has to be something else going on. Do you get any strange
errors reported for the resources on the second node? Check if there is
any failcount for the resources on that node using
"crm_mon --failcounts". Other than that, looking in the logs for
anything unusual would be my next move.

Another thing that stands out to me is that you configure a monitor
action for the gui resource, but you don't set a timeout. I'm not sure
what the default is there, so I would configure a timeout explicitly.

Finally, it looks like you have a 2-node cluster with STONITH disabled.
That's not going to work. You need some kind of stonith, or things will
behave badly. So that could be why you're seeing strange behavior.
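The checks and fixes above amount to roughly the following (a sketch;
the timeout value is illustrative, and a working fencing device such as
SBD is assumed to be set up before stonith is re-enabled):

```shell
# look for failcounts that could explain unexpected resource movement
crm_mon --failcounts

# give every operation an explicit default timeout (crmsh syntax)
crm configure op_defaults timeout=60s

# once real fencing (e.g. SBD) is configured, turn stonith back on
crm configure property stonith-enabled=true
```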

Cheers,
Kristoffer

> 
> *crm configure sh*
> 
> node 1: DEV1
> node 2: DEV2
> primitive poolip IPaddr2 \
> params ip=10.1.60.33 nic=enp2s0f0 cidr_netmask=24 \
> meta migration-threshold=2 target-role=Started \
> op monitor interval=20 timeout=20 on-fail=restart
> primitive gui systemd:gui \
> op monitor interval=20s \
> meta target-role=Started
> primitive gui-ip IPaddr2 \
> params ip=10.1.60.35 nic=enp2s0f0 cidr_netmask=24 \
> meta migration-threshold=2 target-role=Started \
> op monitor interval=20 timeout=20 on-fail=restart
> colocation cluster-gui inf: gui gui-ip
> order gui-after-ip Mandatory: gui-ip gui
> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=2.0.0-1-8cf3fe749e \
> cluster-infrastructure=corosync \
> cluster-name=mycluster \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> last-lrm-refresh=1545920437
> rsc_defaults rsc-options: \
> migration-threshold=10 \
> resource-stickiness=100
> 
> *pcs resource defaults*
> 
> migration-threshold=10
> resource-stickiness=100
> 
> *pcs resource show gui*
> 
> Resource: gui (class=systemd type=gui)
>  Meta Attrs: target-role=Started
>  Operations: monitor interval=20s (gui-monitor-20s)


Re: [ClusterLabs] Coming in Pacemaker 2.0.1 / 1.1.20: improved fencing history

2018-12-12 Thread Kristoffer Grönlund
On Tue, 2018-12-11 at 14:48 -0600, Ken Gaillot wrote:
> Pacemaker has long had the stonith_admin --history option to show a
> history of past fencing actions that the cluster has carried out.
> However, this list included only events since the node it was run on
> had joined the cluster, and it just wasn't very convenient.
> 
> In the upcoming release, the cluster keeps the fence history
> synchronized across all nodes, so you get the same answer no matter
> which node you query.

This is a great feature!

On a related note, it would be amazing to have the complete transition
history synchronized across all nodes as well..
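If I read the announcement right, the synchronized fencing history can
be queried (and cleared) from any node along these lines:

```shell
# show the fencing history for all targets, from any node
stonith_admin --history '*'

# the new release should also allow clearing the shared history
stonith_admin --history '*' --cleanup
```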

Cheers,
Kristoffer



Re: [ClusterLabs] Announcing Anvil! m2 v2.0.7

2018-11-20 Thread Kristoffer Grönlund
On Tue, 2018-11-20 at 02:25 -0500, Digimer wrote:
> * https://github.com/ClusterLabs/striker/releases/tag/v2.0.7
> 
> This is the first release since March 2018. No critical issues are
> known or were fixed. Users are advised to upgrade.
> 

Congratulations!

Cheers,
Kristoffer

> Main bugs fixed;
> 
> * Fixed install issues for Windows 10 and 2016 clients.
> * Improved duplicate record detection and cleanup in scan-clustat and
> scan-storcli.
> * Disabled the detection and recovery of 'paused' state servers (it
> caused more trouble than it solved).
> 
> Notable new features;
> * Improved the server boot logic to choose the node with the most
> running servers, all else being equal.
> * Updated UPS power transfer reason alerts from "warning" to "notice"
> level alerts.
> * Added support for EL 6.10.
> 
> Users can upgrade using 'striker-update' from their Striker
> dashboards.
> 
> /sbin/striker/striker-update --local
> /sbin/striker/striker-update --anvil all
> 
> Please feel free to report any issues in the Striker github
> repository.
> 


Re: [ClusterLabs] resource-agents v4.2.0

2018-10-24 Thread Kristoffer Grönlund
On Wed, 2018-10-24 at 10:21 +0200, Oyvind Albrigtsen wrote:
> ClusterLabs is happy to announce resource-agents v4.2.0.
> Source code is available at:
> https://github.com/ClusterLabs/resource-agents/releases/tag/v4.2.0
> 

[snip]

>   - ocf.py: new Python library and dev guide
> 

I just wanted to highlight the Python library since I think it can make
agent development a lot easier in the future, especially as we expand
the library with more utilities that are commonly needed when writing
agents.

Any agents written in Python should (for now at least) be compatible
both with Python 2.7+ and Python 3.3+. We still need to expand the CI
to actually verify that agents do support these versions, so anyone who
would like to help out improving the test setup is more than welcome to
do so :)

The biggest example of an agent using it that we have now is the
azure-events agent [1], so I would recommend anyone interested in working on
new agents to take a look at that. For a more compact example, I wrote
a version of the Dummy resource agent using the ocf.py library and put
it in a gist [2], and then there is a small example in the document
describing the library and how to use it [3].

[1]: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/azure-events.in
[2]: https://gist.github.com/krig/6676d0ae065fd852fac8b445410e1c95
[3]: https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/writing-python-agents.md

Cheers,
Kristoffer



Re: [ClusterLabs] resource-agents v4.2.0 rc1

2018-10-19 Thread Kristoffer Grönlund
On Fri, 2018-10-19 at 10:55 +0200, Oyvind Albrigtsen wrote:
> On 18/10/18 19:43 +0200, Valentin Vidic wrote:
> > On Wed, Oct 17, 2018 at 12:03:18PM +0200, Oyvind Albrigtsen wrote:
> > >  - apache: retry PID check.
> > 
> > I noticed that the ocft test started failing for apache in this
> > version. Not sure if the test is broken or the agent. Can you
> > check if the test still works for you? Restoring the previous
> > version of the agent fixes the problem for me.
> 
> It seems to work fine for me, except that I had to change the name
> from apache2 to httpd (which is what it's called on RHEL and Fedora)
> in the ocft config, so I think we need some additional logic for that.

I wonder if perhaps there was a configuration change as well, since the
return code seems to be configuration related. Maybe something changed
in the build scripts that moved something around? Wild guess, but...

Cheers,
Kristoffer

> > 
> > # ocft test -v apache
> > Initializing 'apache' ...
> > Done.
> > 
> > Starting 'apache' case 0 'check base env':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 1 'check base env: set non-existing
> > OCF_RESKEY_statusurl':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 2 'check base env: set non-existing
> > OCF_RESKEY_configfile':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 3 'normal start':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 4 'normal stop':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 5 'double start':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 6 'double stop':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 7 'running monitor':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 8 'not running monitor':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 9 'unimplemented command':
> > ERROR: './apache monitor' failed, the return code is 2.
> > 
> > -- 
> > Valentin
> 


Re: [ClusterLabs] crm resource stop VirtualDomain - how to know when/if VirtualDomain is really stopped ?

2018-10-11 Thread Kristoffer Grönlund
On Thu, 2018-10-11 at 13:59 +0200,  Lentes, Bernd  wrote:
> Hi,
> 
> I'm trying to write a script which shuts down my VirtualDomains at
> night for a short period to take a clean snapshot with libvirt.
> To shut them down I can use "crm resource stop VirtualDomain".
> 
> But when I do a "crm resource stop VirtualDomain" in my script, the
> command returns immediately. How can I know whether my VirtualDomains
> are really stopped, given that the shutdown may take several minutes?
> 
> I know I could do something with a loop around "crm resource status",
> grepping for e.g. "stopped", but I would prefer a cleaner solution.
> 
> Any ideas?

You should be able to pass -w to crm,

crm -w resource stop VirtualDomain

That should wait until the policy engine settles down again.
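So the script could look roughly like this (domain and resource names
are illustrative):

```shell
#!/bin/sh
# stop the resource and wait for the cluster to settle
crm -w resource stop VirtualDomain

# take the snapshot while the domain is down (libvirt example)
virsh snapshot-create-as mydomain "nightly-$(date +%F)"

# start the resource again, waiting until the transition completes
crm -w resource start VirtualDomain
```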

Cheers,
Kristoffer

> 
> Thanks.
> 
> 
> Bernd
> 


Re: [ClusterLabs] Antw: Re: meatware stonith

2018-09-27 Thread Kristoffer Grönlund
On Thu, 2018-09-27 at 02:49 -0400, Digimer wrote:
> On 2018-09-27 01:54 AM, Ulrich Windl wrote:
> > > > > Digimer wrote on 26.09.2018 at 18:29 in
> > > > > message
> > 
> > <1c70b5e2-ea8e-8cbe-3d83-e207ca47b...@alteeve.ca>:
> > > On 2018-09-26 11:11 AM, Patrick Whitney wrote:
> > > > Hey everyone,
> > > > 
> > > > I'm doing some pacemaker/corosync/dlm/clvm testing.  I'm
> > > > without a power
> > > > fencing solution at the moment, so I wanted to utilize
> > > > meatware, but it
> > > > doesn't show when I list available stonith devices (pcs stonith
> > > > list).
> > > > 
> > > > I do seem to have it on the system, as cluster-glue is
> > > > installed, and I
> > > > see meatware.so and meatclient on the system, and I also see
> > > > meatware
> > > > listed when running the command 'stonith -L' 
> > > > 
> > > > Can anyone guide me as to how to create a stonith meatware
> > > > resource
> > > > using pcs? 
> > > > 
> > > > Best,
> > > > -Pat
> > > 
> > > The "fence_manual" agent was removed after the EL5 days, a long
> > > time ago, because it so often led to split-brains through misuse.
> > > Manual
> > > fencing is NOT recommended.
> > > 
> > > There are new options, like SBD (storage-based death) if you have
> > > a
> > > watchdog timer.
> > 
> > And even if you do not ;-)
> 
> I've not used SBD. How, without a watchdog timer, can you be sure the
> target node is dead?

You can't. You can use the Linux softdog module though, but since it is
a pure software solution it is limited and not ideal.
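A rough sketch of a softdog-based setup (file paths are
distribution-dependent; the sysconfig location shown here is
SLES-style):

```shell
# load the software watchdog now, and on every boot
modprobe softdog
echo softdog > /etc/modules-load.d/watchdog.conf

# tell SBD which watchdog device to use
echo 'SBD_WATCHDOG_DEV=/dev/watchdog' >> /etc/sysconfig/sbd
```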

> 
-- 

Cheers,
Kristoffer



Re: [ClusterLabs] Q: Reusing date specs in crm shell

2018-09-13 Thread Kristoffer Grönlund
On Tue, 2018-09-11 at 13:52 +0200,  Ulrich Windl  wrote:
> Hi!
> 
> I have a set of resources with almost identical rules, one part being
> a data spec. Currently I'm using two different date specs in those
> rules. However I repeated the date spec in every rule. Foreseeing
> that I might change those one day, I wonder whether it's possible in
> crm shell to define a date spec once (outside of any resource for
> symmetry) and reference that date spec inside a rule. Ok, time for an
> example:
> 
> meta 1: ...default settings... \
> meta 2: rule 0: date spec hours=7-18 weekdays=1-5 ...override
> settings outside prime time...
> 
> In the crm manual page the reference examples use dummy primitives.
> 

I wonder if this could be done with id-based references, but it's not
something I've actually experimented with. Not a great answer, I
know...
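As a sketch only, an id-based reuse of a rule in the raw CIB might look like this (ids are made up, and whether id-ref validates in every rule context would need testing):

```xml
<rsc_location id="loc-A" rsc="rscA">
  <rule id="rule-primetime" score="INFINITY">
    <date_expression id="de-primetime" operation="date_spec">
      <date_spec id="ds-primetime" hours="7-18" weekdays="1-5"/>
    </date_expression>
  </rule>
</rsc_location>
<rsc_location id="loc-B" rsc="rscB">
  <!-- reuse the rule above instead of repeating the date spec -->
  <rule id-ref="rule-primetime"/>
</rsc_location>
```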

> Regards,
> Ulrich
> 
> 
-- 

Cheers,
Kristoffer



Re: [ClusterLabs] Q: ordering for a monitoring op only?

2018-08-20 Thread Kristoffer Grönlund
On Mon, 2018-08-20 at 10:51 +0200,  Ulrich Windl  wrote:
> Hi!
> 
> I wonder whether it's possible to run a monitoring op only if some
> specific resource is up.
> Background: We have some resource that runs fine without NFS, but the
> start, stop and monitor operations will just hang if NFS is down. In
> effect the monitor operation will time out, the cluster will try to
> recover, calling the stop operation, which in turn will time out,
> making things worse (i.e.: causing a node fence).
> 
> So my idea was to pause the monitoring operation while NFS is down
> (NFS itself is controlled by the cluster and should recover "rather
> soon" TM).
> 
> Is that possible?

It would be a lot better to fix the problem in the RA which causes it
to fail when NFS is down, I would think?
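For illustration, a minimal sketch of such a fix in a shell-based RA (the parameter name and path are invented; exit codes and messages would follow the OCF conventions of the real agent):

```shell
# Hypothetical guard for an RA's monitor action: never let a hung NFS
# mount block the operation; fail fast instead of timing out.
cfg_dir="${OCF_RESKEY_config_dir:-/srv/nfs/app-config}"

nfs_ok() {
    # stat(1) blocks in the kernel on a dead NFS server, so cap it
    timeout 5 stat "$cfg_dir" >/dev/null 2>&1
}

if nfs_ok; then
    echo "NFS reachable: run the real health checks"
else
    echo "NFS unreachable: report an error without hanging"
fi
```

Failing fast here lets the cluster react to a clean error instead of escalating a monitor timeout into a hung stop and a node fence.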

> And before you ask: No, I have not written that RA that has the
> problem; a multi-million-dollar company wrote it (Years before I had
> written a monitor for HP-UX' cluster that did not have this problem,
> even though the configuration files were read from NFS (It's not
> magic: Just periodically copy them to shared memory, and read the
> config from shared memory).
> 
> Regards,
> Ulrich
> 
> 
-- 

Cheers,
Kristoffer



Re: [ClusterLabs] crm --version shows "cam dev"

2018-07-04 Thread Kristoffer Grönlund
On Wed, 2018-07-04 at 17:52 +0200, Salvatore D'angelo wrote:
> Hi,
> 
> With crash 2.2.0 the command:
> cam —version
> works fine. I downloaded 3.0.1 and it shows:
> crm dev
> 
> I know this is not a big issue but I just wanted to verify I
> installed the correct version of crash.
> 

It's probably right, but can you describe in more detail from where you
downloaded and how you installed it?

Cheers,
Kristoffer



Re: [ClusterLabs] difference between external/ipmi and fence_ipmilan

2018-06-27 Thread Kristoffer Grönlund
"Stefan K"  writes:

> OK I see, but it would be good if somebody marked one of these as deprecated and
> then deleted it, so that no one gets confused about them.
>

The external/* agents are not deprecated, though. Future agents will be
implemented in the fence-agents framework, but the existing agents are
still being used (not by RH, but by SUSE at least).

Cheers,
Kristoffer

> best regards
> Stefan
>
>> Gesendet: Dienstag, 26. Juni 2018 um 18:26 Uhr
>> Von: "Ken Gaillot" 
>> An: "Cluster Labs - All topics related to open-source clustering welcomed" 
>> 
>> Betreff: Re: [ClusterLabs] difference between external/ipmi and fence_ipmilan
>>
>> On Tue, 2018-06-26 at 12:00 +0200, Stefan K wrote:
>> > Hello,
>> > 
>> > can somebody tell me the difference between external/ipmi and
>> > fence_ipmilan? Are there preferences?
>> > Is one of these more common or has some advantages? 
>> > 
>> > Thanks in advance!
>> > best regards
>> > Stefan
>> 
>> The distinction is mostly historical. At one time, there were two
>> different open-source clustering environments, each with its own set of
>> fence agents. The community eventually settled on Pacemaker as a sort
>> of merged evolution of the earlier environments, and so it supports
>> both styles of fence agents. Thus, you often see an "external/*" agent
>> and a "fence_*" agent available for the same physical device.
>> 
>> However, they are completely different implementations, so there may be
>> substantive differences as well. I'm not familiar enough with these two
>> to address that, maybe someone else can.
>> -- 
>> Ken Gaillot 

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-15 Thread Kristoffer Grönlund
Jan Pokorný  writes:

>> 4.  [ ] Do you use "tag" based syntactic grouping[3] in CIB?
>
> 0x
>
> keeps me at guess what it was meant to/could be used for in practice
> (had some ideas but will gladly be surprised if anyone's going to
> give it a crack)
>

The background for this feature as far as I understand it was related to
booth-based geo clusters, where the tag feature made it easier to unify
the configuration of two geo clusters. Hawk also supports the tag
feature via the user interface, where you can get a custom status view
for a tag showing only the tagged resources instead of the whole cluster
status.

I honestly don't know how much use it sees in practice.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Booth fail-over conditions

2018-04-16 Thread Kristoffer Grönlund
Zach Anderson  writes:

>  Hey all,
>
> new user to pacemaker/booth and I'm fumbling my way through my first proof
> of concept. I have a 2 site configuration setup with local pacemaker
> clusters at each site (running rabbitmq) and a booth arbitrator. I've
> successfully validated the base failover when the "granted" site has
> failed. My question is if there are any other ways to configure failover,
> i.e. using resource health checks or the like?
>

Hi Zach,

Do you mean that a resource health check should trigger site failover?
That's actually something I'm not sure comes built-in.. though making a
resource agent which revokes a ticket on failure should be fairly
straight-forward. You could then group your resource with the ticket
resource to enable this functionality.

The logic in the ticket resource ought to be something like "if monitor
fails and the current site is granted, then revoke the ticket, else do
nothing". You would probably want to handle probe monitor invocations
differently. There is an ocf_is_probe function provided to help with
this.
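Roughly, in shell-RA pseudocode (the helper functions and the "ticket" parameter are invented for illustration, and the exact crm_ticket options should be checked against the local version):

```
monitor() {
    ocf_is_probe && return $OCF_SUCCESS     # never revoke on a probe

    if ! app_is_healthy; then               # hypothetical health check
        if ticket_granted_here; then        # hypothetical state check
            crm_ticket --ticket "$OCF_RESKEY_ticket" --revoke --force
        fi
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```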

Cheers,
Kristoffer

> Thanks!

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

>
> I feel like you guys are talking of a solution that already exists and you
> probably already know, eg. "etcd".
>
> Etcd provides:
>
> * a cluster wide key/value storage engine
> * support quorum
> * key locking
> * atomic changes
> * REST API
> * etc...
>
> However, it requires to open a new TCP port, indeed :/
>

My main inspiration and reasoning is indeed to introduce the same
functionality provided by etcd into a corosync-based cluster without
having to add a parallel cluster consensus solution. Simply installing
etcd means 1) now you have two clusters, 2) etcd doesn't handle 2-node
clusters or fencing and doesn't degrade well to a single node, 3)
relying on the presence of the KV-store in pacemaker tools is not an
option unless pacemaker wants to make etcd a requirement.

Cheers,
Kristoffer

> Moreover, as a RA developer, I am currently messing with attrd weird
> behavior[1], so any improvement there is welcomed :)
>
> Cheers,
>
> [1] https://github.com/ClusterLabs/PAF/issues/131
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-09 Thread Kristoffer Grönlund
Jan Pokorný  writes:

> /me keenly joins the bike-shedding
>
> What about pcmk-based/pcmk-infod.  First, we effectively tone down
> "common information/base" from the expanded CIB abbreviation[*1],
> and second, in the former case, we highlight that's the central point
> providing resident data glue (pcmk-datad?[*2]) amongst the other daemons.

pcmk-infod sounds pretty good to me, it indicates data management /
central information handling etc. Plus it contains at least part of one
of the words of the expansion of "CIB".

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-06 Thread Kristoffer Grönlund
Klaus Wenninger  writes:

>
> One thing I thought over as well is some kind of
> a chicken & egg issue arising when you want to
> use the syncing-mechanism to set up (bootstrap)
> the cluster.
> So something like the ssh-mechanism pcsd is
> using might still be needed.
> The file-syncing approach would have the data
> easily available locally prior to starting the
> actual cluster-wide syncing.
>
> Well ... no solutions or anything ... just
> a few thoughts I had on that issue ... 25ct max ;-)
>

Bootstrapping is a problem I've thought about quite a bit.. It's
possible to implement in a number of ways, and it's not clear what's the
better approach. But I see a cluster-wide configuration database as an
enabler for better bootstrapping rather than a hurdle. If a new node
doesn't need a local copy of the database but can access the database
from an existing node, it would be possible for the new node to
bootstrap itself into the cluster with nothing more than remote access
to that database, so a single port to open and a single authentication
mechanism - this could certainly be handled over SSH just like pcsd and
crmsh implements it today.

But yes, at some point there needs to be communication channel opened..

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-06 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> On Tue, 2018-04-03 at 08:33 +0200, Kristoffer Grönlund wrote:
>> Ken Gaillot  writes:
>> 
>> > > I
>> > > would vote against PREFIX-configd as compared to other cluster
>> > > software,
>> > > I would expect that daemon name to refer to a more generic
>> > > cluster
>> > > configuration key/value store, and that is something that I have
>> > > some
>> > > hope of adding in the future ;) So I'd like to keep "config" or
>> > > "database" for such a possible future component...
>> > 
>> > What's the benefit of another layer over the CIB?
>> > 
>> 
>> The idea is to provide a more generalized key-value store that other
>> applications built on top of pacemaker can use. Something like a
>> HTTP REST API to a key-value store with transactional semantics
>> provided
>> by the cluster. My understanding so far is that the CIB is too heavy
>> to
>> support that kind of functionality well, and besides that the
>> interface
>> is not convenient for non-cluster applications.
>
> My first impression is that it sounds like a good extension to attrd,
> cluster-wide attributes instead of node attributes. (I would envision a
> REST API daemon sitting in front of all the daemons without providing
> any actual functionality itself.)
>
> The advantage to extending attrd is that it already has code to
> synchronize attributes at start-up, DC election, partition healing,
> etc., as well as features such as write dampening.

Yes, I've considered that as well and yes, I think it could make
sense. I need to gain a better understanding of the current attrd
implementation to see how to make it do what I want. The configd
name/part comes into play when bringing in syncing data beyond the
key-value store (see below).

>
> Also cib -> pcmk-configd is very popular :)
>

I can live with it. ;)

>> My most immediate applications for that would be to build file
>> syncing
>> into the cluster and to avoid having to have an extra communication
>> layer for the UI.
>
> How would file syncing via a key-value store work?
>
> One of the key hurdles in any cluster-based sync is
> authentication/authorization. Authorization to use a cluster UI is not
> necessarily equivalent to authorization to transfer arbitrary files as
> root.
>

Yeah, the key-value store wouldn't be enough to implement file
syncing, but it could potentially be the mechanism by which the file
syncing implementation maintains its state. I'm somewhat conflating two
things that I want that are both related to syncing configuration beyond
the cluster daemon itself across the cluster.

I don't see authentication/authorization as a hurdle or blocker, but
it's certainly something that needs to be considered. Clearly a
less-privileged user shouldn't be able to configure syncing of
root-owned files across the cluster.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-04-02 Thread Kristoffer Grönlund
Ken Gaillot  writes:

>> I
>> would vote against PREFIX-configd as compared to other cluster
>> software,
>> I would expect that daemon name to refer to a more generic cluster
>> configuration key/value store, and that is something that I have some
>> hope of adding in the future ;) So I'd like to keep "config" or
>> "database" for such a possible future component...
>
> What's the benefit of another layer over the CIB?
>

The idea is to provide a more generalized key-value store that other
applications built on top of pacemaker can use. Something like a
HTTP REST API to a key-value store with transactional semantics provided
by the cluster. My understanding so far is that the CIB is too heavy to
support that kind of functionality well, and besides that the interface
is not convenient for non-cluster applications.

My most immediate applications for that would be to build file syncing
into the cluster and to avoid having to have an extra communication
layer for the UI.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons

2018-03-29 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> Hi all,
>
> Andrew Beekhof brought up a potential change to help with reading
> Pacemaker logs.
>
> Currently, pacemaker daemon names are not intuitive, making it
> difficult to search the system log or understand what each one does.
>
> The idea is to rename the daemons, with a common prefix, and a name
> that better reflects the purpose.
>

[...]

> Here are the current names, with some example replacements:
>
>  pacemakerd: PREFIX-launchd, PREFIX-launcher
>
>  attrd: PREFIX-attrd, PREFIX-attributes
>
>  cib: PREFIX-configd, PREFIX-state
>
>  crmd: PREFIX-controld, PREFIX-clusterd, PREFIX-controller
>
>  lrmd: PREFIX-locald, PREFIX-resourced, PREFIX-runner
>
>  pengine: PREFIX-policyd, PREFIX-scheduler
>
>  stonithd: PREFIX-fenced, PREFIX-stonithd, PREFIX-executioner
>
>  pacemaker_remoted: PREFIX-remoted, PREFIX-remote

Better to do it now rather than later. I vote in favor of changing the
names. Yes, it'll mess up crmsh, but at least for distributions it's
just a simple search/replace patch to apply.

I would also vote in favour of sticking to the 15 character limit, and
to use "pcmk" as the prefix. That leaves 11 characters for the name,
which should be enough for anyone ;)

My votes:

pacemakerd -> pcmk-launchd
attrd -> pcmk-attrd
cib -> pcmk-stated
crmd -> pcmk-controld
lrmd -> pcmk-resourced
pengine -> pcmk-schedulerd
stonithd -> pcmk-fenced
pacemaker_remoted -> pcmk-remoted

The one I'm the most divided about is cib. pcmk-cibd would also work. I
would vote against PREFIX-configd as compared to other cluster software,
I would expect that daemon name to refer to a more generic cluster
configuration key/value store, and that is something that I have some
hope of adding in the future ;) So I'd like to keep "config" or
"database" for such a possible future component...

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] crm shell 2.1.2 manual bug?

2018-03-28 Thread Kristoffer Grönlund
"Ulrich Windl"  writes:

> Hi!
>
> For crmsh-2.1.2+git132.gbc9fde0-18.2 I think there's a bug in the manual 
> describing resource sets:
>
>sequential
>If true, the resources in the set do not depend on each other 
> internally. Setting sequential to true implies a strict order of dependency 
> within the set.
>
> Obviously "true" cannot mean both: "do not depend" and "depend". My guess is 
> that the first true has to be false.

Right, "do not depend" should be "depend" there. Thanks for catching it :)

> I came across this when trying to add a colocation like this:
> colocation col_LV inf:( cln_LV cln_LV-L1 cln_LV-L2 cln_ML cln_ML-L1 cln_ML-L2 
> ) cln_VMs
>
> crm complained about this:
> ERROR: 1: syntax in role: Unmatched opening bracket near  parsing 
> 'colocation ...'
> ERROR: 2: syntax: Unknown command near  parsing 'cln_ml-l2 ) 
> cln_VMs'
> (note the lower case)

The problem reported is that there is no space between "inf:" and "(" -
the parser in crmsh doesn't handle missing spaces between tokens right
now.
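So with the space added, the same constraint parses:

```
colocation col_LV inf: ( cln_LV cln_LV-L1 cln_LV-L2 cln_ML cln_ML-L1 cln_ML-L2 ) cln_VMs
```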

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] How to configure lifetime in constraints?

2018-03-21 Thread Kristoffer Grönlund
xin  writes:

> Hi:
>
>I noticed that in latest constraints schema file(constraints-3.0.rng),
>"element-lifetime" is an option in location/colocation/order, and it 
> linked to rule-2.9.rng.
>
>I can not find the keyword "lifetime" in upstream document "Pacemaker 
> 1.1 Configuration Explained",
>then I guess "date_expression" in rules means lifetime.
>
>So I write this xml section in file cons1.xml:
>###
>
>  
>
>  
>
>  
>
>###

First off, the lifetime element is deprecated IIRC and not necessary.

Second, the required rule is somewhat complicated. I would recommend
using the crmsh "crm resource ban" command to create the constraint with
a lifetime, and then look at the CIB XML to see what the created
constraint looks like.
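For example (resource and node names made up here; the lifetime is an ISO 8601 duration or end time):

```
# ban rsc1 from node1 for one hour
crm resource ban rsc1 node1 PT1H
# then dump the configuration XML to inspect the generated constraint
crm configure show xml
```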

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Error when linking to libqb in shared library

2018-02-12 Thread Kristoffer Grönlund
Jan Pokorný  writes:

> I guess you are linking your python extension with one of the
> pacemaker libraries (directly or indirectly to libcrmcommon), and in
> that case, you need to rebuild pacemaker with the patched libqb[*] for
> the whole arrangement to work.  Likewise in that case, as you may be
> aware, the "API" is quite uncommitted at this point, stability hasn't
> been of importance so far (because of the handles into pacemaker being
> mostly abstracted through built-in CLI tools for the outside players
> so far, which I agree is encumbered with tedious round-trips, etc.).
> There's a huge debt in this area, so some discretion and perhaps
> feedback which functions are indeed proper-API-worth is advised.

The ultimate goal of my project is indeed to be able to propose or begin
a discussion around a stable API for Pacemaker to eventually move away
from command-line tools as the only way to interact with the cluster.

Thank you, I'll investigate the proposed changes.

Cheers,
Kristoffer

>
> [*]
> shortcut 1: just recompile pacemaker with those extra
> /usr/include/qb/qblog.h modifications as of the
>   referenced commit)
> shortcut 2: if the above can be tolerated widely, this is certainly
> for local development only: recompile pacemaker with
>   CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION
>
> Hope this helps.
>
> -- 
> Jan (Poki)

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


[ClusterLabs] Error when linking to libqb in shared library

2018-02-11 Thread Kristoffer Grönlund
Hi everyone,

(and especially the libqb developers)

I started hacking on a python library written in C which links to
pacemaker, and so to libqb as well, but I'm encountering a strange
problem which I don't know how to solve.

When I try to import the library in python, I see this error:

--- command ---
PYTHONPATH='/home/krig/projects/work/libpacemakerclient/build/python' 
/usr/bin/python3 
/home/krig/projects/python-pacemaker/build/../python/clienttest.py
--- stderr ---
python3: utils.c:66: common: Assertion `"implicit callsite section is 
observable, otherwise target's and/or libqb's build is at fault, preventing 
reliable logging" && work_s1 != NULL && work_s2 != NULL' failed.
---

This appears to be coming from the following libqb macro:

https://github.com/ClusterLabs/libqb/blob/master/include/qb/qblog.h#L352

There is a long comment above the macro which if nothing else tells me
that I'm not the first person to have issues with it, but it doesn't
really tell me what I'm doing wrong...

Does anyone know what the issue is, and if so, what I could do to
resolve it?

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-17 Thread Kristoffer Grönlund
Ken Gaillot  writes:

>
> I can see the point, but I do like having <clone> separate.
>
> A clone with a single instance is not identical to a primitive. Think
> of building a cluster, starting with one node, and configuring a clone
> -- it has only one instance, but you wouldn't expect it to show up as a
> primitive in status displays.
>
> Also, there are a large number of clone meta-attributes that aren't
> applicable to simple primitives. By contrast, master adds only two
> attributes to clones.

I'm not convinced by either argument. :)

The distinction between single-instance clone and primitive is certainly
not clear to me, and there is no problem for status displays to display
a resource with a single replica differently from a resource that isn't
configured to be replicated.

The number of meta-attributes related to clones seems irrelevant as
well, pacemaker can reject a configuration that sets clone-related
attributes for non-clone resources just as well as if they were on a
different node in the XML.

>
> From the XML perspective, I think the current approach is logically
> structured, a <clone> wrapped around a <primitive> or <group>, each
> with its own meta-attributes.

Well, I guess it's a matter of opinion. For me, I don't think it is very
logical at all. For example, the result of having the hierarchy of nodes
is that it is possible to configure target-role for both the wrapped
<primitive> and the container:

<clone id="cln-rsc">
  <meta_attributes id="cln-rsc-meta">
    <nvpair id="cln-rsc-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <primitive id="rsc" class="ocf" provider="heartbeat" type="Dummy">
    <meta_attributes id="rsc-meta">
      <nvpair id="rsc-role" name="target-role" value="Started"/>
    </meta_attributes>
  </primitive>
</clone>

Then edit the configuration removing the clone, save, and the resource
starts when it should have been stopped.

It's even worse in the case of a clone wrapping a group holding
clones of resources, in which case there can be four levels of attribute
inheritance -- and this applies to both meta attributes and instance
attributes.

Add to that the fact that there can be multiple sets of instance
attributes and meta attributes for each of these with rule expressions
and implicit precedence determining which set actually applies...

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-17 Thread Kristoffer Grönlund
Ken Gaillot  writes:

>
> For Pacemaker 2, I'd like to replace the <master> resource type with
> <clone>. (The old syntax would be transparently
> upgraded to the new one.) The role names themselves are not likely to
> be changed in that time frame, as they are used in more external pieces
> such as notification variables. But it would be the first step.
>
> I hope that this will be an uncontroversial change in the ClusterLabs
> community, but because such changes have been heated elsewhere, here is
> why this change is desirable:
>

I agree 100% about this change. In Hawk, we've already tried to hide the
Master/Slave terms as much as possible and replace them with
primary/secondary and "Multi-state", but I'm happy to converge on common
terms.

I'm partial to "Promoted" and "Started" since it makes it clearer that
the secondary state is a base state and that it's the promoted state
which is different / special.

However, can I throw a wrench in the machinery? When replacing the
<master> resource type with <clone>, why not go a step
further and merge both <clone> and <master> with the basic <primitive>?

 => clone
 => master

or for groups,



I have never understood the usefulness of separate meta-attribute sets
for the <clone> and <master> nodes.

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0

2018-01-11 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

>
> For what is worth, while using crmsh, I always have to explain to
> people or customers that:
>
> * we should issue an "unmigrate" to remove the constraint as soon as the
>   resource can get back to the original node or get off the current node if
>   needed (depending on the -inf or +inf constraint location issued)
> * this will not migrate back the resource if it's sticky enough on the current
>   node. 
>
> See:
> http://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html#swapping-master-and-slave-roles-between-nodes
>
> This is counter-intuitive, indeed. I prefer the pcs interface using
> the move/clear actions.

No need! You can use crm rsc move / crm rsc clear. In fact, "unmove" is
just a backwards-compatibility alias for clear in crmsh.

Cheers,
Kristoffer

>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Changes coming in Pacemaker 2.0.0

2018-01-11 Thread Kristoffer Grönlund
Andrei Borzenkov  writes:

> On Thu, Jan 11, 2018 at 10:54 AM, Ulrich Windl
>  wrote:
>> Hi!
>>
>> On the tool changes, I'd prefer --move and --un-move as pair over --move and 
>> --clear ("clear" is less expressive IMHO).
>
> --un-move is really wrong semantically. You do not "unmove" resource -
> you "clear" constraints that were created. Whether this actually
> results in any "movement" is unpredictable (easily).
>
> Personally I find lack of any means to change resource state
> non-persistently one of major usability issue with pacemaker comparing
> with other cluster stacks. Just a small example:
>
> I wanted to show customer how "maintenance-mode" works. After setting
> maintenance-mode=yes for the cluster we found that database was
> mysteriously restarted after being stopped manually. It took quite
> some time to find out that couple of weeks ago "crm resource manage"
> followed by "crm resource unmanage" was run for this resource - which
> left explicit "managed=yes" on resource which took precedence over
> "maintenance-mode".
>
> Not only is this asymmetrical and non-intuitive. There is no way to
> distinguish temporary change from permanent one. Moving resources is
> special-cased but for any change that involves setting resource
> (meta-)attributes this approach is not possible. Attribute is there,
> and we do not know why it was set.

The problem is really that the configuration is declarative and that in
the declarative configuration there is a hierarchy of attributes that
combine in more or less obvious ways. There is no way to retain that and
not create pitfalls. At least the CIB is not CSS...

In this case, the place where things went wrong was when crmsh left
"managed=yes" in place instead of relying on the default and just
unsetting the managed attribute.
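To illustrate the difference (a sketch; the resource name is made up): deleting the meta-attribute restores the default, so cluster-wide maintenance-mode applies again, whereas explicitly setting it back to "true" keeps overriding maintenance-mode.

```shell
# Prefer this (falls back to the default / maintenance-mode):
crm resource meta rsc1 delete is-managed

# ...over this, which leaves a permanent override in the CIB:
crm resource meta rsc1 set is-managed true
```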

Though there's a similar confusion when setting target-role - the
command line gives the impression of imperative commands; "start this,
stop that" while the actual instructions issued to pacemaker are
declarative. It gets especially tricky when target-role is set on a
group as well as on individual resources in the group.

Unhelpful perhaps, but in my opinion, the CIB makes it very difficult to
answer even simple questions like "what value does this attribute really
have", and for very marginal benefit. If it were up to me, rule
expressions, op_defaults, rsc_defaults, nested resources (group,
master, clone) and multiple meta_attribute/attribute elements for single
resources would all be deleted. The only real valid case I can see for
rule expressions is for configuring different attribute values for
different nodes - and that would be better achieved by fetching the
value from a distributed database which handles that part. Having such a
database would also enable things like private data / passwords to be
kept out of the CIB.

Cheers,
Kristoffer

>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster IP that "supports" two subnets !?

2018-01-08 Thread Kristoffer Grönlund
Zarko Dudic  writes:

> Hi there, I'd like to setup a cluster, with two nodes, but on two 
> different sub-nets (nodes are in two different cities). Notes are 
> running Oracle Linux 7.4 and so far I have both them running and cluster 
> software have been installed and configured.
>
> Well, next is to add a resources and I'd like to start with ClusterIP, 
> and seems it's straightforward if nodes are on same subnet, which is not 
> my case. First of all is it possible to accomplish what I want, and if 
> yes, I'd appreciate to hear some suggestions. Thanks a lot.

Hi,

I'm not sure I understand the question so my answer may be off the
mark.

An IP address is intrinsically part of a particular subnet, so how would
managing an IP address across separate subnets work? Or do you mean to
manage an IP address from a third subnet mapped to both locations? This
second option is indeed possible using the regular IP resources, it is
more of a network setup problem.

Another option would be to manage DNS records across subnets. This is
possible using the dnsupdate resource.
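A minimal dnsupdate configuration might look like this (a sketch only: parameter names should be checked against the agent's metadata with "crm ra info ocf:heartbeat:dnsupdate", and the hostname, address, and key file are made up):

```shell
crm configure primitive svc-dns ocf:heartbeat:dnsupdate \
    params hostname=svc.example.com ip=192.0.2.10 keyfile=/etc/rndc.key
```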

Yet a third option would be to access the resources through a proxy, but
then availability is of course limited to the availability of the
proxy and network between proxy and the active site.

Cheers,
Kristoffer

>
>
> -- 
> Thanks,
> Zarko
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crmsh resource failcount does not appear to work

2017-12-27 Thread Kristoffer Grönlund
Andrei Borzenkov  writes:

> As far as I can tell, pacemaker acts on failcount attributes qualified
> by operation name, while crm sets/queries unqualified attribute; I do
> not see any syntax to set fail-count for specific operation in crmsh.

crmsh uses crm_attribute to get the failcount. It could be that this
usage has stopped working as of 1.1.17..
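The difference shows up when querying the attributes directly (a sketch; the per-operation attribute name is taken from the cibadmin output quoted below, and the node name matches that transcript):

```shell
# crm_failcount aggregates the per-operation failcounts...
crm_failcount -G -r rsc_Stateful_1

# ...while a plain attribute query only sees the exact name it asks for:
crm_attribute -t status -N ha1 \
    -n 'fail-count-rsc_Stateful_1#monitor_1' -G
```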

Cheers,
Kristoffer

>
> ha1:~ # rpm -q crmsh
> crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch
> ha1:~ # crm_mon -1rf
> Stack: corosync
> Current DC: ha2 (version 1.1.17-3.3-36d2962a8) - partition with quorum
> Last updated: Sun Dec 24 10:55:54 2017
> Last change: Sun Dec 24 10:55:47 2017 by hacluster via crmd on ha2
>
> 2 nodes configured
> 4 resources configured
>
> Online: [ ha1 ha2 ]
>
> Full list of resources:
>
>  stonith-sbd  (stonith:external/sbd): Started ha1
>  rsc_dummy_1  (ocf::pacemaker:Dummy): Started ha2
>  Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1]
>  Masters: [ ha1 ]
>  Slaves: [ ha2 ]
>
> Migration Summary:
> * Node ha2:
> * Node ha1:
> ha1:~ # echo xxx > /run/Stateful-rsc_Stateful_1.state
> ha1:~ # crm_failcount -G -r rsc_Stateful_1
> scope=status  name=fail-count-rsc_Stateful_1 value=1
> ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
> scope=status  name=fail-count-rsc_Stateful_1 value=0
> ha1:~ # crm resource failcount rsc_Stateful_1 set ha1 4
> ha1:~ # crm_failcount -G -r rsc_Stateful_1
> scope=status  name=fail-count-rsc_Stateful_1 value=1
> ha1:~ # crm resource failcount rsc_Stateful_1 show ha1
> scope=status  name=fail-count-rsc_Stateful_1 value=4
> ha1:~ # cibadmin -Q | grep fail-count
>   <nvpair id="status-1084752129-fail-count-rsc_Stateful_1.monitor_1"
> name="fail-count-rsc_Stateful_1#monitor_1" value="1"/>
>   <nvpair ... name="fail-count-rsc_Stateful_1" value="4"/>
> ha1:~ #
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: questions about startup fencing

2017-12-04 Thread Kristoffer Grönlund
Tomas Jelinek  writes:

>> 
>> * how is it shutting down the cluster when issuing "pcs cluster stop --all"?
>
> First, it sends a request to each node to stop pacemaker. The requests 
> are sent in parallel which prevents resources from being moved from node 
> to node. Once pacemaker stops on all nodes, corosync is stopped on all 
> nodes in the same manner.
>
>> * any race condition possible where the cib will record only one node up 
>> before
>>the last one shut down?
>> * will the cluster start safely?

That definitely sounds racy to me. The best idea I can think of would be
to set all nodes except one in standby, and then shutdown pacemaker
everywhere...
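Sketched out with pcs, that would be something like the following (node names are made up; "pcs cluster standby" is the older spelling, newer pcs versions use "pcs node standby"):

```shell
# Drain all nodes but one first, so resources settle on a known node
# and the CIB state is predictable before the shutdown:
pcs cluster standby node2
pcs cluster standby node3
pcs cluster stop --all
```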

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

>
> OK, so reading between the lines, if we don't want our cluster's
> latest config changes accidentally discarded during a complete cluster
> reboot, we should ensure that the last man standing is also the first
> one booted up - right?

That would make sense to me, but I don't know if it's the only
solution. If you separately ensure that they all have the same
configuration first, you could start them in any order I guess.

>
> If so, I think that's a perfectly reasonable thing to ask for, but
> maybe it should be documented explicitly somewhere?  Apologies if it
> is already and I missed it.

Yeah, maybe a section discussing both starting and stopping a whole
cluster would be helpful, but I don't know if I feel like I've thought
about it enough myself. Regarding the HP Service Guard commands that
Ulrich Windl mentioned, the very idea of such commands offends me on
some level but I don't know if I can clearly articulate why. :D

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

> Kristoffer Gronlund  wrote:
>>Adam Spiers  writes:
>>
>>> - The whole cluster is shut down cleanly.
>>>
>>> - The whole cluster is then started up again.  (Side question: what
>>>   happens if the last node to shut down is not the first to start up?
>>>   How will the cluster ensure it has the most recent version of the
>>>   CIB?  Without that, how would it know whether the last man standing
>>>   was shut down cleanly or not?)
>>
>>This is my opinion, I don't really know what the "official" pacemaker
>>stance is: There is no such thing as shutting down a cluster cleanly. A
>>cluster is a process stretching over multiple nodes - if they all shut
>>down, the process is gone. When you start up again, you effectively have
>>a completely new cluster.
>
> Sorry, I don't follow you at all here.  When you start the cluster up
> again, the cluster config from before the shutdown is still there.
> That's very far from being a completely new cluster :-)

You have a new cluster with (possibly fragmented) memories of a previous
life ;)

>
> Yes, exactly.  If the first node to start up was not the last man
> standing, the CIB history is effectively being forked.  So how is this
> issue avoided?
>
>>The only way to bring up a cluster from being completely stopped is to
>>treat it as creating a completely new cluster. The first node to start
>>"creates" the cluster and later nodes join that cluster.
>
> That's ignoring the cluster config, which persists even when the
> cluster's down.

There could be a command in pacemaker which resets a set of nodes to a
common known state, basically to pick the CIB from one of the nodes as
the survivor and copy that to all of them. But in the end, that's just
the same thing as just picking one node as the first node, and telling
the others to join that one and to discard their configurations. So,
treating it as a new cluster.

>
> But to be clear, you picked a small side question from my original
> post and answered that.  The main questions I had were about startup
> fencing :-)

I did! :)

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

> - The whole cluster is shut down cleanly.
>
> - The whole cluster is then started up again.  (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB?  Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.

The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How much cluster-glue support is still needed in Pacemaker?

2017-11-17 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> We're starting work on Pacemaker 2.0, which will remove support for the
> heartbeat stack.
>
> cluster-glue was traditionally associated with heartbeat. Do current
> distributions still ship it?
>
> Currently, Pacemaker uses cluster-glue's stonith/stonith.h to support
> heartbeat-class stonith agents via the fence_legacy agent. If this is
> still widely used, we can keep this support.
>
> Pacemaker also checks for heartbeat/glue_config.h and uses certain
> configuration values there in favor of Pacemaker's own defaults (e.g.
> the value of HA_COREDIR instead of /var/lib/pacemaker/cores). Does
> anyone still use the cluster-glue configuration for such things? If
> not, I'd prefer to drop this.

Hi Ken,

We're still shipping it, but mostly only for the legacy agents which we
still use - although we aim to phase them out in favor of fence-agents.

I would say that if you can keep the fence_legacy agent intact, dropping
the rest is OK.

Cheers,
Kristoffer

> -- 
> Ken Gaillot 
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4

2017-11-03 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> I decided to do another release candidate, because we had a large
> number of changes since rc3. The fourth release candidate for Pacemaker
> version 1.1.18 is now available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18-
> rc4
>
> The big changes are numerous scalability improvements and bundle fixes.
> We're starting to test Pacemaker with as many as 1,500 bundles (Docker
> containers) running on 20 guest nodes running on three 56-core physical
> cluster nodes.

Hi Ken,

That's really cool. What's the size of the CIB with that kind of
configuration? I guess it would compress pretty well, but still.

Cheers,
Kristoffer

>
> For details on the changes in this release, see the ChangeLog.
>
> This is likely to be the last release candidate before the final
> release next week. Any testing you can do is very welcome.
> -- 
> Ken Gaillot 
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Azure Resource Agent

2017-09-18 Thread Kristoffer Grönlund
AZ_ENABLED"
> fi
>
> #--set the ipconfig name
> AZ_IPCONFIG_NAME="ipconfig-""$OCF_RESKEY_ip"
> logIt "debug1: AZ_IPCONFIG_NAME=$AZ_IPCONFIG_NAME"
>
> #--get the resource group name
> AZ_RG_NAME=$(az group list|grep name|cut -d":" -f2|sed "s/  *//g"|sed 
> "s/\"//g"|sed "s/,//g")
> if [ -z "$AZ_RG_NAME" ]
> then
> logIt "could not determine the Azure resource group name"
> exit $OCF_ERR_GENERIC
> else
> logIt "debug1: AZ_RG_NAME=$AZ_RG_NAME"
> fi
>
> #--get the nic name
> AZ_NIC_NAME=$(az vm nic list -g $AZ_RG_NAME --vm-name $MY_HOSTNAME|grep 
> networkInterfaces|cut -d"/" -f9|sed "s/\",//g")
> if [ -z "$AZ_NIC_NAME" ]
> then
> echo "could not determine the Azure NIC name"
> exit $OCF_ERR_GENERIC
> else
> logIt "debug1: AZ_NIC_NAME=$AZ_NIC_NAME"
> fi
>
> #--get the vnet and subnet names
> R=$(az network nic show --name $AZ_NIC_NAME --resource-group $AZ_RG_NAME|grep 
> -i subnets|head -1|sed "s/  */ /g"|cut -d"/" -f9,11|sed "s/\",//g")
> LDIFS=$IFS
> IFS="/"
> R_ARRAY=( $R )
> AZ_VNET_NAME=${R_ARRAY[0]}
> AZ_SUBNET_NAME=${R_ARRAY[1]}
> if [ -z "$AZ_VNET_NAME" ]
> then
> logIt "could not determine Azure vnet name"
> exit $OCF_ERR_GENERIC
> else
> logIt "debug1: AZ_VNET_NAME=$AZ_VNET_NAME"
> fi
> if [ -z "$AZ_SUBNET_NAME" ]
> then
> logIt "could not determine the Azure subnet name"
> exit $OCF_ERR_GENERIC
> else
> logIt "debug1: AZ_SUBNET_NAME=$AZ_SUBNET_NAME"
> fi
>
> ##
> #  Actions
> ##
>
> case $__OCF_ACTION in
> meta-data) meta_data
> RC=$?
> ;;
> usage|help)   azip_usage
> RC=$?
> ;;
> start) azip_start
> RC=$?
> ;;
> stop) azip_stop
> RC=$?
> ;;
> status)  azip_query
> RC=$?
> ;;
> monitor)  azip_monitor
> RC=$?
> ;;
> validate-all);;
> *)azip_usage
> RC=$OCF_ERR_UNIMPLEMENTED
> ;;
> esac
>
> #--exit with return code
> logIt "debug1: exiting $SCRIPT_NAME with code $RC"
> exit $RC
>
> #--end
>
> --
> Eric Robinson
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0

2017-09-14 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

>> Planning to move this under the Clusterlabs github group?
>
> Yes!
>
> I'm not sure how long and how many answers I should wait for to reach a
> community agreement. But first answers are encouraging :)

Regarding your concerns with submitting it into resource-agents, I would
say that moving into ClusterLabs/ as a separate repository at first
makes sense to me as well. We can look at including it in
resource-agents and the implications of supporting various
language-libraries for OCF agents later.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Moving PAF to clusterlabs ?

2017-09-08 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

> Hi All,
>
> I am currently thinking about moving the RA PAF (PostgreSQL Automatic 
> Failover)
> out of the Dalibo organisation on Github. Code and website.

[snip]

> Note that part of the project (some perl modules) might be pushed to
> resource-agents independently, see [2]. Two years after, I'm still around on
> this project. Obviously, I'll keep maintaining it on my Dalibo's and personal
> time.
>
> Thoughts?

Hi,

I for one would be happy to see it included in the resource-agents
repository. If people are worried about the additional dependency on
perl, we can just add a --without-perl flag (or something along those
lines) to the Makefile.

We already have different agents for the same application but with
different contexts so this wouldn't be anything new.

Cheers,
Kristoffer

>
> [1] http://lists.clusterlabs.org/pipermail/developers/2015-August/66.html
> [2] http://lists.clusterlabs.org/pipermail/developers/2015-August/68.html
>
> Regards,
> -- 
> Jehan-Guillaume de Rorthais
> Dalibo
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Oh how we've grown! :D

2017-09-08 Thread Kristoffer Grönlund
Digimer  writes:

> Here are the attendee pictures from 2015 and from this summit today.
>
> So amazing to see how far our community has come. I am stoked to see how
> much larger we are still in 2019!
>

A huge thank you again to everyone! You are all awesome.

Cheers,
Kristoffer

>
>
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit: Presentation material

2017-09-07 Thread Kristoffer Grönlund
Hi everyone,

I got some requests to provide the slides for the presentations at the
summit, and I thought that the best solution is probably to do what some
presenters already did on the Trello board: For those of you who have
slides to share, please attach them to the card of your presentation
on the Trello board:

https://trello.com/b/LNUrtV1Q/clusterlabs-summit-2017

There's also a link to the group photo on the plan wiki now:

http://plan.alteeve.ca/index.php/Main_Page

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit: Expect rain tomorrow

2017-09-05 Thread Kristoffer Grönlund
Hey everyone!

I am going to try to be at the event area at 8 in the morning tomorrow,
and I wouldn't recommend showing up earlier than that. The doors will
probably be locked. The summit itself is scheduled to start at 9.

Unfortunately it seems we can expect rain tomorrow, so I wanted to send
out a small warning: In case you haven't brought an umbrella or rain
gear, now is the time to go out and get it.

For anyone needing to take a taxi, the number is +49 (0911) 19 410, or
the reception here at the SUSE office can help call a taxi as
well. It is also possible to take the U-bahn to Maxfeld station, though
unfortunately there is a short walk to the office even then.

Cheers and welcome,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker in Azure

2017-08-25 Thread Kristoffer Grönlund
Eric Robinson  writes:

> Hi Kristoffer --
>
> If you would be willing to share your AWS ip control agent(s), I think those 
> would be very helpful to us and the community at large. I'll be happy to 
> share whatever we come up with in terms of an Azure agent when we're all done.

I meant the agents that are in resource-agents already:

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awsvip
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awseip
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/aws-vpc-route53

You'll probably also be interested in fencing: There are agents for
fencing both on AWS and Azure in the fence-agents repository.
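As a rough illustration of how the awsvip agent is typically wired up (parameter name per the agent's metadata, best verified with "crm ra info ocf:heartbeat:awsvip"; the address is made up):

```shell
crm configure primitive aws-vip ocf:heartbeat:awsvip \
    params secondary_private_ip=10.0.0.100
```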

Cheers,
Kristoffer

>
> --
> Eric Robinson
>
> -Original Message-
> From: Kristoffer Grönlund [mailto:kgronl...@suse.com] 
> Sent: Friday, August 25, 2017 3:16 AM
> To: Eric Robinson ; Cluster Labs - All topics 
> related to open-source clustering welcomed 
> Subject: Re: [ClusterLabs] Pacemaker in Azure
>
> Eric Robinson  writes:
>
>> I deployed a couple of cluster nodes in Azure and found out right away that 
>> floating a virtual IP address between nodes does not work because Azure does 
>> not honor IP changes made from within the VMs. IP changes must be made to 
>> virtual NICs in the Azure portal itself. Anybody know of an easy way around 
>> this limitation?
>
> You will need a custom IP control agent for Azure. We have a series of agents 
> for controlling IP addresses and domain names in AWS, but there is no agent 
> for Azure IP control yet. (At least as far as I am aware).
>
> Cheers,
> Kristoffer
>
>>
>> --
>> Eric Robinson
>>
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org Getting started: 
>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker in Azure

2017-08-25 Thread Kristoffer Grönlund
Eric Robinson  writes:

> I deployed a couple of cluster nodes in Azure and found out right away that 
> floating a virtual IP address between nodes does not work because Azure does 
> not honor IP changes made from within the VMs. IP changes must be made to 
> virtual NICs in the Azure portal itself. Anybody know of an easy way around 
> this limitation?

You will need a custom IP control agent for Azure. We have a series of
agents for controlling IP addresses and domain names in AWS, but there
is no agent for Azure IP control yet. (At least as far as I am aware).
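A custom agent would have to shuffle the address between VMs through the Azure API, roughly along these lines (illustrative only: the resource group, NIC, and ipconfig names are made up, and the az invocation should be checked against the current CLI reference):

```shell
az network nic ip-config update \
    --resource-group myGroup --nic-name myNic \
    --name ipconfig-vip --private-ip-address 10.0.0.100
```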

Cheers,
Kristoffer

>
> --
> Eric Robinson
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit - Finding the office

2017-08-25 Thread Kristoffer Grönlund
Hello everyone,

The summit is coming closer, and I thought I should send out a brief
mail about how to find the event area once you are in Nuremberg.

Finding the office
==

The SUSE office is within walking distance from the conference hotel and
the old town center. The closest subway station is the Maxfeld station
on the U3 line.

Google maps link: https://goo.gl/maps/JMzSnv8ZGqF2

If you are coming from the Central Station, take the U3 directly to
Maxfeld (direction Friedrich-Ebert-Platz).

From the airport, take the U2 to Rathenauplatz, then change to U3
(direction Friedrich-Ebert-Platz) and exit at Maxfeld.

Finding the event
=

The summit will take place in the SUSE Event Area at Rollnerstraße
8. This is the same building as the SUSE offices, but it is a separate
ground floor entrance. We will put up posters to make this clear.

The regular SUSE reception is on the 3rd floor, and they have kindly
asked me to direct everyone attending the summit directly to the event
area.

Finding the hotel
=

The main conference hotel is the Sorat Saxx, located on the Hauptmarkt
square in the Nuremberg old town. This is within easy walking distance
from both the Central Station and the SUSE office. The closest subway
station is Lorenzkirche on the U1 line.

Hotel website: https://www.sorat-hotels.com/en/hotel/saxx-nuernberg.html

If you have any questions or concerns, please feel free to contact me.

See you there!
// Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] SLES11 SP4: Strange problem with "(crm configure) commit"

2017-08-21 Thread Kristoffer Grönlund
Ulrich Windl  writes:

> Hi! 
>
> I just had a strange problem: When trying to "clean up" the cib configuration 
> (actually deleting unneeded "operations" lines), I failed to commit the change, 
> even though it verified OK:
>
> crm(live)configure# commit
> Call cib_apply_diff failed (-206): Application of an update diff failed
> ERROR: could not patch cib (rc=206)
> INFO: offending xml diff: 

It looks to me (from a cursory glance) like you may be hitting a bug
with the patch generation in pacemaker. But there isn't enough details
to say for sure.

Try running crmsh with the "-dR" command line options to get it to
output the patch it tries to apply to the log.

Cheers,
Kristoffer

>
> In Syslog I see this:
> Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved 
> meta_attributes.14926208 to position 1 instead of 2 (0xe3f0f0)
> Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved 
> meta_attributes.9876096 to position 1 instead of 2 (0xe3c470)
> Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved 
> utilization.10594784 to position 1 instead of 2 (0x96a2b0)
> Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved 
> meta_attributes.11397008 to position 1 instead of 2 (0xacc5b0)
> Aug 21 15:01:48 h02 cib[19397]:  warning: cib_server_process_diff: Something 
> went wrong in compatibility mode, requesting full refresh
> Aug 21 15:01:48 h02 cib[19397]:  warning: cib_process_request: Completed 
> cib_apply_diff operation for section 'all': Application of an update diff 
> failed (rc=-206, origin=local/cibadmin/2, version=1.65.23)
>
> What could be causing this? I think I did the same change about three years 
> ago without problem (with different software, of course).
>
> # rpm -q pacemaker corosync crmsh
> pacemaker-1.1.12-18.1
> corosync-1.4.7-0.23.5
> crmsh-2.1.2+git132.gbc9fde0-18.2
> (latest)
>
> Regards,
> Ulrich
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-10 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> In both cases i'm inside crmsh.
> The difference is that i always enter the complete command from the highest 
> level of crm. This has the advantage that i can execute any command from the 
> history directly.
> And this has a kind of autocommit.
> If i would enter a lower level, then my history is less useful. I always 
> have to go to the respective level before executing the command from the 
> history.
> But then i have to commit.
> Am i the only one who does it like this ? Nobody stumbled across this ?
> I always wondered about my ineffective commit, but never got the idea that 
> such a small difference is the reason.

You are right, this is a quirk of crmsh: Each level has its own "state",
and exiting the level triggers a commit. Running a command like
"configure primitive ..." results internally in three movements;

* enter the configure level: This fetches the CIB and checks that it is writable
* create the primitive: This updates the internal copy of the CIB
* exit the configure level: This creates, verifies and applies a patch to the 
CIB
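To make that concrete, here is a rough sketch of the two styles (the Dummy resource and its name are just placeholders):

```
# One-shot from the top level: entering configure, applying the change
# and committing all happen as part of the single command
crm configure primitive p-test ocf:heartbeat:Dummy op monitor interval=30s

# Interactive: changes are staged in the internal CIB copy until an
# explicit commit (or until you leave the configure level)
crm
crm(live)# configure
crm(live)configure# primitive p-test ocf:heartbeat:Dummy op monitor interval=30s
crm(live)configure# verify
crm(live)configure# commit
```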

I can't speak for others, but somehow this has never caused me problems
as far as I can remember. Either I have been using it interactively from
within the configure section, or I have been running commands from
bash. I can't recall if that's because I was told at some point or if it
was made clear in the documentation somewhere.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit 2017: Please register!

2017-08-09 Thread Kristoffer Grönlund
Hi everyone,

This mail is for attendees of the Clusterlabs Summit event in Nuremberg,
September 6-7 2017. If it didn't arrive via the Clusterlabs mailing
list and you're not going but got this mail anyway, please let me know
since apparently I have you on my list of possible attendees ;)

Apologies for springing this on you at such a late stage, but as we are
investigating dinner options, making badges and making sure there are
enough chairs for everyone at the event, it became more and more clear
that it would be very useful to have a better grasp of how many people
are coming to the event.

URL to sign up
--------------

https://www.eventbrite.com/e/clusterlabs-summit-2017-dinner-tickets-3689052

To make it as easy as possible, I created an event on Eventbrite for
this purpose. Signing up is not a requirement! However, it would be
great if you could send an email to me confirming your attendance
regardless, in case you are unhappy about using Eventbrite.

Also, it would be great if you could register as quickly as possible so
that we can make dinner reservations early enough to hopefully be able
to fit everyone into one space.

Thank you,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] big trouble with a DRBD resource

2017-08-06 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> Hi,
>
> first: is there a tutorial or s.th. else which helps in understanding what 
> pacemaker logs in syslog and /var/log/cluster/corosync.log ?
> I try hard to find out what's going wrong, but they are difficult to 
> understand, also because of the amount of information.
> Or should i deal more with "crm history" or hb_report ?

I like to use crm history log to get the logs from all the nodes in a
single flow, but it depends quite a bit on configuration what gets
logged where.

>
> What happened:
> I tried to configure a simple drbd resource following 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296
> I used this simple snip from the doc:
> configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \
> op monitor interval=60s

I'll try to sum up the issues I see, from a glance:

* The drbd resource is a multi-state / master-slave resource, which is
  technically a variant of a clone resource where different clones can
  either be in a primary or secondary state. To configure it correctly,
  you'll need to create a master resource as well. Doing this with a
  single command is unfortunately a bit painful. Either use crm
  configure edit, or the interactive crm mode (with a verify / commit
  after creating both the primitive and the master resources).

* You'll need to create monitor operations for both the master and slave
  roles, as you note below, and set explicit timeouts for all
  operations.

* Make sure the wwwdata DRBD resource exists, is accessible from both
  nodes, and is in a good state to begin with (that is, not
  split-brained).
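As a rough sketch of what a complete configuration could look like (the timeouts and meta attributes here are illustrative, not authoritative — take the recommended values from the Linbit documentation for your DRBD version):

```
crm configure
primitive WebData ocf:linbit:drbd \
    params drbd_resource=wwwdata \
    op monitor role=Master interval=29s timeout=20s \
    op monitor role=Slave interval=31s timeout=20s \
    op start timeout=240s op stop timeout=100s
ms ms-WebData WebData \
    meta master-max=1 master-node-max=1 \
         clone-max=2 clone-node-max=1 notify=true
verify
commit
```

Note that the monitor intervals for the Master and Slave roles must differ, since operations on the same resource are distinguished by their interval.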

I would recommend following one of the tutorials provided by Linbit
themselves which show how to set this stuff up correctly, since it is
quite a bit involved.

> Btw: is there a history like in the bash where i see which crm command i 
> entered at which time ? I know that crm history is mighty, but didn't find 
> that.

We don't have that yet :/ If you're not in interactive mode, your bash
history should have the commands though.

> no backup - no mercy

lol ;)

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>
> See my proposal above. ;-)

Hmm, yes. It's a possibility. Magic values rarely end up making things
simpler though :/

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>
> What aout this priority for newly added resources:?
> 1) Use the value specified explicitly
> 2) Use the value the RA's metadata specifies
> 3) Use the global default
>
> With "use" I mean "add it to the RA configuration".

Yeah, I've considered it. The main issue I see with making the change to
crmsh now is that it would also be confusing, when configuring a
resource without any operations and getting operations defined
anyway. Also, it would be impossible not to define operations that have
defaults in the metadata.

One idea might be to have a new command which inserts missing operations
and operation timeouts based on the RA metadata.

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> Hi,
>
> i'm wondering from where the default values for operations of a resource come 
> from.

[snip]

>
> Is it hardcoded ? All timeouts i found in my config were explicitly related 
> to a dedicated resource.
> What are the values for the hardcoded defaults ?
>
> Does that also mean that what the description of the RA says as "default" 
> isn't a default, but just a recommendation ?

The default timeout is set by the default-action-timeout property, and
the default value is 20s.

You are correct, the timeout values defined in the resource agent are
not used automatically. They are recommended minimums, and the
thought as I understand it (this predates my involvement in HA) is that
any timeouts need to be reviewed carefully by the administrator.

I agree that it is somewhat surprising.
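For example, with the crmsh and Pacemaker versions discussed in this thread (the resource name is a placeholder; newer Pacemaker versions express the global default via op_defaults instead):

```
# Raise the global default for operations without an explicit timeout
crm configure property default-action-timeout=60s

# Or set explicit per-operation timeouts on a resource
crm configure primitive p-db ocf:heartbeat:Dummy \
    op start timeout=90s \
    op stop timeout=90s \
    op monitor interval=30s timeout=40s
```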

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit 2017 (Sept. 6-7 in Nuremberg) - One month left!

2017-08-01 Thread Kristoffer Grönlund
Hey everyone!

Here's a quick update for the upcoming Clusterlabs Summit at the SUSE
office in Nuremberg in September:

The time to register for the pool of hotel rooms has now expired - we
have sent the final list of names to the hotel. There may still be hotel
rooms available at the Sorat Saxx or other hotels in Nuremberg, so if
anyone missed the deadline and still needs a room, either contact me or
feel free to contact the hotel directly. The same goes for any changes,
for those who have reservations: Please either contact me, or contact
the hotel directly at i...@saxx-nuernberg.de.

The schedule is being sorted out right now, and the planning wiki will
be updated with a preliminary schedule soon. If there is anyone who
would like to present on a topic or would like to discuss a topic that
isn't on the wiki yet, now is the time to add it there.

Other than that, I don't have any other remarks, other than to wish
everyone welcome to Nuremberg in a month! Feel free to contact me with
any concerns or issues related to the summit, and I'll do what I can to
help out.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition

2017-07-24 Thread Kristoffer Grönlund
Jan Pokorný  writes:

> [ Unknown signature status ]
> Hello cluster masters :-)
>
> as there's little less than 7 weeks left to "The Summit" meetup
> (<http://plan.alteeve.ca/>), it's about time to get the ball
> rolling so we can voluntarily augment the digital trust amongst
> us the attendees, on OpenGPG basis.
>
> Doing that, we'll actually establish a tradition since this will
> be the second time such event is being kicked off (unlike the birds
> of the feather gathering itself, was edu-feathered back then):
>
>   <https://people.redhat.com/jpokorny/keysigning/2015-ha/>
>   <http://lists.linux-ha.org/pipermail/linux-ha/2015-January/048507.html>
>
> If there are no objections, yours truly will conduct this undertaking.
> (As an aside, I am toying with an idea of optimizing the process
> a bit now that many keys are cross-signed already; I doubt there's
> a value of adding identical signatures just with different timestamps,
> unless, of course, the inscribed level of trust is going to change,
> presumably elevate -- any comments?)

Hi Jan,

No objections from me, thank you for taking charge of this!

Cheers,
Kristoffer


>
> * * *
>
> So, going to attend summit and want your key signed while reciprocally
> spreading the web of trust?
> Awesome, let's reuse the steps from the last time:
>
> Once you have a key pair (and provided that you are using GnuPG),
> please run the following sequence:
>
> # figure out the key ID for the identity to be verified;
> # IDENTITY is either your associated email address/your name
> # if only single key ID matches, specific key otherwise
> # (you can use "gpg -K" to select a desired ID at the "sec" line)
> KEY=$(gpg --with-colons --list-keys 'IDENTITY' | grep '^pub' | cut -d: -f5)
>
> # export the public key to a file that is suitable for exchange
> gpg --export -a -- $KEY > $KEY
>
> # verify that you have an expected data to share
> gpg --with-fingerprint -- $KEY
>
> with IDENTITY adjusted as per the instruction above, and send me the
> resulting $KEY file, preferably in a signed (or even encrypted[*]) email
> from an address associated with that very public key of yours.
>
> Timeline?
> Please, send me your public keys *by 2017-09-05*, off-list and
> best with [key-2017-ha] prefix in the subject.  I will then compile
> a list of the attendees together with their keys and publish it at
> <https://people.redhat.com/jpokorny/keysigning/2017-ha/>
> so it can be printed beforehand.
>
> [*] You can find my public key at public keyservers:
> <http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF>
> Indeed, the trust in this key should be ephemeral/one-off
> (e.g. using a temporary keyring, not a universal one before we
> proceed with the signing :)
>
> * * *
>
> Thanks for your cooperation, looking forward to this side stage
> (but nonetheless important if release or commit[1] signing is to get
> traction) happening and hope this will be beneficial to all involved.
>
> See you there!
>
>
> [1] for instance, see:
> <https://github.com/blog/2144-gpg-signature-verification>
> <https://pagure.io/pagure/issue/885>
>
> -- 
> Jan (Poki)
> ___
> Developers mailing list
> develop...@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/developers

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] crmsh: Release 3.0.1

2017-07-21 Thread Kristoffer Grönlund
Hello everyone!

I'm happy to announce the release of crmsh version 3.0.1 today. This
is mainly a bug fix release, so no new exciting features and mainly
fixes to the new bootstrap functionality added in 3.0.0.

I would also like to take the opportunity to introduce a new core
developer for crmsh, Xin Liang! For this release he has contributed
some of the bug fixes discovered, but he has also contributed a
rewrite of hb_report into Python, as well as worked on improving the
tab completion support in crmsh. I also want to recognize the hard
work of Shiwen Zhang who initially started the work of rewriting the
hb_report script in Python.

For the complete list of changes in this release, see the ChangeLog:

* https://github.com/ClusterLabs/crmsh/blob/3.0.1/ChangeLog

The source code can be downloaded from Github:

* https://github.com/ClusterLabs/crmsh/releases/tag/3.0.1

This version of crmsh (or a version very close to it) is already
available in openSUSE Tumbleweed, and packages for several popular
Linux distributions will be available from the Stable repository at
the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/ClusterLabs/crmsh/archive/3.0.1.tar.gz
* https://github.com/ClusterLabs/crmsh/archive/3.0.1.zip

As usual, a huge thank you to all contributors and users of crmsh!

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Introducing the Anvil! Intelligent Availability platform

2017-07-10 Thread Kristoffer Grönlund
Digimer  writes:

> Hi all,
>
>   I suspect by now, many of you here have heard me talk about the Anvil!
> intelligent availability platform. Today, I am proud to announce that it
> is ready for general use!
>
> https://github.com/ClusterLabs/striker/releases/tag/v2.0.0
>

Cool, congratulations!

Cheers,
Kristoffer

>
>   Now, time to start working full time on version 3!
>
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Kristoffer Grönlund
Eric Robinson  writes:

>> If you're looking to run without support, you can run openSUSE Leap - it's 
>> the
>> closest equivalent to centOS in the SUSE world and the HA packages are all in
>> there.
>> 
>
> Out of curiosity, do the openSUSE Leap repos and packages work with SLES? 

I know that there are some base system differences that could cause
problems, things like Leap using systemd/journald for logging while SLES
is still logging via syslog-ng (IIRC)... so it's possible that you could
get into problems if you mix versions. And adding the Leap repositories
to SLES will probably mess things up since both deliver slightly
different versions of the base system.

For SLES, there's now the Package Hub which has open source packages
taken from Leap and confirmed not to conflict with SLES, so you can mix
a supported base system with unsupported open source packages with less
risk for breaking anything:

https://packagehub.suse.com/

Cheers,
Kristoffer

>
> --Eric

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Kristoffer Grönlund
Eric Robinson  writes:

> We've been a Red Hat/CentOS shop for 10+ years and have installed 
> Corosync+Pacemaker+DRBD dozens of times using the repositories, all for free.
>
> We are now trying out our first SLES 12 server, and I'm looking for the 
> repos. Where the heck are they? I went looking, and all I can find is the 
> SLES "High Availability Extension," which I must pay $700/year for? No 
> freaking way!
>
> This is Linux we're talking about, right? There's got to be an easy way to 
> install the cluster without paying for a subscription... right?
>
> Someone talk me off the ledge here.
>

If you're looking to run without support, you can run openSUSE Leap -
it's the closest equivalent to centOS in the SUSE world and the HA
packages are all in there.

(I'd recommend the supported version, of course ;)

Cheers,
Kristoffer


> --
> Eric Robinson
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] "Connecting" Pacemaker with another cluster manager

2017-05-23 Thread Kristoffer Grönlund
Timo  writes:

> Hi,
>
> I have a proprietary cluster manager running on a bunch (four) of nodes.
> It decides to run the daemon for which HA is required on its own set of
> (undisclosed) requirements and decisions. This is, unfortunately,
> unavoidable due to business requirements.
>
> However, I have to put also Pacemaker onto the nodes in order to provide
> an additional daemon running in HA mode. (I cannot do this using the
> existing cluster manager, as this is a closed system.)
>
> I have to make sure that the additional daemon (which I plan to
> coordinate using Pacemaker) only runs on the machine where the daemon
> (controlled by the existing, closed cluster manager) runs. I could check
> for local VIPs, for example, to check whether it runs on a node or not.
>
> Is there any way to make Pacemaker "check" for existence of a local
> (V)IP so that I could "connect" both cluster managers?
>
> In short: I need Pacemaker to put the single instance of a daemon
> exactly onto the node the other cluster manager decided to run the
> (primary) daemon.

Hi,

I'm not sure I completely understand the problem description, but if I
parsed it correctly:

What you can do is run an external script which sets a node attribute on
the node that has the external cluster manager daemon, and have a
constraint which locates the additional daemon based on that node
attribute.
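A rough sketch of that approach, with made-up node, resource and attribute names — the first two commands would be run by a small helper script whenever the external manager moves its daemon:

```
# Mark the node currently running the externally-managed daemon with a
# transient node attribute (1 on the active node, 0 everywhere else)
crm_attribute --type status --node nodeA --name external-daemon --update 1
crm_attribute --type status --node nodeB --name external-daemon --update 0

# Ban the Pacemaker-managed daemon from every node where the attribute
# is not 1, so it always follows the external daemon
crm configure location l-follow-external p-extra-daemon \
    rule -inf: external-daemon ne 1
```

Setting the attribute explicitly to 0 on the inactive nodes keeps the rule simple; otherwise an undefined attribute would need extra handling with not_defined.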

Cheers,
Kristoffer

>
> Best regards,
>
> Timo
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-09 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> - On May 8, 2017, at 9:20 PM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
>
>> Hi,
>> 
>> i remember that digimer often campaigns for a fence delay in a 2-node  
>> cluster.
>> E.g. here: 
>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
>> In my eyes it makes sense, so i try to establish that. I have two HP servers,
>> each with an ILO card.
>> I have to use the stonith:external/ipmi agent, the stonith:external/riloe
>> refused to work.
>> 
>> But i don't have a delay parameter there.
>> crm ra info stonith:external/ipmi:
>> 
>> ...
>> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and 
>> specify
>> the maximum of random delay
>>This prevents double fencing when using slow devices such as sbd.
>>Use this to enable random delay for stonith actions and specify the 
>> maximum of
>>random delay.
>> ...
>> 
>> This is the only delay parameter i can use. But a random delay does not seem 
>> to
>> be a reliable solution.
>> 
>> The stonith:ipmilan agent also provides just a random delay. Same with the 
>> riloe
>> agent.
>> 
>> How did anyone solve this problem ?
>> 
>> Or do i have to edit the RA (I will get practice in that :-))?
>> 
>> 
>
> crm ra info stonith:external/ipmi says there exists a parameter 
> pcmk_delay_max.
> Having a look in  /usr/lib64/stonith/plugins/external/ipmi i don't find 
> anything about delay.
> Also "crm_resource --show-metadata=stonith:external/ipmi" does not say 
> anything about a delay.
>
> Is this "pcmk_delay_max" not implemented ? From where does "crm ra info 
> stonith:external/ipmi" get this info ?
>

pcmk_delay_max is implemented by Pacemaker. crmsh gets the information
about available parameters by querying stonithd directly.

Cheers,
Kristoffer
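As a sketch of how a fixed (non-random) delay can be expressed once pcmk_delay_base is available — it was added in Pacemaker 1.1.17, so it may not exist on the versions discussed here; addresses and credentials below are placeholders. Giving only one of the two fence devices a small static delay means that node always wins a fence race:

```
# Fence device for node ha1: its peer waits 10s before fencing it, so
# in a fence race ha1 survives
crm configure primitive st-ha1 stonith:external/ipmi \
    params hostname=ha1 ipaddr=192.0.2.11 userid=admin \
           passwd=secret interface=lanplus pcmk_delay_base=10s

# Fence device for node ha2: no delay
crm configure primitive st-ha2 stonith:external/ipmi \
    params hostname=ha2 ipaddr=192.0.2.12 userid=admin \
           passwd=secret interface=lanplus
```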

>
> Bernd
>  
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clusterlabs Summit 2017 (Nuremberg, 6-7 September) - Hotels and Topics

2017-05-02 Thread Kristoffer Grönlund
Hi everyone!

Here's a quick update on the summit happening at the SUSE office in
Nuremberg on September 6-7.

I am still collecting hotel reservations from attendees. In order to
notify the hotel about how many rooms we actually need, I'll need a
complete list of people who want to attend before 15 June, at the
latest. So if you plan to attend and need a hotel room, let me know as
soon as possible by emailing me! There are 40 hotel rooms reserved,
and about half of those are claimed at this point.

We are starting to have a preliminary list of topics ready. The event
area has a projector and A/V equipment available, so we should be able
to show slides for those wanting to present a particular topic.

This is the current list of topics:

Requester/Presenter            Topic
-----------------------------  -------------------------------------------
Andrew Beekhof or Ken Gaillot  New container "bundle" feature in Pacemaker
Ken Gaillot                    What would Pacemaker 1.2 or 2.0 look like?
Ken Gaillot                    Ideas for the OCF resource agent standard
Klaus Wenninger                Recent work and future plans for SBD
Chrissie Caulfield             knet and corosync 3
Chris Feist (requestor)        kubernetes
Chris Feist (requestor)        Multisite (QDevice/Booth)
Madison Kelly                  ScanCore and "Intelligent Availability"
Kristoffer Gronlund,           Hawk, Cluster API and future plans
Ayoub Belarbi

We also have Kai Wagner from the openATTIC team attending, and he has
agreed to present openATTIC. For those who aren't familiar with it,
openATTIC is a storage management tool with some support for managing
things like LVM, DRBD and Ceph.

I am also happy to say that Adam Spiers from the SUSE Cloud team will be
attending the summit, and hopefully I can convince him to present their
work on using Pacemaker with Openstack, the current state of Openstack
HA and perhaps some of his future plans and wishes around HA.

Keep adding topics to the list! We'll work out a rough schedule for
the two days as the event draws nearer, but I'd hope to leave enough
room for deeper discussions around the topics as we work through
them.

As a reminder, the plans for the summit are being collected at the
Alteeve! planning wiki, here:

http://plan.alteeve.ca/index.php/Main_Page

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby

2017-04-24 Thread Kristoffer Grönlund
Ken Gaillot  writes:

> Hi all,
>
> Pacemaker 1.1.17 will have a feature that people have occasionally asked
> for in the past: the ability to start a node in standby mode.
>
> It will be controlled by an environment variable (set in
> /etc/sysconfig/pacemaker, /etc/default/pacemaker, or wherever your
> distro puts them):
>
>
> # By default, nodes will join the cluster in an online state when they first
> # start, unless they were previously put into standby mode. If this
> variable is
> # set to "standby" or "online", it will force this node to join in the
> # specified state when starting.
> # (experimental; currently ignored for Pacemaker Remote nodes)
> # PCMK_node_start_state=default
>
>
> As described, it will be considered experimental in this release, mainly
> because it doesn't work with Pacemaker Remote nodes yet. However, I
> don't expect any problems using it with cluster nodes.
>
> Example use cases:
>
> You want want fenced nodes to automatically start the cluster after a
> reboot, so they contribute to quorum, but not run any resources, so the
> problem can be investigated. You would leave
> PCMK_node_start_state=standby permanently.
>
> You want to ensure a newly added node joins the cluster without problems
> before allowing it to run resources. You would set this to "standby"
> when deploying the node, and remove the setting once you're satisfied
> with the node, so it can run resources at future reboots.
>
> You want a standby setting to last only until the next boot. You would
> set this permanently to "online", and any manual setting of standby mode
> would be overwritten at the next boot.
>
> Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who
> contributed this feature as part of a project with EMC.
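A minimal sketch of using the option described above (the node name is an example; the file location depends on the distro, as noted in the quoted text):

```shell
# /etc/sysconfig/pacemaker (or /etc/default/pacemaker on Debian-based distros):
# join the cluster in standby mode at every boot
PCMK_node_start_state=standby

# once satisfied that the node is healthy, bring it online manually, e.g.:
#   crm node online node1
```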

One of those features that seem obvious in retrospect. Great addition,
thanks to everyone involved!

Cheers,
Kristoffer

> -- 
> Ken Gaillot 
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?

2017-04-21 Thread Kristoffer Grönlund
Ken Gaillot  writes:

>>> I think it works differently: One task periodically reads its mailbox slot
>>> for commands, and once a command was read, it's executed immediately. Only
>>> if the read task hangs for a long time does the watchdog itself trigger a
>>> reset (as SBD seems dead). So the delay is actually the sum of "write
>>> delay", "read delay", and "command execution".
>
> I think you're right when sbd uses shared-storage, but there is a
> watchdog-only configuration that I believe digimer was referring to.
>
> With watchdog-only, the cluster will wait for the value of the
> stonith-watchdog-timeout property before considering the fencing successful.

I think there are some important distinctions to make, to clarify what
SBD is and how it works:

* The original SBD model uses shared storage as its fencing mechanism
  (thus the name Shared-storage based death) - when talking about
  watchdog-only SBD, a new mode only introduced in a fork of the SBD
  project, it would probably help avoid confusion to be explicit about
  that.

* Watchdog-only SBD relies on quorum to avoid split-brain or fence
  loops, and thus requires at least three nodes or an additional qdevice
  node. This is my understanding, correct me if I am wrong. Also, this
  disqualifies watchdog-sbd from any of Digimer's setups since they are
  2-node only, so that's probably something to be aware of in this
  discussion. ;)

* The watchdog fencing in SBD is not the primary fence mechanism when
  shared storage is available. In fact, it is an optional although
  strongly recommended component. [1]

[1]: We (as in SUSE) require use of a watchdog for supported
configurations, but technically it is optional.
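For context, a rough sketch of the two modes under discussion, with example device paths and timeout values (not a definitive configuration):

```shell
# /etc/sysconfig/sbd -- classic shared-storage SBD (device path is an example)
SBD_DEVICE="/dev/disk/by-id/example-sbd-disk"
SBD_WATCHDOG_DEV="/dev/watchdog"

# Cluster side, entered in "crm configure":
#   primitive stonith-sbd stonith:external/sbd
#   property stonith-enabled=true

# Watchdog-only variant: leave SBD_DEVICE unset and instead set
#   property stonith-watchdog-timeout=10s
# (relies on quorum, i.e. at least three nodes or a qdevice, as noted above)
```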

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Surprising semantics of location constraints with INFINITY score

2017-04-12 Thread Kristoffer Grönlund
Jehan-Guillaume de Rorthais  writes:

> Hi,
>
>> >>> Kristoffer Grönlund  schrieb am 11.04.2017 um 15:30
>> >>> in  
>> Nachricht <87lgr7kr64@suse.com>:
>> > Hi all,
>> > 
>> > I discovered today that a location constraint with score=INFINITY
>> > doesn't actually restrict resources to running only on particular
>> > nodes. From what I can tell, the constraint assigns the score to that
>> > node, but doesn't change scores assigned to other nodes. So if the node
>> > in question happens to be offline, the resource will be started on any
>> > other node.  
>
> AFAIU, this behavior is expected when you set up your cluster with the Opt-In
> strategy:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_deciding_which_nodes_a_resource_can_run_on
>

No, this is the behavior of an Opt-Out cluster. So it seems you are
under the same misconception as I was. :)

Cheers,
Kristoffer

> -- 
> Jehan-Guillaume de Rorthais
> Dalibo
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-11 Thread Kristoffer Grönlund
Hi all,

I discovered today that a location constraint with score=INFINITY
doesn't actually restrict resources to running only on particular
nodes. From what I can tell, the constraint assigns the score to that
node, but doesn't change scores assigned to other nodes. So if the node
in question happens to be offline, the resource will be started on any
other node.

Example:



If node2 is offline, I see the following:

 dummy  (ocf::heartbeat:Dummy): Started node1
native_color: dummy allocation score on node1: 1
native_color: dummy allocation score on node2: -INFINITY
native_color: dummy allocation score on webui: 0

It makes some kind of sense, but seems surprising - and the
documentation is a bit unclear on the topic. In particular, the
statement that a score = INFINITY means "must" is clearly not correct in
this case. Maybe the documentation should be clarified for location
constraints?
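For anyone hitting the same surprise, a sketch of two ways to make the restriction strict (node and resource names taken from the example above):

```shell
# Entered in "crm configure".

# Option 1 (opt-out cluster): explicitly forbid every node except node2:
location dummy-only-node2 dummy rule -inf: #uname ne node2

# Option 2: make the cluster opt-in, so resources run nowhere by default:
property symmetric-cluster=false
location dummy-on-node2 dummy inf: node2
```

With either variant, the resource stays stopped rather than migrating when node2 is offline.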

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: Rename option group resource id with pcs

2017-04-11 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>>>> Dejan Muhamedagic  schrieb am 11.04.2017 um 11:43 in
> Nachricht <20170411094352.GD8414@tuttle.homenet>:
>> Hi,
>> 
>> On Tue, Apr 11, 2017 at 10:50:56AM +0200, Tomas Jelinek wrote:
>>> Dne 11.4.2017 v 08:53 SAYED, MAJID ALI SYED AMJAD ALI napsal(a):
>>> >Hello,
>>> >
>>> >Is there any option in pcs to rename group resource id?
>>> >
>>> 
>>> Hi,
>>> 
>>> No, there is not.
>>> 
>>> Pacemaker doesn't really cover the concept of renaming a resource.
>> 
>> Perhaps you can check how crmsh does resource rename. It's not
>> impossible, but can be rather involved if there are other objects
>> (e.g. constraints) referencing the resource. Also, crmsh will
>> refuse to rename the resource if it's running.
>
> The real problem in pacemaker (as resources are created now) is that the 
> "IDs" have too much semantics, i.e. most are derived from the resource name
> (while lacking a name attribute or element), and some required elements
> are accessed by ID, and not by name.
>
> Examples:
> 
> value="1.1.12-f47ea56"/>
>
> <nvpair>s and <op>s have no name, but only an ID (it seems).
>
>   
>
> This is redundant: As the <op> is part of a resource (by XML structure), it's
> unnecessary to put the name of the resource into the ID of the operation.
>
> It all looks like a kind of abuse of XML, IMHO. I think the next CIB format 
> should be able to handle IDs that are free of semantics other than to denote 
> (relatively unique) identity. That is: It should be OK to assign IDs like 
> "i1", "i2", "i3", ... and besides from an IDREF the elements should be 
> accessed by structure and/or name.
>
> (If the ID should be the primary identification feature, flatten all 
> structure and drop all (redundant) names.)

The abuse of ids in the pacemaker schema is a pet peeve of mine; it
would be better to only have ids for nodes where it makes sense: Naming
resources, for example (though I would prefer human-friendly names
rather than ids with loosely defined restrictions). References to
individual XML nodes can be done via XPATH rather than having to assign
ids to every single node in the tree.

Of course, changing it at this point is probably not worth the trouble.
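As an illustration of addressing CIB nodes structurally rather than via per-node ids (the resource name "dummy" is an example):

```shell
# Query a monitor operation by its position and attributes instead of its id:
cibadmin --query --xpath '//primitive[@id="dummy"]//op[@name="monitor"]'
```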

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>> 
>> Thanks,
>> 
>> Dejan
>> 
>>> From
>>> pacemaker's point of view one resource gets removed and another one gets
>>> created.
>>> 
>>> This has been discussed recently:
>>> http://lists.clusterlabs.org/pipermail/users/2017-April/005387.html 
>>> 
>>> Regards,
>>> Tomas
>>> 
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >*/MAJID SAYED/*
>>> >
>>> >/HPC System Administrator./
>>> >
>>> >/King Abdullah International Medical Research Centre/
>>> >
>>> >/Phone:+9661801(Ext:40631)/
>>> >
>>> >/Email:sayed...@ngha.med.sa/
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >This Email and any files transmitted may contain confidential and/or
>>> >privileged information and is intended solely for the addressee(s)
>>> >named. If you have received this information in error, or are being
>>> >posted by accident, please notify the sender by return Email, do not
>>> >redistribute this email message, delete it immediately and keep no
>>> >copies of it. All opinions and/or views expressed in this email are
>>> >solely those of the author and do not necessarily represent those of
>>> >NGHA. Any purchase order, purchase advice or legal commitment is only
>>> >valid once backed by the signed hardcopy by the authorized person from 
>>> >NGHA.
>>> >
>>> >

Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-10 Thread Kristoffer Grönlund
Eric Robinson  writes:

>> crm configure show xml c_clust19
>
> Here is what I am entering using crmsh (version 2.0-1):
>
>
> colocation c_clust19 inf: [ p_mysql_057 p_mysql_092 p_mysql_187 ] 
> p_vip_clust19 p_fs_clust19 p_lv_on_drbd0 ms_drbd0:Master
> order o_clust19 inf: ms_drbd0:promote p_lv_on_drbd0 p_fs_clust19 
> p_vip_clust19 [ p_mysql_057 p_mysql_092 p_mysql_187 ]
>
>
> After I save it, I get no errors, but it converts it to this...
>
>
> colocation c_clust19 inf: [ p_mysql_057 p_mysql_092 p_mysql_187 ] ( 
> p_vip_clust19:Master p_fs_clust19:Master p_lv_on_drbd0:Master ) ( 
> ms_drbd0:Master )
> order o_clust19 inf: ms_drbd0:promote ( p_lv_on_drbd0:start 
> p_fs_clust19:start p_vip_clust19:start ) [ p_mysql_057 p_mysql_092 
> p_mysql_187 ]
>
> This looks incorrect to me.
>
> Here is the xml that it generates.
>
>
> The resources in set c_clust19-1 should start sequentially, starting with 
> p_lv_on_drbd0 and ending with p_vip_clust19. I also don't understand why 
> p_lv_on_drbd0 and p_vip_clust19 are getting the Master designation. 

Hi,

Yeah, that does indeed look like a bug.. One thing that is confusing and
may be one reason why things get split in an unexpected way is because
as you can see, the role attribute is applied per resource set, while
it looks like it applies per resource in the crmsh syntax. So the shell
does some complex logic to "split" sets based on role assignment.
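One common way to sidestep the per-set role splitting, sketched with the resource names from this thread: put the sequential resources in a group and constrain only the group against the master role.

```shell
# Entered in "crm configure" (a sketch, not a drop-in replacement):
group g_clust19 p_lv_on_drbd0 p_fs_clust19 p_vip_clust19
colocation c_clust19 inf: g_clust19 ms_drbd0:Master
order o_clust19 inf: ms_drbd0:promote g_clust19:start
```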

Cheers,
Kristoffer

>
> --
> Eric Robinson
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-09 Thread Kristoffer Grönlund
Eric Robinson  writes:

> Here's the config. I don't know why the CRM put in the parenthesis where it 
> did. That's not the way I typed it. I usually have all my mysql instances 
> between parenthesis and everything else outside.

[ ...]

> colocation c_clust19 inf: ( p_mysql_057 p_mysql_092 p_mysql_187 p_mysql_213 
> p_mysql_250 p_mysql_289 p_mysql_312 p_vip_clust19 p_mysql_702 p_mysql_743 
> p_mysql_745 p_mysql_746 p_fs_clust19 p_lv_on_drbd0 ) ( ms_drbd0:Master )
> colocation c_clust20 inf: p_vip_clust20 p_fs_clust20 p_lv_on_drbd1 
> ms_drbd1:Master
> order o_clust19 inf: ms_drbd0:promote ( p_lv_on_drbd0:start ) ( p_fs_clust19 
> p_vip_clust19 ) ( p_mysql_057 p_mysql_092 p_mysql_187 p_mysql_213 p_mysql_250 
> p_mysql_289 p_mysql_312 p_mysql_702 p_mysql_743 p_mysql_745 p_mysql_746 )

This might be a bug in crmsh: What was the expression you intended to
write, and which version of crmsh do you have?

You can see the resulting XML that crmsh generates and then re-parses
into the line syntax using

crm configure show xml c_clust19

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Clusterlabs Summit 2017 - Dates, hotels and other updates

2017-04-06 Thread Kristoffer Grönlund
Hi everyone!

In case anyone missed the previous emails on this topic, I am working on
arranging another Clusterlabs (formerly Linux HA) Summit, this time in
Nuremberg, Germany. Although I know it presents a bit of a problem for
some people, we have now decided on a pair of dates for the summit. The
unfortunate reality is that Nuremberg is very busy during the summer,
and finding dates that have both a location for the summit as well as
sufficient hotel rooms still available presents a challenge.

The summit will take place on September 6-7, 2017, in the brand new SUSE
office event area.

This page has instructions for how to get there:

https://www.suse.com/company/contact/headquarters/

For anyone flying in, I can recommend flying to Frankfurt and taking the
train from there. The flight from Frankfurt is only 30 minutes, so often
the wait between flights and the flight combined end up taking the same
time as the train.

HOTELS

We have 40 hotel rooms reserved at the Sorat Saxx Hotel at a very good
rate including breakfast and wifi for the duration of the conference
week (September 4-8). If you are interested in grabbing one of these
rooms, please let me know at kgronl...@suse.com before July 10 at the
latest as we need a complete list of names to give to the hotel before
the conference starts.

(more than half of the rooms are already spoken for, so let me know
ASAP)

https://www.sorat-hotels.com/en/hotel/saxx-nuernberg.html

If you plan to book your own accommodation, make sure you do so as soon
as possible. The hotels in Nuremberg tend to fill up early for the
summer season.

ATTENDEE LIST

If you plan to attend, please put your name on the planning wiki! That
way, we have a chance to make sure that there's enough coffee for
everyone. ;)

http://plan.alteeve.ca/index.php/Main_Page

CALL FOR TOPICS

For those attending, now is a good time to start thinking about topics
we might cover! If anyone would like to present something, we will have
access to basic equipment like a projector and so on. Again, putting it
on the wiki is the best place for suggestions or opinions as well.

Thank you, and hope to see you there!

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: question about ocf metadata actions

2017-03-31 Thread Kristoffer Grönlund
Ulrich Windl  writes:

> I thought the hierarchy is like this:
> 1) default timeout
> 2) RA's default timeout
> 3) user-specified timeout
>
> So crm would go from 1) to 3) taking the last value it finds. Isn't it like
> that?

No, step 2) is not taken by crm.

> I mean if there's no timeout in the resource configuration, doesn't the RM use
> the default timeout?

Yes, it then uses the timeout defined in op_defaults:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-operation-defaults
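For example, the cluster-wide fallback can be set like this (the value is an example):

```shell
# Entered in "crm configure": applies when an operation defines no timeout
op_defaults timeout=60s
```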

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>> 
>> https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#_metadata
>> 
>>> Every action should list its own timeout value. This is a hint to the
>>> user what minimal timeout should be configured for the action. This is
>>> meant to cater for the fact that some resources are quick to start and
>>> stop (IP addresses or filesystems, for example), some may take several
>>> minutes to do so (such as databases).
>> 
>>> In addition, recurring actions (such as monitor) should also specify a
>>> recommended minimum interval, which is the time between two
>>> consecutive invocations of the same action. Like timeout, this value
>>> does not constitute a default — it is merely a hint for the user which
>>> action interval to configure, at minimum.
>> 
>> Cheers,
>> Kristoffer
>> 
>>>
>>> Br,
>>>
>>> Allen
>> 
>> -- 
>> // Kristoffer Grönlund
>> // kgronl...@suse.com 
>> 
>
>
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Stonith

2017-03-30 Thread Kristoffer Grönlund
Alexander Markov  writes:

> Hello, Kristoffer
>
>> Did you test failover through pacemaker itself?
>
> Yes, I did, no problems here.
>
>> However: Am I understanding it correctly that you have one node in each
>> data center, and a stonith device in each data center?
>
> Yes.
>
>> If the
>> data center is lost, the stonith device for the node in that data 
>> center
>> would also be lost and thus not able to fence.
>
> Exactly what happens!
>
>> In such a hardware configuration, only a poison pill solution like SBD
>> could work, I think.
>
> I've got no shared storage here. Every datacenter has its own storage 
> and they have replication on top (similar to drbd). I can organize a 
> cross-shared solution though if it help, but don't see how.

The only solution I know which allows for a configuration like this is
using separate clusters in each data center, and using booth for
transferring ticket ownership between them. Booth requires a data
center-level quorum (meaning at least 3 locations), though the third
location can be just a small daemon without an actual cluster, and can
run in a public cloud or similar for example.
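A rough sketch of such a booth setup, with example addresses (the third address is the lightweight arbitrator daemon, which needs no full cluster):

```shell
# /etc/booth/booth.conf -- two sites plus an arbitrator (addresses are examples)
transport = UDP
port = 9929
site = 192.168.1.10
site = 192.168.2.10
arbitrator = 192.168.3.10
ticket = "ticket-db"
```

Resources in each cluster are then tied to the ticket, so only the site currently holding it runs them.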

Cheers,
Kristoffer

>
>> --
>> Regards,
>> Alexander
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] question about ocf metadata actions

2017-03-30 Thread Kristoffer Grönlund
he.hailo...@zte.com.cn writes:

> Hi,
>
>
> Does the timeout configured in the ocf metadata actually take effect?
>
>
>
>
> <actions>
>
> <action name="start" timeout="300s" />
>
> <action name="stop" timeout="200s" />
>
> <action name="status" timeout="20s" />
>
> <action name="monitor" depth="0" timeout="20s" interval="2s" />
>
> <action name="meta-data" timeout="120s" />
>
> <action name="validate-all"  timeout="20s" />
>
> </actions>
>
>
>
>
> what's the relationship with the ones configured using "crm configure 
> primitive" ?

Hi Allen,

The timeouts in the OCF metadata are merely documentation hints, and
ignored by Pacemaker unless configured appropriately in the CIB (which
is what crm configure primitive does). See the OCF documentation:

https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#_metadata

> Every action should list its own timeout value. This is a hint to the
> user what minimal timeout should be configured for the action. This is
> meant to cater for the fact that some resources are quick to start and
> stop (IP addresses or filesystems, for example), some may take several
> minutes to do so (such as databases).

> In addition, recurring actions (such as monitor) should also specify a
> recommended minimum interval, which is the time between two
> consecutive invocations of the same action. Like timeout, this value
> does not constitute a default — it is merely a hint for the user which
> action interval to configure, at minimum.
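In other words, the hints only take effect once mirrored into the CIB, e.g. (the resource name is an example; the values are taken from the metadata above):

```shell
# Entered in "crm configure":
primitive example-rsc ocf:heartbeat:Dummy \
    op start timeout=300s \
    op stop timeout=200s \
    op monitor interval=2s timeout=20s
```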

Cheers,
Kristoffer

>
> Br,
>
> Allen

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Stonith

2017-03-30 Thread Kristoffer Grönlund
Alexander Markov  writes:

> Hello guys,
>
> it looks like I miss something obvious, but I just don't get what has 
> happened.
>
> I've got a number of stonith-enabled clusters within my big POWER boxes. 
> My stonith devices are two HMC (hardware management consoles) - separate 
> servers from IBM that can reboot separate LPARs (logical partitions) 
> within POWER boxes - one per every datacenter.
>
> So my definition for stonith devices was pretty straightforward:
>
> primitive st_dc2_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.9
> primitive st_dc1_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.8
> clone cl_st_dc2_hmc st_dc2_hmc
> clone cl_st_dc1_hmc st_dc1_hmc
>
> Everything was ok when we tested failover. But today upon power outage 

Did you test failover through pacemaker itself?

Otherwise, the logs for the attempted stonith should reveal more about
how Pacemaker tried to call the stonith device, and what went wrong.

However: Am I understanding it correctly that you have one node in each
data center, and a stonith device in each data center? That doesn't
sound like a setup that can recover from data center failure: If the
data center is lost, the stonith device for the node in that data center
would also be lost and thus not able to fence.

In such a hardware configuration, only a poison pill solution like SBD
could work, I think.

Cheers,
Kristoffer

> we lost one DC completely. Shortly after that cluster just literally 
> hung itself upon trying to reboot a nonexistent node. No failover
> occurred. The nonexistent node was marked OFFLINE UNCLEAN and resources were 
> marked "Started UNCLEAN" on nonexistent node.
>
> UNCLEAN seems to flag a problems with stonith configuration. So my 
> question is: how to avoid such behaviour?
>
> Thank you!
>
> -- 
> Regards,
> Alexander
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Fence agent for VirtualBox

2017-02-23 Thread Kristoffer Grönlund
Marek Grac  writes:

> Hi,
>
> we have added support for a host with Windows but it is not trivial to
> setup because of various contexts/privileges.
>
> Install openssh on Windows (tutorial can be found on
> http://linuxbsdos.com/2015/07/30/how-to-install-openssh-on-windows-10/)
>
> There is a major issue with current setup in Windows.  You have to start
> virtual machines from openssh connection if you wish to manage them from
> openssh connection.
>
> So, you have to connect from Windows to very same Windows using ssh and
> then run
>
> “/Program Files/Oracle/VirtualBox/VBoxManage.exe” start NAME_OF_VM
>
> Be prepared that you will not see your VM running in the VirtualBox
> management UI.
>
> Afterwards it is enough to add parameter --host-os windows (or
> host_os=windows when stdin/pcs is used).
>
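A sketch of how the option above might look as a stonith resource (address, user and key path are examples; parameter names can vary between fence-agents versions, so check "fence_vbox -o metadata" for the exact names):

```shell
# Entered in "crm configure":
primitive st-vbox stonith:fence_vbox \
    params ip=192.168.122.1 username=vboxuser \
           identity_file=/root/.ssh/id_rsa host_os=windows
```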

Cool, nice work!

Cheers,
Kristoffer

> m,
>
> On Wed, Feb 22, 2017 at 11:49 AM, Marek Grac  wrote:
>
>> Hi,
>>
>> I have updated fence agent for Virtual Box (upstream git). The main
>> benefit is new option --host-os (host_os on stdin) that supports
>> linux|macos. So if your host is linux/macos all you need to set is this
>> option (and ssh access to a machine). I would love to add a support also
>> for windows but I'm not able to run vboxmanage.exe over the openssh. It
>> works perfectly from command prompt under same user, so there are some
>> privileges issues, if you know how to fix this please let me know.
>>
>> m,
>>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] crm shell RA completion

2017-02-20 Thread Kristoffer Grönlund
Ulrich Windl  writes:

> Hi!
>
> I have a proposal for crm shell's RA completion: When pressing TAB after "ra 
> info", crm shell suggests a long list of RAs. Wouldn't it preferable to 
> complete only up to the next ':'?
>
> Consider this:
> crm(live)# ra info
> Display all 402 possibilities? (y or n)n
> crm(live)# ra info ocf:
> Display all 101 possibilities? (y or n)n
> crm(live)# ra info ocf:heartbeat:
> (a long list is displayed)
>
> So at the first level not all 402 RAs should be suggested but only the first 
> level (like "ocf"), and at the second level not all 101 completions should be 
> suggested, but only a few (like "heartbeat").
>
> What do you think?

Sounds good to me, yes. The completion is a bit wonky and tricky to get
right. Still a work in progress.

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] question about equal resource distribution

2017-02-18 Thread Kristoffer Grönlund
Ilia Sokolinski  writes:

> Suppose I have a N node cluster where N > 2 running m*N resources. Resources 
> don’t have preferred nodes, but since resources take RAM and CPU it is 
> important to distribute them equally among the nodes.
> Will pacemaker do the equal distribution, e.g. m resources per node?
> If a node fails, will pacemaker redistribute the resources equally too, e.g. 
> m * N/(N-1) per node?
>
> I don’t see any settings controlling this behavior in the documentation, but 
> perhaps, pacemaker tries to be “fair” by default.
>

Yes, pacemaker tries to allocate resources evenly by default, and will
move resources when nodes fail in order to maintain that.

There are several different mechanisms that influence this behaviour:

* Any placement constraints in general influence where resources are
  allocated.

* You can set resource-stickiness to a non-zero value which determines
  to which degree Pacemaker prefers to leave resources running where
  they are. The score is in relation to other placement scores, like
  constraint scores etc. This can be set for individual resources or
  globally. [1]

* If you have an asymmetrical cluster, resources have to be manually
  allocated to nodes via constraints, see [2]

[1]: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
[2]: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters
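A sketch of both mechanisms (values and names are examples):

```shell
# Entered in "crm configure".

# [1] Make resources prefer to stay where they are:
rsc_defaults resource-stickiness=100

# Utilization-based placement: declare node capacities and resource demands,
# then let the balanced strategy spread load by remaining capacity:
node node1 utilization memory=16384 cpu=8
primitive big-db ocf:heartbeat:Dummy utilization memory=4096 cpu=2
property placement-strategy=balanced
```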

Cheers,
Kristoffer

> Thanks 
>
> Ilia Sokolinski

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] question about equal resource distribution

2017-02-18 Thread Kristoffer Grönlund
Ilia Sokolinski  writes:

> Thank you!
>
> What quantity does pacemaker tries to equalize - number of running resources 
> per node or total stickiness per node?
>

I honestly don't know exactly what the criteria are. Without any
utilization definitions for nodes, I *think* it tries to balance the
number of resources per node. But if the resources and nodes have
cpu/memory utilization defined, the rules change. But I'm afraid I
haven't dug into exactly what the logic looks like.

> Suppose I have a bunch of web server groups each with IPaddr and apache 
> resources, and a fewer number of database groups each with IPaddr, postgres 
> and LVM resources.
>
> In that case, does it mean that 3 web server groups are weighted the same as 
> 2 database groups in terms of distribution?

Good question, I think it looks purely at the primitive
resources. Groups are just shorthand for a series of ordering and
placement constraints.

Cheers,
Kristoffer

>
> Ilia
>
>
>
>> On Feb 17, 2017, at 2:58 AM, Kristoffer Grönlund  
>> wrote:
>> 
>> Ilia Sokolinski  writes:
>> 
>>> Suppose I have a N node cluster where N > 2 running m*N resources. 
>>> Resources don’t have preferred nodes, but since resources take RAM and CPU 
>>> it is important to distribute them equally among the nodes.
>>> Will pacemaker do the equal distribution, e.g. m resources per node?
>>> If a node fails, will pacemaker redistribute the resources equally too, 
>>> e.g. m * N/(N-1) per node?
>>> 
>>> I don’t see any settings controlling this behavior in the documentation, 
>>> but perhaps, pacemaker tries to be “fair” by default.
>>> 
>> 
>> Yes, pacemaker tries to allocate resources evenly by default, and will
>> move resources when nodes fail in order to maintain that.
>> 
>> There are several different mechanisms that influence this behaviour:
>> 
>> * Any placement constraints in general influence where resources are
>>  allocated.
>> 
>> * You can set resource-stickiness to a non-zero value which determines
>>  to which degree Pacemaker prefers to leave resources running where
>>  they are. The score is in relation to other placement scores, like
>>  constraint scores etc. This can be set for individual resources or
>>  globally. [1]
>> 
>> * If you have an asymmetrical cluster, resources have to be manually
>>  allocated to nodes via constraints, see [2]
>> 
>> [1]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
>> [2]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters
>> 
>> Cheers,
>> Kristoffer
>> 
>>> Thanks 
>>> 
>>> Ilia Sokolinski
>>> ___
>>> Users mailing list: Users@clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
>> -- 
>> // Kristoffer Grönlund
>> // kgronl...@suse.com
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] resources management - redesign

2017-02-06 Thread Kristoffer Grönlund
Hi Florin,

I'm afraid I don't quite understand what it is that you are asking. You
can specify the resource ID when creating resources, and using resource
constraints, you can specify any order/colocation structure that you
need.

> 1. RG = rg1 + following resources: fs1, fs2,fs3, ocf:heartbeat[my custom
> systemd script] 

What do you mean by ocf:heartbeat[my custom systemd script]? If you've
got your own service with a systemd service file and you don't need
custom monitoring, you can use "systemd:" as the resource agent.
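
For example, a sketch of wiring a systemd unit in via crmsh (the unit
name "myservice" is just a placeholder):

```shell
# A systemd unit as a cluster resource; pacemaker starts, stops and
# monitors it through systemd rather than through an OCF agent
crm configure primitive myservice systemd:myservice \
    op monitor interval=30s
```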

> Now, what solution exists ?  export cib, edit cib and re-import cib;
> what if  I will need a new fs:fs4, so what: export cib, create new
> resource inside exported cib and re-import it. 

One way to make large changes to the configuration is to

1. Stop all resources

crm configure property stop-all-resources=true 

2. Edit configuration to what you need

crm configure edit

3. Start all resources

crm configure property stop-all-resources=false

You might have some success in keeping services running during editing
by using maintenance-mode=true instead, but that takes a lot more
care and is difficult to recommend in the general case.

It is also possible to use the shadow CIB facility to simulate changes
to the cluster before applying them:

http://clusterlabs.org/man/pacemaker/crm_simulate.8.html

There's some documentation on using Hawk with the simulator which is
already outdated but might be of some help in figuring out what is
possible:

https://hawk-guide.readthedocs.io/en/latest/simulator.html
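
A shadow CIB session in the interactive crm shell looks roughly like
this (prompts shown for illustration only):

```shell
# crm
crm(live)# cib new sandbox        # copy the live CIB into a shadow
crm(sandbox)# configure edit      # changes only touch the shadow
crm(sandbox)# cib diff            # review against the live CIB
crm(sandbox)# cib commit sandbox  # apply the shadow to the cluster
crm(sandbox)# quit
```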

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: crm shell: How to display properties?

2017-02-06 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>>>> xin wrote on 2017-02-06 at 10:50 in message
> <65fbbdf9-f820-63e7-fe02-1d1acefc5...@suse.com>:
>> Hi Ulrich:
>> 
>>"crm configure show" can display what you set for properties.
>> 
>>Do you find another way?
>
> Yes, but it shows the whole configuration. If your configuration is long, the
> output can be very long.
> What I'm talking about is:
> crm(live)configure# show property
> ERROR: object property does not exist
> crm(live)configure# show pe-error-series-max
> ERROR: object pe-error-series-max does not exist
>
> But I found out: This one works: "crm(live)configure# show
> cib-bootstrap-options".
>

You can also use

crm configure show type:property

If you follow the *-options naming convention, you can do

crm configure show \*options

Cheers,
Kristoffer

> Regards,
> Ulrich
>
>> 
>> On 2017-02-06 17:12, Ulrich Windl wrote:
>>>>>> Ken Gaillot  wrote on 2017-02-02 at 21:19 in
> message
>>> :
>>>
>>> [...]
>>>> The files are not necessary for cluster operation, so you can clean them
>>>> as desired. The cluster can clean them for you based on cluster options;
>>>> see pe-error-series-max, pe-warn-series-max, and pe-input-series-max:
>>> [...]
>>>
>>> Related question:
>>> in crm shell I can set properties in configure context ("property ..."),
> but 
>> how can I display them (except from looking at the end of a "show")?
>>>
>>> Regards,
>>> Ulrich
>>>
>>>
>>>
>>> ___
>>> Users mailing list: Users@clusterlabs.org 
>>> http://lists.clusterlabs.org/mailman/listinfo/users 
>>>
>>> Project Home: http://www.clusterlabs.org 
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 
>>>
>> 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> Bugs: http://bugs.clusterlabs.org 
>
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] fence_vbox '--action=' not executing action

2017-02-06 Thread Kristoffer Grönlund
dur...@mgtsciences.com writes:

> Kristoffer Grönlund  wrote on 02/01/2017 10:49:54 PM:
>
>> 
>> Another possibility is that the command that fence_vbox tries to run
>> doesn't work for you for some reason. It will either call
>> 
>> VBoxManage startvm  --type headless
>> 
>> or
>> 
>> VBoxManage controlvm  poweroff
>> 
>> when passed on or off as the --action parameter.
>
> If there is no further work being done on fence_vbox, is there a 'dummy' 
> fence
> which I might use to make STONITH happy in my configuration?  It need only 
> send
> the correct signals to STONITH so that I might create an active/active 
> cluster
> to experiment with?  This is only an experimental configuration.
>

Another option would be to use SBD for fencing if your hypervisor can
provide uncached shared storage:

https://github.com/ClusterLabs/sbd

This is what we usually use for our test setups here, both with
VirtualBox and qemu/kvm.
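
For reference, a minimal SBD setup might look like the following; the
device path is purely a placeholder for your shared disk:

```shell
# Initialize the shared disk for SBD (placeholder device path)
sbd -d /dev/disk/by-id/scsi-SHARED-DISK create
sbd -d /dev/disk/by-id/scsi-SHARED-DISK list    # verify the node slots

# Point the daemon at the device, e.g. in /etc/sysconfig/sbd:
#   SBD_DEVICE="/dev/disk/by-id/scsi-SHARED-DISK"

# Then add the fencing resource and enable stonith
crm configure primitive stonith-sbd stonith:external/sbd
crm configure property stonith-enabled=true
```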

fence_vbox is actively maintained for sure, but we'd need to narrow down
what the correct changes would be to make it work in your
environment.

Trying to use a dummy fencing agent is likely to come back to bite you,
the cluster will act very unpredictably if it thinks that there is a
fencing option that doesn't actually work.

For fence_vbox, the best path forward is probably to create an issue
upstream, and attach as much relevant information about your environment
as possible:

https://github.com/ClusterLabs/fence-agents/issues/new

Cheers,
Kristoffer

> Thank you,
>
> Durwin
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] fence_vbox '--action=' not executing action

2017-02-01 Thread Kristoffer Grönlund
dur...@mgtsciences.com writes:

> I have 2 Fedora 24 Virtualbox machines running on Windows 10 host.  On the 
> host from DOS shell I can start 'node1' with,
>
> VBoxManage.exe startvm node1 --type headless
>
> I can shut it down with,
>
> VBoxManage.exe controlvm node1 acpipowerbutton
>
> But running fence_vbox from 'node2' does not work correctly.  Below are 
> two commands and the output.  First action is 'status' second action is 
> 'off'.  The both get list of running nodes, but 'off' does *not* shutdown 
> or kill the node.
>
> Any ideas?

I haven't tested with Windows as the host OS for fence_vbox (I wrote the
initial implementation of the agent). My guess from looking at your
usage is that passing "cmd" to --ssh-options might not be
sufficient to get it to work in that environment, but I have no idea
what the right arguments might be.

Another possibility is that the command that fence_vbox tries to run
doesn't work for you for some reason. It will either call

VBoxManage startvm  --type headless

or

VBoxManage controlvm  poweroff

when passed on or off as the --action parameter.

Cheers,
Kristoffer

>
> Thank you,
>
> Durwin
>
>
> 02:04 PM root@node2 ~
> fc25> fence_vbox --verbose --ip=172.23.93.249 --username=durwin 
> --identity-file=/root/.ssh/id_rsa.pub --password= --plug="node1" 
> --ssh-options="cmd" --command-prompt='>' --login-timeout=10 
> --shell-timeout=20 --action=status
> Running command: /usr/bin/ssh  durwin@172.23.93.249 -i 
> /root/.ssh/id_rsa.pub -p 22 cmd
> Received: Enter passphrase for key '/root/.ssh/id_rsa.pub':
> Sent:
>
> Received:
> stty: 'standard input': Inappropriate ioctl for device
> Microsoft Windows [Version 10.0.14393]
> (c) 2016 Microsoft Corporation. All rights reserved.
>
> D:\home\durwin>
> Sent: VBoxManage list runningvms
>
> Received: VBoxManage list runningvms
> VBoxManage list runningvms
>
> D:\home\durwin>
> Sent: VBoxManage list vms
>
> Received: VBoxManage list vms
> VBoxManage list vms
> "node2" {14bff1fe-bd26-4583-829d-bc3a393b2a01}
> "node1" {5a029c3c-4549-48be-8e80-c7a67584cd98}
>
> D:\home\durwin>
> Status: OFF
> Sent: quit
>
>
>
> 02:05 PM root@node2 ~
> fc25> fence_vbox --verbose --ip=172.23.93.249 --username=durwin 
> --identity-file=/root/.ssh/id_rsa.pub --password= --plug="node1" 
> --ssh-options="cmd" --command-prompt='>' --login-timeout=10 
> --shell-timeout=20 --action=off
> Delay 0 second(s) before logging in to the fence device
> Running command: /usr/bin/ssh  durwin@172.23.93.249 -i 
> /root/.ssh/id_rsa.pub -p 22 cmd
> Received: Enter passphrase for key '/root/.ssh/id_rsa.pub':
> Sent:
>
> Received:
> stty: 'standard input': Inappropriate ioctl for device
> Microsoft Windows [Version 10.0.14393]
> (c) 2016 Microsoft Corporation. All rights reserved.
>
> D:\home\durwin>
> Sent: VBoxManage list runningvms
>
> Received: VBoxManage list runningvms
> VBoxManage list runningvms
>
> D:\home\durwin>
> Sent: VBoxManage list vms
>
> Received: VBoxManage list vms
> VBoxManage list vms
> "node2" {14bff1fe-bd26-4583-829d-bc3a393b2a01}
> "node1" {5a029c3c-4549-48be-8e80-c7a67584cd98}
>
> D:\home\durwin>
> Success: Already OFF
> Sent: quit
>
>
> Durwin F. De La Rue
> Management Sciences, Inc.
> 6022 Constitution Ave. NE
> Albuquerque, NM  87110
> Phone (505) 255-8611
>
>
> This email message and any attachments are for the sole use of the 
> intended recipient(s) and may contain proprietary and/or confidential 
> information which may be privileged or otherwise protected from 
> disclosure. Any unauthorized review, use, disclosure or distribution is 
> prohibited. If you are not the intended recipient(s), please contact the 
> sender by reply email and destroy the original message and any copies of 
> the message as well as any attachments to the original message.
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] How to change the name of one cluster resource and resource group ?

2017-02-01 Thread Kristoffer Grönlund
Jihed M'selmi  writes:

> Thanks for reply,
> I don't have crm command. It's  corosync version 2.3.4.el7_2.1.
>

crmsh is a separate project, you can install it in parallel with
corosync/pacemaker. There are packages on OBS:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-7/

Otherwise if you have pcs it should have something similar to crm
configure rename.

Cheers,
Kristoffer

> On Wed, Feb 1, 2017, 3:38 PM Kristoffer Grönlund  wrote:
>
>> Jihed M'selmi  writes:
>>
>> > Hello,
>> >
>> > I need update the name of one resource group with a new name. Any
>> thoughts?
>> >
>>
>> crmsh has the crm configure rename command, which tries to update any
>> constraint references atomically as well.
>>
>> Cheers,
>> Kristoffer
>>
>> > Cheers,
>> > JM
>> > --
>> >
>> > J.M
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>> --
>> // Kristoffer Grönlund
>> // kgronl...@suse.com
>>
> -- 
>
> J.M

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] How to change the name of one cluster resource and resource group ?

2017-02-01 Thread Kristoffer Grönlund
Jihed M'selmi  writes:

> Hello,
>
> I need update the name of one resource group with a new name. Any thoughts?
>

crmsh has the crm configure rename command, which tries to update any
constraint references atomically as well.
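
For example (hypothetical resource IDs):

```shell
# Rename the group; crmsh rewrites constraint references to match
crm configure rename grp-web grp-frontend
crm configure show grp-frontend
```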

Cheers,
Kristoffer

> Cheers,
> JM
> -- 
>
> J.M
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal

2017-01-31 Thread Kristoffer Grönlund
Chris Feist  writes:

> On Mon, Jan 30, 2017 at 8:23 AM, Kristoffer Grönlund 
> wrote:
>
>> Hi everyone!
>>
>> The last time we had an HA summit was in 2015, and the intention then
>> was to have SUSE arrange the next meetup in the following year. We did
>> try to find a date that would be suitable for everyone, but for various
>> reasons there was never a conclusion and 2016 came and went.
>>
>> Well, I'd like to give it another try this year! This time, I've already
>> got a proposal for a place and date: September 7-8 in Nuremberg, Germany
>> (SUSE main office). I've got the new event area in the SUSE office
>> already reserved for these dates.
>>
>> My suggestion is to do a two day event similar to the one in Brno, but I
>> am open to any suggestions as to format and content. The main reason for
>> having the event would be for everyone to have a chance to meet and get
>> to know each other, but it's also an opportunity to discuss the future
>> of Clusterlabs and the direction going forward.
>>
>> Any thoughts or feedback are more than welcome! Let me know if you are
>> interested in coming or unable to make it.
>>
>
> Kristoffer,
>
> Thank you for getting some dates and providing a space for the summit.  I
> know myself and several cluster engineers from Red Hat are definitely
> interested in attending.  The only thing that I might recommend is moving
> the conference one day earlier (change to Wed/Thu instead of Thu/Fri) to
> make it easier for people traveling to/from the conference.

Hi Chris,

Sounds great! Happy to move it to September 6-7 if that works out
better.

Cheers,
Kristoffer

>
> Thanks!
> Chris
>
>
>>
>> Cheers,
>> Kristoffer
>>
>> --
>> // Kristoffer Grönlund
>> // kgronl...@suse.com
>>
>> ___
>> Developers mailing list
>> develop...@clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/developers
>>
> ___
> Developers mailing list
> develop...@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/developers

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: Antw: Colocations and Orders Syntax Changed?

2017-01-31 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>>>> Eric Robinson  wrote on 2017-01-20 at 12:56 in
> message
> 
>
>> Thanks for the input. I usually just do a 'crm config show > 
>> myfile.xml.date_time' and then read it back in if I need to. 
>
> I guess 'crm configure show xml > myfile.xml.date_time', because here I get 
> "ERROR: config: No such command" and no XML... ;-)
>
> Actually I'm using "cibadmin -Q -o configuration", because I think it's 
> faster...

If you use a more recent version of crmsh, "crm config show" will
actually work as well, thanks to some fuzzy command matching ;)

(though to get XML you do need the xml argument still)

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Releasing crmsh version 3.0.0

2017-01-31 Thread Kristoffer Grönlund
Hello everyone!

I'm happy to announce the release of crmsh version 3.0.0 today. The
main reason for the major version bump is because I have merged the
sleha-bootstrap project with crmsh, replacing the cluster
init/add/remove commands with the corresponding commands from
sleha-bootstrap.

At the moment, these commands are highly specific to SLE and openSUSE,
unfortunately. I am working on making them as distribution agnostic as
possible, but would appreciate help from users of other distributions
in making them work as well on those platforms as they do on
SLE/openSUSE.

Briefly, the "cluster init" command configures a complete cluster from
scratch, including optional configuration of fencing via SBD, shared
storage using OCFS2, setting up the Hawk web interface etc.

There are some other changes in this release as well, see the
ChangeLog for the complete list of changes:

* https://github.com/ClusterLabs/crmsh/blob/3.0.0/ChangeLog

The source code can be downloaded from Github:

* https://github.com/ClusterLabs/crmsh/releases/tag/3.0.0

This version of crmsh will be available in openSUSE Tumbleweed as soon
as possible, and packages for several popular Linux distributions are
available from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/ClusterLabs/crmsh/archive/3.0.0.tar.gz
* https://github.com/ClusterLabs/crmsh/archive/3.0.0.zip

As usual, a huge thank you to all contributors and users of crmsh!

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal

2017-01-31 Thread Kristoffer Grönlund
Digimer  writes:

> On 30/01/17 09:23 AM, Kristoffer Grönlund wrote:
>> Hi everyone!
>> 
>> The last time we had an HA summit was in 2015, and the intention then
>> was to have SUSE arrange the next meetup in the following year. We did
>> try to find a date that would be suitable for everyone, but for various
>> reasons there was never a conclusion and 2016 came and went.
>> 
>> Well, I'd like to give it another try this year! This time, I've already
>> got a proposal for a place and date: September 7-8 in Nuremberg, Germany
>> (SUSE main office). I've got the new event area in the SUSE office
>> already reserved for these dates.
>> 
>> My suggestion is to do a two day event similar to the one in Brno, but I
>> am open to any suggestions as to format and content. The main reason for
>> having the event would be for everyone to have a chance to meet and get
>> to know each other, but it's also an opportunity to discuss the future
>> of Clusterlabs and the direction going forward.
>> 
>> Any thoughts or feedback are more than welcome! Let me know if you are
>> interested in coming or unable to make it.
>> 
>> Cheers,
>> Kristoffer
>
> Thank you for starting this back up. I was just thinking about this a
> few days ago.
>
> I could make it, and I would be happy to help organize it however I
> might be able to help.

Hi,

Awesome! I might hold you to that promise :) If nothing else your wiki
has been useful in the past as a place to host the list of attendees and
the agenda.

Another option would be to create a repository in the Clusterlabs github
organization and have people add themselves there via pull requests. I'm
also open to suggestions on that front.

Cheers,
Kristoffer

>
> -- 
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>
> ___
> Developers mailing list
> develop...@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/developers

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] lrmd segfault

2017-01-30 Thread Kristoffer Grönlund
ale...@kurnosov.spb.ru writes:

> [ Unknown signature status ]
>
> Hi All.
>
> We have the heterogeneous corosync/pacemaker cluster of 5 nodes: 3 
> SL7(Scientific linux) and 2 SL6.
> SL7 pacemaker installed from a standard repo (corosync - 2.3.4, pacemaker - 
> 1.1.13-10), SL6 build from sources (same version).
> The cluster not unified, some nodes have RA which other do not have. crmsh 
> used for management.
> SL6 nodes runs surprisingly smoothly, but SL7 steady segfaulting in the 
> exactly same place.
> Here is an example:
>

Just from looking at the core dump, it looks like your processor doesn't
support the SSE extensions used by the newer version of the code. You'll
need to recompile and disable use of those extensions.

It looks like the code is using SSE 4.2, which is relatively new:

https://en.wikipedia.org/wiki/SSE4#SSE4.2
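
A quick way to check whether the host CPU supports SSE 4.2 before
rebuilding (the CFLAGS line is just one possible way to mask it):

```shell
# Check whether the CPU advertises SSE 4.2
if grep -q sse4_2 /proc/cpuinfo 2>/dev/null; then
    echo "SSE4.2 present"
else
    echo "SSE4.2 missing"
fi

# When building for an older CPU, mask the instruction set explicitly, e.g.:
#   CFLAGS="-O2 -mno-sse4.2" ./configure && make
```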

Cheers,
Kristoffer

> Core was generated by `/usr/libexec/pacemaker/lrmd'.
> Program terminated with signal 11, Segmentation fault.
> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> 164 movdqu  (%rdi), %xmm1
> (gdb) bt
> #0  __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
> #1  0x7fed076136dc in crm_str_eq (a=, b=b@entry=0xed7070 
> "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416
> #2  0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at 
> services.c:644
> #3  0x7fed073eac1d in services_action_async (op=0xed58e0, 
> action_callback=) at services.c:625
> #4  0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, 
> rsc=0xed4500) at lrmd.c:1242
> #5  lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308
> #6  lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry= variable: value has been optimized out>) at lrmd.c:1317
> #7  0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, 
> callback=, userdata=) at mainloop.c:107
> #8  0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109
> #9  g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708
> #10 0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, 
> block=block@entry=1, dispatch=dispatch@entry=1, self=) at 
> gmain.c:3779
> #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973
> #12 0x004028ce in main (argc=, argv=0x7ffe9b3b0fd8) at 
> main.c:476
>
> Any help would be appreciated.
>
> --
> Alexey Kurnosov
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] HA/Clusterlabs Summit 2017 Proposal

2017-01-30 Thread Kristoffer Grönlund
Hi everyone!

The last time we had an HA summit was in 2015, and the intention then
was to have SUSE arrange the next meetup in the following year. We did
try to find a date that would be suitable for everyone, but for various
reasons there was never a conclusion and 2016 came and went.

Well, I'd like to give it another try this year! This time, I've already
got a proposal for a place and date: September 7-8 in Nuremberg, Germany
(SUSE main office). I've got the new event area in the SUSE office
already reserved for these dates.

My suggestion is to do a two day event similar to the one in Brno, but I
am open to any suggestions as to format and content. The main reason for
having the event would be for everyone to have a chance to meet and get
to know each other, but it's also an opportunity to discuss the future
of Clusterlabs and the direction going forward.

Any thoughts or feedback are more than welcome! Let me know if you are
interested in coming or unable to make it.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] large cluster with corosync

2017-01-04 Thread Kristoffer Grönlund
915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync notice  [TOTEM ] A new membership 
> (10.5.4.101:964) was formed. Members
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync notice  [MAIN  ] Completed 
> service synchronization, ready to provide service.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE 
> message was thrown away during flush operation.
> Jan 04 10:29:59 [4915] reniar corosync warning [MAIN  ] Corosync main 
> process was not scheduled for 1465.7160 ms (threshold is 800.0000 ms). 
> Consider token timeout increase.
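The "Consider token timeout increase" hint in the log above refers to the totem token timeout in corosync.conf; the scheduling threshold in the warning is derived from it, so raising the token timeout also raises the threshold. A minimal sketch, with an illustrative value rather than a recommendation:

```
# /etc/corosync/corosync.conf (fragment) -- illustrative value only.
totem {
    version: 2
    # Default token timeout on corosync 2.x is 1000 ms. Raising it makes
    # the cluster more tolerant of scheduling delays like the one logged
    # above, at the cost of slower failure detection.
    token: 5000
}
```

The change has to be applied on all nodes and corosync reloaded or restarted for it to take effect.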
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] sbd: Cannot open watchdog device: /dev/watchdog

2017-01-03 Thread Kristoffer Grönlund
Muhammad Sharfuddin  writes:

> Hello,
>
> pacemaker does not start on this machine (Fujitsu PRIMERGY RX2540 M1) 
> with the following error in the logs:
>
> sbd: [13236]: ERROR: Cannot open watchdog device: /dev/watchdog: No such 
> file or directory

Does /dev/watchdog exist? If so, it may be opened by a different
process. If you have more than one watchdog device, you can configure
sbd to use a different device using the -w option.
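A minimal sketch of that suggestion, assuming a SUSE-style sbd package where /etc/sysconfig/sbd supplies the device list and SBD_OPTS is appended to the daemon command line (the watchdog device name below is an example, not taken from this system):

```
# /etc/sysconfig/sbd (fragment) -- a sketch, adjust for your setup.
SBD_DEVICE="/dev/disk/by-id/wwn-0x60e00d28002825b5-part1"
# -W enables watchdog use; -w selects an explicit device when
# /dev/watchdog is missing or already held by another process.
SBD_OPTS="-W -w /dev/watchdog1"
```

If no hardware watchdog driver is loaded at all (`lsmod` in the original mail shows iTCO_wdt but no /dev/watchdog node), loading a suitable module, or `softdog` as a last resort for testing, should create the device.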

Cheers,
Kristoffer

>
> System Info:
>
> sbd-1.2.1-8.7.x86_64  corosync-2.3.3-7.12.x86_64 pacemaker-1.1.12-7.1.x86_64
>
> lsmod | egrep "(wd|dog)"
> iTCO_wdt   13480  0
> iTCO_vendor_support13718  1 iTCO_wdt
>
> dmidecode | grep -A3 '^System Information'
> System Information
>  Manufacturer: FUJITSU
>  Product Name: PRIMERGY RX2540 M1
>  Version: GS01
>
> logs:
>
> 2017-01-03T21:00:26.890503+05:00 prdnode1 sbd: [13235]: info: Watchdog 
> enabled.
> 2017-01-03T21:00:26.899817+05:00 prdnode1 sbd: [13238]: info: Servant 
> starting for device 
> /dev/disk/by-id/wwn-0x60e00d28002825b5-part1
> 2017-01-03T21:00:26.900175+05:00 prdnode1 sbd: [13238]: info: Device 
> /dev/disk/by-id/wwn-0x60e00d28002825b5-part1 uuid: 
> fda42d64-ca74-4578-90c8-976ea7ff5f6e
> 2017-01-03T21:00:26.900418+05:00 prdnode1 sbd: [13239]: info: Monitoring 
> Pacemaker health
> 2017-01-03T21:00:27.901022+05:00 prdnode1 sbd: [13236]: ERROR: Cannot 
> open watchdog device: /dev/watchdog: No such file or directory
> 2017-01-03T21:00:27.912098+05:00 prdnode1 sbd: [13236]: WARN: Servant 
> for pcmk (pid: 13239) has terminated
> 2017-01-03T21:00:27.941950+05:00 prdnode1 sbd: [13236]: WARN: Servant 
> for /dev/disk/by-id/wwn-0x60e00d28002825b5-part1 (pid: 
> 13238) has terminated
> 2017-01-03T21:00:27.949401+05:00 prdnode1 sbd.sh[13231]: sbd failed; 
> please check the logs.
> 2017-01-03T21:00:27.992606+05:00 prdnode1 sbd.sh[13231]: SBD failed to 
> start; aborting.
> 2017-01-03T21:00:27.993061+05:00 prdnode1 systemd[1]: sbd.service: 
> control process exited, code=exited status=1
> 2017-01-03T21:00:27.993339+05:00 prdnode1 systemd[1]: Failed to start 
> Shared-storage based fencing daemon.
> 2017-01-03T21:00:27.993610+05:00 prdnode1 systemd[1]: Dependency failed 
> for Pacemaker High Availability Cluster Manager.
> 2017-01-03T21:00:27.994054+05:00 prdnode1 systemd[1]: Unit sbd.service 
> entered failed state.
>
> please help.
>
> -- 
> Regards,
>
> Muhammad Sharfuddin
> <http://www.nds.com.pk>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Antw: Re: [ClusterLabs Developers] announcement: schedule for resource-agents release 3.9.8

2017-01-03 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>>>> Kristoffer Grönlund  wrote on 03.01.2017 at 11:55 in
> message <878tqsjtv4@suse.com>:
>> Oyvind Albrigtsen  writes:
>> 
>>> Hi,
>>>
>>> This is a tentative schedule for resource-agents v3.9.8:
>>> 3.9.8-rc1: January 10.
>>> 3.9.8: January 31.
>>>
>>> I modified the corresponding milestones at
>>> https://github.com/ClusterLabs/resource-agents/milestones 
>>>
>>> If there's anything you think should be part of the release
>>> please open an issue, a pull request, or a bugzilla, as you see
>>> fit.
>>>
>> 
>> Hi Oyvind,
>> 
>> I think it's high time for a new release! My only suggestion would be to
>> call it 4.0.0, since there are much bigger changes from 3.9.7 than an
>> update to the patch release number would suggest.
>
> I don't know the semantics of everybody's release numbering, but for a
> three-level number a "compatibility"."feature"."bug-fix" pattern wouldn't be
> bad; that is only change the first number if there are incompatible changes
> (things may not work after ugrading from the previous level). Change the 
> second
> number whenever there are new features (the users may want to read about), and
> change only the last number if just bugs were fixed (without affecting the
> interfaces).
> And: There's nothing wrong with "10" following "9" ;-)
>
> And if you are just happy to throw out new versions (whatever they bring),
> call it "2017-01" ;-)

There was a recent talk by Rich Hickey on this topic; his way of putting
it was that versions basically boil down to X.Y, where Y means "don't
care, just upgrade" and X means "anything can have changed, be very
careful" :)

For resource-agents and the releases historically, I personally think
having a single number that just increments each release makes as much
sense as anything else, at least in my experience there is just a single
development track where bug fixes, new features and backwards
incompatible changes mix freely, even if we do try to keep the
incompatible changes as rare as possible.

But keeping the x.y.z triplet is easier to maintain in relation to the
older releases.
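The "compatibility"."feature"."bug-fix" ordering discussed above can be sketched in a few lines. The names and upgrade policy here are purely illustrative, not anything resource-agents actually implements:

```python
# Sketch of the x.y.z triplet ordering from the discussion above.
def parse(v):
    """Split a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def compatible_upgrade(old, new):
    """Under the scheme Ulrich describes, an upgrade is 'safe' when the
    first (compatibility) component is unchanged and the version does
    not go backwards."""
    o, n = parse(old), parse(new)
    return o[0] == n[0] and n >= o

print(compatible_upgrade("3.9.7", "3.9.8"))   # bug-fix bump -> True
print(compatible_upgrade("3.9.7", "4.0.0"))   # compatibility bump -> False
print(parse("3.10.0") > parse("3.9.8"))       # "10 follows 9" -> True
```

Comparing tuples of ints rather than raw strings is what makes "10 follows 9" work; string comparison would order "3.10.0" before "3.9.8".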

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>> 
>> Cheers,
>> Kristoffer
>> 
>>> If there's anything that hasn't received due attention, please
>>> let us know.
>>>
>>> Finally, if you can help with resolving issues consider yourself
>>> invited to do so. There are currently 49 issues and 38 pull
>>> requests still open.
>>>
>>>
>>> Cheers,
>>> Oyvind Albrigtsen
>>>
>>> ___
>>> Developers mailing list
>>> develop...@clusterlabs.org 
>>> http://lists.clusterlabs.org/mailman/listinfo/developers 
>>>
>> 
>> -- 
>> // Kristoffer Grönlund
>> // kgronl...@suse.com 
>> 
>
>
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] [ClusterLabs Developers] announcement: schedule for resource-agents release 3.9.8

2017-01-03 Thread Kristoffer Grönlund
Oyvind Albrigtsen  writes:

> Hi,
>
> This is a tentative schedule for resource-agents v3.9.8:
> 3.9.8-rc1: January 10.
> 3.9.8: January 31.
>
> I modified the corresponding milestones at
> https://github.com/ClusterLabs/resource-agents/milestones
>
> If there's anything you think should be part of the release
> please open an issue, a pull request, or a bugzilla, as you see
> fit.
>

Hi Oyvind,

I think it's high time for a new release! My only suggestion would be to
call it 4.0.0, since there are much bigger changes from 3.9.7 than an
update to the patch release number would suggest.

Cheers,
Kristoffer

> If there's anything that hasn't received due attention, please
> let us know.
>
> Finally, if you can help with resolving issues consider yourself
> invited to do so. There are currently 49 issues and 38 pull
> requests still open.
>
>
> Cheers,
> Oyvind Albrigtsen
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com


