Re: [ClusterLabs] Adding HAProxy as a Resource
On 2019-07-11 09:31, Somanath Jeeva wrote:
> Hi All,
>
> I am using HAProxy in my environment, which I plan to add to Pacemaker
> as a resource. I see no RA available for it among the resource agents.
> Should I write a new RA, or is there a way to add it to Pacemaker as a
> systemd service?
>
> With Regards
> Somanath Thilak J

Hello,

haproxy works well as a plain systemd service, so you can add it as
systemd:haproxy - that is, instead of an ocf: prefix, just put systemd:.

If you want the cluster to manage multiple, differently configured
instances of haproxy, you might have to either create a custom systemd
service unit for each one, or create an agent with parameters.

Cheers,
Kristoffer

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
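A minimal sketch of that suggestion using crmsh (the operation intervals
and timeouts here are illustrative assumptions, not from the original
mail):

```shell
# Add haproxy as a cluster resource via the systemd resource class.
# Interval/timeout values are illustrative; adjust for your environment.
crm configure primitive haproxy systemd:haproxy \
    op monitor interval=10s timeout=60s \
    op start timeout=60s \
    op stop timeout=60s
```

The same idea works with pcs (`pcs resource create haproxy systemd:haproxy`).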
Re: [ClusterLabs] Why do clusters have a name?
On Wed, 2019-03-27 at 12:25 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 27 Mar 2019 10:20:21 +0100 Kristoffer Grönlund wrote:
> > On Wed, 2019-03-27 at 10:13 +0100, Jehan-Guillaume de Rorthais wrote:
> > > On Wed, 27 Mar 2019 09:59:16 +0100 Kristoffer Grönlund wrote:
> > > > On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát wrote:
> > > > > On 26. 03. 19 21:12, Brian Reichert wrote:
> > > > > > This will sound like a dumb question:
> > > > > >
> > > > > > The manpage for pcs(8) implies that to set up a cluster, one
> > > > > > needs to provide a name.
> > > > > >
> > > > > > Why do clusters have names?
> > > > > >
> > > > > > Is there a use case wherein there would be multiple clusters
> > > > > > visible in an administrative UI, such that they'd need to be
> > > > > > differentiated?
> > > > >
> > > > > For example, the web UI of pcs has a page with multiple
> > > > > clusters.
> > > >
> > > > We use cluster names and rules to apply the exact same CIB to
> > > > multiple clusters, particularly when configuring geo clusters.
> > >
> > > I'm not sure I understand. Is it possible to have multiple
> > > Pacemaker daemon instances on the same servers?
> > >
> > > Or do you mean it is possible to have multiple namespaces in which
> > > resources are isolated, with one Pacemaker daemon managing them?
> >
> > I am not sure what you mean by the second, but I am fairly sure I
> > don't mean either of those :) I'm talking about having multiple
> > actual, distinct clusters
>
> Distinct clusters of Pacemaker/corosync daemons on the same servers,
> or distinct clusters of servers?

Distinct clusters of servers:

Cluster "Tokyo" consisting of nodes A, B, C
Cluster "Stockholm" consisting of nodes D, E, F
Cluster "New York" consisting of nodes G, H, I

All with the same CIB XML document.

Using tickets, resources can then be moved from one cluster to the
other, or cloned across multiple clusters. A cluster of clusters, if you
will.

Cheers,
Kristoffer

> > and sharing the same configuration across all of them,
>
> The same configuration as in the same file, or the same content across
> different files?
>
> Sorry for being bold... I just don't get it :/
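As an illustrative sketch of how one shared CIB can still behave
differently per cluster, a rule can test Pacemaker's built-in
#cluster-name node attribute (the resource and constraint names here are
hypothetical):

```shell
# Run the hypothetical "web" resource only in the cluster named Tokyo;
# the -inf score forbids it on nodes of every other cluster.
crm configure location web-only-in-tokyo web \
    rule -inf: #cluster-name ne Tokyo
```

Since all three clusters load the same CIB, each one evaluates the rule
against its own cluster name and only "Tokyo" actually runs the resource.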
Re: [ClusterLabs] Why do clusters have a name?
On Wed, 2019-03-27 at 10:13 +0100, Jehan-Guillaume de Rorthais wrote:
> On Wed, 27 Mar 2019 09:59:16 +0100 Kristoffer Grönlund wrote:
> > On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát wrote:
> > > On 26. 03. 19 21:12, Brian Reichert wrote:
> > > > This will sound like a dumb question:
> > > >
> > > > The manpage for pcs(8) implies that to set up a cluster, one
> > > > needs to provide a name.
> > > >
> > > > Why do clusters have names?
> > > >
> > > > Is there a use case wherein there would be multiple clusters
> > > > visible in an administrative UI, such that they'd need to be
> > > > differentiated?
> > >
> > > For example, the web UI of pcs has a page with multiple clusters.
> >
> > We use cluster names and rules to apply the exact same CIB to
> > multiple clusters, particularly when configuring geo clusters.
>
> I'm not sure I understand. Is it possible to have multiple Pacemaker
> daemon instances on the same servers?
>
> Or do you mean it is possible to have multiple namespaces in which
> resources are isolated, with one Pacemaker daemon managing them?

I am not sure what you mean by the second, but I am fairly sure I don't
mean either of those :) I'm talking about having multiple actual,
distinct clusters and sharing the same configuration across all of them,
using rules to separate the cases where the configurations differ.

Cheers,
Kristoffer
Re: [ClusterLabs] Why do clusters have a name?
On Wed, 2019-03-27 at 08:27 +0100, Ivan Devát wrote:
> On 26. 03. 19 21:12, Brian Reichert wrote:
> > This will sound like a dumb question:
> >
> > The manpage for pcs(8) implies that to set up a cluster, one needs
> > to provide a name.
> >
> > Why do clusters have names?
> >
> > Is there a use case wherein there would be multiple clusters visible
> > in an administrative UI, such that they'd need to be differentiated?
>
> For example, the web UI of pcs has a page with multiple clusters.

We use cluster names and rules to apply the exact same CIB to multiple
clusters, particularly when configuring geo clusters.

Cheers,
Kristoffer

> Ivan
Re: [ClusterLabs] Antw: Announcing hawk-apiserver, now in ClusterLabs
Ulrich Windl writes:
> Hello!
>
> I'd like to comment as an "old" SuSE customer: I'm amazed that lighttpd
> is dropped in favor of some new Go application. SuSE now has a base
> system that needs (correct me if I'm wrong): shell, perl, python, java,
> go, ruby, ...?

Oh, that list is a lot longer, and this is not the first Go project to
make it into SLE.

> Maybe each programmer has his favorite. Personally, I have also learned
> quite a lot of languages (and even editors), but most being equivalent,
> you have to decide whether it makes sense to start using yet another
> language (Go in this case). Especially, I'm afraid of single-vendor
> languages...

TBH I am more sceptical about languages designed by committee ;)

Cheers,
Kristoffer

> Regards,
> Ulrich
>
> > > > Kristoffer Grönlund wrote on 2019-02-12 at 20:00 in message
> > > > <87mun0g7c9@suse.com>:
> > Hello everyone,
> >
> > I just wanted to send out an email about the hawk-apiserver project
> > which was moved into the ClusterLabs organization on Github today.
> > This project is used by us at SUSE for Hawk in our latest releases
> > already, and is also available in openSUSE for use with Hawk.
> > However, I am hoping that it can prove to be useful more generally,
> > not just for Hawk but for other projects that may want to integrate
> > with Pacemaker using the C API, and also to show what is possible
> > when using the API.
> >
> > To describe the hawk-apiserver briefly, I'll start by describing the
> > use case it was designed to cover: Previously, we were using lighttpd
> > as the web server for Hawk (a Ruby on Rails application), but a while
> > ago the maintainers of lighttpd decided that since Hawk was the only
> > user of this project in SLE, they would like to remove it from the
> > next release. This left Apache as the web server available to us,
> > which has some interesting issues for Hawk: mainly, we expect people
> > to run apache as a resource in the cluster, which might result in a
> > confusing mix of processes on the systems.
> >
> > At the same time, I had started looking at Go and discovered how easy
> > it was to write a basic proxying web server in Go. So, as an
> > experiment I decided to see if I could replace the use of lighttpd
> > with a custom web server written in Go. Turns out the answer was yes!
> > Once we had our own web server, I discovered new things we could do
> > with it. So here are some of the other unique features in
> > hawk-apiserver now:
> >
> > * SSL certificate termination, and automatic detection and
> >   redirection from HTTP to HTTPS *on the same port*: Hawk runs on
> >   port 7630, and if someone accesses that port via HTTP, they will
> >   get a redirect to the same port but on HTTPS. It's magic.
> >
> > * Persistent connection to Pacemaker via the C API, enabling instant
> >   change notification to the web frontend. From the point of view of
> >   the web frontend, this is a long-lived connection which completes
> >   when something changes in the CIB. On the backend side, it uses
> >   goroutines to enable thousands of such long-lived connections with
> >   minimal overhead.
> >
> > * Optional exposure of the CIB as a REST API. Right now this is
> >   somewhat primitive, but we are working on making this a more fully
> >   featured API.
> >
> > * Configurable static file serving routes (serve images on /img from
> >   /srv/http/images, for example).
> >
> > * Configurable proxying of subroutes to other web applications.
> >
> > The URL to the project is
> > https://github.com/ClusterLabs/hawk-apiserver, I hope you will find
> > it useful. Comments, issues and contributions are of course more than
> > welcome.
> >
> > One final note: hawk-apiserver uses a project called go-pacemaker
> > located at https://github.com/krig/go-pacemaker. I intend to transfer
> > this to ClusterLabs as well. go-pacemaker is still somewhat rough
> > around the edges, and our plan is to work on the C API of pacemaker
> > to make using and exposing it via Go easier, as well as moving
> > functionality from crm_mon into the C API so that status information
> > can be made available in a more convenient format via the API as
> > well.
> >
> > --
> > // Kristoffer Grönlund
> > // kgronl...@suse.com
[ClusterLabs] Announcing hawk-apiserver, now in ClusterLabs
Hello everyone,

I just wanted to send out an email about the hawk-apiserver project
which was moved into the ClusterLabs organization on Github today. This
project is used by us at SUSE for Hawk in our latest releases already,
and is also available in openSUSE for use with Hawk. However, I am
hoping that it can prove to be useful more generally, not just for Hawk
but for other projects that may want to integrate with Pacemaker using
the C API, and also to show what is possible when using the API.

To describe the hawk-apiserver briefly, I'll start by describing the use
case it was designed to cover: Previously, we were using lighttpd as the
web server for Hawk (a Ruby on Rails application), but a while ago the
maintainers of lighttpd decided that since Hawk was the only user of
this project in SLE, they would like to remove it from the next release.
This left Apache as the web server available to us, which has some
interesting issues for Hawk: mainly, we expect people to run apache as a
resource in the cluster, which might result in a confusing mix of
processes on the systems.

At the same time, I had started looking at Go and discovered how easy it
was to write a basic proxying web server in Go. So, as an experiment I
decided to see if I could replace the use of lighttpd with a custom web
server written in Go. Turns out the answer was yes! Once we had our own
web server, I discovered new things we could do with it. So here are
some of the other unique features in hawk-apiserver now:

* SSL certificate termination, and automatic detection and redirection
  from HTTP to HTTPS *on the same port*: Hawk runs on port 7630, and if
  someone accesses that port via HTTP, they will get a redirect to the
  same port but on HTTPS. It's magic.

* Persistent connection to Pacemaker via the C API, enabling instant
  change notification to the web frontend. From the point of view of the
  web frontend, this is a long-lived connection which completes when
  something changes in the CIB. On the backend side, it uses goroutines
  to enable thousands of such long-lived connections with minimal
  overhead.

* Optional exposure of the CIB as a REST API. Right now this is somewhat
  primitive, but we are working on making this a more fully featured
  API.

* Configurable static file serving routes (serve images on /img from
  /srv/http/images, for example).

* Configurable proxying of subroutes to other web applications.

The URL to the project is https://github.com/ClusterLabs/hawk-apiserver,
I hope you will find it useful. Comments, issues and contributions are
of course more than welcome.

One final note: hawk-apiserver uses a project called go-pacemaker
located at https://github.com/krig/go-pacemaker. I intend to transfer
this to ClusterLabs as well. go-pacemaker is still somewhat rough around
the edges, and our plan is to work on the C API of pacemaker to make
using and exposing it via Go easier, as well as moving functionality
from crm_mon into the C API so that status information can be made
available in a more convenient format via the API as well.

--
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
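As a quick way to observe the same-port HTTP-to-HTTPS redirect described
above (the hostname "node1" is a placeholder for a cluster node running
hawk-apiserver):

```shell
# Plain HTTP on the HTTPS port should come back as a redirect to
# https://node1:7630/ according to the description above.
curl -si http://node1:7630/ | head -n 3
```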
Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools
On Tue, 2019-01-08 at 10:07 -0600, Ken Gaillot wrote:
> On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
> > On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> >
> > Having all the tools able to produce XML output like cibadmin and
> > crm_mon would be good in general, I think. So that seems like a good
> > proposal to me.
> >
> > In the case of an error, at least in my experience just getting a
> > return code and stderr output is enough to make sense of it -
> > getting XML on stderr in the case of an error wouldn't seem like
> > something that would add much value to me.
>
> There are two benefits: it can give extended information (such as the
> text string that corresponds to a numeric exit status), and because it
> would also be used by any future REST API (which won't have stderr),
> API/CLI output could be parsed identically.

Hm, am I understanding you correctly? My sort-of vision for implementing
a REST API has been to move all of the core functionality out of the
command line tools and into the C libraries (I think we discussed
something like a libpacemakerclient before) - the idea being that the
XML output would be generated at that level? If so, that is something I
am all for :)

Right now, we are experimenting with a REST API based on taking what we
use in Hawk and moving that into an API server written in Go, and just
calling crm_mon --as-xml to get status information that can be exposed
via the API. Having that available in C directly and not having to call
out to command line tools would be great and a lot cleaner:

https://github.com/krig/hawk-apiserver
https://github.com/hawk-ui/hawk-web-client

Cheers,
Kristoffer
Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools
On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> There has been some discussion in the past about generating more
> machine-friendly output from pacemaker CLI tools for scripting and
> high-level interfaces, as well as possibly adding a pacemaker REST
> API.
>
> I've filed an RFE BZ
>
> https://bugs.clusterlabs.org/show_bug.cgi?id=5376
>
> to design an output interface that would suit these goals. An actual
> REST API is not planned at this point, but this would provide a key
> component of any future implementation.

Having all the tools able to produce XML output like cibadmin and
crm_mon would be good in general, I think. So that seems like a good
proposal to me.

In the case of an error, at least in my experience just getting a return
code and stderr output is enough to make sense of it - getting XML on
stderr in the case of an error wouldn't seem like something that would
add much value to me.

Cheers,
Kristoffer

> The question is what machine-friendly output should look like. The
> basic idea is: for commands like "crm_resource --constraints" or
> "stonith_admin --history", what output format would be most useful for
> a GUI or other program to parse?
>
> Suggestions welcome here and/or on the bz ...
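For reference, the two XML-producing tools mentioned above can be
invoked like this (a sketch; both print XML on stdout):

```shell
cibadmin --query      # dump the full CIB as XML
crm_mon --as-xml      # one-shot cluster status as XML
```

The proposal in the BZ is essentially to extend this kind of output to
the rest of the CLI tools.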
Re: [ClusterLabs] Fwd: After failover Pacemaker moves resource back when dead node become up
On Fri, 2019-01-04 at 15:27 +0300, Özkan Göksu wrote:
> Hello.
>
> I'm using Pacemaker & Corosync for my cluster. When a node dies,
> Pacemaker moves my resources to another online node. Everything is ok
> here. But when the dead node comes back, Pacemaker moves the resource
> back. I don't have any "location" line in my config, and I also tried
> the "unmove" command, but nothing changed.
>
> The corosync & pacemaker services are enabled and start at boot. If I
> start them manually, the resources do not move back.
>
> How can I stop the resource from moving back if it is running
> normally?

Configuring a positive resource-stickiness should take care of this for
you, so there has to be something else going on. Do you get any strange
errors reported for the resources on the second node? Check if there is
any failcount for the resources on that node using
"crm_mon --failcounts". Other than that, looking in the logs for
anything unusual would be my next move.

Another thing that stands out to me is that you configure a monitor
action for the gui resource, but you don't set a timeout. I'm not sure
what the default is there, so I would configure a timeout explicitly.

Finally, it looks like you have a 2-node cluster with STONITH disabled.
That's not going to work. You need some kind of stonith, or things will
behave badly. So that could be why you're seeing strange behavior.

Cheers,
Kristoffer

> *crm configure sh*
>
> node 1: DEV1
> node 2: DEV2
> primitive poolip IPaddr2 \
>     params ip=10.1.60.33 nic=enp2s0f0 cidr_netmask=24 \
>     meta migration-threshold=2 target-role=Started \
>     op monitor interval=20 timeout=20 on-fail=restart
> primitive gui systemd:gui \
>     op monitor interval=20s \
>     meta target-role=Started
> primitive gui-ip IPaddr2 \
>     params ip=10.1.60.35 nic=enp2s0f0 cidr_netmask=24 \
>     meta migration-threshold=2 target-role=Started \
>     op monitor interval=20 timeout=20 on-fail=restart
> colocation cluster-gui inf: gui gui-ip
> order gui-after-ip Mandatory: gui-ip gui
> property cib-bootstrap-options: \
>     have-watchdog=false \
>     dc-version=2.0.0-1-8cf3fe749e \
>     cluster-infrastructure=corosync \
>     cluster-name=mycluster \
>     stonith-enabled=false \
>     no-quorum-policy=ignore \
>     last-lrm-refresh=1545920437
> rsc_defaults rsc-options: \
>     migration-threshold=10 \
>     resource-stickiness=100
>
> *pcs resource defaults*
>
> migration-threshold=10
> resource-stickiness=100
>
> *pcs resource show gui*
>
> Resource: gui (class=systemd type=gui)
>   Meta Attrs: target-role=Started
>   Operations: monitor interval=20s (gui-monitor-20s)
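A sketch of the two concrete suggestions in the reply above — an
explicit monitor timeout and enabling fencing. The timeout value is an
illustrative assumption, and stonith-enabled=true only helps once a real
fencing device is configured:

```shell
# Give the gui monitor an explicit timeout (value is an assumption)
crm configure primitive gui systemd:gui \
    meta target-role=Started \
    op monitor interval=20s timeout=60s

# STONITH must be enabled and backed by an actual fencing device
crm configure property stonith-enabled=true
```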
Re: [ClusterLabs] Coming in Pacemaker 2.0.1 / 1.1.20: improved fencing history
On Tue, 2018-12-11 at 14:48 -0600, Ken Gaillot wrote:
> Pacemaker has long had the stonith_admin --history option to show a
> history of past fencing actions that the cluster has carried out.
> However, this list included only events since the node it was run on
> had joined the cluster, and it just wasn't very convenient.
>
> In the upcoming release, the cluster keeps the fence history
> synchronized across all nodes, so you get the same answer no matter
> which node you query.

This is a great feature! On a related note, it would be amazing to have
the complete transition history synchronized across all nodes as well.

Cheers,
Kristoffer
Re: [ClusterLabs] Announcing Anvil! m2 v2.0.7
On Tue, 2018-11-20 at 02:25 -0500, Digimer wrote:
> * https://github.com/ClusterLabs/striker/releases/tag/v2.0.7
>
> This is the first release since March 2018. No critical issues are
> known or were fixed. Users are advised to upgrade.

Congratulations!

Cheers,
Kristoffer

> Main bugs fixed:
>
> * Fixed install issues for Windows 10 and 2016 clients.
> * Improved duplicate record detection and cleanup in scan-clustat and
>   scan-storcli.
> * Disabled the detection and recovery of 'paused' state servers (it
>   caused more trouble than it solved).
>
> Notable new features:
>
> * Improved the server boot logic to choose the node with the most
>   running servers, all else being equal.
> * Updated UPS power transfer reason alerts from "warning" to "notice"
>   level alerts.
> * Added support for EL 6.10.
>
> Users can upgrade using 'striker-update' from their Striker
> dashboards.
>
>     /sbin/striker/striker-update --local
>     /sbin/striker/striker-update --anvil all
>
> Please feel free to report any issues in the Striker github
> repository.
Re: [ClusterLabs] resource-agents v4.2.0
On Wed, 2018-10-24 at 10:21 +0200, Oyvind Albrigtsen wrote:
> ClusterLabs is happy to announce resource-agents v4.2.0.
> Source code is available at:
> https://github.com/ClusterLabs/resource-agents/releases/tag/v4.2.0
[snip]
> - ocf.py: new Python library and dev guide

I just wanted to highlight the Python library, since I think it can make
agent development a lot easier in the future, especially as we expand
the library with more utilities that are commonly needed when writing
agents.

Any agents written in Python should (for now, at least) be compatible
with both Python 2.7+ and Python 3.3+. We still need to expand the CI to
actually verify that agents do support these versions, so anyone who
would like to help out improving the test setup is more than welcome to
do so :)

The biggest example of an agent using it that we have now is the
azure-events agent [1], so I would recommend anyone interested in
working on new agents to take a look at that. For a more compact
example, I wrote a version of the Dummy resource agent using the ocf.py
library and put it in a gist [2], and then there is a small example in
the document describing the library and how to use it [3].

[1]: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/azure-events.in
[2]: https://gist.github.com/krig/6676d0ae065fd852fac8b445410e1c95
[3]: https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/writing-python-agents.md

Cheers,
Kristoffer
Re: [ClusterLabs] resource-agents v4.2.0 rc1
On Fri, 2018-10-19 at 10:55 +0200, Oyvind Albrigtsen wrote:
> On 18/10/18 19:43 +0200, Valentin Vidic wrote:
> > On Wed, Oct 17, 2018 at 12:03:18PM +0200, Oyvind Albrigtsen wrote:
> > > - apache: retry PID check.
> >
> > I noticed that the ocft test started failing for apache in this
> > version. Not sure if the test is broken or the agent. Can you check
> > if the test still works for you? Restoring the previous version of
> > the agent fixes the problem for me.
>
> It seems to work fine for me, except that I had to change the name
> from apache2 to httpd (which it's called on RHEL and Fedora) in the
> ocft config, so I think we need some additional logic for that.

I wonder if perhaps there was a configuration change as well, since the
return code seems to be configuration related. Maybe something changed
in the build scripts that moved something around? Wild guess, but...

Cheers,
Kristoffer

> > # ocft test -v apache
> > Initializing 'apache' ...
> > Done.
> >
> > Starting 'apache' case 0 'check base env':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 1 'check base env: set non-existing
> > OCF_RESKEY_statusurl':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 2 'check base env: set non-existing
> > OCF_RESKEY_configfile':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 3 'normal start':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 4 'normal stop':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 5 'double start':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 6 'double stop':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 7 'running monitor':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 8 'not running monitor':
> > ERROR: './apache monitor' failed, the return code is 2.
> > Starting 'apache' case 9 'unimplemented command':
> > ERROR: './apache monitor' failed, the return code is 2.
> >
> > --
> > Valentin
Re: [ClusterLabs] crm resource stop VirtualDomain - how to know when/if VirtualDomain is really stopped ?
On Thu, 2018-10-11 at 13:59 +0200, Lentes, Bernd wrote:
> Hi,
>
> I'm trying to write a script which shuts down my VirtualDomains in the
> night for a short period to take a clean snapshot with libvirt. To
> shut them down I can use "crm resource stop VirtualDomain".
>
> But when I do a "crm resource stop VirtualDomain" in my script, the
> command returns immediately. How can I know if my VirtualDomains are
> really stopped, given that the shutdown may take up to several
> minutes?
>
> I know I could do something with a loop and "crm resource status" and
> grepping for e.g. "stopped", but I would prefer a cleaner solution.
>
> Any ideas?

You should be able to pass -w to crm:

    crm -w resource stop VirtualDomain

That should wait until the policy engine settles down again.

Cheers,
Kristoffer

> Thanks.
>
> Bernd
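In script form, the suggestion above could look like this (the snapshot
step is a placeholder for the poster's libvirt commands):

```shell
# -w makes crm wait until the cluster transition settles, so each
# command only returns once the stop/start has actually completed.
crm -w resource stop VirtualDomain

# ... take the clean libvirt snapshot here ...

crm -w resource start VirtualDomain
```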
Re: [ClusterLabs] Antw: Re: meatware stonith
On Thu, 2018-09-27 at 02:49 -0400, Digimer wrote:
> On 2018-09-27 01:54 AM, Ulrich Windl wrote:
> > > Digimer wrote on 2018-09-26 at 18:29 in message
> > > <1c70b5e2-ea8e-8cbe-3d83-e207ca47b...@alteeve.ca>:
> > > On 2018-09-26 11:11 AM, Patrick Whitney wrote:
> > > > Hey everyone,
> > > >
> > > > I'm doing some pacemaker/corosync/dlm/clvm testing. I'm without
> > > > a power fencing solution at the moment, so I wanted to utilize
> > > > meatware, but it doesn't show when I list available stonith
> > > > devices (pcs stonith list).
> > > >
> > > > I do seem to have it on the system, as cluster-glue is
> > > > installed, and I see meatware.so and meatclient on the system,
> > > > and I also see meatware listed when running the command
> > > > 'stonith -L'.
> > > >
> > > > Can anyone guide me as to how to create a stonith meatware
> > > > resource using pcs?
> > > >
> > > > Best,
> > > > -Pat
> > >
> > > The "fence_manual" agent was removed after EL5 days, a long time
> > > ago, because it so often led to split-brains because of misuse.
> > > Manual fencing is NOT recommended.
> > >
> > > There are new options, like SBD (storage-based death) if you have
> > > a watchdog timer.
> >
> > And even if you do not ;-)
>
> I've not used SBD. How, without a watchdog timer, can you be sure the
> target node is dead?

You can't. You can use the Linux softdog module, though, but since it is
a pure software solution it is limited and not ideal.

Cheers,
Kristoffer
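As a sketch of the softdog route mentioned above (file paths and the SBD
configuration location vary by distribution; treat these as
assumptions):

```shell
# Load the software watchdog now, and have it load again at boot
modprobe softdog
echo softdog > /etc/modules-load.d/softdog.conf

# Then point SBD at the resulting watchdog device, e.g. in
# /etc/sysconfig/sbd (location is distribution-specific):
#   SBD_WATCHDOG_DEV=/dev/watchdog
```

As noted in the reply, softdog is purely software: if the kernel itself
is wedged, the watchdog may never fire, which is why a hardware watchdog
is preferred.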
Re: [ClusterLabs] Q: Reusing date specs in crm shell
On Tue, 2018-09-11 at 13:52 +0200, Ulrich Windl wrote:
> Hi!
>
> I have a set of resources with almost identical rules, one part being
> a date spec. Currently I'm using two different date specs in those
> rules. However, I have repeated the date spec in every rule.
> Foreseeing that I might change those one day, I wonder whether it's
> possible in crm shell to define a date spec once (outside of any
> resource, for symmetry) and reference that date spec inside a rule.
> OK, time for an example:
>
> meta 1: ...default settings... \
> meta 2: rule 0: date spec hours=7-18 weekdays=1-5 ...override
> settings outside prime time...
>
> In the crm manual page the reference examples use dummy primitives.

I wonder if this could be done with id-based references, but it's not
something I've actually experimented with. Not a great answer, I know...

> Regards,
> Ulrich

--
Cheers,
Kristoffer
Re: [ClusterLabs] Q: ordering for a monitoring op only?
On Mon, 2018-08-20 at 10:51 +0200, Ulrich Windl wrote: > Hi! > > I wonder whether it's possible to run a monitoring op only if some > specific resource is up. > Background: We have some resource that runs fine without NFS, but the > start, stop and monitor operations will just hang if NFS is down. In > effect the monitor operation will time out, the cluster will try to > recover, calling the stop operation, which in turn will time out, > making things worse (i.e.: causing a node fence). > > So my idea was to pause the monitoring operation while NFS is down > (NFS itself is controlled by the cluster and should recover "rather > soon" TM). > > Is that possible? It would be a lot better to fix the problem in the RA which causes it to fail when NFS is down, I would think? > And before you ask: No, I have not written that RA that has the > problem; a multi-million-dollar company wrote it. (Years before, I had > written a monitor for HP-UX's cluster that did not have this problem, > even though the configuration files were read from NFS. It's not > magic: just periodically copy them to shared memory, and read the > config from shared memory.) > > Regards, > Ulrich -- Cheers, Kristoffer
Re: [ClusterLabs] crm --version shows "cam dev"
On Wed, 2018-07-04 at 17:52 +0200, Salvatore D'angelo wrote: > Hi, > > With crmsh 2.2.0 the command: > crm --version > works fine. I downloaded 3.0.1 and it shows: > crm dev > > I know this is not a big issue but I just wanted to verify I > installed the correct version of crmsh. > It's probably right, but can you describe in more detail from where you downloaded and how you installed it? Cheers, Kristoffer
Re: [ClusterLabs] difference between external/ipmi and fence_ipmilan
"Stefan K" writes: > OK I see, but it would be good if somebody marked one of these as deprecated and > then deleted it, so that no one gets confused about these. > The external/* agents are not deprecated, though. Future agents will be implemented in the fence-agents framework, but the existing agents are still being used (not by RH, but by SUSE at least). Cheers, Kristoffer > best regards > Stefan > >> Sent: Tuesday, 26 June 2018 at 18:26 >> From: "Ken Gaillot" >> To: "Cluster Labs - All topics related to open-source clustering welcomed" >> >> Subject: Re: [ClusterLabs] difference between external/ipmi and fence_ipmilan >> >> On Tue, 2018-06-26 at 12:00 +0200, Stefan K wrote: >> > Hello, >> > >> > can somebody tell me the difference between external/ipmi and >> > fence_ipmilan? Are there preferences? >> > Is one of these more common or has some advantages? >> > >> > Thanks in advance! >> > best regards >> > Stefan >> >> The distinction is mostly historical. At one time, there were two >> different open-source clustering environments, each with its own set of >> fence agents. The community eventually settled on Pacemaker as a sort >> of merged evolution of the earlier environments, and so it supports >> both styles of fence agents. Thus, you often see an "external/*" agent >> and a "fence_*" agent available for the same physical device. >> >> However, they are completely different implementations, so there may be >> substantive differences as well. I'm not familiar enough with these two >> to address that, maybe someone else can. 
>> -- >> Ken Gaillot -- // Kristoffer Grönlund // kgronl...@suse.com
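To make the two styles concrete, a hedged crm configure sketch — the parameter names below are from memory and differ between agent versions, so check them against each agent's metadata before use:

```
# cluster-glue style agent
primitive fence-node1-ext stonith:external/ipmi \
    params hostname=node1 ipaddr=192.168.1.10 userid=admin passwd=secret

# fence-agents style agent for the same physical IPMI device
primitive fence-node1-fa stonith:fence_ipmilan \
    params ip=192.168.1.10 username=admin password=secret plug=node1
```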
Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?
Jan Pokorný writes: >> 4. [ ] Do you use "tag" based syntactic grouping[3] in CIB? > > 0x > > keeps me at guess what it was meant to/could be used for in practice > (had some ideas but will gladly be surprised if anyone's going to > give it a crack) > The background for this feature as far as I understand it was related to booth-based geo clusters, where the tag feature made it easier to unify the configuration of two geo clusters. Hawk also supports the tag feature via the user interface, where you can get a custom status view for a tag showing only the tagged resources instead of the whole cluster status. I honestly don't know how much use it sees in practice. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
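For reference, a tag in the CIB is just a named list of object references — a minimal example (the ids are made up):

```xml
<tags>
  <tag id="geo-resources">
    <obj_ref id="booth-ip"/>
    <obj_ref id="booth-site"/>
  </tag>
</tags>
```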
Re: [ClusterLabs] Booth fail-over conditions
Zach Anderson writes: > Hey all, > > new user to pacemaker/booth and I'm fumbling my way through my first proof > of concept. I have a 2 site configuration setup with local pacemaker > clusters at each site (running rabbitmq) and a booth arbitrator. I've > successfully validated the base failover when the "granted" site has > failed. My question is if there are any other ways to configure failover, > i.e. using resource health checks or the like? > Hi Zach, Do you mean that a resource health check should trigger site failover? That's actually something I'm not sure comes built-in, though making a resource agent which revokes a ticket on failure should be fairly straightforward. You could then group your resource with the ticket resource to enable this functionality. The logic in the ticket resource ought to be something like "if monitor fails and the current site is granted, then revoke the ticket, else do nothing". You would probably want to handle probe monitor invocations differently. There is an ocf_is_probe function provided to help with this. Cheers, Kristoffer > Thanks! -- // Kristoffer Grönlund // kgronl...@suse.com
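The suggested monitor logic can be sketched in plain shell. Everything below is hypothetical scaffolding: the probe flag, ticket state, and health result are passed in as parameters, and the actual revoke call is stubbed out, so only the decision logic is shown.

```shell
#!/bin/sh
# Hypothetical sketch of a ticket RA's monitor decision logic.

OCF_SUCCESS=0

monitor_ticket() {
    is_probe=$1   # 1 when this is a probe invocation
    granted=$2    # 1 when the current site holds the ticket
    healthy=$3    # 1 when the guarded resource's health check passed

    if [ "$is_probe" -eq 1 ]; then
        # Probes must only report state, never trigger a failover
        echo "probe: report only"
        return $OCF_SUCCESS
    fi
    if [ "$granted" -eq 1 ] && [ "$healthy" -ne 1 ]; then
        # "if monitor fails and the current site is granted, revoke"
        echo "revoking ticket"   # stand-in for the real revoke command
        return $OCF_SUCCESS
    fi
    echo "ok"
    return $OCF_SUCCESS
}

monitor_ticket 0 1 0   # prints "revoking ticket"
```

In a real agent, `is_probe` would come from `ocf_is_probe`, and the stubbed revoke would be an actual booth ticket revoke (command syntax to be checked against your booth version).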
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Jehan-Guillaume de Rorthais writes: > > I feel like you guys are talking of a solution that already exists and you > probably already know, eg. "etcd". > > Etcd provides: > > * a cluster wide key/value storage engine > * support quorum > * key locking > * atomic changes > * REST API > * etc... > > However, it requires to open a new TCP port, indeed :/ > My main inspiration and reasoning is indeed to introduce the same functionality provided by etcd into a corosync-based cluster without having to add a parallel cluster consensus solution. Simply installing etcd means 1) now you have two clusters, 2) etcd doesn't handle 2-node clusters or fencing and doesn't degrade well to a single node, 3) relying on the presence of the KV-store in pacemaker tools is not an option unless pacemaker wants to make etcd a requirement. Cheers, Kristoffer > Moreover, as a RA developer, I am currently messing with attrd weird > behavior[1], so any improvement there is welcomed :) > > Cheers, > > [1] https://github.com/ClusterLabs/PAF/issues/131 > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Jan Pokorný writes: > /me keenly joins the bike-shedding > > What about pcmk-based/pcmk-infod. First, we effectively tone down > "common information/base" from the expanded CIB abbreviation[*1], > and second, in the former case, we highlight that's the central point > providing resident data glue (pcmk-datad?[*2]) amongst the other daemons. pcmk-infod sounds pretty good to me, it indicates data management / central information handling etc. Plus it contains at least part of one of the words of the expansion of "CIB". Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Klaus Wenninger writes: > > One thing I thought over as well is some kind of > a chicken & egg issue arising when you want to > use the syncing-mechanism so setup (bootstrap) > the cluster. > So something like the ssh-mechanism pcsd is > using might still be needed. > The file-syncing approach would have the data > easily available locally prior to starting the > actual cluster-wide syncing. > > Well ... no solutions or anything ... just > a few thoughts I had on that issue ... 25ct max ;-) > Bootstrapping is a problem I've thought about quite a bit.. It's possible to implement in a number of ways, and it's not clear what's the better approach. But I see a cluster-wide configuration database as an enabler for better bootstrapping rather than a hurdle. If a new node doesn't need a local copy of the database but can access the database from an existing node, it would be possible for the new node to bootstrap itself into the cluster with nothing more than remote access to that database, so a single port to open and a single authentication mechanism - this could certainly be handled over SSH just like pcsd and crmsh implements it today. But yes, at some point there needs to be communication channel opened.. -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Ken Gaillot writes: > On Tue, 2018-04-03 at 08:33 +0200, Kristoffer Grönlund wrote: >> Ken Gaillot writes: >> >> > > I >> > > would vote against PREFIX-configd as compared to other cluster >> > > software, >> > > I would expect that daemon name to refer to a more generic >> > > cluster >> > > configuration key/value store, and that is something that I have >> > > some >> > > hope of adding in the future ;) So I'd like to keep "config" or >> > > "database" for such a possible future component... >> > >> > What's the benefit of another layer over the CIB? >> > >> >> The idea is to provide a more generalized key-value store that other >> applications built on top of pacemaker can use. Something like a >> HTTP REST API to a key-value store with transactional semantics >> provided >> by the cluster. My understanding so far is that the CIB is too heavy >> to >> support that kind of functionality well, and besides that the >> interface >> is not convenient for non-cluster applications. > > My first impression is that it sounds like a good extension to attrd, > cluster-wide attributes instead of node attributes. (I would envision a > REST API daemon sitting in front of all the daemons without providing > any actual functionality itself.) > > The advantage to extending attrd is that it already has code to > synchronize attributes at start-up, DC election, partition healing, > etc., as well as features such as write dampening. Yes, I've considered that as well and yes, I think it could make sense. I need to gain a better understanding of the current attrd implementation to see how to make it do what I want. The configd name/part comes into play when bringing in syncing data beyond the key-value store (see below). > > Also cib -> pcmk-configd is very popular :) > I can live with it. ;) >> My most immediate applications for that would be to build file >> syncing >> into the cluster and to avoid having to have an extra communication >> layer for the UI. 
> > How would file syncing via a key-value store work? > > One of the key hurdles in any cluster-based sync is > authentication/authorization. Authorization to use a cluster UI is not > necessarily equivalent to authorization to transfer arbitrary files as > root. > Yeah, the key-value store wouldn't be enough to implement file syncing, but it could potentially be the mechanism by which the file syncing implementation maintains its state. I'm somewhat conflating two things that I want that are both related to syncing configuration beyond the cluster daemon itself across the cluster. I don't see authentication/authorization as a hurdle or blocker, but it's certainly something that needs to be considered. Clearly a less-privileged user shouldn't be able to configure syncing of root-owned files across the cluster. -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
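For context, the per-node attribute interface that such a store would generalize already exists as attrd_updater:

```
# Set a transient node attribute on the local node
attrd_updater -n my-attr -U some-value

# Query it
attrd_updater -n my-attr -Q

# Delete it
attrd_updater -n my-attr -D
```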
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Ken Gaillot writes: >> I >> would vote against PREFIX-configd as compared to other cluster >> software, >> I would expect that daemon name to refer to a more generic cluster >> configuration key/value store, and that is something that I have some >> hope of adding in the future ;) So I'd like to keep "config" or >> "database" for such a possible future component... > > What's the benefit of another layer over the CIB? > The idea is to provide a more generalized key-value store that other applications built on top of pacemaker can use. Something like a HTTP REST API to a key-value store with transactional semantics provided by the cluster. My understanding so far is that the CIB is too heavy to support that kind of functionality well, and besides that the interface is not convenient for non-cluster applications. My most immediate applications for that would be to build file syncing into the cluster and to avoid having to have an extra communication layer for the UI. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Possible idea for 2.0.0: renaming the Pacemaker daemons
Ken Gaillot writes: > Hi all, > > Andrew Beekhof brought up a potential change to help with reading > Pacemaker logs. > > Currently, pacemaker daemon names are not intuitive, making it > difficult to search the system log or understand what each one does. > > The idea is to rename the daemons, with a common prefix, and a name > that better reflects the purpose. > [...] > Here are the current names, with some example replacements: > > pacemakerd: PREFIX-launchd, PREFIX-launcher > > attrd: PREFIX-attrd, PREFIX-attributes > > cib: PREFIX-configd, PREFIX-state > > crmd: PREFIX-controld, PREFIX-clusterd, PREFIX-controller > > lrmd: PREFIX-locald, PREFIX-resourced, PREFIX-runner > > pengine: PREFIX-policyd, PREFIX-scheduler > > stonithd: PREFIX-fenced, PREFIX-stonithd, PREFIX-executioner > > pacemaker_remoted: PREFIX-remoted, PREFIX-remote Better to do it now rather than later. I vote in favor of changing the names. Yes, it'll mess up crmsh, but at least for distributions it's just a simple search/replace patch to apply. I would also vote in favour of sticking to the 15 character limit, and to use "pcmk" as the prefix. That leaves 11 characters for the name, which should be enough for anyone ;) My votes: pacemakerd -> pcmk-launchd attrd -> pcmk-attrd cib -> pcmk-stated crmd -> pcmk-controld lrmd -> pcmk-resourced pengine -> pcmk-schedulerd stonithd -> pcmk-fenced pacemaker_remoted -> pcmk-remoted The one I'm the most divided about is cib. pcmk-cibd would also work. I would vote against PREFIX-configd as compared to other cluster software, I would expect that daemon name to refer to a more generic cluster configuration key/value store, and that is something that I have some hope of adding in the future ;) So I'd like to keep "config" or "database" for such a possible future component... 
Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com
Re: [ClusterLabs] crm shell 2.1.2 manual bug?
"Ulrich Windl" writes: > Hi! > > For crmsh-2.1.2+git132.gbc9fde0-18.2 I think there's a bug in the manual > describing resource sets: > >sequential >If true, the resources in the set do not depend on each other > internally. Setting sequential to true implies a strict order of dependency > within the set. > > Obviously "true" cannot mean both: "do not depend" and "depend". My guess is > that the first true has to be false. Right, "do not depend" should be "depend" there. Thanks for catching it :) > I came across this when trying to add a colocation like this: > colocation col_LV inf:( cln_LV cln_LV-L1 cln_LV-L2 cln_ML cln_ML-L1 cln_ML-L2 > ) cln_VMs > > crm complained about this: > ERROR: 1: syntax in role: Unmatched opening bracket near parsing > 'colocation ...' > ERROR: 2: syntax: Unknown command near parsing 'cln_ml-l2 ) > cln_VMs' > (note the lower case) The problem reported is that there is no space between "inf:" and "(" - the parser in crmsh doesn't handle missing spaces between tokens right now. Cheers, Kristoffer > > Regards, > Ulrich > > > ___ > Users mailing list: Users@clusterlabs.org > https://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org https://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
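In other words, the reported line parses once a space separates the score from the opening bracket:

```
# Fails: "inf:(" is read as one token
colocation col_LV inf:( cln_LV cln_LV-L1 cln_LV-L2 cln_ML cln_ML-L1 cln_ML-L2 ) cln_VMs

# Works: note the space after "inf:"
colocation col_LV inf: ( cln_LV cln_LV-L1 cln_LV-L2 cln_ML cln_ML-L1 cln_ML-L2 ) cln_VMs
```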
Re: [ClusterLabs] How to configure lifetime in constraints?
xin writes: > Hi: > > I noticed that in the latest constraints schema file (constraints-3.0.rng), > "element-lifetime" is an option in location/colocation/order, and it > linked to rule-2.9.rng. > > I cannot find the keyword "lifetime" in the upstream document "Pacemaker > 1.1 Configuration Explained", > then I guess "date_expression" in rules means lifetime. > > So I write this xml section in file cons1.xml: > [XML example stripped by the list archive] First off, the lifetime element is deprecated IIRC and not necessary. Second, the required rule is somewhat complicated. I would recommend using the crmsh "crm resource ban" command to create the constraint with a lifetime, and then look at the CIB XML to see what the created constraint looks like. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com
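As a concrete starting point, the low-level equivalent of that recommendation (resource and node names are placeholders; the lifetime is an ISO 8601 duration):

```
# Ban rsc1 from node1 for one day, then inspect the generated constraint
crm_resource --resource rsc1 --ban --node node1 --lifetime P1D
cibadmin -Q -o constraints
```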
Re: [ClusterLabs] Error when linking to libqb in shared library
Jan Pokorný writes: > I guess you are linking your python extension with one of the > pacemaker libraries (directly or indirectly to libcrmcommon), and in > that case, you need to rebuild pacemaker with the patched libqb[*] for > the whole arrangement to work. Likewise in that case, as you may be > aware, the "API" is quite uncommitted at this point, stability hasn't > been of importance so far (because of the handles into pacemaker being > mostly abstracted through built-in CLI tools for the outside players > so far, which I agree is encumbered with tedious round-trips, etc.). > There's a huge debt in this area, so some discretion and perhaps > feedback on which functions are indeed proper-API-worthy is advised. The ultimate goal of my project is indeed to be able to propose or begin a discussion around a stable API for Pacemaker, to eventually move away from command-line tools as the only way to interact with the cluster. Thank you, I'll investigate the proposed changes. Cheers, Kristoffer > > [*] > shortcut 1: just recompile pacemaker with those extra > /usr/include/qb/qblog.h modifications as of the > referenced commit > shortcut 2: if the above can be tolerated widely, this is certainly > for local development only: recompile pacemaker with > CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION > > Hope this helps. > > -- > Jan (Poki)
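Shortcut 2 above, spelled out as a build invocation (local development only, as the message stresses):

```
# Rebuild pacemaker with libqb's attribute-section machinery disabled,
# which avoids the callsite-section assertion
./configure CPPFLAGS=-DQB_KILL_ATTRIBUTE_SECTION
make
```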
[ClusterLabs] Error when linking to libqb in shared library
Hi everyone, (and especially the libqb developers) I started hacking on a python library written in C which links to pacemaker, and so to libqb as well, but I'm encountering a strange problem which I don't know how to solve. When I try to import the library in python, I see this error: --- command --- PYTHONPATH='/home/krig/projects/work/libpacemakerclient/build/python' /usr/bin/python3 /home/krig/projects/python-pacemaker/build/../python/clienttest.py --- stderr --- python3: utils.c:66: common: Assertion `"implicit callsite section is observable, otherwise target's and/or libqb's build is at fault, preventing reliable logging" && work_s1 != NULL && work_s2 != NULL' failed. --- This appears to be coming from the following libqb macro: https://github.com/ClusterLabs/libqb/blob/master/include/qb/qblog.h#L352 There is a long comment above the macro which if nothing else tells me that I'm not the first person to have issues with it, but it doesn't really tell me what I'm doing wrong... Does anyone know what the issue is, and if so, what I could do to resolve it? Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
Ken Gaillot writes: > > I can see the point, but I do like having them separate. > > A clone with a single instance is not identical to a primitive. Think > of building a cluster, starting with one node, and configuring a clone > -- it has only one instance, but you wouldn't expect it to show up as a > primitive in status displays. > > Also, there are a large number of clone meta-attributes that aren't > applicable to simple primitives. By contrast, master adds only two > attributes to clones. I'm not convinced by either argument. :) The distinction between single-instance clone and primitive is certainly not clear to me, and there is no problem for status displays to display a resource with a single replica differently from a resource that isn't configured to be replicated. The number of meta-attributes related to clones seems irrelevant as well; pacemaker can reject a configuration that sets clone-related attributes for non-clone resources just as well as if they were on a different node in the XML. > > From the XML perspective, I think the current approach is logically > structured, a <clone> wrapped around a <primitive> or <group>, each > with its own meta-attributes. Well, I guess it's a matter of opinion. For me, I don't think it is very logical at all. For example, the result of having the hierarchy of nodes is that it is possible to configure target-role for both the wrapped <primitive> and the <clone> container. Then edit the configuration removing the clone, save, and the resource starts when it should have been stopped. It's even worse in the case of a clone wrapping a group holding clones of resources, in which case there can be four levels of attribute inheritance -- and this applies to both meta attributes and instance attributes. Add to that the fact that there can be multiple sets of instance attributes and meta attributes for each of these with rule expressions and implicit precedence determining which set actually applies... 
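A hedged reconstruction of the kind of configuration meant here (ids and values invented for illustration): target-role set on both the container and the wrapped primitive, so deleting the <clone> wrapper silently changes which value wins:

```xml
<clone id="c-dummy">
  <meta_attributes id="c-dummy-meta">
    <nvpair id="c-dummy-stop" name="target-role" value="Stopped"/>
  </meta_attributes>
  <primitive id="dummy" class="ocf" provider="pacemaker" type="Dummy">
    <meta_attributes id="dummy-meta">
      <nvpair id="dummy-start" name="target-role" value="Started"/>
    </meta_attributes>
  </primitive>
</clone>
```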
-- // Kristoffer Grönlund // kgronl...@suse.com
Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology
Ken Gaillot writes: > > For Pacemaker 2, I'd like to replace the <master> resource type with > <clone>. (The old syntax would be transparently > upgraded to the new one.) The role names themselves are not likely to > be changed in that time frame, as they are used in more external pieces > such as notification variables. But it would be the first step. > > I hope that this will be an uncontroversial change in the ClusterLabs > community, but because such changes have been heated elsewhere, here is > why this change is desirable: > I agree 100% about this change. In Hawk, we've already tried to hide the Master/Slave terms as much as possible and replace them with primary/secondary and "Multi-state", but I'm happy to converge on common terms. I'm partial to "Promoted" and "Started" since it makes it clearer that the secondary state is a base state and that it's the promoted state which is different / special. However, can I throw a wrench in the machinery? When replacing the <master> resource type with <clone>, why not go a step further and merge both <clone> and <master> with the basic <primitive>, or <group> for groups? [inline XML examples stripped by the list archive] I have never understood the usefulness of separate meta-attribute sets for the <clone> and <primitive> nodes. -- // Kristoffer Grönlund // kgronl...@suse.com
Re: [ClusterLabs] Antw: Re: Antw: Changes coming in Pacemaker 2.0.0
Jehan-Guillaume de Rorthais writes: > > For what is worth, while using crmsh, I always have to explain to > people or customers that: > > * we should issue an "unmigrate" to remove the constraint as soon as the > resource can get back to the original node or get off the current node if > needed (depending on the -inf or +inf constraint location issued) > * this will not migrate back the resource if it's sticky enough on the current > node. > > See: > http://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html#swapping-master-and-slave-roles-between-nodes > > This is counter-intuitive, indeed. I prefer the pcs interface using > the move/clear actions. No need! You can use crm rsc move / crm rsc clear. In fact, "unmove" is just a backwards-compatibility alias for clear in crmsh. Cheers, Kristoffer > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Changes coming in Pacemaker 2.0.0
Andrei Borzenkov writes: > On Thu, Jan 11, 2018 at 10:54 AM, Ulrich Windl > wrote: >> Hi! >> >> On the tool changes, I'd prefer --move and --un-move as pair over --move and >> --clear ("clear" is less expressive IMHO). > > --un-move is really wrong semantically. You do not "unmove" resource - > you "clear" constraints that were created. Whether this actually > results in any "movement" is unpredictable (easily). > > Personally I find lack of any means to change resource state > non-persistently one of major usability issue with pacemaker comparing > with other cluster stacks. Just a small example: > > I wanted to show customer how "maintenance-mode" works. After setting > maintenance-mode=yes for the cluster we found that database was > mysteriously restarted after being stopped manually. It took quite > some time to find out that couple of weeks ago "crm resource manager" > followed by "crm resource unmanage" was run for this resource - which > left explicit "managed=yes" on resource which took precedence over > "maintenance-mode". > > Not only is this asymmetrical and non-intuitive. There is no way to > distinguish temporary change from permanent one. Moving resources is > special-cased but for any change that involves setting resource > (meta-)attributes this approach is not possible. Attribute is there, > and we do not know why it was set. The problem is really that the configuration is declarative and that in the declarative configuration there is a hierarchy of attributes that combine in more or less obvious ways. There is no way to retain that and not create pitfalls. At least the CIB is not CSS... In this case, the place where things went wrong was when crmsh left "managed=yes" in place instead of relying on the default and just unsetting the managed attribute. 
Though there's a similar confusion when setting target-role - the command line gives the impression of imperative commands; "start this, stop that" while the actual instructions issued to pacemaker are declarative. It gets especially tricky when target-role is set on a group as well as on individual resources in the group. Unhelpful perhaps, but in my opinion, the CIB makes it very difficult to answer even simple questions like "what value does this attribute really have", and for very marginal benefit. If it were up to me, rule expressions, op_defaults, rsc_defaults, nested resources (group, master, clone) and multiple meta_attribute/attribute elements for single resources would all be deleted. The only real valid case I can see for rule expressions is for configuring different attribute values for different nodes - and that would be better achieved by fetching the value from a distributed database which handles that part. Having such a database would also enable things like private data / passwords to be kept out of the CIB. Cheers, Kristoffer > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Cluster IP that "supports" two subnets !?
Zarko Dudic writes: > Hi there, I'd like to setup a cluster, with two nodes, but on two > different sub-nets (nodes are in two different cities). Nodes are > running Oracle Linux 7.4 and so far I have both them running and cluster > software have been installed and configured. > > Well, next is to add a resources and I'd like to start with ClusterIP, > and seems it's straightforward if nodes are on same subnet, which is not > my case. First of all is it possible to accomplish what I want, and if > yes, I'd appreciate to hear some suggestions. Thanks a lot. Hi, I'm not sure I understand the question so my answer may be off the mark. An IP address is intrinsically part of a particular subnet, so how would managing an IP address across separate subnets work? Or do you mean to manage an IP address from a third subnet mapped to both locations? This second option is indeed possible using the regular IP resources, it is more of a network setup problem. Another option would be to manage DNS records across subnets. This is possible using the dnsupdate resource. Yet a third option would be to access the resources through a proxy, but then availability is of course limited to the availability of the proxy and network between proxy and the active site. Cheers, Kristoffer > > > -- > Thanks, > Zarko > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
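Of the options above, the DNS-based one might look roughly like this in crmsh syntax. The hostname, IP and keyfile values are illustrative; check `crm ra info ocf:heartbeat:dnsupdate` for the agent's actual parameter list:

```
primitive service-dns ocf:heartbeat:dnsupdate \
    params hostname="service.example.com" ip="192.0.2.10" \
           keyfile="/etc/named.d/update.key" \
    op monitor interval=60s timeout=30s
```

The agent performs a dynamic DNS update (nsupdate) on start, so clients following the DNS name reach whichever site currently runs the resource, subject to TTL and resolver caching.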
Re: [ClusterLabs] crmsh resource failcount does not appear to work
Andrei Borzenkov writes: > As far as I can tell, pacemaker acts on failcount attributes qualified > by operation name, while crm sets/queries unqualified attribute; I do > not see any syntax to set fail-count for specific operation in crmsh. crmsh uses crm_attribute to get the failcount. It could be that this usage has stopped working as of 1.1.17.. Cheers, Kristoffer > > ha1:~ # rpm -q crmsh > crmsh-4.0.0+git.1511604050.816cb0f5-1.1.noarch > ha1:~ # crm_mon -1rf > Stack: corosync > Current DC: ha2 (version 1.1.17-3.3-36d2962a8) - partition with quorum > Last updated: Sun Dec 24 10:55:54 2017 > Last change: Sun Dec 24 10:55:47 2017 by hacluster via crmd on ha2 > > 2 nodes configured > 4 resources configured > > Online: [ ha1 ha2 ] > > Full list of resources: > > stonith-sbd (stonith:external/sbd): Started ha1 > rsc_dummy_1 (ocf::pacemaker:Dummy): Started ha2 > Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1] > Masters: [ ha1 ] > Slaves: [ ha2 ] > > Migration Summary: > * Node ha2: > * Node ha1: > ha1:~ # echo xxx > /run/Stateful-rsc_Stateful_1.state > ha1:~ # crm_failcount -G -r rsc_Stateful_1 > scope=status name=fail-count-rsc_Stateful_1 value=1 > ha1:~ # crm resource failcount rsc_Stateful_1 show ha1 > scope=status name=fail-count-rsc_Stateful_1 value=0 > ha1:~ # crm resource failcount rsc_Stateful_1 set ha1 4 > ha1:~ # crm_failcount -G -r rsc_Stateful_1 > scope=status name=fail-count-rsc_Stateful_1 value=1 > ha1:~ # crm resource failcount rsc_Stateful_1 show ha1 > scope=status name=fail-count-rsc_Stateful_1 value=4 > ha1:~ # cibadmin -Q | grep fail-count >id="status-1084752129-fail-count-rsc_Stateful_1.monitor_1" > name="fail-count-rsc_Stateful_1#monitor_1" value="1"/> >name="fail-count-rsc_Stateful_1" value="4"/> > ha1:~ # > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: 
http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
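The mismatch is visible in the cibadmin output above: pacemaker 1.1.17 records per-operation entries whose names qualify the resource with the operation (using a `#` separator), while crmsh reads and writes the unqualified `fail-count-rsc_Stateful_1` attribute. A small sketch of how the qualified names decompose (illustrative parsing, not pacemaker code; the interval value below is made up):

```python
def total_failcount(attrs, rsc):
    """Sum all fail-count attributes for one resource, whether qualified
    by operation ("fail-count-X#monitor_10000") or unqualified
    ("fail-count-X")."""
    total = 0
    for name, value in attrs.items():
        if not name.startswith("fail-count-"):
            continue
        base = name[len("fail-count-"):].split("#", 1)[0]
        if base == rsc:
            total += int(value)
    return total

# Mimicking the status section shown above (interval illustrative):
status = {"fail-count-rsc_Stateful_1#monitor_10000": "1",
          "fail-count-rsc_Stateful_1": "4"}
print(total_failcount(status, "rsc_Stateful_1"))  # → 5
```

This also shows why the two tools disagree: each is reading a different attribute, and neither sees the whole picture on its own.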
Re: [ClusterLabs] Antw: Re: questions about startup fencing
Tomas Jelinek writes: >> >> * how is it shutting down the cluster when issuing "pcs cluster stop --all"? > > First, it sends a request to each node to stop pacemaker. The requests > are sent in parallel which prevents resources from being moved from node > to node. Once pacemaker stops on all nodes, corosync is stopped on all > nodes in the same manner. > >> * any race condition possible where the cib will record only one node up >> before >>the last one shut down? >> * will the cluster start safely? That definitely sounds racy to me. The best idea I can think of would be to set all nodes except one in standby, and then shut down pacemaker everywhere... -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
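A sketch of that standby-first approach, using hypothetical node names (the exact stop command depends on init system and stack):

```
# Stage: stop resources everywhere except node1, so the final CIB
# state is written while only one node carries resources.
crm node standby node2
crm node standby node3

# Then stop the cluster stack on every node, node1 last:
systemctl stop pacemaker corosync

# On startup, bring node1 up first, then the others, then:
crm node online node2
crm node online node3
```

The standby attributes persist in the CIB, so remember to bring the nodes back online after the restart.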
Re: [ClusterLabs] questions about startup fencing
Adam Spiers writes: > > OK, so reading between the lines, if we don't want our cluster's > latest config changes accidentally discarded during a complete cluster > reboot, we should ensure that the last man standing is also the first > one booted up - right? That would make sense to me, but I don't know if it's the only solution. If you separately ensure that they all have the same configuration first, you could start them in any order I guess. > > If so, I think that's a perfectly reasonable thing to ask for, but > maybe it should be documented explicitly somewhere? Apologies if it > is already and I missed it. Yeah, maybe a section discussing both starting and stopping a whole cluster would be helpful, but I don't know if I feel like I've thought about it enough myself. Regarding the HP Service Guard commands that Ulrich Windl mentioned, the very idea of such commands offends me on some level but I don't know if I can clearly articulate why. :D -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] questions about startup fencing
Adam Spiers writes: > Kristoffer Gronlund wrote: >>Adam Spiers writes: >> >>> - The whole cluster is shut down cleanly. >>> >>> - The whole cluster is then started up again. (Side question: what >>> happens if the last node to shut down is not the first to start up? >>> How will the cluster ensure it has the most recent version of the >>> CIB? Without that, how would it know whether the last man standing >>> was shut down cleanly or not?) >> >>This is my opinion, I don't really know what the "official" pacemaker >>stance is: There is no such thing as shutting down a cluster cleanly. A >>cluster is a process stretching over multiple nodes - if they all shut >>down, the process is gone. When you start up again, you effectively have >>a completely new cluster. > > Sorry, I don't follow you at all here. When you start the cluster up > again, the cluster config from before the shutdown is still there. > That's very far from being a completely new cluster :-) You have a new cluster with (possibly fragmented) memories of a previous life ;) > > Yes, exactly. If the first node to start up was not the last man > standing, the CIB history is effectively being forked. So how is this > issue avoided? > >>The only way to bring up a cluster from being completely stopped is to >>treat it as creating a completely new cluster. The first node to start >>"creates" the cluster and later nodes join that cluster. > > That's ignoring the cluster config, which persists even when the > cluster's down. There could be a command in pacemaker which resets a set of nodes to a common known state, basically to pick the CIB from one of the nodes as the survivor and copy that to all of them. But in the end, that's just the same thing as just picking one node as the first node, and telling the others to join that one and to discard their configurations. So, treating it as a new cluster. > > But to be clear, you picked a small side question from my original > post and answered that. 
The main questions I had were about startup > fencing :-) I did! :) -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] questions about startup fencing
Adam Spiers writes: > - The whole cluster is shut down cleanly. > > - The whole cluster is then started up again. (Side question: what > happens if the last node to shut down is not the first to start up? > How will the cluster ensure it has the most recent version of the > CIB? Without that, how would it know whether the last man standing > was shut down cleanly or not?) This is my opinion, I don't really know what the "official" pacemaker stance is: There is no such thing as shutting down a cluster cleanly. A cluster is a process stretching over multiple nodes - if they all shut down, the process is gone. When you start up again, you effectively have a completely new cluster. When starting up, how is the cluster, at any point, to know if the cluster it has knowledge of is the "latest" cluster? The next node could have a newer version of the CIB which adds yet more nodes to the cluster. The only way to bring up a cluster from being completely stopped is to treat it as creating a completely new cluster. The first node to start "creates" the cluster and later nodes join that cluster. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
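For context, pacemaker does version the CIB: each copy carries an (admin_epoch, epoch, num_updates) triple, compared field by field when nodes join, and the highest triple wins. The limitation discussed here is that a copy which never saw the final changes can still carry the highest version. The comparison itself is just lexicographic tuple ordering, as this minimal sketch shows:

```python
def newer_cib(a, b):
    """Each argument is (admin_epoch, epoch, num_updates); pacemaker
    prefers the copy with the higher triple, compared field by field."""
    return a if a >= b else b

# A node that was offline for the last few updates loses, even though
# it may have accumulated many more num_updates in its older epoch:
print(newer_cib((0, 42, 7), (0, 41, 30)))  # → (0, 42, 7)
```

Raising admin_epoch is the manual override for forcing one node's configuration to win on rejoin, which is the closest pacemaker comes to "pick this copy as the survivor".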
Re: [ClusterLabs] How much cluster-glue support is still needed in Pacemaker?
Ken Gaillot writes: > We're starting work on Pacemaker 2.0, which will remove support for the > heartbeat stack. > > cluster-glue was traditionally associated with heartbeat. Do current > distributions still ship it? > > Currently, Pacemaker uses cluster-glue's stonith/stonith.h to support > heartbeat-class stonith agents via the fence_legacy agent. If this is > still widely used, we can keep this support. > > Pacemaker also checks for heartbeat/glue_config.h and uses certain > configuration values there in favor of Pacemaker's own defaults (e.g. > the value of HA_COREDIR instead of /var/lib/pacemaker/cores). Does > anyone still use the cluster-glue configuration for such things? If > not, I'd prefer to drop this. Hi Ken, We're still shipping it, but mostly only for the legacy agents which we still use - although we aim to phase them out in favor of fence-agents. I would say that if you can keep the fence_legacy agent intact, dropping the rest is OK. Cheers, Kristoffer > -- > Ken Gaillot > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker 1.1.18 Release Candidate 4
Ken Gaillot writes: > I decided to do another release candidate, because we had a large > number of changes since rc3. The fourth release candidate for Pacemaker > version 1.1.18 is now available at: > > https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.18- > rc4 > > The big changes are numerous scalability improvements and bundle fixes. > We're starting to test Pacemaker with as many as 1,500 bundles (Docker > containers) running on 20 guest nodes running on three 56-core physical > cluster nodes. Hi Ken, That's really cool. What's the size of the CIB with that kind of configuration? I guess it would compress pretty well, but still. Cheers, Kristoffer > > For details on the changes in this release, see the ChangeLog. > > This is likely to be the last release candidate before the final > release next week. Any testing you can do is very welcome. > -- > Ken Gaillot > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Azure Resource Agent
AZ_ENABLED" > fi > > #--set the ipconfig name > AZ_IPCONFIG_NAME="ipconfig-""$OCF_RESKEY_ip" > logIt "debug1: AZ_IPCONFIG_NAME=$AZ_IPCONFIG_NAME" > > #--get the resource group name > AZ_RG_NAME=$(az group list|grep name|cut -d":" -f2|sed "s/ *//g"|sed > "s/\"//g"|sed "s/,//g") > if [ -z "$AZ_RG_NAME" ] > then > logIt "could not determine the Azure resource group name" > exit $OCF_ERR_GENERIC > else > logIt "debug1: AZ_RG_NAME=$AZ_RG_NAME" > fi > > #--get the nic name > AZ_NIC_NAME=$(az vm nic list -g $AZ_RG_NAME --vm-name $MY_HOSTNAME|grep > networkInterfaces|cut -d"/" -f9|sed "s/\",//g") > if [ -z "$AZ_NIC_NAME" ] > then > echo "could not determine the Azure NIC name" > exit $OCF_ERR_GENERIC > else > logIt "debug1: AZ_NIC_NAME=$AZ_NIC_NAME" > fi > > #--get the vnet and subnet names > R=$(az network nic show --name $AZ_NIC_NAME --resource-group $AZ_RG_NAME|grep > -i subnets|head -1|sed "s/ */ /g"|cut -d"/" -f9,11|sed "s/\",//g") > LDIFS=$IFS > IFS="/" > R_ARRAY=( $R ) > AZ_VNET_NAME=${R_ARRAY[0]} > AZ_SUBNET_NAME=${R_ARRAY[1]} > if [ -z "$AZ_VNET_NAME" ] > then > logIt "could not determine Azure vnet name" > exit $OCF_ERR_GENERIC > else > logIt "debug1: AZ_VNET_NAME=$AZ_VNET_NAME" > fi > if [ -z "$AZ_SUBNET_NAME" ] > then > logIt "could not determine the Azure subnet name" > exit $OCF_ERR_GENERIC > else > logIt "debug1: AZ_SUBNET_NAME=$AZ_SUBNET_NAME" > fi > > ## > # Actions > ## > > case $__OCF_ACTION in > meta-data) meta_data > RC=$? > ;; > usage|help) azip_usage > RC=$? > ;; > start) azip_start > RC=$? > ;; > stop) azip_stop > RC=$? > ;; > status) azip_query > RC=$? > ;; > monitor) azip_monitor > RC=$? 
> ;; > validate-all);; > *)azip_usage > RC=$OCF_ERR_UNIMPLEMENTED > ;; > esac > > #--exit with return code > logIt "debug1: exiting $SCRIPT_NAME with code $RC" > exit $RC > > #--end > > -- > Eric Robinson > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
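As an aside on the quoted script: the grep/cut/sed pipelines break as soon as the az CLI changes its output formatting. Parsing the JSON directly is sturdier; a sketch, with the field layout assumed from typical `az network nic show` output (verify against your CLI version), extracting the vnet and subnet names from the subnet resource ID:

```python
import json

def vnet_and_subnet(nic_show_output):
    """Extract vnet and subnet names from the subnet resource ID, which
    looks like .../virtualNetworks/<vnet>/subnets/<subnet>."""
    nic = json.loads(nic_show_output)
    subnet_id = nic["ipConfigurations"][0]["subnet"]["id"]
    parts = subnet_id.split("/")
    return (parts[parts.index("virtualNetworks") + 1],
            parts[parts.index("subnets") + 1])

# A trimmed-down sample of the JSON shape assumed above:
sample = json.dumps({"ipConfigurations": [{"subnet": {"id":
    "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Network"
    "/virtualNetworks/myvnet/subnets/mysubnet"}}]})
print(vnet_and_subnet(sample))  # → ('myvnet', 'mysubnet')
```

Alternatively, the az CLI's built-in `--query` JMESPath option plus `-o tsv` can extract such fields without any external parsing at all.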
Re: [ClusterLabs] PostgreSQL Automatic Failover (PAF) v2.2.0
Jehan-Guillaume de Rorthais writes: >> Planning to move this under the Clusterlabs github group? > > Yes! > > I'm not sure how long and how many answers I should wait for to reach a > community agreement. But first answers are encouraging :) Regarding your concerns with submitting it into resource-agents, I would say that moving into ClusterLabs/ as a separate repository at first makes sense to me as well. We can look at including it in resource-agents and the implications of supporting various language-libraries for OCF agents later. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Moving PAF to clusterlabs ?
Jehan-Guillaume de Rorthais writes: > Hi All, > > I am currently thinking about moving the RA PAF (PostgreSQL Automatic > Failover) > out of the Dalibo organisation on Github. Code and website. [snip] > Note that part of the project (some perl modules) might be pushed to > resource-agents independently, see [2]. Two years after, I'm still around on > this project. Obviously, I'll keep maintaining it on my Dalibo's and personal > time. > > Thoughts? Hi, I for one would be happy to see it included in the resource-agents repository. If people are worried about the additional dependency on perl, we can just add a --without-perl flag (or something along those lines) to the Makefile. We already have different agents for the same application but with different contexts so this wouldn't be anything new. Cheers, Kristoffer > > [1] http://lists.clusterlabs.org/pipermail/developers/2015-August/66.html > [2] http://lists.clusterlabs.org/pipermail/developers/2015-August/68.html > > Regards, > -- > Jehan-Guillaume de Rorthais > Dalibo > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Oh how we've grown! :D
Digimer writes: > Here are the attendee pictures from 2015 and from this summit today. > > So amazing to see how far our community has come. I am stoked to see how > much larger we are still in 2019! > A huge thank you again to everyone! You are all awesome. Cheers, Kristoffer > > > > -- > Digimer > Papers and Projects: https://alteeve.com/w/ > "I am, somehow, less interested in the weight and convolutions of > Einstein’s brain than in the near certainty that people of equal talent > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Clusterlabs Summit: Presentation material
Hi everyone, I got some requests to provide the slides for the presentations at the summit, and I thought that the best solution is probably to do what some presenters already did on the Trello board: For those of you who have slides to share, please attach them to the card of your presentation on the Trello board: https://trello.com/b/LNUrtV1Q/clusterlabs-summit-2017 There's also a link to the group photo on the plan wiki now: http://plan.alteeve.ca/index.php/Main_Page Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Clusterlabs Summit: Expect rain tomorrow
Hey everyone! I am going to try to be at the event area at 8 in the morning tomorrow, and I wouldn't recommend showing up earlier than that. The doors will probably be locked. The summit itself is scheduled to start at 9. Unfortunately it seems we can expect rain tomorrow, so I wanted to send out a small warning: In case you haven't brought an umbrella or rain gear, now is the time to go out and get it. For anyone needing to take a taxi, the number is +49 (0911) 19 410, or the reception here at the SUSE office can help call a taxi as well. It is also possible to take the U-bahn to Maxfeld station, though unfortunately there is a short walk to the office even then. Cheers and welcome, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker in Azure
Eric Robinson writes: > Hi Kristoffer -- > > If you would be willing to share your AWS ip control agent(s), I think those > would be very helpful to us and the community at large. I'll be happy to > share whatever we come up with in terms of an Azure agent when we're all done. I meant the agents that are in resource-agents already: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awsvip https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/awseip https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/aws-vpc-route53 You'll probably also be interested in fencing: There are agents for fencing both on AWS and Azure in the fence-agents repository. Cheers, Kristoffer > > -- > Eric Robinson > > -Original Message- > From: Kristoffer Grönlund [mailto:kgronl...@suse.com] > Sent: Friday, August 25, 2017 3:16 AM > To: Eric Robinson ; Cluster Labs - All topics > related to open-source clustering welcomed > Subject: Re: [ClusterLabs] Pacemaker in Azure > > Eric Robinson writes: > >> I deployed a couple of cluster nodes in Azure and found out right away that >> floating a virtual IP address between nodes does not work because Azure does >> not honor IP changes made from within the VMs. IP changes must be made to >> virtual NICs in the Azure portal itself. Anybody know of an easy way around >> this limitation? > > You will need a custom IP control agent for Azure. We have a series of agents > for controlling IP addresses and domain names in AWS, but there is no agent > for Azure IP control yet. (At least as far as I am aware). 
> > Cheers, > Kristoffer > >> >> -- >> Eric Robinson >> >> ___ >> Users mailing list: Users@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > -- > // Kristoffer Grönlund > // kgronl...@suse.com -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker in Azure
Eric Robinson writes: > I deployed a couple of cluster nodes in Azure and found out right away that > floating a virtual IP address between nodes does not work because Azure does > not honor IP changes made from within the VMs. IP changes must be made to > virtual NICs in the Azure portal itself. Anybody know of an easy way around > this limitation? You will need a custom IP control agent for Azure. We have a series of agents for controlling IP addresses and domain names in AWS, but there is no agent for Azure IP control yet. (At least as far as I am aware). Cheers, Kristoffer > > -- > Eric Robinson > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Clusterlabs Summit - Finding the office
Hello everyone, The summit is coming closer, and I thought I should send out a brief mail about how to find the event area once you are in Nuremberg. Finding the office == The SUSE office is within walking distance from the conference hotel and the old town center. The closest subway station is the Maxfeld station on the U3 line. Google maps link: https://goo.gl/maps/JMzSnv8ZGqF2 If you are coming from the Central Station, take the U3 directly to Maxfeld (direction Friedrich-Ebert-Platz). From the airport, take the U2 to Rathenauplatz, then change to U3 (direction Friedrich-Ebert-Platz) and exit at Maxfeld. Finding the event = The summit will take place in the SUSE Event Area at Rollnerstraße 8. This is the same building as the SUSE offices, but it is a separate ground-floor entrance. We will put up posters to make this clear. The regular SUSE reception is on the 3rd floor, and they have kindly asked me to direct everyone attending the summit directly to the event area. Finding the hotel = The main conference hotel is the Sorat Saxx, located on the Hauptmarkt square in the Nuremberg old town. This is within easy walking distance from both the Central Station and the SUSE office. The closest subway station is Lorenzkirche on the U1 line. Hotel website: https://www.sorat-hotels.com/en/hotel/saxx-nuernberg.html If you have any questions or concerns, please feel free to contact me. See you there! // Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] SLES11 SP4: Strange problem with "(crm configure) commit"
Ulrich Windl writes: > Hi! > > I just had a strange problem: When trying to "clean up" the cib configuration > (acually deleting unneded "operations" lines), I failed to commit the change, > even through it verified OK: > > crm(live)configure# commit > Call cib_apply_diff failed (-206): Application of an update diff failed > ERROR: could not patch cib (rc=206) > INFO: offending xml diff: It looks to me (from a cursory glance) like you may be hitting a bug with the patch generation in pacemaker. But there isn't enough details to say for sure. Try running crmsh with the "-dR" command line options to get it to output the patch it tries to apply to the log. Cheers, Kristoffer > > In Syslog I see this: > Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved > meta_attributes.14926208 to position 1 instead of 2 (0xe3f0f0) > Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved > meta_attributes.9876096 to position 1 instead of 2 (0xe3c470) > Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved > utilization.10594784 to position 1 instead of 2 (0x96a2b0) > Aug 21 15:01:48 h02 cib[19397]:error: xml_apply_patchset_v2: Moved > meta_attributes.11397008 to position 1 instead of 2 (0xacc5b0) > Aug 21 15:01:48 h02 cib[19397]: warning: cib_server_process_diff: Something > went wrong in compatibility mode, requesting full refresh > Aug 21 15:01:48 h02 cib[19397]: warning: cib_process_request: Completed > cib_apply_diff operation for section 'all': Application of an update diff > failed (rc=-206, origin=local/cibadmin/2, version=1.65.23) > > What could be causing this? I think I did the same change about three years > ago without problem (with different software, of course). 
> > # rpm -q pacemaker corosync crmsh > pacemaker-1.1.12-18.1 > corosync-1.4.7-0.23.5 > crmsh-2.1.2+git132.gbc9fde0-18.2 > (latest) > > Regards, > Ulrich > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] big trouble with a DRBD resource
"Lentes, Bernd" writes: > In both cases i'm inside crmsh. > The difference is that i always enter the complete command from the highest > level of crm. This has the advantage that i can execute any command from the > history directly. > And this has a kind of autocommit. > If i would enter a lower level, then my history is less useful. I always > have to go to the respective level before executing the command from the > history. > But then i have to commit. > Am i the only one who does it like this ? Nobody stumbled across this ? > I always wondered about my ineffective commit, but never got the idea that > such a small difference is the reason. You are right, this is a quirk of crmsh: Each level has its own "state", and exiting the level triggers a commit. Running a command like "configure primitive ..." results internally in three movements: * enter the configure level: This fetches the CIB and checks that it is writable * create the primitive: This updates the internal copy of the CIB * exit the configure level: This creates, verifies and applies a patch to the CIB I can't speak for others, but somehow this has never caused me problems as far as I can remember. Either I have been using it interactively from within the configure section, or I have been running commands from bash. I can't recall if that's because I was told at some point or if it was made clear in the documentation somewhere. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
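The difference between the two styles looks like this (illustrative session; resource names made up):

```
# One-shot from bash: enter, apply, and exit (and thus commit) in one step
crm configure primitive d1 ocf:pacemaker:Dummy

# Interactive: changes are staged until an explicit commit
crm configure
crm(live)configure# primitive d2 ocf:pacemaker:Dummy
crm(live)configure# verify
crm(live)configure# commit
```

Mixing the two, e.g. entering the configure level and then expecting one-shot autocommit behaviour, is where the confusion described above arises.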
[ClusterLabs] Clusterlabs Summit 2017: Please register!
Hi everyone, This mail is for attendees of the Clusterlabs Summit event in Nuremberg, September 6-7 2017. If it didn't arrive via the Clusterlabs mailing list and you're not going but got this mail anyway, please let me know since apparently I have you on my list of possible attendees ;) Apologies for springing this on you at such a late stage, but as we are investigating dinner options, making badges and making sure there are enough chairs for everyone at the event, it became more and more clear that it would be very useful to have a better grasp of how many people are coming to the event. URL to sign up -- https://www.eventbrite.com/e/clusterlabs-summit-2017-dinner-tickets-3689052 To make it as easy as possible, I created an event on Eventbrite for this purpose. Signing up is not a requirement! However, it would be great if you could send an email to me confirming your attendance regardless, in case you are unhappy about using Eventbrite. Also, it would be great if you could register as quickly as possible so that we can make dinner reservations early enough to hopefully be able to fit everyone into one space. Thank you, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] big trouble with a DRBD resource
"Lentes, Bernd" writes: > Hi, > > first: is there a tutorial or s.th. else which helps in understanding what > pacemaker logs in syslog and /var/log/cluster/corosync.log ? > I try hard to find out what's going wrong, but they are difficult to > understand, also because of the amount of information. > Or should i deal more with "crm histroy" or hb_report ? I like to use crm history log to get the logs from all the nodes in a single flow, but it depends quite a bit on configuration what gets logged where.. > > What happened: > I tried to configure a simple drbd resource following > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#idm140457860751296 > I used this simple snip from the doc: > configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata \ > op monitor interval=60s I'll try to sum up the issues I see, from a glance: * The drbd resource is a multi-state / master-slave resource, which is technically a variant of a clone resource where different clones can either be in a primary or secondary state. To configure it correctly, you'll need to create a master resource as well. Doing this with a single command is unfortunately a bit painful. Either use crm configure edit, or the interactive crm mode (with a verify / commit after creating both the primitive and the master resources). * You'll need to create monitor operations for both the master and slave roles, as you note below, and set explicit timeouts for all operations. * Make sure the wwwdata DRBD resource exists, is accessible from both nodes, and is in a good state to begin with (that is, not split-brained). I would recommend following one of the tutorials provided by Linbit themselves which show how to set this stuff up correctly, since it is quite a bit involved. > Btw: is there a history like in the bash where i see which crm command i > entered at which time ? I know that crm history is mighty, but didn't find > that. 
We don't have that yet :/ If you're not in interactive mode, your bash history should have the commands though. > no backup - no mercy lol ;) Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
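[Editor's sketch of the primitive-plus-master configuration described in the reply above; the timeouts and clone metadata are illustrative values taken from common DRBD examples, not recommendations from this thread.]

```shell
crm configure
crm(live)configure# primitive WebData ocf:linbit:drbd \
        params drbd_resource=wwwdata \
        op monitor role=Master interval=29s timeout=30s \
        op monitor role=Slave interval=31s timeout=30s \
        op start timeout=240s op stop timeout=100s
# The master (multi-state) wrapper around the primitive:
crm(live)configure# ms WebDataClone WebData \
        meta master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
crm(live)configure# verify
crm(live)configure# commit
```

Note the two monitor operations with different intervals: Pacemaker requires distinct intervals for the Master and Slave role monitors.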
Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?
Ulrich Windl writes: > > See my proposal above. ;-) Hmm, yes. It's a possibility. Magic values rarely end up making things simpler though :/ Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: from where does the default value for start/stop op of a resource come ?
Ulrich Windl writes: > > What aout this priority for newly added resources:? > 1) Use the value specified explicitly > 2) Use the value the RA's metadata specifies > 3) Use the global default > > With "use" I mean "add it to the RA configuration". Yeah, I've considered it. The main issue I see with making the change to crmsh now is that it would also be confusing, when configuring a resource without any operations and getting operations defined anyway. Also, it would be impossible not to define operations that have defaults in the metadata. One idea might be to have a new command which inserts missing operations and operation timeouts based on the RA metadata. Cheers, Kristoffer > > Regards, > Ulrich > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] from where does the default value for start/stop op of a resource come ?
"Lentes, Bernd" writes: > Hi, > > i'm wondering from where the default values for operations of a resource come > from. [snip] > > Is it hardcoded ? All timeouts i found in my config were explicitly related > to a dedicated resource. > What are the values for the hardcoded defaults ? > > Does that also mean that what the description of the RA says as "default" > isn't a default, but just a recommendation ? The default timeout is set by the default-action-timeout property, and the default value is 20s. You are correct, the timeout values defined in the resource agent are not used automatically. They are recommended minimums, and the thought as I understand it (this predates my involvement in HA) is that any timeouts need to be reviewed carefully by the administrator. I agree that it is somewhat surprising. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Clusterlabs Summit 2017 (Sept. 6-7 in Nuremberg) - One month left!
Hey everyone! Here's a quick update for the upcoming Clusterlabs Summit at the SUSE office in Nuremberg in September: The time to register for the pool of hotel rooms has now expired - we have sent the final list of names to the hotel. There may still be hotel rooms available at the Sorat Saxx or other hotels in Nuremberg, so if anyone missed the deadline and still needs a room, either contact me or feel free to contact the hotel directly. The same goes for any changes, for those who have reservations: Please either contact me, or contact the hotel directly at i...@saxx-nuernberg.de. The schedule is being sorted out right now, and the planning wiki will be updated with a preliminary schedule soon. If there is anyone who would like to present on a topic or would like to discuss a topic that isn't on the wiki yet, now is the time to add it there. Other than that, I don't have any other remarks, other than to wish everyone welcome to Nuremberg in a month! Feel free to contact me with any concerns or issues related to the summit, and I'll do what I can to help out. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] [ClusterLabs Developers] [HA/ClusterLabs Summit] Key-Signing Party, 2017 Edition
Jan Pokorný writes: > [ Unknown signature status ] > Hello cluster masters :-) > > as there's little less than 7 weeks left to "The Summit" meetup > (<http://plan.alteeve.ca/>), it's about time to get the ball > rolling so we can voluntarily augment the digital trust amongst > us the attendees, on OpenGPG basis. > > Doing that, we'll actually establish a tradition since this will > be the second time such event is being kicked off (unlike the birds > of the feather gathering itself, was edu-feathered back then): > > <https://people.redhat.com/jpokorny/keysigning/2015-ha/> > <http://lists.linux-ha.org/pipermail/linux-ha/2015-January/048507.html> > > If there are no objections, yours truly will conduct this undertaking. > (As an aside, I am toying with an idea of optimizing the process > a bit now that many keys are cross-signed already; I doubt there's > a value of adding identical signatures just with different timestamps, > unless, of course, the inscribed level of trust is going to change, > presumably elevate -- any comments?) Hi Jan, No objections from me, thank you for taking charge of this! Cheers, Kristoffer > > * * * > > So, going to attend summit and want your key signed while reciprocally > spreading the web of trust? 
> Awesome, let's reuse the steps from the last time: > > Once you have a key pair (and provided that you are using GnuPG), > please run the following sequence: > > # figure out the key ID for the identity to be verified; > # IDENTITY is either your associated email address/your name > # if only single key ID matches, specific key otherwise > # (you can use "gpg -K" to select a desired ID at the "sec" line) > KEY=$(gpg -k --with-colons 'IDENTITY' | grep '^pub' | cut -d: -f5) > > # export the public key to a file that is suitable for exchange > gpg --export -a -- $KEY > $KEY > > # verify that you have the expected data to share > gpg --with-fingerprint -- $KEY > > with IDENTITY adjusted as per the instruction above, and send me the > resulting $KEY file, preferably in a signed (or even encrypted[*]) email > from an address associated with that very public key of yours. > > Timeline? > Please, send me your public keys *by 2017-09-05*, off-list and > best with [key-2017-ha] prefix in the subject. I will then compile > a list of the attendees together with their keys and publish it at > <https://people.redhat.com/jpokorny/keysigning/2017-ha/> > so it can be printed beforehand. > > [*] You can find my public key at public keyservers: > <http://pool.sks-keyservers.net/pks/lookup?op=vindex&search=0x60BCBB4F5CD7F9EF> > Indeed, the trust in this key should be ephemeral/one-off > (e.g. using a temporary keyring, not a universal one before we > proceed with the signing :) > > * * * > > Thanks for your cooperation, looking forward to this side stage > (but nonetheless important if release or commit[1] signing is to get > traction) happening and hope this will be beneficial to all involved. > > See you there! 
> > > [1] for instance, see: > <https://github.com/blog/2144-gpg-signature-verification> > <https://pagure.io/pagure/issue/885> > > -- > Jan (Poki) > ___ > Developers mailing list > develop...@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/developers -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] crmsh: Release 3.0.1
Hello everyone! I'm happy to announce the release of crmsh version 3.0.1 today. This is mainly a bug fix release, with no new exciting features: primarily fixes to the new bootstrap functionality added in 3.0.0. I would also like to take the opportunity to introduce a new core developer for crmsh, Xin Liang! For this release he has contributed some of the bug fixes discovered, but he has also contributed a rewrite of hb_report into Python, as well as worked on improving the tab completion support in crmsh. I also want to recognize the hard work of Shiwen Zhang who initially started the work of rewriting the hb_report script in Python. For the complete list of changes in this release, see the ChangeLog: * https://github.com/ClusterLabs/crmsh/blob/3.0.1/ChangeLog The source code can be downloaded from Github: * https://github.com/ClusterLabs/crmsh/releases/tag/3.0.1 This version of crmsh (or a version very close to it) is already available in openSUSE Tumbleweed, and packages for several popular Linux distributions will be available from the Stable repository at the OBS: * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ Archives of the tagged release: * https://github.com/ClusterLabs/crmsh/archive/3.0.1.tar.gz * https://github.com/ClusterLabs/crmsh/archive/3.0.1.zip As usual, a huge thank you to all contributors and users of crmsh! Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Introducing the Anvil! Intelligent Availability platform
Digimer writes: > Hi all, > > I suspect by now, many of you here have heard me talk about the Anvil! > intelligent availability platform. Today, I am proud to announce that it > is ready for general use! > > https://github.com/ClusterLabs/striker/releases/tag/v2.0.0 > Cool, congratulations! Cheers, Kristoffer > > Now, time to start working full time on version 3! > > -- > Digimer > Papers and Projects: https://alteeve.com/w/ > "I am, somehow, less interested in the weight and convolutions of > Einstein’s brain than in the near certainty that people of equal talent > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?
Eric Robinson writes: >> If you're looking to run without support, you can run openSUSE Leap - it's >> the >> closest equivalent to centOS in the SUSE world and the HA packages are all in >> there. >> > > Out of curiosity, do the openSUSE Leap repos and packages work with SLES? I know that there are some base system differences that could cause problems, things like Leap using systemd/journald for logging while SLES is still logging via syslog-ng (IIRC)... so it's possible that you could get into problems if you mix versions. And adding the Leap repositories to SLES will probably mess things up since both deliver slightly different versions of the base system. For SLES, there's now the Package Hub which has open source packages taken from Leap and confirmed not to conflict with SLES, so you can mix a supported base system with unsupported open source packages with less risk for breaking anything: https://packagehub.suse.com/ Cheers, Kristoffer > > --Eric -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
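[Editor's sketch of enabling Package Hub on a registered SLES 12 system; the service pack and architecture strings are assumptions and must match the installed system.]

```shell
# List the modules/extensions available for this system, then add
# the (free) Package Hub extension
sudo SUSEConnect --list-extensions
sudo SUSEConnect -p PackageHub/12.2/x86_64
```

After that, zypper can install Package Hub packages alongside the supported base system without mixing in Leap repositories.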
Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?
Eric Robinson writes: > We've been a Red Hat/CentOS shop for 10+ years and have installed > Corosync+Pacemaker+DRBD dozens of times using the repositories, all for free. > > We are now trying out our first SLES 12 server, and I'm looking for the > repos. Where the heck are they? I went looking, and all I can find is the > SLES "High Availability Extension," which I must pay $700/year for? No > freaking way! > > This is Linux we're talking about, right? There's got to be an easy way to > install the cluster without paying for a subscription... right? > > Someone talk me off the ledge here. > If you're looking to run without support, you can run openSUSE Leap - it's the closest equivalent to centOS in the SUSE world and the HA packages are all in there. (I'd recommend the supported version, of course ;) Cheers, Kristoffer > -- > Eric Robinson > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] "Connecting" Pacemaker with another cluster manager
Timo writes: > Hi, > > I have a proprietary cluster manager running on a bunch (four) of nodes. > It decides to run the daemon for which HA is required on its own set of > (undisclosed) requirements and decisions. This is, unfortunately, > unavoidable due to business requirements. > > However, I have to put also Pacemaker onto the nodes in order to provide > an additional daemon running in HA mode. (I cannot do this using the > existing cluster manager, as this is a closed system.) > > I have to make sure that the additional daemon (which I plan to > coordinate using Pacemaker) only runs on the machine where the daemon > (controlled by the existing, closed cluster manager) runs. I could check > for local VIPs, for example, to check whether it runs on a node or not. > > Is there any way to make Pacemaker "check" for existence of a local > (V)IP so that I could "connect" both cluster managers? > > In short: I need Pacemaker to put the single instance of a daemon > exactly onto the node the other cluster manager decided to run the > (primary) daemon. Hi, I'm not sure I completely understand the problem description, but if I parsed it correctly: What you can do is run an external script which sets a node attribute on the node that has the external cluster manager daemon, and have a constraint which locates the additional daemon based on that node attribute. Cheers, Kristoffer > > Best regards, > > Timo > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
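[Editor's rough sketch of the node-attribute approach suggested above; the daemon name, attribute name and resource name are all hypothetical.]

```shell
# Run periodically (cron or a systemd timer) on every node: publish
# whether the externally managed daemon is running locally as a
# transient node attribute
if pgrep -x otherd >/dev/null; then
    crm_attribute --lifetime reboot --name runs-otherd --update 1
else
    crm_attribute --lifetime reboot --name runs-otherd --update 0
fi

# Constraint (crmsh syntax): the additional daemon may only run on a
# node where the attribute is defined and set to 1
crm configure location l-follow-otherd p-extra-daemon \
    rule -inf: not_defined runs-otherd or runs-otherd ne 1
```

Using a reboot-lifetime attribute means a rebooted node starts "clean" until the script has run at least once there.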
Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?
"Lentes, Bernd" writes: > - On May 8, 2017, at 9:20 PM, Bernd Lentes > bernd.len...@helmholtz-muenchen.de wrote: > >> Hi, >> >> i remember that digimer often campaigns for a fence delay in a 2-node >> cluster. >> E.g. here: >> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html >> In my eyes it makes sense, so i try to establish that. I have two HP servers, >> each with an ILO card. >> I have to use the stonith:external/ipmi agent, the stonith:external/riloe >> refused to work. >> >> But i don't have a delay parameter there. >> crm ra info stonith:external/ipmi: >> >> ... >> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and >> specify >> the maximum of random delay >>This prevents double fencing when using slow devices such as sbd. >>Use this to enable random delay for stonith actions and specify the >> maximum of >>random delay. >> ... >> >> This is the only delay parameter i can use. But a random delay does not seem >> to >> be a reliable solution. >> >> The stonith:ipmilan agent also provides just a random delay. Same with the >> riloe >> agent. >> >> How did anyone solve this problem ? >> >> Or do i have to edit the RA (I will get practice in that :-))? >> >> > > crm ra info stonith:external/ipmi says there exists a parameter > pcmk_delay_max. > Having a look in /usr/lib64/stonith/plugins/external/ipmi i don't find > anything about delay. > Also "crm_resource --show-metadata=stonith:external/ipmi" does not say > anything about a delay. > > Is this "pcmk_delay_max" not implemented ? From where does "crm ra info > stonith:external/ipmi" get this info ? > pcmk_delay_max is implemented by Pacemaker. crmsh gets the information about available parameters by querying stonithd directly. Cheers, Kristoffer > > Bernd > > > Helmholtz Zentrum Muenchen > Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) > Ingolstaedter Landstr. 
1 > 85764 Neuherberg > www.helmholtz-muenchen.de > Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe > Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons > Enhsen > Registergericht: Amtsgericht Muenchen HRB 6466 > USt-IdNr: DE 129521671 > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
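[Editor's illustration of setting pcmk_delay_max on a fencing resource as discussed above; the host, address and credential parameters are placeholders.]

```shell
crm configure primitive fence-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.168.1.101 \
        userid=admin passwd=secret interface=lanplus \
        pcmk_delay_max=10s
```

Note that a delay on the device that fences node1 gives node1 a head start in a fence race, so in the digimer-style setup the delay would go on the device fencing the preferred node only; pcmk_delay_max randomizes that delay rather than fixing it.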
[ClusterLabs] Clusterlabs Summit 2017 (Nuremberg, 6-7 September) - Hotels and Topics
Hi everyone! Here's a quick update on the summit happening at the SUSE office in Nuremberg on September 6-7. I am still collecting hotel reservations from attendees. In order to notify the hotel about how many rooms we actually need, I'll need a complete list of people who want to attend before 15 June, at the latest. So if you plan to attend and need a hotel room, let me know as soon as possible by emailing me! There are 40 hotel rooms reserved, and about half of those are claimed at this point. We are starting to have a preliminary list of topics ready. The event area has a projector and A/V equipment available, so we should be able to show slides for those wanting to present a particular topic. This is the current list of topics (requester/presenter - topic):

* Andrew Beekhof or Ken Gaillot - New container "bundle" feature in Pacemaker
* Ken Gaillot - What would Pacemaker 1.2 or 2.0 look like?
* Ken Gaillot - Ideas for the OCF resource agent standard
* Klaus Wenninger - Recent work and future plans for SBD
* Chrissie Caulfield - knet and corosync 3
* Chris Feist (requestor) - kubernetes
* Chris Feist (requestor) - Multisite (QDevice/Booth)
* Madison Kelly - ScanCore and "Intelligent Availability"
* Kristoffer Gronlund, Ayoub Belarbi - Hawk, Cluster API and future plans

We also have Kai Wagner from the openATTIC team attending, and he has agreed to present openATTIC. For those who aren't familiar with it, openATTIC is a storage management tool with some support for managing things like LVM, DRBD and Ceph. I am also happy to say that Adam Spiers from the SUSE Cloud team will be attending the summit, and hopefully I can convince him to present their work on using Pacemaker with Openstack, the current state of Openstack HA and perhaps some of his future plans and wishes around HA. Keep adding topics to the list! We'll work out a rough schedule for the two days as the event draws nearer, but I'd hope to leave enough room for deeper discussions around the topics as we work through them. 
As a reminder, the plans for the summit are being collected at the Alteeve! planning wiki, here: http://plan.alteeve.ca/index.php/Main_Page Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby
Ken Gaillot writes: > Hi all, > > Pacemaker 1.1.17 will have a feature that people have occasionally asked > for in the past: the ability to start a node in standby mode. > > It will be controlled by an environment variable (set in > /etc/sysconfig/pacemaker, /etc/default/pacemaker, or wherever your > distro puts them): > > > # By default, nodes will join the cluster in an online state when they first > # start, unless they were previously put into standby mode. If this > variable is > # set to "standby" or "online", it will force this node to join in the > # specified state when starting. > # (experimental; currently ignored for Pacemaker Remote nodes) > # PCMK_node_start_state=default > > > As described, it will be considered experimental in this release, mainly > because it doesn't work with Pacemaker Remote nodes yet. However, I > don't expect any problems using it with cluster nodes. > > Example use cases: > > You want want fenced nodes to automatically start the cluster after a > reboot, so they contribute to quorum, but not run any resources, so the > problem can be investigated. You would leave > PCMK_node_start_state=standby permanently. > > You want to ensure a newly added node joins the cluster without problems > before allowing it to run resources. You would set this to "standby" > when deploying the node, and remove the setting once you're satisfied > with the node, so it can run resources at future reboots. > > You want a standby setting to last only until the next boot. You would > set this permanently to "online", and any manual setting of standby mode > would be overwritten at the next boot. > > Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who > contributed this feature as part of a project with EMC. One of those features that seem obvious in retrospect. Great addition, thanks to everyone involved! 
Cheers, Kristoffer > -- > Ken Gaillot > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?
Ken Gaillot writes: >>> I think it works differently: One task periodically reads its mailbox slot >>> for commands, and once a command was read, it's executed immediately. Only >> if >>> the read task does hang for a long time, the watchdog itself triggers a >> reset >>> (as SBD seems dead). So the delay is actually made from the sum of "write >>> delay", "read delay", "command execution". > > I think you're right when sbd uses shared-storage, but there is a > watchdog-only configuration that I believe digimer was referring to. > > With watchdog-only, the cluster will wait for the value of the > stonith-watchdog-timeout property before considering the fencing successful. I think there are some important distinctions to make, to clarify what SBD is and how it works:

* The original SBD model uses shared storage as its fencing mechanism (thus the name Shared-storage Based Death). When talking about watchdog-only SBD, a new mode only introduced in a fork of the SBD project, it would probably help avoid confusion to be explicit about that.
* Watchdog-only SBD relies on quorum to avoid split-brain or fence loops, and thus requires at least three nodes or an additional qdevice node. This is my understanding, correct me if I am wrong. Also, this disqualifies watchdog-only SBD from any of Digimer's setups since they are 2-node only, so that's probably something to be aware of in this discussion. ;)
* The watchdog fencing in SBD is not the primary fence mechanism when shared storage is available. In fact, it is an optional although strongly recommended component. [1]

[1]: We (as in SUSE) require use of a watchdog for supported configurations, but technically it is optional. -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
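[Editor's sketch of how the two SBD modes differ in configuration; the device path and timeout value are examples only.]

```shell
# /etc/sysconfig/sbd - shared-storage SBD (device path is an example):
SBD_DEVICE=/dev/disk/by-id/scsi-example-sbd
SBD_WATCHDOG_DEV=/dev/watchdog

# Watchdog-only SBD: leave SBD_DEVICE unset; quorum (3+ nodes or a
# qdevice) is then required to avoid split-brain or fence loops.
SBD_WATCHDOG_DEV=/dev/watchdog

# Cluster side for watchdog-only mode (crmsh syntax):
# crm configure property stonith-enabled=true stonith-watchdog-timeout=10s
```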
Re: [ClusterLabs] Antw: Surprising semantics of location constraints with INFINITY score
Jehan-Guillaume de Rorthais writes: > Hi, > >> >>> Kristoffer Grönlund schrieb am 11.04.2017 um 15:30 >> >>> in >> Nachricht <87lgr7kr64@suse.com>: >> > Hi all, >> > >> > I discovered today that a location constraint with score=INFINITY >> > doesn't actually restrict resources to running only on particular >> > nodes. From what I can tell, the constraint assigns the score to that >> > node, but doesn't change scores assigned to other nodes. So if the node >> > in question happens to be offline, the resource will be started on any >> > other node. > > AFAIU, this behavior is expected when you set up your cluster with the Opt-In > strategy: > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_deciding_which_nodes_a_resource_can_run_on > No, this is the behavior of an Opt-Out cluster. So it seems you are under the same misconception as I was. :) Cheers, Kristoffer > -- > Jehan-Guillaume de Rorthais > Dalibo > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[ClusterLabs] Surprising semantics of location constraints with INFINITY score
Hi all, I discovered today that a location constraint with score=INFINITY doesn't actually restrict resources to running only on particular nodes. From what I can tell, the constraint assigns the score to that node, but doesn't change scores assigned to other nodes. So if the node in question happens to be offline, the resource will be started on any other node. Example: If node2 is offline, I see the following: dummy (ocf::heartbeat:Dummy): Started node1 native_color: dummy allocation score on node1: 1 native_color: dummy allocation score on node2: -INFINITY native_color: dummy allocation score on webui: 0 It makes some kind of sense, but seems surprising - and the documentation is a bit unclear on the topic. In particular, the statement that a score = INFINITY means "must" is clearly not correct in this case. Maybe the documentation should be clarified for location constraints? -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
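[Editor's illustration of the behavior described above, using the node names from the example; crmsh syntax, an opt-out (symmetric) cluster assumed.]

```shell
# Raises dummy's score on node1 only; the other nodes keep their
# scores, so dummy can still run elsewhere if node1 is offline:
crm configure location l-dummy-node1 dummy inf: node1

# To actually restrict dummy to node1, ban the other nodes explicitly:
crm configure location l-dummy-not-node2 dummy -inf: node2
crm configure location l-dummy-not-webui dummy -inf: webui

# ...or make the whole cluster opt-in, so resources run nowhere
# without a positive location constraint:
crm configure property symmetric-cluster=false
```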
Re: [ClusterLabs] Antw: Re: Rename option group resource id with pcs
Ulrich Windl writes: >>>> Dejan Muhamedagic wrote on 11.04.2017 at 11:43 in > message <20170411094352.GD8414@tuttle.homenet>: >> Hi, >> >> On Tue, Apr 11, 2017 at 10:50:56AM +0200, Tomas Jelinek wrote: >>> On 11.4.2017 at 08:53, SAYED, MAJID ALI SYED AMJAD ALI wrote: >>> >Hello, >>> > >>> >Is there any option in pcs to rename group resource id? >>> > >>> >>> Hi, >>> >>> No, there is not. >>> >>> Pacemaker doesn't really cover the concept of renaming a resource. >> >> Perhaps you can check how crmsh does resource rename. It's not >> impossible, but can be rather involved if there are other objects >> (e.g. constraints) referencing the resource. Also, crmsh will >> refuse to rename the resource if it's running. > > The real problem in pacemaker (as resources are created now) is that the > "IDs" have too much semantic, i.e. most are derived from the resource name > (while lacking a name attribute or element), and some required elements are > accessed by ID, and not by name. > > Examples: > > <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.12-f47ea56"/> > > <primitive>s and <group>s have no name, but only an ID (it seems). > > This is redundant: As the <op> is part of a resource (by XML structure) it's > unnecessary to put the name of the resource into the ID of the operation. > > It all looks like a kind of abuse of XML IMHO. I think the next CIB format > should be able to handle IDs that are free of semantics other than to denote > (relatively unique) identity. That is: It should be OK to assign IDs like > "i1", "i2", "i3", ... and besides from an IDREF the elements should be > accessed by structure and/or name. > > (If the ID should be the primary identification feature, flatten all > structure and drop all (redundant) names.) The abuse of ids in the pacemaker schema is a pet peeve of mine; it would be better to only have ids for nodes where it makes sense: Naming resources, for example (though I would prefer human-friendly names rather than ids with loosely defined restrictions). 
References to individual XML nodes can be done via XPath rather than having to assign ids to every single node in the tree. Of course, changing it at this point is probably not worth the trouble. Cheers, Kristoffer > > Regards, > Ulrich > >> >> Thanks, >> >> Dejan >> >>> From >>> pacemaker's point of view one resource gets removed and another one gets >>> created. >>> >>> This has been discussed recently: >>> http://lists.clusterlabs.org/pipermail/users/2017-April/005387.html >>> >>> Regards, >>> Tomas
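As a concrete illustration of addressing CIB nodes by structure: pacemaker's cibadmin already accepts XPath expressions, so an element can be located without spelling out every generated id along the way. A sketch, with a hypothetical resource name:

```shell
# Query the monitor operation of resource "r_ip" by structure; the op's
# own generated id (e.g. "r_ip-monitor-10") never has to be written out.
cibadmin --query --xpath '//primitive[@id="r_ip"]/operations/op[@name="monitor"]'

# Modify through the same XPath instead of an id reference:
cibadmin --modify --xpath '//primitive[@id="r_ip"]' \
    --xml-text '<primitive description="virtual IP"/>'
```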
Re: [ClusterLabs] Can't See Why This Cluster Failed Over
Eric Robinson writes: >> crm configure show xml c_clust19 > > Here is what I am entering using crmsh (version 2.0-1): > > > colocation c_clust19 inf: [ p_mysql_057 p_mysql_092 p_mysql_187 ] > p_vip_clust19 p_fs_clust19 p_lv_on_drbd0 ms_drbd0:Master > order o_clust19 inf: ms_drbd0:promote p_lv_on_drbd0 p_fs_clust19 > p_vip_clust19 [ p_mysql_057 p_mysql_092 p_mysql_187 ] > > > After I save it, I get no errors, but it converts it to this... > > > colocation c_clust19 inf: [ p_mysql_057 p_mysql_092 p_mysql_187 ] ( > p_vip_clust19:Master p_fs_clust19:Master p_lv_on_drbd0:Master ) ( > ms_drbd0:Master ) > order o_clust19 inf: ms_drbd0:promote ( p_lv_on_drbd0:start > p_fs_clust19:start p_vip_clust19:start ) [ p_mysql_057 p_mysql_092 > p_mysql_187 ] > > This looks incorrect to me. > > Here is the xml that it generates. > [...] > The resources in set c_clust19-1 should start sequentially, starting with > p_lv_on_drbd0 and ending with p_vip_clust19. I also don't understand why > p_lv_on_drbd0 and p_vip_clust19 are getting the Master designation. Hi, Yeah, that does indeed look like a bug. One thing that is confusing, and may be one reason why things get split in an unexpected way: as you can see, the role attribute is applied per resource set in the XML, while in the crmsh syntax it looks like it applies per resource. So the shell does some complex logic to "split" sets based on role assignment. Cheers, Kristoffer > > -- > Eric Robinson > > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Can't See Why This Cluster Failed Over
Eric Robinson writes: > Here's the config. I don't know why the CRM put in the parenthesis where it > did. That's not the way I typed it. I usually have all my mysql instances > between parenthesis and everything else outside. [ ...] > colocation c_clust19 inf: ( p_mysql_057 p_mysql_092 p_mysql_187 p_mysql_213 > p_mysql_250 p_mysql_289 p_mysql_312 p_vip_clust19 p_mysql_702 p_mysql_743 > p_mysql_745 p_mysql_746 p_fs_clust19 p_lv_on_drbd0 ) ( ms_drbd0:Master ) > colocation c_clust20 inf: p_vip_clust20 p_fs_clust20 p_lv_on_drbd1 > ms_drbd1:Master > order o_clust19 inf: ms_drbd0:promote ( p_lv_on_drbd0:start ) ( p_fs_clust19 > p_vip_clust19 ) ( p_mysql_057 p_mysql_092 p_mysql_187 p_mysql_213 p_mysql_250 > p_mysql_289 p_mysql_312 p_mysql_702 p_mysql_743 p_mysql_745 p_mysql_746 ) This might be a bug in crmsh: What was the expression you intended to write, and which version of crmsh do you have? You can see the resulting XML that crmsh generates and then re-parses into the line syntax using crm configure show xml c_clust19 Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
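For reference, a hedged reconstruction of the constraints the poster seems to intend (the mysql primitives as one parallel set in brackets, everything else sequential; resource names taken from the thread):

```shell
# Parallel set in brackets, the rest ordered sequentially after the promote:
crm configure order o_clust19 inf: ms_drbd0:promote p_lv_on_drbd0:start \
    p_fs_clust19:start p_vip_clust19:start \
    [ p_mysql_057 p_mysql_092 p_mysql_187 ]
crm configure colocation c_clust19 inf: \
    [ p_mysql_057 p_mysql_092 p_mysql_187 ] \
    p_vip_clust19 p_fs_clust19 p_lv_on_drbd0 ms_drbd0:Master
```

Comparing the XML that crmsh generates for this (via `crm configure show xml`) against the intent is the quickest way to see where the set splitting goes wrong.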
[ClusterLabs] Clusterlabs Summit 2017 - Dates, hotels and other updates
Hi everyone! In case anyone missed the previous emails on this topic, I am working on arranging another Clusterlabs (formerly Linux HA) Summit, this time in Nuremberg, Germany. Although I know it presents a bit of a problem for some people, we have now decided on a pair of dates for the summit. The unfortunate reality is that Nuremberg is very busy during the summer, and finding dates that offer both a location for the summit and sufficient hotel rooms still available presents a challenge. The summit will take place on September 6-7, 2017, in the brand new SUSE office event area. This page has instructions for how to get there: https://www.suse.com/company/contact/headquarters/ For anyone flying in, I can recommend flying to Frankfurt and taking the train from there. The flight from Frankfurt to Nuremberg is only 30 minutes, so the wait between flights plus the flight often takes about as long as the train. HOTELS We have 40 hotel rooms reserved at the Sorat Saxx Hotel at a very good rate including breakfast and wifi for the duration of the conference week (September 4-8). If you are interested in grabbing one of these rooms, please let me know at kgronl...@suse.com before July 10 at the latest, as we need a complete list of names to give to the hotel before the conference starts. (More than half of the rooms are already spoken for, so let me know ASAP.) https://www.sorat-hotels.com/en/hotel/saxx-nuernberg.html If you plan to book your own accommodation, make sure you do so as soon as possible. The hotels in Nuremberg tend to fill up early for the summer season. ATTENDEE LIST If you plan to attend, please put your name on the planning wiki! That way, we have a chance to make sure that there's enough coffee for everyone. ;) http://plan.alteeve.ca/index.php/Main_Page CALL FOR TOPICS For those attending, now is a good time to start thinking about topics we might cover!
If anyone would like to present something, we will have access to basic equipment like a projector and so on. Again, putting it on the wiki is the best place for suggestions or opinions as well. Thank you, and hope to see you there! -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: question about ocf metadata actions
Ulrich Windl writes: > I thought the hierarchy is like this: > 1) default timeout > 2) RA's default timeout > 3) user-specified timeout > > So crm would go from 1) to 3) taking the last value it finds. Isn't it like > that? No, step 2) is not taken by crm. > I mean if there's no timeout in the resource configuration, doesn't the RM use > the default timeout? Yes, it then uses the timeout defined in op_defaults: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-operation-defaults Cheers, Kristoffer > > Regards, > Ulrich > >> >> https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra > >> -dev-guide.asc#_metadata >> >>> Every action should list its own timeout value. This is a hint to the >>> user what minimal timeout should be configured for the action. This is >>> meant to cater for the fact that some resources are quick to start and >>> stop (IP addresses or filesystems, for example), some may take several >>> minutes to do so (such as databases). >> >>> In addition, recurring actions (such as monitor) should also specify a >>> recommended minimum interval, which is the time between two >>> consecutive invocations of the same action. Like timeout, this value >>> does not constitute a default — it is merely a hint for the user which >>> action interval to configure, at minimum.
>> >> Cheers, >> Kristoffer >> >>> >>> Br, >>> >>> Allen >>> ___ >>> Users mailing list: Users@clusterlabs.org >>> http://lists.clusterlabs.org/mailman/listinfo/users >>> >>> Project Home: http://www.clusterlabs.org >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >> >> -- >> // Kristoffer Grönlund >> // kgronl...@suse.com >> >> ___ >> Users mailing list: Users@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
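The precedence discussed above can be sketched as follows: the RA's metadata timeout is only a hint, op_defaults fills in when nothing is specified, and an explicit per-operation timeout wins (resource name and values hypothetical):

```shell
# 1) Cluster-wide default, used when an operation has no explicit timeout:
crm configure op_defaults timeout=60s

# 2) Per-operation timeouts, which override op_defaults for this resource:
crm configure primitive p_db ocf:heartbeat:mysql \
    op start timeout=120s \
    op monitor interval=30s timeout=30s
```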
Re: [ClusterLabs] Stonith
Alexander Markov writes: > Hello, Kristoffer > >> Did you test failover through pacemaker itself? > > Yes, I did, no problems here. > >> However: Am I understanding it correctly that you have one node in each >> data center, and a stonith device in each data center? > > Yes. > >> If the >> data center is lost, the stonith device for the node in that data >> center >> would also be lost and thus not able to fence. > > Exactly what happens! > >> In such a hardware configuration, only a poison pill solution like SBD >> could work, I think. > > I've got no shared storage here. Every datacenter has its own storage > and they have replication on top (similar to drbd). I can organize a > cross-shared solution though if it help, but don't see how. The only solution I know which allows for a configuration like this is using separate clusters in each data center, and using booth for transferring ticket ownership between them. Booth requires a data center-level quorum (meaning at least 3 locations), though the third location can be just a small daemon without an actual cluster, and can run in a public cloud or similar for example. Cheers, Kristoffer > >> -- >> Regards, >> Alexander > > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
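A minimal booth layout for the two-data-center case above might look like this (all addresses and the ticket name are hypothetical); the arbitrator is the small third-location daemon mentioned, which needs no cluster of its own:

```shell
# The same booth.conf goes on both sites and the arbitrator:
cat > /etc/booth/booth.conf <<'EOF'
transport = UDP
port = 9929
site = 192.168.201.100
site = 192.168.202.100
arbitrator = 192.168.203.100
ticket = "ticket-db"
EOF
```

Resources in each cluster are then tied to the ticket with ticket constraints, so only the ticket holder runs them.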
Re: [ClusterLabs] question about ocf metadata actions
he.hailo...@zte.com.cn writes: > Hi, > > > Does the timeout configured in the ocf metadata actually take effect? > > > > > <actions> > > <action name="start" timeout="300s" /> > > <action name="stop" timeout="200s" /> > > <action name="status" timeout="20s" /> > > <action name="monitor" depth="0" timeout="20s" interval="2s" /> > > <action name="meta-data" timeout="120s" /> > > <action name="validate-all" timeout="20s" /> > > </actions> > > > > > what's the relationship with the ones configured using "crm configure > primitive" ? Hi Allen, The timeouts in the OCF metadata are merely documentation hints, and ignored by Pacemaker unless configured appropriately in the CIB (which is what crm configure primitive does). See the OCF documentation: https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#_metadata > Every action should list its own timeout value. This is a hint to the > user what minimal timeout should be configured for the action. This is > meant to cater for the fact that some resources are quick to start and > stop (IP addresses or filesystems, for example), some may take several > minutes to do so (such as databases). > In addition, recurring actions (such as monitor) should also specify a > recommended minimum interval, which is the time between two > consecutive invocations of the same action. Like timeout, this value > does not constitute a default — it is merely a hint for the user which > action interval to configure, at minimum. 
Cheers, Kristoffer > > Br, > > Allen > _______ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
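To actually enforce the hinted values, they have to be repeated in the CIB when the resource is configured, e.g. (the resource and agent here are placeholders, with the timeouts from the metadata in the thread):

```shell
# The metadata above only *suggests* these timeouts; this makes them real:
crm configure primitive p_example ocf:heartbeat:Dummy \
    op start timeout=300s \
    op stop timeout=200s \
    op monitor interval=2s timeout=20s
```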
Re: [ClusterLabs] Stonith
Alexander Markov writes: > Hello guys, > > it looks like I miss something obvious, but I just don't get what has > happened. > > I've got a number of stonith-enabled clusters within my big POWER boxes. > My stonith devices are two HMCs (hardware management consoles) - separate > servers from IBM that can reboot separate LPARs (logical partitions) > within POWER boxes - one per every datacenter. > > So my definition for stonith devices was pretty straightforward: > > primitive st_dc2_hmc stonith:ibmhmc \ > params ipaddr=10.1.2.9 > primitive st_dc1_hmc stonith:ibmhmc \ > params ipaddr=10.1.2.8 > clone cl_st_dc2_hmc st_dc2_hmc > clone cl_st_dc1_hmc st_dc1_hmc > > Everything was ok when we tested failover. But today upon power outage Did you test failover through pacemaker itself? Otherwise, the logs for the attempted stonith should reveal more about how Pacemaker tried to call the stonith device, and what went wrong. However: Am I understanding it correctly that you have one node in each data center, and a stonith device in each data center? That doesn't sound like a setup that can recover from data center failure: If the data center is lost, the stonith device for the node in that data center would also be lost and thus not able to fence. In such a hardware configuration, only a poison pill solution like SBD could work, I think. Cheers, Kristoffer > we lost one DC completely. Shortly after that the cluster just literally > hung itself upon trying to reboot the nonexistent node. No failover > occurred. The nonexistent node was marked OFFLINE UNCLEAN and resources were > marked "Started UNCLEAN" on the nonexistent node. > > UNCLEAN seems to flag a problem with the stonith configuration. So my > question is: how to avoid such behaviour? > > Thank you!
> > -- > Regards, > Alexander > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
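One placement detail worth noting for HMC-style setups (node names hypothetical): each stonith resource can be kept off the node it is meant to fence, so that the surviving node is the one running the device that fences its peer. This helps the single-node-failure case, but, as discussed above, does not fix the lost-data-center case:

```shell
crm configure primitive st_dc1_hmc stonith:ibmhmc params ipaddr=10.1.2.8
crm configure primitive st_dc2_hmc stonith:ibmhmc params ipaddr=10.1.2.9
# Assuming st_dc1_hmc is the device able to fence node-dc1, and so on:
crm configure location l_st_dc1 st_dc1_hmc -inf: node-dc1
crm configure location l_st_dc2 st_dc2_hmc -inf: node-dc2
```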
Re: [ClusterLabs] Fence agent for VirtualBox
Marek Grac writes: > Hi, > > we have added support for a host with Windows but it is not trivial to > setup because of various contexts/privileges. > > Install openssh on Windows (tutorial can be found on > http://linuxbsdos.com/2015/07/30/how-to-install-openssh-on-windows-10/) > > There is a major issue with current setup in Windows. You have to start > virtual machines from openssh connection if you wish to manage them from > openssh connection. > > So, you have to connect from Windows to very same Windows using ssh and > then run > > “/Program Files/Oracle/VirtualBox/VBoxManage.exe” start NAME_OF_VM > > Be prepared that you will not see that your machine VM is running in > VirtualBox > management UI. > > Afterwards it is enough to add parameter --host-os windows (or > host_os=windows when stdin/pcs is used). > Cool, nice work! Cheers, Kristoffer > m, > > On Wed, Feb 22, 2017 at 11:49 AM, Marek Grac wrote: > >> Hi, >> >> I have updated fence agent for Virtual Box (upstream git). The main >> benefit is new option --host-os (host_os on stdin) that supports >> linux|macos. So if your host is linux/macos all you need to set is this >> option (and ssh access to a machine). I would love to add a support also >> for windows but I'm not able to run vboxmanage.exe over the openssh. It >> works perfectly from command prompt under same user, so there are some >> privileges issues, if you know how to fix this please let me know. 
>> >> m, >> > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
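Configured as a cluster resource, the Windows-host variant described above might look roughly like this; the address, credentials, and VM name are placeholders, and the parameter names are assumed to match the agent's stdin options:

```shell
crm configure primitive st_vbox stonith:fence_vbox \
    params ipaddr=192.168.56.1 login=vboxuser \
    identity_file=/root/.ssh/id_rsa \
    host_os=windows plug=node1 \
    pcmk_host_list=node1
```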
Re: [ClusterLabs] crm shell RA completion
Ulrich Windl writes: > Hi! > > I have a proposal for crm shell's RA completion: When pressing TAB after "ra > info", crm shell suggests a long list of RAs. Wouldn't it preferable to > complete only up to the next ':'? > > Consider this: > crm(live)# ra info > Display all 402 possibilities? (y or n)n > crm(live)# ra info ocf: > Display all 101 possibilities? (y or n)n > crm(live)# ra info ocf:heartbeat: > (a long list is displayed) > > So at the first level not all 402 RAs should be suggested but only the first > level (like "ocf"), and at the second level not all 101 completions should be > suggested, but only a few (like "heartbeat"). > > What do you think? Sounds good to me, yes. The completion is a bit wonky and tricky to get right. Still a work in progress. Cheers, Kristoffer > > Regards, > Ulrich > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] question about equal resource distribution
Ilia Sokolinski writes: > Suppose I have a N node cluster where N > 2 running m*N resources. Resources > don’t have preferred nodes, but since resources take RAM and CPU it is > important to distribute them equally among the nodes. > Will pacemaker do the equal distribution, e.g. m resources per node? > If a node fails, will pacemaker redistribute the resources equally too, e.g. > m * N/(N-1) per node? > > I don’t see any settings controlling this behavior in the documentation, but > perhaps, pacemaker tries to be “fair” by default. > Yes, pacemaker tries to allocate resources evenly by default, and will move resources when nodes fail in order to maintain that. There are several different mechanisms that influence this behaviour: * Any placement constraints in general influence where resources are allocated. * You can set resource-stickiness to a non-zero value which determines to which degree Pacemaker prefers to leave resources running where they are. The score is in relation to other placement scores, like constraint scores etc. This can be set for individual resources or globally. 
[1] * If you have an asymmetrical cluster, resources have to be manually allocated to nodes via constraints, see [2] [1]: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options [2]: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters Cheers, Kristoffer > Thanks > > Ilia Sokolinski > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
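A small example of the stickiness setting mentioned in the second point above (the values are arbitrary; they are weighed against other placement scores):

```shell
# Global default: resources prefer to stay where they are once started
crm configure rsc_defaults resource-stickiness=100
# Or per resource:
crm configure primitive p_web ocf:heartbeat:apache \
    meta resource-stickiness=200
```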
Re: [ClusterLabs] question about equal resource distribution
Ilia Sokolinski writes: > Thank you! > > What quantity does pacemaker tries to equalize - number of running resources > per node or total stickiness per node? > I honestly don't know exactly what the criteria are. Without any utilization definitions for nodes, I *think* it tries to balance the number of resources per node. But if the resources and nodes have cpu/memory utilization defined, the rules change. But I'm afraid I haven't dug into exactly what the logic looks like. > Suppose I have a bunch of web server groups each with IPaddr and apache > resources, and a fewer number of database groups each with IPaddr, postgres > and LVM resources. > > In that case, does it mean that 3 web server groups are weighted the same as > 2 database groups in terms of distribution? Good question, I think it looks purely at the primitive resources. Groups are just shorthand for a series of ordering and placement constraints. Cheers, Kristoffer > > Ilia > > > >> On Feb 17, 2017, at 2:58 AM, Kristoffer Grönlund >> wrote: >> >> Ilia Sokolinski writes: >> >>> Suppose I have a N node cluster where N > 2 running m*N resources. >>> Resources don’t have preferred nodes, but since resources take RAM and CPU >>> it is important to distribute them equally among the nodes. >>> Will pacemaker do the equal distribution, e.g. m resources per node? >>> If a node fails, will pacemaker redistribute the resources equally too, >>> e.g. m * N/(N-1) per node? >>> >>> I don’t see any settings controlling this behavior in the documentation, >>> but perhaps, pacemaker tries to be “fair” by default. >>> >> >> Yes, pacemaker tries to allocate resources evenly by default, and will >> move resources when nodes fail in order to maintain that. >> >> There are several different mechanisms that influence this behaviour: >> >> * Any placement constraints in general influence where resources are >> allocated. 
>> >> * You can set resource-stickiness to a non-zero value which determines >> to which degree Pacemaker prefers to leave resources running where >> they are. The score is in relation to other placement scores, like >> constraint scores etc. This can be set for individual resources or >> globally. [1] >> >> * If you have an asymmetrical cluster, resources have to be manually >> allocated to nodes via constraints, see [2] >> >> [1]: >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options >> [2]: >> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters >> >> Cheers, >> Kristoffer >> >>> Thanks >>> >>> Ilia Sokolinski >>> ___ >>> Users mailing list: Users@clusterlabs.org >>> http://lists.clusterlabs.org/mailman/listinfo/users >>> >>> Project Home: http://www.clusterlabs.org >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >> >> -- >> // Kristoffer Grönlund >> // kgronl...@suse.com > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
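When utilization is defined, placement switches from counting resources to capacity accounting; a sketch of that configuration (names and numbers hypothetical):

```shell
# Declare node capacity and resource demand, then let the placement
# strategy balance by remaining capacity instead of resource count:
crm configure node node1 utilization cpu=8 memory=16384
crm configure primitive p_big ocf:heartbeat:Dummy \
    utilization cpu=2 memory=4096
crm configure property placement-strategy=balanced
```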
Re: [ClusterLabs] resources management - redesign
Hi Florin, I'm afraid I don't quite understand what it is that you are asking. You can specify the resource ID when creating resources, and using resource constraints, you can specify any order/colocation structure that you need. > 1. RG = rg1 + following resources: fs1, fs2, fs3, ocf:heartbeat[my custom > systemd script] What do you mean by ocf:heartbeat[my custom systemd script]? If you've got your own service with a systemd service file and you don't need custom monitoring, you can use "systemd:" as the resource agent. > Now, what solution exists ? export cib, edit cib and re-import cib; > what if I will need a new fs:fs4, so what: export cib, create new > resource inside exported cib and re-import it. One way to make large changes to the configuration is to 1. Stop all resources: crm configure property stop-all-resources=true 2. Edit the configuration to what you need: crm configure edit 3. Start all resources: crm configure property stop-all-resources=false You might have some success in keeping services running during editing by using maintenance-mode=true instead, but that takes a lot more care and is difficult to recommend in the general case. It is also possible to use the shadow CIB facility to simulate changes to the cluster before applying them: http://clusterlabs.org/man/pacemaker/crm_simulate.8.html There's some documentation on using Hawk with the simulator which is already outdated but might be of some help in figuring out what is possible: https://hawk-guide.readthedocs.io/en/latest/simulator.html Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com
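The shadow CIB workflow mentioned above, inside the interactive crm shell (the shadow name is hypothetical):

```shell
crm
# inside the crm shell:
cib new sandbox      # work on a shadow copy; the live CIB is untouched
configure edit       # make the large edit here
cib diff             # review the changes against the live CIB
cib commit sandbox   # apply everything to the live cluster at once
```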
Re: [ClusterLabs] Antw: Re: crm shell: How to display properties?
Ulrich Windl writes: >>>> xin wrote on 06.02.2017 at 10:50 in message > <65fbbdf9-f820-63e7-fe02-1d1acefc5...@suse.com>: >> Hi Ulrich: >> >> "crm configure show" can display what you set for properties. >> >> Do you find another way? > > Yes, but it shows the whole configuration. If your configuration is long, the > output can be very long. > What I'm talking about is: > crm(live)configure# show property > ERROR: object property does not exist > crm(live)configure# show pe-error-series-max > ERROR: object pe-error-series-max does not exist > > But I found out: This one works: "crm(live)configure# show > cib-bootstrap-options". > You can also use crm configure show type:property If you follow the *-options naming convention, you can do crm configure show \*options Cheers, Kristoffer > Regards, > Ulrich > >> >> On 2017-02-06 17:12, Ulrich Windl wrote: >>>>>> Ken Gaillot wrote on 02.02.2017 at 21:19 in message >>> : >>> >>> [...] >>>> The files are not necessary for cluster operation, so you can clean them >>>> as desired. The cluster can clean them for you based on cluster options; >>>> see pe-error-series-max, pe-warn-series-max, and pe-input-series-max: >>> [...] >>> >>> Related question: >>> in crm shell I can set properties in configure context ("property ..."), but >> how can I display them (except from looking at the end of a "show")?
>>> >>> Regards, >>> Ulrich >>> >>> >>> >>> ___ >>> Users mailing list: Users@clusterlabs.org >>> http://lists.clusterlabs.org/mailman/listinfo/users >>> >>> Project Home: http://www.clusterlabs.org >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >>> >> >> >> ___ >> Users mailing list: Users@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] fence_vbox '--action=' not executing action
dur...@mgtsciences.com writes: > Kristoffer Grönlund wrote on 02/01/2017 10:49:54 PM: > >> >> Another possibility is that the command that fence_vbox tries to run >> doesn't work for you for some reason. It will either call >> >> VBoxManage startvm --type headless >> >> or >> >> VBoxManage controlvm poweroff >> >> when passed on or off as the --action parameter. > > If there is no further work being done on fence_vbox, is there a 'dummy' > fence > which I might use to make STONITH happy in my configuration? It need only > send > the correct signals to STONITH so that I might create an active/active > cluster > to experiment with? This is only an experimental configuration. > Another option would be to use SBD for fencing if your hypervisor can provide uncached shared storage: https://github.com/ClusterLabs/sbd This is what we usually use for our test setups here, both with VirtualBox and qemu/kvm. fence_vbox is actively maintained for sure, but we'd need to narrow down what the correct changes would be to make it work in your environment. Trying to use a dummy fencing agent is likely to come back to bite you, the cluster will act very unpredictably if it thinks that there is a fencing option that doesn't actually work. For fence_vbox, the best path forward is probably to create an issue upstream, and attach as much relevant information about your environment as possible: https://github.com/ClusterLabs/fence-agents/issues/new Cheers, Kristoffer > Thank you, > > Durwin > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
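A rough outline of an SBD setup as suggested above (the device path is hypothetical); the device must be small, shared, uncached storage visible to every node:

```shell
# Initialize the shared device once, from any node:
sbd -d /dev/disk/by-id/scsi-sbd0 create
# Point the sbd daemon at it (SBD_DEVICE in /etc/sysconfig/sbd), then:
crm configure primitive st_sbd stonith:external/sbd
crm configure property stonith-enabled=true
```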
Re: [ClusterLabs] fence_vbox '--action=' not executing action
dur...@mgtsciences.com writes: > I have 2 Fedora 24 Virtualbox machines running on Windows 10 host. On the > host from DOS shell I can start 'node1' with, > > VBoxManage.exe startvm node1 --type headless > > I can shut it down with, > > VBoxManage.exe controlvm node1 acpipowerbutton > > But running fence_vbox from 'node2' does not work correctly. Below are > two commands and the output. First action is 'status' second action is > 'off'. The both get list of running nodes, but 'off' does *not* shutdown > or kill the node. > > Any ideas? I haven't tested with Windows as the host OS for fence_vbox (I wrote the initial implementation of the agent). My guess from looking at your usage is that passing "cmd" to --ssh-options might not be sufficient to get it to work in that environment, but I have no idea what the right arguments might be. Another possibility is that the command that fence_vbox tries to run doesn't work for you for some reason. It will either call VBoxManage startvm --type headless or VBoxManage controlvm poweroff when passed on or off as the --action parameter. Cheers, Kristoffer > > Thank you, > > Durwin > > > 02:04 PM root@node2 ~ > fc25> fence_vbox --verbose --ip=172.23.93.249 --username=durwin > --identity-file=/root/.ssh/id_rsa.pub --password= --plug="node1" > --ssh-options="cmd" --command-prompt='>' --login-timeout=10 > --shell-timeout=20 --action=status > Running command: /usr/bin/ssh durwin@172.23.93.249 -i > /root/.ssh/id_rsa.pub -p 22 cmd > Received: Enter passphrase for key '/root/.ssh/id_rsa.pub': > Sent: > > Received: > stty: 'standard input': Inappropriate ioctl for device > Microsoft Windows [Version 10.0.14393] > (c) 2016 Microsoft Corporation. All rights reserved. 
> > D:\home\durwin> > Sent: VBoxManage list runningvms > > Received: VBoxManage list runningvms > VBoxManage list runningvms > > D:\home\durwin> > Sent: VBoxManage list vms > > Received: VBoxManage list vms > VBoxManage list vms > "node2" {14bff1fe-bd26-4583-829d-bc3a393b2a01} > "node1" {5a029c3c-4549-48be-8e80-c7a67584cd98} > > D:\home\durwin> > Status: OFF > Sent: quit > > > > 02:05 PM root@node2 ~ > fc25> fence_vbox --verbose --ip=172.23.93.249 --username=durwin > --identity-file=/root/.ssh/id_rsa.pub --password= --plug="node1" > --ssh-options="cmd" --command-prompt='>' --login-timeout=10 > --shell-timeout=20 --action=off > Delay 0 second(s) before logging in to the fence device > Running command: /usr/bin/ssh durwin@172.23.93.249 -i > /root/.ssh/id_rsa.pub -p 22 cmd > Received: Enter passphrase for key '/root/.ssh/id_rsa.pub': > Sent: > > Received: > stty: 'standard input': Inappropriate ioctl for device > Microsoft Windows [Version 10.0.14393] > (c) 2016 Microsoft Corporation. All rights reserved. > > D:\home\durwin> > Sent: VBoxManage list runningvms > > Received: VBoxManage list runningvms > VBoxManage list runningvms > > D:\home\durwin> > Sent: VBoxManage list vms > > Received: VBoxManage list vms > VBoxManage list vms > "node2" {14bff1fe-bd26-4583-829d-bc3a393b2a01} > "node1" {5a029c3c-4549-48be-8e80-c7a67584cd98} > > D:\home\durwin> > Success: Already OFF > Sent: quit > > > Durwin F. De La Rue > Management Sciences, Inc. > 6022 Constitution Ave. NE > Albuquerque, NM 87110 > Phone (505) 255-8611 > > > This email message and any attachments are for the sole use of the > intended recipient(s) and may contain proprietary and/or confidential > information which may be privileged or otherwise protected from > disclosure. Any unauthorized review, use, disclosure or distribution is > prohibited. 
If you are not the intended recipient(s), please contact the > sender by reply email and destroy the original message and any copies of > the message as well as any attachments to the original message. > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] How to change the name of one cluster resource and resource group ?
Jihed M'selmi writes: > Thanks for reply, > I don't have crm command. It's corosync version 2.3.4.el7_2.1. > crmsh is a separate project; you can install it in parallel with corosync/pacemaker. There are packages on OBS: http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-7/ Otherwise, if you have pcs, it should have something similar to crm configure rename. Cheers, Kristoffer > On Wed, Feb 1, 2017, 3:38 PM Kristoffer Grönlund wrote: > >> Jihed M'selmi writes: >> >> > Hello, >> > >> > I need to update the name of one resource group with a new name. Any >> thoughts? >> > >> >> crmsh has the crm configure rename command, which tries to update any >> constraint references atomically as well. >> >> Cheers, >> Kristoffer >> >> > Cheers, >> > JM >> > -- >> > >> > J.M >> > ___ >> > Users mailing list: Users@clusterlabs.org >> > http://lists.clusterlabs.org/mailman/listinfo/users >> > >> > Project Home: http://www.clusterlabs.org >> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> > Bugs: http://bugs.clusterlabs.org >> >> -- >> // Kristoffer Grönlund >> // kgronl...@suse.com >> > -- > > J.M -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
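For reference, the rename command discussed in this thread is used roughly like this (the resource IDs are made up for illustration):

```shell
# Rename a resource or resource group; crmsh also tries to update
# any constraints that reference the old ID.
crm configure rename old-group-id new-group-id

# Review the resulting configuration:
crm configure show
```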
Re: [ClusterLabs] How to change the name of one cluster resource and resource group ?
Jihed M'selmi writes: > Hello, > > I need to update the name of one resource group with a new name. Any thoughts? > crmsh has the crm configure rename command, which tries to update any constraint references atomically as well. Cheers, Kristoffer > Cheers, > JM > -- > > J.M > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal
Chris Feist writes: > On Mon, Jan 30, 2017 at 8:23 AM, Kristoffer Grönlund > wrote: > >> Hi everyone! >> >> The last time we had an HA summit was in 2015, and the intention then >> was to have SUSE arrange the next meetup in the following year. We did >> try to find a date that would be suitable for everyone, but for various >> reasons there was never a conclusion and 2016 came and went. >> >> Well, I'd like to give it another try this year! This time, I've already >> got a proposal for a place and date: September 7-8 in Nuremberg, Germany >> (SUSE main office). I've got the new event area in the SUSE office >> already reserved for these dates. >> >> My suggestion is to do a two day event similar to the one in Brno, but I >> am open to any suggestions as to format and content. The main reason for >> having the event would be for everyone to have a chance to meet and get >> to know each other, but it's also an opportunity to discuss the future >> of Clusterlabs and the direction going forward. >> >> Any thoughts or feedback are more than welcome! Let me know if you are >> interested in coming or unable to make it. >> > > Kristoffer, > > Thank you for getting some dates and providing a space for the summit. I > know myself and several cluster engineers from Red Hat are definitely > interested in attending. The only thing that I might recommend is moving > the conference one day earlier (change to Wed/Thu instead of Thu/Fri) to > make it easier for people traveling to/from the conference. Hi Chris, Sounds great! Happy to move it to September 6-7 if that works out better. Cheers, Kristoffer > > Thanks! 
> Chris > > >> >> Cheers, >> Kristoffer >> >> -- >> // Kristoffer Grönlund >> // kgronl...@suse.com >> >> ___ >> Developers mailing list >> develop...@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/developers >> > ___ > Developers mailing list > develop...@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/developers -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Antw: Re: Antw: Colocations and Orders Syntax Changed?
Ulrich Windl writes: >>>> Eric Robinson schrieb am 20.01.2017 um 12:56 in > Nachricht > > >> Thanks for the input. I usually just do a 'crm config show > >> myfile.xml.date_time' and then read it back in if I need to. > > I guess 'crm configure show xml > myfile.xml.date_time', because here I get > "ERROR: config: No such command" and no XML... ;-) > > Actually I'm using "cibadmin -Q -o configuration", because I think it's > faster... If you use a more recent version of crmsh, "crm config show" will actually work as well, thanks to some fuzzy command matching ;) (though to get XML you still need the xml argument) Cheers, Kristoffer > > Regards, > Ulrich > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
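The two ways of dumping the configuration that this thread compares look like this (the backup file name is arbitrary):

```shell
# Dump the CIB configuration as XML via crmsh:
crm configure show xml > myfile.xml.$(date +%Y%m%d_%H%M%S)

# Roughly equivalent dump of the configuration section via cibadmin:
cibadmin -Q -o configuration > myfile.xml.$(date +%Y%m%d_%H%M%S)
```

Both require a running cluster; which one is faster is the point under debate above, so measure on your own systems.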
[ClusterLabs] Releasing crmsh version 3.0.0
Hello everyone! I'm happy to announce the release of crmsh version 3.0.0 today. The main reason for the major version bump is because I have merged the sleha-bootstrap project with crmsh, replacing the cluster init/add/remove commands with the corresponding commands from sleha-bootstrap. At the moment, these commands are highly specific to SLE and openSUSE, unfortunately. I am working on making them as distribution agnostic as possible, but would appreciate help from users of other distributions in making them work as well on those platforms as they do on SLE/openSUSE. Briefly, the "cluster init" command configures a complete cluster from scratch, including optional configuration of fencing via SBD, shared storage using OCFS2, setting up the Hawk web interface etc. There are some other changes in this release as well, see the ChangeLog for the complete list of changes: * https://github.com/ClusterLabs/crmsh/blob/3.0.0/ChangeLog The source code can be downloaded from Github: * https://github.com/ClusterLabs/crmsh/releases/tag/3.0.0 This version of crmsh will be available in openSUSE Tumbleweed as soon as possible, and packages for several popular Linux distributions are available from the Stable repository at the OBS: * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/ Archives of the tagged release: * https://github.com/ClusterLabs/crmsh/archive/3.0.0.tar.gz * https://github.com/ClusterLabs/crmsh/archive/3.0.0.zip As usual, a huge thank you to all contributors and users of crmsh! Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
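As an illustration of the bootstrap commands merged in this release, setting up a cluster might look roughly like the following. The exact flags and the semantics of "cluster add" are assumptions based on the init/add/remove naming above; check crm help for your version:

```shell
# On the first node: configure corosync/pacemaker from scratch
# (interactive; offers optional SBD fencing, OCFS2, Hawk, etc.):
crm cluster init

# To bring another node in (node2 is a placeholder hostname):
crm cluster add node2

# To remove a node again:
crm cluster remove node2
```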
Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal
Digimer writes: > On 30/01/17 09:23 AM, Kristoffer Grönlund wrote: >> Hi everyone! >> >> The last time we had an HA summit was in 2015, and the intention then >> was to have SUSE arrange the next meetup in the following year. We did >> try to find a date that would be suitable for everyone, but for various >> reasons there was never a conclusion and 2016 came and went. >> >> Well, I'd like to give it another try this year! This time, I've already >> got a proposal for a place and date: September 7-8 in Nuremberg, Germany >> (SUSE main office). I've got the new event area in the SUSE office >> already reserved for these dates. >> >> My suggestion is to do a two day event similar to the one in Brno, but I >> am open to any suggestions as to format and content. The main reason for >> having the event would be for everyone to have a chance to meet and get >> to know each other, but it's also an opportunity to discuss the future >> of Clusterlabs and the direction going forward. >> >> Any thoughts or feedback are more than welcome! Let me know if you are >> interested in coming or unable to make it. >> >> Cheers, >> Kristoffer > > Thank you for starting this back up. I was just thinking about this a > few days ago. > > I could make it, and I would be happy to help organize it however I > might be able to help. Hi, Awesome! I might hold you to that promise :) If nothing else your wiki has been useful in the past as a place to host the list of attendees and the agenda. Another option would be to create a repository in the Clusterlabs github organization and have people add themselves there via pull requests. I'm also open to suggestions on that front. Cheers, Kristoffer > > -- > Digimer > Papers and Projects: https://alteeve.com/w/ > "I am, somehow, less interested in the weight and convolutions of > Einstein’s brain than in the near certainty that people of equal talent > have lived and died in cotton fields and sweatshops." 
- Stephen Jay Gould > > ___ > Developers mailing list > develop...@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/developers -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] lrmd segfault
ale...@kurnosov.spb.ru writes: > [ Unknown signature status ] > > Hi All. > > We have the heterogeneous corosync/pacemaker cluster of 5 nodes: 3 > SL7(Scientific linux) and 2 SL6. > SL7 pacemaker installed from a standard repo (corosync - 2.3.4, pacemaker - > 1.1.13-10), SL6 build from sources (same version). > The cluster not unified, some nodes have RA which other do not have. crmsh > used for management. > SL6 nodes runs surprisingly smoothly, but SL7 steady segfaulting in the > exactly same place. > Here is an example: > Just from looking at the core dump, it looks like your processor doesn't support the SSE extensions used by the newer version of the code. You'll need to recompile and disable use of those extensions. It looks like the code is using SSE 4.2, which is relatively new: https://en.wikipedia.org/wiki/SSE4#SSE4.2 Cheers, Kristoffer > Core was generated by `/usr/libexec/pacemaker/lrmd'. > Program terminated with signal 11, Segmentation fault. > #0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164 > 164 movdqu (%rdi), %xmm1 > (gdb) bt > #0 __strcasecmp_l_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164 > #1 0x7fed076136dc in crm_str_eq (a=, b=b@entry=0xed7070 > "DRBD_D16", use_case=use_case@entry=0) at utils.c:1416 > #2 0x7fed073eaafa in is_op_blocked (rsc=0xed7070 "DRBD_D16") at > services.c:644 > #3 0x7fed073eac1d in services_action_async (op=0xed58e0, > action_callback=) at services.c:625 > #4 0x00404e4a in lrmd_rsc_execute_service_lib (cmd=0xed9e10, > rsc=0xed4500) at lrmd.c:1242 > #5 lrmd_rsc_execute (rsc=0xed4500) at lrmd.c:1308 > #6 lrmd_rsc_dispatch (user_data=0xed4500, user_data@entry= variable: value has been optimized out>) at lrmd.c:1317 > #7 0x7fed07634c73 in crm_trigger_dispatch (source=0xed54c0, > callback=, userdata=) at mainloop.c:107 > #8 0x7fed055cb7aa in g_main_dispatch (context=0xeb4d40) at gmain.c:3109 > #9 g_main_context_dispatch (context=context@entry=0xeb4d40) at gmain.c:3708 > #10 
0x7fed055cbaf8 in g_main_context_iterate (context=0xeb4d40, > block=block@entry=1, dispatch=dispatch@entry=1, self=) at > gmain.c:3779 > #11 0x7fed055cbdca in g_main_loop_run (loop=0xe96510) at gmain.c:3973 > #12 0x004028ce in main (argc=, argv=0x7ffe9b3b0fd8) at > main.c:476 > > Any help would be appreciated. > > -- > Alexey Kurnosov > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
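A quick way to check the diagnosis above, i.e. whether the CPU advertises SSE 4.2 at all, is to inspect /proc/cpuinfo (standard Linux, nothing pacemaker-specific; the CFLAGS shown are an example of how one might disable the extension, not a tested recipe):

```shell
# If this prints nothing, the processor lacks SSE 4.2 and binaries
# built assuming it can crash:
grep -o 'sse4_2' /proc/cpuinfo | sort -u

# When rebuilding, avoid the extension explicitly, for example:
#   CFLAGS="-march=x86-64 -mno-sse4.2" ./configure
```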
[ClusterLabs] HA/Clusterlabs Summit 2017 Proposal
Hi everyone! The last time we had an HA summit was in 2015, and the intention then was to have SUSE arrange the next meetup in the following year. We did try to find a date that would be suitable for everyone, but for various reasons there was never a conclusion and 2016 came and went. Well, I'd like to give it another try this year! This time, I've already got a proposal for a place and date: September 7-8 in Nuremberg, Germany (SUSE main office). I've got the new event area in the SUSE office already reserved for these dates. My suggestion is to do a two day event similar to the one in Brno, but I am open to any suggestions as to format and content. The main reason for having the event would be for everyone to have a chance to meet and get to know each other, but it's also an opportunity to discuss the future of Clusterlabs and the direction going forward. Any thoughts or feedback are more than welcome! Let me know if you are interested in coming or unable to make it. Cheers, Kristoffer -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] large cluster with corosync
915] reniar corosync warning [TOTEM ] JOIN or LEAVE > message was thrown away during flush operation. > Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE > message was thrown away during flush operation. > [... the same JOIN or LEAVE warning repeated many times ...] > Jan 04 10:29:55 [4915] reniar corosync notice [TOTEM ] A new membership > (10.5.4.101:964) was formed. Members > Jan 04 10:29:55 [4915] reniar corosync warning [TOTEM ] JOIN or LEAVE > message was thrown away during flush operation. > Jan 04 10:29:55 [4915] reniar corosync notice [MAIN ] Completed > service synchronization, ready to provide service. > [... the same JOIN or LEAVE warning repeated many more times ...] > Jan 04 10:29:59 [4915] reniar corosync warning [MAIN ] Corosync main > process was not scheduled for 1465.7160 ms (threshold is 800. ms). > Consider token timeout increase.
> > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
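The "Consider token timeout increase" hint at the end of the quoted log refers to the totem token timeout in corosync.conf. A sketch of the relevant fragment (the value is an example only; large clusters typically need a higher timeout than the default):

```
# /etc/corosync/corosync.conf (fragment)
totem {
    # ... existing settings ...
    # Token timeout in milliseconds; raising it makes brief
    # scheduling stalls less likely to trigger membership changes:
    token: 10000
}
```

After changing the value, corosync needs to be restarted (or the configuration reloaded) on all nodes for the new timeout to take effect.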
Re: [ClusterLabs] sbd: Cannot open watchdog device: /dev/watchdog
Muhammad Sharfuddin writes: > Hello, > > pacemaker does not start on this machine(Fujitsu PRIMERGY RX2540 M1) > with following error in the logs: > > sbd: [13236]: ERROR: Cannot open watchdog device: /dev/watchdog: No such > file or directory Does /dev/watchdog exist? If so, it may be opened by a different process. If you have more than one watchdog device, you can configure sbd to use a different device using the -w option. Cheers, Kristoffer > > System Info: > > sbd-1.2.1-8.7.x86_64 corosync-2.3.3-7.12.x86_64 pacemaker-1.1.12-7.1.x86_64 > > lsmod | egrep "(wd|dog)" > iTCO_wdt 13480 0 > iTCO_vendor_support13718 1 iTCO_wdt > > dmidecode | grep -A3 '^System Information' > System Information > Manufacturer: FUJITSU > Product Name: PRIMERGY RX2540 M1 > Version: GS01 > > logs: > > 2017-01-03T21:00:26.890503+05:00 prdnode1 sbd: [13235]: info: Watchdog > enabled. > 2017-01-03T21:00:26.899817+05:00 prdnode1 sbd: [13238]: info: Servant > starting for device > /dev/disk/by-id/wwn-0x60e00d28002825b5-part1 > 2017-01-03T21:00:26.900175+05:00 prdnode1 sbd: [13238]: info: Device > /dev/disk/by-id/wwn-0x60e00d28002825b5-part1 uuid: > fda42d64-ca74-4578-90c8-976ea7ff5f6e > 2017-01-03T21:00:26.900418+05:00 prdnode1 sbd: [13239]: info: Monitoring > Pacemaker health > 2017-01-03T21:00:27.901022+05:00 prdnode1 sbd: [13236]: ERROR: Cannot > open watchdog device: /dev/watchdog: No such file or directory > 2017-01-03T21:00:27.912098+05:00 prdnode1 sbd: [13236]: WARN: Servant > for pcmk (pid: 13239) has terminated > 2017-01-03T21:00:27.941950+05:00 prdnode1 sbd: [13236]: WARN: Servant > for /dev/disk/by-id/wwn-0x60e00d28002825b5-part1 (pid: > 13238) has terminated > 2017-01-03T21:00:27.949401+05:00 prdnode1 sbd.sh[13231]: sbd failed; > please check the logs. > 2017-01-03T21:00:27.992606+05:00 prdnode1 sbd.sh[13231]: SBD failed to > start; aborting. 
> 2017-01-03T21:00:27.993061+05:00 prdnode1 systemd[1]: sbd.service: > control process exited, code=exited status=1 > 2017-01-03T21:00:27.993339+05:00 prdnode1 systemd[1]: Failed to start > Shared-storage based fencing daemon. > 2017-01-03T21:00:27.993610+05:00 prdnode1 systemd[1]: Dependency failed > for Pacemaker High Availability Cluster Manager. > 2017-01-03T21:00:27.994054+05:00 prdnode1 systemd[1]: Unit sbd.service > entered failed state. > > please help. > > -- > Regards, > > Muhammad Sharfuddin > <http://www.nds.com.pk> > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
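A few checks for the missing /dev/watchdog described above. The softdog module is a generic software watchdog fallback; on this hardware the loaded iTCO_wdt module would normally provide the device, and the sysconfig variable name is an assumption that may differ between sbd versions:

```shell
# Does the device node exist at all?
ls -l /dev/watchdog

# If not, loading a watchdog driver should create it; softdog is a
# software fallback when the hardware driver does not bind:
modprobe softdog

# If several watchdog devices exist, point sbd at a specific one,
# either via the -w option or e.g. in /etc/sysconfig/sbd:
#   SBD_OPTS="-w /dev/watchdog0"
```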
Re: [ClusterLabs] Antw: Re: [ClusterLabs Developers] announcement: schedule for resource-agents release 3.9.8
Ulrich Windl writes: >>>> Kristoffer Grönlund schrieb am 03.01.2017 um 11:55 in > Nachricht <878tqsjtv4@suse.com>: >> Oyvind Albrigtsen writes: >> >>> Hi, >>> >>> This is a tentative schedule for resource-agents v3.9.8: >>> 3.9.8-rc1: January 10. >>> 3.9.8: January 31. >>> >>> I modified the corresponding milestones at >>> https://github.com/ClusterLabs/resource-agents/milestones >>> >>> If there's anything you think should be part of the release >>> please open an issue, a pull request, or a bugzilla, as you see >>> fit. >>> >> >> Hi Oyvind, >> >> I think it's high time for a new release! My only suggestion would be to >> call it 4.0.0, since there are much bigger changes from 3.9.7 than an >> update to the patch release number would suggest. > > I don't know the semantics of everybody's release numbering, but for a > three-level number a "compatibility"."feature"."bug-fix" pattern wouldn't be > bad; that is only change the first number if there are incompatible changes > (things may not work after ugrading from the previous level). Change the > second > number whenever there are new features (the users may want to read about), and > change only the last number if just bugs were fixed (without affecting the > interfaces). 
> And: There's nothing wrong with "10" following "9" ;-) > > And if you are just happy to throw out new versions (whatever they bring), > call it "2017-01" ;-) There was a recent talk by Rich Hickey on this topic, his way of putting it was that versions basically boil down to X.Y where Y means "don't care, just upgrade" and X means "anything can have changed, be very careful" :) For resource-agents and the releases historically, I personally think having a single number that just increments each release makes as much sense as anything else, at least in my experience there is just a single development track where bug fixes, new features and backwards incompatible changes mix freely, even if we do try to keep the incompatible changes as rare as possible. But, keeping the x.y.z triplet is easier to maintain in relation to the older releases. Cheers, Kristoffer > > Regards, > Ulrich > >> >> Cheers, >> Kristoffer >> >>> If there's anything that hasn't received due attention, please >>> let us know. >>> >>> Finally, if you can help with resolving issues consider yourself >>> invited to do so. There are currently 49 issues and 38 pull >>> requests still open. 
>>> >>> >>> Cheers, >>> Oyvind Albrigtsen >>> >>> ___ >>> Developers mailing list >>> develop...@clusterlabs.org >>> http://lists.clusterlabs.org/mailman/listinfo/developers >>> >> >> -- >> // Kristoffer Grönlund >> // kgronl...@suse.com >> >> ___ >> Users mailing list: Users@clusterlabs.org >> http://lists.clusterlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] [ClusterLabs Developers] announcement: schedule for resource-agents release 3.9.8
Oyvind Albrigtsen writes: > Hi, > > This is a tentative schedule for resource-agents v3.9.8: > 3.9.8-rc1: January 10. > 3.9.8: January 31. > > I modified the corresponding milestones at > https://github.com/ClusterLabs/resource-agents/milestones > > If there's anything you think should be part of the release > please open an issue, a pull request, or a bugzilla, as you see > fit. > Hi Oyvind, I think it's high time for a new release! My only suggestion would be to call it 4.0.0, since there are much bigger changes from 3.9.7 than an update to the patch release number would suggest. Cheers, Kristoffer > If there's anything that hasn't received due attention, please > let us know. > > Finally, if you can help with resolving issues consider yourself > invited to do so. There are currently 49 issues and 38 pull > requests still open. > > > Cheers, > Oyvind Albrigtsen > > ___ > Developers mailing list > develop...@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/developers > -- // Kristoffer Grönlund // kgronl...@suse.com ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org