Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 12:39:32 -0600 Ken Gaillot wrote:

> On 11/07/2016 12:03 PM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:
> >
> >> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
> >>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
> >>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
> >>>>
> >>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
> >>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> >>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
> >>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
> >>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
> >>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
> >>>>>>>> ...
> >>>>>>> Another possible use would be for a cron job that needs to know
> >>>>>>> whether a particular resource is running; an attribute query is
> >>>>>>> quicker and easier than something like parsing crm_mon output or
> >>>>>>> probing the service.
> >>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
> >>>>>> so besides the lacking options and inefficient implementation, why
> >>>>>> should one be faster than the other?
> >>>>> attrd_updater doesn't go to the CIB
> >>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
> >>>> "--private", available since 1.1.13:
> >>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
> >>> That prevents values being stored in the CIB. attrd_updater should
> >>> always talk to attrd as I understood it ...
> >>
> >> It's a bit confusing: both crm_attribute and attrd_updater will
> >> ultimately affect both attrd and the CIB in most cases, but *how* they
> >> do so differs. crm_attribute modifies the CIB and lets attrd pick up
> >> the change from there; attrd_updater notifies attrd and lets attrd
> >> modify the CIB.
> >>
> >> The difference is subtle.
> >>
> >> With corosync 2, attrd only modifies "transient" node attributes
> >> (which stay in effect until the next reboot), not "permanent"
> >> attributes.
> >
> > So why is "--private" not compatible with corosync 1.x, given that
> > attrd_updater only sets "transient" attributes anyway?
>
> Corosync 1 does not support certain reliability guarantees required by
> the current attrd, so when building against the corosync 1 libraries,
> pacemaker will install a "legacy" attrd instead. The difference is
> mainly that the current attrd can guarantee atomic updates to attribute
> values. attrd_updater actually can set permanent attributes when used
> with the legacy attrd.

OK, I understand now.

> > How and where are private attributes stored?
>
> They are kept in memory only, in attrd. Of course, attrd is clustered,
> so they are kept in sync across all nodes.

OK, that was my guess.

> >> So crm_attribute must be used if you want to set a permanent
> >> attribute. crm_attribute also has the ability to modify cluster
> >> properties and resource defaults, as well as node attributes.
> >>
> >> On the other hand, by contacting attrd directly, attrd_updater can
> >> change an attribute's "dampening" (how often it is flushed to the
> >> CIB), and it can (as mentioned above) set "private" attributes that
> >> are never written to the CIB (and thus never cause the cluster to
> >> re-calculate resource placement).
> >
> > Interesting, thank you for the clarification.
> >
> > As I understand it, it comes down to:
> >
> >   crm_attribute -> CIB <-(poll/notify?) attrd
> >   attrd_updater -> attrd -> CIB
>
> Correct. On startup, attrd registers with the CIB to be notified of all
> changes.
>
> > Just a quick question about this: is it possible to set a "dampening"
> > high enough that attrd never flushes the attribute to the CIB (making
> > it a kind of private attribute too)?
>
> I'd expect that to work, if the dampening interval was higher than the
> lifetime of the cluster being up.

Interesting.

> It's also possible to abuse attrd to create a kind of private attribute
> by using a node name that doesn't exist and never will. :) This ability
> is intentionally allowed, so you can set attributes for nodes that the
> current partition isn't aware of, or nodes that are planned to be added
> later, but only attributes for known nodes will be written to the CIB.

Again, interesting. I'll do some tests on my RA, as I need clustered
private attributes and was not able to get them under the old stack
(Debian < 8 or RHEL < 7).

Thank you very much for your answers!

Regards,

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
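For anyone wanting to experiment with this, a minimal sketch of private attributes with attrd_updater (the attribute name "pgsql-state" is just an example; requires Pacemaker >= 1.1.13 built against corosync 2.x, with the cluster running):

```shell
# Set a transient attribute that attrd keeps in memory only and never
# writes to the CIB (so it never triggers a transition):
attrd_updater --name pgsql-state --update master --private

# Read it back (served from attrd, not the CIB):
attrd_updater --name pgsql-state --query

# Remove it again:
attrd_updater --name pgsql-state --delete
```

Since the value lives only in attrd's memory, it disappears when the whole cluster is stopped, which is exactly the "transient" behavior described above.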
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 12:03 PM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:
>
>> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
>>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>>>> ...
>>>>>>> Another possible use would be for a cron job that needs to know
>>>>>>> whether a particular resource is running; an attribute query is
>>>>>>> quicker and easier than something like parsing crm_mon output or
>>>>>>> probing the service.
>>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
>>>>>> so besides the lacking options and inefficient implementation, why
>>>>>> should one be faster than the other?
>>>>> attrd_updater doesn't go to the CIB
>>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>>>> "--private", available since 1.1.13:
>>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
>>> That prevents values being stored in the CIB. attrd_updater should
>>> always talk to attrd as I understood it ...
>>
>> It's a bit confusing: both crm_attribute and attrd_updater will
>> ultimately affect both attrd and the CIB in most cases, but *how* they
>> do so differs. crm_attribute modifies the CIB and lets attrd pick up
>> the change from there; attrd_updater notifies attrd and lets attrd
>> modify the CIB.
>>
>> The difference is subtle.
>>
>> With corosync 2, attrd only modifies "transient" node attributes
>> (which stay in effect until the next reboot), not "permanent"
>> attributes.
>
> So why is "--private" not compatible with corosync 1.x, given that
> attrd_updater only sets "transient" attributes anyway?

Corosync 1 does not support certain reliability guarantees required by
the current attrd, so when building against the corosync 1 libraries,
pacemaker will install a "legacy" attrd instead. The difference is
mainly that the current attrd can guarantee atomic updates to attribute
values. attrd_updater actually can set permanent attributes when used
with the legacy attrd.

> How and where are private attributes stored?

They are kept in memory only, in attrd. Of course, attrd is clustered,
so they are kept in sync across all nodes.

>> So crm_attribute must be used if you want to set a permanent
>> attribute. crm_attribute also has the ability to modify cluster
>> properties and resource defaults, as well as node attributes.
>>
>> On the other hand, by contacting attrd directly, attrd_updater can
>> change an attribute's "dampening" (how often it is flushed to the
>> CIB), and it can (as mentioned above) set "private" attributes that
>> are never written to the CIB (and thus never cause the cluster to
>> re-calculate resource placement).
>
> Interesting, thank you for the clarification.
>
> As I understand it, it comes down to:
>
>   crm_attribute -> CIB <-(poll/notify?) attrd
>   attrd_updater -> attrd -> CIB

Correct. On startup, attrd registers with the CIB to be notified of all
changes.

> Just a quick question about this: is it possible to set a "dampening"
> high enough that attrd never flushes the attribute to the CIB (making
> it a kind of private attribute too)?

I'd expect that to work, if the dampening interval was higher than the
lifetime of the cluster being up.

It's also possible to abuse attrd to create a kind of private attribute
by using a node name that doesn't exist and never will. :) This ability
is intentionally allowed, so you can set attributes for nodes that the
current partition isn't aware of, or nodes that are planned to be added
later, but only attributes for known nodes will be written to the CIB.
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:

> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>>
>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>>> ...
>>>>>> Another possible use would be for a cron job that needs to know
>>>>>> whether a particular resource is running; an attribute query is
>>>>>> quicker and easier than something like parsing crm_mon output or
>>>>>> probing the service.
>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
>>>>> so besides the lacking options and inefficient implementation, why
>>>>> should one be faster than the other?
>>>> attrd_updater doesn't go to the CIB
>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>>> "--private", available since 1.1.13:
>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
>> That prevents values being stored in the CIB. attrd_updater should
>> always talk to attrd as I understood it ...
>
> It's a bit confusing: both crm_attribute and attrd_updater will
> ultimately affect both attrd and the CIB in most cases, but *how* they
> do so differs. crm_attribute modifies the CIB and lets attrd pick up
> the change from there; attrd_updater notifies attrd and lets attrd
> modify the CIB.
>
> The difference is subtle.
>
> With corosync 2, attrd only modifies "transient" node attributes
> (which stay in effect until the next reboot), not "permanent"
> attributes.

So why is "--private" not compatible with corosync 1.x, given that
attrd_updater only sets "transient" attributes anyway?

How and where are private attributes stored?

> So crm_attribute must be used if you want to set a permanent attribute.
> crm_attribute also has the ability to modify cluster properties and
> resource defaults, as well as node attributes.
>
> On the other hand, by contacting attrd directly, attrd_updater can
> change an attribute's "dampening" (how often it is flushed to the CIB),
> and it can (as mentioned above) set "private" attributes that are never
> written to the CIB (and thus never cause the cluster to re-calculate
> resource placement).

Interesting, thank you for the clarification.

As I understand it, it comes down to:

  crm_attribute -> CIB <-(poll/notify?) attrd
  attrd_updater -> attrd -> CIB

Just a quick question about this: is it possible to set a "dampening"
high enough that attrd never flushes the attribute to the CIB (making it
a kind of private attribute too)?

Regards,
Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)
Jan Friesse writes:

> Ferenc Wágner wrote:
>> Have you got any plans/timeline for 2.4.2 yet?
>
> Yep, I'm going to release it in few minutes/hours.

Man, that was quick. I've got a bunch of typo fixes queued... :) Please
consider announcing upcoming releases a couple of days in advance; as a
packager, I'd much appreciate it. Maybe even tag release candidates...

Anyway, I've got a question concerning corosync-qnetd. I run it as user
and group coroqnetd. Is granting it read access to cert8.db and key3.db
enough for proper operation? corosync-qnetd-certutil gives write access
to group coroqnetd on everything, which seems unintuitive to me. Please
note that I've got zero experience with NSS, but I don't expect the
daemon to change the certificate database. Should I?
--
Thanks,
Feri
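If read access does turn out to be sufficient, something along these lines would tighten the database (the nssdb path matches the corosync-qnetd default; treat the exact modes as an assumption to be verified, not a recommendation from the qnetd authors):

```shell
# Make the NSS certificate database read-only for the coroqnetd group
# instead of the group-writable default set by corosync-qnetd-certutil:
chown -R root:coroqnetd /etc/corosync/qnetd/nssdb
chmod 0750 /etc/corosync/qnetd/nssdb
chmod 0640 /etc/corosync/qnetd/nssdb/cert8.db /etc/corosync/qnetd/nssdb/key3.db

# Verify the result, then restart the daemon and watch for NSS errors:
ls -l /etc/corosync/qnetd/nssdb
```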
[ClusterLabs] Corosync 2.4.2 is available at corosync.org!
I am pleased to announce the latest maintenance release of Corosync,
2.4.2, available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

This release exists mainly because we forgot to bump the
libvotequorum.so major version number in 2.4.0. This is not that big a
deal, because libvotequorum isn't used by 3rd-party applications
(pacemaker, ...), but it still makes sense to have the issue fixed.
Thanks to Ferenc Wágner for the notice.

Complete changelog for 2.4.2:

Christine Caulfield (1):
      man: mention qdevice incompatibilites in votequorum.5

Fabio M. Di Nitto (1):
      [build] Fix build on RHEL7.3 latest

Jan Friesse (3):
      Man: Fix corosync-qdevice-net-certutil link
      Qnetd LMS: Fix two partition use case
      libvotequorum: Bump version

Michael Jones (1):
      cfg: Prevents use of uninitialized buffer

Upgrade is (as usual) highly recommended.

Thanks/congratulations to all people that contributed to achieve this
great milestone.
Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!
Ferenc Wágner wrote:

>>>> Please note that because of required changes in votequorum,
>>>> libvotequorum is no longer binary compatible. This is the reason for
>>>> the version bump.
>>>
>>> Er, what version bump? Corosync 2.4.1 still produces
>>> libvotequorum.so.7.0.0 for me, just like Corosync 2.3.6.
>>
>> Yep, you are right. Thanks for the notice; this is something that
>> should have happened.
>
> Thanks for confirming.
>
>> Anyway, 2.3.6 and 2.4.x votequorum are incompatible (there were both
>> API and ABI changes).
>
> Probably something to fix in 2.4.2. Have you got any plans/timeline for
> 2.4.2 yet?

Yep, I'm going to release it in few minutes/hours.

> Anyway, we're packaging 2.4.1 for Debian now. Shall we ship it with
> -7.0.0 +8.0.0 in lib/libvotequorum.verso?
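For packagers wanting to confirm what a given build actually shipped, the soname can be checked directly (the library paths below are examples; adjust for your distribution or build tree):

```shell
# Inspect the SONAME recorded in an installed library:
objdump -p /usr/lib/libvotequorum.so.7.0.0 | grep SONAME

# Equivalent check with readelf, e.g. against a freshly built tree:
readelf -d .libs/libvotequorum.so | grep SONAME
```

If the output still says `libvotequorum.so.7` after an ABI break, the version bump was missed, which is exactly the situation discussed above.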
Re: [ClusterLabs] What is the logic when two node are down at the same time and needs to be fenced
Hi Ken,

Thanks for the clarification. Now I have another real problem that needs
your advice. The cluster consists of 5 nodes, and one of the nodes had a
1-second network failure, which resulted in one of the VirtualDomain
resources starting on two nodes at the same time. The cluster property
no_quorum_policy is set to stop.

At 16:13:34, this happened:

16:13:34 zs95kj attrd[133000]: notice: crm_update_peer_proc: Node zs93KLpcs1[5] - state is now lost (was member)
16:13:34 zs95kj corosync[132974]: [CPG ] left_list[0] group:pacemakerd\x00, ip:r(0) ip(10.20.93.13) , pid:28721
16:13:34 zs95kj crmd[133002]: warning: No match for shutdown action on 5
16:13:34 zs95kj attrd[133000]: notice: Removing all zs93KLpcs1 attributes for attrd_peer_change_cb
16:13:34 zs95kj corosync[132974]: [CPG ] left_list_entries:1
16:13:34 zs95kj crmd[133002]: notice: Stonith/shutdown of zs93KLpcs1 not matched
...
16:13:35 zs95kj attrd[133000]: notice: crm_update_peer_proc: Node zs93KLpcs1[5] - state is now member (was (null))

From the DC:

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3288.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Started zs93KLpcs1
   <-- This is the baseline where everything works normally

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3289.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped
   <-- Here node zs93KLpcs1 lost its network for 1 sec, resulting in this state

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3290.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3291.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped

From the DC's pengine log:

16:05:01 zs95kj pengine[133001]: notice: Calculated Transition 238: /var/lib/pacemaker/pengine/pe-input-3288.bz2
...
16:13:41 zs95kj pengine[133001]: notice: Start zs95kjg110187_res#011(zs90kppcs1)
...
16:13:41 zs95kj pengine[133001]: notice: Calculated Transition 239: /var/lib/pacemaker/pengine/pe-input-3289.bz2

From the DC's crmd log:

Sep 9 16:05:25 zs95kj crmd[133002]: notice: Transition 238 (Complete=48, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3288.bz2): Complete
...
Sep 9 16:13:42 zs95kj crmd[133002]: notice: Initiating action 752: start zs95kjg110187_res_start_0 on zs90kppcs1
...
Sep 9 16:13:56 zs95kj crmd[133002]: notice: Transition 241 (Complete=81, Pending=0, Fired=0, Skipped=172, Incomplete=341, Source=/var/lib/pacemaker/pengine/pe-input-3291.bz2): Stopped

Here I do not see any log about pe-input-3289.bz2 and pe-input-3290.bz2.
Why is this?

In the log on zs93KLpcs1, where guest 110187 was running, I do not see
any message about stopping this resource after it lost its connection to
the cluster. Any ideas where to look for the possible cause?

On 11/3/2016 1:02 AM, Ken Gaillot wrote:
> On 11/02/2016 11:17 AM, Niu Sibo wrote:
>> Hi all,
>>
>> I have a general question regarding the fence logic in pacemaker. I
>> have set up a three-node cluster with Pacemaker 1.1.13 and the cluster
>> property no_quorum_policy set to ignore.
>>
>> When two nodes lose the NIC corosync is running on at the same time,
>> it looks like the two nodes are getting fenced one by one, even though
>> I have three fence devices defined, one for each node. What should I
>> be expecting in this case?
>
> It's probably coincidence that the fencing happens serially; there is
> nothing enforcing that for separate fence devices. There are many steps
> in a fencing request, so they can easily take different times to
> complete.
>
>> I noticed that if the node rejoins the cluster before the cluster
>> starts the fence actions, some resources will get activated on two
>> nodes at the same time. This is really not good if the resource
>> happens to be a virtual guest.
>>
>> Thanks for any suggestions.
>
> Since you're ignoring quorum, there's nothing stopping the disconnected
> node from starting all resources on its own. It can even fence the
> other nodes, unless the downed NIC is used for fencing. From that
> node's point of view, it's the other two nodes that are lost.
>
> Quorum is the only solution I know of to prevent that. Fencing will
> correct the situation, but it won't prevent it.
>
> See the votequorum(5) man page for various options that can affect how
> quorum is calculated. Also, the very latest version of corosync
> supports qdevice (a lightweight daemon that runs on a host outside the
> cluster strictly for the purposes of quorum).
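For reference, the qdevice setup Ken mentions can be sketched as a corosync.conf quorum section (a hedged example: the host name is a placeholder and the algorithm choice depends on your topology; requires corosync >= 2.4 plus the corosync-qdevice and corosync-qnetd packages):

```
quorum {
    provider: corosync_votequorum

    # "net" model talks to a corosync-qnetd daemon running outside the
    # cluster; algorithm may be "ffsplit" or "lms" depending on topology
    device {
        model: net
        net {
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}
```

With quorum enforced (no_quorum_policy left at its default) a node that loses its NIC cannot keep or start resources on its own, which prevents the double-activation described above rather than merely correcting it after the fact.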
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>
>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>> ...
>>>>> Another possible use would be for a cron job that needs to know
>>>>> whether a particular resource is running; an attribute query is
>>>>> quicker and easier than something like parsing crm_mon output or
>>>>> probing the service.
>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>>>> besides the lacking options and inefficient implementation, why
>>>> should one be faster than the other?
>>> attrd_updater doesn't go to the CIB
>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>> "--private", available since 1.1.13:
>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
> That prevents values being stored in the CIB. attrd_updater should
> always talk to attrd as I understood it ...

It's a bit confusing: both crm_attribute and attrd_updater will
ultimately affect both attrd and the CIB in most cases, but *how* they
do so differs. crm_attribute modifies the CIB and lets attrd pick up the
change from there; attrd_updater notifies attrd and lets attrd modify
the CIB.

The difference is subtle.

With corosync 2, attrd only modifies "transient" node attributes (which
stay in effect until the next reboot), not "permanent" attributes. So
crm_attribute must be used if you want to set a permanent attribute.
crm_attribute also has the ability to modify cluster properties and
resource defaults, as well as node attributes.

On the other hand, by contacting attrd directly, attrd_updater can
change an attribute's "dampening" (how often it is flushed to the CIB),
and it can (as mentioned above) set "private" attributes that are never
written to the CIB (and thus never cause the cluster to re-calculate
resource placement).
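As a concrete sketch of the two paths described above (attribute names and values are invented for illustration; flags per the Pacemaker 1.1 command-line tools):

```shell
# attrd_updater path: notify attrd, which coalesces updates and flushes
# the value to the CIB only after the dampening interval (here 30s):
attrd_updater --name ping-count --update 5 --delay 30s

# crm_attribute path: write the CIB directly; "--lifetime forever" makes
# the attribute permanent (survives reboot), which attrd_updater cannot
# do on corosync 2:
crm_attribute --node node1 --name site --update datacenter-a --lifetime forever
```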
Re: [ClusterLabs] pacemaker after upgrade from wheezy to jessie
We managed to change the validate-with option via a workaround (cibadmin
export & replace), as setting the value with cibadmin --modify doesn't
write the changes to disk. After experimenting with various schemas (the
XML is correctly interpreted by crmsh), we are still not able to
communicate with the local crmd.

Can someone please help to determine why the local crmd is not
responding (we disabled our other nodes to eliminate possible
corosync-related issues) and runs into errors/timeouts when issuing
crmsh- or cibadmin-related commands?

Examples of local commands that do not work:

Timeout when running cibadmin (strace attached):

> cibadmin --upgrade --force
> Call cib_upgrade failed (-62): Timer expired

Error when running a crm resource cleanup:

> crm resource cleanup $vm
> Error signing on to the CRMd service
> Error performing operation: Transport endpoint is not connected

I attached the strace log from running cib_upgrade; does this help to
find the cause of the timeout issue?

Here is the corosync dump when locally starting pacemaker:

Nov 07 16:01:59 [24339] nebel1 corosync notice  [MAIN  ] main.c:1256 Corosync Cluster Engine ('2.3.6'): started and ready to provide service.
Nov 07 16:01:59 [24339] nebel1 corosync info    [MAIN  ] main.c:1257 Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices snmp pie relro bindnow
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemnet.c:248 Initializing transport (UDP/IP Multicast).
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none hash: none
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemnet.c:248 Initializing transport (UDP/IP Multicast).
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none hash: none
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemudp.c:671 The network interface [10.112.0.1] is now up.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync configuration map access [0]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cmap
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync configuration service [1]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cfg
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync cluster closed process group service v1.01 [2]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cpg
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync profile loading service [4]
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync resource monitoring service [6]
Nov 07 16:01:59 [24339] nebel1 corosync info    [WD    ] wd.c:669 Watchdog /dev/watchdog is now been tickled by corosync.
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:625 Could not change the Watchdog timeout from 10 to 6 seconds
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:464 resource load_15min missing a recovery key.
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:464 resource memory_used missing a recovery key.
Nov 07 16:01:59 [24339] nebel1 corosync info    [WD    ] wd.c:581 no resources configured.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync watchdog service [7]
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync cluster quorum service v0.1 [3]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: quorum
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemudp.c:671 The network interface [10.110.1.1] is now up.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (10.112.0.1:348) was formed. Members joined: 1
Nov 07 16:01:59 [24339] nebel1 corosync notice  [MAIN  ] main.c:310 Completed service synchronization, ready to provide service.
Nov 07 16:01:59 [24341] nebel1 pacemakerd: notice: main: Starting Pacemaker 1.1.15 | build=e174ec8 features: generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc lha-fencing upstart systemd nagios corosync-native atomic-attrd snmp libesmtp acls
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
Nov 07 16:01:59 [24341] nebel1 pacemakerd: notice: get_node_name:
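For the record, the export-and-replace workaround described at the top of this message can be sketched like this (the target schema name "pacemaker-2.0" is just an example; pick whichever schema your tools validate against):

```shell
# Export the live CIB, rewrite the validate-with attribute in the file,
# then push the whole document back in one operation:
cibadmin --query > /tmp/cib.xml
sed -i 's/validate-with="[^"]*"/validate-with="pacemaker-2.0"/' /tmp/cib.xml
cibadmin --replace --xml-file /tmp/cib.xml
```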
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 01:41 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message:
>>>> * The new ocf:pacemaker:attribute resource agent sets a node
>>>>   attribute according to whether the resource is running or stopped.
>>>>   This may be useful in combination with attribute-based rules to
>>>>   model dependencies that simple constraints can't handle.
>>>
>>> I don't quite understand this: Isn't the state of a resource in the
>>> CIB status section anyway? If not, why not add it? Then it would be
>>> readily available for anyone (rules, constraints, etc.).
>>
>> This (hopefully) lets you model more complicated relationships.
>>
>> For example, someone recently asked whether they could make an
>> ordering constraint apply only at "start-up" -- the first time
>> resource A starts, it does some initialization that B needs, but once
>> that's done, B can be independent of A.
>
> Is "at start-up" before start of the resource, after start of the
> resource, or parallel to the start of the resource ;-)
> Probably a "hook" in the corresponding RA is the better approach,
> unless you can really model all of the above.
>
>> For that case, you could group A with an ocf:pacemaker:attribute
>> resource. The important part is that the attribute is not set if A has
>> never run on a node. So, you can make a rule that B can run only where
>> the attribute is set, regardless of the value -- even if A is later
>> stopped, the attribute will still be set.
>
> If a resource is not running on a node, it is "stopped", isn't it?

Sure, but what I mean is: if resource A has *never* run on a node, then
the corresponding node attribute will be *unset*. But if A has ever
started and/or stopped on a node, the attribute will be set to one value
or the other.
So, a rule can be used to check whether the attribute is set at all, to
determine whether A has *ever* run on the node, regardless of whether it
is currently running.

>> Another possible use would be for a cron job that needs to know
>> whether a particular resource is running; an attribute query is
>> quicker and easier than something like parsing crm_mon output or
>> probing the service.
>
> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
> besides the lacking options and inefficient implementation, why should
> one be faster than the other?
>
>> It's all theoretical at this point, and I'm not entirely sure those
>> examples would be useful :) but I wanted to make the agent available
>> for people to experiment with.
>
> A good product manager should resist the attempt to provide any feature
> the customers ask for, avoiding bloat-ware. That is to protect the
> customers from their own bad decisions. In most cases there is a
> better, more universal solution to the specific problem.

Sure, but this is a resource agent -- it adds no overhead to anyone not
using it, and since we don't have any examples or walk-throughs using
it, users would have to investigate and experiment to see whether it's
of any use in their environment. Hopefully, this will turn out to be a
general-purpose tool of value in multiple problem scenarios.

>>>> * Pacemaker's existing "node health" feature allows resources to
>>>>   move off nodes that become unhealthy. Now, when using
>>>>   node-health-strategy=progressive, a new cluster property
>>>>   node-health-base will be used as the initial health score of newly
>>>>   joined nodes (defaulting to 0, which is the previous behavior).
>>>>   This allows cloned and multistate resource instances to start on a
>>>>   node even if it has some "yellow" health attributes.
>>>
>>> So the node health is more or less a "node score"? I don't understand
>>> the last sentence. Maybe give an example?
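A hedged sketch of that start-up ordering idea using pcs (the resource names, group name, and attribute name are invented for illustration, and this is untested):

```shell
# Group A with an ocf:pacemaker:attribute resource, so the node
# attribute "A-has-run" gets set wherever A first starts:
pcs resource create A-flag ocf:pacemaker:attribute \
    name=A-has-run active_value=1 inactive_value=0
pcs resource group add A-group A A-flag

# Allow B only on nodes where the attribute is defined at all, i.e.
# where A has started at least once, regardless of the current value:
pcs constraint location B rule score=-INFINITY not_defined A-has-run
```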
>> Yes, node health is a score that's added when deciding where to place
>> a resource. It does get complicated ...
>>
>> Node health monitoring is optional, and off by default.
>>
>> Node health attributes are set to red, yellow or green (outside
>> pacemaker itself -- either by a resource agent, or some external
>> process). As an example, let's say we have three node health
>> attributes for CPU usage, CPU temperature, and SMART error count.
>>
>> With a progressive strategy, red and yellow are assigned some negative
>> score, and green is 0. In our example, let's say yellow gets a -10
>> score.
>>
>> If any of our attributes are yellow, resources will avoid the node
>> (unless they have higher positive scores from something like
>> stickiness or a location constraint).
>
> I understood so far.
>
>> Normally, this is what you want, but if your resources are cloned on
>> all nodes, maybe you don't care if some attributes are yellow. In that
>> case, you can set node-health-base=20, so even if two attributes are
>> yellow, it won't prevent resources from running
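In pcs terms, the example above might look like the following (property names per the 1.1.16 release notes; the scores are illustrative):

```shell
# Yellow health attributes subtract 10 each; starting every node at a
# base health score of 20 tolerates up to two yellow attributes before
# clone instances are pushed off the node:
pcs property set node-health-strategy=progressive
pcs property set node-health-yellow=-10
pcs property set node-health-base=20
```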
Re: [ClusterLabs] Authoritative corosync's location
On 22/09/16 09:05 +0200, Jan Friesse wrote:
> Jan Pokorný wrote:
>> On 21/09/16 09:16 +0200, Jan Friesse wrote:
>>> Thomas Lamprecht wrote:
>>>> I have also another, organizational question. I saw on the GitHub page
>>>> from corosync that pull requests there are preferred, and also that the
>>>
>>> True
>>
>> At this point, it's worth noting that ClusterLabs/corosync is
>> currently a stale fork of the corosync/corosync location at GitHub,
>> which may be a source of confusion.
>
> Nice catch, I didn't even know it exists.
>
>> It would make sense to settle on just a single one to be the clearly
>> authoritative place to be in touch with (not sure what the options
>> are -- aliasing/transferring?).
>
> Sure. I don't know who created that fork, but whoever it was, please
> consider deleting it. It may be really confusing.

Even more so when it's occasionally updated;
https://github.com/ClusterLabs/corosync (at master branch) now says
"This branch is 3 commits behind corosync:master.".
That also means that there seems to be no satisfactory solution, yet.

-- 
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker 1.1.16 - Release Candidate 1
On 03/11/16 11:08 -0500, Ken Gaillot wrote:
> ClusterLabs is happy to announce the first release candidate for
> Pacemaker version 1.1.16. Source code is available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1
>
> [...]

As usual, there are COPR builds (using the upstream spec file without any of
the final touches that are usually done downstream) for easy consumption in
some environments:
https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/build/473980/

I also have something to share regarding the recently announced security fix
in pacemaker, if you are interested in Fedora: fixed packages should be
available from the updates-testing repo in Fedora 23 and Fedora 25, and the
regular updates repo in Fedora 24 at the moment.

-- 
Jan (Poki)
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 7 Nov 2016 10:12:04 +0100
> Klaus Wenninger wrote:
>
>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>> ...
>>>> Another possible use would be for a cron that needs to know whether a
>>>> particular resource is running, and an attribute query is quicker and
>>>> easier than something like parsing crm_mon output or probing the
>>>> service.
>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>>> besides of lacking options and inefficient implementation, why should
>>> one be faster than the other?
>> attrd_updater doesn't go for the CIB
> AFAIK, attrd_updater actually goes to the CIB, unless you set "--private"
> since 1.1.13:
> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177

That prevents values being stored in the CIB. attrd_updater should always
talk to attrd, as I got it ...
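The two update paths being contrasted in this sub-thread can be sketched as commands. This is a hedged fragment, not runnable standalone: it requires a live Pacemaker cluster, and the attribute name is illustrative:

```shell
# crm_attribute modifies the CIB directly; attrd picks the change up from
# there (transient attribute, i.e. --lifetime reboot)
crm_attribute --node node1 --name my-attr --update 1 --lifetime reboot

# attrd_updater notifies attrd, which then writes the CIB itself...
attrd_updater --name my-attr --update 1

# ...unless --private (available since 1.1.13) keeps the value in attrd
# only, never written to the CIB
attrd_updater --name my-attr --update 1 --private

# Query the value without parsing crm_mon output
attrd_updater --name my-attr --query
```

Either way both attrd and the CIB usually end up in sync; the difference is which component is the source of the change.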
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>> ...
>>> Another possible use would be for a cron that needs to know whether a
>>> particular resource is running, and an attribute query is quicker and
>>> easier than something like parsing crm_mon output or probing the service.
>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>> besides of lacking options and inefficient implementation, why should one
>> be faster than the other?
> attrd_updater doesn't go for the CIB

AFAIK, attrd_updater actually goes to the CIB, unless you set "--private"
since 1.1.13:
https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 08:41 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:

ClusterLabs is happy to announce the first release candidate for Pacemaker
version 1.1.16. Source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1

The most significant enhancements in this release are:

* rsc-pattern may now be used instead of rsc in location constraints, to
  allow a single location constraint to apply to all resources whose names
  match a regular expression. Sed-like %0 - %9 backreferences let submatches
  be used in node attribute names in rules.

* The new ocf:pacemaker:attribute resource agent sets a node attribute
  according to whether the resource is running or stopped. This may be
  useful in combination with attribute-based rules to model dependencies
  that simple constraints can't handle.

>>> I don't quite understand this: Isn't the state of a resource in the CIB
>>> status section anyway? If not, why not add it? So it would be readily
>>> available for anyone (rules, constraints, etc.).
>> This (hopefully) lets you model more complicated relationships.
>>
>> For example, someone recently asked whether they could make an ordering
>> constraint apply only at "start-up" -- the first time resource A starts,
>> it does some initialization that B needs, but once that's done, B can be
>> independent of A.
> Is "at start-up" before start of the resource, after start of the resource,
> or parallel to the start of the resource ;-)
> Probably a "hook" in the corresponding RA is the better approach, unless
> you can really model all of the above.
>
>> For that case, you could group A with an ocf:pacemaker:attribute
>> resource.
>> The important part is that the attribute is not set if A has
>> never run on a node. So, you can make a rule that B can run only where
>> the attribute is set, regardless of the value -- even if A is later
>> stopped, the attribute will still be set.
> If a resource is not running on a node, it is "stopped"; isn't it?
>
>> Another possible use would be for a cron that needs to know whether a
>> particular resource is running, and an attribute query is quicker and
>> easier than something like parsing crm_mon output or probing the service.
> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
> besides of lacking options and inefficient implementation, why should one
> be faster than the other?

attrd_updater doesn't go for the CIB

>> It's all theoretical at this point, and I'm not entirely sure those
>> examples would be useful :) but I wanted to make the agent available for
>> people to experiment with.
> A good product manager should resist the attempt to provide any feature
> the customers ask for, avoiding bloat-ware. That is to protect the
> customer from their own bad decisions. In most cases there is a better,
> more universal solution to the specific problem.

* Pacemaker's existing "node health" feature allows resources to move off
  nodes that become unhealthy. Now, when using
  node-health-strategy=progressive, a new cluster property node-health-base
  will be used as the initial health score of newly joined nodes (defaulting
  to 0, which is the previous behavior). This allows cloned and multistate
  resource instances to start on a node even if it has some "yellow" health
  attributes.

>>> So the node health is more or less a "node score"? I don't understand
>>> the last sentence. Maybe give an example?
>> Yes, node health is a score that's added when deciding where to place a
>> resource. It does get complicated ...
>>
>> Node health monitoring is optional, and off by default.
>> Node health attributes are set to red, yellow or green (outside
>> pacemaker itself -- either by a resource agent, or some external
>> process). As an example, let's say we have three node health attributes
>> for CPU usage, CPU temperature, and SMART error count.
>>
>> With a progressive strategy, red and yellow are assigned some negative
>> score, and green is 0. In our example, let's say yellow gets a -10 score.
>>
>> If any of our attributes are yellow, resources will avoid the node
>> (unless they have higher positive scores from something like stickiness
>> or a location constraint).
>
> I understood so far.
>
>> Normally, this is what you want, but if your resources are cloned on all
>> nodes, maybe you don't care if some attributes are yellow. In that case,
>> you can set node-health-base=20, so even if two attributes are yellow,
>> it won't prevent resources from running (20 + -10 + -10 = 0).
> I don't understand that: "node-health-base" is a global setting,
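For the "B runs only where A has initialized" idea raised earlier in the thread, the rule keyed on the attribute set by ocf:pacemaker:attribute might look roughly like the CIB fragment below. This is only a sketch: the resource and attribute names (B, opa-A-attr) are hypothetical, and the actual attribute name used by the agent depends on how its parameters are configured:

```xml
<!-- Sketch: ban B from any node where the marker attribute was never set.
     Once A has run somewhere, the attribute exists there (whatever its
     value), so B becomes eligible even if A is later stopped. -->
<rsc_location id="B-needs-A-initialized" rsc="B">
  <rule id="B-needs-A-rule" score="-INFINITY">
    <expression id="B-needs-A-expr" attribute="opa-A-attr"
                operation="not_defined"/>
  </rule>
</rsc_location>
```

Note the rule tests only whether the attribute is defined, not its value, which is exactly the "has A *ever* run here" semantics discussed above.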