Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 12:39:32 -0600 Ken Gaillot wrote:

> On 11/07/2016 12:03 PM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:
> >
> >> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
> >>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
> >>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
> >>>>
> >>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
> >>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> >>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
> >>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
> >>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
> >>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
> >>>>>>>> ...
> >>>>>>> Another possible use would be for a cron job that needs to know
> >>>>>>> whether a particular resource is running; an attribute query is
> >>>>>>> quicker and easier than something like parsing crm_mon output or
> >>>>>>> probing the service.
> >>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
> >>>>>> so besides the lacking options and inefficient implementation, why
> >>>>>> should one be faster than the other?
> >>>>> attrd_updater doesn't go to the CIB
> >>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
> >>>> "--private", available since 1.1.13:
> >>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
> >>> That prevents values being stored in the CIB. attrd_updater should
> >>> always talk to attrd as I understood it ...
> >>
> >> It's a bit confusing: both crm_attribute and attrd_updater will
> >> ultimately affect both attrd and the CIB in most cases, but *how* they
> >> do so differs. crm_attribute modifies the CIB and lets attrd pick up
> >> the change from there; attrd_updater notifies attrd and lets attrd
> >> modify the CIB.
> >>
> >> The difference is subtle.
> >>
> >> With corosync 2, attrd only modifies "transient" node attributes
> >> (which stay in effect until the next reboot), not "permanent"
> >> attributes.
> >
> > So why is "--private" not compatible with corosync 1.x, given that
> > attrd_updater only sets "transient" attributes anyway?
>
> Corosync 1 does not support certain reliability guarantees required by
> the current attrd, so when building against the corosync 1 libraries,
> pacemaker will install a "legacy" attrd instead. The difference is
> mainly that the current attrd can guarantee atomic updates to attribute
> values. attrd_updater actually can set permanent attributes when used
> with the legacy attrd.

OK, I understand now.

> > How and where are private attributes stored?
>
> They are kept in memory only, in attrd. Of course, attrd is clustered,
> so they are kept in sync across all nodes.

OK, that was my guess.

> >> So crm_attribute must be used if you want to set a permanent
> >> attribute. crm_attribute also has the ability to modify cluster
> >> properties and resource defaults, as well as node attributes.
> >>
> >> On the other hand, by contacting attrd directly, attrd_updater can
> >> change an attribute's "dampening" (how often it is flushed to the
> >> CIB), and it can (as mentioned above) set "private" attributes that
> >> are never written to the CIB (and thus never cause the cluster to
> >> re-calculate resource placement).
> >
> > Interesting, thank you for the clarification.
> >
> > As I understand it, it comes down to:
> >
> >   crm_attribute -> CIB <-(poll/notify?) attrd
> >   attrd_updater -> attrd -> CIB
>
> Correct. On startup, attrd registers with the CIB to be notified of all
> changes.
>
> > Just a quick question about this: is it possible to set a "dampening"
> > high enough that attrd never flushes the attribute to the CIB (making
> > it a kind of private attribute too)?
>
> I'd expect that to work, if the dampening interval was higher than the
> lifetime of the cluster being up.

Interesting.

> It's also possible to abuse attrd to create a kind of private attribute
> by using a node name that doesn't exist and never will. :) This ability
> is intentionally allowed, so you can set attributes for nodes that the
> current partition isn't aware of, or nodes that are planned to be added
> later, but only attributes for known nodes will be written to the CIB.

Again, interesting. I'll do some tests on my RA, as I need clustered
private attributes and was not able to get them under the old stack
(Debian < 8 or RHEL < 7).

Thank you very much for your answers!

Regards,

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
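For anyone wanting to experiment with this, a minimal sketch of private attributes with attrd_updater (the attribute name "pgsql-state" is just an example; requires Pacemaker >= 1.1.13 built against corosync 2.x, with the cluster running):

```shell
# Set a transient attribute that attrd keeps in memory only and never
# writes to the CIB (so it never triggers a transition):
attrd_updater --name pgsql-state --update master --private

# Read it back (served from attrd, not the CIB):
attrd_updater --name pgsql-state --query

# Remove it again:
attrd_updater --name pgsql-state --delete
```

Since the value lives only in attrd's memory, it disappears when the whole cluster is stopped, which is exactly the "transient" behavior described above.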
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 12:03 PM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:
>
>> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
>>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>>>> ...
>>>>>>> Another possible use would be for a cron job that needs to know
>>>>>>> whether a particular resource is running; an attribute query is
>>>>>>> quicker and easier than something like parsing crm_mon output or
>>>>>>> probing the service.
>>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
>>>>>> so besides the lacking options and inefficient implementation, why
>>>>>> should one be faster than the other?
>>>>> attrd_updater doesn't go to the CIB
>>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>>>> "--private", available since 1.1.13:
>>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
>>> That prevents values being stored in the CIB. attrd_updater should
>>> always talk to attrd as I understood it ...
>>
>> It's a bit confusing: both crm_attribute and attrd_updater will
>> ultimately affect both attrd and the CIB in most cases, but *how* they
>> do so differs. crm_attribute modifies the CIB and lets attrd pick up
>> the change from there; attrd_updater notifies attrd and lets attrd
>> modify the CIB.
>>
>> The difference is subtle.
>>
>> With corosync 2, attrd only modifies "transient" node attributes
>> (which stay in effect until the next reboot), not "permanent"
>> attributes.
>
> So why is "--private" not compatible with corosync 1.x, given that
> attrd_updater only sets "transient" attributes anyway?

Corosync 1 does not support certain reliability guarantees required by
the current attrd, so when building against the corosync 1 libraries,
pacemaker will install a "legacy" attrd instead. The difference is
mainly that the current attrd can guarantee atomic updates to attribute
values. attrd_updater actually can set permanent attributes when used
with the legacy attrd.

> How and where are private attributes stored?

They are kept in memory only, in attrd. Of course, attrd is clustered,
so they are kept in sync across all nodes.

>> So crm_attribute must be used if you want to set a permanent
>> attribute. crm_attribute also has the ability to modify cluster
>> properties and resource defaults, as well as node attributes.
>>
>> On the other hand, by contacting attrd directly, attrd_updater can
>> change an attribute's "dampening" (how often it is flushed to the
>> CIB), and it can (as mentioned above) set "private" attributes that
>> are never written to the CIB (and thus never cause the cluster to
>> re-calculate resource placement).
>
> Interesting, thank you for the clarification.
>
> As I understand it, it comes down to:
>
>   crm_attribute -> CIB <-(poll/notify?) attrd
>   attrd_updater -> attrd -> CIB

Correct. On startup, attrd registers with the CIB to be notified of all
changes.

> Just a quick question about this: is it possible to set a "dampening"
> high enough that attrd never flushes the attribute to the CIB (making
> it a kind of private attribute too)?

I'd expect that to work, if the dampening interval was higher than the
lifetime of the cluster being up.

It's also possible to abuse attrd to create a kind of private attribute
by using a node name that doesn't exist and never will. :) This ability
is intentionally allowed, so you can set attributes for nodes that the
current partition isn't aware of, or nodes that are planned to be added
later, but only attributes for known nodes will be written to the CIB.
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 09:31:20 -0600 Ken Gaillot wrote:

> On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
>> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>>
>>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>>> ...
>>>>>> Another possible use would be for a cron job that needs to know
>>>>>> whether a particular resource is running; an attribute query is
>>>>>> quicker and easier than something like parsing crm_mon output or
>>>>>> probing the service.
>>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess,
>>>>> so besides the lacking options and inefficient implementation, why
>>>>> should one be faster than the other?
>>>> attrd_updater doesn't go to the CIB
>>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>>> "--private", available since 1.1.13:
>>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
>> That prevents values being stored in the CIB. attrd_updater should
>> always talk to attrd as I understood it ...
>
> It's a bit confusing: both crm_attribute and attrd_updater will
> ultimately affect both attrd and the CIB in most cases, but *how* they
> do so differs. crm_attribute modifies the CIB and lets attrd pick up
> the change from there; attrd_updater notifies attrd and lets attrd
> modify the CIB.
>
> The difference is subtle.
>
> With corosync 2, attrd only modifies "transient" node attributes
> (which stay in effect until the next reboot), not "permanent"
> attributes.

So why is "--private" not compatible with corosync 1.x, given that
attrd_updater only sets "transient" attributes anyway?

How and where are private attributes stored?

> So crm_attribute must be used if you want to set a permanent attribute.
> crm_attribute also has the ability to modify cluster properties and
> resource defaults, as well as node attributes.
>
> On the other hand, by contacting attrd directly, attrd_updater can
> change an attribute's "dampening" (how often it is flushed to the CIB),
> and it can (as mentioned above) set "private" attributes that are never
> written to the CIB (and thus never cause the cluster to re-calculate
> resource placement).

Interesting, thank you for the clarification.

As I understand it, it comes down to:

  crm_attribute -> CIB <-(poll/notify?) attrd
  attrd_updater -> attrd -> CIB

Just a quick question about this: is it possible to set a "dampening"
high enough that attrd never flushes the attribute to the CIB (making it
a kind of private attribute too)?

Regards,
Re: [ClusterLabs] permissions under /etc/corosync/qnetd (was: Corosync 2.4.0 is available at corosync.org!)
Jan Friesse writes:

> Ferenc Wágner wrote:
>> Have you got any plans/timeline for 2.4.2 yet?
>
> Yep, I'm going to release it in few minutes/hours.

Man, that was quick. I've got a bunch of typo fixes queued... :) Please
consider announcing upcoming releases a couple of days in advance; as a
packager, I'd much appreciate it. Maybe even tag release candidates...

Anyway, I've got a question concerning corosync-qnetd. I run it as user
and group coroqnetd. Is granting it read access to cert8.db and key3.db
enough for proper operation? corosync-qnetd-certutil gives write access
to group coroqnetd on everything, which seems unintuitive to me. Please
note that I've got zero experience with NSS, but I don't expect the
daemon to change the certificate database. Should I?
--
Thanks,
Feri
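If read access does turn out to be sufficient, something along these lines would tighten the database (the nssdb path matches the corosync-qnetd default; treat the exact modes as an assumption to be verified, not a recommendation from the qnetd authors):

```shell
# Make the NSS certificate database read-only for the coroqnetd group
# instead of the group-writable default set by corosync-qnetd-certutil:
chown -R root:coroqnetd /etc/corosync/qnetd/nssdb
chmod 0750 /etc/corosync/qnetd/nssdb
chmod 0640 /etc/corosync/qnetd/nssdb/cert8.db /etc/corosync/qnetd/nssdb/key3.db

# Verify the result, then restart the daemon and watch for NSS errors:
ls -l /etc/corosync/qnetd/nssdb
```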
[ClusterLabs] Corosync 2.4.2 is available at corosync.org!
I am pleased to announce the latest maintenance release of Corosync,
2.4.2, available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

This release exists mainly because we forgot to bump the
libvotequorum.so major version number in 2.4.0. This is not that big a
deal, because libvotequorum isn't used by 3rd-party applications
(pacemaker, ...), but it still makes sense to have the issue fixed.
Thanks to Ferenc Wágner for the notice.

Complete changelog for 2.4.2:

Christine Caulfield (1):
      man: mention qdevice incompatibilites in votequorum.5

Fabio M. Di Nitto (1):
      [build] Fix build on RHEL7.3 latest

Jan Friesse (3):
      Man: Fix corosync-qdevice-net-certutil link
      Qnetd LMS: Fix two partition use case
      libvotequorum: Bump version

Michael Jones (1):
      cfg: Prevents use of uninitialized buffer

Upgrade is (as usual) highly recommended.

Thanks/congratulations to all people that contributed to achieve this
great milestone.
Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!
Ferenc Wágner wrote:

>>>> Please note that because of required changes in votequorum,
>>>> libvotequorum is no longer binary compatible. This is the reason for
>>>> the version bump.
>>>
>>> Er, what version bump? Corosync 2.4.1 still produces
>>> libvotequorum.so.7.0.0 for me, just like Corosync 2.3.6.
>>
>> Yep, you are right. Thanks for the notice; this is something that
>> should have happened.
>
> Thanks for confirming.
>
>> Anyway, 2.3.6 and 2.4.x votequorum are incompatible (there were both
>> API and ABI changes).
>
> Probably something to fix in 2.4.2. Have you got any plans/timeline for
> 2.4.2 yet?

Yep, I'm going to release it in few minutes/hours.

> Anyway, we're packaging 2.4.1 for Debian now. Shall we ship it with
> -7.0.0 +8.0.0 in lib/libvotequorum.verso?
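For packagers wanting to confirm what a given build actually shipped, the soname can be checked directly (the library paths below are examples; adjust for your distribution or build tree):

```shell
# Inspect the SONAME recorded in an installed library:
objdump -p /usr/lib/libvotequorum.so.7.0.0 | grep SONAME

# Equivalent check with readelf, e.g. against a freshly built tree:
readelf -d .libs/libvotequorum.so | grep SONAME
```

If the output still says `libvotequorum.so.7` after an ABI break, the version bump was missed, which is exactly the situation discussed above.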
Re: [ClusterLabs] What is the logic when two node are down at the same time and needs to be fenced
Hi Ken,

Thanks for the clarification. Now I have another real problem that needs
your advice. The cluster consists of 5 nodes, and one of the nodes had a
1-second network failure, which resulted in one of the VirtualDomain
resources starting on two nodes at the same time. The cluster property
no_quorum_policy is set to stop.

At 16:13:34, this happened:

16:13:34 zs95kj attrd[133000]: notice: crm_update_peer_proc: Node zs93KLpcs1[5] - state is now lost (was member)
16:13:34 zs95kj corosync[132974]: [CPG ] left_list[0] group:pacemakerd\x00, ip:r(0) ip(10.20.93.13) , pid:28721
16:13:34 zs95kj crmd[133002]: warning: No match for shutdown action on 5
16:13:34 zs95kj attrd[133000]: notice: Removing all zs93KLpcs1 attributes for attrd_peer_change_cb
16:13:34 zs95kj corosync[132974]: [CPG ] left_list_entries:1
16:13:34 zs95kj crmd[133002]: notice: Stonith/shutdown of zs93KLpcs1 not matched
...
16:13:35 zs95kj attrd[133000]: notice: crm_update_peer_proc: Node zs93KLpcs1[5] - state is now member (was (null))

From the DC:

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3288.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Started zs93KLpcs1
   <-- This is the baseline where everything works normally

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3289.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped
   <-- Here node zs93KLpcs1 lost its network for 1 sec, resulting in this state

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3290.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped

[root@zs95kj ~]# crm_simulate --xml-file /var/lib/pacemaker/pengine/pe-input-3291.bz2 | grep 110187
 zs95kjg110187_res (ocf::heartbeat:VirtualDomain): Stopped

From the DC's pengine log:

16:05:01 zs95kj pengine[133001]: notice: Calculated Transition 238: /var/lib/pacemaker/pengine/pe-input-3288.bz2
...
16:13:41 zs95kj pengine[133001]: notice: Start zs95kjg110187_res#011(zs90kppcs1)
...
16:13:41 zs95kj pengine[133001]: notice: Calculated Transition 239: /var/lib/pacemaker/pengine/pe-input-3289.bz2

From the DC's crmd log:

Sep 9 16:05:25 zs95kj crmd[133002]: notice: Transition 238 (Complete=48, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-3288.bz2): Complete
...
Sep 9 16:13:42 zs95kj crmd[133002]: notice: Initiating action 752: start zs95kjg110187_res_start_0 on zs90kppcs1
...
Sep 9 16:13:56 zs95kj crmd[133002]: notice: Transition 241 (Complete=81, Pending=0, Fired=0, Skipped=172, Incomplete=341, Source=/var/lib/pacemaker/pengine/pe-input-3291.bz2): Stopped

Here I do not see any log about pe-input-3289.bz2 and pe-input-3290.bz2.
Why is this?

In the log on zs93KLpcs1, where guest 110187 was running, I do not see
any message about stopping this resource after it lost its connection to
the cluster. Any ideas where to look for the possible cause?

On 11/3/2016 1:02 AM, Ken Gaillot wrote:
> On 11/02/2016 11:17 AM, Niu Sibo wrote:
>> Hi all,
>>
>> I have a general question regarding the fence logic in pacemaker. I
>> have set up a three-node cluster with Pacemaker 1.1.13 and the cluster
>> property no_quorum_policy set to ignore.
>>
>> When two nodes lose the NIC corosync is running on at the same time,
>> it looks like the two nodes are getting fenced one by one, even though
>> I have three fence devices defined, one for each node. What should I
>> be expecting in this case?
>
> It's probably coincidence that the fencing happens serially; there is
> nothing enforcing that for separate fence devices. There are many steps
> in a fencing request, so they can easily take different times to
> complete.
>
>> I noticed that if the node rejoins the cluster before the cluster
>> starts the fence actions, some resources will get activated on two
>> nodes at the same time. This is really not good if the resource
>> happens to be a virtual guest.
>>
>> Thanks for any suggestions.
>
> Since you're ignoring quorum, there's nothing stopping the disconnected
> node from starting all resources on its own. It can even fence the
> other nodes, unless the downed NIC is used for fencing. From that
> node's point of view, it's the other two nodes that are lost.
>
> Quorum is the only solution I know of to prevent that. Fencing will
> correct the situation, but it won't prevent it.
>
> See the votequorum(5) man page for various options that can affect how
> quorum is calculated. Also, the very latest version of corosync
> supports qdevice (a lightweight daemon that runs on a host outside the
> cluster strictly for the purposes of quorum).
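For reference, the qdevice setup Ken mentions can be sketched as a corosync.conf quorum section (a hedged example: the host name is a placeholder and the algorithm choice depends on your topology; requires corosync >= 2.4 plus the corosync-qdevice and corosync-qnetd packages):

```
quorum {
    provider: corosync_votequorum

    # "net" model talks to a corosync-qnetd daemon running outside the
    # cluster; algorithm may be "ffsplit" or "lms" depending on topology
    device {
        model: net
        net {
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}
```

With quorum enforced (no_quorum_policy left at its default) a node that loses its NIC cannot keep or start resources on its own, which prevents the double-activation described above rather than merely correcting it after the fact.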
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 03:47 AM, Klaus Wenninger wrote:
> On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
>> On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
>>
>>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>>> ...
>>>>> Another possible use would be for a cron job that needs to know
>>>>> whether a particular resource is running; an attribute query is
>>>>> quicker and easier than something like parsing crm_mon output or
>>>>> probing the service.
>>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>>>> besides the lacking options and inefficient implementation, why
>>>> should one be faster than the other?
>>> attrd_updater doesn't go to the CIB
>> AFAIK, attrd_updater actually goes to the CIB, unless you set
>> "--private", available since 1.1.13:
>> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
> That prevents values being stored in the CIB. attrd_updater should
> always talk to attrd as I understood it ...

It's a bit confusing: both crm_attribute and attrd_updater will
ultimately affect both attrd and the CIB in most cases, but *how* they
do so differs. crm_attribute modifies the CIB and lets attrd pick up the
change from there; attrd_updater notifies attrd and lets attrd modify
the CIB.

The difference is subtle.

With corosync 2, attrd only modifies "transient" node attributes (which
stay in effect until the next reboot), not "permanent" attributes. So
crm_attribute must be used if you want to set a permanent attribute.
crm_attribute also has the ability to modify cluster properties and
resource defaults, as well as node attributes.

On the other hand, by contacting attrd directly, attrd_updater can
change an attribute's "dampening" (how often it is flushed to the CIB),
and it can (as mentioned above) set "private" attributes that are never
written to the CIB (and thus never cause the cluster to re-calculate
resource placement).
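As a concrete sketch of the two paths described above (attribute names and values are invented for illustration; flags per the Pacemaker 1.1 command-line tools):

```shell
# attrd_updater path: notify attrd, which coalesces updates and flushes
# the value to the CIB only after the dampening interval (here 30s):
attrd_updater --name ping-count --update 5 --delay 30s

# crm_attribute path: write the CIB directly; "--lifetime forever" makes
# the attribute permanent (survives reboot), which attrd_updater cannot
# do on corosync 2:
crm_attribute --node node1 --name site --update datacenter-a --lifetime forever
```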
Re: [ClusterLabs] pacemaker after upgrade from wheezy to jessie
We managed to change the validate-with option via a workaround (cibadmin
export & replace), as setting the value with cibadmin --modify doesn't
write the changes to disk. After experimenting with various schemas (the
XML is correctly interpreted by crmsh), we are still not able to
communicate with the local crmd.

Can someone please help to determine why the local crmd is not
responding (we disabled our other nodes to eliminate possible
corosync-related issues) and runs into errors/timeouts when issuing
crmsh- or cibadmin-related commands?

Examples of local commands that do not work:

Timeout when running cibadmin (strace attached):

> cibadmin --upgrade --force
> Call cib_upgrade failed (-62): Timer expired

Error when running a crm resource cleanup:

> crm resource cleanup $vm
> Error signing on to the CRMd service
> Error performing operation: Transport endpoint is not connected

I attached the strace log from running cib_upgrade; does this help to
find the cause of the timeout issue?

Here is the corosync dump when locally starting pacemaker:

Nov 07 16:01:59 [24339] nebel1 corosync notice  [MAIN  ] main.c:1256 Corosync Cluster Engine ('2.3.6'): started and ready to provide service.
Nov 07 16:01:59 [24339] nebel1 corosync info    [MAIN  ] main.c:1257 Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices snmp pie relro bindnow
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemnet.c:248 Initializing transport (UDP/IP Multicast).
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none hash: none
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemnet.c:248 Initializing transport (UDP/IP Multicast).
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemcrypto.c:579 Initializing transmit/receive security (NSS) crypto: none hash: none
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemudp.c:671 The network interface [10.112.0.1] is now up.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync configuration map access [0]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cmap
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync configuration service [1]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cfg
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync cluster closed process group service v1.01 [2]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: cpg
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync profile loading service [4]
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync resource monitoring service [6]
Nov 07 16:01:59 [24339] nebel1 corosync info    [WD    ] wd.c:669 Watchdog /dev/watchdog is now been tickled by corosync.
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:625 Could not change the Watchdog timeout from 10 to 6 seconds
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:464 resource load_15min missing a recovery key.
Nov 07 16:01:59 [24339] nebel1 corosync warning [WD    ] wd.c:464 resource memory_used missing a recovery key.
Nov 07 16:01:59 [24339] nebel1 corosync info    [WD    ] wd.c:581 no resources configured.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync watchdog service [7]
Nov 07 16:01:59 [24339] nebel1 corosync notice  [SERV  ] service.c:174 Service engine loaded: corosync cluster quorum service v0.1 [3]
Nov 07 16:01:59 [24339] nebel1 corosync info    [QB    ] ipc_setup.c:536 server name: quorum
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemudp.c:671 The network interface [10.110.1.1] is now up.
Nov 07 16:01:59 [24339] nebel1 corosync notice  [TOTEM ] totemsrp.c:2095 A new membership (10.112.0.1:348) was formed. Members joined: 1
Nov 07 16:01:59 [24339] nebel1 corosync notice  [MAIN  ] main.c:310 Completed service synchronization, ready to provide service.
Nov 07 16:01:59 [24341] nebel1 pacemakerd: notice: main: Starting Pacemaker 1.1.15 | build=e174ec8 features: generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc lha-fencing upstart systemd nagios corosync-native atomic-attrd snmp libesmtp acls
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: main: Maximum core file size is: 18446744073709551615
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
Nov 07 16:01:59 [24341] nebel1 pacemakerd: info: corosync_node_name: Unable to get node name for nodeid 1
Nov 07 16:01:59 [24341] nebel1 pacemakerd: notice: get_node_name:
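For the record, the export-and-replace workaround described at the top of this message can be sketched like this (the target schema name "pacemaker-2.0" is just an example; pick whichever schema your tools validate against):

```shell
# Export the live CIB, rewrite the validate-with attribute in the file,
# then push the whole document back in one operation:
cibadmin --query > /tmp/cib.xml
sed -i 's/validate-with="[^"]*"/validate-with="pacemaker-2.0"/' /tmp/cib.xml
cibadmin --replace --xml-file /tmp/cib.xml
```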
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 01:41 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message:
>>>> * The new ocf:pacemaker:attribute resource agent sets a node
>>>>   attribute according to whether the resource is running or stopped.
>>>>   This may be useful in combination with attribute-based rules to
>>>>   model dependencies that simple constraints can't handle.
>>>
>>> I don't quite understand this: Isn't the state of a resource in the
>>> CIB status section anyway? If not, why not add it? Then it would be
>>> readily available for anyone (rules, constraints, etc.).
>>
>> This (hopefully) lets you model more complicated relationships.
>>
>> For example, someone recently asked whether they could make an
>> ordering constraint apply only at "start-up" -- the first time
>> resource A starts, it does some initialization that B needs, but once
>> that's done, B can be independent of A.
>
> Is "at start-up" before start of the resource, after start of the
> resource, or parallel to the start of the resource ;-)
> Probably a "hook" in the corresponding RA is the better approach,
> unless you can really model all of the above.
>
>> For that case, you could group A with an ocf:pacemaker:attribute
>> resource. The important part is that the attribute is not set if A has
>> never run on a node. So, you can make a rule that B can run only where
>> the attribute is set, regardless of the value -- even if A is later
>> stopped, the attribute will still be set.
>
> If a resource is not running on a node, it is "stopped", isn't it?

Sure, but what I mean is: if resource A has *never* run on a node, then
the corresponding node attribute will be *unset*. But if A has ever
started and/or stopped on a node, the attribute will be set to one value
or the other.
So, a rule can be used to check whether the attribute is set at all, to
determine whether A has *ever* run on the node, regardless of whether it
is currently running.

>> Another possible use would be for a cron job that needs to know
>> whether a particular resource is running; an attribute query is
>> quicker and easier than something like parsing crm_mon output or
>> probing the service.
>
> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
> besides the lacking options and inefficient implementation, why should
> one be faster than the other?
>
>> It's all theoretical at this point, and I'm not entirely sure those
>> examples would be useful :) but I wanted to make the agent available
>> for people to experiment with.
>
> A good product manager should resist the attempt to provide any feature
> the customers ask for, avoiding bloat-ware. That is to protect the
> customers from their own bad decisions. In most cases there is a
> better, more universal solution to the specific problem.

Sure, but this is a resource agent -- it adds no overhead to anyone not
using it, and since we don't have any examples or walk-throughs using
it, users would have to investigate and experiment to see whether it's
of any use in their environment. Hopefully, this will turn out to be a
general-purpose tool of value in multiple problem scenarios.

>>>> * Pacemaker's existing "node health" feature allows resources to
>>>>   move off nodes that become unhealthy. Now, when using
>>>>   node-health-strategy=progressive, a new cluster property
>>>>   node-health-base will be used as the initial health score of newly
>>>>   joined nodes (defaulting to 0, which is the previous behavior).
>>>>   This allows cloned and multistate resource instances to start on a
>>>>   node even if it has some "yellow" health attributes.
>>>
>>> So the node health is more or less a "node score"? I don't understand
>>> the last sentence. Maybe give an example?
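A hedged sketch of that start-up ordering idea using pcs (the resource names, group name, and attribute name are invented for illustration, and this is untested):

```shell
# Group A with an ocf:pacemaker:attribute resource, so the node
# attribute "A-has-run" gets set wherever A first starts:
pcs resource create A-flag ocf:pacemaker:attribute \
    name=A-has-run active_value=1 inactive_value=0
pcs resource group add A-group A A-flag

# Allow B only on nodes where the attribute is defined at all, i.e.
# where A has started at least once, regardless of the current value:
pcs constraint location B rule score=-INFINITY not_defined A-has-run
```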
>> Yes, node health is a score that's added when deciding where to place
>> a resource. It does get complicated ...
>>
>> Node health monitoring is optional, and off by default.
>>
>> Node health attributes are set to red, yellow or green (outside
>> pacemaker itself -- either by a resource agent, or some external
>> process). As an example, let's say we have three node health
>> attributes for CPU usage, CPU temperature, and SMART error count.
>>
>> With a progressive strategy, red and yellow are assigned some negative
>> score, and green is 0. In our example, let's say yellow gets a -10
>> score.
>>
>> If any of our attributes are yellow, resources will avoid the node
>> (unless they have higher positive scores from something like
>> stickiness or a location constraint).
>
> I understood so far.
>
>> Normally, this is what you want, but if your resources are cloned on
>> all nodes, maybe you don't care if some attributes are yellow. In that
>> case, you can set node-health-base=20, so even if two attributes are
>> yellow, it won't prevent resources from running
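In pcs terms, the example above might look like the following (property names per the 1.1.16 release notes; the scores are illustrative):

```shell
# Yellow health attributes subtract 10 each; starting every node at a
# base health score of 20 tolerates up to two yellow attributes before
# clone instances are pushed off the node:
pcs property set node-health-strategy=progressive
pcs property set node-health-yellow=-10
pcs property set node-health-base=20
```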
Re: [ClusterLabs] Authoritative corosync's location
On 22/09/16 09:05 +0200, Jan Friesse wrote:
> Jan Pokorný wrote:
>> On 21/09/16 09:16 +0200, Jan Friesse wrote:
>>> Thomas Lamprecht wrote:
>>>> I have also another, organizational question. I saw on the GitHub page
>>>> from corosync that pull requests there are preferred, and also that the
>>>
>>> True
>>
>> At this point, it's worth noting that ClusterLabs/corosync is
>> currently a stale fork of the corosync/corosync location at GitHub,
>> which may be a source of confusion.
>
> Nice catch, I didn't even know it exists.
>
>> It would make sense to settle on just a single one to be the clearly
>> authoritative place to be in touch with (not sure what the options
>> are -- aliasing/transferring?).
>
> Sure. I don't know who created that fork, but whoever it was, please
> consider deleting it. It may be really confusing.

Even more so when it's occasionally updated;
https://github.com/ClusterLabs/corosync (at master branch) now says
"This branch is 3 commits behind corosync:master.".
That also means that there seems to be no satisfactory solution, yet.

-- 
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Pacemaker 1.1.16 - Release Candidate 1
On 03/11/16 11:08 -0500, Ken Gaillot wrote:
> ClusterLabs is happy to announce the first release candidate for
> Pacemaker version 1.1.16. Source code is available at:
>
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1
>
> [...]

As usual, there are COPR builds (using the upstream spec file without any of
the final touches that are usually done downstream) for easy consumption in
some environments:
https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/build/473980/

I also have something to share regarding the recently announced security fix
in pacemaker, if you are interested in Fedora: fixed packages should be
available from the updates-testing repo in Fedora 23 and Fedora 25, and the
regular updates repo in Fedora 24 at the moment.

-- 
Jan (Poki)
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 10:26 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 7 Nov 2016 10:12:04 +0100
> Klaus Wenninger wrote:
>
>> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>>> ...
>>>> Another possible use would be for a cron that needs to know whether a
>>>> particular resource is running, and an attribute query is quicker and
>>>> easier than something like parsing crm_mon output or probing the
>>>> service.
>>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>>> besides of lacking options and inefficient implementation, why should
>>> one be faster than the other?
>> attrd_updater doesn't go for the CIB
> AFAIK, attrd_updater actually goes to the CIB, unless you set "--private"
> since 1.1.13:
> https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177

That prevents values being stored in the CIB. attrd_updater should always
talk to attrd, as I got it ...
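The two update paths being contrasted in this sub-thread can be sketched as commands. This is a hedged fragment, not runnable standalone: it requires a live Pacemaker cluster, and the attribute name is illustrative:

```shell
# crm_attribute modifies the CIB directly; attrd picks the change up from
# there (transient attribute, i.e. --lifetime reboot)
crm_attribute --node node1 --name my-attr --update 1 --lifetime reboot

# attrd_updater notifies attrd, which then writes the CIB itself...
attrd_updater --name my-attr --update 1

# ...unless --private (available since 1.1.13) keeps the value in attrd
# only, never written to the CIB
attrd_updater --name my-attr --update 1 --private

# Query the value without parsing crm_mon output
attrd_updater --name my-attr --query
```

Either way both attrd and the CIB usually end up in sync; the difference is which component is the source of the change.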
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On Mon, 7 Nov 2016 10:12:04 +0100 Klaus Wenninger wrote:
> On 11/07/2016 08:41 AM, Ulrich Windl wrote:
>> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
>> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>>> ...
>>> Another possible use would be for a cron that needs to know whether a
>>> particular resource is running, and an attribute query is quicker and
>>> easier than something like parsing crm_mon output or probing the service.
>> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
>> besides of lacking options and inefficient implementation, why should one
>> be faster than the other?
> attrd_updater doesn't go for the CIB

AFAIK, attrd_updater actually goes to the CIB, unless you set "--private"
since 1.1.13:
https://github.com/ClusterLabs/pacemaker/blob/master/ChangeLog#L177
Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1
On 11/07/2016 08:41 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 04.11.2016 at 22:37 in message
> <27c2ca20-c52c-8fb4-a60f-5ae12f7ff...@redhat.com>:
>> On 11/04/2016 02:29 AM, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
>>> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:

ClusterLabs is happy to announce the first release candidate for Pacemaker
version 1.1.16. Source code is available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1

The most significant enhancements in this release are:

* rsc-pattern may now be used instead of rsc in location constraints, to
  allow a single location constraint to apply to all resources whose names
  match a regular expression. Sed-like %0 - %9 backreferences let submatches
  be used in node attribute names in rules.

* The new ocf:pacemaker:attribute resource agent sets a node attribute
  according to whether the resource is running or stopped. This may be
  useful in combination with attribute-based rules to model dependencies
  that simple constraints can't handle.

>>> I don't quite understand this: Isn't the state of a resource in the CIB
>>> status section anyway? If not, why not add it? So it would be readily
>>> available for anyone (rules, constraints, etc.).
>> This (hopefully) lets you model more complicated relationships.
>>
>> For example, someone recently asked whether they could make an ordering
>> constraint apply only at "start-up" -- the first time resource A starts,
>> it does some initialization that B needs, but once that's done, B can be
>> independent of A.
> Is "at start-up" before start of the resource, after start of the resource,
> or parallel to the start of the resource ;-)
> Probably a "hook" in the corresponding RA is the better approach, unless
> you can really model all of the above.
>
>> For that case, you could group A with an ocf:pacemaker:attribute
>> resource.
>> The important part is that the attribute is not set if A has
>> never run on a node. So, you can make a rule that B can run only where
>> the attribute is set, regardless of the value -- even if A is later
>> stopped, the attribute will still be set.
> If a resource is not running on a node, it is "stopped"; isn't it?
>
>> Another possible use would be for a cron that needs to know whether a
>> particular resource is running, and an attribute query is quicker and
>> easier than something like parsing crm_mon output or probing the service.
> crm_mon reads parts of the CIB; crm_attribute also does, I guess, so
> besides of lacking options and inefficient implementation, why should one
> be faster than the other?

attrd_updater doesn't go for the CIB

>> It's all theoretical at this point, and I'm not entirely sure those
>> examples would be useful :) but I wanted to make the agent available for
>> people to experiment with.
> A good product manager should resist the attempt to provide any feature
> the customers ask for, avoiding bloat-ware. That is to protect the
> customer from their own bad decisions. In most cases there is a better,
> more universal solution to the specific problem.

* Pacemaker's existing "node health" feature allows resources to move off
  nodes that become unhealthy. Now, when using
  node-health-strategy=progressive, a new cluster property node-health-base
  will be used as the initial health score of newly joined nodes (defaulting
  to 0, which is the previous behavior). This allows cloned and multistate
  resource instances to start on a node even if it has some "yellow" health
  attributes.

>>> So the node health is more or less a "node score"? I don't understand
>>> the last sentence. Maybe give an example?
>> Yes, node health is a score that's added when deciding where to place a
>> resource. It does get complicated ...
>>
>> Node health monitoring is optional, and off by default.
>> Node health attributes are set to red, yellow or green (outside
>> pacemaker itself -- either by a resource agent, or some external
>> process). As an example, let's say we have three node health attributes
>> for CPU usage, CPU temperature, and SMART error count.
>>
>> With a progressive strategy, red and yellow are assigned some negative
>> score, and green is 0. In our example, let's say yellow gets a -10 score.
>>
>> If any of our attributes are yellow, resources will avoid the node
>> (unless they have higher positive scores from something like stickiness
>> or a location constraint).
>
> I understood so far.
>
>> Normally, this is what you want, but if your resources are cloned on all
>> nodes, maybe you don't care if some attributes are yellow. In that case,
>> you can set node-health-base=20, so even if two attributes are yellow,
>> it won't prevent resources from running (20 + -10 + -10 = 0).
> I don't understand that: "node-health-base" is a global setting,
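For the "B runs only where A has initialized" idea raised earlier in the thread, the rule keyed on the attribute set by ocf:pacemaker:attribute might look roughly like the CIB fragment below. This is only a sketch: the resource and attribute names (B, opa-A-attr) are hypothetical, and the actual attribute name used by the agent depends on how its parameters are configured:

```xml
<!-- Sketch: ban B from any node where the marker attribute was never set.
     Once A has run somewhere, the attribute exists there (whatever its
     value), so B becomes eligible even if A is later stopped. -->
<rsc_location id="B-needs-A-initialized" rsc="B">
  <rule id="B-needs-A-rule" score="-INFINITY">
    <expression id="B-needs-A-expr" attribute="opa-A-attr"
                operation="not_defined"/>
  </rule>
</rsc_location>
```

Note the rule tests only whether the attribute is defined, not its value, which is exactly the "has A *ever* run here" semantics discussed above.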