[ClusterLabs] Clearing failed actions

2018-07-06 Thread Casey Allen Shobe
Hi,

I found a web page which suggested using `crm_resource -P` to clear the
Failed Actions. Although this appears to work, it's not documented in the
man page at all. Is this deprecated, and is there a more correct way to be
doing this?

Also, is there a way to clear one specific item from the list, or is
clearing them all the only option?
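
For what it's worth, `crm_resource --cleanup` appears to be the documented
interface for this; a hedged sketch (the resource and node names below are
placeholders, and older builds may require --resource to be given):

crm_resource --cleanup --resource my_rsc               # clear failures for one resource
crm_resource --cleanup --resource my_rsc --node node1  # ... on a single node only
crm_resource --cleanup                                 # newer versions: clear everything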

Thank you in advance for any advice,
-- 
Casey
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Clearing "Failed Actions"

2018-07-06 Thread Casey Allen Shobe
I found a random web page which suggested using `crm_resource -P` to clear
the Failed Actions.

Although this appears to work, it's not documented in the man page at all. Is
this deprecated, and is there a more correct way to be doing this?

Cheers,
-- 
Casey
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker 2.0.0 has been released

2018-07-06 Thread Ken Gaillot
I am very happy to announce that source code for the final release of
Pacemaker version 2.0.0 is now available at:

https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.0

The main goal of the change from Pacemaker 1 to 2 is to drop support
for deprecated legacy usage, in order to make the code base more
maintainable going into the future.

Rolling (live) upgrades are possible only from Pacemaker 1.1.11 or
later, on top of corosync 2 or later. Other setups can be upgraded with
the cluster stopped.

If upgrading an existing cluster, it is recommended to run "cibadmin --
upgrade" (or the equivalent in your higher-level tool of choice) both
before and after the upgrade.
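
A minimal sketch of that sequence, assuming the Pacemaker command-line tools
are available on a cluster node:

cibadmin --upgrade        # bump the CIB schema before installing 2.0.0
# ... upgrade the packages and restart the cluster ...
cibadmin --upgrade        # and again once the new version is running
crm_verify --live-check   # optional: sanity-check the upgraded configuration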

Extensive details about the changes in this release are available in
the change log:

  https://github.com/ClusterLabs/pacemaker/blob/2.0/ChangeLog

and in a special wiki page for the 2.0 release:

  https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes

Highlights:

* Support has been dropped for heartbeat and corosync 1 (whether using
CMAN or plugin), and many legacy aliases for cluster options (including
default-resource-stickiness, which should be set as resource-
stickiness in rsc_defaults instead).

* The logs should be a little more user-friendly. The Pacemaker daemons
have been renamed for easier log searching. The default location of the
Pacemaker detail log is now /var/log/pacemaker/pacemaker.log, and
Pacemaker will no longer use Corosync's logging preferences.

* The master XML tag is deprecated (though still supported) in favor of
using the standard clone tag with a new "promotable" meta-attribute set
to true. The "master-max" and "master-node-max" master meta-attributes
are deprecated in favor of new "promoted-max" and "promoted-node-max"
clone meta-attributes. Documentation now refers to these as promotable
clones rather than master/slave, stateful or multistate clones.
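
As an illustration only (all the ids below are made up, and
ocf:pacemaker:Stateful is just a demo agent), a promotable clone in the new
syntax could be loaded with cibadmin like this:

cat > promotable-clone.xml <<'EOF'
<clone id="example-clone">
  <meta_attributes id="example-clone-meta">
    <nvpair id="example-clone-promotable" name="promotable" value="true"/>
    <nvpair id="example-clone-promoted-max" name="promoted-max" value="1"/>
  </meta_attributes>
  <primitive id="example-rsc" class="ocf" provider="pacemaker" type="Stateful"/>
</clone>
EOF
cibadmin --create --scope resources --xml-file promotable-clone.xml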

* The record-pending option now defaults to true, which means pending
actions will be shown in status displays.

* The "Pacemaker Explained" document has grown large enough that topics
related to cluster administration have been moved to their own new
document, "Pacemaker Administration":

  http://clusterlabs.org/pacemaker/doc/

Many thanks to all contributors of source code to this release,
including Andrew Beekhof, Bin Liu, Bruno Travouillon, Gao,Yan, Hideo
Yamauchi, Jan Pokorný, Ken Gaillot, and Klaus Wenninger.
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker alert framework

2018-07-06 Thread Ken Gaillot
On Fri, 2018-07-06 at 15:58 +0200, Klaus Wenninger wrote:
> On 07/06/2018 03:41 PM, Ian Underhill wrote:
> > requirement:
> > when a resource fails, perform an action: run a script on all nodes
> > within the cluster before the resource is relocated, i.e.
> > information gathering on why the resource failed.
>  
> Keep in mind that trying to run a script on all nodes in the cluster
> before
> proceeding is a delicate issue because not being able to run it on
> one
> node might prevent relocation and thus availability.
> Of course this largely depends on how it is implemented - just wanted
> to raise attention.
>   
> > what I have looked into:
> > 1) Use the monitor call within the resource to SSH to all nodes;
> > again, SSH config needed.
> > 2) Alert framework: this only seems to be triggered for nodes
> > involved in the relocation of the resource, i.e. if the resource moves
> > from node1 to node2, node3 doesn't know, so back to the SSH
> > solution :(
>  
> Alerts are designed not to block anything (not even other alerts),
> so the alert agents are
> only called on nodes that are already involved in that particular
> event.
> 
> > 3) sending a custom alert to all nodes in the cluster? is this
> > possible? not found a way?
> > 
> > only solution I have:
> > 1) use SSH within an alert monitor (stop) to SSH onto all nodes to
> > perform the action; the nodes could be configured using the alert
> > monitor's recipients, but I would still need to configure SSH users and
> > certs etc.
> >      1.a) this doesn't seem to be usable if the resource is
> > relocated back to the same node, as the alerts start\stop are run
> > at the "same time", i.e. I need to delay the start till the SSH has
> > completed.
> > 
> > what I would like:
> > 1) delay the start\relocation of the resource until the information
> > from all nodes is complete, using only pacemaker behaviour\config
> > 
> > any ideas?
>  
> Not 100% sure of what your exact intention is ...
> You could experiment with a clone that depends on the running
> instance and
> use the stop of that to trigger whatever you need.
> Not sure but I'd expect that pacemaker would tear down all clone
> instances
> before it relocates your resource.

Yes, that sounds like a good solution. See clone notifications:

http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements

You could even combine everything into a single custom resource agent
for use as a master/slave resource, where the master is the only
instance that actually runs the resource, and the slaves just act on
the notifications.
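
As a rough illustration only (this is not a complete OCF agent; the service
name and log path below are made up, and it assumes the clone is configured
with notify=true), the notify action of such an agent might look something
like this:

case "$1" in
  notify)
    # Pacemaker exports the notification details in the environment
    ntype="$OCF_RESKEY_CRM_meta_notify_type"        # pre | post
    nop="$OCF_RESKEY_CRM_meta_notify_operation"     # start | stop | promote | demote
    if [ "$ntype" = "pre" ] && [ "$nop" = "stop" ]; then
      # gather local diagnostics before the protected resource is stopped
      journalctl -u my-service --since "10 minutes ago" \
        > "/var/log/my-service-prestop.$(date +%s).log" 2>&1
    fi
    exit 0   # OCF_SUCCESS
    ;;
esac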

> 
> Regards,
> Klaus
> 
> > Thanks
> > 
> > /Ian.
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker alert framework

2018-07-06 Thread Klaus Wenninger
On 07/06/2018 03:41 PM, Ian Underhill wrote:
> requirement:
> when a resource fails, perform an action: run a script on all nodes
> within the cluster before the resource is relocated, i.e. information
> gathering on why the resource failed.

Keep in mind that trying to run a script on all nodes in the cluster before
proceeding is a delicate issue because not being able to run it on one
node might prevent relocation and thus availability.
Of course this largely depends on how it is implemented - just wanted
to raise attention.
 
>
> what I have looked into:
> 1) Use the monitor call within the resource to SSH to all nodes; again,
> SSH config needed.
> 2) Alert framework: this only seems to be triggered for nodes
> involved in the relocation of the resource, i.e. if the resource moves
> from node1 to node2, node3 doesn't know, so back to the SSH solution :(

Alerts are designed not to block anything (not even other alerts), so
the alert agents are
only called on nodes that are already involved in that particular event.

> 3) sending a custom alert to all nodes in the cluster? is this
> possible? not found a way?
>
> only solution I have:
> 1) use SSH within an alert monitor (stop) to SSH onto all nodes to
> perform the action; the nodes could be configured using the alert
> monitor's recipients, but I would still need to configure SSH users and
> certs etc.
>      1.a) this doesn't seem to be usable if the resource is relocated
> back to the same node, as the alerts start\stop are run at the "same
> time", i.e. I need to delay the start till the SSH has completed.
>
> what I would like:
> 1) delay the start\relocation of the resource until the information
> from all nodes is complete, using only pacemaker behaviour\config
>
> any ideas?

Not 100% sure of what your exact intention is ...
You could experiment with a clone that depends on the running instance and
use the stop of that to trigger whatever you need.
Not sure but I'd expect that pacemaker would tear down all clone instances
before it relocates your resource.

Regards,
Klaus

>
> Thanks
>
> /Ian.
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Pacemaker alert framework

2018-07-06 Thread Ian Underhill
requirement:
when a resource fails, perform an action: run a script on all nodes within
the cluster before the resource is relocated, i.e. information gathering on
why the resource failed.

what I have looked into:
1) Use the monitor call within the resource to SSH to all nodes; again, SSH
config needed.
2) Alert framework: this only seems to be triggered for nodes involved in
the relocation of the resource, i.e. if the resource moves from node1 to node2,
node3 doesn't know, so back to the SSH solution :(
3) sending a custom alert to all nodes in the cluster? is this possible?
not found a way?

only solution I have:
1) use SSH within an alert monitor (stop) to SSH onto all nodes to perform
the action; the nodes could be configured using the alert monitor's
recipients, but I would still need to configure SSH users and certs etc.
 1.a) this doesn't seem to be usable if the resource is relocated back
to the same node, as the alerts start\stop are run at the "same time", i.e.
I need to delay the start till the SSH has completed.

what I would like:
1) delay the start\relocation of the resource until the information from
all nodes is complete, using only pacemaker behaviour\config

any ideas?

Thanks

/Ian.
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Salvatore D'angelo
Hi,

Thanks for the reply. The problem is the opposite of what you are saying.

When I build corosync against the old libqb and verify that the newly updated
node works properly, then update to the new hand-compiled libqb, it works fine.
But in a normal upgrade procedure I first build libqb (removing the old one
first) and then corosync, and when I follow this order it does not work.
This is what drives me crazy.
I do not understand this behavior.

> On 6 Jul 2018, at 14:40, Christine Caulfield  wrote:
> 
> On 06/07/18 13:24, Salvatore D'angelo wrote:
>> Hi All,
>> 
>> The option --ulimit memlock=536870912 worked fine.
>> 
>> I have now another strange issue. The upgrade without updating libqb
>> (leaving the 0.16.0) worked fine.
>> If after the upgrade I stop pacemaker and corosync, I download the
>> latest libqb version:
>> https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
>> build and install it everything works fine.
>> 
>> If I try to install in sequence (after the installation of old code):
>> 
>> libqb 1.0.3
>> corosync 2.4.4
>> pacemaker 1.1.18
>> crmsh 3.0.1
>> resource agents 4.1.1
>> 
>> when I try to start corosync I got the following error:
>> *Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line
>> 99:  8470 Aborted $prog $COROSYNC_OPTIONS > /dev/null 2>&1*
>> *[FAILED]*
> 
> 
> Yes, you can't randomly swap in and out hand-compiled libqb versions.
> Find one that works and stick to it. It's an annoying 'feature' of newer
> linkers that we had to work around in libqb. So if you rebuild libqb
> 1.0.3 then you will, in all likelihood, need to rebuild corosync to
> match it.
> 
> Chrissie
> 
> 
>> 
>> if I launch corosync -f I got:
>> *corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite
>> section is populated, otherwise target's build is at fault, preventing
>> reliable logging" && __start___verbose != __stop___verbose' failed.*
>> 
>> Nothing is logged (not even in debug mode).
>> 
>> I do not understand why installing libqb during the normal upgrade
>> process fails, while if I upgrade it after the
>> crmsh/pacemaker/corosync/resource-agents upgrade it works fine.
>> 
>> On 3 Jul 2018, at 11:42, Christine Caulfield wrote:
>>> 
>>> On 03/07/18 07:53, Jan Pokorný wrote:
 On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
> Today I tested the two suggestions you gave me. Here is what I did.
> In the script where I create my 5 machines cluster (I use three
> nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs
> that we use for database backup and WAL files).
> 
> FIRST TEST
> ——
> I added the —shm-size=512m to the “docker create” command. I noticed
> that as soon as I start it the shm size is 512m and I didn’t need to
> add the entry in /etc/fstab. However, I did it anyway:
> 
> tmpfs  /dev/shm  tmpfs   defaults,size=512m   0   0
> 
> and then
> mount -o remount /dev/shm
> 
> Then I uninstalled all pieces of software (crmsh, resource agents,
> corosync and pacemaker) and installed the new one.
> Started corosync and pacemaker but same problem occurred.
> 
> SECOND TEST
> ———
> stopped corosync and pacemaker
> uninstalled corosync
> built corosync with --enable-small-memory-footprint and installed it
> started corosync and pacemaker
> 
> IT WORKED.
> 
> I would like to understand now why it didn’t work in the first test
> and why it worked in the second. Which kind of memory is used too much
> here? /dev/shm seems not the problem, I allocated 512m on all three
> docker images (obviously on my single Mac) and enabled the container
> option as you suggested. Am I missing something here?
 
 My suspicion then fully shifts towards "maximum number of bytes of
 memory that may be locked into RAM" per-process resource limit as
 raised in one of the most recent message ...
 
> Now I want to use Docker for the moment only for test purpose so it
> could be ok to use the --enable-small-memory-footprint, but there is
> something I can do to have corosync working even without this
> option?
 
 ... so try running the container the already suggested way:
 
  docker run ... --ulimit memlock=33554432 ...
 
 or possibly higher (as a rule of thumb, keep doubling the accumulated
 value until some unreasonable amount is reached, like the equivalent
 of already used 512 MiB).
 
 Hope this helps.
>>> 
>>> This makes a lot of sense to me. As Poki pointed out earlier, in
>>> corosync 2.4.3 (I think) we fixed a regression in that caused corosync
>>> NOT to be locked in RAM after it forked - which was causing potential
>>> performance issues. So if you replace an earlier corosync with 2.4.3 or
>>> later then it will use more locked memory t

Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Christine Caulfield
On 06/07/18 13:24, Salvatore D'angelo wrote:
> Hi All,
> 
> The option --ulimit memlock=536870912 worked fine.
> 
> I have now another strange issue. The upgrade without updating libqb
> (leaving the 0.16.0) worked fine.
> If after the upgrade I stop pacemaker and corosync, I download the
> latest libqb version:
> https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
> build and install it everything works fine.
> 
> If I try to install in sequence (after the installation of old code):
> 
> libqb 1.0.3
> corosync 2.4.4
> pacemaker 1.1.18
> crmsh 3.0.1
> resource agents 4.1.1
> 
> when I try to start corosync I got the following error:
> *Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line
> 99:  8470 Aborted                 $prog $COROSYNC_OPTIONS > /dev/null 2>&1*
> *[FAILED]*


Yes, you can't randomly swap in and out hand-compiled libqb versions.
Find one that works and stick to it. It's an annoying 'feature' of newer
linkers that we had to work around in libqb. So if you rebuild libqb
1.0.3 then you will, in all likelihood, need to rebuild corosync to
match it.
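
In other words, something like the following order (just a sketch; versions,
paths and configure flags are assumptions to adapt to your own setup):

tar xzf libqb-1.0.3.tar.gz && cd libqb-1.0.3
./configure && make && make install && ldconfig
cd ../corosync-2.4.4
./configure    # re-run so corosync links against the freshly installed libqb
make && make install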

Chrissie


> 
> if I launch corosync -f I got:
> *corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite
> section is populated, otherwise target's build is at fault, preventing
> reliable logging" && __start___verbose != __stop___verbose' failed.*
> 
> Nothing is logged (not even in debug mode).
> 
> I do not understand why installing libqb during the normal upgrade
> process fails, while if I upgrade it after the
> crmsh/pacemaker/corosync/resource-agents upgrade it works fine.
> 
> On 3 Jul 2018, at 11:42, Christine Caulfield wrote:
>>
>> On 03/07/18 07:53, Jan Pokorný wrote:
>>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
 Today I tested the two suggestions you gave me. Here is what I did.
 In the script where I create my 5 machines cluster (I use three
 nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs
 that we use for database backup and WAL files).

 FIRST TEST
 ——
 I added the —shm-size=512m to the “docker create” command. I noticed
 that as soon as I start it the shm size is 512m and I didn’t need to
 add the entry in /etc/fstab. However, I did it anyway:

 tmpfs  /dev/shm  tmpfs   defaults,size=512m   0   0

 and then
 mount -o remount /dev/shm

 Then I uninstalled all pieces of software (crmsh, resource agents,
 corosync and pacemaker) and installed the new one.
 Started corosync and pacemaker but same problem occurred.

 SECOND TEST
 ———
 stopped corosync and pacemaker
 uninstalled corosync
 built corosync with --enable-small-memory-footprint and installed it
 started corosync and pacemaker

 IT WORKED.

 I would like to understand now why it didn’t work in the first test
 and why it worked in the second. Which kind of memory is used too much
 here? /dev/shm seems not the problem, I allocated 512m on all three
 docker images (obviously on my single Mac) and enabled the container
 option as you suggested. Am I missing something here?
>>>
>>> My suspicion then fully shifts towards "maximum number of bytes of
>>> memory that may be locked into RAM" per-process resource limit as
>>> raised in one of the most recent message ...
>>>
 Now I want to use Docker for the moment only for test purpose so it
 could be ok to use the --enable-small-memory-footprint, but there is
 something I can do to have corosync working even without this
 option?
>>>
>>> ... so try running the container the already suggested way:
>>>
>>>  docker run ... --ulimit memlock=33554432 ...
>>>
>>> or possibly higher (as a rule of thumb, keep doubling the accumulated
>>> value until some unreasonable amount is reached, like the equivalent
>>> of already used 512 MiB).
>>>
>>> Hope this helps.
>>
>> This makes a lot of sense to me. As Poki pointed out earlier, in
>> corosync 2.4.3 (I think) we fixed a regression in that caused corosync
>> NOT to be locked in RAM after it forked - which was causing potential
>> performance issues. So if you replace an earlier corosync with 2.4.3 or
>> later then it will use more locked memory than before.
>>
>> Chrissie
>>
>>
>>>
 The reason I am asking this is that, in the future, it could be
 possible we deploy in production our cluster in containerised way
 (for the moment is just an idea). This will save a lot of time in
 developing, maintaining and deploying our patch system. All
 prerequisites and dependencies will be enclosed in container and if
 IT team will do some maintenance on bare metal (i.e. install new
 dependencies) it will not affects our containers. I do not see a lot
 of performance drawbacks in using container. The point is to
 understand if a containerised approach could save us lot of headache
 about mainte

Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Salvatore D'angelo
Here is some strace output of corosync:

execve("/usr/sbin/corosync", ["corosync"], [/* 21 vars */]) = 0
brk(0)  = 0x563b1f774000
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26561, ...}) = 0
mmap(NULL, 26561, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0cd4182000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/libtotem_pg.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260a\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=917346, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f0cd4181000
mmap(NULL, 2267392, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd3d3d000
mprotect(0x7f0cd3d61000, 2093056, PROT_NONE) = 0
mmap(0x7f0cd3f6, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x23000) = 0x7f0cd3f6
mmap(0x7f0cd3f62000, 18688, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0cd3f62000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/libcorosync_common.so.4", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\6\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=43858, ...}) = 0
mmap(NULL, 2105360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd3b3a000
mprotect(0x7f0cd3b3b000, 2097152, PROT_NONE) = 0
mmap(0x7f0cd3d3b000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f0cd3d3b000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14664, ...}) = 0
mmap(NULL, 2109744, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd3936000
mprotect(0x7f0cd3939000, 2093056, PROT_NONE) = 0
mmap(0x7f0cd3b38000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f0cd3b38000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0po\0\0\0\0\0\0"..., 832) 
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=141574, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f0cd418
mmap(NULL, 2217264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd3718000
mprotect(0x7f0cd3731000, 2093056, PROT_NONE) = 0
mmap(0x7f0cd393, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7f0cd393
mmap(0x7f0cd3932000, 13616, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0cd3932000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) 
= 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1857312, ...}) = 0
mmap(NULL, 3965632, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd334f000
mprotect(0x7f0cd350d000, 2097152, PROT_NONE) = 0
mmap(0x7f0cd370d000, 24576, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x7f0cd370d000
mmap(0x7f0cd3713000, 17088, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0cd3713000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/libqb.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\243\0\0\0\0\0\0"..., 
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=951833, ...}) = 0
mmap(NULL, 6717576, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 
0x7f0cd2ce6000
mprotect(0x7f0cd2d0b000, 6557696, PROT_NONE) = 0
mmap(0x7f0cd2f0a000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f0cd2f0a000
mmap(0x7f0cd2f0c000, 264352, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0cd2f0c000
mmap(0x7f0cd334c000, 12288, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x66000) = 0x7f0cd334c000
close(3)= 0
access("/etc/ld.so.nohwcap", F_OK)  = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libnss3.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0

Re: [ClusterLabs] Upgrade corosync problem

2018-07-06 Thread Salvatore D'angelo
Hi All,

The option --ulimit memlock=536870912 worked fine.
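
For reference, the container settings discussed in this thread combined into a
single command (the image name is a placeholder):

docker run -d --name pcmk-node1 \
  --shm-size=512m \
  --ulimit memlock=536870912 \
  my-pacemaker-image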

I have now another strange issue. The upgrade without updating libqb (leaving 
the 0.16.0) worked fine.
If after the upgrade I stop pacemaker and corosync, I download the latest libqb 
version:
https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
build and install it everything works fine.

If I try to install in sequence (after the installation of old code):

libqb 1.0.3
corosync 2.4.4
pacemaker 1.1.18
crmsh 3.0.1
resource agents 4.1.1

when I try to start corosync I got the following error:
Starting Corosync Cluster Engine (corosync): /etc/init.d/corosync: line 99:  
8470 Aborted $prog $COROSYNC_OPTIONS > /dev/null 2>&1
[FAILED]

if I launch corosync -f I got:
corosync: main.c:143: logsys_qb_init: Assertion `"implicit callsite section is 
populated, otherwise target's build is at fault, preventing reliable logging" 
&& __start___verbose != __stop___verbose' failed.

Nothing is logged (not even in debug mode).

I do not understand why installing libqb during the normal upgrade process
fails, while if I upgrade it after the crmsh/pacemaker/corosync/resource-agents
upgrade it works fine. 

On 3 Jul 2018, at 11:42, Christine Caulfield  wrote:
> 
> On 03/07/18 07:53, Jan Pokorný wrote:
>> On 02/07/18 17:19 +0200, Salvatore D'angelo wrote:
>>> Today I tested the two suggestions you gave me. Here is what I did.
>>> In the script where I create my 5 machines cluster (I use three
>>> nodes for pacemaker PostgreSQL cluster and two nodes for glusterfs
>>> that we use for database backup and WAL files).
>>> 
>>> FIRST TEST
>>> ——
>>> I added the —shm-size=512m to the “docker create” command. I noticed
>>> that as soon as I start it the shm size is 512m and I didn’t need to
>>> add the entry in /etc/fstab. However, I did it anyway:
>>> 
>>> tmpfs  /dev/shm  tmpfs   defaults,size=512m   0   0
>>> 
>>> and then
>>> mount -o remount /dev/shm
>>> 
>>> Then I uninstalled all pieces of software (crmsh, resource agents,
>>> corosync and pacemaker) and installed the new one.
>>> Started corosync and pacemaker but same problem occurred.
>>> 
>>> SECOND TEST
>>> ———
>>> stopped corosync and pacemaker
>>> uninstalled corosync
>>> built corosync with --enable-small-memory-footprint and installed it
>>> started corosync and pacemaker
>>> 
>>> IT WORKED.
>>> 
>>> I would like to understand now why it didn’t work in the first test
>>> and why it worked in the second. Which kind of memory is used too much
>>> here? /dev/shm seems not the problem, I allocated 512m on all three
>>> docker images (obviously on my single Mac) and enabled the container
>>> option as you suggested. Am I missing something here?
>> 
>> My suspicion then fully shifts towards "maximum number of bytes of
>> memory that may be locked into RAM" per-process resource limit as
>> raised in one of the most recent message ...
>> 
>>> Now I want to use Docker for the moment only for test purpose so it
>>> could be ok to use the --enable-small-memory-footprint, but there is
>>> something I can do to have corosync working even without this
>>> option?
>> 
>> ... so try running the container the already suggested way:
>> 
>>  docker run ... --ulimit memlock=33554432 ...
>> 
>> or possibly higher (as a rule of thumb, keep doubling the accumulated
>> value until some unreasonable amount is reached, like the equivalent
>> of already used 512 MiB).
>> 
>> Hope this helps.
> 
> This makes a lot of sense to me. As Poki pointed out earlier, in
> corosync 2.4.3 (I think) we fixed a regression in that caused corosync
> NOT to be locked in RAM after it forked - which was causing potential
> performance issues. So if you replace an earlier corosync with 2.4.3 or
> later then it will use more locked memory than before.
> 
> Chrissie
> 
> 
>> 
>>> The reason I am asking this is that, in the future, it could be
>>> possible we deploy in production our cluster in containerised way
>>> (for the moment is just an idea). This will save a lot of time in
>>> developing, maintaining and deploying our patch system. All
>>> prerequisites and dependencies will be enclosed in container and if
>>> IT team will do some maintenance on bare metal (i.e. install new
>>> dependencies) it will not affects our containers. I do not see a lot
>>> of performance drawbacks in using container. The point is to
>>> understand if a containerised approach could save us lot of headache
>>> about maintenance of this cluster without affect performance too
>>> much. I am notice in Cloud environment this approach in a lot of
>>> contexts.
>> 
>> 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> https://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> 
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterl

Re: [ClusterLabs] Found libqb issue that affects pacemaker 1.1.18

2018-07-06 Thread Christine Caulfield
On 06/07/18 10:09, Salvatore D'angelo wrote:
> I closed the issue.
> Libqb uses tagging, and people should not download the "Source code (zip)"
> or "Source code (tar.gz)" archives. The following should be downloaded
> instead: libqb-1.0.3.tar.gz
> 
> I thought it contained the binary files. I wasn’t aware of the tagging
> system and that it was required to download that version of the tar.gz file.
> 

It does say so at the bottom of the releases page. Maybe it should be at
the top :)

Chrissie

>> On 5 Jul 2018, at 17:35, Salvatore D'angelo wrote:
>>
>> Hi,
>>
>> I tried to build libqb 1.0.3 on a fresh machine and then corosync
>> 2.4.4 and pacemaker 1.1.18.
>> I found the following bug and filed against libqb GitHub:
>> https://github.com/ClusterLabs/libqb/issues/312
>>
>> for the moment I fixed it manually on my env. Anyone experienced this
>> issue?
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Found libqb issue that affects pacemaker 1.1.18

2018-07-06 Thread Salvatore D'angelo
I closed the issue.
Libqb uses tagging, and people should not download the "Source code (zip)" or
"Source code (tar.gz)" archives. The following should be downloaded instead:
libqb-1.0.3.tar.gz

I thought it contained the binary files. I wasn’t aware of the tagging system
and that it was required to download that version of the tar.gz file.
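
In other words, fetch the release tarball rather than the auto-generated
archive, e.g.:

wget https://github.com/ClusterLabs/libqb/releases/download/v1.0.3/libqb-1.0.3.tar.gz
tar xzf libqb-1.0.3.tar.gz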

> On 5 Jul 2018, at 17:35, Salvatore D'angelo  wrote:
> 
> Hi,
> 
> I tried to build libqb 1.0.3 on a fresh machine and then corosync 2.4.4 and 
> pacemaker 1.1.18.
> I found the following bug and filed against libqb GitHub:
> https://github.com/ClusterLabs/libqb/issues/312 
> 
> 
> for the moment I fixed it manually on my env. Anyone experienced this issue?

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org