[ClusterLabs] crm_mon change in behaviour PM 1.1.12 -> 1.1.14: crm_mon -XA filters #health.* node attributes

2016-03-03 Thread Martin Schlegel

Hello everybody,

This is my first post on this mailing list and I have only been using Pacemaker since fall 2015 ... please be gentle :-) and I will do the same.

Our cluster uses multiple resource agents that update various node attributes (via RAs like sysinfo, healthcpu, etc.) in the form of #health.*, and we rely on the mechanism enabled via the property node-health-strategy=migrate-on-red to trigger resource migrations.

In Pacemaker version 1.1.12, crm_mon -A or -XA would still display these #health.* attributes, but not since we moved up to 1.1.14, and I am not sure why this needed to be changed:

root@ys0-resiliency-test-1:~# crm node status-attr pg1 show \#health-cpu
scope=status name=#health-cpu value=green

root@ys0-resiliency-test-1:~# crm_mon -XrRAf1 | grep -i '#health' ; echo $?
1

This seems to be due to this part of the crm_mon.c code:

/* Never display node attributes whose name starts with one of these prefixes */
#define FILTER_STR { "shutdown", "terminate", "standby", "fail-count", \
                     "last-failure", "probe_complete", "#", NULL }

I would like to know if anybody shares my opinion on this:

   1. From an operations point of view it would be easier to get crm_mon to include #health.* in the general output, or at least in the XML output via crm_mon -XA, so that I can get a comprehensive status view in one shot.
   2. Because the node attribute list can be extensive and clutters up the output, it would make sense to allow a user-defined filter for node attributes in general.

Regards,
Martin Schlegel

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Issues with crm_mon or ClusterMon resource agent

2016-03-03 Thread Debabrata Pani
Hi,

I wanted to configure the ClusterMon resource agent so that I can get
information about events in the pacemaker cluster.
*The objective is to generate traps for some specific resource agents and/or
conditions.*


My cluster installation details:
pacemakerd --version
Pacemaker 1.1.11
Written by Andrew Beekhof


corosync -v
Corosync Cluster Engine, version '1.4.7'
Copyright (c) 2006-2009 Red Hat, Inc.


crm_mon --version
Pacemaker 1.1.11
Written by Andrew Beekhof



I followed this documentation:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch07.html
http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/

My current cluster status shows:

MonitorClusterChange (ocf::pacemaker:ClusterMon): Started stvm4

The resource agent is configured.

crm_mon is indeed running as a daemon on node stvm4:

[root@stvm4 panidata]# ps -ef | grep crm_mon | grep -v grep
root  2908 1  0 10:59 ?  00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_MonitorClusterChange.pid -d -i 0 -E /root/panidata/testscript.sh -e anonymous -h /tmp/ClusterMon_MonitorClusterChange.html

My test script is the following:
#!/bin/bash
echo "running" >> /root/running.log
echo "CRM_notify_recipient=$CRM_notify_recipient"
..



As I trigger events by shutting down one service or another, I see the
HTML file "/tmp/ClusterMon_MonitorClusterChange.html" getting updated each
time an event is triggered, so the timestamp of the file keeps changing.

But I am not sure whether the script is getting executed, because I don't see
any "/root/running.log" file.

Things I have tried:
* Using the "logger" command instead of echo.
* Running the crm_mon command manually with -d and the other parameters, to
check whether the problem lies with the resource agent or elsewhere.

Queries:
* Is this a known issue?
* Am I doing something incorrect?
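
One way to check whether, and with which data, crm_mon actually invokes the external agent is a script that logs every CRM_notify_* variable to an absolute path. This is a minimal sketch, not a tested ClusterMon recipe: it assumes the CRM_notify_* environment variables that Pacemaker Explained documents for crm_mon external agents, and LOGFILE and WATCH_RSCS are hypothetical settings introduced here (WATCH_RSCS restricts logging to specific resources, matching the stated objective).

```shell
#!/bin/sh
# Hypothetical crm_mon external agent (-E) sketch. LOGFILE and WATCH_RSCS are
# illustrative settings, not ClusterMon parameters.
LOGFILE=${LOGFILE:-/root/running.log}
WATCH_RSCS=${WATCH_RSCS:-}   # space-separated resource ids; empty means "all"

# Succeeds if the event's resource is one we care about.
wanted() {
    [ -z "$WATCH_RSCS" ] && return 0
    for r in $WATCH_RSCS; do
        [ "$r" = "${CRM_notify_rsc:-}" ] && return 0
    done
    return 1
}

# Append one timestamped line with every CRM_notify_* value to LOGFILE.
log_event() {
    printf '%s node=%s rsc=%s task=%s rc=%s target_rc=%s status=%s desc=%s recipient=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${CRM_notify_node:-}" "${CRM_notify_rsc:-}" "${CRM_notify_task:-}" \
        "${CRM_notify_rc:-}" "${CRM_notify_target_rc:-}" "${CRM_notify_status:-}" \
        "${CRM_notify_desc:-}" "${CRM_notify_recipient:-}" >> "$LOGFILE"
}

# Only act when called with event data (assumes crm_mon sets
# CRM_notify_task for the events it reports).
if [ -n "${CRM_notify_task:-}" ] && wanted; then
    log_event
fi
```

To try it, keep passing the script with -E (as in the ps output above) and edit the WATCH_RSCS default to the resource ids of interest; the file presumably also has to be executable (chmod +x) for crm_mon to run it.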




Regards,
Debabrata Pani




Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Debabrata Pani
Thanks for this tip, Andrei.
I was completely unaware of this functionality of the crm command-line tool.

- Debabrata

From: Andrei Maruha <andrei_mar...@epam.com>
Reply-To: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Date: Thursday, 3 March 2016 17:50
To: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>
Subject: Re: [ClusterLabs] Removing node from pacemaker.

Hi,
Usually I use the following steps to delete a node from the cluster:
1. #crm corosync del-node 
2. #crm_node -R node --force
3. #crm corosync reload

Instead of steps 1 and 2, you can delete the node from the corosync config
manually and run:
#corosync-cfgtool -R

On 03/03/2016 02:44 PM, Somanath Jeeva wrote:
Hi,

I am trying to remove a node from the pacemaker/corosync cluster, using the
command "crm_node -R dl360x4061 --force".
Though this command removes the node from the cluster, it shows up again as
offline on the online nodes after a pacemaker/corosync restart.

Is there any other command to completely delete the node from the
pacemaker/corosync cluster?

Pacemaker and Corosync Versions.
PACEMAKER=1.1.10
COROSYNC=1.4.1

Regards
Somanath Thilak J






Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Debabrata Pani
Thanks, Ken.

Regards,
Debabrata

On 03/03/16 22:18, "Ken Gaillot"  wrote:

>On 03/03/2016 06:04 AM, Debabrata Pani wrote:
>> 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-node-delete.html
>> 
>> 
>> 
>> Are we missing the deletion of the nodes from the CIB?
>
>That documentation is old; crm_node -R does remove the node from the CIB.
>
>> Regards,
>> Debabrata 
>> 
>> 
>> From:  Somanath Jeeva 
>> Reply-To:  Cluster Labs - All topics related to open-source clustering
>> welcomed 
>> Date:  Thursday, 3 March 2016 17:14
>> To:  "users@clusterlabs.org" 
>> Subject:  [ClusterLabs]  Removing node from pacemaker.
>> 
>> 
>> Hi,
>>  
>> I am trying to remove a node from the pacemaker/corosync cluster, using
>> the command "crm_node -R dl360x4061 --force".
>
>The above command removes the node from pacemaker itself, but not the
>underlying cluster layer, which should be done first.
>
>High-level tools often offer simple commands to handle both the
>pacemaker and cluster layer parts for you. For pcs, it's "pcs cluster
>node remove ".
>
>Otherwise, the process for removing the node from the cluster layer
>varies by what you're using -- heartbeat, corosync 1 with plugin,
>corosync 1 with CMAN, or corosync 2. Generally, you want to stop the
>cluster software on the node to be removed first, then remove the node
>from the layer's configuration on the remaining nodes if it is
>explicitly specified there.
>
>> Though this command removes the node from the cluster, it shows up again
>> as offline on the online nodes after a pacemaker/corosync restart.
>>  
>> Is there any other command to completely delete the node from the
>> pacemaker/corosync cluster?
>>  
>> Pacemaker and Corosync Versions.
>> PACEMAKER=1.1.10
>> COROSYNC=1.4.1
>>  
>> Regards
>> Somanath Thilak J
>



Re: [ClusterLabs] crm_mon change in behaviour PM 1.1.12 -> 1.1.14: crm_mon -XA filters #health.* node attributes

2016-03-03 Thread Ken Gaillot
On 03/03/2016 10:07 AM, Martin Schlegel wrote:
> Hello everybody,
> 
> This is my first post on this mailing list and I have only been using
> Pacemaker since fall 2015 ... please be gentle :-) and I will do the same.
> 
> Our cluster uses multiple resource agents that update various node
> attributes (via RAs like sysinfo, healthcpu, etc.) in the form of #health.*,
> and we rely on the mechanism enabled via the property
> node-health-strategy=migrate-on-red to trigger resource migrations.
> 
> In Pacemaker version 1.1.12, crm_mon -A or -XA would still display these
> #health.* attributes, but not since we moved up to 1.1.14, and I am not
> sure why this needed to be changed:
> 
> root@ys0-resiliency-test-1:~# crm node status-attr pg1 show \#health-cpu
> scope=status name=#health-cpu value=green
> 
> root@ys0-resiliency-test-1:~# crm_mon -XrRAf1 | grep -i '#health' ; echo $?
> 1
> 
> This seems to be due to this part of the crm_mon.c code:
> 
> /* Never display node attributes whose name starts with one of these prefixes */
> #define FILTER_STR { "shutdown", "terminate", "standby", "fail-count", \
>                      "last-failure", "probe_complete", "#", NULL }

You are correct. The goal is to hide attributes that are sort of
"internal" to the cluster or resource agents, and just show attributes
set by the user.
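
To make the effect concrete, here is a small shell re-creation of that prefix filter, using the prefix list from the FILTER_STR excerpt quoted above. It is an illustrative sketch only (is_hidden_attr is a made-up name, and the real filtering is done in C inside crm_mon), but it shows why every #health-* attribute drops out of crm_mon -A in 1.1.14.

```shell
#!/bin/sh
# Hypothetical shell equivalent of crm_mon's FILTER_STR check:
# an attribute is hidden if its name starts with any listed prefix.
is_hidden_attr() {
    case "$1" in
        shutdown*|terminate*|standby*|fail-count*|last-failure*|probe_complete*|'#'*)
            return 0 ;;   # matches a filtered prefix: not shown by crm_mon -A
        *)
            return 1 ;;   # user-style attribute: shown
    esac
}

for attr in '#health-cpu' standby fail-count-pgsql my-custom-attr; do
    if is_hidden_attr "$attr"; then
        echo "hidden: $attr"
    else
        echo "shown:  $attr"
    fi
done
```

Because matching is by prefix, a user attribute whose name merely begins with one of these strings (say, a hypothetical standby-reason) would be hidden as well.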

> I would like to know if anybody shares my opinion on this:
> 
> 1. From an operations point of view it would be easier to get crm_mon to 
> include #health.* in the general output or at least in the XML output via 
> crm_mon -XA, so that I can get a comprehensive status view in one shot.

It does make sense to include all attributes in XML output since that is
intended for automated parsing. It should be fairly easy to move the
filtering from create_attr_list() to print_node_attribute().
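
Until the XML output includes them, one possible workaround is to read the #health* values straight from the CIB status section, where transient node attributes are stored as nvpair elements under each node_state. A minimal sketch, assuming cibadmin prints one element per line and each nvpair has name="..." directly followed by value="..."; health_attrs is a hypothetical helper name, and this is line-based text matching, not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: read CIB status XML on stdin and print
# "node: name=value" for every nvpair whose name starts with "#health".
# Assumes one element per line and name="..." directly followed by value="...".
health_attrs() {
    awk '
        # Remember which node the following transient attributes belong to.
        match($0, /<node_state [^>]*uname="[^"]*"/) {
            node = substr($0, RSTART, RLENGTH)
            sub(/.*uname="/, "", node)
            sub(/"$/, "", node)
        }
        match($0, /name="#health[^"]*" value="[^"]*"/) {
            pair = substr($0, RSTART, RLENGTH)
            name = pair;  sub(/^name="/, "", name);    sub(/".*/, "", name)
            value = pair; sub(/.*value="/, "", value); sub(/"$/, "", value)
            print node ": " name "=" value
        }
    '
}

# Usage on a cluster node (read-only query of the status section):
#   cibadmin -Q -o status | health_attrs
```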

> 2. Because the node attribute list can be extensive and clutters up the
> output, it would make sense to allow a user-defined filter for node
> attributes in general.

Feel free to open a feature request on bugs.clusterlabs.org. We also
gladly accept code submissions at https://github.com/ClusterLabs/pacemaker

> 
> Regards,
> 
> Martin Schlegel




Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Ken Gaillot
On 03/03/2016 06:04 AM, Debabrata Pani wrote:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-node-delete.html
> 
> 
> 
> Are we missing the deletion of the nodes from the CIB?

That documentation is old; crm_node -R does remove the node from the CIB.

> Regards,
> Debabrata 
> 
> 
> From:  Somanath Jeeva 
> Reply-To:  Cluster Labs - All topics related to open-source clustering
> welcomed 
> Date:  Thursday, 3 March 2016 17:14
> To:  "users@clusterlabs.org" 
> Subject:  [ClusterLabs]  Removing node from pacemaker.
> 
> 
> Hi,
>  
> I am trying to remove a node from the pacemaker/corosync cluster, using
> the command "crm_node -R dl360x4061 --force".

The above command removes the node from pacemaker itself, but not the
underlying cluster layer, which should be done first.

High-level tools often offer simple commands to handle both the
pacemaker and cluster layer parts for you. For pcs, it's "pcs cluster
node remove ".

Otherwise, the process for removing the node from the cluster layer
varies by what you're using -- heartbeat, corosync 1 with plugin,
corosync 1 with CMAN, or corosync 2. Generally, you want to stop the
cluster software on the node to be removed first, then remove the node
from the layer's configuration on the remaining nodes if it is
explicitly specified there.
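
For that last step, a quick way to see whether the departing node is still listed explicitly is to scan the corosync config on each remaining node. A minimal sketch, assuming either corosync 2 style nodelist entries (ring0_addr: ...) or corosync 1.x udpu member entries (memberaddr: ...); it does not cover CMAN's cluster.conf or heartbeat, and node_listed_in_conf is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: exit status 0 if NODE appears as a ring0_addr (corosync 2
# nodelist) or memberaddr (corosync 1.x udpu) entry in the given config file.
# Assumes whitespace-separated "key: value" lines.
node_listed_in_conf() {
    conf=$1
    node=$2
    awk -v node="$node" '
        ($1 == "ring0_addr:" || $1 == "memberaddr:") && $2 == node { found = 1 }
        END { exit !found }
    ' "$conf"
}

# Usage on each remaining node, after stopping the cluster software on the
# node being removed:
#   node_listed_in_conf /etc/corosync/corosync.conf dl360x4061 && echo "still listed"
```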

> Though this command removes the node from the cluster, it shows up again
> as offline on the online nodes after a pacemaker/corosync restart.
>  
> Is there any other command to completely delete the node from the
> pacemaker/corosync cluster?
>  
> Pacemaker and Corosync Versions.
> PACEMAKER=1.1.10
> COROSYNC=1.4.1
>  
> Regards
> Somanath Thilak J





Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Andrei Maruha

Hi,
Usually I use the following steps to delete a node from the cluster:
1. #crm corosync del-node 
2. #crm_node -R node --force
3. #crm corosync reload

Instead of steps 1 and 2, you can delete the node from the corosync
config manually and run:

#corosync-cfgtool -R
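
The three crmsh steps above can be wrapped in a small script that prints each command before anything is actually run. A minimal sketch: remove_node, run and DRY_RUN are hypothetical names, and passing the node name to crm corosync del-node is an assumption, since step 1 above leaves its argument out.

```shell
#!/bin/sh
# Hypothetical wrapper for the crmsh-based removal steps described above.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 runs it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_node() {
    node=$1
    if [ -z "$node" ]; then
        echo "usage: remove_node NODE" >&2
        return 2
    fi
    run crm corosync del-node "$node" || return   # 1. remove from corosync config (argument assumed)
    run crm_node -R "$node" --force || return     # 2. remove from Pacemaker / the CIB
    run crm corosync reload                       # 3. reload the corosync configuration
}
```

For example, remove_node dl360x4061 with the default DRY_RUN=1 just prints the three commands, which makes it easy to review them before re-running with DRY_RUN=0.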

On 03/03/2016 02:44 PM, Somanath Jeeva wrote:


Hi,

I am trying to remove a node from the pacemaker/corosync cluster,
using the command "crm_node -R dl360x4061 --force".


Though this command removes the node from the cluster, it shows up again
as offline on the online nodes after a pacemaker/corosync restart.


Is there any other command to completely delete the node from the
pacemaker/corosync cluster?


Pacemaker and Corosync Versions.

PACEMAKER=1.1.10

COROSYNC=1.4.1

Regards

Somanath Thilak J







Re: [ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Debabrata Pani
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-node-delete.html



Are we missing the deletion of the nodes from the CIB?

Regards,
Debabrata 


From:  Somanath Jeeva 
Reply-To:  Cluster Labs - All topics related to open-source clustering
welcomed 
Date:  Thursday, 3 March 2016 17:14
To:  "users@clusterlabs.org" 
Subject:  [ClusterLabs]  Removing node from pacemaker.


Hi,
 
I am trying to remove a node from the pacemaker/corosync cluster, using
the command "crm_node -R dl360x4061 --force".

Though this command removes the node from the cluster, it shows up again as
offline on the online nodes after a pacemaker/corosync restart.
 
Is there any other command to completely delete the node from the
pacemaker/corosync cluster?
 
Pacemaker and Corosync Versions.
PACEMAKER=1.1.10
COROSYNC=1.4.1
 
Regards
Somanath Thilak J




[ClusterLabs] Removing node from pacemaker.

2016-03-03 Thread Somanath Jeeva
Hi,

I am trying to remove a node from the pacemaker/corosync cluster, using the
command "crm_node -R dl360x4061 --force".
Though this command removes the node from the cluster, it shows up again as
offline on the online nodes after a pacemaker/corosync restart.

Is there any other command to completely delete the node from the
pacemaker/corosync cluster?

Pacemaker and Corosync Versions.
PACEMAKER=1.1.10
COROSYNC=1.4.1

Regards
Somanath Thilak J


Re: [ClusterLabs] service network restart and corosync

2016-03-03 Thread Jan Friesse



> Hi,
>
> In our deployment, due to a requirement, we need to run:
> service network restart


What is the exact reason for doing a network restart?



> Due to this, corosync crashes and the associated pacemaker processes crash
> as well.

> As per the last comment on this issue,
> ---
> Corosync reacts oddly to that. It's better to use an iptables rule to
> block traffic (or crash the node with something like 'echo c >
> /proc/sysrq-trigge
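
For testing, the iptables suggestion in the quoted comment can be scripted instead of restarting the network. A minimal sketch that only prints the rules for review: corosync_block_cmds is a hypothetical helper, and the port range assumes the default mcastport of 5405 (corosync also uses mcastport - 1).

```shell
#!/bin/sh
# Hypothetical helper: print iptables commands that drop (block) or remove the
# drop rules for (unblock) corosync UDP traffic. Assumes mcastport 5405 unless
# another port is given; corosync also uses mcastport - 1.
corosync_block_cmds() {
    action=$1
    port=${2:-5405}
    case "$action" in
        block)   flag=-I ;;   # insert DROP rules at the top of each chain
        unblock) flag=-D ;;   # delete the same rules again
        *)
            echo "usage: corosync_block_cmds block|unblock [mcastport]" >&2
            return 2 ;;
    esac
    for chain in INPUT OUTPUT; do
        echo "iptables $flag $chain -p udp --dport $((port - 1)):$port -j DROP"
    done
}

# Review the output, then apply it as root, e.g.:
#   corosync_block_cmds block | sh
```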




> But other network services, like Postgres, do not crash due to this
> network service restart:
> I can log in to psql and issue queries without any problem.
>
> In view of this, I would like to understand whether it is possible to prevent
> a corosync (and a corresponding Pacemaker) crash, since postgres somehow
> survives this restart.
>
> Any pointer to socket-level details for this behaviour will help me
> understand (and explain to the stakeholders) the problems better.


https://github.com/corosync/corosync/pull/32 should help.

Regards,
  Honza



> Regards,
> Debabrata Pani








