[ClusterLabs] corosync.conf - reload failed
Hi,

I'm trying to update the corosync.conf multicast IP in an existing cluster. I used the "corosync-cfgtool -R" command to reload the corosync configuration:

    corosync-cfgtool -R
    Reloading corosync.conf...
    Done

But the corosync multicast IP has not changed. Does corosync provide any command to reload the mcast IP, or do I have to restart pacemaker? Kindly let me know. The corosync version is 2.3.5.

Kind Regards,
Sriram.
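For context on why the reload appears to do nothing: in corosync 2.x, "corosync-cfgtool -R" re-reads corosync.conf into cmap, but only a subset of keys (mainly timeouts and logging) take effect at runtime; totem interface settings such as mcastaddr are read only at startup, so changing the multicast address requires restarting corosync on every node. A possible procedure, as a sketch only - it assumes a pcs-managed cluster, uses example addresses, and requires a brief full-cluster outage:

    # 1. Put the new multicast address into corosync.conf on every node
    #    (226.94.1.1 -> 239.255.1.1 here is purely an example)
    sed -i 's/mcastaddr: 226.94.1.1/mcastaddr: 239.255.1.1/' /etc/corosync/corosync.conf

    # 2. Restart the whole cluster so totem re-binds (full outage)
    pcs cluster stop --all
    pcs cluster start --all

    # 3. Verify the ring is using the new address
    corosync-cmapctl | grep mcastaddr
    corosync-cfgtool -s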
Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients
Thanks for clarifying.

Regards,
Sriram.

On Mon, Aug 14, 2017 at 7:34 PM, Klaus Wenninger wrote:
> On 08/14/2017 03:19 PM, Sriram wrote:
>
> Yes, I had pre-created the script file with the required permissions.
>
> [root@node1 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
> -rwxr-xr-x. 1 root root 4140 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh
> [root@node2 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
> -rwxr-xr-x. 1 root root 4139 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh
> [root@node3 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
> -rwxr-xr-x. 1 root root 4139 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh
>
> Later I observed that the user "hacluster" is not able to create the log
> file under /usr/share/pacemaker/alert_file.log. I am sorry, I should have
> noticed this in the log before posting the query. When I gave the path as
> /tmp/alert_file.log, it was able to create it. Thanks for pointing it out.
>
> I have one more clarification.
>
> If the resource is running on node2:
> [root@node2 tmp]# pcs resource
>  TRR (ocf::heartbeat:TimingRedundancyRA): Started node2
>
> and I execute the command below to put the node into standby:
> [root@node2 tmp]# pcs node standby node2
>
> the resource shifts to node3 because of its higher location constraint:
> [root@node2 tmp]# pcs resource
>  TRR (ocf::heartbeat:TimingRedundancyRA): Started node3
>
> I got the log file created on node2 (resource stopped) and node3 (resource
> started). Node1 was not notified about the resource shift; I mean, no log
> file was created there. That's because alerts are designed to notify
> external agents about cluster events, not for internal notifications.
>
> Is my understanding correct?
>
> Quite simple: crmd of node1 just didn't have anything to do with shifting
> the resource from node2 -> node3. There is no additional information passed
> between the nodes just to create a full set of notifications on every node.
> If you want to have a full log (or whatever your alert-agent is doing) in
> one place, this would be up to your alert-agent.
>
> Regards,
> Klaus
>
> Regards,
> Sriram.
>
> On Mon, Aug 14, 2017 at 5:42 PM, Klaus Wenninger wrote:
>> On 08/14/2017 12:32 PM, Sriram wrote:
>>
>> Hi Ken,
>>
>> I used the alerts as well; they seem not to be working.
>>
>> Please check the configuration below.
>> [root@node1 alerts]# pcs config show
>> Cluster Name:
>> Corosync Nodes:
>> Pacemaker Nodes:
>>  node1 node2 node3
>>
>> Resources:
>>  Resource: TRR (class=ocf provider=heartbeat type=TimingRedundancyRA)
>>   Operations: start interval=0s timeout=60s (TRR-start-interval-0s)
>>               stop interval=0s timeout=20s (TRR-stop-interval-0s)
>>               monitor interval=10 timeout=20 (TRR-monitor-interval-10)
>>
>> Stonith Devices:
>> Fencing Levels:
>>
>> Location Constraints:
>>   Resource: TRR
>>     Enabled on: node1 (score:100) (id:location-TRR-node1-100)
>>     Enabled on: node2 (score:200) (id:location-TRR-node2-200)
>>     Enabled on: node3 (score:300) (id:location-TRR-node3-300)
>> Ordering Constraints:
>> Colocation Constraints:
>> Ticket Constraints:
>>
>> Alerts:
>>  Alert: alert_file (path=/usr/share/pacemaker/alert_file.sh)
>>   Options: debug_exec_order=false
>>   Meta options: timeout=15s
>>   Recipients:
>>    Recipient: recipient_alert_file_id (value=/usr/share/pacemaker/alert_file.log)
>>
>> Did you pre-create the file with proper rights? Be aware that the
>> alert-agent is called as user hacluster.
>>
>> Resources Defaults:
>>  resource-stickiness: INFINITY
>> Operations Defaults:
>>  No defaults set
>>
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  dc-version: 1.1.15-11.el7_3.4-e174ec8
>>  default-action-timeout: 240
>>  have-watchdog: false
>>  no-quorum-policy: ignore
>>  placement-strategy: balanced
>>  stonith-enabled: false
>>  symmetric-cluster: false
>>
>> Quorum:
>>  Options:
>>
>> /usr/share/pacemaker/alert_file.sh does not get called whenever I trigger
>> a scenario for failover. Please let me know if I'm missing anything.
>>
>> Do you get any logs - like for startup of resources - or nothing at all?
>>
>> Regards,
>> Klaus
>>
>> Regards,
>> Sriram.
>>
>> On Tue, Aug 8, 2
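Since Pacemaker runs the alert agent as user hacluster and passes the event details in CRM_alert_* environment variables, an agent covering the use case in this thread could look like the sketch below. This is illustrative only, not the actual alert_file.sh from the thread; it assumes the recipient path is writable by hacluster (e.g. the /tmp path that ended up working here):

    #!/bin/sh
    # Minimal Pacemaker alert agent (sketch). The recipient value is
    # delivered in CRM_alert_recipient; Pacemaker runs this as "hacluster".
    LOGFILE="${CRM_alert_recipient:-/tmp/alert_file.log}"

    case "$CRM_alert_kind" in
        resource)
            # e.g. "TRR start on node3 rc=0"
            echo "$CRM_alert_timestamp resource: $CRM_alert_rsc $CRM_alert_task on $CRM_alert_node rc=$CRM_alert_rc" >> "$LOGFILE"
            ;;
        node)
            echo "$CRM_alert_timestamp node: $CRM_alert_node is $CRM_alert_desc" >> "$LOGFILE"
            ;;
        *)
            echo "$CRM_alert_timestamp $CRM_alert_kind: $CRM_alert_desc" >> "$LOGFILE"
            ;;
    esac
    exit 0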
Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients
Yes, I had pre-created the script file with the required permissions.

[root@node1 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
-rwxr-xr-x. 1 root root 4140 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh
[root@node2 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
-rwxr-xr-x. 1 root root 4139 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh
[root@node3 alerts]# ls -l /usr/share/pacemaker/alert_file.sh
-rwxr-xr-x. 1 root root 4139 Aug 14 01:51 /usr/share/pacemaker/alert_file.sh

Later I observed that the user "hacluster" is not able to create the log file under /usr/share/pacemaker/alert_file.log. I am sorry, I should have noticed this in the log before posting the query. When I gave the path as /tmp/alert_file.log, it was able to create it. Thanks for pointing it out.

I have one more clarification.

If the resource is running on node2:
[root@node2 tmp]# pcs resource
 TRR (ocf::heartbeat:TimingRedundancyRA): Started node2

and I execute the command below to put the node into standby:
[root@node2 tmp]# pcs node standby node2

the resource shifts to node3 because of its higher location constraint:
[root@node2 tmp]# pcs resource
 TRR (ocf::heartbeat:TimingRedundancyRA): Started node3

I got the log file created on node2 (resource stopped) and node3 (resource started).

Node1 was not notified about the resource shift; I mean, no log file was created there. That's because alerts are designed to notify external agents about cluster events, not for internal notifications.

Is my understanding correct?

Regards,
Sriram.

On Mon, Aug 14, 2017 at 5:42 PM, Klaus Wenninger wrote:
> On 08/14/2017 12:32 PM, Sriram wrote:
>
> Hi Ken,
>
> I used the alerts as well; they seem not to be working.
>
> Please check the configuration below.
> [root@node1 alerts]# pcs config show
> Cluster Name:
> Corosync Nodes:
> Pacemaker Nodes:
>  node1 node2 node3
>
> Resources:
>  Resource: TRR (class=ocf provider=heartbeat type=TimingRedundancyRA)
>   Operations: start interval=0s timeout=60s (TRR-start-interval-0s)
>               stop interval=0s timeout=20s (TRR-stop-interval-0s)
>               monitor interval=10 timeout=20 (TRR-monitor-interval-10)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
>   Resource: TRR
>     Enabled on: node1 (score:100) (id:location-TRR-node1-100)
>     Enabled on: node2 (score:200) (id:location-TRR-node2-200)
>     Enabled on: node3 (score:300) (id:location-TRR-node3-300)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
>  Alert: alert_file (path=/usr/share/pacemaker/alert_file.sh)
>   Options: debug_exec_order=false
>   Meta options: timeout=15s
>   Recipients:
>    Recipient: recipient_alert_file_id (value=/usr/share/pacemaker/alert_file.log)
>
> Did you pre-create the file with proper rights? Be aware that the
> alert-agent is called as user hacluster.
>
>> On Tue, Aug 8, 2017 at 8:29 PM, Ken Gaillot wrote:
>>> On Tue, 2017-08-08 at 17:40 +0530, Sriram wrote:
>>>> Hi Ulrich,
>>>>
>>>> Please see inline.
>>>>
>>>> On Tue, Aug 8, 2017 at 2:01 PM, Ulrich Windl wrote:
>>>>>>> Sriram schrieb am 08.08.2017 um 09:30 in Nachricht
>>>>> <... +dv...@mail.gmail.com>:
>>>>> Hi Ken & Jan,
>>>>>
>>>>> In the cluster we have, there is only one resource running. It's an
>>>>> opt-in cluster with resource-stickiness set to INFINITY.
>>>>>
>>>>> Just to clarify my question, let's take a scenario where there are
>>>>> four nodes N1, N2, N3, N4:
>>>>> a. N1 comes up first, starts the cluster.
>>>>
>>>> The cluster will start once it has a quorum.
>>>>
>>>>> b. N1 checks that there is no resource running
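For reference, an alert configuration like the one shown in this thread can be created with pcs along these lines. This is a sketch: the exact syntax varies between pcs versions (0.9.15x assumed here), and the recipient uses the /tmp path that turned out to be writable by hacluster:

    # Register the agent, its option and its meta timeout
    pcs alert create path=/usr/share/pacemaker/alert_file.sh id=alert_file \
        options debug_exec_order=false meta timeout=15s

    # Attach a recipient; its value is passed to the agent as CRM_alert_recipient
    pcs alert recipient add alert_file value=/tmp/alert_file.log \
        id=recipient_alert_file_id

    # Verify
    pcs alert show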
Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients
Hi Ken,

I used the alerts as well; they seem not to be working.

Please check the configuration below.
[root@node1 alerts]# pcs config show
Cluster Name:
Corosync Nodes:
Pacemaker Nodes:
 node1 node2 node3

Resources:
 Resource: TRR (class=ocf provider=heartbeat type=TimingRedundancyRA)
  Operations: start interval=0s timeout=60s (TRR-start-interval-0s)
              stop interval=0s timeout=20s (TRR-stop-interval-0s)
              monitor interval=10 timeout=20 (TRR-monitor-interval-10)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: TRR
    Enabled on: node1 (score:100) (id:location-TRR-node1-100)
    Enabled on: node2 (score:200) (id:location-TRR-node2-200)
    Enabled on: node3 (score:300) (id:location-TRR-node3-300)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 Alert: alert_file (path=/usr/share/pacemaker/alert_file.sh)
  Options: debug_exec_order=false
  Meta options: timeout=15s
  Recipients:
   Recipient: recipient_alert_file_id (value=/usr/share/pacemaker/alert_file.log)

Resources Defaults:
 resource-stickiness: INFINITY
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.15-11.el7_3.4-e174ec8
 default-action-timeout: 240
 have-watchdog: false
 no-quorum-policy: ignore
 placement-strategy: balanced
 stonith-enabled: false
 symmetric-cluster: false

Quorum:
 Options:

/usr/share/pacemaker/alert_file.sh does not get called whenever I trigger a scenario for failover. Please let me know if I'm missing anything.

Regards,
Sriram.

On Tue, Aug 8, 2017 at 8:29 PM, Ken Gaillot wrote:
> On Tue, 2017-08-08 at 17:40 +0530, Sriram wrote:
>> Hi Ulrich,
>>
>> Please see inline.
>>
>> On Tue, Aug 8, 2017 at 2:01 PM, Ulrich Windl wrote:
>>>>> Sriram schrieb am 08.08.2017 um 09:30 in Nachricht
>>> <... +dv...@mail.gmail.com>:
>>> Hi Ken & Jan,
>>>
>>> In the cluster we have, there is only one resource running. It's an
>>> opt-in cluster with resource-stickiness set to INFINITY.
>>>
>>> Just to clarify my question, let's take a scenario where there are four
>>> nodes N1, N2, N3, N4:
>>> a. N1 comes up first, starts the cluster.
>>
>> The cluster will start once it has a quorum.
>>
>>> b. N1 checks that there is no resource running, so it will add the
>>> resource (R) with some location constraint (let's say score 100)
>>> c. So resource (R) runs in N1 now.
>>> d. N2 comes up next, checks that resource (R) is already running in N1,
>>> so it will update the location constraint (let's say score 200)
>>> e. N3 comes up next, checks that resource (R) is already running in N1,
>>> so it will update the location constraint (let's say score 300)
>>
>> See my remark on quorum above.
>>
>> Yes, you are right, I forgot to mention it.
>>
>>> f. N4 comes up next, checks that resource (R) is already running in N1,
>>> so it will update the location constraint (let's say score 400)
>>> g. For some reason, if N1 goes down, resource (R) shifts to N4 (as its
>>> score is higher than anyone's).
>>>
>>> In this case is it possible to notify the nodes N2, N3 that the newly
>>> elected active node is N4?
>>
>> What type of notification, and what would the node do with it?
>> Any node in the cluster always has up-to-date configuration information.
>> So it knows the status of the other nodes also.
>>
>> I agree that the node always has up-to-date configuration information,
>> but an application or a thread needs to poll for that information.
>> Is there any way where the notifications are received through some
>> action function in the RA?
>
> Ah, I misunderstood your situation; I thought you had a cloned resource.
>
> For that, the alerts feature (available in Pacemaker 1.1.15 and later)
> might be useful:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139900098676896
>
>> Regards,
>> Sriram.
>>
>> I went through clone notifications and master-slave; looks
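On the polling point raised above: any node can query the current placement of a resource without a notification mechanism, using standard Pacemaker CLI tools. A quick sketch (the output wording differs slightly between Pacemaker versions):

    # Ask the cluster where the resource is currently active
    crm_resource --resource TRR --locate
    # e.g.: resource TRR is running on: node3

    # Or take a one-shot snapshot of the whole cluster status
    crm_mon -1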
Re: [ClusterLabs] Antw: Re: Notification agent and Notification recipients
Hi Ulrich,

Please see inline.

On Tue, Aug 8, 2017 at 2:01 PM, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>>> Sriram schrieb am 08.08.2017 um 09:30 in Nachricht:
>> Hi Ken & Jan,
>>
>> In the cluster we have, there is only one resource running. It's an
>> opt-in cluster with resource-stickiness set to INFINITY.
>>
>> Just to clarify my question, let's take a scenario where there are four
>> nodes N1, N2, N3, N4:
>> a. N1 comes up first, starts the cluster.
>
> The cluster will start once it has a quorum.
>
>> b. N1 checks that there is no resource running, so it will add the
>> resource (R) with some location constraint (let's say score 100)
>> c. So resource (R) runs in N1 now.
>> d. N2 comes up next, checks that resource (R) is already running in N1,
>> so it will update the location constraint (let's say score 200)
>> e. N3 comes up next, checks that resource (R) is already running in N1,
>> so it will update the location constraint (let's say score 300)
>
> See my remark on quorum above.

Yes, you are right, I forgot to mention it.

>> f. N4 comes up next, checks that resource (R) is already running in N1,
>> so it will update the location constraint (let's say score 400)
>> g. For some reason, if N1 goes down, resource (R) shifts to N4 (as its
>> score is higher than anyone's).
>>
>> In this case is it possible to notify the nodes N2, N3 that the newly
>> elected active node is N4?
>
> What type of notification, and what would the node do with it?
> Any node in the cluster always has up-to-date configuration information.
> So it knows the status of the other nodes also.

I agree that the node always has up-to-date configuration information, but an application or a thread needs to poll for that information. Is there any way where the notifications are received through some action function in the RA?

Regards,
Sriram.

>> I went through clone notifications and master-slave; looks like it
>> either requires identical resources (anonymous) or unique or stateful
>> resources to be running in all the nodes of the cluster, whereas in our
>> case there is only one resource running in the whole cluster.
>
> Maybe the main reason for not having notifications is that if a node
> fails hard, it won't be able to send out much status information to the
> other nodes.
>
> Regards,
> Ulrich
>
>> Regards,
>> Sriram.
>>
>> On Mon, Aug 7, 2017 at 11:28 AM, Sriram wrote:
>>>
>>> Thanks Ken, Jan. Will look into the clone notifications.
>>>
>>> Regards,
>>> Sriram.
>>>
>>> On Sat, Aug 5, 2017 at 1:25 AM, Ken Gaillot wrote:
>>>> On Thu, 2017-08-03 at 12:31 +0530, Sriram wrote:
>>>>>
>>>>> Hi Team,
>>>>>
>>>>> We have a four node cluster (1 active : 3 standby) in our lab for a
>>>>> particular service. If the active node goes down, one of the three
>>>>> standby nodes becomes active. Now there will be (1 active : 2
>>>>> standby : 1 offline).
>>>>>
>>>>> Is there any way where this newly elected node sends notification to
>>>>> the remaining 2 standby nodes about its new status?
>>>>
>>>> Hi Sriram,
>>>>
>>>> This depends on how your service is configured in the cluster.
>>>>
>>>> If you have a clone or master/slave resource, then clone notifications
>>>> is probably what you want (not alerts, which is the path you were going
>>>> down -- alerts are designed to e.g. email a system administrator after
>>>> an important event).
>>>>
>>>> For details about clone notifications, see:
>>>>
>>>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements
>>>>
>>>> The RA must support the "notify" action, which will be called when a
>>>> clone instance is started or stopped. See the similar section later for
>>>> master/slave resources for additional information. See the mysql or
>>>> pgsql resource agents for examples of notify implementations.
>>>>
>>>>> I was exploring "notification agent" and "notification recipient"
>>>>> features, but that d
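To illustrate what "supporting the notify action" means for an agent: Pacemaker invokes the RA with the action "notify" and describes the event in OCF_RESKEY_CRM_meta_notify_* environment variables. Below is a minimal sketch of the relevant branch in an agent's dispatch, not taken from the mysql/pgsql agents, and assuming the usual ocf-shellfuncs have been sourced so $OCF_SUCCESS is defined:

    # In the RA's main `case "$1" in ...` dispatch (sketch):
    notify)
        # e.g. "pre-start", "post-start", "pre-stop", "post-stop"
        op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
        case "$op" in
            post-start)
                # Instances just started; their nodes are listed in
                # $OCF_RESKEY_CRM_meta_notify_start_uname
                ;;
            post-stop)
                # Instances just stopped on:
                # $OCF_RESKEY_CRM_meta_notify_stop_uname
                ;;
        esac
        exit "$OCF_SUCCESS"
        ;;

Note that these calls only happen for clone (or master/slave) instances whose clone has the meta-attribute notify=true set.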
Re: [ClusterLabs] Notification agent and Notification recipients
Hi Ken & Jan,

In the cluster we have, there is only one resource running. It's an opt-in cluster with resource-stickiness set to INFINITY.

Just to clarify my question, let's take a scenario where there are four nodes N1, N2, N3, N4:
a. N1 comes up first, starts the cluster.
b. N1 checks that there is no resource running, so it will add the resource (R) with some location constraint (let's say score 100)
c. So resource (R) runs in N1 now.
d. N2 comes up next, checks that resource (R) is already running in N1, so it will update the location constraint (let's say score 200)
e. N3 comes up next, checks that resource (R) is already running in N1, so it will update the location constraint (let's say score 300)
f. N4 comes up next, checks that resource (R) is already running in N1, so it will update the location constraint (let's say score 400)
g. For some reason, if N1 goes down, resource (R) shifts to N4 (as its score is higher than anyone's).

In this case is it possible to notify the nodes N2, N3 that the newly elected active node is N4?

I went through clone notifications and master-slave; looks like it either requires identical resources (anonymous) or unique or stateful resources to be running in all the nodes of the cluster, whereas in our case there is only one resource running in the whole cluster.

Regards,
Sriram.

On Mon, Aug 7, 2017 at 11:28 AM, Sriram wrote:
>
> Thanks Ken, Jan. Will look into the clone notifications.
>
> Regards,
> Sriram.
>
> On Sat, Aug 5, 2017 at 1:25 AM, Ken Gaillot wrote:
>> On Thu, 2017-08-03 at 12:31 +0530, Sriram wrote:
>>>
>>> Hi Team,
>>>
>>> We have a four node cluster (1 active : 3 standby) in our lab for a
>>> particular service. If the active node goes down, one of the three
>>> standby nodes becomes active. Now there will be (1 active : 2
>>> standby : 1 offline).
>>>
>>> Is there any way where this newly elected node sends notification to
>>> the remaining 2 standby nodes about its new status?
>>
>> Hi Sriram,
>>
>> This depends on how your service is configured in the cluster.
>>
>> If you have a clone or master/slave resource, then clone notifications
>> is probably what you want (not alerts, which is the path you were going
>> down -- alerts are designed to e.g. email a system administrator after
>> an important event).
>>
>> For details about clone notifications, see:
>>
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements
>>
>> The RA must support the "notify" action, which will be called when a
>> clone instance is started or stopped. See the similar section later for
>> master/slave resources for additional information. See the mysql or
>> pgsql resource agents for examples of notify implementations.
>>
>>> I was exploring "notification agent" and "notification recipient"
>>> features, but that doesn't seem to work. /etc/sysconfig/notify.sh
>>> doesn't get invoked even on the newly elected active node.
>>
>> Yep, that's something different altogether -- it's only enabled on RHEL
>> systems, and solely for backward compatibility with an early
>> implementation of the alerts interface. The new alerts interface is more
>> flexible, but it's not designed to send information between cluster
>> nodes -- it's designed to send information to something external to the
>> cluster, such as a human, or an SNMP server, or a monitoring system.
>>
>>> Cluster Properties:
>>>  cluster-infrastructure: corosync
>>>  dc-version: 1.1.17-e2e6cdce80
>>>  default-action-timeout: 240
>>>  have-watchdog: false
>>>  no-quorum-policy: ignore
>>>  notification-agent: /etc/sysconfig/notify.sh
>>>  notification-recipient: /var/log/notify.log
>>>  placement-strategy: balanced
>>>  stonith-enabled: false
>>>  symmetric-cluster: false
>>>
>>> I'm using the following versions of pacemaker and corosync.
>>>
>>> /usr/sbin # ./pacemakerd --version
>>> Pacemaker 1.1.17
>>> Written by Andrew Beekhof
>>> /usr/sbin # ./corosync -v
>>> Corosync Cluster Engine, version '2.3.5'
>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>>
>>> Can you please suggest if I'm doing anything wrong or if there are any
>>> other mechanisms to achieve this?
>>>
>>> Regards,
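Tying Ken's advice to this 1-active/3-standby layout: if the RA can implement promote/demote and notify, the single resource could be converted to a master/slave set so that every node hosts an instance and receives notify calls when the active role moves. A sketch in the pre-2.0 pcs syntax of this era; the set name and meta values are assumptions, not from the thread:

    # One master (the active role), with instances on all four nodes;
    # notify=true is what enables the RA's notify action.
    pcs resource master TRR-ms TRR master-max=1 clone-max=4 notify=true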
Re: [ClusterLabs] Notification agent and Notification recipients
Thanks Ken, Jan. Will look into the clone notifications.

Regards,
Sriram.

On Sat, Aug 5, 2017 at 1:25 AM, Ken Gaillot wrote:
> On Thu, 2017-08-03 at 12:31 +0530, Sriram wrote:
>>
>> Hi Team,
>>
>> We have a four node cluster (1 active : 3 standby) in our lab for a
>> particular service. If the active node goes down, one of the three
>> standby nodes becomes active. Now there will be (1 active : 2
>> standby : 1 offline).
>>
>> Is there any way where this newly elected node sends notification to
>> the remaining 2 standby nodes about its new status?
>
> Hi Sriram,
>
> This depends on how your service is configured in the cluster.
>
> If you have a clone or master/slave resource, then clone notifications
> is probably what you want (not alerts, which is the path you were going
> down -- alerts are designed to e.g. email a system administrator after
> an important event).
>
> For details about clone notifications, see:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_resource_agent_requirements
>
> The RA must support the "notify" action, which will be called when a
> clone instance is started or stopped. See the similar section later for
> master/slave resources for additional information. See the mysql or
> pgsql resource agents for examples of notify implementations.
>
>> I was exploring "notification agent" and "notification recipient"
>> features, but that doesn't seem to work. /etc/sysconfig/notify.sh
>> doesn't get invoked even on the newly elected active node.
>
> Yep, that's something different altogether -- it's only enabled on RHEL
> systems, and solely for backward compatibility with an early
> implementation of the alerts interface. The new alerts interface is more
> flexible, but it's not designed to send information between cluster
> nodes -- it's designed to send information to something external to the
> cluster, such as a human, or an SNMP server, or a monitoring system.
>
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  dc-version: 1.1.17-e2e6cdce80
>>  default-action-timeout: 240
>>  have-watchdog: false
>>  no-quorum-policy: ignore
>>  notification-agent: /etc/sysconfig/notify.sh
>>  notification-recipient: /var/log/notify.log
>>  placement-strategy: balanced
>>  stonith-enabled: false
>>  symmetric-cluster: false
>>
>> I'm using the following versions of pacemaker and corosync.
>>
>> /usr/sbin # ./pacemakerd --version
>> Pacemaker 1.1.17
>> Written by Andrew Beekhof
>> /usr/sbin # ./corosync -v
>> Corosync Cluster Engine, version '2.3.5'
>> Copyright (c) 2006-2009 Red Hat, Inc.
>>
>> Can you please suggest if I'm doing anything wrong or if there are any
>> other mechanisms to achieve this?
>>
>> Regards,
>> Sriram.
>
> --
> Ken Gaillot
[ClusterLabs] Notification agent and Notification recipients
Hi,

Any idea what could have gone wrong, or if there are other ways to achieve the same?

Regards,
Sriram.

-- Forwarded message --
From: Sriram
Date: Thu, Aug 3, 2017 at 12:31 PM
Subject: Notification agent and Notification recipients
To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>

Hi Team,

We have a four node cluster (1 active : 3 standby) in our lab for a particular service. If the active node goes down, one of the three standby nodes becomes active. Now there will be (1 active : 2 standby : 1 offline).

Is there any way where this newly elected node sends notification to the remaining 2 standby nodes about its new status?

I was exploring "notification agent" and "notification recipient" features, but that doesn't seem to work. /etc/sysconfig/notify.sh doesn't get invoked even on the newly elected active node.

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.17-e2e6cdce80
 default-action-timeout: 240
 have-watchdog: false
 no-quorum-policy: ignore
 notification-agent: /etc/sysconfig/notify.sh
 notification-recipient: /var/log/notify.log
 placement-strategy: balanced
 stonith-enabled: false
 symmetric-cluster: false

I'm using the following versions of pacemaker and corosync.

/usr/sbin # ./pacemakerd --version
Pacemaker 1.1.17
Written by Andrew Beekhof
/usr/sbin # ./corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.

Can you please suggest if I'm doing anything wrong or if there are any other mechanisms to achieve this?

Regards,
Sriram.
[ClusterLabs] Notification agent and Notification recipients
Hi Team,

We have a four node cluster (1 active : 3 standby) in our lab for a particular service. If the active node goes down, one of the three standby nodes becomes active. Now there will be (1 active : 2 standby : 1 offline).

Is there any way where this newly elected node sends notification to the remaining 2 standby nodes about its new status?

I was exploring "notification agent" and "notification recipient" features, but that doesn't seem to work. /etc/sysconfig/notify.sh doesn't get invoked even on the newly elected active node.

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.17-e2e6cdce80
 default-action-timeout: 240
 have-watchdog: false
 no-quorum-policy: ignore
 notification-agent: /etc/sysconfig/notify.sh
 notification-recipient: /var/log/notify.log
 placement-strategy: balanced
 stonith-enabled: false
 symmetric-cluster: false

I'm using the following versions of pacemaker and corosync.

/usr/sbin # ./pacemakerd --version
Pacemaker 1.1.17
Written by Andrew Beekhof
/usr/sbin # ./corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.

Can you please suggest if I'm doing anything wrong or if there are any other mechanisms to achieve this?

Regards,
Sriram.
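As later replies in this thread explain, notification-agent/notification-recipient is a RHEL-only compatibility shim for the old interface; on Pacemaker 1.1.15 and later the supported equivalent is an alert. A sketch of the migration (pcs syntax assumed; the alert id "notify" is hypothetical):

    # Drop the deprecated properties
    pcs property unset notification-agent
    pcs property unset notification-recipient

    # Re-register the same script through the alerts interface
    pcs alert create path=/etc/sysconfig/notify.sh id=notify
    pcs alert recipient add notify value=/var/log/notify.log

The script itself would also need porting to read the CRM_alert_* environment variables instead of the old positional arguments.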
Re: [ClusterLabs] [ClusterLab] : Corosync not initializing successfully
Corrected the subject.

We went ahead and captured corosync debug logs on our ppc board. After log analysis and comparison with the successful logs (from an x86 machine), we didn't find "[MAIN ] Completed service synchronization, ready to provide service." in the ppc logs. So it looks like corosync is not in a position to accept connections from Pacemaker. I even tried with the new corosync.conf, with no success.

Any hints on this issue would be really helpful. Attaching ppc_notworking.log, x86_working.log, corosync.conf.

Regards,
Sriram

On Fri, Apr 29, 2016 at 2:44 PM, Sriram wrote:
> Hi,
>
> I went ahead and made some changes in the file system (I brought in
> /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After
> that I was able to run "pcs cluster start", but it failed with the
> following error:
> # pcs cluster start
> Starting Cluster...
> Starting Pacemaker Cluster Manager[FAILED]
> Error: unable to start pacemaker
>
> And in /var/log/pacemaker.log, I saw these errors:
> pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
> Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
> Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
> Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
> Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
>
> And in /var/log/Debuglog, I saw these errors coming from corosync:
> 20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB] Denied connection, is not ready (12857-15863-14)
> 20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB] Denied connection, is not ready (12857-15863-14)
>
> I browsed the code of libqb and found that it is failing in
> https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c
>
> Line 600: handle_new_connection function
> Line 637:
>     if (auth_result == 0 && c->service->serv_fns.connection_accept) {
>         res = c->service->serv_fns.connection_accept(c, c->euid, c->egid);
>     }
>     if (res != 0) {
>         goto send_response;
>     }
>
> Any hints on this issue would be really helpful for me to go ahead.
> Please let me know if any logs are required.
>
> Regards,
> Sriram
>
> On Thu, Apr 28, 2016 at 2:42 PM, Sriram wrote:
>> Thanks Ken and Emmanuel.
>> It's a big-endian machine. I will try running "pcs cluster setup" and
>> "pcs cluster start".
>> Inside cluster.py, "service pacemaker start" and "service corosync
>> start" are executed to bring up pacemaker and corosync. Those service
>> scripts and the infrastructure needed to bring up the processes in that
>> manner don't exist on my board. As it is an embedded board with limited
>> memory, a full-fledged Linux is not installed.
>> Just curious to know what could be the reason pacemaker throws that
>> error:
>>
>> "cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"
>>
>> Thanks for the response.
>>
>> Regards,
>> Sriram.
>>
>> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot wrote:
>>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>>> you need to use pcs to do everything, pcs cluster setup and pcs
>>>> cluster start, try to use the redhat docs for more information.
>>>
>>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>>> Your corosync.conf below uses corosync 1 syntax, and there were
>>> significant changes in corosync 2. In particular, you don't need the
>>> file created in step 4, because pacemaker is no longer launched via a
>>> corosync plugin.
>>>
>>>> 2016-04-27 17:28 GMT+02:00 Sriram:
>>>>> Dear All,
>>>>>
>>>>> I'm trying to use pacemaker and corosync for the clustering
>>>>> requirement that came up recently.
>>>>> We have cross-compiled corosync, pacemaker and pcs (python) for the
>>>>> ppc environment (the target board where pacemaker and corosync are
>>>>> supposed to run).
>>>>> I'm having trouble bringing up pacemaker in that environment, though
>>>>> I could successfully bring up corosync.
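When "Completed service synchronization, ready to provide service" never appears, membership never formed, so corosync keeps refusing IPC clients like pacemakerd. A few standard corosync 2.x checks that narrow this down (exact key names may differ slightly between versions):

    # Ring status as corosync sees it
    corosync-cfgtool -s

    # Did the node make it into the process-group membership?
    corosync-cmapctl | grep members

    # Quorum view
    corosync-quorumtool -s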
Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker
Hi,

I went ahead and made some changes in the file system (I brought in /etc/init.d/corosync, /etc/init.d/pacemaker and /etc/sysconfig). After that I was able to run "pcs cluster start", but it failed with the following error:

# pcs cluster start
Starting Cluster...
Starting Pacemaker Cluster Manager[FAILED]
Error: unable to start pacemaker

And in /var/log/pacemaker.log, I saw these errors:

pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
Apr 29 08:53:47 [15863] node_cu pacemakerd: info: mcp_read_config: cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
Apr 29 08:53:52 [15863] node_cu pacemakerd: warning: mcp_read_config: Could not connect to Cluster Configuration Database API, error 6
Apr 29 08:53:52 [15863] node_cu pacemakerd: notice: main: Could not obtain corosync config data, exiting
Apr 29 08:53:52 [15863] node_cu pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2

And in /var/log/Debuglog, I saw these errors coming from corosync:

20160429 085347.487050 airv_cu daemon.warn corosync[12857]: [QB] Denied connection, is not ready (12857-15863-14)
20160429 085347.487067 airv_cu daemon.info corosync[12857]: [QB] Denied connection, is not ready (12857-15863-14)

I browsed the code of libqb and found that it is failing in
https://github.com/ClusterLabs/libqb/blob/master/lib/ipc_setup.c

Line 600: handle_new_connection function
Line 637:
    if (auth_result == 0 && c->service->serv_fns.connection_accept) {
        res = c->service->serv_fns.connection_accept(c, c->euid, c->egid);
    }
    if (res != 0) {
        goto send_response;
    }

Any hints on this issue would be really helpful for me to go ahead. Please let me know if any logs are required.

Regards,
Sriram

On Thu, Apr 28, 2016 at 2:42 PM, Sriram wrote:
> Thanks Ken and Emmanuel.
> It's a big-endian machine. I will try running "pcs cluster setup" and
> "pcs cluster start".
> Inside cluster.py, "service pacemaker start" and "service corosync start"
> are executed to bring up pacemaker and corosync. Those service scripts and
> the infrastructure needed to bring up the processes in that manner don't
> exist on my board. As it is an embedded board with limited memory, a
> full-fledged Linux is not installed.
> Just curious to know what could be the reason pacemaker throws that error:
>
> "cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"
>
> Thanks for the response.
>
> Regards,
> Sriram.
>
> On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot wrote:
>> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>>> you need to use pcs to do everything, pcs cluster setup and pcs
>>> cluster start, try to use the redhat docs for more information.
>>
>> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
>> Your corosync.conf below uses corosync 1 syntax, and there were
>> significant changes in corosync 2. In particular, you don't need the
>> file created in step 4, because pacemaker is no longer launched via a
>> corosync plugin.
>>
>>> 2016-04-27 17:28 GMT+02:00 Sriram:
>>>> Dear All,
>>>>
>>>> I'm trying to use pacemaker and corosync for the clustering
>>>> requirement that came up recently.
>>>> We have cross-compiled corosync, pacemaker and pcs (python) for the
>>>> ppc environment (the target board where pacemaker and corosync are
>>>> supposed to run).
>>>> I'm having trouble bringing up pacemaker in that environment, though
>>>> I could successfully bring up corosync.
>>>> Any help is welcome.
>>>>
>>>> I'm using these versions of pacemaker and corosync:
>>>> [root@node_cu pacemaker]# corosync -v
>>>> Corosync Cluster Engine, version '2.3.5'
>>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>>> [root@node_cu pacemaker]# pacemakerd -$
>>>> Pacemaker 1.1.14
>>>> Written by Andrew Beekhof
>>>>
>>>> For running corosync, I did the following.
>>>> 1. Created the following directories:
>>>>    /var/lib/pacemaker
>>>>    /var/lib/corosync
>>>>    /var/lib/pacemaker/cores
>>>>    /var/lib/pacemaker/pengine
>>>>    /var/lib/pacemaker/blackbox
>>>>    /var/lib/pacemaker/cib
>>>>
>>>> 2. Created a file called corosync.conf under /etc/corosync folder with the follo
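The "Denied connection, is not ready" messages mean corosync's IPC is listening but refusing clients until service synchronization completes, which is why pacemakerd keeps getting CS_ERR_TRY_AGAIN. On a board without init scripts, one workaround is to gate pacemakerd on cmap actually answering. A sketch, assuming both binaries are in PATH:

    #!/bin/sh
    # Start corosync, wait until its cmap IPC accepts queries,
    # then start pacemakerd (for init-less embedded systems).
    corosync -f &

    until corosync-cmapctl -g totem.cluster_name >/dev/null 2>&1; do
        echo "waiting for corosync IPC to become ready..."
        sleep 1
    done

    pacemakerd -f &

If membership never forms, as in this thread, the loop spins forever, which at least makes the underlying corosync problem visible instead of letting pacemakerd give up.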
Re: [ClusterLabs] [ClusterLab] : Unable to bring up pacemaker
Thanks Ken and Emmanuel.

It's a big-endian machine. I will try running "pcs cluster setup" and "pcs cluster start".

Inside cluster.py, "service pacemaker start" and "service corosync start" are executed to bring up pacemaker and corosync. Those service scripts and the infrastructure needed to bring up the processes in that manner don't exist on my board. As it is an embedded board with limited memory, a full-fledged Linux is not installed.

Just curious to know what could be the reason pacemaker throws that error:

"cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s"

Thanks for the response.

Regards,
Sriram.

On Thu, Apr 28, 2016 at 8:55 AM, Ken Gaillot wrote:
> On 04/27/2016 11:25 AM, emmanuel segura wrote:
>> you need to use pcs to do everything, pcs cluster setup and pcs
>> cluster start, try to use the redhat docs for more information.
>
> Agreed -- pcs cluster setup will create a proper corosync.conf for you.
> Your corosync.conf below uses corosync 1 syntax, and there were
> significant changes in corosync 2. In particular, you don't need the
> file created in step 4, because pacemaker is no longer launched via a
> corosync plugin.
>
>> 2016-04-27 17:28 GMT+02:00 Sriram:
>>> Dear All,
>>>
>>> I'm trying to use pacemaker and corosync for the clustering
>>> requirement that came up recently.
>>> We have cross-compiled corosync, pacemaker and pcs (python) for the
>>> ppc environment (the target board where pacemaker and corosync are
>>> supposed to run).
>>> I'm having trouble bringing up pacemaker in that environment, though
>>> I could successfully bring up corosync.
>>> Any help is welcome.
>>>
>>> I'm using these versions of pacemaker and corosync:
>>> [root@node_cu pacemaker]# corosync -v
>>> Corosync Cluster Engine, version '2.3.5'
>>> Copyright (c) 2006-2009 Red Hat, Inc.
>>> [root@node_cu pacemaker]# pacemakerd -$
>>> Pacemaker 1.1.14
>>> Written by Andrew Beekhof
>>>
>>> For running corosync, I did the following.
>>> 1. Created the following directories:
>>>    /var/lib/pacemaker
>>>    /var/lib/corosync
>>>    /var/lib/pacemaker/cores
>>>    /var/lib/pacemaker/pengine
>>>    /var/lib/pacemaker/blackbox
>>>    /var/lib/pacemaker/cib
>>>
>>> 2. Created a file called corosync.conf under /etc/corosync with the
>>> following contents:
>>>
>>> totem {
>>>     version: 2
>>>     token: 5000
>>>     token_retransmits_before_loss_const: 20
>>>     join: 1000
>>>     consensus: 7500
>>>     vsftype: none
>>>     max_messages: 20
>>>     secauth: off
>>>     cluster_name: mycluster
>>>     transport: udpu
>>>     threads: 0
>>>     clear_node_high_bit: yes
>>>
>>>     interface {
>>>         ringnumber: 0
>>>         # The following three values need to be set based on your environment
>>>         bindnetaddr: 10.x.x.x
>>>         mcastaddr: 226.94.1.1
>>>         mcastport: 5405
>>>     }
>>> }
>>>
>>> logging {
>>>     fileline: off
>>>     to_syslog: yes
>>>     to_stderr: no
>>>     logfile: /var/log/corosync.log
>>>     syslog_facility: daemon
>>>     debug: on
>>>     timestamp: on
>>> }
>>>
>>> amf {
>>>     mode: disabled
>>> }
>>>
>>> quorum {
>>>     provider: corosync_votequorum
>>> }
>>>
>>> nodelist {
>>>     node {
>>>         ring0_addr: node_cu
>>>         nodeid: 1
>>>     }
>>> }
>>>
>>> 3. Created authkey under /etc/corosync.
>>>
>>> 4. Created a file called pcmk under /etc/corosync/service.d with
>>> contents as below:
>>> cat pcmk
>>> service {
>>>     # Load the Pacemaker Cluster Resource Manager
>>>     name: pacemaker
>>>     ver: 1
>>> }
>>>
>>> 5. Added the node name "node_c
[ClusterLabs] [ClusterLab] : Unable to bring up pacemaker
Dear All,

I'm trying to use pacemaker and corosync for the clustering requirement that came up recently. We have cross-compiled corosync, pacemaker and pcs (python) for the ppc environment (the target board where pacemaker and corosync are supposed to run). I'm having trouble bringing up pacemaker in that environment, though I could successfully bring up corosync. Any help is welcome.

I'm using these versions of pacemaker and corosync:

[root@node_cu pacemaker]# corosync -v
Corosync Cluster Engine, version '2.3.5'
Copyright (c) 2006-2009 Red Hat, Inc.
[root@node_cu pacemaker]# pacemakerd -$
Pacemaker 1.1.14
Written by Andrew Beekhof

For running corosync, I did the following.

1. Created the following directories:
   /var/lib/pacemaker
   /var/lib/corosync
   /var/lib/pacemaker/cores
   /var/lib/pacemaker/pengine
   /var/lib/pacemaker/blackbox
   /var/lib/pacemaker/cib

2. Created a file called corosync.conf under the /etc/corosync folder with the following contents:

totem {
    version: 2
    token: 5000
    token_retransmits_before_loss_const: 20
    join: 1000
    consensus: 7500
    vsftype: none
    max_messages: 20
    secauth: off
    cluster_name: mycluster
    transport: udpu
    threads: 0
    clear_node_high_bit: yes

    interface {
        ringnumber: 0
        # The following three values need to be set based on your environment
        bindnetaddr: 10.x.x.x
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_syslog: yes
    to_stderr: no
    logfile: /var/log/corosync.log
    syslog_facility: daemon
    debug: on
    timestamp: on
}

amf {
    mode: disabled
}

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: node_cu
        nodeid: 1
    }
}

3. Created authkey under /etc/corosync.

4. Created a file called pcmk under /etc/corosync/service.d with contents as below:

cat pcmk
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}

5. Added the node name "node_cu" in /etc/hosts with the 10.X.X.X IP.

6. ./corosync -f -p & --> this step started corosync

[root@node_cu pacemaker]# netstat -alpn | grep -i coros
udp   0 0 10.X.X.X:61841 0.0.0.0:*  9133/corosync
udp   0 0 10.X.X.X:5405  0.0.0.0:*  9133/corosync
unix 2 [ ACC ] STREAM LISTENING 14     9133/corosync @quorum
unix 2 [ ACC ] STREAM LISTENING 148884 9133/corosync @cmap
unix 2 [ ACC ] STREAM LISTENING 148887 9133/corosync @votequorum
unix 2 [ ACC ] STREAM LISTENING 148885 9133/corosync @cfg
unix 2 [ ACC ] STREAM LISTENING 148886 9133/corosync @cpg
unix 2 [ ]     DGRAM            148840 9133/corosync

7. ./pacemakerd -f & gives the following error and exits:

[root@node_cu pacemaker]# pacemakerd -f
cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 1s
cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 2s
cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 3s
cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 4s
cmap connection setup failed: CS_ERR_TRY_AGAIN. Retrying in 5s
Could not connect to Cluster Configuration Database API, error 6

Can you please point out what is missing in these steps?

Before trying these steps, I tried running "pcs cluster start", but that command fails because the "service" script is not found: the root filesystem contains neither /etc/init.d/ nor /sbin/service. So the plan is to bring up corosync and pacemaker manually, and later do the cluster configuration using "pcs" commands.

Regards,
Sriram
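Following Ken's advice upthread (corosync 2 syntax, no service.d plugin file, pacemakerd started separately), a minimal corosync.conf for this single-node test might look like the sketch below. Names and addresses are taken from the thread; the quorum setting is an assumption for single-node testing and this has not been verified on ppc, so treat it as a starting point:

    totem {
        version: 2
        cluster_name: mycluster
        transport: udpu
        clear_node_high_bit: yes
    }

    nodelist {
        node {
            ring0_addr: node_cu
            nodeid: 1
        }
    }

    quorum {
        provider: corosync_votequorum
        # single test node only; drop this once more nodes join
        expected_votes: 1
    }

    logging {
        to_syslog: yes
        to_logfile: yes
        logfile: /var/log/corosync.log
    }

With this in place there is no /etc/corosync/service.d/pcmk file at all; corosync is started first and pacemakerd is launched on top once corosync reports "Completed service synchronization, ready to provide service."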