[Pacemaker] Cluster reboot for maintenance
Hi,
I have a two-node cluster with some VMs (Pacemaker resources) running on the two hypervisors:

pacemaker-1.0.10
corosync-1.3.0

I need to do some maintenance, so I need to:
- put the cluster into maintenance so that it doesn't touch/start/stop/monitor the VMs
- update the VMs
- stop the VMs
- stop the cluster stack (corosync/pacemaker)
- reboot the hypervisors

What is the correct way to do that on the corosync/pacemaker side?

Best regards
Marco
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
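The steps listed above could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the crm shell is installed and that this Pacemaker version honours the maintenance-mode cluster property (if it does not, marking the individual resources as unmanaged is the usual alternative). The service names passed to stop_cluster_stack are also assumptions about the local setup. By default the functions only print the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print (or run) the cluster-side steps of a maintenance window.
# Assumptions: the crm shell is available and the maintenance-mode property
# is supported by the installed Pacemaker version.

DRY_RUN=${DRY_RUN:-1}

run() {
    # In dry-run mode only show the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

enter_maintenance() {
    # Ask the cluster to leave all resources alone.
    run crm configure property maintenance-mode=true
}

stop_cluster_stack() {
    # Stop the given init services in order, e.g. "corosync" on a
    # plugin-based stack (service names are assumptions).
    for svc in "$@"; do
        run "/etc/init.d/$svc" stop
    done
}

leave_maintenance() {
    # After the reboot, once both nodes have rejoined the cluster.
    run crm configure property maintenance-mode=false
}
```

A possible order would be enter_maintenance, then updating and shutting down the VMs by hand, then stop_cluster_stack corosync on each node, the reboot, and finally leave_maintenance once both nodes are back.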
Re: [Pacemaker] Clustermon issue
I tried to insert some check code in my script:

#!/bin/bash
echo $(date) >> /tmp/check
monitorfile=/tmp/clustermonitor.html
hostname=$(hostname)
echo "Cluster state changes detected" | mail -r "$hostname@" -s "Cluster Monitor" -a $monitorfile mquerc...@gmail.com

to check whether the script is being called or not.
In the /tmp directory there is the clustermonitor.html file, which was created by the ClusterMon resource, but no check file is present.

Thanks.

On 08/01/2015 03:31, Andrew Beekhof wrote:
> And there is no indication this is being called?
>
> On 7 Jan 2015, at 6:21 pm, Marco Querci wrote:
>> #!/bin/bash
>> monitorfile=/tmp/clustermonitor.html
>> hostname=$(hostname)
>> echo "Cluster state changes detected" | mail -r "$hostname@" -s "Cluster Monitor" -a $monitorfile mquerc...@gmail.com
>>
>> Thanks.
>>
>> On 06/01/2015 01:21, Andrew Beekhof wrote:
>>> On 6 Jan 2015, at 3:37 am, Marco Querci wrote:
>>>> Hi All.
>>>> Any news for my problem?
>>> Maybe post your /home/administrator/clustermonitor_notification.sh script?
>>>> Many thanks.
>>>>
>>>> On 19/12/2014 12:13, Marco Querci wrote:
>>>>> Many thanks for your reply.
>>>>> Here is my configuration:
>>>>>
>>>>> On 19/12/2014 10:02, Florian Crouzat wrote:
>>>>>> On 18/12/2014 16:21, Marco Querci wrote:
>>>>>>> Hi all,
>>>>>>> I have a pacemaker + corosync cluster installed on CentOS 6.5.
>>>>>>> I have a ClusterMon resource with the external_agent param set up.
>>>>>>> Before the last pacemaker update, the event notifications to the external script worked perfectly.
>>>>>>> After the pacemaker update to version 1.1.11, the ClusterMon resource continues to work but has stopped notifying the external agent.
>>>>>>> I followed setup instructions found on the internet but I can't figure out why it doesn't work as expected.
>>>>>>> Any help will be appreciated.
>>>>>>> Many thanks.
>>>>>> Hello, please paste your full configuration here so we understand how you use the ClusterMon stuff.
>>>>>> Remember that on RHEL 6.x, SNMP support is not built in; but that's probably why you use an external_agent.
>>>>>> I just need to make sure by reading your configuration.
>>>>>> Cheers,
>>>>>> Florian Crouzat
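To see whether crm_mon invokes the external agent at all, and with what data, a stripped-down agent that only logs can help. The sketch below is not the script from this thread. The CRM_notify_* variable names are the ones I understand crm_mon to export to external agents, so treat them as an assumption to check against the installed version, and the log path is hypothetical.

```shell
#!/bin/bash
# Sketch of a logging-only ClusterMon external agent.
# Assumption: crm_mon passes event details in CRM_notify_* environment
# variables. The default log path below is hypothetical.

LOGFILE=${LOGFILE:-/tmp/clustermonitor_agent.log}

format_event() {
    # Build one log line from whatever notification variables are set.
    printf '%s node=%s rsc=%s task=%s rc=%s desc=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${CRM_notify_node:-unknown}" \
        "${CRM_notify_rsc:-unknown}" \
        "${CRM_notify_task:-unknown}" \
        "${CRM_notify_rc:-unknown}" \
        "${CRM_notify_desc:-unknown}"
}

log_event() {
    # Append the formatted event to the log file.
    format_event >> "$LOGFILE"
}

# Every invocation of the agent leaves one line in the log.
log_event
```

If the log file never appears after a resource event, the agent is not being executed at all, which points at crm_mon rather than at the mail command.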
Re: [Pacemaker] Clustermon issue
Sorry ... it was my error.
My CentOS is 6.6:

[root@langate1 ~]# cat /etc/redhat-release
CentOS release 6.6 (Final)

fully upgraded.

On 08/01/2015 23:00, Andrew Beekhof wrote:
> On 8 Jan 2015, at 10:00 pm, Marco Querci wrote:
>> Thanks for your suggestion.
>> But my version is:
>>
>> [root@langate1 ~]# pacemakerd --version
>> Pacemaker 1.1.11
>>
>> Why were you saying my version is 1.1.12-rc3?
>
> Because that's the tag that '97629de' in the following matches to:
>
>> I installed pacemaker on CentOS 6.5 from the repository.
>> Why haven't those repositories been updated with this patch?
>
> Because CentOS blindly rebuilds from RHEL and no RHEL customers complained about it (which means package maintainers aren't allowed to fix it until 6.6).
>
>> Many thanks.
>>
>> On 08/01/2015 03:39, Andrew Beekhof wrote:
>>> On 8 Jan 2015, at 1:31 pm, Andrew Beekhof wrote:
>>>> And there is no indication this is being called?
>>> Doh. I know this one... you're actually using 1.1.12-rc3.
>>> You need this patch which landed after 1.1.12 shipped:
>>> https://github.com/beekhof/pacemaker/commit/3df6aff
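The point about '97629de' is that the dc-version value carries both the packaged version and the upstream build hash, and it is the hash that identifies the code actually running. A small sketch of splitting the two follows. The sample value is the dc-version from the configuration posted in this thread, and the crm_attribute call in the comment is only a suggestion for reading it from a live cluster.

```shell
#!/bin/sh
# Sketch: split a dc-version string such as "1.1.11-97629de" into the
# packaged version and the upstream build hash.

dc_version_release() {
    # Everything before the last "-".
    echo "${1%-*}"
}

dc_version_hash() {
    # Everything after the last "-".
    echo "${1##*-}"
}

# On a live cluster the value could be read with something like:
#   crm_attribute --type crm_config --name dc-version --query
dc="1.1.11-97629de"
echo "release: $(dc_version_release "$dc")"
echo "build hash: $(dc_version_hash "$dc")"
```

The hash can then be looked up in the upstream repository to see which tag it corresponds to.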
Re: [Pacemaker] Clustermon issue
Thanks for your suggestion.
But my version is:

[root@langate1 ~]# pacemakerd --version
Pacemaker 1.1.11

Why were you saying my version is 1.1.12-rc3?
I installed pacemaker on CentOS 6.5 from the repository.
Why haven't those repositories been updated with this patch?

Many thanks.

On 08/01/2015 03:39, Andrew Beekhof wrote:
> On 8 Jan 2015, at 1:31 pm, Andrew Beekhof wrote:
>> And there is no indication this is being called?
> Doh. I know this one... you're actually using 1.1.12-rc3.
> You need this patch which landed after 1.1.12 shipped:
> https://github.com/beekhof/pacemaker/commit/3df6aff
Re: [Pacemaker] Clustermon issue
#!/bin/bash
monitorfile=/tmp/clustermonitor.html
hostname=$(hostname)
echo "Cluster state changes detected" | mail -r "$hostname@" -s "Cluster Monitor" -a $monitorfile mquerc...@gmail.com

Thanks.

On 06/01/2015 01:21, Andrew Beekhof wrote:
> On 6 Jan 2015, at 3:37 am, Marco Querci wrote:
>> Hi All.
>> Any news for my problem?
> Maybe post your /home/administrator/clustermonitor_notification.sh script?
>> Many thanks.
Re: [Pacemaker] Clustermon issue
Hi All. Any news for my problem? Many thanks. Il 19/12/2014 12:13, Marco Querci ha scritto: Many tahnk for your reply. Here is my configuration: validate-with="pacemaker-1.2" cib-last-written="Thu Dec 18 20:04:43 2014" update-origin="langate1" update-client="crmd" crm_feature_set="3.0.9" have-quorum="1" dc-uuid="langate1"> name="dc-version" value="1.1.11-97629de"/> name="cluster-infrastructure" value="classic openais (with plugin)"/> name="expected-quorum-votes" value="2"/> name="stonith-enabled" value="false"/> name="no-quorum-policy" value="ignore"/> name="last-lrm-refresh" value="1418929320"/> type="IPaddr2"> name="ip" value="192.168.0.254"/> id="ClusterIP_int-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/> name="nic" value="eth3"/> name="monitor"/> name="monitor"/> name="monitor"/> provider="heartbeat" type="IPaddr2"> name="ip" value="10.10.10.2"/> id="ClusterIP_ext1-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/> name="nic" value="eth0"/> interval="60s" name="monitor"/> provider="heartbeat" type="IPaddr2"> name="ip" value="172.16.0.2"/> id="ClusterIP_ext2-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/> name="nic" value="eth1"/> interval="60s" name="monitor"/> name="monitor"/> name="monitor"/> provider="pacemaker" type="ClusterMon"> id="ClusterMonitor-instance_attributes-extra_options" name="extra_options" value="-E /home/administrator/clustermonitor_notification.sh -e "/> name="start" timeout="20"/> name="stop" timeout="20"/> name="monitor" timeout="20"/> name="resource-stickiness" value="100"/> crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member"> value="0"/> name="probe_complete" value="true"/> operation_key="WanFailover_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="17:67:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;17:67:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="67" rc-code="0" 
op-status="0" interval="0" last-run="1418929483" last-rc-change="1418929483" exec-time="10" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" on_node="langate1"/> operation_key="WanFailover_monitor_6" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="18:67:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;18:67:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="68" rc-code="0" op-status="0" interval="6" last-rc-change="1418929483" exec-time="15" queue-time="0" op-digest="4811cef7f7f94e3a35a70be7916cb2fd" on_node="langate1"/> operation_key="Shorewall_start_0" oper
Re: [Pacemaker] Clustermon issue
itor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="15:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:7;15:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="18" rc-code="7" op-status="0" interval="0" last-run="1418929392" last-rc-change="1418929392" exec-time="452" queue-time="0" op-digest="3a2172b3600a74a02c56030c73d7efd6" on_node="langate2"/> provider="heartbeat"> operation_key="ClusterIP_int_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="12:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:7;12:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1418929392" last-rc-change="1418929392" exec-time="502" queue-time="0" op-digest="2e0d4879baaebfc3a7092f3adfeadb9e" on_node="langate2"/> provider="heartbeat"> operation_key="ClusterIP_ext2_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="16:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:7;16:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="22" rc-code="7" op-status="0" interval="0" last-run="1418929392" last-rc-change="1418929392" exec-time="431" queue-time="0" op-digest="cc4af9155b9449867acd30be849b0d3f" on_node="langate2"/> operation_key="Shorewall_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="28:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;28:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="37" rc-code="0" op-status="0" interval="0" last-run="1418929393" last-rc-change="1418929393" exec-time="6584" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" on_node="langate2"/> operation_key="Shorewall_monitor_6" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="23:66:0:2883fef9-479a-473e-a889-5ecd6d367ed0" 
transition-magic="0:0;23:66:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="41" rc-code="0" op-status="0" interval="6" last-rc-change="1418929399" exec-time="99" queue-time="1" op-digest="4811cef7f7f94e3a35a70be7916cb2fd" on_node="langate2"/> operation_key="Fail2ban_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="17:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:7;17:65:7:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="26" rc-code="7" op-status="0" interval="0" last-run="1418929392" last-rc-change="1418929392" exec-time="73" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" on_node="langate2"/> operation_key="Postfix_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="46:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;46:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="38" rc-code="0" op-status="0" interval="0" last-run="1418929393" last-rc-change="1418929393" exec-time="4401" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8" on_node="langate2"/> operation_key="Postfix_monitor_6" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="42:66:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;42:66:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="42" rc-code="0" op-status="0" interval="6" last-rc-change="1418929399" exec-time="54" queue-time="0" op-digest="4811cef7f7f94e3a35a70be7916cb2fd" on_node="langate2"/> class="ocf" provider="pacemaker"> operation_key="ClusterMonitor_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="54:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;54:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="39" rc-code="0" op-status="0" interval="0" last-run="1418929393" last-rc-change="1418929393" exec-time="163" queue-time="0" 
op-digest="bea5e7b384fbbbc979747b3584d1c025" on_node="langate2"/> operation_key="ClusterMonitor_monitor_1" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.9" transition-key="55:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" transition-magic="0:0;55:65:0:2883fef9-479a-473e-a889-5ecd6d367ed0" call-id="40" rc-code="0" op-status="0" interval="1" last-rc-change="1418929393" exec-time="62" queue-time="0" op-digest="31503c31050d63046026e4abfd181f64" on_node="langate2"/> name="probe_complete" value="true"/>

On 19/12/2014 10:02, Florian Crouzat wrote:
> On 18/12/2014 16:21, Marco Querci wrote:
>> Hi all,
>> I have a pacemaker + corosync cluster installed on CentOS 6.5.
>> I have a ClusterMon resource with the external_agent param set up.
>> Before the last pacemaker update, the event notifications to the external script worked perfectly.
>> After the pacemaker update to version 1.1.11, the ClusterMon resource continues to work but has stopped notifying the external agent.
>> I followed setup instructions found on the internet but I can't figure out why it doesn't work as expected.
>> Any help will be appreciated.
>> Many thanks.
> Hello, please paste your full configuration here so we understand how you use the ClusterMon stuff.
> Remember that on RHEL 6.x, SNMP support is not built in; but that's probably why you use an external_agent.
> I just need to make sure by reading your configuration.
> Cheers,
> Florian Crouzat
[Pacemaker] Clustermon issue
Hi all,
I have a pacemaker + corosync cluster installed on CentOS 6.5.
I have a ClusterMon resource with the external_agent param set up.
Before the last pacemaker update, the event notifications to the external script worked perfectly.
After the pacemaker update to version 1.1.11, the ClusterMon resource continues to work but has stopped notifying the external agent.
I followed setup instructions found on the internet but I can't figure out why it doesn't work as expected.
Any help will be appreciated.
Many thanks.
[Pacemaker] clustermon external_agent issue
Hi all,
I have a pacemaker + corosync cluster installed on CentOS 6.5.
I have a ClusterMon resource with the external_agent param set up.
Before the last pacemaker update, the event notifications to the external script worked perfectly.
After the pacemaker update to version 1.1.11, the ClusterMon resource continues to work but has stopped notifying the external agent.
I followed setup instructions found on the internet but I can't figure out why it doesn't work as expected.
Any help will be appreciated.
Many thanks.
[Pacemaker] corosync.conf tuning for Vm
Hi all,
in an HA hypervisor environment (corosync/pacemaker managing virtual machines), is it OK to set the token timeout to 1 minute?

My needs are:
- I don't want a temporary overload on a hypervisor to break corosync communication or trigger a token loss.
- Is it OK to set the token that high (1 minute), or are there problems I'm not aware of?

Thanks
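For reference, the token timeout lives in the totem section of corosync.conf and is given in milliseconds. The sketch below prints a fragment for a chosen value. It sets consensus to 1.2 times the token, which as far as I know is the minimum the corosync.conf man page describes. The 10000 ms used in the call is only an example, and the fragment is meant to be merged into an existing totem section, not used on its own. The general trade-off is that a node failure cannot be detected faster than the token timeout, so a one-minute token also delays failover by at least that long.

```shell
#!/bin/sh
# Sketch: print a totem fragment for a given token timeout (milliseconds).
# Assumption: consensus is set to 1.2 * token; all other totem options
# (interface, version, ...) are left to the existing configuration.

totem_fragment() {
    token_ms=$1
    consensus_ms=$(( token_ms * 12 / 10 ))
    printf 'totem {\n'
    printf '    token: %s\n' "$token_ms"
    printf '    consensus: %s\n' "$consensus_ms"
    printf '}\n'
}

totem_fragment 10000
```

Calling totem_fragment 60000 would produce the one-minute variant asked about here.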
Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS
On Mon, 14 Apr 2014 14:40:43 +1000
Andrew Beekhof wrote:

> On 11 Apr 2014, at 10:54 pm, Marco Felettigh wrote:
>
> > On Fri, 11 Apr 2014 17:17:57 +1000
> > Andrew Beekhof wrote:
> >
> >> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote:
> >>
> >>> On Tue, 8 Apr 2014 10:49:16 +1000
> >>> Andrew Beekhof wrote:
> >>>
> >>>> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote:
> >>>>
> >>>>> Hi,
> >>>>> in a production environment with 2 nodes ( nodeA , nodeB ) we
> >>>>> had an hardware failure so we restart the nodeB.
> >>>>> After the restarted nodeB came up we restart corosync/pacemaker
> >>>>> on it but for 2 days till now che corosync/pacemaker stuff is
> >>>>> looping.
> >>>>>
> >>>>> crm_mon NodeA:
> >>>>>
> >>>>> Stack: openais
> >>>>> Current DC: nodeA - partition with quorum
> >>>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> >>>>> 2 Nodes configured, 2 expected votes
> >>>>> 17 Resources configured.
> >>>>>
> >>>>> Online: [ nodeA ]
> >>>>> OFFLINE: [ nodeB ]
> >>>>>
> >>>>> crm_mon NodeB:
> >>>>>
> >>>>> Stack: openais
> >>>>> Current DC: NONE
> >>>>> 2 Nodes configured, 2 expected votes
> >>>>> 17 Resources configured.
> >>>>>
> >>>>> OFFLINE: [ nodeA nodeB ]
> >>>>>
> >>>>> This loop on nodeB reports:
> >>>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
> >>>>> nodeA) lost: vote from nodeA (Age)
> >>>>>
> >>>>> So investigating around i found these message on nodeA:
> >>>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
> >>>>>
> >>>>> now this message is repeating for every operation.
> >>>>> Is it a corosync problem or a cib/pacemaker one ?
> >>>>> Any suggestion on what is happened ?
> >>>>
> >>>> For some reason the cib can't connect to corosync anymore.
> >>>> No software got upgraded recently?
> >>>>
> >>>> Are there any logs from corosync?
> >>>> Which distro is this?
> >>>>
> >>>>> And why the start of a cluster node crasched the DC suff ? :(
> >>>>>
> >>>>> Bye Marco
> >>>
> >>> Hi,
> >>> the distro in an opensuse 11.1 and there is no updates also
> >>> because the distro is out of maintenance.
> >>
> >> A good reason to be using SLES (or RHEL/CentOS).
> >
> > Better Gentoo ;)
> >
> >>> We are planning and upgrade but the interesting thing is to figure
> >>> out the reasons of the problem.
> >>> The log in attachment, thanks for the support
> >>
> >> There's nothing obvious in the logs. Just that as far as pacemaker
> >> could tell, corosync suddenly went away. Was the corosync process
> >> still running?
> >
> > Yes , corosync was still running .
>
> Stopping pacemaker and restarting it didnt help?

In the end we restarted the two servers and then started corosync/pacemaker again.

Thanks for the support
Marco
Re: [Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS
On Fri, 11 Apr 2014 17:17:57 +1000
Andrew Beekhof wrote:

> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote:
>
> > Hi,
> > the distro in an opensuse 11.1 and there is no updates also because
> > the distro is out of maintenance.
>
> A good reason to be using SLES (or RHEL/CentOS).

Better Gentoo ;)

> > We are planning and upgrade but the interesting thing is to figure
> > out the reasons of the problem.
> > The log in attachment, thanks for the support
>
> There's nothing obvious in the logs. Just that as far as pacemaker
> could tell, corosync suddenly went away. Was the corosync process
> still running?

Yes, corosync was still running.
[Pacemaker] cib: ERROR: send_ais_message: Not connected to AIS
Hi,
in a production environment with 2 nodes (nodeA, nodeB) we had a hardware failure, so we restarted nodeB.
After the restarted nodeB came up we restarted corosync/pacemaker on it, but for 2 days now the corosync/pacemaker stack has been looping.

crm_mon NodeA:

Stack: openais
Current DC: nodeA - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
17 Resources configured.

Online: [ nodeA ]
OFFLINE: [ nodeB ]

crm_mon NodeB:

Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
17 Resources configured.

OFFLINE: [ nodeA nodeB ]

This loop on nodeB reports:
crmd: [7149]: debug: do_election_count_vote: Election 3 (owner: nodeA) lost: vote from nodeA (Age)

Investigating further, I found this message on nodeA:
cib: [28755]: ERROR: send_ais_message: Not connected to AIS

This message is now repeated for every operation.
Is it a corosync problem or a cib/pacemaker one?
Any suggestion as to what happened?
And why did starting a cluster node crash the DC stuff? :(

Bye
Marco
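The "Current DC: NONE" state shown for nodeB can be checked from a script, which is handy when watching whether a node ever rejoins. The sketch below reads crm_mon one-shot output on stdin. The parsing is an assumption based purely on the output format pasted in this message.

```shell
#!/bin/sh
# Sketch: read "crm_mon -1" style output on stdin and report whether a DC
# has been elected. The parsing assumes the "Current DC:" line format
# shown in this thread.

has_dc() {
    # Succeeds if a "Current DC:" line names a node rather than NONE.
    awk '/^Current DC:/ { found = 1; if ($3 == "NONE") none = 1 }
         END { exit (found && !none) ? 0 : 1 }'
}
```

On a node this might be used as `crm_mon -1 | has_dc || echo "no DC elected"`. Whether corosync itself still sees the ring is a separate check; corosync-cfgtool -s is one tool that I believe reports that.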
Re: [Pacemaker] token lost - need clarification
On Tue, 17 Dec 2013 09:28:51 +0100 Michael Schwartzkopff wrote:

> On Tuesday, 17 December 2013, 09:17:31, ma...@nucleus.it wrote:
> > Hi to all,
> > I set up a 2-node cluster with a cross cable between the two nodes,
> > without stonith; I know this is not the best way, but this is the
> > scenario I needed at that time.
> >
> > I know the releases are old:
> > corosync-1.2.7-1.2
> > libcorosync-1.2.7-1.2
> > pacemaker-1.0.10-1.4
> > libpacemaker3-1.0.10-1.4
> >
> > Everything was OK for some days/months, but a few days ago, without
> > any network interruption between the two nodes (no messages relating
> > to the ethernet modules, no errors in the network statistics, no
> > notifications from the nagios ping checks), something went wrong.
> >
> > From what I can understand from the attached logs:
> > Token Timeout (1 ms) retransmit timeout (980 ms)
> > token hold (774 ms) retransmits before loss (10 retrans)
> >
> > the 2 nodes lost a token and tried to resolve the situation, but
> > node1 thinks node2 is up:
> >
> > Dec 7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node2 is online
> > Dec 7 05:01:41 node1 pengine: [1138]: info: determine_online_status: Node node1 is online
> >
> > and then lost:
> >
> > Dec 7 05:01:54 node1 corosync[1128]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node2 was not seen in the previous transition
> > Dec 7 05:01:54 node1 corosync[1128]: [pcmk ] info: update_member: Node 33559980/node2 is now: lost
> >
> > while node2 thinks node1 was gone:
> >
> > Dec 7 05:01:34 node2 corosync[6356]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> > Dec 7 05:01:34 node2 corosync[6356]: [pcmk ] info: update_member: Node 16782764/node1 is now: lost
> >
> > Then they go into split brain.
> > Any suggestion as to why node1 still saw node2 at first, while
> > node2 immediately declared node1 lost?
>
> This depends on who initiates the round. Both nodes recognized the
> failure within 20 seconds. This is OK, especially if you allow 10
> seconds for a token timeout.
>
> Kind regards,
>
> Michael Schwartzkopff

OK, that is fine, but it is very strange that, with no network loss between the nodes, they cannot resend the token and later re-establish the quorum :( .

Marco
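To compare when each node declared its peer lost, the membership lines quoted in this thread can be pulled out of syslog on both nodes. This is only a minimal sketch: it assumes log lines shaped like the ones above, and both the helper name `lost_events` and the log path in the usage comment are illustrative, not part of corosync.

```shell
# Print "time host peer" for every membership change to "lost".
# Assumes syslog lines shaped like the ones quoted in this thread.
# lost_events is a hypothetical helper name, not a corosync tool.
lost_events() {
    awk '/update_member/ && /is now: lost/ { print $3, $4, $(NF-3) }' "$@"
}

# Usage (the log path is an assumption; adjust for your syslog setup):
#   lost_events /var/log/messages
```

Run on each node, this makes the gap between the two "lost" timestamps (05:01:34 on node2, 05:01:54 on node1 in the excerpts above) easy to read off.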
Re: [Pacemaker] Configuring Clusterstack on Scientific Linux 6
On 09/08/2011 11:08 AM, Vadim Bulst wrote:

> Hi all,
>
> I'd like to build a 10-node cluster based on Scientific Linux 6 with
> Corosync and Pacemaker. On my test installation I didn't find any
> stonith components in the cluster-glue packages. Do I have to install
> any more packages? In an openSUSE installation there is a directory
> called /usr/lib/stonith/; I didn't find anything similar in an SL
> environment.
>
> My package list:
>
> cluster-glue.i686 1.0.5-2.el6 @sl
> cluster-glue-libs.i686 1.0.5-2.el6 @sl
> cluster-glue-libs-devel.i686 1.0.5-2.el6 @sl
> clusterlib.i686 3.0.12-41.el6 @sl
> corosync.i686 1.2.3-36.el6 @sl
> corosynclib.i686 1.2.3-36.el6 @sl
> pacemaker.i686 1.1.5-5.el6 @sl
> pacemaker-libs.i686 1.1.5-5.el6 @sl
>
> Cheers,
> Vadim

For this you'll need the "fence-agents" package. The fencing binaries, however, are not in /usr/lib/stonith but in /usr/sbin/fence_*.

You may also need the resource-agents package.

Bye,
Marco.
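A quick way to see which fence agents are actually installed is to list the executables matching the path given in the reply. This is a small sketch; the helper name `list_fence_agents` is made up for illustration, and the default directory is simply the one mentioned above.

```shell
# List executable fence agents in a directory (default /usr/sbin,
# where the reply says the fence-agents package installs them).
# list_fence_agents is an illustrative helper name.
list_fence_agents() {
    dir="${1:-/usr/sbin}"
    for f in "$dir"/fence_*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}

list_fence_agents
```

If nothing is printed, the fence-agents package is probably not installed yet.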
Re: [Pacemaker] Pacemaker in RHEL6.
On 08/12/2011 06:05 AM, Larry Brigman wrote:

> On Thu, Aug 11, 2011 at 8:51 PM, Larry Brigman <larry.brig...@gmail.com> wrote:
>> On Thu, Aug 11, 2011 at 5:37 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>> On Fri, Aug 12, 2011 at 1:13 AM, Larry Brigman <larry.brig...@gmail.com> wrote:
>>>> On Wed, Aug 10, 2011 at 10:50 PM, Marco van Putten <marco.vanput...@tudelft.nl> wrote:
>>>>> On 08/10/2011 06:23 PM, David Coulson wrote:
>>>>>> On 8/10/11 11:43 AM, Marco van Putten wrote:
>>>>>>> Thanks Andreas. But our managers insist on using Redhat.
>>>>>>
>>>>>> I think the idea would be to take the HA packages distributed with
>>>>>> Scientific Linux 6.x and run them on RHEL.
>>>>>
>>>>> OK, thanks for the heads up. I will give it a try with the Scientific Linux
>>>>> packages on RHEL.
>>>>>
>>>>>> Note that even when you do subscribe to the HA add-on in RHEL6,
>>>>>> pacemaker is not supported by RedHat. Are you sure you can't buy the HA
>>>>>> add-on to go with your base entitlement for RHEL?
>>>>>
>>>>> No, unfortunately Redhat's license model doesn't work that way. Instead of
>>>>> the $150 academic license you have to buy the fully licensed version and then
>>>>> pay extra for the add-on.
>>>>
>>>> If you have the install DVD then the packages are there, just in a different
>>>> repo on the disk.
>>>> Directory is HighAvailability.
>>>> ls pacemaker-*
>>>> pacemaker-1.1.2-7.el6.x86_64.rpm pacemaker-libs-1.1.2-7.el6.i686.rpm
>>>> pacemaker-libs-1.1.2-7.el6.x86_64.rpm
>>>
>>> Is corosync and cluster-glue in there too?
>>
>> Yes.
>>
>> Packages]$ ls coro*
>> corosync-1.2.3-21.el6.x86_64.rpm corosynclib-1.2.3-21.el6.x86_64.rpm
>> corosynclib-1.2.3-21.el6.i686.rpm
>> Packages]$ ls cluster*
>> cluster-cim-0.16.2-10.el6.x86_64.rpm clusterlib-3.0.12-23.el6.i686.rpm
>> cluster-glue-1.0.5-2.el6.x86_64.rpm clusterlib-3.0.12-23.el6.x86_64.rpm
>> cluster-glue-libs-1.0.5-2.el6.i686.rpm cluster-snmp-0.16.2-10.el6.x86_64.rpm
>> cluster-glue-libs-1.0.5-2.el6.x86_64.rpm
>
> The source packages are also available.
> ftp://ftp.redhat.com/pub/redhat/linux/enterprise/6Server/en/os/SRPMS/

I also found the RPMs on our Redhat satellite server. But this doesn't make it much easier if you want to upgrade to a newer version.

I've tried the Scientific Linux way by adding it as a disabled repository and then installing pacemaker with:

# yum install --enablerepo=scientificlinux pacemaker

Yum then takes care of all the dependencies and (somehow) only uses the pacemaker/corosync/etc. packages from Scientific Linux, while the rest comes from Redhat. You still need the EPEL repository as well, by the way.

So the Scientific Linux option works best for our situation, I think.

Thanks everyone for all the replies,
Marco.
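For readers who want to reproduce the "disabled repository" setup, a repo definition along these lines should work. It is a sketch under assumptions: the repo id `scientificlinux` is taken from the `--enablerepo` call above, but the `baseurl` is a placeholder rather than a real mirror, the helper name `write_sl_repo` is invented here, and `gpgcheck=1` presumes the Scientific Linux GPG key has already been imported.

```shell
# Write a yum repo definition that stays disabled unless --enablerepo is used.
# The repo id matches the --enablerepo=scientificlinux call in the message.
# The baseurl is a PLACEHOLDER, not a real mirror; replace it with your own.
# write_sl_repo is an illustrative helper name.
write_sl_repo() {
    target="${1:-/etc/yum.repos.d/scientificlinux.repo}"
    cat > "$target" <<'EOF'
[scientificlinux]
name=Scientific Linux 6 (disabled by default)
baseurl=http://mirror.example.org/scientific/6/$basearch/os/
enabled=0
gpgcheck=1
EOF
}

# Usage (as root):
#   write_sl_repo
#   yum install --enablerepo=scientificlinux pacemaker
```

Because of `enabled=0`, ordinary `yum update` runs keep pulling everything from the Redhat channels; the extra repository is only consulted when it is named explicitly.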
Re: [Pacemaker] Pacemaker in RHEL6.
On 08/10/2011 06:23 PM, David Coulson wrote:

> On 8/10/11 11:43 AM, Marco van Putten wrote:
>> Thanks Andreas. But our managers insist on using Redhat.
>
> I think the idea would be to take the HA packages distributed with
> Scientific Linux 6.x and run them on RHEL.

OK, thanks for the heads up. I will give it a try with the Scientific Linux packages on RHEL.

> Note that even when you do subscribe to the HA add-on in RHEL6,
> pacemaker is not supported by RedHat. Are you sure you can't buy the HA
> add-on to go with your base entitlement for RHEL?

No, unfortunately Redhat's license model doesn't work that way. Instead of the $150 academic license you have to buy the fully licensed version and then pay extra for the add-on.

> David

Bye,
Marco.
Re: [Pacemaker] Pacemaker in RHEL6.
On 08/10/2011 04:31 PM, Andreas Kurz wrote:

> On 2011-08-10 14:13, Marco van Putten wrote:
>> Hi,
>>
>> Is it possible to get the pacemaker RPMs for RHEL6 from the
>> Clusterlabs repository (as for RHEL5)?
>>
>> I know they are available through Redhat's "High Availability"
>> channel. But since we have academic licences, we don't have this
>> channel available.
>
> Scientific Linux 6.1 should provide all packages.
>
> Regards,
> Andreas

Thanks Andreas. But our managers insist on using Redhat.

Bye,
Marco.
[Pacemaker] Pacemaker in RHEL6.
Hi,

Is it possible to get the pacemaker RPMs for RHEL6 from the Clusterlabs repository (as for RHEL5)?

I know they are available through Redhat's "High Availability" channel. But since we have academic licences, we don't have this channel available.

Bye,
Marco.
Re: [Pacemaker] [RPMs] clusterlabs.org and epel-6 ?
> I don't believe there is any need for additional RPMs. Pacemaker
> should already be in CentOS6

For example, we're running Redhat on an academic licence and don't have a subscription for the HA channel. It would be very welcome to have RPMs from Clusterlabs.

Bye,
Marco.
Re: [Pacemaker] Pacemaker and Apache resource configuration problem
On 06/11/2010 01:09 PM, Julio Gómez Belmonte wrote:

> Hello everyone,
>
> I'm configuring Pacemaker as an active/passive cluster between two
> nodes that need to run tomcat and mysql alternately. When I try to
> configure Apache, I get the following error in the status output:
>
> Apache_start_0 (node=SSCC-01, call=39, rc=-2, status=Timed Out): unknown exec error
>
> The command I used to configure the Apache resource is:
>
> configure primitive Apache ocf:heartbeat:apache params configfile="/etc/apache2/apache2.conf" port="443"
>
> Does anyone have any idea why this may be happening?

Did you activate mod_status and uncomment/set "<Location /server-status> etc..." in your apache config?

Bye,
Marco.

> Thanks in advance and best regards,
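Marco's question can be checked from the shell by looking for an uncommented `<Location /server-status>` line. This is a rough sketch only: the helper name `has_server_status` is invented here, it inspects just the one file it is given (on some layouts the block lives in an included file instead), and it does not prove that mod_status itself is loaded.

```shell
# Return 0 if the file contains an uncommented <Location /server-status> line.
# has_server_status is an illustrative helper, not part of the apache agent.
# Only the given file is checked; Include'd files are not followed.
has_server_status() {
    conf="${1:-/etc/apache2/apache2.conf}"
    grep -Eq '^[[:space:]]*<Location[[:space:]]+"?/server-status"?>' "$conf"
}

# Usage:
#   has_server_status /etc/apache2/apache2.conf || echo "server-status not enabled in this file"
```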
Re: [Pacemaker] Unable to commit from crm.
Hi Dejan,

Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Apr 26, 2010 at 11:53:17AM +0200, Marco van Putten wrote:
>> Hi Dejan,
>>
>> Thanks for your response.
>>
>> Dejan Muhamedagic wrote:
>>> Never saw this. The '--noprofile' is a bash option. Looks like
>>> some strange interaction between python and bash. If you set the
>>> "user" option in crm, sudo is used to run all external programs.
>>> Perhaps that is the culprit.
>>
>> I was running the crm command with another user name than root but
>> with userid 0 and groupid 0.
>
> You mean as effective id 0 (as in su or sudo) or that you have
> another user with the id 0?

I have another user with uid 0 in /etc/passwd.

>> But if I run crm as root it works well. For me this works as an OK
>> workaround but on 3 other clusters (all with an older pacemaker
>> version) I don't have this problem...
>
> Must be some environment issue. You should check your .profile,
> .bashrc, .bash_profile, /etc/profile.d/*, /etc/bash*; there seem to
> be so many in use and I probably forgot a few.

I use the tcsh shell for this userid 0 user. When I switch to bash in /etc/passwd the problem disappears. So it's probably something in my .tcshrc. I'll look into it.

Thank you very much for your help.

Bye,
Marco.

> Thanks,
>
> Dejan
>
>>>> Has this anything to do with the new version of cluster-glue a
>>>> couple of days ago...?
>>>
>>> No, glue shouldn't have anything to do with this.
>>>
>>> Thanks,
>>>
>>> Dejan
>>
>> Thanks,
>> Marco.
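Since the problem came down to a second uid 0 account whose login shell was tcsh, it can help to list every uid 0 account together with its shell. A small sketch; the helper name `uid0_shells` is made up for illustration.

```shell
# Print "name shell" for every account with uid 0 in a passwd-format file.
# uid0_shells is an illustrative helper name.
uid0_shells() {
    awk -F: '$3 == 0 { print $1, $7 }' "${1:-/etc/passwd}"
}

uid0_shells
```

Any account listed with a shell other than bash is a candidate for the `--noprofile` error described in this thread.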
Re: [Pacemaker] Unable to commit from crm.
Hi Dejan,

Thanks for your response.

Dejan Muhamedagic wrote:
> Never saw this. The '--noprofile' is a bash option. Looks like
> some strange interaction between python and bash. If you set the
> "user" option in crm, sudo is used to run all external programs.
> Perhaps that is the culprit.

I was running the crm command with another user name than root but with userid 0 and groupid 0.

But if I run crm as root it works well. For me this works as an OK workaround, but on 3 other clusters (all with an older pacemaker version) I don't have this problem...

>> Has this anything to do with the new version of cluster-glue a
>> couple of days ago...?
>
> No, glue shouldn't have anything to do with this.
>
> Thanks,
>
> Dejan

Thanks,
Marco.
[Pacemaker] Unable to commit from crm.
Hi,

I've just installed a fresh version of pacemaker/heartbeat/ldirectord on 2 RHEL5 servers with the repositories from http://www.clusterlabs.org/rpm/epel-5.

When I start up heartbeat all is fine. No problems so far. But after I do some editing with "crm > configure > edit", I get this error message when I try to commit:

crm(live)configure# commit
Unknown option: `--noprofile'
Usage: -norc [ -bcdefilmnqstvVxX ] [ argument ... ].
ERROR: creating tmp shadow __crmshell.2440 failed

When I edit the file /var/lib/heartbeat/crm/shadow.__crmshell.2440 and run "crm_shadow -C __crmshell.2440 --force", my modifications are eventually committed.

Has anyone else had this problem, and is there something I can do about it? Does this have anything to do with the new version of cluster-glue released a couple of days ago?

These are the versions of heartbeat/pacemaker/ldirectord I'm running:

heartbeat-libs-3.0.3-1.el5
heartbeat-3.0.3-1.el5
pacemaker-libs-1.0.8-5.el5
pacemaker-1.0.8-5.el5
cluster-glue-libs-1.0.4-1.el5
cluster-glue-1.0.4-1.el5
ldirectord-1.0.3-1.el5

Thanks,
Marco.
Re: [Pacemaker] Resources don't start on second node after ping fails
Hi Benjamin,

Congratulations! Do you mean not connected as in physically not connected?

I'm no expert on the matter, but I ran into the "number" problem myself a couple of weeks ago. Maybe this is no longer an issue in a newer version...

Bye,
Marco.

benjamin.b...@t-systems.com wrote:
> Hi everybody!
>
> I fixed this 'problem'... My drbd resource wasn't connected. m(
> The configuration of the ping resource and the location were correct. I
> implemented Marco's advice, but I'm sure my solution would also have
> worked. The failover works just fine right now.
>
> Thanks for reading!
>
> Benjamin Benz
>
> -----Original Message-----
> From: Benz, Benjamin
> Sent: Thu 08.04.2010 14:46
> To: pacemaker@oss.clusterlabs.org
> Subject: [Pacemaker] Resources don't start on second node after ping fails
>
> Hi there!
>
> I've got a problem with the configuration. I'm using Pacemaker 1.0.7
> to move my database from node1 to node2. Everything works fine when I
> migrate the resources manually or pull out the power plug. Since I
> want the database to be available in case of network problems, I tried
> to integrate a ping resource, as you can see below. When I pull out the
> network cable, the resources stop on node1 but don't start on node2.
>
> crm_mon output:
>
> Online: [ bb-node1 bb-node2 ]
> Master/Slave Set: ms_drbd_ora
>     Slaves: [ bb-node2 ]
>     Stopped: [ drbd_ora:1 ]
> Clone Set: connected
>     Started: [ bb-node1 bb-node2 ]
>
> I guess there's something wrong with my configuration of the location,
> but I can't figure it out. It would be great if someone could help me
> out! If you have other helpful hints concerning my config, feel free
> to answer!
>
> Regards
> Benjamin Benz
>
> crm configure show:
>
> node $id="d109b732-1cfc-4cd8-9cce-ba9323a56087" bb-node2
> node $id="f995b3ac-734f-4cc4-aacb-cbec22e48de5" bb-node1
> primitive drbd_ora ocf:linbit:drbd \
>     params drbd_resource="ora" \
>     op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive fs_ora ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/oracle" fstype="ext3" \
>     op monitor interval="5s" timeout="40s" on-fail="restart"
> primitive ip_ora ocf:heartbeat:IPaddr2 \
>     params ip="53.113.178.29" cidr_netmask="255.255.255.0" \
>     op monitor interval="5s" timeout="20s" on-fail="restart"
> primitive oracle_ora ocf:heartbeat:oracle \
>     params home="/oracle" sid="bbcluster" user="oracle" ipcrm="orauser" \
>     op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive oralsnr_ora ocf:heartbeat:oralsnr \
>     params home="/oracle" sid="bbcluster" user="oracle" \
>     op monitor interval="5s" timeout="30s" on-fail="restart"
> primitive ping ocf:pacemaker:ping \
>     params dampen="5s" host_list="53.118.160.121" multiplier="1000" name="pingval" \
>     operations $id="ping-operations" \
>     op monitor interval="10s" timeout="10s"
> group ora_group fs_ora ip_ora oralsnr_ora oracle_ora \
>     meta target-role="Started"
> ms ms_drbd_ora drbd_ora \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
> clone connected ping \
>     meta globally-unique="false" target-role="Started"
> location ms_drbd_ora_on_connected_node ms_drbd_ora \
>     rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0
> colocation ora_group_on_ms_drbd_ora inf: ora_group ms_drbd_ora:Master
> order ms_drbd_ora_before_ora_group inf: ms_drbd_ora:promote ora_group:start
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.7-6e1815972fc236825bf3658d7f8451d33227d420" \
>     cluster-infrastructure="Heartbeat" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1270732011"
Re: [Pacemaker] Resources don't start on second node after ping fails
Hi Benjamin,

> rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval lte 0

You could give this a try instead:

rule $id="ms_drbd_ora_on_connected_node-rule" -inf: not_defined pingval or pingval number:lte 0

Good luck,
Marco.
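Why the "number:" prefix can matter: if an untyped comparison is done on strings (an assumption here, based on my reading of the Pacemaker rule documentation for this era, not something confirmed in the thread), then values are ordered character by character rather than by magnitude. The sketch below only illustrates that general difference in plain shell; `cmp_both` is a made-up helper and does not call Pacemaker.

```shell
# Compare two non-negative integers both as strings and as numbers.
# cmp_both is an illustrative helper; it does not call Pacemaker.
cmp_both() {
    a="$1"; b="$2"
    # String order: whichever value sorts first in the C locale is "lte".
    first=$(printf '%s\n%s\n' "$a" "$b" | LC_ALL=C sort | head -n 1)
    if [ "$first" = "$a" ]; then s="lte"; else s="gt"; fi
    # Numeric order.
    if [ "$a" -le "$b" ]; then n="lte"; else n="gt"; fi
    echo "$a vs $b: string=$s number=$n"
}

cmp_both 1000 200   # prints "1000 vs 200: string=lte number=gt"
cmp_both 1000 0     # prints "1000 vs 0: string=gt number=gt"
```

With a threshold of 0, as in the rule above, both orderings happen to agree for non-negative values, which would be consistent with Benjamin's remark that his original rule would also have worked; with other thresholds the two can diverge.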