On 12/16/2016 07:46 AM, avinash shankar wrote:
> Hello team,
>
> I am a newbie to pacemaker and corosync clustering, and I am facing
> trouble with a fence agent on RHEL 6.5. I have installed pcs,
> pacemaker, corosync and cman on a two-node virtual (libvirt) cluster
> on RHEL 6.5. SELinux and the firewall are completely disabled.
>
> # yum list installed | egrep 'pacemaker|corosync|cman|fence'
> cman.x86_64                    3.0.12.1-78.el6    @rhel-ha-for-rhel-6-server-rpms
> corosync.x86_64                1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
> corosynclib.x86_64             1.4.7-5.el6        @rhel-ha-for-rhel-6-server-rpms
> fence-agents.x86_64            4.0.15-12.el6      @rhel-6-server-rpms
> fence-virt.x86_64              0.2.3-19.el6       @rhel-ha-for-rhel-6-server-eus-rpms
> pacemaker.x86_64               1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-cli.x86_64           1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-cluster-libs.x86_64  1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
> pacemaker-libs.x86_64          1.1.14-8.el6_8.2   @rhel-ha-for-rhel-6-server-rpms
>
> I bring the cluster up using "pcs cluster start --all", and I have
> also run "pcs property set stonith-enabled=false".
fence_pcmk simply tells CMAN to use pacemaker's fencing ... it can't
work if pacemaker's fencing is disabled. (There is a sketch of setting
up real fencing at the end of this message.)

> Below is the status:
> ---------------------------
> # pcs status
> Cluster name: roamclus
> Last updated: Fri Dec 16 18:54:40 2016    Last change: Fri Dec 16 17:44:50 2016 by root via cibadmin on cnode1
> Stack: cman
> Current DC: NONE
> 2 nodes and 2 resources configured
>
> Online: [ cnode1 ]
> OFFLINE: [ cnode2 ]
>
> Full list of resources:
>
> PCSD Status:
>   cnode1: Online
>   cnode2: Online
> ---------------------------
>
> The same kind of output is observed on the other node (cnode2), so
> the nodes see each other as OFFLINE. The expected result is:
>
> Online: [ cnode1 cnode2 ]
>
> I did the same package installation on RHEL 6.8, and there the
> cluster shows both nodes ONLINE from each other as soon as it starts.
> I need the RHEL 6.5 nodes to behave the same way: when the cluster
> starts, both nodes should show each other as online.
>
> Below is /etc/cluster/cluster.conf:
> ----------------------------------------------
> <cluster config_version="9" name="roamclus">
>   <fence_daemon/>
>   <clusternodes>
>     <clusternode name="cnode1" nodeid="1" votes="1">
>       <fence>
>         <method name="pcmk-method">
>           <device name="pcmk-redirect" port="cnode1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="cnode2" nodeid="2" votes="1">
>       <fence>
>         <method name="pcmk-method">
>           <device name="pcmk-redirect" port="cnode2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman broadcast="no" expected_votes="1" transport="udp" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
> ----------------------------------------------
>
> # cat /var/lib/pacemaker/cib/cib.xml
> <cib crm_feature_set="3.0.10" validate-with="pacemaker-2.4" epoch="15"
>      num_updates="0" admin_epoch="0" cib-last-written="Fri Dec 16 18:57:10 2016"
>      update-origin="cnode1" update-client="cibadmin" update-user="root"
>      have-quorum="1" dc-uuid="cnode1">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.14-8.el6_8.2-70404b0"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="cman"/>
>         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="cnode1" uname="cnode1"/>
>       <node id="cnode2" uname="cnode2"/>
>     </nodes>
>     <resources/>
>     <constraints/>
>   </configuration>
> </cib>
> ------------------------------------------------
>
> /var/log/messages has the following contents:
>
> Dec 15 20:29:43 cnode2 kernel: DLM (built Oct 26 2016 10:26:08) installed
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Corosync built-in features: nss dbus rdma snmp
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Successfully parsed cman config
> Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] Initializing transport (UDP/IP Multicast).
> Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] The network interface [10.10.18.138] is now up.
> Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Using quorum provider quorum_cman
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
> Dec 15 20:29:46 cnode2 corosync[2464]: [CMAN  ] CMAN 3.0.12.1 (built Feb  1 2016 07:06:19) started
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: openais checkpoint service B.01.01
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync extended virtual synchrony service
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync configuration service
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync cluster config database access v1.01
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync profile loading service
> Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Using quorum provider quorum_cman
> Dec 15 20:29:46 cnode2 corosync[2464]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> Dec 15 20:29:46 cnode2 corosync[2464]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Dec 15 20:29:46 cnode2 corosync[2464]: [CMAN  ] quorum regained, resuming activity
> Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] This node is within the primary component and will provide service.
> Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Members[1]: 2
> Dec 15 20:29:46 cnode2 corosync[2464]: [QUORUM] Members[1]: 2
> Dec 15 20:29:46 cnode2 corosync[2464]: [CPG   ] chosen downlist: sender r(0) ip(10.10.18.138) ; members(old:0 left:0)
> Dec 15 20:29:46 cnode2 corosync[2464]: [MAIN  ] Completed service synchronization, ready to provide service.
> Dec 15 20:29:50 cnode2 fenced[2529]: fenced 3.0.12.1 started
> Dec 15 20:29:50 cnode2 dlm_controld[2543]: dlm_controld 3.0.12.1 started
> Dec 15 20:29:51 cnode2 gfs_controld[2606]: gfs_controld 3.0.12.1 started
> Dec 15 20:30:36 cnode2 pacemaker: Starting Pacemaker Cluster Manager
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Additional logging available in /var/log/pacemaker.log
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Switching to /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Starting Pacemaker 1.1.14-8.el6_8.2 (Build: 70404b0): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman acls
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: Membership 4: quorum acquired
> Dec 15 20:30:36 cnode2 pacemakerd[2767]: notice: cman_event_callback: Node cnode2[2] - state is now member (was (null))
> Dec 15 20:30:36 cnode2 cib[2773]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 cib[2773]: notice: Using new config location: /var/lib/pacemaker/cib
> Dec 15 20:30:36 cnode2 cib[2773]: warning: Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
> Dec 15 20:30:36 cnode2 cib[2773]: warning: Primary configuration corrupt or unusable, trying backups in /var/lib/pacemaker/cib
> Dec 15 20:30:36 cnode2 cib[2773]: warning: Continuing with an empty configuration.

The above is the problem. Your configuration may have a syntax error,
or it may be written for a different version of pacemaker. Try running
"pcs cluster verify -V" to see what the issue is (there is an example
at the end of this message). Also, feel free to open a support case
with Red Hat.

> Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: Connecting to cluster infrastructure: cman
> Dec 15 20:30:36 cnode2 attrd[2776]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 attrd[2776]: notice: Connecting to cluster infrastructure: cman
> Dec 15 20:30:36 cnode2 stonith-ng[2774]: notice: crm_update_peer_proc: Node cnode2[2] - state is now member (was (null))
> Dec 15 20:30:36 cnode2 pengine[2777]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 lrmd[2775]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 attrd[2776]: notice: crm_update_peer_proc: Node cnode2[2] - state is now member (was (null))
> Dec 15 20:30:36 cnode2 crmd[2778]: notice: Additional logging available in /var/log/cluster/corosync.log
> Dec 15 20:30:36 cnode2 crmd[2778]: notice: CRM Git Version: 1.1.14-8.el6_8.2 (70404b0)
> Dec 15 20:30:36 cnode2 cib[2773]: notice: Connecting to cluster infrastructure: cman
> Dec 15 20:30:36 cnode2 attrd[2776]: notice: Starting mainloop...
> Dec 15 20:30:36 cnode2 cib[2773]: notice: crm_update_peer_proc: Node cnode2[2] - state is now member (was (null))
> Dec 15 20:30:36 cnode2 cib[2782]: warning: Could not verify cluster configuration file /var/lib/pacemaker/cib/cib.xml: No such file or directory (2)
> Dec 15 20:30:37 cnode2 stonith-ng[2774]: notice: Watching for stonith topology changes
> Dec 15 20:30:37 cnode2 crmd[2778]: notice: Connecting to cluster infrastructure: cman
> Dec 15 20:30:37 cnode2 crmd[2778]: notice: Membership 4: quorum acquired
> Dec 15 20:30:37 cnode2 crmd[2778]: notice: cman_event_callback: Node cnode2[2] - state is now member (was (null))
> Dec 15 20:30:37 cnode2 crmd[2778]: notice: The local CRM is operational
> Dec 15 20:30:37 cnode2 crmd[2778]: notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Dec 15 20:30:42 cnode2 fenced[2529]: fencing node cnode1
> Dec 15 20:30:42 cnode2 fence_pcmk[2805]: Requesting Pacemaker fence cnode1 (reset)
> Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Client stonith_admin.cman.2806.6d791bd8 wants to fence (reboot) 'cnode1' with device '(any)'
> Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Initiating remote operation reboot for cnode1: c398b8b7-6ba1-4068-a174-547bac72476d (0)
> Dec 15 20:30:42 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to fence (reboot) cnode1 with any device
> Dec 15 20:30:42 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2806@cnode2.c398b8b7: No such device
> Dec 15 20:30:42 cnode2 crmd[2778]: notice: Peer cnode1 was not terminated (reboot) by <anyone> for cnode2: No such device (ref=c398b8b7-6ba1-4068-a174-547bac72476d) by client stonith_admin.cman.2806
> Dec 15 20:30:42 cnode2 fence_pcmk[2805]: Call to fence cnode1 (reset) failed with rc=237
> Dec 15 20:30:42 cnode2 fenced[2529]: fence cnode1 dev 0.0 agent fence_pcmk result: error from agent
> Dec 15 20:30:42 cnode2 fenced[2529]: fence cnode1 failed
> Dec 15 20:30:45 cnode2 fenced[2529]: fencing node cnode1
> Dec 15 20:30:45 cnode2 fence_pcmk[2825]: Requesting Pacemaker fence cnode1 (reset)
> Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Client stonith_admin.cman.2826.f2c208fe wants to fence (reboot) 'cnode1' with device '(any)'
> Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Initiating remote operation reboot for cnode1: b5df8517-d8a7-4f33-8cd2-d41c512d13ae (0)
> Dec 15 20:30:45 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to fence (reboot) cnode1 with any device
> Dec 15 20:30:45 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2826@cnode2.b5df8517: No such device
> Dec 15 20:30:48 cnode2 crmd[2778]: notice: Peer cnode1 was not terminated (reboot) by <anyone> for cnode2: No such device (ref=aff3eb58-4777-4fca-9802-eb084dc56ad4) by client stonith_admin.cman.2846
> Dec 15 20:30:48 cnode2 fence_pcmk[2845]: Call to fence cnode1 (reset) failed with rc=237
> Dec 15 20:30:48 cnode2 fenced[2529]: fence cnode1 dev 0.0 agent fence_pcmk result: error from agent
> Dec 15 20:30:48 cnode2 fenced[2529]: fence cnode1 failed
> Dec 15 20:30:51 cnode2 fence_pcmk[2869]: Requesting Pacemaker fence cnode1 (reset)
> Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Client stonith_admin.cman.2870.1c9e3d98 wants to fence (reboot) 'cnode1' with device '(any)'
> Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Initiating remote operation reboot for cnode1: b2435128-3702-44a0-a42e-52b642278686 (0)
> Dec 15 20:30:51 cnode2 stonith-ng[2774]: notice: Couldn't find anyone to fence (reboot) cnode1 with any device
> Dec 15 20:30:51 cnode2 stonith-ng[2774]: error: Operation reboot of cnode1 by <no-one> for stonith_admin.cman.2870@cnode2.b2435128: No such device
>
> ================================================================
>
> Please help to solve this problem.
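
To find out what is actually wrong with the configuration, the
verification mentioned above can be run either through pcs or with
crm_verify (part of pacemaker-cli) directly against the file, on a
node where the file exists:

# pcs cluster verify -V
# crm_verify -V -x /var/lib/pacemaker/cib/cib.xml

The verbose output will point out any syntax or schema problem in the
CIB; a node that then rejoins the cluster cleanly should get a fresh
copy synced from the DC.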
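
As for fencing: since the nodes are libvirt guests and fence-virt is
already installed, here is a minimal sketch of what enabling real
fencing could look like. It assumes fence_virtd is configured and
running on the hypervisor and that the shared key has been copied to
/etc/cluster/fence_xvm.key on both guests; the resource name
"virtfence" and the host map are examples only, so adjust them to your
actual libvirt domain names:

# pcs property set stonith-enabled=true
# fence_xvm -o list
# pcs stonith create virtfence fence_xvm \
      pcmk_host_map="cnode1:cnode1;cnode2:cnode2"

The first command re-enables pacemaker's fencing (which fence_pcmk
depends on), the second checks that the guest can actually reach
fence_virtd on the host, and the third registers a stonith device that
pacemaker can use when CMAN redirects a fence request to it.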