Hi, thanks for your reply. This is a Cisco UCS machine. Yesterday the Cisco guys created a separate vSwitch for this heartbeat.
Regards,
Ben

On Tue, Sep 18, 2012 at 6:25 AM, Digimer <li...@alteeve.ca> wrote:
> You have two problems:
>
> 1. The nodes can't talk to each other (via multicast) *or* you are taking
> too long to start each node. Given that you are using luci, I am guessing
> the former. Log into your switch and see if the multicast group shown in
> 'cman_tool status' exists.
>
> 2. Your fencing isn't working. Read the man page for fence_cisco_ucs to
> try and debug it.
>
> digimer
>
> PS - Please don't reply directly to me. Keep the conversation public.
> PPS - Filter out your passwords. ;)
>
>
> On 09/17/2012 11:17 PM, Ben .T.George wrote:
>
>> Hi, thanks for your reply.
>>
>> Below is my cluster.conf file:
>>
>> <?xml version="1.0"?>
>> <cluster config_version="7" name="eccprd">
>>   <clusternodes>
>>     <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
>>       <fence>
>>         <method name="ucs-node1"/>
>>       </fence>
>>     </clusternode>
>>     <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
>>       <fence>
>>         <method name="ucs-node2"/>
>>       </fence>
>>     </clusternode>
>>   </clusternodes>
>>   <cman expected_votes="1" two_node="1"/>
>>   <rm>
>>     <resources>
>>       <ip address="172.22.10.230" sleeptime="10"/>
>>     </resources>
>>     <service exclusive="1" name="eccsapmnt" recovery="relocate">
>>       <ip ref="172.22.10.230"/>
>>     </service>
>>   </rm>
>>   <fencedevices>
>>     <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
>>     <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
>>   </fencedevices>
>> </cluster>
>>
>> When I try to start the cluster on node1, I get this in /var/log/messages:
>>
>> tail -f -n 0 /var/log/messages
>> Sep 18 06:06:02 cgceccprd1 modcluster: Starting service: eccsapmnt on node
>> Sep 18 06:06:08 cgceccprd1 modcluster: Starting service: eccsapmnt on node cgceccprd1.combinedgroup.net
>>
>> But the service is not starting. luci shows both nodes as online, but
>> clustat shows something different.
>>
>> The main error I am getting in /var/log/messages is:
>>
>> Sep 18 03:35:48 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:06:16 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 04:36:45 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:07:14 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>> Sep 18 05:37:42 cgceccprd1 fenced[8424]: fencing node cgceccprd2.combinedgroup.net still retrying
>>
>> These messages are from node1. I am getting the same message on node2:
>>
>> cgceccprd2 fenced[8424]: fencing node cgceccprd1.combinedgroup.net still retrying
>>
>> I don't know what the problem is here.
>>
>> Please help me solve it.
>> Regards,
>> Ben
>>
>> On Tue, Sep 18, 2012 at 4:42 AM, Digimer <li...@alteeve.ca> wrote:
>>
>>> On 09/17/2012 06:07 PM, Ben .T.George wrote:
>>>
>>> Hi
>>>
>>> My cluster is failing to start.
>>>
>>> If I check clustat on node1, it shows node1 online and node2
>>> offline. If I check clustat on node2, it shows node2 online and
>>> node1 offline.
>>>
>>> I checked the logs, and fenced is throwing errors. How can I rectify this?
>>>
>>> Sep 17 23:24:54 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>> Sep 17 23:55:06 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>> Sep 18 00:25:19 fenced fencing node cgceccprd1.combinedgroup.net still retrying
>>> Sep 18 00:55:03 fenced fenced 3.0.12.1 started
>>> Sep 18 00:55:03 fenced failed to get dbus connection
>>> Sep 18 00:55:55 fenced fencing node cgceccprd1.combinedgroup.net
>>> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>> Sep 18 00:55:55 fenced fence cgceccprd1.combinedgroup.net failed
>>> Sep 18 00:55:58 fenced fencing node cgceccprd1.combinedgroup.net
>>> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>> Sep 18 00:55:58 fenced fence cgceccprd1.combinedgroup.net failed
>>> Sep 18 00:56:01 fenced fencing node cgceccprd1.combinedgroup.net
>>> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net dev 0.0 agent none result: error no method
>>> Sep 18 00:56:01 fenced fence cgceccprd1.combinedgroup.net failed
>>>
>>> Please help me solve this issue.
>>>
>>> Regards,
>>> Ben
>>
>>
>> What is your cluster.conf?
>>
>> Likely you either have no fencing configured, or your fencing is not
>> working. Either way, failing to fence is a critical problem and the
>> cluster will hang, just as you're seeing here. This is by design.
>> Better to hang a cluster than to corrupt it.
>>
>> digimer
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca

--
Yours Sincerely

#!/usr/bin/env python
#Mysignature.py :)
Signature = """ Ben.T.George \n Linux System Administrator \n Diyar United Company \n Kuwait \n Phone : +965 - 50629829 \n """
print Signature
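For anyone hitting the same thing: in the cluster.conf quoted above, each `<method>` element is empty and names no `<device>`, which fits the `agent none result: error no method` lines in the fenced log. Below is a sketch of the same file with each method pointing at its fence device. It assumes fence_cisco_ucs takes the blade's UCS service profile name through the `port` attribute; `SP-NODE1` and `SP-NODE2` are hypothetical placeholders, and depending on the UCS Manager setup the agent's `ssl` option may also be needed, so check fence_cisco_ucs(8) before relying on it. `config_version` is bumped from 7 to 8 because cman only accepts a changed config with a higher version.

```xml
<?xml version="1.0"?>
<cluster config_version="8" name="eccprd">
  <clusternodes>
    <clusternode name="cgceccprd1.combinedgroup.net" nodeid="1">
      <fence>
        <method name="ucs-node1">
          <!-- Points the method at the fencedevice named below.
               "port" = UCS service profile name (assumption; value is a hypothetical placeholder) -->
          <device name="ucs-node1" port="SP-NODE1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="cgceccprd2.combinedgroup.net" nodeid="2">
      <fence>
        <method name="ucs-node2">
          <!-- Same assumption as above; SP-NODE2 is a hypothetical placeholder -->
          <device name="ucs-node2" port="SP-NODE2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <rm>
    <resources>
      <ip address="172.22.10.230" sleeptime="10"/>
    </resources>
    <service exclusive="1" name="eccsapmnt" recovery="relocate">
      <ip ref="172.22.10.230"/>
    </service>
  </rm>
  <fencedevices>
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.61" login="admin" name="ucs-node1" passwd="..."/>
    <fencedevice agent="fence_cisco_ucs" ipaddr="172.22.90.59" login="admin" name="ucs-node2" passwd="..."/>
  </fencedevices>
</cluster>
```

Only the `<method>` bodies and `config_version` differ from the posted file; the `<rm>` section is carried over unchanged so the service definition is not lost.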
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
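On Digimer's first point and the new heartbeat vSwitch: cman uses the interface whose address the `<clusternode name="...">` value resolves to, so both node names need to resolve (through DNS or /etc/hosts) to addresses on the heartbeat network. If the switch has to be told which multicast group to forward, the group can be pinned in cluster.conf instead of letting cman generate one. A sketch, assuming the RHEL 6-era cman schema; `239.192.100.1` is a hypothetical address, and `cman_tool status` shows the group actually in use. On RHEL 6.2 and later, `transport="udpu"` on the `<cman>` element switches to UDP unicast if multicast cannot be made to work.

```xml
<!-- Replaces the <cman .../> line in cluster.conf; bump config_version as well -->
<cman expected_votes="1" two_node="1">
  <!-- Hypothetical group address: use one the heartbeat vSwitch is set up to forward -->
  <multicast addr="239.192.100.1"/>
</cman>
```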