Hello everyone. I am trying to run a 2-node cluster with one shared IP for Tomcat. This works fine until I add a monitor operation to the Tomcat resource, at which point the CRM keeps trying to restart Tomcat over and over indefinitely.
Without the monitor operation in the CIB it doesn't keep trying to restart Tomcat, but if I stop Tomcat manually it doesn't get started again automatically. I tried the tomcat OCF RA, but it has lots of incorrect values hard-coded in, so I edited an init script into what I thought was an LSB-compatible form.

This is the init script:

    #!/bin/sh
    # description: Start or stop the Tomcat server
    #
    ### BEGIN INIT INFO
    # Provides: tomcat
    # Required-Start: $network $syslog
    # Required-Stop: $network
    # Default-Start: 3
    # Default-Stop: 0
    # Description: Start or stop the Tomcat server
    ### END INIT INFO

    RETVAL=$?
    NAME=tomcat
    export JRE_HOME=/opt/java
    export CATALINA_HOME=/opt/$NAME
    export CATALINA_BASE=/opt/$NAME
    export JAVA_HOME=/opt/java

    check_running() {
        NAME=$1
        LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc -l `
        [ $LINES -gt 0 ] && echo "yes"
    }

    case "$1" in
    'start')
        RUNNING=`check_running $NAME`
        [ "$RUNNING" ] && exit 0
        if [ -f $CATALINA_HOME/bin/startup.sh ]; then
            echo $"Starting Tomcat"
            $CATALINA_HOME/bin/startup.sh
        fi
        ;;
    'stop')
        RUNNING=`check_running $NAME`
        [ ! "$RUNNING" ] && exit 0
        if [ -f $CATALINA_HOME/bin/shutdown.sh ]; then
            echo $"Stopping Tomcat"
            $CATALINA_HOME/bin/shutdown.sh
        fi
        ;;
    'restart')
        $0 stop
        sleep 15
        $0 start
        ;;
    'status')
        RUNNING=`check_running $NAME`
        [ "$RUNNING" ] && exit 0 || exit 1;;
    *)
        echo
        echo $"Usage: $0 {start|stop}"
        echo
        exit 1;;
    esac
    exit $RETVAL

This is my cib.xml:

    <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125" num_updates="82" cib-last-written="Wed Dec 3 16:45:56 2008" ccm_transition="2" dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb">
      <configuration>
        <crm_config>
          <cluster_property_set id="cib-bootstrap-options">
            <attributes>
              <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
            </attributes>
          </cluster_property_set>
        </crm_config>
        <nodes>
          <node id="7e9a5233-d24c-441f-9f14-03352172f08b" uname="hs-node2" type="normal"/>
          <node id="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb" uname="hs-node1" type="normal"/>
        </nodes>
        <resources>
          <clone id="tomcat">
            <instance_attributes id="5908d3eb-7d48-4c7d-bcca-9020f8eadc87">
              <attributes>
                <nvpair name="clone_max" value="2" id="19a0d76d-9697-4d19-8990-0f098d299a4f"/>
                <nvpair name="clone_node_max" value="1" id="de765b64-ece4-4c19-9659-13e20b60d9bb"/>
              </attributes>
            </instance_attributes>
            <group id="tomcat_group">
              <primitive id="ip_1" class="ocf" type="IPaddr" provider="heartbeat">
                <instance_attributes id="e79760a4-c715-477a-a4b7-85eab9bf9ae9">
                  <attributes>
                    <nvpair name="ip" value="2.21.2.5" id="07540941-f4f8-4bd0-ac78-7d62f212145a"/>
                  </attributes>
                </instance_attributes>
              </primitive>
              <primitive id="tomcat_1" class="lsb" type="tomcat" provider="heartbeat">
                <operations>
                  <op id="monitor_tomcat" interval="120s" name="monitor" timeout="60s"/>
                </operations>
              </primitive>
            </group>
          </clone>
        </resources>
        <constraints/>
      </configuration>

This is the ha.cf:

    udpport 694
    autojoin none
    crm true
    ucast eth0 2.21.2.4
    ucast eth0 2.21.2.3
    node hs-node1
    node hs-node2
    respawn root /sbin/evmsd
    apiauth evms uid=hacluster,root

This is what crm_mon says:

    ============
    Last updated: Wed Dec 3 17:26:47 2008
    Current DC: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb)
    2 Nodes configured.
    1 Resources configured.
    ============

    Node: hs-node2 (7e9a5233-d24c-441f-9f14-03352172f08b): online
    Node: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb): online

    Clone Set: tomcat
        Resource Group: tomcat_group:0
            ip_1:0 (ocf::heartbeat:IPaddr): Started hs-node2
            tomcat_1:0 (lsb:tomcat): Started hs-node2 FAILED
        Resource Group: tomcat_group:1
            ip_1:1 (ocf::heartbeat:IPaddr): Started hs-node1
            tomcat_1:1 (lsb:tomcat): Stopped

    Failed actions:
        tomcat_1:0_monitor_120000 (node=hs-node2, call=809, rc=7): complete

It was working but suddenly stopped, and I have no idea why. If anyone could provide any pointers, that would be great.

I'm using:
SLES 10 SP2
Heartbeat 2.1.3

Thanks
Darren Mansell

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
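
For comparison with the 'status' branch of the init script above, here is a minimal sketch of a status check that returns the exit codes the LSB spec defines for the status action: 0 when the service is running, 1 when it is dead but its pid file still exists, and 3 when it is not running. The script above exits 1 for a stopped Tomcat. This is only a sketch under assumptions: it assumes Tomcat is started with CATALINA_PID set so that catalina.sh writes a pid file, the function name status_from_pidfile is hypothetical, and whether any of this bears on the monitor failure shown above has not been verified.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): map the state of a
# pid file to LSB status exit codes.
#   0 = process is running
#   1 = process is dead but the pid file exists
#   3 = process is not running (no pid file)
# Assumes the caller may signal the process (init scripts normally run as
# root); kill -0 only tests for existence and sends no real signal.
status_from_pidfile() {
    pidfile=$1

    # No pid file at all: treat the service as not running.
    [ -f "$pidfile" ] || return 3

    pid=$(cat "$pidfile" 2>/dev/null)

    # Pid file present and the process answers: running.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi

    # Pid file present but empty or stale: dead with pid file left behind.
    return 1
}
```

In an init script, the 'status' branch could then call status_from_pidfile "$CATALINA_PID" and exit with its return value, where CATALINA_PID would point at a pid file path chosen for the installation.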