Hi,

On Wed, Dec 03, 2008 at 05:09:03PM +0000, Darren Mansell wrote:
> Hello everyone.
>
> I am trying to run a 2 node cluster with 1 shared IP for Tomcat. This
> works fine until I set the monitor operation inside the Tomcat resource
> where the CRM keeps trying to restart Tomcat over and over infinitely.
>
> Without the monitor operation in the CIB it won't keep trying to restart
> Tomcat but if I stop it manually it doesn't automatically get started
> again.
>
> I tried the tomcat OCF RA but there are lots of incorrect values hard
> coded in so I edited up an init script to what I thought was LSB
> compatible.

It should be fixed then. Can you provide a list of stuff which is wrong
on your platform/distribution?

Thanks,

Dejan

> This is the init script:
>
> #!/bin/sh
> # description: Start or stop the Tomcat server
> #
> ### BEGIN INIT INFO
> # Provides: tomcat
> # Required-Start: $network $syslog
> # Required-Stop: $network
> # Default-Start: 3
> # Default-Stop: 0
> # Description: Start or stop the Tomcat server
> ### END INIT INFO
>
> RETVAL=$?
> NAME=tomcat
> export JRE_HOME=/opt/java
> export CATALINA_HOME=/opt/$NAME
> export CATALINA_BASE=/opt/$NAME
> export JAVA_HOME=/opt/java
>
> check_running() {
>     NAME=$1
>     LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc -l`
>     [ $LINES -gt 0 ] && echo "yes"
> }
>
> case "$1" in
> 'start')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/startup.sh ];
>     then
>         echo $"Starting Tomcat"
>         $CATALINA_HOME/bin/startup.sh
>     fi
>     ;;
> 'stop')
>     RUNNING=`check_running $NAME`
>     [ ! "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/shutdown.sh ];
>     then
>         echo $"Stopping Tomcat"
>         $CATALINA_HOME/bin/shutdown.sh
>     fi
>     ;;
> 'restart')
>     $0 stop
>     sleep 15
>     $0 start
>     ;;
> 'status')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0 || exit 1;;
> *)
>     echo
>     echo $"Usage: $0 {start|stop}"
>     echo
>     exit 1;;
>
> esac
> exit $RETVAL
>
> This is my cib.xml
>
> <cib generated="true" admin_epoch="0" have_quorum="true" ignore_dtd="false"
>      num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125"
>      num_updates="82" cib-last-written="Wed Dec 3 16:45:56 2008"
>      ccm_transition="2" dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>                   value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="7e9a5233-d24c-441f-9f14-03352172f08b" uname="hs-node2"
>             type="normal"/>
>       <node id="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb" uname="hs-node1"
>             type="normal"/>
>     </nodes>
>     <resources>
>       <clone id="tomcat">
>         <instance_attributes id="5908d3eb-7d48-4c7d-bcca-9020f8eadc87">
>           <attributes>
>             <nvpair name="clone_max" value="2"
>                     id="19a0d76d-9697-4d19-8990-0f098d299a4f"/>
>             <nvpair name="clone_node_max" value="1"
>                     id="de765b64-ece4-4c19-9659-13e20b60d9bb"/>
>           </attributes>
>         </instance_attributes>
>         <group id="tomcat_group">
>           <primitive id="ip_1" class="ocf" type="IPaddr"
>                      provider="heartbeat">
>             <instance_attributes id="e79760a4-c715-477a-a4b7-85eab9bf9ae9">
>               <attributes>
>                 <nvpair name="ip" value="2.21.2.5"
>                         id="07540941-f4f8-4bd0-ac78-7d62f212145a"/>
>               </attributes>
>             </instance_attributes>
>           </primitive>
>           <primitive id="tomcat_1" class="lsb" type="tomcat"
>                      provider="heartbeat">
>             <operations>
>               <op id="monitor_tomcat" interval="120s" name="monitor"
>                   timeout="60s"/>
>             </operations>
>           </primitive>
>         </group>
>       </clone>
>     </resources>
>     <constraints/>
>   </configuration>
>
> This is the ha.cf:
>
> udpport 694
> autojoin none
> crm true
> ucast eth0 2.21.2.4
> ucast eth0 2.21.2.3
> node hs-node1
> node hs-node2
> respawn root /sbin/evmsd
> apiauth evms uid=hacluster,root
>
> This is what crm_mon says:
>
> ============
> Last updated: Wed Dec 3 17:26:47 2008
> Current DC: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb)
> 2 Nodes configured.
> 1 Resources configured.
> ============
>
> Node: hs-node2 (7e9a5233-d24c-441f-9f14-03352172f08b): online
> Node: hs-node1 (ae4489bf-2c5d-4cfd-bf81-5e25b11932eb): online
>
> Clone Set: tomcat
>     Resource Group: tomcat_group:0
>         ip_1:0 (ocf::heartbeat:IPaddr): Started hs-node2
>         tomcat_1:0 (lsb:tomcat): Started hs-node2 FAILED
>     Resource Group: tomcat_group:1
>         ip_1:1 (ocf::heartbeat:IPaddr): Started hs-node1
>         tomcat_1:1 (lsb:tomcat): Stopped
>
> Failed actions:
>     tomcat_1:0_monitor_120000 (node=hs-node2, call=809, rc=7): complete
>
> It was working but suddenly stopped and I have no idea why. If anyone could
> provide any pointers that would be great. I'm using:
>
> SLES 10 SP2
> Heartbeat 2.1.3
>
> Thanks
>
> Darren Mansell
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
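
For reference on the "LSB compatible" point: the LSB specification defines
the exit code of the status action as 0 when the service is running and 3
when it is not running, whereas the quoted script exits 1 in the second
case. Below is a minimal sketch of a status helper that follows those two
codes. The function name lsb_status_code is hypothetical, and the sketch
assumes the caller already has a count of matching processes (for example
from the ps pipeline in the quoted script). It only illustrates the exit
code convention and is not offered as a diagnosis of the restart loop.

```shell
#!/bin/sh
# Sketch only: lsb_status_code is a hypothetical helper, not part of
# Heartbeat or of the init script quoted in this thread.

# Translate a count of matching processes into an LSB "status" exit code.
#   0 = program is running
#   3 = program is not running
lsb_status_code() {
    count=${1:-0}
    if [ "$count" -gt 0 ]; then
        echo 0
    else
        echo 3
    fi
}

# Possible use inside an init script's status branch, where LINES holds
# the process count the script already computes (assumption):
#   'status')
#       exit "$(lsb_status_code "$LINES")";;
```

Keeping the mapping in one small function makes it easy to check by hand
(for example, lsb_status_code 0 prints 3) before wiring it into the case
statement.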