Hi all,
I have a scenario with two front ends that receive HTTP traffic and also
need to run some scripts from crontab. Some of the scripts should run on
only one server at a time (the active one), and others should run on both.
Regarding the HTTP side: since the traffic is load-shared, I think I
can't use Heartbeat for it, right? Is Heartbeat only for active-standby,
or can it also be used as a watchdog in an active-active setup? I have a
Cisco CSS load-sharing the HTTP traffic, and I can write a watchdog
script for Apache.
Regarding cron control, I was thinking of writing a script that replaces
the crontab file with whichever one is correct for that node. When
Heartbeat starts a resource, what parameter does it pass to the script?
"start" on the active node and nothing on the standby? Always "start"?
How should I configure haresources to do this? What is the best way?
I have another situation: building an NFS server on Solaris 10. I have
two servers with shared disks (a Sun array). Can I use Heartbeat for
this too? Is it possible to set it up so that if the NFS server fails
over, the clients don't need to reconnect?
It's a long post... sorry!
Thanks!!
--
I'll have a go at this one. :)
I've got some clusters implementing both of the features you're asking
about. In fact, I've got quite a few heavy, mission-critical clusters
with web services doing load-balancing using Pacemaker and LVS.
To load-balance using Pacemaker, although I believe there are other
ways, I've always used a combination of cloned resources on the cluster,
ldirectord, and a virtual IP address. The virtual IP and ldirectord are
standard primitive resources grouped together so they run together on
one node, like so:
Resource Group: Load-Balancing
    VIP        (ocf::heartbeat:IPaddr2):    Started NODE-01
    ldirectord (ocf::heartbeat:ldirectord): Started NODE-01
A configuration for this would be something like:
primitive VIP ocf:heartbeat:IPaddr2 \
    params lvs_support="true" ip="192.168.1.100" cidr_netmask="24" broadcast="192.168.1.255" \
    op monitor interval="1m" timeout="10s" \
    meta migration-threshold="10"
primitive ldirectord ocf:heartbeat:ldirectord \
    params configfile="/etc/ha.d/ldirectord.cf" \
    op monitor interval="2m" timeout="20s" \
    meta migration-threshold="10" target-role="Started"
group Load-Balancing VIP ldirectord
location Prefer-Node1 ldirectord \
    rule $id="prefer-node1-rule" 100: #uname eq NODE-01
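If you're using the crm shell, you can load a snippet like that straight
into the CIB (the filename here is just an example):

```shell
# Save the snippet above to a file, then merge it into the live
# configuration with the crm shell.
crm configure load update lb.crm

# Double-check what ended up in the CIB:
crm configure show
```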
And then just put your load-balancing rules in /etc/ha.d/ldirectord.cf:
checktimeout=5
checkinterval=7
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=no
emailalert=yourem...@you.com

virtual=192.168.1.100:80
        fallback=192.168.1.250:80
        real=192.168.1.10:80 gate 100
        real=192.168.1.20:80 gate 100
        service=http
        scheduler=wlc
        protocol=tcp
        checktype=negotiate
        request="/"
        receive="OK"
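Once ldirectord is up, you can check that it has populated the kernel's
LVS table (the addresses match the example config above):

```shell
# List the current virtual services and their real servers.
# -L lists the table, -n skips DNS lookups.
ipvsadm -L -n

# Watch per-real-server traffic counters:
ipvsadm -L -n --stats
```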
Pacemaker with ldirectord/LVS can make a fantastic load-balancer with HA
built using only two nodes. I'm surprised more people don't use it this
way, as while it makes the config slightly more complicated, you can use
your passive node to run the cloned resource and maximise your
performance.
Note that to do this you'll need to put the VIP as an extra IP on the
loopback interface of the real servers (I've still got to file a bug
about this), and set the ARP parameters in sysctl (look on the Pacemaker
and LVS wikis). You can also configure LVS to sync its connection table
so you won't lose any connections on failover.
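For reference, the usual LVS-DR real-server setup looks roughly like
this; the VIP matches the example above, and the interface name (eth0)
is an assumption, so adjust for your boxes:

```shell
# On each real server: add the VIP to loopback so the box accepts
# traffic for it without claiming it on the LAN.
ip addr add 192.168.1.100/32 dev lo

# Stop the real servers from answering ARP for the VIP.
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
sysctl -w net.ipv4.conf.eth0.arp_ignore=1
sysctl -w net.ipv4.conf.eth0.arp_announce=2

# On the directors: sync the LVS connection table so established
# connections survive a failover. Run "master" on the active node
# and "backup" on the passive one.
ipvsadm --start-daemon master --mcast-interface eth0
ipvsadm --start-daemon backup --mcast-interface eth0
```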
On to your second point. I've found that just writing a loop inside a
shell script with the appropriate controls and adding it to the cluster
as an LSB-style resource works fine. You can then add the resource to
the load-balancing group so it only runs on one node. It's a little bit
like rewriting cron, but cron isn't cluster-aware ;) My script looks a
bit like this:
#!/bin/sh
# description: Start or stop the task script
#
### BEGIN INIT INFO
# Provides: task
# Required-Start: $network $syslog
# Required-Stop: $network
# Default-Start: 3
# Default-Stop: 0
# Description: Start or stop the task script
### END INIT INFO
# Static variables here.
DIR=/opt/task/
BOTHER="y...@yourmail.com"
RUNFILE="/var/run/task"
RUNTIME="003000"
MAINLOOP() {
    LOG="/var/log/task.log"
    while true
    do
        TODAY=`date +%d-%m-%Y`
        EXTRACTFILE="${DIR}OP/$TODAY.zip"
        echo `date` >> $LOG
        # Check for permission to run.
        if [ ! -f "$RUNFILE" ]
        then
            echo "No $RUNFILE found" >> $LOG
            exit 0
        fi
        # Check if we've already run today
        if [ ! -f "$EXTRACTFILE" ]
        then
            echo "No $EXTRACTFILE found, continuing" >> $LOG
            # Or if we're still running
            NUMPROCS=`pgrep -f "Dname=task " | wc -l`
            if [ $NUMPROCS -lt 1 ]
            then
                THETIME=`date +%H%M%S`
                if [ $THETIME -gt $RUNTIME ]
                then
                    echo -e "\nApparently $THETIME is greater than $RUNTIME so it's time to do our thang" >> $LOG
                    echo -e "------------------------" >> $LOG
                    echo -e "\n*** Starting extract process ***\nThe time : $THETIME" >> $LOG
                    echo -e "\nNumber of existing task processes : $NUMPROCS" >> $LOG
                    RUN_TASK
                fi
            fi
        fi
        sleep 10m
    done
}
RUN_TASK() {
    # Do_stuff to create EXTRACTFILE...
    CHECKRET
    # Sync everything to the other node so a failover won't make any difference
    rsync -avz $DIR NODE-02:$DIR
    # Let's clean up
    kill $$
}
CHECKRET() {
    RET=$?
    if [ $RET -ne 0 ]
    then
        tail -n 200 $LOG | mail -s "Extract Failed" $BOTHER
    else
        tail -n 200 $LOG | mail -s "Extract Succeeded" $BOTHER
    fi
}
CHECKSTATUS() {
    if [ -f "$RUNFILE" ]
    then
        if [ `pgrep -f "task start" | wc -l` -gt 0 ]
        then
            RUNNING="yes"
        else
            unset RUNNING
        fi
    fi
}
case "$1" in
'start')
    CHECKSTATUS
    [ "$RUNNING" ] && echo "$0 is already running" && exit 0
    echo $"Starting $0"
    touch $RUNFILE
    MAINLOOP &
    ;;
'stop')
    [ -f "$RUNFILE" ] && rm $RUNFILE
    pkill -f "Dname=task "
    pkill -f "task start"
    echo "Stopping task"
    ;;
'restart')
    $0 stop
    sleep 5
    $0 start
    ;;
'status')
    CHECKSTATUS
    if [ "$RUNNING" ]
    then
        echo " running"
        exit 0
    else
        echo " stopped"
        exit 3
    fi
    ;;
*)
    echo
    echo $"Usage: $0 {start|stop|restart|status}"
    echo
    exit 1
    ;;
esac
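Wiring the script into the cluster is then just another primitive; the
resource name is an example, and adding it to the existing group makes
it follow the VIP and ldirectord onto the active node (this assumes the
script is installed as /etc/init.d/task):

primitive task lsb:task \
    op monitor interval="1m" timeout="20s"
group Load-Balancing VIP ldirectord task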
If you edit anything, just run /etc/init.d/task restart.
The fantastic thing about modern Linux-HA with Pacemaker is that you can
do pretty much anything.
Hope this helps.
Darren
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems