Re: [Linux-HA] Watchdog configuration with SBD

2008-09-09 Thread NAKAHIRA Kazutomo

Hi, Lars

Thank you for your advice.

I decided to use the sbd command with the -W option to enable the watchdog.
It works well when started from the command line.

But I encountered another problem when the sbd watch process is started
by Heartbeat using the respawn directive in ha.cf.

Starting Heartbeat and the sbd watch process works fine. The sbd watch
process checks the shared disk and processes the messages in its mailbox
slot correctly.

However, when Heartbeat is stopped, the kernel writes the following
error message to syslog and the system is rebooted.

(snip)
Sep  9 11:15:56 dl380g5a kernel: SoftDog: Unexpected close, not stopping watchdog!

(snip)

It seems that the sbd watch process was stopped before
watchdog_close() was done, so the watchdog rebooted
the system.
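Some background on that message: under the Linux watchdog API, opening /dev/watchdog arms the timer, and a driver that supports "magic close" (softdog does, unless it is loaded with nowayout=1) only stops the timer if the character 'V' is written just before the device is closed. A process that goes away without doing that leaves the timer running, which is what "Unexpected close, not stopping watchdog!" reports. A minimal shell sketch of a clean close; `disarm_watchdog` is a hypothetical helper, not part of sbd or Heartbeat:

```sh
# Hypothetical helper: perform the watchdog "magic close".
# The redirection opens the device (arming the timer), printf writes 'V',
# and the shell then closes it, which disarms a driver supporting magic close.
# Softdog accepts only one opener at a time, so this fails while sbd holds it.
disarm_watchdog() {
    printf V > "${1:-/dev/watchdog}"
}
```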


Best Regards,
NAKAHIRA Kazutomo

Lars Marowsky-Bree wrote:

On 2008-09-08T18:04:00, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:


I'm trying SBD, which was introduced into the latest lha-2.1 repository.

In the official SBD document (http://www.linux-ha.org/SBD_Fencing),
using a watchdog is recommended, but I'm torn between enabling the
watchdog with the sbd watch command option -W and enabling it with
the watchdog directive in ha.cf.


I'd suggest using the sbd one; that's the one you definitely want to
protect in an sbd configuration.


Please let me know whether I should remove the existing setting
and use the sbd watch command with the -W option instead.
# Or are both configurations needed?


You cannot use both; in that case, it'd continue running until both
fail, which is not recommended.
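Following that advice means the watchdog is owned by sbd alone, so ha.cf should no longer carry Heartbeat's own `watchdog` directive (for example `watchdog /dev/watchdog`). A small sketch of a check for that; the function name is hypothetical, and it only looks for an uncommented line starting with `watchdog`:

```sh
# Hypothetical check: return 0 if ha.cf does NOT enable Heartbeat's own
# watchdog (no uncommented "watchdog <device>" line), 1 if it does,
# and 2 if the file cannot be read.
hacf_watchdog_disabled() {
    hacf=${1:-/etc/ha.d/ha.cf}
    [ -r "$hacf" ] || return 2
    ! grep -Eq '^[[:space:]]*watchdog[[:space:]]' "$hacf"
}
```

Used as `hacf_watchdog_disabled || echo "ha.cf still enables the Heartbeat watchdog" >&2`, it warns before sbd -W and the ha.cf directive end up configured together.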


Regards,
Lars




--

NAKAHIRA Kazutomo
NTT DATA INTELLILINK CORPORATION
Open Source Business Unit
Software Services Integration Business Division

Toyosu Center Building Annex, 3-3-9, Toyosu,
Koto-ku, Tokyo 135-0061, Japan
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Watchdog configuration with SBD

2008-09-09 Thread Lars Marowsky-Bree
On 2008-09-09T15:28:31, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:

 Hi, Lars

 Thank you for your advice.

 I decided to use the sbd command with the -W option to enable the watchdog.
 It works well when started from the command line.

 But I encountered another problem when the sbd watch process is started
 by Heartbeat using the respawn directive in ha.cf.

Yes, that is a side-effect of starting it there. It really should be
started via the init script, as I do with the init script on SuSE. I'm
attaching the script for reference.
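For reference, the script below reads /etc/sysconfig/sbd and uses two variables from it: SBD_DEVICE (the shared sbd disk) and SBD_OPTS (extra options for `sbd ... watch`). A sketch of such a file with the watchdog enabled through -W; the device path is a made-up placeholder:

```sh
# /etc/sysconfig/sbd (sketch), sourced by the heartbeat init script.
# SBD_DEVICE: shared block device holding the sbd mailbox slots.
# The path is a hypothetical placeholder; substitute your own shared disk.
SBD_DEVICE="/dev/disk/by-id/EXAMPLE-shared-disk-part1"
# SBD_OPTS: extra options for "sbd ... watch"; -W enables the watchdog.
SBD_OPTS="-W"
```

With these values, StartSBD in the script runs `sbd -d /dev/disk/by-id/EXAMPLE-shared-disk-part1 -D -W watch` before heartbeat is started, and StopSBD sends `message LOCAL exit` only after heartbeat has stopped.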

 (snip)
 Sep  9 11:15:56 dl380g5a kernel: SoftDog: Unexpected close, not stopping watchdog!
 (snip)

 It seems that the sbd watch process was stopped before
 watchdog_close() was done, so the watchdog rebooted
 the system.

Yes. heartbeat sends a kill signal and doesn't allow sbd to recover;
also, sbd really should keep running even if heartbeat crashes, and
must keep running during heartbeat shutdown.
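To illustrate why the kill matters, here is a toy watchdog feeder in shell. It is hypothetical and is not how sbd is implemented; it only shows the mechanism. It does the magic close from a TERM/INT trap, but a signal that cannot be caught, such as SIGKILL, ends the process before the trap can run, so the kernel closes the descriptor without the 'V' and the timer keeps running. WATCHDOG_DEV and INTERVAL are assumptions; pointing WATCHDOG_DEV at a scratch file lets you try it without arming a real watchdog.

```sh
# Toy watchdog feeder, for illustration only (hypothetical; not sbd's code).
WATCHDOG_DEV=${WATCHDOG_DEV:-/dev/watchdog}
INTERVAL=${INTERVAL:-5}

feed_watchdog() {
    # Opening the device arms the timer; keep it open on fd 3.
    exec 3>"$WATCHDOG_DEV"

    # Clean shutdown on SIGTERM/SIGINT: magic close ('V', then close fd 3).
    # SIGKILL cannot be trapped, so a killed feeder never runs this and
    # the kernel closes fd 3 without the 'V'.
    trap 'printf V >&3; exec 3>&-; exit 0' TERM INT

    while :; do
        printf . >&3      # any write resets the timer (keepalive)
        sleep "$INTERVAL"
    done
}
```

Started with `feed_watchdog &`, a later `kill -TERM $!` ends in a clean close (the trap runs once the current `sleep` returns, so this can take up to INTERVAL seconds), while `kill -KILL $!` against a real softdog device leaves the timer armed, produces the "Unexpected close" message shown earlier and is followed by a reboot once the timeout expires.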

Regards,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

#!/bin/sh
#
# heartbeat Start high-availability services
#
# Author:   Lars Marowsky-Bree [EMAIL PROTECTED]
#
# chkconfig: 2345 @HB_INITSTARTPRI@ @HB_INITSTOPPRI@
# description: Startup script high-availability services.
# processname: heartbeat
# pidfile: @localstatedir@/run/heartbeat.pid
# config: @sysconfdir@/ha.d/ha.cf
#
### BEGIN INIT INFO
# Provides: heartbeat
# Required-Start: $network $syslog $named
# Should-Start: drbd sshd xendomains o2cb evms ocfs2
# Required-Stop: $network $syslog $named
# Should-Stop: drbd sshd xendomains o2cb evms ocfs2
# Default-Start:  3 5
# Default-Stop:   0 1 2 6
# Description: Start heartbeat HA services
### END INIT INFO

HA_DIR=/etc/ha.d; export HA_DIR
CONFIG=$HA_DIR/ha.cf
. $HA_DIR/shellfuncs

# Setup SuSE specific variables
[ -r /etc/rc.status ] && . /etc/rc.status
rc_reset

if [ ! -x "$HA_BIN/heartbeat" ]; then
    echo -n "High-Availability services not installed (heartbeat)"
    if [ "$1" = stop ]; then exit 0; fi
    rc_status -s
    rc_exit
fi

if [ ! -x "$HA_BIN/ha_logd" ]; then
    echo -n "High-Availability services not installed (ha_logd)"
    if [ "$1" = stop ]; then exit 0; fi
    rc_status -s
    rc_exit
fi

# Optional sbd settings (SBD_DEVICE, SBD_OPTS)
SBD_CONFIG=/etc/sysconfig/sbd
if [ -f "$SBD_CONFIG" ]; then
    . "$SBD_CONFIG"
fi

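# Start the sbd watch daemon from the init script so that its lifetime
# does not depend on heartbeat. Skipped when SBD_DEVICE is unset or empty.
# SBD_OPTS is deliberately unquoted so that it can hold several options.
StartSBD() {
    if [ -n "$SBD_DEVICE" ]; then
        if ! sbd -d "$SBD_DEVICE" -D $SBD_OPTS watch ; then
            rc_failed
            rc_exit
        fi
    fi
}

# Ask the local sbd daemon to exit through its mailbox slot instead of
# killing it, so that it can shut down cleanly.
StopSBD() {
    if [ -n "$SBD_DEVICE" ]; then
        if ! sbd -d "$SBD_DEVICE" -D $SBD_OPTS message LOCAL exit ; then
            rc_failed
            rc_exit
        fi
    fi
}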

StartLogd() {
    $HA_BIN/ha_logd -s >/dev/null 2>&1

    if [ $? -eq 0 ]; then
        Echo "logd is already running"
        return 0
    fi

    $HA_BIN/ha_logd -d
    if [ $? -ne 0 ]; then
        Echo "starting logd failed"
    fi
}

StopLogd() {
    $HA_BIN/ha_logd -s >/dev/null 2>&1

    if [ $? -ne 0 ]; then
        Echo "logd is already stopped"
        return 0
    fi

    $HA_BIN/ha_logd -k
    if [ $? -ne 0 ]; then
        Echo "stopping logd failed"
    fi
}


StatusHA() {
    $HA_BIN/heartbeat -s >/dev/null 2>&1
}

StandbyHA() {
    auto_failback=`ha_parameter auto_failback | tr 'A-Z' 'a-z'`
    nice_failback=`ha_parameter nice_failback | tr 'A-Z' 'a-z'`

    case "$auto_failback" in
        *legacy*) echo "auto_failback is set to legacy.  Cannot enter standby."
                  return 1;;
    esac
    case "$nice_failback" in
        *off*)    echo "nice_failback is disabled.  Cannot enter standby."
                  return 1;;
    esac
    case "${auto_failback}${nice_failback}" in
        "")       echo "auto_failback defaulted to legacy.  Cannot enter standby."
                  return 1;;
    esac

    echo "auto_failback: $auto_failback"
    if StatusHA; then
        echo -n "Attempting to enter standby mode."
        if $HA_BIN/hb_standby ; then
            return 0
        else
            return 1
        fi
    else
        echo -n "heartbeat is not currently running."
        return 0
    fi

    # Fall-through case:
    # XXX Never reached?
    rc_status -s
    rc_exit
}

if [ ! -f "$CONFIG" ]; then
    echo -n "High-Availability services not configured"
    if [ "$1" = stop ]; then exit 0; fi
    rc_status -u
    rc_exit
fi


case "$1" in
    start)
        echo -n "Starting High-Availability services"
        StartLogd
        # sbd is started before heartbeat, outside heartbeat's control
        StartSBD

        if [ -s "$HA_DIR/haresources" ]; then
            $HA_BIN/ResourceManager verifyallidle
        fi

        $HA_BIN/heartbeat

        rc_status -v

        ;;

    stop)
        echo -n "Stopping High-Availability services"

        $HA_BIN/heartbeat -k

        rc_status -v

        # sbd is stopped only after heartbeat has been shut down
        StopSBD
        StopLogd

        ;;

    status)
        echo -n "Checking for High-Availability services"


Re: [Linux-HA] Watchdog configuration with SBD

2008-09-09 Thread NAKAHIRA Kazutomo
Thank you again for the useful information.

I modified the heartbeat init script in my test environment
based on your script, and it works fine.
(Please see attached script.)
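A related step when making this change: the old respawn line for sbd should also be removed from ha.cf, otherwise Heartbeat still starts and stops sbd itself. A sketch of a check for a leftover line; the function name and the pattern are assumptions (it looks for an uncommented `respawn` line whose command is a program named sbd):

```sh
# Hypothetical check: return 0 if no uncommented "respawn" line in ha.cf
# runs sbd, 1 if one does, and 2 if the file cannot be read.
hacf_no_sbd_respawn() {
    hacf=${1:-/etc/ha.d/ha.cf}
    [ -r "$hacf" ] || return 2
    ! grep -Eq '^[[:space:]]*respawn[[:space:]].*[/[:space:]]sbd([[:space:]]|$)' "$hacf"
}
```

Running `hacf_no_sbd_respawn || echo "ha.cf still respawns sbd" >&2` before restarting Heartbeat catches the configuration that triggered the unexpected reboot in this thread.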

Best Regards,
NAKAHIRA Kazutomo



#!/bin/sh
#
#
# heartbeat Start high-availability services
#
# Author:   Alan Robertson  [EMAIL PROTECTED]
# License:  GNU General Public License (GPL)
#
#   This script works correctly under SuSE, Debian,
#   Conectiva, Red Hat and a few others.  Please let me know if it
#   doesn't work under your distribution, and we'll fix it.
#   We don't hate anyone, and like for everyone to use
#   our software, no matter what OS or distribution you're using.
#
# chkconfig: 2345 75 05
# description: Startup script high-availability services.
# processname: heartbeat
# pidfile: /var/run/heartbeat.pid
# config: /etc/ha.d/ha.cf
#
### BEGIN INIT INFO
# Description: heartbeat is a basic high-availability subsystem.
#   It will start services at initialization, and when machines go up
#   or down.  This version will also perform IP address takeover using
#   gratuitous ARPs.  It works correctly for a 2-node configuration,
#   and is extensible to larger configurations.
#   
#   It implements the following kinds of heartbeats:
#   - Bidirectional Serial Rings (raw serial ports)
#   - UDP/IP broadcast (ethernet, etc)
#   - UDP/IP multicast (ethernet, etc)
#   - Unicast heartbeats
#   - ping heartbeats (for routers, switches, etc.)
#   (to be used for breaking ties in 2-node systems
#and monitoring networking availability)
#
# Short-Description: High-availability services.
# Provides: heartbeat HA
# Required-Start: $remote_fs $network $time $syslog
# Should-Start: openhpid
# Required-Stop: $remote_fs $network $time $syslog
# Should-stop: openhpid
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
### END INIT INFO


HA_DIR=/etc/ha.d; export HA_DIR
CONFIG=$HA_DIR/ha.cf
. $HA_DIR/shellfuncs

LOCKDIR=/var/lock/subsys
RUNDIR=/var/run


#   Echo without putting a newline on the end
EchoNoNl() {
Echo  $@
}

#   Echo with escapes enabled...
EchoEsc() {
Echo  $@
}

echo_failure() {
EchoEsc " Heartbeat failure [rc=$1]. $rc_failed"
return $1
}

echo_success() {
: Cool!  It started!
EchoEsc "$rc_done"
}

if
  [ -r /etc/SuSE-release ]
then
  # rc.status is new since SuSE 7.0
  [ -r /etc/rc.status ] && . /etc/rc.status
  [ -r /etc/rc.config ] && . /etc/rc.config

  # Determine the base and follow a runlevel link name.
  base=${0##*/}
  link=${base#*[SK][0-9][0-9]}

fi
if
  [ -z "$rc_done" ]
then
  rc_done=Done.
  rc_failed=Failed.
  rc_skipped=Skipped.
fi


# exec 2>>/var/log/ha-debug

#   This should probably be its own autoconf parameter
#   because RH has moved it from time to time...
#   and I suspect Conectiva and Mandrake also supply it.

DISTFUNCS=/etc/rc.d/init.d/functions
SUBSYS=heartbeat
MODPROBE=/sbin/modprobe
US=`uname -n`

# Set this to a 1 if you want to automatically load kernel modules
USE_MODULES=1

[ -x $HA_BIN/heartbeat ] || exit 0

#
#   Some environments like it if we use their functions...
#
if
  [ ! -x $DISTFUNCS ]
then
  # Provide our own versions of these functions
  status() {
$HA_BIN/heartbeat -s
  }
  echo_failure() {
  EchoEsc " Heartbeat failure [rc=$1]. $rc_failed"
  return $1
  }

Re: [Linux-HA] Watchdog configuration with SBD

2008-09-08 Thread Lars Marowsky-Bree
On 2008-09-08T18:04:00, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:

 I'm trying SBD, which was introduced into the latest lha-2.1 repository.
 
 In the official SBD document (http://www.linux-ha.org/SBD_Fencing),
 using a watchdog is recommended, but I'm torn between enabling the
 watchdog with the sbd watch command option -W and enabling it with
 the watchdog directive in ha.cf.

I'd suggest using the sbd one; that's the one you definitely want to
protect in an sbd configuration.

 Please let me know whether I should remove the existing setting
 and use the sbd watch command with the -W option instead.
 # Or are both configurations needed?

You cannot use both; in that case, it'd continue running until both
fail, which is not recommended.


Regards,
Lars
