Re: [Linux-HA] Watchdog configuration with SBD
Hi, Lars

Thank you for your advice.

I decided to use the sbd command with the -W option to enable the watchdog.
It works well when started from the command line.

But I encountered another problem when the sbd watch process is started
by Heartbeat using the respawn directive in ha.cf.

Starting Heartbeat and the sbd watch process is OK. The sbd watch process
checks the shared disk and processes messages in the mailbox slot correctly.

But when Heartbeat is stopped, the kernel outputs the following error
message to syslog and the system is rebooted.

(snip)
Sep 9 11:15:56 dl380g5a kernel: SoftDog: Unexpected close, not stopping watchdog!
(snip)

It seems that the sbd watch process was stopped before watchdog_close()
was done, and the watchdog rebooted the system.

Best Regards,
NAKAHIRA Kazutomo

Lars Marowsky-Bree wrote:
> On 2008-09-08T18:04:00, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:
>
>> I'm trying SBD, which was introduced into the latest lha-2.1 repository.
>> The official SBD document (http://www.linux-ha.org/SBD_Fencing)
>> recommends using a watchdog, but I'm torn between enabling the watchdog
>> with the sbd watch command option -W and enabling it with the watchdog
>> directive in ha.cf.
>
> I'd suggest using the sbd one - that's the one you definitely want to
> protect in an sbd configuration.
>
>> Please tell me if it is necessary to remove the existing setting and
>> use the sbd watch command with the -W option.
>> # Or are both configurations needed?
>
> You cannot use both; in that case, it'd continue running until both
> fail, which is not recommended.
>
> Regards,
> Lars

--
NAKAHIRA Kazutomo
NTT DATA INTELLILINK CORPORATION
Open Source Business Unit
Software Services Integration Business Division
Toyosu Center Building Annex, 3-3-9, Toyosu, Koto-ku, Tokyo 135-0061, Japan

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
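The "Unexpected close" message in the log above comes from the kernel's watchdog "magic close" behaviour: softdog only disarms itself if the character V is written to the device immediately before it is closed. A process that is killed without that final write leaves the timer armed, and the machine reboots when it expires. The following is a minimal sketch of a clean shutdown, not how sbd itself is implemented; feed_and_close and its arguments are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Illustrative sketch of the watchdog "magic close" sequence.
# feed_and_close is a hypothetical helper, not part of sbd or Heartbeat.
#
#   $1  watchdog device (e.g. /dev/watchdog; a plain file works for a dry run)
#   $2  number of keepalive writes to send before shutting down
#   $3  seconds to sleep between keepalives (defaults to 1)
feed_and_close() {
    dev=$1
    beats=$2
    interval=${3:-1}

    {
        i=0
        while [ "$i" -lt "$beats" ]; do
            printf '.' >&3          # any write resets the watchdog timer
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3              # magic character: allow a clean stop
    } 3>"$dev"                      # fd 3 is closed when the group ends
}
```

Running `feed_and_close /tmp/fake-watchdog 3 0` against an ordinary file leaves `...V` in it, which shows the order of writes a real device would see. A process that receives a kill signal never reaches the final write, which matches the reboot described above.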
Re: [Linux-HA] Watchdog configuration with SBD
On 2008-09-09T15:28:31, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:

> Hi, Lars
>
> Thank you for your advice.
>
> I decided to use the sbd command with the -W option to enable the watchdog.
> It works well when started from the command line.
>
> But I encountered another problem when the sbd watch process is started
> by Heartbeat using the respawn directive in ha.cf.

Yes, that is a side-effect of starting it there. It really should be
started via the init script, as I do with the init script on SuSE.

I'm attaching the script for reference.

> (snip)
> Sep 9 11:15:56 dl380g5a kernel: SoftDog: Unexpected close, not stopping watchdog!
> (snip)
>
> It seems that the sbd watch process was stopped before watchdog_close()
> was done, and the watchdog rebooted the system.

Yes. heartbeat sends a kill signal and doesn't allow sbd to recover;
also, sbd really should continue running even if heartbeat crashes and
must continue running during hb shutdown.

Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

#!/bin/sh
#
# heartbeat     Start high-availability services
#
# Author:       Lars Marowsky-Bree [EMAIL PROTECTED]
#
# chkconfig: 2345 @HB_INITSTARTPRI@ @HB_INITSTOPPRI@
# description: Startup script high-availability services.
# processname: heartbeat
# pidfile: @localstatedir@/run/heartbeat.pid
# config: @sysconfdir@/ha.d/ha.cf
#
### BEGIN INIT INFO
# Provides: heartbeat
# Required-Start: $network $syslog $named
# Should-Start: drbd sshd xendomains o2cb evms ocfs2
# Required-Stop: $network $syslog $named
# Should-Stop: drbd sshd xendomains o2cb evms ocfs2
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: Start heartbeat HA services
### END INIT INFO

HA_DIR=/etc/ha.d; export HA_DIR
CONFIG=$HA_DIR/ha.cf
. $HA_DIR/shellfuncs

# Setup SuSE specific variables
[ -r /etc/rc.status ] && . /etc/rc.status
rc_reset

if [ ! -x $HA_BIN/heartbeat ]; then
    echo -n "High-Availability services not installed (heartbeat)"
    if [ "$1" = "stop" ]; then exit 0; fi
    rc_status -s
    rc_exit
fi

if [ ! -x $HA_BIN/ha_logd ]; then
    echo -n "High-Availability services not installed (ha_logd)"
    if [ "$1" = "stop" ]; then exit 0; fi
    rc_status -s
    rc_exit
fi

SBD_CONFIG=/etc/sysconfig/sbd
if [ -f $SBD_CONFIG ]; then
    . $SBD_CONFIG
fi

# Start the sbd watcher before heartbeat, if a device is configured
StartSBD() {
    if [ -n "$SBD_DEVICE" ]; then
        if ! sbd -d $SBD_DEVICE -D $SBD_OPTS watch ; then
            rc_failed
            rc_exit
        fi
    fi
}

# Ask the local sbd watcher to exit via its own message slot
StopSBD() {
    if [ -n "$SBD_DEVICE" ]; then
        if ! sbd -d $SBD_DEVICE -D $SBD_OPTS message LOCAL exit ; then
            rc_failed
            rc_exit
        fi
    fi
}

StartLogd() {
    $HA_BIN/ha_logd -s 2>&1 >/dev/null
    if [ $? -eq 0 ]; then
        Echo "logd is already running"
        return 0
    fi

    $HA_BIN/ha_logd -d
    if [ $? -ne 0 ]; then
        Echo "starting logd failed"
    fi
}

StopLogd() {
    $HA_BIN/ha_logd -s 2>&1 >/dev/null
    if [ $? -ne 0 ]; then
        Echo "logd is already stopped"
        return 0
    fi

    $HA_BIN/ha_logd -k
    if [ $? -ne 0 ]; then
        Echo "stopping logd failed"
    fi
}

StatusHA() {
    $HA_BIN/heartbeat -s >/dev/null 2>&1
}

StandbyHA() {
    auto_failback=`ha_parameter auto_failback | tr 'A-Z' 'a-z'`
    nice_failback=`ha_parameter nice_failback | tr 'A-Z' 'a-z'`

    case "$auto_failback" in
        *legacy*)
            echo "auto_failback is set to legacy. Cannot enter standby."
            return 1;;
    esac
    case "$nice_failback" in
        *off*)
            echo "nice_failback is disabled. Cannot enter standby."
            return 1;;
    esac
    case "${auto_failback}${nice_failback}" in
        "")
            echo "auto_failback defaulted to legacy. Cannot enter standby."
            return 1;;
    esac

    echo "auto_failback: $auto_failback"
    if StatusHA; then
        echo -n "Attempting to enter standby mode."
        if $HA_BIN/hb_standby ; then
            return 0
        else
            return 1
        fi
    else
        echo -n "heartbeat is not currently running."
        return 0
    fi

    # Fall-through case:
    # XXX Never reached?
    rc_status -s
    rc_exit
}

if [ ! -f $CONFIG ]; then
    echo -n "High-Availability services not configured"
    if [ "$1" = "stop" ]; then exit 0; fi
    rc_status -u
    rc_exit
fi

case "$1" in
    start)
        echo -n "Starting High-Availability services"
        StartLogd
        StartSBD
        if [ -s $HA_DIR/haresources ]; then
            $HA_BIN/ResourceManager verifyallidle
        fi
        $HA_BIN/heartbeat
        rc_status -v
        ;;

    stop)
        echo -n "Stopping High-Availability services"
        $HA_BIN/heartbeat -k
        rc_status -v
        StopSBD
        StopLogd
        ;;

    status)
        echo -n "Checking for High-Availability services"
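The init script above sources /etc/sysconfig/sbd and builds its sbd command lines from SBD_DEVICE and SBD_OPTS. As a rough sketch of what that file might contain and which commands result, assuming the -W option discussed in this thread goes into SBD_OPTS: the device path below is a placeholder, and sbd_cmdline is a hypothetical dry-run helper that only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical /etc/sysconfig/sbd contents. The device path is a placeholder;
# use the shared disk partition that was actually initialised for sbd.
SBD_DEVICE="/dev/disk/by-id/EXAMPLE-part1"
SBD_OPTS="-W"

# Illustrative dry-run helper (not part of the init script): print the
# command that StartSBD or StopSBD would execute with the settings above.
sbd_cmdline() {
    action=$1
    # Mirror the init script: do nothing when no device is configured.
    [ -n "$SBD_DEVICE" ] || return 1
    case "$action" in
        start) echo "sbd -d $SBD_DEVICE -D $SBD_OPTS watch" ;;
        stop)  echo "sbd -d $SBD_DEVICE -D $SBD_OPTS message LOCAL exit" ;;
        *)     return 2 ;;
    esac
}
```

Calling `sbd_cmdline stop` shows that the stop path sends sbd an exit message through its own slot rather than a signal, so the watcher gets the chance to close the watchdog itself after heartbeat has already gone down.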
Re: [Linux-HA] Watchdog configuration with SBD
Thank you again for the useful information.

I modified the heartbeat init script in my test environment, referring to
your script, and it works fine.
(Please see the attached script.)

Best Regards,
NAKAHIRA Kazutomo

Lars Marowsky-Bree wrote:
> On 2008-09-09T15:28:31, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:
>
>> Hi, Lars
>>
>> Thank you for your advice.
>>
>> I decided to use the sbd command with the -W option to enable the watchdog.
>> It works well when started from the command line.
>>
>> But I encountered another problem when the sbd watch process is started
>> by Heartbeat using the respawn directive in ha.cf.
>
> Yes, that is a side-effect of starting it there. It really should be
> started via the init script, as I do with the init script on SuSE.
>
> I'm attaching the script for reference.
>
>> (snip)
>> Sep 9 11:15:56 dl380g5a kernel: SoftDog: Unexpected close, not stopping watchdog!
>> (snip)
>>
>> It seems that the sbd watch process was stopped before watchdog_close()
>> was done, and the watchdog rebooted the system.
>
> Yes. heartbeat sends a kill signal and doesn't allow sbd to recover;
> also, sbd really should continue running even if heartbeat crashes and
> must continue running during hb shutdown.
>
> Regards,
> Lars

--
NAKAHIRA Kazutomo
NTT DATA INTELLILINK CORPORATION
Open Source Business Unit
Software Services Integration Business Division
Toyosu Center Building Annex, 3-3-9, Toyosu, Koto-ku, Tokyo 135-0061, Japan

#!/bin/sh
#
#
# heartbeat     Start high-availability services
#
# Author:       Alan Robertson [EMAIL PROTECTED]
# License:      GNU General Public License (GPL)
#
#               This script works correctly under SuSE, Debian,
#               Conectiva, Red Hat and a few others. Please let me know if it
#               doesn't work under your distribution, and we'll fix it.
#               We don't hate anyone, and like for everyone to use
#               our software, no matter what OS or distribution you're using.
#
# chkconfig: 2345 75 05
# description: Startup script high-availability services.
# processname: heartbeat
# pidfile: /var/run/heartbeat.pid
# config: /etc/ha.d/ha.cf
#
### BEGIN INIT INFO
# Description: heartbeat is a basic high-availability subsystem.
#       It will start services at initialization, and when machines go up
#       or down. This version will also perform IP address takeover using
#       gratuitous ARPs. It works correctly for a 2-node configuration,
#       and is extensible to larger configurations.
#
#       It implements the following kinds of heartbeats:
#               - Bidirectional Serial Rings (raw serial ports)
#               - UDP/IP broadcast (ethernet, etc)
#               - UDP/IP multicast (ethernet, etc)
#               - Unicast heartbeats
#               - ping heartbeats (for routers, switches, etc.)
#                       (to be used for breaking ties in 2-node systems
#                        and monitoring networking availability)
#
# Short-Description: High-availability services.
# Provides: heartbeat HA
# Required-Start: $remote_fs $network $time $syslog
# Should-Start: openhpid
# Required-Stop: $remote_fs $network $time $syslog
# Should-stop: openhpid
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
### END INIT INFO

HA_DIR=/etc/ha.d; export HA_DIR
CONFIG=$HA_DIR/ha.cf
. $HA_DIR/shellfuncs

LOCKDIR=/var/lock/subsys
RUNDIR=/var/run

# Echo without putting a newline on the end
EchoNoNl() {
    Echo $@
}

# Echo with escapes enabled...
EchoEsc() {
    Echo $@
}

echo_failure() {
    EchoEsc "Heartbeat failure [rc=$1]. $rc_failed"
    return $1
}

echo_success() {
    : Cool! It started!
    EchoEsc "$rc_done"
}

if [ -r /etc/SuSE-release ]
then
    # rc.status is new since SuSE 7.0
    [ -r /etc/rc.status ] && . /etc/rc.status
    [ -r /etc/rc.config ] && . /etc/rc.config

    # Determine the base and follow a runlevel link name.
    base=${0##*/}
    link=${base#*[SK][0-9][0-9]}
fi

if [ -z "$rc_done" ]
then
    rc_done="Done."
    rc_failed="Failed."
    rc_skipped="Skipped."
fi

# exec 2>>/var/log/ha-debug

# This should probably be its own autoconf parameter
# because RH has moved it from time to time...
# and I suspect Conectiva and Mandrake also supply it.
DISTFUNCS=/etc/rc.d/init.d/functions
SUBSYS=heartbeat
MODPROBE=/sbin/modprobe
US=`uname -n`

# Set this to a 1 if you want to automatically load kernel modules
USE_MODULES=1

[ -x $HA_BIN/heartbeat ] || exit 0

#
# Some environments like it if we use their functions...
#
if [ ! -x $DISTFUNCS ]
then
    # Provide our own versions of these functions
    status() {
        $HA_BIN/heartbeat -s
    }
    echo_failure() {
        EchoEsc "Heartbeat failure [rc=$1]. $rc_failed"
        return $1
    }
Re: [Linux-HA] Watchdog configuration with SBD
On 2008-09-08T18:04:00, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:

> I'm trying SBD, which was introduced into the latest lha-2.1 repository.
> The official SBD document (http://www.linux-ha.org/SBD_Fencing)
> recommends using a watchdog, but I'm torn between enabling the watchdog
> with the sbd watch command option -W and enabling it with the watchdog
> directive in ha.cf.

I'd suggest using the sbd one - that's the one you definitely want to
protect in an sbd configuration.

> Please tell me if it is necessary to remove the existing setting and
> use the sbd watch command with the -W option.
> # Or are both configurations needed?

You cannot use both; in that case, it'd continue running until both
fail, which is not recommended.

Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
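Since the advice in this message is that the ha.cf watchdog directive and sbd's -W option should not both be enabled, a small pre-flight check can catch that combination. This is only a sketch under assumptions: watchdog_conflict is a hypothetical helper, it treats any uncommented line starting with "watchdog" in ha.cf as Heartbeat's watchdog being enabled, and it expects the sbd options as a single string such as the value of SBD_OPTS.

```shell
#!/bin/sh
# Illustrative helper (not part of Heartbeat or sbd): report whether the
# watchdog is enabled both in ha.cf and in the options passed to sbd.
#
#   $1  path to ha.cf
#   $2  option string given to sbd, e.g. the value of SBD_OPTS
watchdog_conflict() {
    ha_cf=$1
    sbd_opts=$2
    hb_wd=no
    sbd_wd=no

    # An uncommented "watchdog <device>" line enables Heartbeat's watchdog.
    if grep -Eq '^[[:space:]]*watchdog[[:space:]]+' "$ha_cf"; then
        hb_wd=yes
    fi

    # Look for -W as a separate word in the sbd option string.
    case " $sbd_opts " in
        *" -W "*) sbd_wd=yes ;;
    esac

    if [ "$hb_wd" = yes ] && [ "$sbd_wd" = yes ]; then
        echo "conflict: watchdog enabled in both ha.cf and sbd"
        return 1
    fi
    echo "ok"
}
```

For example, `watchdog_conflict /etc/ha.d/ha.cf "$SBD_OPTS"` could be run before starting the cluster stack; a non-zero exit status means one of the two settings should be removed.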