Package: ocfs2-tools
Version: 1.4.4-3
Tags: patch
Severity: important

A clean shutdown of a cluster node with 11 volumes mounted does not work:
exactly one volume is left with an active heartbeat. This is the console
message on shutdown:
Stopping O2CB cluster dataserver: Failed
Unable to stop cluster as heartbeat region still active

The first problem was getting the error message at all, as o2cb is not
stopped on my system (which uses insserv). To make it stop, the init
scripts need these dependencies:
* in /etc/init.d/o2cb:
  # Required-Stop: $network
* in /etc/init.d/ocfs2:
  # Required-Stop: $local_fs $network o2cb
(see attached patch)

With these dependencies, the system does try to shut down the ocfs2
cluster stack. The active heartbeat region results from "sendsigs" running
in parallel with the ocfs2 script: sendsigs kills all running processes,
including the "umount -a -t ocfs2" run by the ocfs2 init script. This is
why exactly one volume (no matter how many volumes are mounted) needs to be
recovered in the cluster: killing umount leaves the heartbeat region of the
filesystem being unmounted active, while umount moves on to the next
mounted filesystem.
To fix this, "# X-Stop-After: sendsigs" must be added to /etc/init.d/ocfs2.
Interestingly enough, having only a few ocfs2 volumes does not trigger
this condition, as unmounting them does not take that long (depending, of
course, on the server's speed and similar factors).

I discussed the issue with Petter Reinholdtsen (the maintainer of insserv
and of initscripts, which contains sendsigs) in a different bug report [1],
where I requested an insserv feature that lets a task (like sendsigs) run
exclusively during system shutdown.
He basically said that ocfs2-tools should declare the correct dependencies
in its LSB headers to make this work.

In another bug report [2] I sent a patch that prevents mountnfs and
umountnfs from touching ocfs2 (and gfs) filesystems, as this leads to error
messages about being unable to mount ocfs2 volumes without the cluster
stack.
A different approach to this issue could be to drop the ocfs2 script and
force o2cb to start before mountnfs and stop after umountnfs. That way,
ocfs2 could be handled like a generic network filesystem. But I guess it
is not up to me to decide which way to go... ;-)
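For illustration only, that alternative might look roughly like the LSB
header fragment below for /etc/init.d/o2cb. This is an untested sketch,
not part of the attached patch; it assumes insserv's X-Start-Before and
X-Stop-After keywords and the Debian facility names mountnfs (from
mountnfs.sh) and umountnfs (from umountnfs.sh).

```sh
# Hypothetical sketch of the alternative approach; NOT the attached patch.
# o2cb starts before mountnfs and stops only after umountnfs has run.
### BEGIN INIT INFO
# Provides: o2cb
# Required-Start: $network
# Required-Stop: $network
# X-Start-Before: mountnfs
# X-Stop-After: umountnfs
# Default-Start: S
# Default-Stop: 0 6
# Short-Description: Load O2CB cluster services at system boot.
### END INIT INFO
```

In that setup, mountnfs and umountnfs would mount and unmount the ocfs2
volumes themselves, so the separate ocfs2 script would no longer be needed.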

Attached is a patch that adds shutdown dependencies to the LSB headers of
the init scripts. It works for me and seems sane.

Thanks,
        Adi

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=590892
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=504748
diff -Nru etc/init.d/o2cb etc/init.d/o2cb
--- etc/init.d/o2cb	2011-08-18 13:09:41.000000000 +0200
+++ etc/init.d/o2cb	2011-08-18 13:10:00.000000000 +0200
@@ -8,7 +8,7 @@
 # Provides: o2cb
 # Required-Start: $network
 # Should-Start:
-# Required-Stop:
+# Required-Stop: $network
 # Default-Start: S
 # Default-Stop: 0 6
 # Short-Description: Load O2CB cluster services at system boot.
diff -Nru etc/init.d/ocfs2 etc/init.d/ocfs2
--- etc/init.d/ocfs2	2011-08-18 13:10:27.000000000 +0200
+++ etc/init.d/ocfs2	2011-08-18 13:10:16.000000000 +0200
@@ -8,9 +8,10 @@
 ### BEGIN INIT INFO
 # Provides: ocfs2
 # Required-Start: $local_fs $network o2cb
-# Required-Stop: $local_fs
+# Required-Stop: $local_fs $network o2cb
 # X-UnitedLinux-Should-Start:
 # X-UnitedLinux-Should-Stop:
+# X-Stop-After: sendsigs
 # Default-Start: S
 # Default-Stop: 0 6
 # Short-Description: Mount OCFS2 volumes at boot.
