Team,

This is an old issue that persists across different HA platforms:

  *   Automounter (the autofs service) allows a file system to be mounted
on demand at an additional location. A typical example is the home
environment for different applications:
     *   An app ID in the naming services (LDAP, AD, etc.) has one home
environment, let's say "/home/appid".
     *   On different hosts the actual home environment may differ: on one
host it can be "/export/home/appid/v1.0", on another host it might be
"/export/home/appid/v2.0", etc.
     *   When the app executes something on a host, it still needs its home
environment as listed in the naming layer, so we configure automounter to
mount "/export/home/appid/v1.0" as /home/appid on the first server and
"/export/home/appid/v2.0" as /home/appid on the second server.
  *   A directory managed by automounter CANNOT BE USED by anyone else: any
attempt to mount something under /home in the above example will fail with a
"device busy" message.
  *   A file system that is mounted and then loopback (bind) mounted under
autofs control looks like this:
tlsys-ucs-eng08a:/appl/test # cat /etc/auto.master
# Sample auto.master file
# Format of this file:
# mountpoint map options
# Also see variable AUTOFS_OPTIONS in /etc/sysconfig/autofs
# For details of the format look at autofs(8).
/appl   /etc/auto_appl -rw,intr,nosuid,nobrowse
...
tlsys-ucs-eng08a:/appl/test # cat /etc/auto_appl
# Local auto_appl automounter file.
# Example:
# directory             --bind          localhost:/export/appl/directory
#
# When adding entries for NFS mounts from Solaris servers add the following
# options:
# -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600
# Example
# dir -rsize=32768,wsize=32768,nfsvers=3,tcp,retrans=5,timeo=600 server:/mount
...
test    --bind  localhost:/export/appl/test
tlsys-ucs-eng08a:/usr/lib/ocf/resource.d/heartbeat # cd /appl/test
tlsys-ucs-eng08a:/appl/test # mount |grep test
/dev/mapper/DG1-test on /export/appl/test type ext4 (rw,relatime)
/dev/mapper/DG1-test on /appl/test type ext4 (rw,relatime)

  *   If we try to unmount /export/appl/test in the above example, we get a
"device busy" message, but no process in the process table shows any usage,
and lsof shows nothing for this FS either.
  *   With SLES HA, an attempt to stop the resource or to migrate it to
another server causes the server to panic, because the Filesystem agent is
unable to stop the resource.
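
Since neither the process table nor lsof points at the culprit, one way to
see it is to look for other mount points that share the same device. The
following is only a minimal sketch: the function name other_mounts_of is
hypothetical, and it assumes the mounts table has the /proc/mounts layout
(device in field 1, mount point in field 2).

```shell
# Hypothetical helper: list every other mount point that shares a device
# with the given mount point.
#   $1 = mount point to inspect
#   $2 = mounts table to read (assumption: /proc/mounts layout; defaults
#        to /proc/mounts itself)
other_mounts_of() {
    mp=$1
    table=${2:-/proc/mounts}

    # Look up the device backing the given mount point.
    dev=$(awk -v mp="$mp" '$2 == mp { print $1; exit }' "$table")
    [ -n "$dev" ] || return 1

    # Print all other mount points that use the same device.
    awk -v dev="$dev" -v mp="$mp" '$1 == dev && $2 != mp { print $2 }' "$table"
}
```

On the host from the transcript above, "other_mounts_of /export/appl/test"
would be expected to report the bind mount under /appl, which is what keeps
the device busy.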

The above behavior is not acceptable. We have configured multiple service
groups that can run independently on any member of an HA cluster, so one host
may run more than one service. A panic on that host would disrupt the work of
the other applications.

To avoid this I lightly modified the Filesystem agent so that it searches
for such cases. As a base I use version
resource-agents-4.4.0+git57.70549516-3.12.1.x86_64:

...
# Lists all filesystems potentially mounted under a given path,
# excluding the path itself.
list_submounts() {
        list_mounts | grep " $1/" | cut -d' ' -f2 | sort -r
}

# FNMA - Lists automounter loopback
list_loopbacks() {
        list_mounts | grep "$1" | grep -v "$2" | cut -d' ' -f2 | sort -r
}

...
                # for SUB in `list_submounts $MOUNTPOINT` $MOUNTPOINT; do
                # FNMA: original line above was modified below:
                for SUB in `list_submounts $MOUNTPOINT` `list_loopbacks $DEVICE $MOUNTPOINT` $MOUNTPOINT; do

I added one extra subroutine that looks for loopback mounts and included its
output when the list of submounts is built. This allows the FS resource to
stop gracefully, without the panic.

The logic of the proposed change is this: if we stop an FS, we need to stop
ALL of its representations on the current host. Most likely the stop is in
preparation for the next step, disabling the vg for migration. Failure to
release the device that holds the FS will prevent the vg from being disabled.

Attached is a copy that we at Fannie Mae use with no problems. The
inconvenience for us is that with each and every patch upgrade or major
version release we need to redo the agent modification. I believe this small
change deserves to be part of the original code. Any thoughts?

Vladimir Yanakiev
Unix Engineer, Hosting & Engineering Services - Solution Engineering Compute
Phone: 703-833-3770 (direct) | 571-246-1946 (mobile)

Attachment: Filesystem
Description: Filesystem
