Hi,

First, thank you for spending time to analyse the problem.

I've apply the patch in our dev environnement, and defined a slow timeout for the mount (60s)

FYI, in production, in order to limit this problem, the timeout is set to 2 hours.

When it will be possible, I'll update every autofs. The problem is quite rare, so I won't be able to attest that the problem is solve or not immediatly.

In addition, can you tell me why the patch is "autofs-5.0.3 - fix expire race" and not autofs-5.0.5 ?

Thanks,

Erwan


Ian Kent wrote:
On Tue, 2010-10-26 at 09:22 +0200, Erwan Loaëc wrote:
Sorry for my previous mail with missing parts...


Ian Kent wrote:
On Fri, 2010-10-22 at 17:19 +0200, Erwan Loaëc wrote:
Hello,

Sorry for the time, but before posting again I've upgrade every production servers with newest autofs4 module (with last patch) and last
What does "newest autofs4 module" mean exactly?

Yes it is not the "newest module" but the module recompiled with the patch autofs4-2.6.26-v5-update-20090903.patch

OK.

automount with all patch EXCEPT these:

autofs-5.0.5-fix-restart.patch
autofs-5.0.5-fix-status-privilege-error.patch
autofs-5.0.4-always-read-file-maps-mount-lookup-map-read-fix.patch
autofs-5.0.5-fix-direct-map-not-updating-on-reread.patch
autofs-5.0.5-add-external-bind-method.patch
autofs-5.0.5-fix-add-simple-bind-auth.patch
autofs-5.0.5-add-dump-maps-option.patch
autofs-5.0.5-fix-submount-shutdown-wait.patch


Today I had the same behaviour than the issue explained my previous mail.

--------------------
Oct 22 16:44:06 SERVERNAME automount[2665]: umount_autofs_offset: couldn't get ioctl fd for offset /cifs/XXXXXXX/volume: No such file or directory Oct 22 16:44:06 SERVERNAME automount[2665]: handle_packet_missing_direct:1363: can't find map entry for (20,3548545)
--------------------
This could be caused by a umount returning success when in fact it
didn't succeed with the umount. Are you sure umount is returning correct
status?
Unfortunaly the last case occured on production server with debug disable. I can't find more information in logfile...

But, as I've explained in my previous mail, this could logical with the bad sequence found in my previous case:

*Call to the share /cifs/XXXXXXX/volume
*Mount /cifs/XXXXXXX/volume expiring
*New call to the share /cifs/XXXXXXX/volume
*Umount /cifs/XXXXXXX/volume

But that isn't quite what the problem is.

I've had a closer look at the log and:

Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_direct: token 
1526, name /cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path 
/cifs/SERV2/TM_termoz incl 1
Aug 26 10:11:42 bacchus automount[17827]: umount_subtree_mounts: unmounting dir 
= /cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1526

/cifs/SERV2/TM_termoz is umounted.

Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5
Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct: token 
1527, name /cifs/SERV2/TM_termoz, request pid 29523
Aug 26 10:11:42 bacchus automount[17827]: attempting to mount entry 
/cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: lookup_mount: lookup(program): 
/cifs/SERV2/TM_termoz -> 
-fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
 ://SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): expanded 
entry: 
-fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
 ://SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): gathered 
options: 
fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
Aug 26 10:11:42 bacchus automount[17827]: sun_mount: parse(sun): mounting root 
/tmp/auto1I7gB7, mountpoint /cifs/SERV2/TM_termoz, what //SERV2/TM_termoz, 
fstype cifs, options 
file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
Aug 26 10:11:42 bacchus automount[17827]: do_mount: //SERV2/TM_termoz 
/cifs/SERV2/TM_termoz type cifs options 
file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile 
using module generic
Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): calling 
mkdir_path /tmp/auto1I7gB7
Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): calling 
mount -t cifs -s -o 
file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile 
//SERV2/TM_termoz /tmp/auto1I7gB7
Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): mounted 
//SERV2/TM_termoz type cifs on /tmp/auto1I7gB7
Aug 26 10:11:42 bacchus automount[17827]: move_mount: moved mount tree from 
/tmp/auto1I7gB7 to /cifs/SERV2/TM_termoz

And is then mounted again here, but the request is not complete and
we get an expire, which means that the kernel saw the tree of mounts
below /cifs/SERV2 as not busy. This means the dentry was selected
for expire before the mount, and in fact before the path walk which
caused it even started.

Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 4
Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_indirect: token 
1528, name SERV2
Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2
Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path /cifs/SERV2 incl 1
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/01_MARKETING
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/01_MARKETING
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/02_ADMINISTRATIF
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/02_ADMINISTRATIF
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/03_EXPLOITATION
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/03_EXPLOITATION
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/04_SERVICE-CLIENTS
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/04_SERVICE-CLIENTS
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/05_TELECHARGEMENTS
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/05_TELECHARGEMENTS
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/06_UTILISATEURS
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/06_UTILISATEURS
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/07_PRODUCTION
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/07_PRODUCTION
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/08_BDD
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/08_BDD
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/ADMIN$
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/ADMIN$
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/C$
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/C$
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/CGIN
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/CGIN
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/D$
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/D$
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/F$
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/F$
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/ImageMagick-6.3.3V6
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/ImageMagick-6.3.3V6
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: umount_autofs_offset: couldn't get 
ioctl fd for offset /cifs/SERV2/TM_termoz: No such file or directory
Aug 26 10:11:42 bacchus automount[17827]: forcing umount of offset mount 
/cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: failed to remove dir 
/cifs/SERV2/TM_termoz: Device or resource busy
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/Projets
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/Projets
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/Projets_Air
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/Projets_Air
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/termozs
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/termozs
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/packages
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/packages
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/powervault
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
/cifs/SERV2/powervault
Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset 
/cifs/SERV2/temp
Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/temp
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/01_MARKETING
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/02_ADMINISTRATIF
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/03_EXPLOITATION
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/04_SERVICE-CLIENTS
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/05_TELECHARGEMENTS
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/06_UTILISATEURS
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/07_PRODUCTION
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/08_BDD
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/ADMIN$
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/C$
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/CGIN
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/D$
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/F$
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/ImageMagick-6.3.3V6
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/TM_termoz
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/Projets
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/Projets_Air
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/termozs
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/packages
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/powervault
Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting 
offset key /cifs/SERV2/temp

Now we are trying to umount the tree of autofs trigger mounts but one of
the trigger mounts now has a mount on it which causes the problem.

Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1527
Aug 26 10:11:42 bacchus automount[17827]: mounted /cifs/SERV2/TM_termoz

Now the mount completes.

It is the mount not completing before the expire which causes autofs to
not recognize that the tree has become busy which should prevent it from
attempting to umount it.

The check to determine if the umount of the tree can go ahead is done by
checking if any of these autofs trigger mounts have a open file handle
(which is used for control operations when they have a mount on top of
them). But, during the mount, the file handle is not set until the time
we see the "ioctl_send_ready: token = 1527" message so it is missed by
the expire check.

Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5
Aug 26 10:11:42 bacchus automount[17827]: rm_unwanted_fn: removing directory 
/cifs/SERV2
Aug 26 10:11:42 bacchus automount[17827]: unable to remove directory 
/cifs/SERV2: Directory not empty
Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2
Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct:1323: 
can't find map entry for (46,12396497)

If we ever get any of these then it is game over.
They are fatal and, although you will see a hang because a lock is not
released, you must start over anyway. There is no other option.

I don't know why the above expire is getting through, it shouldn't, but
user space does need to handle this case anyway.

I've done some basic testing and this still needs work (like a problem
description) but I'm interested to know if this helps.

autofs-5.0.3 - fix expire race

From: Ian Kent <ra...@themaw.net>


---

 lib/mounts.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)


diff --git a/lib/mounts.c b/lib/mounts.c
index 4c44982..f26579e 100644
--- a/lib/mounts.c
+++ b/lib/mounts.c
@@ -1525,8 +1525,11 @@ int umount_multi_triggers(struct autofs_point *ap, 
struct mapent *me, char *root
                oe_base = oe->key + strlen(root);
                left += umount_multi_triggers(ap, oe, root, oe_base);
- if (oe->ioctlfd != -1)
+               if (oe->ioctlfd != -1 ||
+                   is_mounted(_PROC_MOUNTS, oe->key, MNTS_REAL)) {
                        left++;
+                       break;
+               }
        }
if (left)



_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

Reply via email to