On Tue, 2010-10-26 at 09:22 +0200, Erwan Loaëc wrote: > Sorry for my previous mail with missing parts... > > > Ian Kent wrote: > > On Fri, 2010-10-22 at 17:19 +0200, Erwan Loaëc wrote: > >> Hello, > >> > >> Sorry for the time, but before posting again I've upgrade every > >> production servers with newest autofs4 module (with last patch) and last > > > > What does "newest autofs4 module" mean exactly? > > > Yes it is not the "newest module" but the module recompiled with the > patch autofs4-2.6.26-v5-update-20090903.patch
OK. > > > > >> automount with all patch EXCEPT these: > >> > >> autofs-5.0.5-fix-restart.patch > >> autofs-5.0.5-fix-status-privilege-error.patch > >> autofs-5.0.4-always-read-file-maps-mount-lookup-map-read-fix.patch > >> autofs-5.0.5-fix-direct-map-not-updating-on-reread.patch > >> autofs-5.0.5-add-external-bind-method.patch > >> autofs-5.0.5-fix-add-simple-bind-auth.patch > >> autofs-5.0.5-add-dump-maps-option.patch > >> autofs-5.0.5-fix-submount-shutdown-wait.patch > >> > >> > >> Today I had the same behaviour than the issue explained my previous mail. > >> > >> -------------------- > >> Oct 22 16:44:06 SERVERNAME automount[2665]: umount_autofs_offset: > >> couldn't get ioctl fd for offset /cifs/XXXXXXX/volume: No such file or > >> directory > >> Oct 22 16:44:06 SERVERNAME automount[2665]: > >> handle_packet_missing_direct:1363: can't find map entry for (20,3548545) > >> -------------------- > > > > This could be caused by a umount returning success when in fact it > > didn't succeed with the umount. Are you sure umount is returning correct > > status? > > Unfortunaly the last case occured on production server with debug > disable. I can't find more information in logfile... > > But, as I've explained in my previous mail, this could logical with the > bad sequence found in my previous case: > > *Call to the share /cifs/XXXXXXX/volume > *Mount /cifs/XXXXXXX/volume expiring > *New call to the share /cifs/XXXXXXX/volume > *Umount /cifs/XXXXXXX/volume But that isn't quite what the problem is. I've had a closer look at the log and: Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_direct: token 1526, name /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path /cifs/SERV2/TM_termoz incl 1 Aug 26 10:11:42 bacchus automount[17827]: umount_subtree_mounts: unmounting dir = /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1526 /cifs/SERV2/TM_termoz is umounted. Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5 Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct: token 1527, name /cifs/SERV2/TM_termoz, request pid 29523 Aug 26 10:11:42 bacchus automount[17827]: attempting to mount entry /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: lookup_mount: lookup(program): /cifs/SERV2/TM_termoz -> -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile ://SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): expanded entry: -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile ://SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): gathered options: fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile Aug 26 10:11:42 bacchus automount[17827]: sun_mount: parse(sun): mounting root /tmp/auto1I7gB7, mountpoint /cifs/SERV2/TM_termoz, what //SERV2/TM_termoz, fstype cifs, options file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile Aug 26 10:11:42 bacchus automount[17827]: do_mount: //SERV2/TM_termoz /cifs/SERV2/TM_termoz type cifs options file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile using module generic Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): calling mkdir_path /tmp/auto1I7gB7 Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): calling mount -t cifs -s -o file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile //SERV2/TM_termoz /tmp/auto1I7gB7 Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): mounted //SERV2/TM_termoz type cifs on /tmp/auto1I7gB7 Aug 26 10:11:42 bacchus automount[17827]: move_mount: moved mount tree from /tmp/auto1I7gB7 to /cifs/SERV2/TM_termoz And is then mounted again here, but the request is not complete and we get an expire, which means that the kernel saw the tree of mounts below /cifs/SERV2 as not busy. This means the dentry was selected for expire before the mount, and in fact before the path walk which caused it even started. Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 4 Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_indirect: token 1528, name SERV2 Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2 Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path /cifs/SERV2 incl 1 Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/01_MARKETING Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/01_MARKETING Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/02_ADMINISTRATIF Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/02_ADMINISTRATIF Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/03_EXPLOITATION Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/03_EXPLOITATION Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/04_SERVICE-CLIENTS Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/04_SERVICE-CLIENTS Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/05_TELECHARGEMENTS Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/05_TELECHARGEMENTS Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/06_UTILISATEURS Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/06_UTILISATEURS Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/07_PRODUCTION Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/07_PRODUCTION Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/08_BDD Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/08_BDD Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/ADMIN$ Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/ADMIN$ Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/C$ Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/C$ Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/CGIN Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/CGIN Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/D$ Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/D$ Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/F$ Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/F$ Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/ImageMagick-6.3.3V6 Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/ImageMagick-6.3.3V6 Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: umount_autofs_offset: couldn't get ioctl fd for offset /cifs/SERV2/TM_termoz: No such file or directory Aug 26 10:11:42 bacchus automount[17827]: forcing umount of offset mount /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: failed to remove dir /cifs/SERV2/TM_termoz: Device or resource busy Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/Projets Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/Projets Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/Projets_Air Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/Projets_Air Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/termozs Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/termozs Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/packages Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/packages Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/powervault Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/powervault Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount offset /cifs/SERV2/temp Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount /cifs/SERV2/temp Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/01_MARKETING Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/02_ADMINISTRATIF Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/03_EXPLOITATION Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/04_SERVICE-CLIENTS Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/05_TELECHARGEMENTS Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/06_UTILISATEURS Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/07_PRODUCTION Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/08_BDD Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/ADMIN$ Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/C$ Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/CGIN Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/D$ Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/F$ Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/ImageMagick-6.3.3V6 Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/TM_termoz Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/Projets Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/Projets_Air Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/termozs Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/packages Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/powervault Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: deleting offset key /cifs/SERV2/temp Now we are trying to umount the tree of autofs trigger mounts but one of the trigger mounts now has a mount on it which causes the problem. Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1527 Aug 26 10:11:42 bacchus automount[17827]: mounted /cifs/SERV2/TM_termoz Now the mount completes. It is the mount not completing before the expire which causes autofs to not recognize that the tree has become busy which should prevent it from attempting to umount it. The check to determine if the umount of the tree can go ahead is done by checking if any of these autofs trigger mounts have a open file handle (which is used for control operations when they have a mount on top of them). But, during the mount, the file handle is not set until the time we see the "ioctl_send_ready: token = 1527" message so it is missed by the expire check. Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5 Aug 26 10:11:42 bacchus automount[17827]: rm_unwanted_fn: removing directory /cifs/SERV2 Aug 26 10:11:42 bacchus automount[17827]: unable to remove directory /cifs/SERV2: Directory not empty Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2 Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct:1323: can't find map entry for (46,12396497) If we ever get any of these then it is game over. They are fatal and, although you will see a hang because a lock is not released, you must start over anyway. There is no other option. I don't know why the above expire is getting through, it shouldn't, but user space does need to handle this case anyway. I've done some basic testing and this still needs work (like a problem description) but I'm interested to know if this helps. autofs-5.0.3 - fix expire race From: Ian Kent <ra...@themaw.net> --- lib/mounts.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/lib/mounts.c b/lib/mounts.c index 4c44982..f26579e 100644 --- a/lib/mounts.c +++ b/lib/mounts.c @@ -1525,8 +1525,11 @@ int umount_multi_triggers(struct autofs_point *ap, struct mapent *me, char *root oe_base = oe->key + strlen(root); left += umount_multi_triggers(ap, oe, root, oe_base); - if (oe->ioctlfd != -1) + if (oe->ioctlfd != -1 || + is_mounted(_PROC_MOUNTS, oe->key, MNTS_REAL)) { left++; + break; + } } if (left) _______________________________________________ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs