On Tue, 2010-12-07 at 09:54 +0100, Erwan Loaëc wrote: > Hello! > > For information, since I've updated our autofs with the suggested patch, > the problem has not occurred, and I don't see any CPU usage problem.
Thanks for letting me know. I'll commit that change soon as I get a chance. > > Thanks!! > > I hope this problem is definitively solved :o) > > -- > Erwan Loaec > > > Ian Kent wrote: > > On Tue, 2010-10-26 at 15:22 +0200, Erwan Loaëc wrote: > >> Hi, > >> > >> First, thank you for spending time to analyse the problem. > >> > >> I've apply the patch in our dev environnement, and defined a slow > >> timeout for the mount (60s) > >> > >> FYI, in production, in order to limit this problem, the timeout is set > >> to 2 hours. > >> > >> When it will be possible, I'll update every autofs. The problem is quite > >> rare, so I won't be able to attest that the problem is solve or not > >> immediatly. > > > > Right, that sounds sensible since I can't work out how it gets through. > > > > I also need to know that the change doesn't cause any unwanted side > > effects so let me know how things go. > > > > Adding an is_mounted() check in a frequently called function like this > > can have an overhead if your using the old ioctl interface, which you > > are by the look of the log. This should only show up if you have a > > largish number of mounts. How many is a large number is very much site > > dependent but if the CPU usage of the daemon is acceptable to you then > > you don't need to worry about it. > > > >> In addition, can you tell me why the patch is "autofs-5.0.3 - fix expire > >> race" and not autofs-5.0.5 ? > > > > That's just a typo, I'll rename the patch before committing it. > > > >> Thanks, > >> > >> Erwan > >> > >> > >> Ian Kent wrote: > >>> On Tue, 2010-10-26 at 09:22 +0200, Erwan Loaëc wrote: > >>>> Sorry for my previous mail with missing parts... > >>>> > >>>> > >>>> Ian Kent wrote: > >>>>> On Fri, 2010-10-22 at 17:19 +0200, Erwan Loaëc wrote: > >>>>>> Hello, > >>>>>> > >>>>>> Sorry for the time, but before posting again I've upgrade every > >>>>>> production servers with newest autofs4 module (with last patch) and > >>>>>> last > >>>>> What does "newest autofs4 module" mean exactly? > >>>> Yes it is not the "newest module" but the module recompiled with the > >>>> patch autofs4-2.6.26-v5-update-20090903.patch > >>> OK. > >>> > >>>>>> automount with all patch EXCEPT these: > >>>>>> > >>>>>> autofs-5.0.5-fix-restart.patch > >>>>>> autofs-5.0.5-fix-status-privilege-error.patch > >>>>>> autofs-5.0.4-always-read-file-maps-mount-lookup-map-read-fix.patch > >>>>>> autofs-5.0.5-fix-direct-map-not-updating-on-reread.patch > >>>>>> autofs-5.0.5-add-external-bind-method.patch > >>>>>> autofs-5.0.5-fix-add-simple-bind-auth.patch > >>>>>> autofs-5.0.5-add-dump-maps-option.patch > >>>>>> autofs-5.0.5-fix-submount-shutdown-wait.patch > >>>>>> > >>>>>> > >>>>>> Today I had the same behaviour than the issue explained my previous > >>>>>> mail. > >>>>>> > >>>>>> -------------------- > >>>>>> Oct 22 16:44:06 SERVERNAME automount[2665]: umount_autofs_offset: > >>>>>> couldn't get ioctl fd for offset /cifs/XXXXXXX/volume: No such file or > >>>>>> directory > >>>>>> Oct 22 16:44:06 SERVERNAME automount[2665]: > >>>>>> handle_packet_missing_direct:1363: can't find map entry for > >>>>>> (20,3548545) > >>>>>> -------------------- > >>>>> This could be caused by a umount returning success when in fact it > >>>>> didn't succeed with the umount. Are you sure umount is returning correct > >>>>> status? > >>>> Unfortunaly the last case occured on production server with debug > >>>> disable. I can't find more information in logfile... > >>>> > >>>> But, as I've explained in my previous mail, this could logical with the > >>>> bad sequence found in my previous case: > >>>> > >>>> *Call to the share /cifs/XXXXXXX/volume > >>>> *Mount /cifs/XXXXXXX/volume expiring > >>>> *New call to the share /cifs/XXXXXXX/volume > >>>> *Umount /cifs/XXXXXXX/volume > >>> But that isn't quite what the problem is. > >>> > >>> I've had a closer look at the log and: > >>> > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_direct: > >>> token 1526, name /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: expiring path > >>> /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path > >>> /cifs/SERV2/TM_termoz incl 1 > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_subtree_mounts: > >>> unmounting dir = /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1526 > >>> > >>> /cifs/SERV2/TM_termoz is umounted. > >>> > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5 > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct: > >>> token 1527, name /cifs/SERV2/TM_termoz, request pid 29523 > >>> Aug 26 10:11:42 bacchus automount[17827]: attempting to mount entry > >>> /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: lookup_mount: lookup(program): > >>> /cifs/SERV2/TM_termoz -> > >>> -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> ://SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): > >>> expanded entry: > >>> -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> ://SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): > >>> gathered options: > >>> fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> Aug 26 10:11:42 bacchus automount[17827]: sun_mount: parse(sun): mounting > >>> root /tmp/auto1I7gB7, mountpoint /cifs/SERV2/TM_termoz, what > >>> //SERV2/TM_termoz, fstype cifs, options > >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> Aug 26 10:11:42 bacchus automount[17827]: do_mount: //SERV2/TM_termoz > >>> /cifs/SERV2/TM_termoz type cifs options > >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> using module generic > >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): > >>> calling mkdir_path /tmp/auto1I7gB7 > >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): > >>> calling mount -t cifs -s -o > >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile > >>> //SERV2/TM_termoz /tmp/auto1I7gB7 > >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): > >>> mounted //SERV2/TM_termoz type cifs on /tmp/auto1I7gB7 > >>> Aug 26 10:11:42 bacchus automount[17827]: move_mount: moved mount tree > >>> from /tmp/auto1I7gB7 to /cifs/SERV2/TM_termoz > >>> > >>> And is then mounted again here, but the request is not complete and > >>> we get an expire, which means that the kernel saw the tree of mounts > >>> below /cifs/SERV2 as not busy. This means the dentry was selected > >>> for expire before the mount, and in fact before the path walk which > >>> caused it even started. > >>> > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 4 > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_indirect: > >>> token 1528, name SERV2 > >>> Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2 > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path /cifs/SERV2 > >>> incl 1 > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/01_MARKETING > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/01_MARKETING > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/02_ADMINISTRATIF > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/02_ADMINISTRATIF > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/03_EXPLOITATION > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/03_EXPLOITATION > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/04_SERVICE-CLIENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/04_SERVICE-CLIENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/05_TELECHARGEMENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/05_TELECHARGEMENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/06_UTILISATEURS > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/06_UTILISATEURS > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/07_PRODUCTION > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/07_PRODUCTION > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/08_BDD > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/08_BDD > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/ADMIN$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/ADMIN$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/C$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/C$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/CGIN > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/CGIN > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/D$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/D$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/F$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/F$ > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/ImageMagick-6.3.3V6 > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/ImageMagick-6.3.3V6 > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_autofs_offset: couldn't > >>> get ioctl fd for offset /cifs/SERV2/TM_termoz: No such file or directory > >>> Aug 26 10:11:42 bacchus automount[17827]: forcing umount of offset mount > >>> /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: failed to remove dir > >>> /cifs/SERV2/TM_termoz: Device or resource busy > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/Projets > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/Projets > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/Projets_Air > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/Projets_Air > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/termozs > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/termozs > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/packages > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/packages > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/powervault > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/powervault > >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount > >>> offset /cifs/SERV2/temp > >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount > >>> /cifs/SERV2/temp > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/01_MARKETING > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/02_ADMINISTRATIF > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/03_EXPLOITATION > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/04_SERVICE-CLIENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/05_TELECHARGEMENTS > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/06_UTILISATEURS > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/07_PRODUCTION > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/08_BDD > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/ADMIN$ > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/C$ > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/CGIN > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/D$ > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/F$ > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/ImageMagick-6.3.3V6 > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/TM_termoz > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/Projets > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/Projets_Air > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/termozs > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/packages > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/powervault > >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: > >>> deleting offset key /cifs/SERV2/temp > >>> > >>> Now we are trying to umount the tree of autofs trigger mounts but one of > >>> the trigger mounts now has a mount on it which causes the problem. > >>> > >>> Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1527 > >>> Aug 26 10:11:42 bacchus automount[17827]: mounted /cifs/SERV2/TM_termoz > >>> > >>> Now the mount completes. > >>> > >>> It is the mount not completing before the expire which causes autofs to > >>> not recognize that the tree has become busy which should prevent it from > >>> attempting to umount it. > >>> > >>> The check to determine if the umount of the tree can go ahead is done by > >>> checking if any of these autofs trigger mounts have a open file handle > >>> (which is used for control operations when they have a mount on top of > >>> them). But, during the mount, the file handle is not set until the time > >>> we see the "ioctl_send_ready: token = 1527" message so it is missed by > >>> the expire check. > >>> > >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5 > >>> Aug 26 10:11:42 bacchus automount[17827]: rm_unwanted_fn: removing > >>> directory /cifs/SERV2 > >>> Aug 26 10:11:42 bacchus automount[17827]: unable to remove directory > >>> /cifs/SERV2: Directory not empty > >>> Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2 > >>> Aug 26 10:11:42 bacchus automount[17827]: > >>> handle_packet_missing_direct:1323: can't find map entry for (46,12396497) > >>> > >>> If we ever get any of these then it is game over. > >>> They are fatal and, although you will see a hang because a lock is not > >>> released, you must start over anyway. There is no other option. > >>> > >>> I don't know why the above expire is getting through, it shouldn't, but > >>> user space does need to handle this case anyway. > >>> > >>> I've done some basic testing and this still needs work (like a problem > >>> description) but I'm interested to know if this helps. > >>> > >>> autofs-5.0.3 - fix expire race > >>> > >>> From: Ian Kent <ra...@themaw.net> > >>> > >>> > >>> --- > >>> > >>> lib/mounts.c | 5 ++++- > >>> 1 files changed, 4 insertions(+), 1 deletions(-) > >>> > >>> > >>> diff --git a/lib/mounts.c b/lib/mounts.c > >>> index 4c44982..f26579e 100644 > >>> --- a/lib/mounts.c > >>> +++ b/lib/mounts.c > >>> @@ -1525,8 +1525,11 @@ int umount_multi_triggers(struct autofs_point *ap, > >>> struct mapent *me, char *root > >>> oe_base = oe->key + strlen(root); > >>> left += umount_multi_triggers(ap, oe, root, oe_base); > >>> > >>> - if (oe->ioctlfd != -1) > >>> + if (oe->ioctlfd != -1 || > >>> + is_mounted(_PROC_MOUNTS, oe->key, MNTS_REAL)) { > >>> left++; > >>> + break; > >>> + } > >>> } > >>> > >>> if (left) > >>> > >>> > >> _______________________________________________ > >> autofs mailing list > >> autofs@linux.kernel.org > >> http://linux.kernel.org/mailman/listinfo/autofs > > > > > > _______________________________________________ > autofs mailing list > autofs@linux.kernel.org > http://linux.kernel.org/mailman/listinfo/autofs _______________________________________________ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs