On Tue, 2010-12-07 at 09:54 +0100, Erwan Loaëc wrote:
> Hello!
> 
> For information, since I've updated our autofs with the suggested patch, 
> the problem has not occurred, and I don't see any CPU usage problem.

Thanks for letting me know.
I'll commit that change soon as I get a chance.

> 
> Thanks!!
> 
> I hope this problem is definitively solved :o)
> 
> --
> Erwan Loaec
> 
> 
> Ian Kent wrote:
> > On Tue, 2010-10-26 at 15:22 +0200, Erwan Loaëc wrote:
> >> Hi,
> >>
> >> First, thank you for spending time to analyse the problem.
> >>
> >> I've apply the patch in our dev environnement, and defined a slow 
> >> timeout for the mount (60s)
> >>
> >> FYI, in production, in order to limit this problem, the timeout is set 
> >> to 2 hours.
> >>
> >> When it will be possible, I'll update every autofs. The problem is quite 
> >> rare, so I won't be able to attest that the problem is solve or not 
> >> immediatly.
> > 
> > Right, that sounds sensible since I can't work out how it gets through.
> > 
> > I also need to know that the change doesn't cause any unwanted side
> > effects so let me know how things go.
> > 
> > Adding an is_mounted() check in a frequently called function like this
> > can have an overhead if your using the old ioctl interface, which you
> > are by the look of the log. This should only show up if you have a
> > largish number of mounts. How many is a large number is very much site
> > dependent but if the CPU usage of the daemon is acceptable to you then
> > you don't need to worry about it.
> > 
> >> In addition, can you tell me why the patch is "autofs-5.0.3 - fix expire 
> >> race" and not autofs-5.0.5 ?
> > 
> > That's just a typo, I'll rename the patch before committing it.
> > 
> >> Thanks,
> >>
> >> Erwan
> >>
> >>
> >> Ian Kent wrote:
> >>> On Tue, 2010-10-26 at 09:22 +0200, Erwan Loaëc wrote:
> >>>> Sorry for my previous mail with missing parts...
> >>>>
> >>>>
> >>>> Ian Kent wrote:
> >>>>> On Fri, 2010-10-22 at 17:19 +0200, Erwan Loaëc wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> Sorry for the time, but before posting again I've upgrade every 
> >>>>>> production servers with newest autofs4 module (with last patch) and 
> >>>>>> last 
> >>>>> What does "newest autofs4 module" mean exactly?
> >>>> Yes it is not the "newest module" but the module recompiled with the 
> >>>> patch autofs4-2.6.26-v5-update-20090903.patch
> >>> OK.
> >>>
> >>>>>> automount with all patch EXCEPT these:
> >>>>>>
> >>>>>> autofs-5.0.5-fix-restart.patch
> >>>>>> autofs-5.0.5-fix-status-privilege-error.patch
> >>>>>> autofs-5.0.4-always-read-file-maps-mount-lookup-map-read-fix.patch
> >>>>>> autofs-5.0.5-fix-direct-map-not-updating-on-reread.patch
> >>>>>> autofs-5.0.5-add-external-bind-method.patch
> >>>>>> autofs-5.0.5-fix-add-simple-bind-auth.patch
> >>>>>> autofs-5.0.5-add-dump-maps-option.patch
> >>>>>> autofs-5.0.5-fix-submount-shutdown-wait.patch
> >>>>>>
> >>>>>>
> >>>>>> Today I had the same behaviour than the issue explained my previous 
> >>>>>> mail.
> >>>>>>
> >>>>>> --------------------
> >>>>>> Oct 22 16:44:06 SERVERNAME automount[2665]: umount_autofs_offset: 
> >>>>>> couldn't get ioctl fd for offset /cifs/XXXXXXX/volume: No such file or 
> >>>>>> directory
> >>>>>> Oct 22 16:44:06 SERVERNAME automount[2665]: 
> >>>>>> handle_packet_missing_direct:1363: can't find map entry for 
> >>>>>> (20,3548545)
> >>>>>> --------------------
> >>>>> This could be caused by a umount returning success when in fact it
> >>>>> didn't succeed with the umount. Are you sure umount is returning correct
> >>>>> status?
> >>>> Unfortunaly the last case occured on production server with debug 
> >>>> disable. I can't find more information in logfile...
> >>>>
> >>>> But, as I've explained in my previous mail, this could logical with the 
> >>>> bad sequence found in my previous case:
> >>>>
> >>>> *Call to the share /cifs/XXXXXXX/volume
> >>>> *Mount /cifs/XXXXXXX/volume expiring
> >>>> *New call to the share /cifs/XXXXXXX/volume
> >>>> *Umount /cifs/XXXXXXX/volume
> >>> But that isn't quite what the problem is.
> >>>
> >>> I've had a closer look at the log and:
> >>>
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_direct: 
> >>> token 1526, name /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: expiring path 
> >>> /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path 
> >>> /cifs/SERV2/TM_termoz incl 1
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_subtree_mounts: 
> >>> unmounting dir = /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1526
> >>>
> >>> /cifs/SERV2/TM_termoz is umounted.
> >>>
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_missing_direct: 
> >>> token 1527, name /cifs/SERV2/TM_termoz, request pid 29523
> >>> Aug 26 10:11:42 bacchus automount[17827]: attempting to mount entry 
> >>> /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: lookup_mount: lookup(program): 
> >>> /cifs/SERV2/TM_termoz -> 
> >>> -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>>  ://SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): 
> >>> expanded entry: 
> >>> -fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>>  ://SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: parse_mount: parse(sun): 
> >>> gathered options: 
> >>> fstype=cifs,file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>> Aug 26 10:11:42 bacchus automount[17827]: sun_mount: parse(sun): mounting 
> >>> root /tmp/auto1I7gB7, mountpoint /cifs/SERV2/TM_termoz, what 
> >>> //SERV2/TM_termoz, fstype cifs, options 
> >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>> Aug 26 10:11:42 bacchus automount[17827]: do_mount: //SERV2/TM_termoz 
> >>> /cifs/SERV2/TM_termoz type cifs options 
> >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>>  using module generic
> >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): 
> >>> calling mkdir_path /tmp/auto1I7gB7
> >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): 
> >>> calling mount -t cifs -s -o 
> >>> file_mode=0644,dir_mode=0755,uid=uniok,gid=uniok,credentials=/etc/mycredfile
> >>>  //SERV2/TM_termoz /tmp/auto1I7gB7
> >>> Aug 26 10:11:42 bacchus automount[17827]: mount_mount: mount(generic): 
> >>> mounted //SERV2/TM_termoz type cifs on /tmp/auto1I7gB7
> >>> Aug 26 10:11:42 bacchus automount[17827]: move_mount: moved mount tree 
> >>> from /tmp/auto1I7gB7 to /cifs/SERV2/TM_termoz
> >>>
> >>> And is then mounted again here, but the request is not complete and
> >>> we get an expire, which means that the kernel saw the tree of mounts
> >>> below /cifs/SERV2 as not busy. This means the dentry was selected
> >>> for expire before the mount, and in fact before the path walk which
> >>> caused it even started.
> >>>
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 4
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet_expire_indirect: 
> >>> token 1528, name SERV2
> >>> Aug 26 10:11:42 bacchus automount[17827]: expiring path /cifs/SERV2
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi: path /cifs/SERV2 
> >>> incl 1
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/01_MARKETING
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/01_MARKETING
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/02_ADMINISTRATIF
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/02_ADMINISTRATIF
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/03_EXPLOITATION
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/03_EXPLOITATION
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/04_SERVICE-CLIENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/04_SERVICE-CLIENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/05_TELECHARGEMENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/05_TELECHARGEMENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/06_UTILISATEURS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/06_UTILISATEURS
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/07_PRODUCTION
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/07_PRODUCTION
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/08_BDD
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/08_BDD
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/ADMIN$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/ADMIN$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/C$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/C$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/CGIN
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/CGIN
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/D$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/D$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/F$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/F$
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/ImageMagick-6.3.3V6
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/ImageMagick-6.3.3V6
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_autofs_offset: couldn't 
> >>> get ioctl fd for offset /cifs/SERV2/TM_termoz: No such file or directory
> >>> Aug 26 10:11:42 bacchus automount[17827]: forcing umount of offset mount 
> >>> /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: failed to remove dir 
> >>> /cifs/SERV2/TM_termoz: Device or resource busy
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/Projets
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/Projets
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/Projets_Air
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/Projets_Air
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/termozs
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/termozs
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/packages
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/packages
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/powervault
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/powervault
> >>> Aug 26 10:11:42 bacchus automount[17827]: umount_multi_triggers: umount 
> >>> offset /cifs/SERV2/temp
> >>> Aug 26 10:11:42 bacchus automount[17827]: umounted offset mount 
> >>> /cifs/SERV2/temp
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/01_MARKETING
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/02_ADMINISTRATIF
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/03_EXPLOITATION
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/04_SERVICE-CLIENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/05_TELECHARGEMENTS
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/06_UTILISATEURS
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/07_PRODUCTION
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/08_BDD
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/ADMIN$
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/C$
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/CGIN
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/D$
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/F$
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/ImageMagick-6.3.3V6
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/TM_termoz
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/Projets
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/Projets_Air
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/termozs
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/packages
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/powervault
> >>> Aug 26 10:11:42 bacchus automount[17827]: cache_delete_offset_list: 
> >>> deleting offset key /cifs/SERV2/temp
> >>>
> >>> Now we are trying to umount the tree of autofs trigger mounts but one of
> >>> the trigger mounts now has a mount on it which causes the problem.
> >>>
> >>> Aug 26 10:11:42 bacchus automount[17827]: ioctl_send_ready: token = 1527
> >>> Aug 26 10:11:42 bacchus automount[17827]: mounted /cifs/SERV2/TM_termoz
> >>>
> >>> Now the mount completes.
> >>>
> >>> It is the mount not completing before the expire which causes autofs to
> >>> not recognize that the tree has become busy which should prevent it from
> >>> attempting to umount it.
> >>>
> >>> The check to determine if the umount of the tree can go ahead is done by
> >>> checking if any of these autofs trigger mounts have a open file handle
> >>> (which is used for control operations when they have a mount on top of
> >>> them). But, during the mount, the file handle is not set until the time
> >>> we see the "ioctl_send_ready: token = 1527" message so it is missed by
> >>> the expire check.
> >>>
> >>> Aug 26 10:11:42 bacchus automount[17827]: handle_packet: type = 5
> >>> Aug 26 10:11:42 bacchus automount[17827]: rm_unwanted_fn: removing 
> >>> directory /cifs/SERV2
> >>> Aug 26 10:11:42 bacchus automount[17827]: unable to remove directory 
> >>> /cifs/SERV2: Directory not empty
> >>> Aug 26 10:11:42 bacchus automount[17827]: expired /cifs/SERV2
> >>> Aug 26 10:11:42 bacchus automount[17827]: 
> >>> handle_packet_missing_direct:1323: can't find map entry for (46,12396497)
> >>>
> >>> If we ever get any of these then it is game over.
> >>> They are fatal and, although you will see a hang because a lock is not
> >>> released, you must start over anyway. There is no other option.
> >>>
> >>> I don't know why the above expire is getting through, it shouldn't, but
> >>> user space does need to handle this case anyway.
> >>>
> >>> I've done some basic testing and this still needs work (like a problem
> >>> description) but I'm interested to know if this helps.
> >>>
> >>> autofs-5.0.3 - fix expire race
> >>>
> >>> From: Ian Kent <ra...@themaw.net>
> >>>
> >>>
> >>> ---
> >>>
> >>>  lib/mounts.c |    5 ++++-
> >>>  1 files changed, 4 insertions(+), 1 deletions(-)
> >>>
> >>>
> >>> diff --git a/lib/mounts.c b/lib/mounts.c
> >>> index 4c44982..f26579e 100644
> >>> --- a/lib/mounts.c
> >>> +++ b/lib/mounts.c
> >>> @@ -1525,8 +1525,11 @@ int umount_multi_triggers(struct autofs_point *ap, 
> >>> struct mapent *me, char *root
> >>>           oe_base = oe->key + strlen(root);
> >>>           left += umount_multi_triggers(ap, oe, root, oe_base);
> >>>  
> >>> -         if (oe->ioctlfd != -1)
> >>> +         if (oe->ioctlfd != -1 ||
> >>> +             is_mounted(_PROC_MOUNTS, oe->key, MNTS_REAL)) {
> >>>                   left++;
> >>> +                 break;
> >>> +         }
> >>>   }
> >>>  
> >>>   if (left)
> >>>
> >>>
> >> _______________________________________________
> >> autofs mailing list
> >> autofs@linux.kernel.org
> >> http://linux.kernel.org/mailman/listinfo/autofs
> > 
> > 
> 
> _______________________________________________
> autofs mailing list
> autofs@linux.kernel.org
> http://linux.kernel.org/mailman/listinfo/autofs


_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

Reply via email to