On Fri, 2011-06-24 at 17:02 -0300, Leonardo Chiquitto wrote:
> On Mon, Jun 13, 2011 at 11:34 PM, Ian Kent <ra...@themaw.net> wrote:
> > On Thu, 2011-05-19 at 20:21 -0300, Leonardo Chiquitto wrote:
> >> Hello,
> >>
> >> We received a support request from a customer reporting a hang in the
> >> automount daemon. Analyzing the core dump, it looks like automount
> >> can deadlock if two threads execute in the following order:
> >>
> >> (All the line numbers are adjusted to match the latest version from Git)
> >>
> >> Thread 7 (Thread 25043):                      Thread 6 (Thread 24007):
> >>
> >> #9 start_thread() libpthread.so.0             #3  start_thread() 
> >> libpthread.so.0
> >> #8 do_read_master() at automount.c:1259               #2  do_readmap() at 
> >> state.c:479
> >> #7 master_read_master() at master.c:853                   --> 
> >> master_mutex_lock() [state.c:462]
> >>                                                   --> 
> >> master_mutex_unlock() [state.c:466]
> >>    --> master_mutex_lock() [master.c:836]
> >>    --> cache_writelock() [master.c:838 or :849]
> >>                                                   --> 
> >> master_source_readlock() [state.c:477]
> >> #6 lookup_nss_read_master() at lookup.c:229
> >> #5 do_read_master() at lookup.c:96
> >> #4 lookup_read_master() at lookup_ldap.c:1676
> >> #3 master_parse_entry() at master_parse.y:829
> >> #2 master_add_map_source() at master.c:192
> >> #1 master_source_writelock() at master.c:543  #1  cache_readlock() at 
> >> cache.c:60
> >> #0 pthread_rwlock_wrlock() libpthread.so.0    #0  pthread_rwlock_rdlock() 
> >> libpthread.so.0
> >>
> >> At this point:
> >>
> >>   Thread 7: locked(master_mutex_lock)
> >>           locked(cache_writelock)             A
> >>           waits(master_source_writelock)      B
> >>   Thread 6: locked(master_source_readlock)    B
> >>           waits(cache_readlock)               A
> >>
> >> The AutoFS version is 5.0.5 plus all upstream patches up to
> >> autofs-5.0.5-fix-submount-shutdown-wait.patch plus
> >> autofs-5.0.5-fix-out-of-order-locking-in-readmap.patch
> >> autofs-5.0.5-fix-next-task-list-update.patch
> >> autofs-5.0.5-fix-stale-map-read.patch
> >>
> >> At the first sight, moving pthread_cleanup_pop(1) (and consequently
> >> master_mutex_unlock()) to the end of do_readmap() could avoid the
> >> problem, but I didn't test this yet (please see untested patch below).
> >>
> >> Any insight would be much appreciated!
> >>
> >> Thanks,
> >> Leonardo
> >>
> >> Index: autofs/daemon/state.c
> >> ===================================================================
> >> --- autofs.orig/daemon/state.c
> >> +++ autofs/daemon/state.c
> >> @@ -463,7 +463,6 @@ static void *do_readmap(void *arg)
> >>       status = lookup_nss_read_map(ap, NULL, now);
> >>       if (!status)
> >>               pthread_exit(NULL);
> >> -     pthread_cleanup_pop(1);
> >>
> >>       if (ap->type == LKP_INDIRECT) {
> >>               lookup_prune_cache(ap, now);
> >> @@ -504,6 +503,7 @@ static void *do_readmap(void *arg)
> >>       }
> >>
> >>       pthread_cleanup_pop(1);
> >> +     pthread_cleanup_pop(1);
> >>
> >>       return NULL;
> >>  }
> >
> >
> > This might be all we need since once the master map is read the null
> > cache is set up and and can't change while we hold the read lock ....
> >
> > autofs-5.0.5 - fix null cache deadlock
> >
> > From: Ian Kent <ik...@redhat.com>
> >
> > ---
> >
> >  daemon/state.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> >
> > diff --git a/daemon/state.c b/daemon/state.c
> > index 3645440..51809a1 100644
> > --- a/daemon/state.c
> > +++ b/daemon/state.c
> > @@ -473,11 +473,11 @@ static void *do_readmap(void *arg)
> >
> >                mnts = tree_make_mnt_tree(_PROC_MOUNTS, "/");
> >                pthread_cleanup_push(tree_mnts_cleanup, mnts);
> > -               pthread_cleanup_push(master_source_lock_cleanup, ap->entry);
> > -               master_source_readlock(ap->entry);
> >                nc = ap->entry->master->nc;
> >                cache_readlock(nc);
> >                pthread_cleanup_push(cache_lock_cleanup, nc);
> > +               master_source_readlock(ap->entry);
> > +               pthread_cleanup_push(master_source_lock_cleanup, ap->entry);
> >                map = ap->entry->maps;
> >                while (map) {
> >                        /* Is map source up to date or no longer valid */
> >
> 
> Ian,
> 
> Thanks for the patch! It really seems to be enough to resolve this issue: the
> problem didn't happen again after ~10 days testing.

Great, I'll add a description to the patch and commit it.

Ian


_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

Reply via email to