On Mon, Jun 13, 2011 at 11:34 PM, Ian Kent <ra...@themaw.net> wrote: > On Thu, 2011-05-19 at 20:21 -0300, Leonardo Chiquitto wrote: >> Hello, >> >> We received a support request from a customer reporting a hang in the >> automount daemon. Analyzing the core dump, it looks like automount >> can deadlock if two threads execute in the following order: >> >> (All the line numbers are adjusted to match the latest version from Git) >> >> Thread 7 (Thread 25043): Thread 6 (Thread 24007): >> >> #9 start_thread() libpthread.so.0 #3 start_thread() >> libpthread.so.0 >> #8 do_read_master() at automount.c:1259 #2 do_readmap() at >> state.c:479 >> #7 master_read_master() at master.c:853 --> >> master_mutex_lock() [state.c:462] >> --> master_mutex_unlock() >> [state.c:466] >> --> master_mutex_lock() [master.c:836] >> --> cache_writelock() [master.c:838 or :849] >> --> >> master_source_readlock() [state.c:477] >> #6 lookup_nss_read_master() at lookup.c:229 >> #5 do_read_master() at lookup.c:96 >> #4 lookup_read_master() at lookup_ldap.c:1676 >> #3 master_parse_entry() at master_parse.y:829 >> #2 master_add_map_source() at master.c:192 >> #1 master_source_writelock() at master.c:543 #1 cache_readlock() at >> cache.c:60 >> #0 pthread_rwlock_wrlock() libpthread.so.0 #0 pthread_rwlock_rdlock() >> libpthread.so.0 >> >> At this point: >> >> Thread 7: locked(master_mutex_lock) >> locked(cache_writelock) A >> waits(master_source_writelock) B >> Thread 6: locked(master_source_readlock) B >> waits(cache_readlock) A >> >> The AutoFS version is 5.0.5 plus all upstream patches up to >> autofs-5.0.5-fix-submount-shutdown-wait.patch plus >> autofs-5.0.5-fix-out-of-order-locking-in-readmap.patch >> autofs-5.0.5-fix-next-task-list-update.patch >> autofs-5.0.5-fix-stale-map-read.patch >> >> At the first sight, moving pthread_cleanup_pop(1) (and consequently >> master_mutex_unlock()) to the end of do_readmap() could avoid the >> problem, but I didn't test this yet (please see untested patch below). >> >> Any insight would be much appreciated! >> >> Thanks, >> Leonardo >> >> Index: autofs/daemon/state.c >> =================================================================== >> --- autofs.orig/daemon/state.c >> +++ autofs/daemon/state.c >> @@ -463,7 +463,6 @@ static void *do_readmap(void *arg) >> status = lookup_nss_read_map(ap, NULL, now); >> if (!status) >> pthread_exit(NULL); >> - pthread_cleanup_pop(1); >> >> if (ap->type == LKP_INDIRECT) { >> lookup_prune_cache(ap, now); >> @@ -504,6 +503,7 @@ static void *do_readmap(void *arg) >> } >> >> pthread_cleanup_pop(1); >> + pthread_cleanup_pop(1); >> >> return NULL; >> } > > > This might be all we need since once the master map is read the null > cache is set up and and can't change while we hold the read lock .... > > autofs-5.0.5 - fix null cache deadlock > > From: Ian Kent <ik...@redhat.com> > > --- > > daemon/state.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > > diff --git a/daemon/state.c b/daemon/state.c > index 3645440..51809a1 100644 > --- a/daemon/state.c > +++ b/daemon/state.c > @@ -473,11 +473,11 @@ static void *do_readmap(void *arg) > > mnts = tree_make_mnt_tree(_PROC_MOUNTS, "/"); > pthread_cleanup_push(tree_mnts_cleanup, mnts); > - pthread_cleanup_push(master_source_lock_cleanup, ap->entry); > - master_source_readlock(ap->entry); > nc = ap->entry->master->nc; > cache_readlock(nc); > pthread_cleanup_push(cache_lock_cleanup, nc); > + master_source_readlock(ap->entry); > + pthread_cleanup_push(master_source_lock_cleanup, ap->entry); > map = ap->entry->maps; > while (map) { > /* Is map source up to date or no longer valid */ >
Ian, Thanks for the patch! It really seems to be enough to resolve this issue: the problem didn't happen again after ~10 days testing. Thanks, Leonardo _______________________________________________ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs