On Mon, Jun 13, 2011 at 11:34 PM, Ian Kent <ra...@themaw.net> wrote:
> On Thu, 2011-05-19 at 20:21 -0300, Leonardo Chiquitto wrote:
>> Hello,
>>
>> We received a support request from a customer reporting a hang in the
>> automount daemon. Analyzing the core dump, it looks like automount
>> can deadlock if two threads execute in the following order:
>>
>> (All the line numbers are adjusted to match the latest version from Git)
>>
>> Thread 7 (Thread 25043):                      Thread 6 (Thread 24007):
>>
>> #9 start_thread() libpthread.so.0             #3  start_thread() 
>> libpthread.so.0
>> #8 do_read_master() at automount.c:1259               #2  do_readmap() at 
>> state.c:479
>> #7 master_read_master() at master.c:853                   --> 
>> master_mutex_lock() [state.c:462]
>>                                                   --> master_mutex_unlock() 
>> [state.c:466]
>>    --> master_mutex_lock() [master.c:836]
>>    --> cache_writelock() [master.c:838 or :849]
>>                                                   --> 
>> master_source_readlock() [state.c:477]
>> #6 lookup_nss_read_master() at lookup.c:229
>> #5 do_read_master() at lookup.c:96
>> #4 lookup_read_master() at lookup_ldap.c:1676
>> #3 master_parse_entry() at master_parse.y:829
>> #2 master_add_map_source() at master.c:192
>> #1 master_source_writelock() at master.c:543  #1  cache_readlock() at 
>> cache.c:60
>> #0 pthread_rwlock_wrlock() libpthread.so.0    #0  pthread_rwlock_rdlock() 
>> libpthread.so.0
>>
>> At this point:
>>
>>   Thread 7: locked(master_mutex_lock)
>>           locked(cache_writelock)             A
>>           waits(master_source_writelock)      B
>>   Thread 6: locked(master_source_readlock)    B
>>           waits(cache_readlock)               A
>>
>> The AutoFS version is 5.0.5 plus all upstream patches up to
>> autofs-5.0.5-fix-submount-shutdown-wait.patch plus
>> autofs-5.0.5-fix-out-of-order-locking-in-readmap.patch
>> autofs-5.0.5-fix-next-task-list-update.patch
>> autofs-5.0.5-fix-stale-map-read.patch
>>
>> At the first sight, moving pthread_cleanup_pop(1) (and consequently
>> master_mutex_unlock()) to the end of do_readmap() could avoid the
>> problem, but I didn't test this yet (please see untested patch below).
>>
>> Any insight would be much appreciated!
>>
>> Thanks,
>> Leonardo
>>
>> Index: autofs/daemon/state.c
>> ===================================================================
>> --- autofs.orig/daemon/state.c
>> +++ autofs/daemon/state.c
>> @@ -463,7 +463,6 @@ static void *do_readmap(void *arg)
>>       status = lookup_nss_read_map(ap, NULL, now);
>>       if (!status)
>>               pthread_exit(NULL);
>> -     pthread_cleanup_pop(1);
>>
>>       if (ap->type == LKP_INDIRECT) {
>>               lookup_prune_cache(ap, now);
>> @@ -504,6 +503,7 @@ static void *do_readmap(void *arg)
>>       }
>>
>>       pthread_cleanup_pop(1);
>> +     pthread_cleanup_pop(1);
>>
>>       return NULL;
>>  }
>
>
> This might be all we need since once the master map is read the null
> cache is set up and and can't change while we hold the read lock ....
>
> autofs-5.0.5 - fix null cache deadlock
>
> From: Ian Kent <ik...@redhat.com>
>
> ---
>
>  daemon/state.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
>
> diff --git a/daemon/state.c b/daemon/state.c
> index 3645440..51809a1 100644
> --- a/daemon/state.c
> +++ b/daemon/state.c
> @@ -473,11 +473,11 @@ static void *do_readmap(void *arg)
>
>                mnts = tree_make_mnt_tree(_PROC_MOUNTS, "/");
>                pthread_cleanup_push(tree_mnts_cleanup, mnts);
> -               pthread_cleanup_push(master_source_lock_cleanup, ap->entry);
> -               master_source_readlock(ap->entry);
>                nc = ap->entry->master->nc;
>                cache_readlock(nc);
>                pthread_cleanup_push(cache_lock_cleanup, nc);
> +               master_source_readlock(ap->entry);
> +               pthread_cleanup_push(master_source_lock_cleanup, ap->entry);
>                map = ap->entry->maps;
>                while (map) {
>                        /* Is map source up to date or no longer valid */
>

Ian,

Thanks for the patch! It really seems to be enough to resolve this issue: the
problem didn't happen again after ~10 days testing.

Thanks,
Leonardo

_______________________________________________
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs

Reply via email to