Hmm, well, that’s easy to fix…


Instead of:

    mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);
    goto next_lane;

it could release the queue lane lock first:

    QUNLOCK(qlane);
    mdcache_put(entry);
    continue;



Fix posted here:



https://review.gerrithub.io/371764



Frank





From: Pradeep [mailto:pradeep.tho...@gmail.com]
Sent: Friday, July 28, 2017 12:44 PM
To: nfs-ganesha-devel@lists.sourceforge.net
Subject: [Nfs-ganesha-devel] deadlock in lru_reap_impl()





I'm hitting another deadlock in mdcache on a 2.5.1 base. In this case, two
threads are at different places in lru_reap_impl().



Thread 1:



    636                 QLOCK(qlane);
    637                 lru = glist_first_entry(&lq->q, mdcache_lru_t, q);
    638                 if (!lru)
    639                         goto next_lane;
    640                 refcnt = atomic_inc_int32_t(&lru->refcnt);
    641                 entry = container_of(lru, mdcache_entry_t, lru);
    642                 if (unlikely(refcnt != (LRU_SENTINEL_REFCOUNT + 1))) {
    643                         /* cant use it. */
    644                         mdcache_lru_unref(entry, LRU_UNREF_QLOCKED);



mdcache_lru_unref() could lead to the set of calls below:

mdcache_lru_unref() -> mdcache_lru_clean() -> mdc_clean_entry() -> cih_remove_checked()



cih_remove_checked() tries to take the hash partition lock, which is held by
Thread 2. Thread 2, in turn, is waiting for the queue lane lock that Thread 1
still holds.



Thread 2:

    650                 if (cih_latch_entry(&entry->fh_hk.key, &latch, CIH_GET_WLOCK,
    651                                     __func__, __LINE__)) {
    652                         QLOCK(qlane);
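The inversion between the two threads can be reduced to a small, deterministic sketch. This is not Ganesha code: lane and partition are hypothetical stand-ins (the real partition lock is an rwlock, a plain mutex is used here for simplicity), and a single thread plays both roles, using pthread_mutex_trylock() to show that each side's next acquisition would block.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the queue lane lock and the partition lock. */
static pthread_mutex_t lane = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t partition = PTHREAD_MUTEX_INITIALIZER;

/* Simulates the reported interleaving on one thread.
 * Returns 1 if neither side could take the lock it needs next. */
int inversion_would_deadlock(void)
{
	int rc, t1_blocked, t2_blocked;

	/* "Thread 1": QLOCK(qlane), then heads into the unref/cleanup path. */
	rc = pthread_mutex_lock(&lane);
	assert(rc == 0);

	/* "Thread 2": has latched the entry, so it holds the partition lock. */
	rc = pthread_mutex_lock(&partition);
	assert(rc == 0);

	/* Thread 1 now wants the partition lock (cih_remove_checked()). */
	t1_blocked = (pthread_mutex_trylock(&partition) == EBUSY);

	/* Thread 2 now wants the lane lock (QLOCK(qlane)). */
	t2_blocked = (pthread_mutex_trylock(&lane) == EBUSY);

	pthread_mutex_unlock(&partition);
	pthread_mutex_unlock(&lane);
	return t1_blocked && t2_blocked;
}
```

With blocking lock calls in place of trylock, both threads would wait forever, which matches the stack traces below.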



Stack traces:



Thread 1:


#0  0x00007f571328103e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
#1  0x000000000052f928 in cih_remove_checked (entry=0x7f548e86c400)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_hash.h:394
#2  0x0000000000530805 in mdc_clean_entry (entry=0x7f548e86c400)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:272
#3  0x000000000051df7e in mdcache_lru_clean (entry=0x7f548e86c400)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:590
#4  0x00000000005229c0 in _mdcache_lru_unref (entry=0x7f548e86c400, flags=8, func=0x58b5c0 <__func__.23710> "lru_reap_impl", line=687)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1918
#5  0x000000000051e83a in lru_reap_impl (qid=LRU_ENTRY_L1)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:687



Thread 2:

#0  0x00007f57132841bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f571327fd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f571327fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x000000000051e4f5 in lru_reap_impl (qid=LRU_ENTRY_L1)
    at /usr/src/debug/nfs-ganesha-2.5.1-0.1.1-Source/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:652








