Hi Malahal,

We are using the VFS FSAL.

In the original email, I noted that the parent cache entry in question has qid = LRU_ENTRY_CLEANUP, so I guess the cache entry is in the cleanup queue. Meanwhile, the thread in question is trying to access the entry to fill in post-op attributes for the NFS reply.

The workload is an "rm -rf" of 1000K files that runs for a couple of days, with other I/O in parallel. It does not always crash, and it is hard to recreate. My guess at what is happening here is that a file is getting removed ...

Also, we cannot pick up Ganesha 2.2 yet because of our release cycles. It is in the plan, but it came out only recently. Having said that, the crash you see is with Ganesha 2.1.0 plus the refcount and other patches from 2.2.0. As you said, I also suspect 2.2 may not fix this issue. I just need help debugging it. My question is: when a directory entry is being removed, what lock is supposed to be held on the parent directory's cache entry while the post-op attributes are filled in from it?

Also, the remove operation returned an error from the VFS FSAL.

Valgrind does not show any use-after-free errors or any other significant errors, only a bunch of allocated-but-not-freed memory at the end of a normal exit. I guess that is usual for Ganesha?

Regards.
Krishna Harathi

On Wed, Jun 17, 2015 at 6:36 AM, Malahal Naineni <mala...@us.ibm.com> wrote:
> Hi Krishna,
>
> The code doesn't seem to match V2.1.0 exactly, but it does look like
> nfs3_remove() entered the label "out_fail". I wonder what cache_status
> was at the time of the crash.
>
> There were some refcounting-related fixes in V2.2-stable, but I am not
> sure whether V2.2-stable fixes your issue.
>
> What FSAL are you using? Also, if you can reproduce this under valgrind,
> that should give us more information on whether we are using the freed
> entry itself here.
>
> As I said, I don't see any particular commit that fixes this issue, but
> V2.2-stable is the current release (and it is our long-term release!)
>
> Regards, Malahal.
>
> Krishna Harathi [khara...@exablox.com] wrote:
> > Using Ganesha version 2.1.0, NFSv3 exports and clients.
> > We are seeing the following crash, where Ganesha is trying to access the
> > parent inode in nfs_SetPostOpAttr(). In the crash, we see that the
> > parent's obj_handle is NULL.
> > Is this a known issue, and are there any recent fixes in this area? Any
> > help is appreciated.
> >
> > Thread 1 (LWP 6688):
> > #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
> > #1  0x0050e5d8 in cache_inode_lock_trust_attrs (entry=0x6b424500, need_wr_lock=false)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/cache_inode/cache_inode_misc.c:887
> > #2  0x004a1e04 in cache_entry_to_nfs3_Fattr (entry=0x6b424500, Fattr=0x698092f0)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:3567
> > #3  0x0049a940 in nfs_SetPostOpAttr (entry=0x6b424500, attr=0x698092e8)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:79
> > #4  0x0049abc8 in nfs_SetWccData (before_attr=0x70ffdc00, entry=0x6b424500, wcc_data=0x698092c8)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:132
> > #5  0x00466bbc in nfs3_remove (arg=0x5fc90358, worker=0x6f008140, req=0x5fc902e8, res=0x698092c0)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs3_remove.c:161
> > #6  0x0045b340 in nfs_rpc_execute (req=0x5fc72d30, worker_data=0x6f008140)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1257
> > #7  0x0045bfa8 in worker_run (ctx=0x76562f00)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1506
> > #8  0x00542684 in fridgethr_start_routine (arg=0x76562f00)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/support/fridgethr.c:562
> > #9  0x76f47368 in start_thread () from /lib/mips-linux-gnu/libpthread.so.0
> > #10 0x76e9af18 in fcvt_r () from /lib/mips-linux-gnu/libc.so.6
> > #11 0x00000000 in ?? ()
> > (gdb) f 0
> > #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
> > 939     in /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h
> >
> > (gdb) p entry->obj_handle
> > $1 = (struct fsal_obj_handle *) 0x0
> >
> > Regards.
> > Krishna Harathi
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
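For reference, here is a minimal C sketch of the pattern the question is about. Everything in it is hypothetical: struct entry, struct obj_handle, entry_ref(), entry_unref() and set_post_op_attr() are illustrative stand-ins, not Ganesha's cache_inode API. It also assumes that a reference count, rather than a lock, is what keeps the parent entry from being cleaned up. Under that assumption, the worker pins the parent with its own reference for the whole remove. The post-op fill then takes the attribute lock and checks the handle. If the handle is gone, it reports no attributes (attributes_follow = FALSE in an NFSv3 post_op_attr) instead of dereferencing a NULL obj_handle.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a cached inode entry; these are NOT the
 * real Ganesha cache entry or fsal_obj_handle definitions. */
struct obj_handle {
	unsigned long long size;      /* pretend attribute */
};

struct entry {
	atomic_int refcnt;            /* cache's sentinel ref + one per active user */
	pthread_mutex_t attr_lock;    /* protects obj_handle and its attributes */
	struct obj_handle *obj_handle;
};

static void entry_init(struct entry *e, struct obj_handle *h)
{
	atomic_init(&e->refcnt, 1);   /* the cache's own (sentinel) reference */
	pthread_mutex_init(&e->attr_lock, NULL);
	e->obj_handle = h;
}

/* Pin the entry so cleanup cannot detach its handle while we use it. */
static void entry_ref(struct entry *e)
{
	atomic_fetch_add(&e->refcnt, 1);
}

/* Drop a reference. Dropping the last one "cleans up" by detaching the
 * handle, which mimics the state seen in the backtrace (obj_handle == NULL). */
static void entry_unref(struct entry *e)
{
	if (atomic_fetch_sub(&e->refcnt, 1) == 1) {
		pthread_mutex_lock(&e->attr_lock);
		e->obj_handle = NULL;
		pthread_mutex_unlock(&e->attr_lock);
	}
}

/* Fill post-op attributes defensively: hold the attribute lock and return
 * false ("no attributes follow") rather than dereference a detached handle. */
static bool set_post_op_attr(struct entry *e, unsigned long long *size_out)
{
	bool ok = false;

	pthread_mutex_lock(&e->attr_lock);
	if (e->obj_handle != NULL) {
		*size_out = e->obj_handle->size;
		ok = true;
	}
	pthread_mutex_unlock(&e->attr_lock);
	return ok;
}
```

In this sketch, cleanup only detaches the handle and does not free the entry, so that the NULL obj_handle state from the backtrace stays observable. A real cleanup path would also free the entry once no references remain. The worker's own reference is what matters here: the cache can drop its reference mid-remove, but the attributes stay readable until the worker drops its reference.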