Hi Malahal,

We are using VFS FSAL.

In the original email, I noted that the parent cache entry in question has
qid = LRU_ENTRY_CLEANUP, so I guess the cache entry is in the cleanup queue
while this thread is trying to access the entry to fill in the post-op
attributes in the NFS reply.

The workload is "rm -rf" of 1000K files that runs for a couple of days, with
other IO in parallel. It does not always crash and it is hard to
re-create.  My guess of what is happening here is - a file is getting
removed

Also, we cannot pick up Ganesha 2.2 yet because of our release cycles; it is
in the plan, but it came out only recently.
Having said that, the crash you see is with Ganesha 2.1.0 + refcount and
other patches from 2.2.0.  As you said, I also suspect 2.2 may not fix this
issue.

I just need help in debugging. My question is: when a directory entry is
being removed, at the time of filling in the post-op attributes from the
parent directory cache entry, what lock is supposed to be held on the
parent entry? Also, the remove operation had returned with an error from
the VFS FSAL.

valgrind does not show any use-after-free errors or any other significant
errors, only a bunch of allocated-but-not-freed memory at the end on normal
exit, and that is usual for Ganesha, I guess?

Regards.
Krishna Harathi

On Wed, Jun 17, 2015 at 6:36 AM, Malahal Naineni <mala...@us.ibm.com> wrote:

> Hi Krishna, The code doesn't seem to match exactly with V2.1.0 but it
> does look like nfs3_remove() entered label "out_fail". Wondering what
> the cache_status was at the time of the crash.
>
> There were some fixes in V2.2-stable related to refcounting, but I am not
> sure if V2.2-stable fixes your issues.
>
> What FSAL are you using? Also, if you can reproduce this under valgrind,
> that should give us more information to see if we are using the freed
> entry itself here.
>
> As I said, I don't see any commit in particular that fixes this issue but
> V2.2-stable is the current release (and it is our long term release!)
>
> Regards, Malahal.
>
> Krishna Harathi [khara...@exablox.com] wrote:
> >    Using Ganesha version 2.1.0, NFSv3 exports and clients.
> >    We are seeing the following crash where Ganesha is trying to access
> >    the parent inode to SetPostOpAttr(), and in the crash we see that the
> >    parent obj_handle is NULL.
> >    Is this a known issue, and are there any recent fixes in this area?
> >    Any help is appreciated.
> >
> >  Thread 1 (LWP 6688):
> >  #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
> >  #1  0x0050e5d8 in cache_inode_lock_trust_attrs (entry=0x6b424500, need_wr_lock=false)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/cache_inode/cache_inode_misc.c:887
> >  #2  0x004a1e04 in cache_entry_to_nfs3_Fattr (entry=0x6b424500, Fattr=0x698092f0)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:3567
> >  #3  0x0049a940 in nfs_SetPostOpAttr (entry=0x6b424500, attr=0x698092e8)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:79
> >  #4  0x0049abc8 in nfs_SetWccData (before_attr=0x70ffdc00, entry=0x6b424500, wcc_data=0x698092c8)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:132
> >  #5  0x00466bbc in nfs3_remove (arg=0x5fc90358, worker=0x6f008140, req=0x5fc902e8, res=0x698092c0)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs3_remove.c:161
> >  #6  0x0045b340 in nfs_rpc_execute (req=0x5fc72d30, worker_data=0x6f008140)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1257
> >  #7  0x0045bfa8 in worker_run (ctx=0x76562f00)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1506
> >  #8  0x00542684 in fridgethr_start_routine (arg=0x76562f00)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/support/fridgethr.c:562
> >  #9  0x76f47368 in start_thread () from /lib/mips-linux-gnu/libpthread.so.0
> >  #10 0x76e9af18 in fcvt_r () from /lib/mips-linux-gnu/libc.so.6
> >  #11 0x00000000 in ?? ()
> >  (gdb) f 0
> >  #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
> >      at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
> >  939    in /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h
> >
> >  (gdb) p entry->obj_handle
> >  $1 = (struct fsal_obj_handle *) 0x0
> >
> >    Regards.
> >    Krishna Harathi
>
> >
> ------------------------------------------------------------------------------
>
> > _______________________________________________
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>