On 07/28/2016 09:46 AM, Soumya Koduri wrote:
>
>
> On 07/27/2016 10:45 AM, Soumya Koduri wrote:
>>>>
>>>> There could be another issue too. I have yet to create a reliable test
>>>> to trigger this but I often receive stale file handles in situations
>>>> where I would not expect to. Almost always after a server restart where
>>>> client does not umount and sometimes after data has been updated by
>>>> another client. I'll send a new message to the lists when I have some
>>>> repeatable examples.
>> I see this issue too. Will look into it.
>
> Changing the subject to reflect this other issue. CCing Frank for inputs.
> I see this issue even with FSAL_VFS. It seems to happen whenever
> NFS-Ganesha populates an already existing directory into its cache as
> part of readdir.
>
> One of the methods to reproduce this issue via VFS:
>
> 1) Have a share with a few directories already created
> 2) Export it via Ganesha
> 3) Mount the share and run "find ."
>
> We see "Stale file handle" errors for those directory entries, yet no
> errors are sent from the server.
>
> I checked the rpcdebug logs on the client side -
>
> [root@dhcp35-197 ~]# cd /mnt
> [root@dhcp35-197 mnt]# find .
> .
> ./ganesha
> find: ‘./ganesha’: Stale file handle
> ./tmp-dir
> find: ‘./tmp-dir’: Stale file handle
>
> [178230.657482] NFS: nfs_update_inode(0:46/302684 fh_crc=0xcc2e1973 ct=2
> info=0x427e7f)
> [178230.657485] NFS: nfs_fhget(0:46/302684 fh_crc=0xcc2e1973 ct=2)
> [178230.657488] <-- nfs_xdev_mount() = -116
> [178230.657491] nfs_do_submount: done
> [178230.657493] <-- nfs_do_submount() = ffffffffffffff8c
> [178230.657504] <-- nfs_d_automount(): error -116
> [178230.657506] NFS: dentry_delete(/tmp-dir, 3a00cc)
> [178230.659927] NFS: permission(0:46/277019), mask=0x41, res=0
> [178235.547441] NFS: permission(0:46/277019), mask=0x81, res=0
> [178235.648999] nfs4_renew_state: start
> [178235.649011] nfs4_renew_state: failed to call renewd. Reason: lease
> not expired
> [178235.649012] nfs4_schedule_state_renewal: requeueing work. Lease
> period = 36
> [178235.649014] nfs4_renew_state: done
>
> whereas for the 2.3 build -
>
> [158104.422011] NFS: nfs_update_inode(0:44/6 fh_crc=0xeca9ed83 ct=1
> info=0x26040)
> [158104.422013] NFS: permission(0:44/6), mask=0x24, res=0
> [158104.422015] NFS: open dir(.trashcan/internal_op)
> [158104.422020] NFS: permission(0:44/5), mask=0x81, res=0
> [158104.422022] NFS: nfs_lookup_revalidate(.trashcan/internal_op) is valid
> [158104.422025] NFS: readdir(.trashcan/internal_op) starting at cookie 0
> [158104.422027] NFS: nfs_do_filldir() filling ended @ cookie 2;
> returning = 0
> [158104.422027] NFS: readdir(.trashcan/internal_op) returns 0
> [158104.422029] NFS: readdir(.trashcan/internal_op) starting at cookie 2
> [158104.422030] NFS: readdir(.trashcan/internal_op) returns 0
> [158104.422033] NFS: dentry_delete(.trashcan/internal_op, 2800cc)
> [158104.422039] NFS: permission(0:44/1), mask=0x41, res=0
>
>
> I do not know how the NFS client processes readdir responses, but it
> is taking a different path in the case of 2.4. From the packet traces I
> collected for both, one difference I see is in the fsid values of the
> directory entries in the readdir response: they have proper values with
> 2.3 but are {0,0} with 2.4. I am not sure if this is the actual cause.
> When checked via gdb, I see that the attrs returned by the FSAL contain
> the proper fsid, but somehow that is not reflected in the readdir
> response. Still looking into it.
>
> Thanks,
> Soumya
>
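-116 is ESTALE. The nfs_d_automount() / nfs_do_submount() /
nfs_xdev_mount() lines in your trace are the client trying to set up a
submount because it has decided the entry belongs to a different
filesystem than its parent, and that decision comes from the fsid in
the returned attributes, so a bogus {0,0} fsid would send every
directory down that path. Very roughly, the idea is something like the
sketch below (an illustration only, not the actual Linux client code;
all names are made up):

/* Illustration only -- not the Linux NFS client source. */
#include <stdbool.h>
#include <stdint.h>

struct demo_fsid { uint64_t major; uint64_t minor; };

static bool demo_fsid_equal(const struct demo_fsid *a,
                            const struct demo_fsid *b)
{
        return a->major == b->major && a->minor == b->minor;
}

/*
 * If the fsid reported for a directory entry does not match the fsid
 * of the mounted filesystem, the client treats the entry as a
 * server-side mount point and marks it for automount; crossing it then
 * runs the submount machinery seen in the trace, and when that cannot
 * be set up the lookup surfaces ESTALE instead.
 */
static bool demo_is_mountpoint_crossing(const struct demo_fsid *sb_fsid,
                                        const struct demo_fsid *entry_fsid)
{
        return !demo_fsid_equal(sb_fsid, entry_fsid);
}
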
fsid was broken in the attribute-copy changes. I'll submit a potential
fix for Frank to look at.
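Roughly, the kind of thing I mean is sketched below (a hand-written
illustration of the bug class, not the actual nfs-ganesha code or the
fix I'll post): if the copy helper only carries over the fields it was
taught about and fsid is not among them, every cached entry ends up
advertising an fsid of {0,0} to the protocol layer.

/* Illustrative sketch only -- not the nfs-ganesha source. */
#include <stdint.h>
#include <string.h>

struct sketch_fsid { uint64_t major; uint64_t minor; };

struct sketch_attrs {
        uint32_t           valid;   /* bitmap of valid fields */
        uint64_t           fileid;
        struct sketch_fsid fsid;
        /* ... remaining attributes elided ... */
};

#define SK_ATTR_FILEID 0x01
#define SK_ATTR_FSID   0x02

/*
 * Buggy copy: fileid travels, fsid is forgotten, so the destination
 * keeps its zero-initialised {0,0} fsid -- which is what then shows up
 * in the READDIR reply on the wire.
 */
static void sketch_copy_attrs_buggy(struct sketch_attrs *dst,
                                    const struct sketch_attrs *src)
{
        memset(dst, 0, sizeof(*dst));
        if (src->valid & SK_ATTR_FILEID) {
                dst->fileid = src->fileid;
                dst->valid |= SK_ATTR_FILEID;
        }
        /* fsid copy is missing here */
}

/* Fixed copy: fsid is carried along with the rest of the attributes. */
static void sketch_copy_attrs_fixed(struct sketch_attrs *dst,
                                    const struct sketch_attrs *src)
{
        sketch_copy_attrs_buggy(dst, src);
        if (src->valid & SK_ATTR_FSID) {
                dst->fsid = src->fsid;
                dst->valid |= SK_ATTR_FSID;
        }
}
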
Daniel