On Monday 22 March 2010 11:47:43 am Steve Polyack wrote:
> On 03/22/10 10:52, Steve Polyack wrote:
> > On 3/19/2010 11:27 PM, Rick Macklem wrote:
> >> On Fri, 19 Mar 2010, Steve Polyack wrote:
> >>
> >> [good stuff snipped]
> >>>
> >>> This makes sense. According to wireshark, the server is indeed
> >>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE
> >>> instead; it sounds more correct than marking it a general IO error.
> >>> Also, the NFS server is serving its share off of a ZFS filesystem,
> >>> if it makes any difference. I suppose ZFS could be talking to the
> >>> NFS server threads with some mismatched language, but I doubt it.
> >>>
> >> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
> >> ESTALE when the file no longer exists, the NFS server returns whatever
> >> error it has returned.
> >>
> >> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
> >> would be a problem that needs to be fixed within ZFS
> >> OR
> >> ZFS returns an error other than ESTALE when it doesn't exist.
> >>
> >> Try the following patch on the server (which just makes any error
> >> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
> >>
> >> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400
> >> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400
> >> @@ -1127,6 +1127,8 @@
> >>                  }
> >>          }
> >>          error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
> >> +        if (error != 0)
> >> +                error = ESTALE;
> >>          vfs_unbusy(mp);
> >>          if (error)
> >>                  goto out;
> >>
> >> Please let me know if the patch helps, rick
> >>
> >>
> > The patch seems to fix the bad behavior. Running with the patch, I
> > see the following output from my patch (return code of nfs_doio from
> > within nfsiod):
> > nfssvc_iod: iod 0 nfs_doio returned errno: 70
> >
> > Furthermore, when inspecting the transaction with Wireshark, after
> > deleting the file on the NFS server it looks like there is only a
> > single error. This time there it is a reply to a V3 Lookup call that
> > contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.
> > The client also does not repeatedly try to complete the failed request.
> >
> > Any suggestions on the next step here? Based on what you said it
> > looks like ZFS is falsely reporting an IO error to VFS instead of
> > ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw
> > returns of EINVAL, but I'm not even sure I'm looking in the right place.
>
> Further on down the rabbit hole... here's the piece in zfs_fhtovp()
> where it's kicking out EINVAL instead of ESTALE - the following patch
> corrects the behavior, but of course also suggests further digging
> within the zfs_zget() function to ensure that _it_ is returning the
> correct thing and whether or not it needs to be handled there or within
> zfs_fhtovp().
>
> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 11:41:21.000000000 -0400
> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 16:25:21.000000000 -0400
> @@ -1246,7 +1246,7 @@
>          dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
>          if (err = zfs_zget(zfsvfs, object, &zp)) {
>                  ZFS_EXIT(zfsvfs);
> -                return (err);
> +                return (ESTALE);
>          }
>          zp_gen = zp->z_phys->zp_gen & gen_mask;
>          if (zp_gen == 0)

So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if
VFS_VGET() (which calls ffs_vget()) fails; it only returns ESTALE if the
generation count doesn't match.

-- 
John Baldwin
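
To make the difference between the two behaviors concrete, here is a minimal user-space sketch in C. It is not kernel code: the toy_* names, the two-entry table, and the choice of EINVAL for a missing object are all assumptions made for illustration (EINVAL only mirrors what was reported for zfs_fhtovp() above). The first policy passes a lookup failure through unchanged and reserves ESTALE for a generation mismatch, which is roughly the behavior described for ffs_fhtovp(); the second treats any lookup failure as a stale handle, which is the shape of the zfs_fhtovp() patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a file handle and an on-disk object. */
struct toy_fid  { uint64_t ino; uint32_t gen; };
struct toy_node { uint64_t ino; uint32_t gen; };

/* Two live objects; anything else counts as deleted. */
static const struct toy_node toy_table[] = { { 2, 7 }, { 5, 1 } };

/*
 * Models the object lookup (the role VFS_VGET() or zfs_zget() plays).
 * Assumption: a missing object yields EINVAL rather than ESTALE.
 */
static int
toy_vget(uint64_t ino, const struct toy_node **npp)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
		if (toy_table[i].ino == ino) {
			*npp = &toy_table[i];
			return (0);
		}
	}
	*npp = NULL;
	return (EINVAL);
}

/* Policy A: pass the lookup error through; ESTALE only on generation mismatch. */
static int
toy_fhtovp_passthrough(const struct toy_fid *fid)
{
	const struct toy_node *np;
	int error;

	error = toy_vget(fid->ino, &np);
	if (error != 0)
		return (error);
	if (np->gen != fid->gen)
		return (ESTALE);
	return (0);
}

/* Policy B: any lookup failure means the handle is stale. */
static int
toy_fhtovp_stale(const struct toy_fid *fid)
{
	const struct toy_node *np;

	if (toy_vget(fid->ino, &np) != 0)
		return (ESTALE);
	if (np->gen != fid->gen)
		return (ESTALE);
	return (0);
}
```

With a handle for a deleted object, policy A hands the caller EINVAL while policy B hands it ESTALE; for a live object with an old generation number both return ESTALE. Whether the mapping belongs in the fhtovp routine or in the lookup it calls is the open question in this thread, and the sketch does not settle it.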