On 18 Nov 2023, at 21:19, Rick Macklem wrote:

> On Sat, Nov 18, 2023 at 4:43 PM Mike Karels <m...@karels.net> wrote:
>>
>> On 18 Nov 2023, at 17:23, Rick Macklem wrote:
>>
>>> On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote:
>>>>
>>>> On 18 Nov 2023, at 15:58, Rick Macklem wrote:
>>>>
>>>>> On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote:
>>>>>>
>>>>>> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote:
>>>>>>>
>>>>>>> On 17 Nov 2023, at 22:14, Mike Karels wrote:
>>>>>>>
>>>>>>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote:
>>>>>>>>
>>>>>>>>> Most of the changes in stable/13 that are not in releng/13.2
>>>>>>>>> are the "make it work in a jail" stuff. Unfortunately, they are
>>>>>>>>> a large # of changes (mostly trivial edits adding vnet macros),
>>>>>>>>> but they also include export check changes.
>>>>>>>>>
>>>>>>>>> I have attached a trivial patch that I think disables the export
>>>>>>>>> checks for jails. If either of you can try it and see whether it fixes
>>>>>>>>> the problem, that would be great.
>>>>>>>>> (Note that this is only for testing, although it probably does not
>>>>>>>>> matter unless you are running nfsd(8) in vnet jails.)
>>>>>>>>
>>>>>>>> Yes, I can see snapshots with the patch. This system is just a test
>>>>>>>> system that doesn't normally run ZFS or NFS, so there's no problem messing
>>>>>>>> with permissions. It's a bhyve VM, so I just added a small disk and
>>>>>>>> enabled ZFS for testing.
>>>>>>>
>>>>>>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
>>>>>>> side. It must be doing something differently inside a snapshot than
>>>>>>> outside, maybe with file handles or something like that.
>>>>>> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
>>>>>> cc'd) and these guys specifically...
>>>>>>
>>>>>> So, here's what appears to be the problem...
>>>>>> Commit 88175af (in main and stable/13, but not 13.2) added checks for
>>>>>> nfsd(8) running in jails by filling in mnt_exjail with a reference to
>>>>>> the cred used when the file system is exported.
>>>>>> When mnt_exjail is found NULL, the current nfsd code assumes that no
>>>>>> access is allowed for the mount.
>>>>>>
>>>>>> My vague understanding is that when a ZFS snapshot is accessed, it is
>>>>>> "pseudo-mounted" by zfsctl_snapdir_lookup(), and I am guessing that
>>>>>> mnt_exjail is NULL as a result.
>>>>>> Since I do not know the ZFS code and don't even have an easy way to
>>>>>> test this (thankfully Mike can test easily), I do not know what to do
>>>>>> from here.
>>>>>>
>>>>>> Is there a "struct mount" constructed for this pseudo mount?
>>>>>> (Or, since it actually appears to be the lookup of ".." that fails, it
>>>>>> might be the parent of the snapshot subdir?)
>>>>>>
>>>>>> One thought is that I can check to see if the mount pointer is in the
>>>>>> mountlist (I don't think the snapshot's mount is in the mountlist) and
>>>>>> avoid the jail test for this case. This would assume that snapshots are
>>>>>> always within the file system(s) exported via that jail (which includes
>>>>>> the case of prison0, of course), so that they do not need a separate
>>>>>> jail check.
>>>>>>
>>>>>> If this doesn't work, there will need to be some sort of messing about
>>>>>> in ZFS to set mnt_exjail for these.
>>>>> Ok, so now onto the hard part...
>>>>> Thanks to Mike and others, I did create a snapshot under .zfs and I can
>>>>> see the problem. It is that mnt_exjail == NULL.
>>>>> Now, is there a way that this "struct mount" can be recognized as "special"
>>>>> for snapshots, so I can avoid the mnt_exjail == NULL test?
>>>>> (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not
>>>>> the case.)
>>>>
>>>> Dumb question: is the mount point (mp presumably) different between the
>>>> snapshot and the main file system?
>>> Not a dumb question, and the answer is rather interesting...
>>> It is "sometimes" or "usually" according to my printf().
>>> It seems that when you first "cd <snapshot-name>" you get a different mp
>>> where mnt_exjail == NULL. Then when you look at directories within the
>>> snapshot, you get the mp of the file system that .zfs exists in, which does
>>> have mnt_exjail set non-NULL.
>>>
>>> There is this snippet of code in zfsctl_snapdir_lookup():
>>>		/*
>>>		 * Fix up the root vnode mounted on .zfs/snapshot/<snapname>.
>>>		 *
>>>		 * This is where we lie about our v_vfsp in order to
>>>		 * make .zfs/snapshot/<snapname> accessible over NFS
>>>		 * without requiring manual mounts of <snapname>.
>>>		 */
>>>		ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs);
>>>		VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;
>>>
>>>		/* Clear the root flag (set via VFS_ROOT) as well. */
>>>		(*vpp)->v_vflag &= ~VV_ROOT;
>>> which seems to set the mp to that of the parent, but it
>>> seems this does not happen for the initial lookup of
>>> the <snapname>?
>>>
>>> I'll note that there is code before this in
>>> zfsctl_snapdir_lookup() for handling cases
>>> like "." and ".." that return without doing this.
>>>
>>> Now, why does this work without the mnt_exjail
>>> check (as in 13.2)?
>>> I am not quite sure, but there is this "cheat" in the
>>> NFS server (it has been there for years, maybe decades):
>>>		/*
>>>		 * Allow a Lookup, Getattr, GetFH, Secinfo on a
>>>		 * non-exported directory if
>>>		 * nfs_rootfhset. Do I need to allow any other Ops?
>>>		 * (You can only have a non-exported vpnes if
>>>		 *  nfs_rootfhset is true. See nfsd_fhtovp())
>>>		 * Allow AUTH_SYS to be used for file systems
>>>		 * exported GSS only for certain Ops, to allow
>>>		 * clients to do mounts more easily.
>>>		 */
>>>		if (nfsv4_opflag[op].needscfh && vp) {
>>>			if (!NFSVNO_EXPORTED(&vpnes) &&
>>>			    op != NFSV4OP_LOOKUP &&
>>>			    op != NFSV4OP_GETATTR &&
>>>			    op != NFSV4OP_GETFH &&
>>>			    op != NFSV4OP_ACCESS &&
>>>			    op != NFSV4OP_READLINK &&
>>>			    op != NFSV4OP_SECINFO &&
>>>			    op != NFSV4OP_SECINFONONAME)
>>>				nd->nd_repstat = NFSERR_NOFILEHANDLE;
>>> This allows certain operations to be done on
>>> non-exported file systems, and I think that is enough
>>> to allow this to work when mnt_exjail is not checked.
>>> (Note that NFSV4OP_LOOKUPP is not in the list,
>>> which might explain why it is the one that fails for
>>> Garrett. I don't think it can be added to this list
>>> safely, since that would allow a client to move above
>>> the exported file system into "uncharted territory".)
>>>
>>>> Just curious. Also, what is mnt_exjail
>>>> normally set to for file systems not in a jail?
>>> mnt_exjail is set to the credentials of the thread/process
>>> that exported the file system (usually mountd(8)).
>>> When not in a jail, cr_prison for these credentials
>>> points to prison0.
>>>
>>> Btw, I checked, and the other mp that has mnt_exjail == NULL
>>> is in the mountlist, so the idea of checking "not in mountlist"
>>> is a dead end.
>>>
>>> I am looking for something "unique" about this other mp,
>>> but haven't found anything yet.
>>> Alternately, it might be necessary to add code to
>>> zfsctl_snapdir_lookup() to "cheat and change the mp"
>>> in more cases, such as "." and ".." lookups?
>>
>> It seems to me that if ZFS is creating an additional mount structure,
>> it should be responsible for setting it up correctly. That could
>> involve a vfs-level routine to do some of the cloning. In any case,
>> it seems to me that mnt_exjail should be set properly, e.g. by duping
>> the one in the original mount structure. Probably ZFS is the only
>> file system type that would need this added.
> I've created a patch that I think does this. It seemed to test ok for my case.
> It's in D42672 on reviews.freebsd.org.
> I have also attached it here (this diff doesn't use -U999999, so it might be
> easier to apply).
It works for me in 13-stable, and looks plausible.

		Mike

> Hopefully this fixes the problem. Sorry for the breakage.
>
> rick
>
>>
>> Mike
>>
>>> rick
>>> ps: I added all the cc's back in because I want the
>>> ZFS folk to hopefully chime in.
>>>
>>>>
>>>> Mike
>>>>
>>>>> Do I need to search the mountlist for it?
>>>>>
>>>>> rick
>>>>> ps: The hack patch attached should fix the problem, but can only be
>>>>> safely used if mountd/nfsd are not run in any jails.
>>>>>
>>>>>>
>>>>>> I will try and get a test setup going here, which leads me to...
>>>>>> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
>>>>>> on a test machine, but I've never done a snapshot.)
>>>>>>
>>>>>> Although this problem is not in 13.2, it will have shipped in 14.0.
>>>>>>
>>>>>> Any help will be appreciated, rick
>>>>>>
>>>>>>>
>>>>>>> Mike
>>>>>>>>
>>>>>>>>> rick
>>>>>>>>>
>>>>>>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Rick, have you been following this thread on freebsd-stable? I have
>>>>>>>>>> been able to reproduce this using a 13-stable server from Oct 7 and
>>>>>>>>>> a 15-current system that is up to date, using NFSv3. I did not
>>>>>>>>>> reproduce it with a 13.2 server. The client was running 13.2. Any
>>>>>>>>>> ideas? A full bisect seems fairly painful, but maybe you have an
>>>>>>>>>> idea of points to try. Fortunately, these are all test systems that
>>>>>>>>>> I can reboot at will.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> Forwarded message:
>>>>>>>>>>
>>>>>>>>>>> From: Garrett Wollman <woll...@bimajority.org>
>>>>>>>>>>> To: Mike Karels <m...@karels.net>
>>>>>>>>>>> Cc: freebsd-sta...@freebsd.org
>>>>>>>>>>> Subject: Re: NFS exports of ZFS snapshots broken
>>>>>>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
>>>>>>>>>>>
>>>>>>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> said:
>>>>>>>>>>>
>>>>>>>>>>>> I have not run into this, so I tried it just now. I had no problem.
>>>>>>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current,
>>>>>>>>>>>> and the mount is v4.
>>>>>>>>>>>
>>>>>>>>>>> On my 13.2 client and 13-stable server, I see:
>>>>>>>>>>>
>>>>>>>>>>> 25034 ls CALL  open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
>>>>>>>>>>> 25034 ls NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
>>>>>>>>>>> 25034 ls RET   open 4
>>>>>>>>>>> 25034 ls CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
>>>>>>>>>>> 25034 ls RET   fcntl 0
>>>>>>>>>>> 25034 ls CALL  getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
>>>>>>>>>>> 25034 ls RET   getdirentries -1 errno 5 Input/output error
>>>>>>>>>>> 25034 ls CALL  close(0x4)
>>>>>>>>>>> 25034 ls RET   close 0
>>>>>>>>>>> 25034 ls CALL  exit(0)
>>>>>>>>>>>
>>>>>>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
>>>>>>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
>>>>>>>>>>> consistent across both FreeBSD and Linux clients.
>>>>>>>>>>>
>>>>>>>>>>> Looking at this from the RPC side:
>>>>>>>>>>>
>>>>>>>>>>> (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
>>>>>>>>>>> [NFS4_OK for all ops]
>>>>>>>>>>> (PUTFH, GETATTR)
>>>>>>>>>>> [NFS4_OK, NFS4_OK]
>>>>>>>>>>> (PUTFH, ACCESS(0x3f), GETATTR)
>>>>>>>>>>> [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
>>>>>>>>>>> (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
>>>>>>>>>>> [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
>>>>>>>>>>>
>>>>>>>>>>> and at this point the [EIO] is returned.
>>>>>>>>>>>
>>>>>>>>>>> It seems that clients always do a LOOKUPP before calling READDIR, and
>>>>>>>>>>> this is failing when the subject file handle is the snapshot. The
>>>>>>>>>>> client is perfectly able to *traverse into* the snapshot: if I try to
>>>>>>>>>>> list a subdirectory I know exists in the snapshot, the client is able to
>>>>>>>>>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with
>>>>>>>>>>> NFS4ERR_NOFILEHANDLE *on the subdirectory*.
>>>>>>>>>>>
>>>>>>>>>>> -GAWollman