On Sat, Nov 18, 2023 at 8:10 PM Mike Karels <m...@karels.net> wrote: > > On 18 Nov 2023, at 21:19, Rick Macklem wrote: > > > On Sat, Nov 18, 2023 at 4:43 PM Mike Karels <m...@karels.net> wrote: > >> > >> On 18 Nov 2023, at 17:23, Rick Macklem wrote: > >> > >>> On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote: > >>>> > >>>> On 18 Nov 2023, at 15:58, Rick Macklem wrote: > >>>> > >>>>> On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote: > >>>>>>> > >>>>>>> On 17 Nov 2023, at 22:14, Mike Karels wrote: > >>>>>>> > >>>>>>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote: > >>>>>>>> > >>>>>>>>> Most of the changes in stable/13 that are not in releng/13.2 > >>>>>>>>> are the "make it work in a jail" stuff. Unfortunately, they are > >>>>>>>>> a large # of changes (mostly trivial edits adding vnet macros), > >>>>>>>>> but it also includes export check changes. > >>>>>>>>> > >>>>>>>>> I have attached a trivial patch that I think disables the export > >>>>>>>>> checks for jails. If either of you can try it and see if it fixes > >>>>>>>>> the problem, that would be great. > >>>>>>>>> (Note that this is only for testing, although it probably does not > >>>>>>>>> matter unless you are running nfsd(8) in vnet jails.) > >>>>>>>> > >>>>>>>> Yes, I can see snapshots with the patch. This system is just a test > >>>>>>>> system that doesn't normally run ZFS or NFS, so no problem messing > >>>>>>>> with permissions. It's a bhyve VM, so I just added a small disk and > >>>>>>>> enabled ZFS for testing. > >>>>>>> > >>>>>>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS > >>>>>>> side. It must be doing something differently inside a snapshot than > >>>>>>> outside, maybe with file handles or something like that. > >>>>>> Yes. I've added freebsd-current@ (although Garrett is not on it, he is > >>>>>> cc'd) and these guys specifically... > >>>>>> > >>>>>> So, here's what appears to be the problem... > >>>>>> Commit 88175af (in main and stable/13, but not 13.2) added checks for > >>>>>> nfsd(8) running in jails by filling in mnt_exjail with a reference to > >>>>>> the cred > >>>>>> used when the file system is exported. > >>>>>> When mnt_exjail is found NULL, the current nfsd code assumes that there > >>>>>> is no access allowed for the mount. > >>>>>> > >>>>>> My vague understanding is that when a ZFS snapshot is accessed, it is > >>>>>> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that > >>>>>> mnt_exjail is NULL as a result. > >>>>>> Since I do not know the ZFS code and don't even have an easy way to > >>>>>> test this (thankfully Mike can test easily), I do not know what to do > >>>>>> from > >>>>>> here? > >>>>>> > >>>>>> Is there a "struct mount" constructed for this pseudo mount > >>>>>> (or it actually appears to be the lookup of ".." that fails, so it > >>>>>> might be the parent of the snapshot subdir?)? > >>>>>> > >>>>>> One thought is that I can check to see if the mount pointer is in the > >>>>>> mountlist (I don't think the snapshot's mount is in the mountlist) and > >>>>>> avoid the jail test for this case. This would assume that snapshots > >>>>>> are > >>>>>> always within the file system(s) exported via that jail (which includes > >>>>>> the case of prison0, of course), so that they do not need a separate > >>>>>> jail check. > >>>>>> > >>>>>> If this doesn't work, there will need to be some sort of messing about > >>>>>> in ZFS to set mnt_exjail for these. > >>>>> Ok, so now onto the hard part... > >>>>> Thanks to Mike and others, I did create a snapshot under .zfs and I can > >>>>> see the problem. It is that mnt_exjail == NULL. > >>>>> Now, is there a way that this "struct mount" can be recognized as > >>>>> "special" > >>>>> for snapshots, so I can avoid the mnt_exjail == NULL test? > >>>>> (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not > >>>>> the case.) > >>>> > >>>> Dumb question, is the mount point (mp presumably) different between the > >>>> snapshot and the main file system? > >>> Not a dump question and the answer is rather interesting... > >>> It is "sometimes" or "usually" according to my printf(). > >>> It seems that when you first "cd <snapshot-name"" you get a different mp > >>> where mnt_exjail == NULL.. Then when you look at directories within the > >>> snapshot, you get the mp of the file system that .zfs exists in, which > >>> does > >>> have mnt_exjail set non-NULL. > >>> > >>> There is this snippet of code in zfsctl_snapdir_lookup(): > >>> /* > >>> * Fix up the root vnode mounted on .zfs/snapshot/<snapname>. > >>> * > >>> * This is where we lie about our v_vfsp in order to > >>> * make .zfs/snapshot/<snapname> accessible over NFS > >>> * without requiring manual mounts of <snapname>. > >>> */ > >>> ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs); > >>> VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs; > >>> > >>> /* Clear the root flag (set via VFS_ROOT) as well. */ > >>> (*vpp)->v_vflag &= ~VV_ROOT; > >>> which seems to set the mp to that of the parent, but it > >>> seems this does not happen for the initial lookup of > >>> the <snapname>? > >>> > >>> I'll note that there is code before this in > >>> zfsctl_snapdir_lookup() for handling cases > >>> like "." and ".." that return without doing this. > >>> > >>> Now, why does this work without the mnt_exjail > >>> check (as in 13.2)? > >>> I am not quite sure, but there is this "cheat" in the > >>> NFS server (it has been there for years, maybe decades): > >>> /* > >>> * Allow a Lookup, Getattr, GetFH, Secinfo on an > >>> * non-exported directory if > >>> * nfs_rootfhset. Do I need to allow any other Ops? > >>> * (You can only have a non-exported vpnes if > >>> * nfs_rootfhset is true. See nfsd_fhtovp()) > >>> * Allow AUTH_SYS to be used for file systems > >>> * exported GSS only for certain Ops, to allow > >>> * clients to do mounts more easily. > >>> */ > >>> if (nfsv4_opflag[op].needscfh && vp) { > >>> if (!NFSVNO_EXPORTED(&vpnes) && > >>> op != NFSV4OP_LOOKUP && > >>> op != NFSV4OP_GETATTR && > >>> op != NFSV4OP_GETFH && > >>> op != NFSV4OP_ACCESS && > >>> op != NFSV4OP_READLINK && > >>> op != NFSV4OP_SECINFO && > >>> op != NFSV4OP_SECINFONONAME) > >>> nd->nd_repstat = NFSERR_NOFILEHANDLE; > >>> This allows certain operations to be done on > >>> non-exported file systems and I think that is enough > >>> to allow this to work when mnt_exjail is not checked. > >>> (Note that NFSV4OP_LOOKUPP is not in the list, > >>> which might explain why it is the one that fails for > >>> Garrett. I don't think it can be added to this list > >>> safely, since that would allow a client to move above > >>> the exported file system into "uncharted territory".) > >>> > >>>> Just curious. Also, what is mnt_exjail > >>>> normally set to for file systems not in a jail? > >>> mnt_exjail is set to the credentials of the thread/process > >>> that exported the file system (usually mountd(8)). > >>> When not in a jail, cr_prison for these credentials > >>> points to prison0. > >>> > >>> Btw, I checked and the "other mp that has mnt_exjail == NULL > >>> is in the mountlist, so the idea of checking "not in mountlist" > >>> is a dead end. > >>> > >>> I am looking for something "unique" about this other mp, > >>> but haven't found anything yet. > >>> Alternately, it might be necessary to add code to > >>> zfsctl_snapdir_lookup() to "cheat and change the mp" > >>> in more cases, such as "." and ".." lookups? > >> > >> It seems to me that if ZFS is creating an additional mount structure, > >> it should be responsible for setting it up correctly. That could > >> involve a vfs-level routine to do some of the cloning. In any case, > >> it seems to me that mnt_exjail should be set properly, e.g. by duping > >> the one in the original mount structure. Probably ZFS is the only > >> file system type that would need this added. > > I've created a patch that I think does this. It seemed to test ok for my > > case. > > It's in D42672 on reviews.freebsd.org. > > I have also attached it here (this diff doesn't use -U999999, so it might be > > easier to apply). > > It works for me in 13-stable, and looks plausible.
The patch has now been fixed so that there is no race between vfs_exjail_clone() and a jail dying. I believe the current patch fixes the problem. The patch can be found as an attachment to PR#275200 (that is a FreeBSD bugzilla problem report, not an OpenZFS pull request). I will not close the PR until the above has all happened. This patch will be needed for systems running stable/13, stable/14 and 14.0. The vfs_exjail_clone() part of the patch is now committed to main and will be MFC'd in 3 days. The ZFS part of the patch is a pull request on OpenZFS. I cannot say how long the ZFS part takes or if/when an errata to 14.0 will happen. Sorry about the breakage and thanks for reporting it (and persevering when I said I knew nothing about ZFS, which is true). Thanks goes to Mike Karels, who was able to determine that the problem was not in 13.2, but was in stable/13, which allowed me to conclude it was a "nfsd in jail" related breakage. rick > > Mike > > > Hopefully this fixes the problem. Sorry for the breakage. > > > > rick > > > >> > >> Mike > >> > >>> rick > >>> ps: I added all the cc's back in because I want the > >>> ZFS folk to hopefully chime in. > >>> > >>>> > >>>> Mike > >>>> > >>>>> Do I need to search mountlist for it? > >>>>> > >>>>> rick > >>>>> ps: The hack patch attached should fix the problem, but can only be > >>>>> safely used if mountd/nfsd are not run in any jails. > >>>>> > >>>>>> > >>>>>> I will try and get a test setup going here, which leads me to.. > >>>>>> how do I create a ZFS snapshot? (I do have a simple ZFS pool running > >>>>>> on a test machine, but I've never done a snapshot.) > >>>>>> > >>>>>> Although this problem is not in 13.2, it will have shipped in 14.0. > >>>>>> > >>>>>> Any help with be appreciated, rick > >>>>>> > >>>>>>> > >>>>>>> Mike > >>>>>>>> > >>>>>>>>> rick > >>>>>>>>> > >>>>>>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote: > >>>>>>>>>> > >>>>>>>>>> CAUTION: This email originated from outside of the University of > >>>>>>>>>> Guelph. Do not click links or open attachments unless you > >>>>>>>>>> recognize the sender and know the content is safe. If in doubt, > >>>>>>>>>> forward suspicious emails to ith...@uoguelph.ca. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Rick, have you been following this thread on freebsd-stable? I > >>>>>>>>>> have been able > >>>>>>>>>> to reproduce this using a 13-stable server from Oct 7 and a > >>>>>>>>>> 15-current system > >>>>>>>>>> that is up to date using NFSv3. I did not reproduce with a 13.2 > >>>>>>>>>> server. The > >>>>>>>>>> client was running 13.2. Any ideas? A full bisect seems fairly > >>>>>>>>>> painful, but > >>>>>>>>>> maybe you have an idea of points to try. Fortunately, these are > >>>>>>>>>> all test > >>>>>>>>>> systems that I can reboot at will. > >>>>>>>>>> > >>>>>>>>>> Mike > >>>>>>>>>> > >>>>>>>>>> Forwarded message: > >>>>>>>>>> > >>>>>>>>>>> From: Garrett Wollman <woll...@bimajority.org> > >>>>>>>>>>> To: Mike Karels <m...@karels.net> > >>>>>>>>>>> Cc: freebsd-sta...@freebsd.org > >>>>>>>>>>> Subject: Re: NFS exports of ZFS snapshots broken > >>>>>>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500 > >>>>>>>>>>> > >>>>>>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels > >>>>>>>>>>> <m...@karels.net> said: > >>>>>>>>>>> > >>>>>>>>>>>> I have not run into this, so I tried it just now. I had no > >>>>>>>>>>>> problem. > >>>>>>>>>>>> The server is 13.2, fully patched, the client is up-to-date > >>>>>>>>>>>> -current, > >>>>>>>>>>>> and the mount is v4. > >>>>>>>>>>> > >>>>>>>>>>> On my 13.2 client and 13-stable server, I see: > >>>>>>>>>>> > >>>>>>>>>>> 25034 ls CALL > >>>>>>>>>>> open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>) > >>>>>>>>>>> 25034 ls NAMI "/mnt/tools/.zfs/snapshot/weekly-2023-45" > >>>>>>>>>>> 25034 ls RET open 4 > >>>>>>>>>>> 25034 ls CALL fcntl(0x4,F_ISUNIONSTACK,0x0) > >>>>>>>>>>> 25034 ls RET fcntl 0 > >>>>>>>>>>> 25034 ls CALL > >>>>>>>>>>> getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028) > >>>>>>>>>>> 25034 ls RET getdirentries -1 errno 5 Input/output error > >>>>>>>>>>> 25034 ls CALL close(0x4) > >>>>>>>>>>> 25034 ls RET close 0 > >>>>>>>>>>> 25034 ls CALL exit(0) > >>>>>>>>>>> > >>>>>>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO] > >>>>>>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is > >>>>>>>>>>> consistent across both FreeBSD and Linux clients. > >>>>>>>>>>> > >>>>>>>>>>> Looking at this from the RPC side: > >>>>>>>>>>> > >>>>>>>>>>> (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR) > >>>>>>>>>>> [NFS4_OK for all ops] > >>>>>>>>>>> (PUTFH, GETATTR) > >>>>>>>>>>> [NFS4_OK, NFS4_OK] > >>>>>>>>>>> (PUTFH, ACCESS(0x3f), GETATTR) > >>>>>>>>>>> [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK] > >>>>>>>>>>> (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR) > >>>>>>>>>>> [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE] > >>>>>>>>>>> > >>>>>>>>>>> and at this point the [EIO] is returned. > >>>>>>>>>>> > >>>>>>>>>>> It seems that clients always do a LOOKUPP before calling READDIR, > >>>>>>>>>>> and > >>>>>>>>>>> this is failing when the subject file handle is the snapshot. The > >>>>>>>>>>> client is perfectly able to *traverse into* the snapshot: if I > >>>>>>>>>>> try to > >>>>>>>>>>> list a subdirectory I know exists in the snapshot, the client is > >>>>>>>>>>> able to > >>>>>>>>>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with > >>>>>>>>>>> NFS4ERR_NOFILEHANDLE *on the subndirectory*. > >>>>>>>>>>> > >>>>>>>>>>> -GAWollman > >>>>>>>>>>