On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote:
>
> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote:
> >
> > On 17 Nov 2023, at 22:14, Mike Karels wrote:
> >
> > > On 17 Nov 2023, at 21:24, Rick Macklem wrote:
> > >
> > >> Most of the changes in stable/13 that are not in releng/13.2
> > >> are the "make it work in a jail" stuff. Unfortunately, they are
> > >> a large # of changes (mostly trivial edits adding vnet macros),
> > >> but it also includes export check changes.
> > >>
> > >> I have attached a trivial patch that I think disables the export
> > >> checks for jails. If either of you can try it and see if it fixes
> > >> the problem, that would be great.
> > >> (Note that this is only for testing, although it probably does not
> > >>  matter unless you are running nfsd(8) in vnet jails.)
> > >
> > > Yes, I can see snapshots with the patch.  This system is just a test
> > > system that doesn't normally run ZFS or NFS, so no problem messing
> > > with permissions.  It's a bhyve VM, so I just added a small disk and
> > > enabled ZFS for testing.
> >
> > btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
> > side.  It must be doing something differently inside a snapshot than
> > outside, maybe with file handles or something like that.
> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
> cc'd) and these guys specifically...
>
> So, here's what appears to be the problem...
> Commit 88175af (in main and stable/13, but not 13.2) added checks for
> nfsd(8) running in jails by filling in mnt_exjail with a reference to the cred
> used when the file system is exported.
> When mnt_exjail is found NULL, the current nfsd code assumes that there
> is no access allowed for the mount.
>
> My vague understanding is that when a ZFS snapshot is accessed, it is
> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that
> mnt_exjail is NULL as a result.
> Since I do not know the ZFS code and don't even have an easy way to
> test this (thankfully Mike can test easily), I do not know what to do from
> here?
>
> Is there a "struct mount" constructed for this pseudo mount
> (or it actually appears to be the lookup of ".." that fails, so it
> might be the parent of the snapshot subdir?)?
>
> One thought is that I can check to see if the mount pointer is in the
> mountlist (I don't think the snapshot's mount is in the mountlist) and
> avoid the jail test for this case.  This would assume that snapshots are
> always within the file system(s) exported via that jail (which includes
> the case of prison0, of course), so that they do not need a separate
> jail check.
>
> If this doesn't work, there will need to be some sort of messing about
> in ZFS to set mnt_exjail for these.
Ok, so now onto the hard part...
Thanks to Mike and others, I did create a snapshot under .zfs and I can
see the problem. It is that mnt_exjail == NULL.
Now, is there a way that this "struct mount" can be recognized as "special"
for snapshots, so I can avoid the mnt_exjail == NULL test?
(I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not
 the case.)

Do I need to search mountlist for it?

rick
ps: The hack patch attached should fix the problem, but can only be
      safely used if mountd/nfsd are not run in any jails.

>
> I will try and get a test setup going here, which leads me to..
> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
> on a test machine, but I've never done a snapshot.)
>
> Although this problem is not in 13.2, it will have shipped in 14.0.
>
> Any help with be appreciated, rick
>
> >
> >                 Mike
> > >
> > >> rick
> > >>
> > >> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote:
> > >>>
> > >>> CAUTION: This email originated from outside of the University of 
> > >>> Guelph. Do not click links or open attachments unless you recognize the 
> > >>> sender and know the content is safe. If in doubt, forward suspicious 
> > >>> emails to ith...@uoguelph.ca.
> > >>>
> > >>>
> > >>> Rick, have you been following this thread on freebsd-stable?  I have 
> > >>> been able
> > >>> to reproduce this using a 13-stable server from Oct 7 and a 15-current 
> > >>> system
> > >>> that is up to date using NFSv3.  I did not reproduce with a 13.2 
> > >>> server.  The
> > >>> client was running 13.2.  Any ideas?  A full bisect seems fairly 
> > >>> painful, but
> > >>> maybe you have an idea of points to try.  Fortunately, these are all 
> > >>> test
> > >>> systems that I can reboot at will.
> > >>>
> > >>>                 Mike
> > >>>
> > >>> Forwarded message:
> > >>>
> > >>>> From: Garrett Wollman <woll...@bimajority.org>
> > >>>> To: Mike Karels <m...@karels.net>
> > >>>> Cc: freebsd-sta...@freebsd.org
> > >>>> Subject: Re: NFS exports of ZFS snapshots broken
> > >>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
> > >>>>
> > >>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> 
> > >>>> said:
> > >>>>
> > >>>>> I have not run into this, so I tried it just now.  I had no problem.
> > >>>>> The server is 13.2, fully patched, the client is up-to-date -current,
> > >>>>> and the mount is v4.
> > >>>>
> > >>>> On my 13.2 client and 13-stable server, I see:
> > >>>>
> > >>>>  25034 ls       CALL  
> > >>>> open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
> > >>>>  25034 ls       NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
> > >>>>  25034 ls       RET   open 4
> > >>>>  25034 ls       CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
> > >>>>  25034 ls       RET   fcntl 0
> > >>>>  25034 ls       CALL  
> > >>>> getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
> > >>>>  25034 ls       RET   getdirentries -1 errno 5 Input/output error
> > >>>>  25034 ls       CALL  close(0x4)
> > >>>>  25034 ls       RET   close 0
> > >>>>  25034 ls       CALL  exit(0)
> > >>>>
> > >>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
> > >>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
> > >>>> consistent across both FreeBSD and Linux clients.
> > >>>>
> > >>>> Looking at this from the RPC side:
> > >>>>
> > >>>>       (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
> > >>>>               [NFS4_OK for all ops]
> > >>>>       (PUTFH, GETATTR)
> > >>>>               [NFS4_OK, NFS4_OK]
> > >>>>       (PUTFH, ACCESS(0x3f), GETATTR)
> > >>>>               [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
> > >>>>       (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
> > >>>>               [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
> > >>>>
> > >>>> and at this point the [EIO] is returned.
> > >>>>
> > >>>> It seems that clients always do a LOOKUPP before calling READDIR, and
> > >>>> this is failing when the subject file handle is the snapshot.  The
> > >>>> client is perfectly able to *traverse into* the snapshot: if I try to
> > >>>> list a subdirectory I know exists in the snapshot, the client is able 
> > >>>> to
> > >>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with
> > >>>> NFS4ERR_NOFILEHANDLE *on the subndirectory*.
> > >>>>
> > >>>> -GAWollman
> > >>>

Attachment: zfsjail.patch
Description: Binary data

Reply via email to