On Thu, Sep 14, 2023 at 11:48:16PM -0700, Eric Biggers wrote:
> On Thu, Sep 14, 2023 at 08:47:41PM -0400, Josef Bacik wrote:
> > Hello,
> > 
> > This is meant as a replacement for the last set of patches Sweet Tea sent 
> > [1].
> > This is an attempt to find a different path forward.  Strip down everything 
> > to
> > the basics.  Essentially all we appear to need is a nonce, and then we can 
> > use
> > the inode context to derive per-extent keys.
> > 
> > I'm sending this as an RFC to see if this is a better direction to try and 
> > make
> > some headway on this project.  The btrfs side doesn't change too much, the 
> > code
> > just needs to be adjusted to use the new helpers for the extent contexts.  I
> > have this work mostly complete, but I'm afraid I won't have it ready for 
> > another
> > day or two and I want to get feedback on this ASAP before I burn too much 
> > time
> > on it.
> > 
> > Additionally there is a callback I've put in the inline block crypto stuff 
> > that
> > we need in order to handle the checksumming.  I made my best guess here as 
> > to
> > what would be the easiest and simplest way to acheive what we need, but I'm 
> > open
> > to suggestions here.
> > 
> > The other note is I've disabled all of the policy variations other than 
> > default
> > v2 policies if you enable extent encryption.  This is for simplicity sake.  
> > We
> > could probably make most of it work, but reflink is basically impossible 
> > for v1
> > with direct key, and is problematic for the lblk related options.  It 
> > appears
> > this is fine, as those other modes are for specific use cases and the vast
> > majority of normal users are encouraged to use normal v2 policies anyway.
> > 
> > This stripped down version gives us most of what we want, we can reflink 
> > between
> > different inodes that have the same policy.  We lose the ability to mix
> > differently encrypted extents in the same inode, but this is an acceptable
> > limitation for now.
> > 
> > This has only been compile tested, and as I've said I haven't wired it
> > completely up into btrfs yet.  But this is based on a rough wire up and 
> > appears
> > to give us everything we need.  The btrfs portion of Sweet Teas patches are
> > basically untouched except where we use these helpers to deal with the 
> > extent
> > contexts.  Thanks,
> > 
> > Josef
> > 
> > [1] 
> > https://lore.kernel.org/linux-fscrypt/cover.1693630890.git.sweettea-ker...@dorminy.me/
> > 
> > Josef Bacik (4):
> >   fscrypt: rename fscrypt_info => fscrypt_inode_info
> >   fscrypt: add per-extent encryption support
> >   fscrypt: disable all but standard v2 policies for extent encryption
> >   blk-crypto: add a process bio callback
> > 
> >  block/blk-crypto-fallback.c |  18 ++++
> >  block/blk-crypto-profile.c  |   2 +
> >  block/blk-crypto.c          |   6 +-
> >  fs/crypto/crypto.c          |  23 +++--
> >  fs/crypto/fname.c           |   6 +-
> >  fs/crypto/fscrypt_private.h |  78 ++++++++++++----
> >  fs/crypto/hooks.c           |   2 +-
> >  fs/crypto/inline_crypt.c    |  50 +++++++++--
> >  fs/crypto/keyring.c         |   4 +-
> >  fs/crypto/keysetup.c        | 174 ++++++++++++++++++++++++++++++++----
> >  fs/crypto/keysetup_v1.c     |  14 +--
> >  fs/crypto/policy.c          |  45 ++++++++--
> >  include/linux/blk-crypto.h  |   9 +-
> >  include/linux/fs.h          |   4 +-
> >  include/linux/fscrypt.h     |  41 ++++++++-
> >  15 files changed, 400 insertions(+), 76 deletions(-)
> 
> Thanks Josef!  At a high level this looks good to me.  It's much simpler.  I
> guess my main question is "what is missing" (besides the obvious things like
> updating the documentation and polishing code comments).  I see you got rid 
> of a
> lot of the complexity in Sweet Tea's patchset, which is great as I think a lot
> of it was unnecessary as I've mentioned.  But maybe something got overlooked?
> I'm mainly wondering about the patches like "fscrypt: allow asynchronous info
> freeing" that were a bit puzzling but have now gone away.
> 

I'm going to fix this in a different way internally in btrfs.  There's only once
place where we can't drop the lock, so I plan to just collate free'd EM's and
free them in bulk after we drop the lock.  This *may* not fix the problem, I
have to wait for lockdep to tell me I'm stupid and I missed some other
dependency, but if that's the case I'll just async free our EM's, that way I can
synchronize dropping the objects at inode drop time to avoid unpleasant timing
issues.

> Not supporting v1 encryption policies is the right call, I think.  xfstests 
> will
> need to be updated to not assume that v1 is always supported, but that's
> something I've been thinking about doing anyway.
> 
> The patch that adds support for checksumming the on-disk data is new.  I see 
> why
> it's needed.  I suppose that's just been overlooked until now?  It's 
> definitely
> correct that you need to checksum the ciphertext, not the plaintext.  
> Otherwise
> the checksums leak information about the plaintext.
> 

I think Sweet Tea was leaving this as a followup exercise, but I'd rather have
everything working out of the box, and since my patchset was a lot simpler I
figured I'd give you more opportunities to yell at me for something.

> Did you consider the idea I mentioned at the end of
> https://lore.kernel.org/r/20230907055233.GB37146@sol.localdomain where we 
> store
> a full fscrypt_context per extent, but for now validate that it matches the
> inode's context (minus the nonce) and only support that case?
> 

I missed this bit of feedback, I like it a lot actually because if we do decide
later to tackle key changing it would be nice if existing file systems had
everything they needed.  I think I'll still keep the slimmed down extent context
and just throw the master key in there and the encrytion type, and then do as
you say and validate it matches the inode.

> I guess the reasons to do that would be (1) futureproofing, (2) error checking
> to catch any bugs where an extent might be accessed inconsistently, and (3)
> making extents "standalone" so that they can be decrypted by anything that
> iterates through the extents only (e.g. btrfs scrub as mentioned by Sweet Tea;
> though, how will scrub even have access to the encryption keys?).  I don't 
> have
> a great sense of how strong these reasons actually are, so any thoughts would 
> be
> appreciated.  If just the nonce is really all that's needed, then that's fine
> too.  The point is, the part I was concerned about wasn't really whether the 
> key
> identifier and encryption settings get stored per extent, but rather whether 
> we
> actually support the case where these differ from the inode's.

Scrub just needs to read the raw bytes and validate the checksums.  This is my
other motivation for doing the checksum thing now, I want to make sure things
like scrub work out of the box since that's one of the "tricky" parts when it
comes to having data it can't actually derypt.

The key identifier is mostly around being able to simply support different keys
in the same inode, primarly for re-keying.  There is some talk of people wanting
to have differently encrypted subvolumes and reflink between them, and having
this information per-extent would make that a lot easier.  For now I've told
those people to kick rocks, but it'll be nice to be setup to add this support
later on.

> 
> (And by "the inode" I really mean "the inode that owns the extent cache entry
> the extent is being accessed through".  It's the case that when an extent is
> shared by multiple inodes, it gets a cache entry for each one, right?)
> 

Yup the cache entry gets loaded from the extent itself, so we'll have duplicate
cache entries since they're tied to the inode.  It's the same for our extent
maps as well, since you could at any point overwrite part of it and need to
update the in-memory mapping.

Thanks for the quick feedback, I'm going to finish making the appropriate
changes to the btrfs patches, make sure everything works.  I'll clean up these
patches and make the changes you suggested, and I'll update the documentation.
It'll probably be early next week when I send again, I'll do the work for
xfstests so everything works in this v2 only world.

Josef

Reply via email to