Re: [PATCH v2 01/19] fs: new API for handling inode->i_version

2017-12-16 Thread Jeff Layton
On Sun, 2017-12-17 at 09:37 +1100, Dave Chinner wrote:
> On Sat, Dec 16, 2017 at 08:46:38AM -0500, Jeff Layton wrote:
> > From: Jeff Layton 
> > 
> > Add a documentation blob that explains what the i_version field is, how
> > it is expected to work, and how it is currently implemented by various
> > filesystems.
> > 
> > We already have inode_inc_iversion. Add several other functions for
> > manipulating and accessing the i_version counter. For now, the
> > implementation is trivial and basically works the way that all of the
> > open-coded i_version accesses work today.
> > 
> > Future patches will convert existing users of i_version to use the new
> > API, and then convert the backend implementation to do things more
> > efficiently.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  include/linux/fs.h | 200 
> > ++---
> >  1 file changed, 192 insertions(+), 8 deletions(-)
> 
> Just a random sunday morning coffee musing
> 
> I was just wondering if it would be better to split this stuff out
> into it's own header file now? include/linux/fs.h is aleady a
> massive header file (~3500 lines) and changes cause tree-wide
> rebuilds, so maybe it would be better to split relatively isolated
> functionality like this out while it's being reworked and you're
> already touching every file that uses it?
> 
> Cheers,
> 
> Dave.

That's a good idea. Let me do that and I'll re-post.

Thanks,
-- 
Jeff Layton 


Re: [PATCH v2 01/19] fs: new API for handling inode->i_version

2017-12-16 Thread Dave Chinner
On Sat, Dec 16, 2017 at 08:46:38AM -0500, Jeff Layton wrote:
> From: Jeff Layton 
> 
> Add a documentation blob that explains what the i_version field is, how
> it is expected to work, and how it is currently implemented by various
> filesystems.
> 
> We already have inode_inc_iversion. Add several other functions for
> manipulating and accessing the i_version counter. For now, the
> implementation is trivial and basically works the way that all of the
> open-coded i_version accesses work today.
> 
> Future patches will convert existing users of i_version to use the new
> API, and then convert the backend implementation to do things more
> efficiently.
> 
> Signed-off-by: Jeff Layton 
> ---
>  include/linux/fs.h | 200 
> ++---
>  1 file changed, 192 insertions(+), 8 deletions(-)

Just a random sunday morning coffee musing

I was just wondering if it would be better to split this stuff out
into it's own header file now? include/linux/fs.h is aleady a
massive header file (~3500 lines) and changes cause tree-wide
rebuilds, so maybe it would be better to split relatively isolated
functionality like this out while it's being reworked and you're
already touching every file that uses it?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v2 01/19] fs: new API for handling inode->i_version

2017-12-16 Thread Jeff Layton
From: Jeff Layton 

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton 
---
 include/linux/fs.h | 200 ++---
 1 file changed, 192 insertions(+), 8 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..5001e77342fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,19 +2036,203 @@ static inline void inode_dec_link_count(struct inode 
*inode)
mark_inode_dirty(inode);
 }
 
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * It should be considered an opaque value by observers. If it remains the same
+ * since it was last checked, then nothing has changed in the inode. If it's
+ * different then something has changed. Observers cannot infer anything about
+ * the nature or magnitude of the changes from the value, only that the inode
+ * has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems 
that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will automatically have their i_version counter
+ * incremented on writes to normal files. If the SB_I_VERSION is not set, then
+ * the VFS will not touch it on writes, and the filesystem can use it how it
+ * wishes. Note that the filesystem is always responsible for updating the
+ * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
+ * We consider these sorts of filesystems to have a kernel-managed i_version.
+ *
+ * Note that some filesystems (e.g. NFS and AFS) just use the field to store
+ * a server-provided value (for the most part). For that reason, those
+ * filesystems do not set SB_I_VERSION. These filesystems are considered to
+ * have a self-managed i_version.
+ */
+
+/**
+ * inode_set_iversion_raw - set i_version to the specified raw value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for use by
+ * filesystems that self-manage the i_version.
+ *
+ * For example, the NFS client stores its NFSv4 change attribute in this way,
+ * and the AFS client stores the data_version from the server here.
+ */
+static inline void
+inode_set_iversion_raw(struct inode *inode, const u64 new)
+{
+   inode->i_version = new;
+}
+
+/**
+ * inode_set_iversion - set i_version to a particular value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for filesystems with
+ * a kernel-managed i_version.
+ *
+ * For now, this just does the same thing as the _raw variant.
+ */
+static inline void
+inode_set_iversion(struct inode *inode, const u64 new)
+{
+   inode_set_iversion_raw(inode, new);
+}
+
+/**
+ * inode_set_iversion_queried - set i_version to a particular value and set
+ *  flag to indicate that it has been viewed
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * When loading in an i_version value from a backing store, we typically don't
+ * know whether it was previously viewed before being stored or not. Thus, we
+ * must assume that it was, to ensure that any changes will result in the
+ * value changing.
+ *
+ * This function will set the inode's i_version, and possibly flag the value
+ * as if it has already been viewed at least once.
+ *
+ * For now, this just does what inode_set_iversion does.
+ */
+static inline void
+inode_set_iversion_queried(struct inode *inode, const u64 new)
+{
+   inode_set_iversion(inode, new);
+}
+
+/**
+ * inode_maybe_inc_iversion - increments i_version
+ * @inode: inode with the i_version that should be updated
+ * @force: increment the counter even if it's not necessary
+ *
+ * Every time the inode is modified, the i_version field must be seen to have
+ * changed by any observer.
+ *
+ * In this implementation, we always increment it after taking the i_lock to
+ * ensure that we don't race with other incrementors.
+ *
+ * Returns true if counter was bumped, and false if it wasn't.
+ */
+static inline bool
+inode_maybe_inc_iversion(struct inode *inode, bool force)
+