Re: [f2fs-dev] [PATCH v6 2/7] fs: add infrastructure for multigrain timestamps

2023-08-02 Thread Jeff Layton
On Wed, 2023-08-02 at 21:35 +0200, Jan Kara wrote:
> On Tue 25-07-23 10:58:15, Jeff Layton wrote:
> > The VFS always uses coarse-grained timestamps when updating the ctime
> > and mtime after a change. This has the benefit of allowing filesystems
> > to optimize away a lot metadata updates, down to around 1 per jiffy,
> > even when a file is under heavy writes.
> > 
> > Unfortunately, this has always been an issue when we're exporting via
> > NFSv3, which relies on timestamps to validate caches. A lot of changes
> > can happen in a jiffy, so timestamps aren't sufficient to help the
> > client decide to invalidate the cache. Even with NFSv4, a lot of
> > exported filesystems don't properly support a change attribute and are
> > subject to the same problems with timestamp granularity. Other
> > applications have similar issues with timestamps (e.g backup
> > applications).
> > 
> > If we were to always use fine-grained timestamps, that would improve the
> > situation, but that becomes rather expensive, as the underlying
> > filesystem would have to log a lot more metadata updates.
> > 
> > What we need is a way to only use fine-grained timestamps when they are
> > being actively queried.
> > 
> > POSIX generally mandates that when the the mtime changes, the ctime must
> > also change. The kernel always stores normalized ctime values, so only
> > the first 30 bits of the tv_nsec field are ever used.
> > 
> > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > has queried the inode for the mtime or ctime. When this flag is set,
> > on the next mtime or ctime update, the kernel will fetch a fine-grained
> > timestamp instead of the usual coarse-grained one.
> > 
> > Filesytems can opt into this behavior by setting the FS_MGTIME flag in
> > the fstype. Filesystems that don't set this flag will continue to use
> > coarse-grained timestamps.
> > 
> > Later patches will convert individual filesystems to use the new
> > infrastructure.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/inode.c | 98 
> > ++
> >  fs/stat.c  | 41 +--
> >  include/linux/fs.h | 45 +++--
> >  3 files changed, 151 insertions(+), 33 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index d4ab92233062..369621e7faf5 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1919,6 +1919,21 @@ int inode_update_time(struct inode *inode, struct 
> > timespec64 *time, int flags)
> >  }
> >  EXPORT_SYMBOL(inode_update_time);
> >  
> > +/**
> > + * current_coarse_time - Return FS time
> > + * @inode: inode.
> > + *
> > + * Return the current coarse-grained time truncated to the time
> > + * granularity supported by the fs.
> > + */
> > +static struct timespec64 current_coarse_time(struct inode *inode)
> > +{
> > +   struct timespec64 now;
> > +
> > +   ktime_get_coarse_real_ts64();
> > +   return timestamp_truncate(now, inode);
> > +}
> > +
> >  /**
> >   * atime_needs_update  -   update the access time
> >   * @path: the  path to update
> > @@ -1952,7 +1967,7 @@ bool atime_needs_update(const struct path *path, 
> > struct inode *inode)
> > if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
> > return false;
> >  
> > -   now = current_time(inode);
> > +   now = current_coarse_time(inode);
> >  
> > if (!relatime_need_update(mnt, inode, now))
> > return false;
> > @@ -1986,7 +2001,7 @@ void touch_atime(const struct path *path)
> >  * We may also fail on filesystems that have the ability to make parts
> >  * of the fs read only, e.g. subvolumes in Btrfs.
> >  */
> > -   now = current_time(inode);
> > +   now = current_coarse_time(inode);
> > inode_update_time(inode, , S_ATIME);
> > __mnt_drop_write(mnt);
> >  skip_update:
> 
> There are also calls in fs/smb/client/file.c:cifs_readpage_worker() and in
> fs/ocfs2/file.c:ocfs2_update_inode_atime() that should probably use
> current_coarse_time() to avoid needless querying of fine grained
> timestamps. But see below...
> 

Technically, they already devolve to current_coarse_time anyway, but
changing them would allow them to skip the fstype flag check, but I like
your idea below better anyway.

> > @@ -2072,6 +2087,56 @@ int file_remove_privs(struct file *file)
> >  }
> >  EXPORT_SYMBOL(file_remove_privs);
> >  
> > +/**
> > + * current_mgtime - Return FS time (possibly fine-grained)
> > + * @inode: inode.
> > + *
> > + * Return the current time truncated to the time granularity supported by
> > + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
> > + * as having been QUERIED, get a fine-grained timestamp.
> > + */
> > +static struct timespec64 current_mgtime(struct inode *inode)
> > +{
> > +   struct timespec64 now;
> > +   atomic_long_t *pnsec = (atomic_long_t *)>__i_ctime.tv_nsec;
> > +   long nsec = atomic_long_read(pnsec);
> > +
> > +   if (nsec & 

Re: [f2fs-dev] [PATCH v6 4/7] tmpfs: add support for multigrain timestamps

2023-08-02 Thread Jan Kara
On Tue 25-07-23 10:58:17, Jeff Layton wrote:
> Enable multigrain timestamps, which should ensure that there is an
> apparent change to the timestamp whenever it has been written after
> being actively observed via getattr.
> 
> tmpfs only requires the FS_MGTIME flag.
> 
> Signed-off-by: Jeff Layton 

Looks good. Feel free to add:

Reviewed-by: Jan Kara 

Honza

> ---
>  mm/shmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 654d9a585820..b6019c905058 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -4264,7 +4264,7 @@ static struct file_system_type shmem_fs_type = {
>  #endif
>   .kill_sb= kill_litter_super,
>  #ifdef CONFIG_SHMEM
> - .fs_flags   = FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
> + .fs_flags   = FS_USERNS_MOUNT | FS_ALLOW_IDMAP | FS_MGTIME,
>  #else
>   .fs_flags   = FS_USERNS_MOUNT,
>  #endif
> 
> -- 
> 2.41.0
> 
-- 
Jan Kara 
SUSE Labs, CR


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 6/7] ext4: switch to multigrain timestamps

2023-08-02 Thread Jan Kara
On Tue 25-07-23 10:58:19, Jeff Layton wrote:
> Enable multigrain timestamps, which should ensure that there is an
> apparent change to the timestamp whenever it has been written after
> being actively observed via getattr.
> 
> For ext4, we only need to enable the FS_MGTIME flag.
> 
> Signed-off-by: Jeff Layton 

Looks good. Feel free to add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/ext4/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b54c70e1a74e..cb1ff47af156 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -7279,7 +7279,7 @@ static struct file_system_type ext4_fs_type = {
>   .init_fs_context= ext4_init_fs_context,
>   .parameters = ext4_param_specs,
>   .kill_sb= kill_block_super,
> - .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> + .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
>  };
>  MODULE_ALIAS_FS("ext4");
>  
> 
> -- 
> 2.41.0
> 
-- 
Jan Kara 
SUSE Labs, CR


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 2/7] fs: add infrastructure for multigrain timestamps

2023-08-02 Thread Jan Kara
On Tue 25-07-23 10:58:15, Jeff Layton wrote:
> The VFS always uses coarse-grained timestamps when updating the ctime
> and mtime after a change. This has the benefit of allowing filesystems
> to optimize away a lot metadata updates, down to around 1 per jiffy,
> even when a file is under heavy writes.
> 
> Unfortunately, this has always been an issue when we're exporting via
> NFSv3, which relies on timestamps to validate caches. A lot of changes
> can happen in a jiffy, so timestamps aren't sufficient to help the
> client decide to invalidate the cache. Even with NFSv4, a lot of
> exported filesystems don't properly support a change attribute and are
> subject to the same problems with timestamp granularity. Other
> applications have similar issues with timestamps (e.g backup
> applications).
> 
> If we were to always use fine-grained timestamps, that would improve the
> situation, but that becomes rather expensive, as the underlying
> filesystem would have to log a lot more metadata updates.
> 
> What we need is a way to only use fine-grained timestamps when they are
> being actively queried.
> 
> POSIX generally mandates that when the the mtime changes, the ctime must
> also change. The kernel always stores normalized ctime values, so only
> the first 30 bits of the tv_nsec field are ever used.
> 
> Use the 31st bit of the ctime tv_nsec field to indicate that something
> has queried the inode for the mtime or ctime. When this flag is set,
> on the next mtime or ctime update, the kernel will fetch a fine-grained
> timestamp instead of the usual coarse-grained one.
> 
> Filesytems can opt into this behavior by setting the FS_MGTIME flag in
> the fstype. Filesystems that don't set this flag will continue to use
> coarse-grained timestamps.
> 
> Later patches will convert individual filesystems to use the new
> infrastructure.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/inode.c | 98 
> ++
>  fs/stat.c  | 41 +--
>  include/linux/fs.h | 45 +++--
>  3 files changed, 151 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index d4ab92233062..369621e7faf5 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1919,6 +1919,21 @@ int inode_update_time(struct inode *inode, struct 
> timespec64 *time, int flags)
>  }
>  EXPORT_SYMBOL(inode_update_time);
>  
> +/**
> + * current_coarse_time - Return FS time
> + * @inode: inode.
> + *
> + * Return the current coarse-grained time truncated to the time
> + * granularity supported by the fs.
> + */
> +static struct timespec64 current_coarse_time(struct inode *inode)
> +{
> + struct timespec64 now;
> +
> + ktime_get_coarse_real_ts64();
> + return timestamp_truncate(now, inode);
> +}
> +
>  /**
>   *   atime_needs_update  -   update the access time
>   *   @path: the  path to update
> @@ -1952,7 +1967,7 @@ bool atime_needs_update(const struct path *path, struct 
> inode *inode)
>   if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
>   return false;
>  
> - now = current_time(inode);
> + now = current_coarse_time(inode);
>  
>   if (!relatime_need_update(mnt, inode, now))
>   return false;
> @@ -1986,7 +2001,7 @@ void touch_atime(const struct path *path)
>* We may also fail on filesystems that have the ability to make parts
>* of the fs read only, e.g. subvolumes in Btrfs.
>*/
> - now = current_time(inode);
> + now = current_coarse_time(inode);
>   inode_update_time(inode, , S_ATIME);
>   __mnt_drop_write(mnt);
>  skip_update:

There are also calls in fs/smb/client/file.c:cifs_readpage_worker() and in
fs/ocfs2/file.c:ocfs2_update_inode_atime() that should probably use
current_coarse_time() to avoid needless querying of fine grained
timestamps. But see below...

> @@ -2072,6 +2087,56 @@ int file_remove_privs(struct file *file)
>  }
>  EXPORT_SYMBOL(file_remove_privs);
>  
> +/**
> + * current_mgtime - Return FS time (possibly fine-grained)
> + * @inode: inode.
> + *
> + * Return the current time truncated to the time granularity supported by
> + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
> + * as having been QUERIED, get a fine-grained timestamp.
> + */
> +static struct timespec64 current_mgtime(struct inode *inode)
> +{
> + struct timespec64 now;
> + atomic_long_t *pnsec = (atomic_long_t *)>__i_ctime.tv_nsec;
> + long nsec = atomic_long_read(pnsec);
> +
> + if (nsec & I_CTIME_QUERIED) {
> + ktime_get_real_ts64();
> + } else {
> + struct timespec64 ctime;
> +
> + ktime_get_coarse_real_ts64();
> +
> + /*
> +  * If we've recently fetched a fine-grained timestamp
> +  * then the coarse-grained one may still be earlier than the
> +  * existing one. Just keep the existing ctime if so.
> +  */
> +   

Re: [f2fs-dev] [PATCH v6 1/7] fs: pass the request_mask to generic_fillattr

2023-08-02 Thread Paulo Alcantara via Linux-f2fs-devel
Jeff Layton  writes:

> generic_fillattr just fills in the entire stat struct indiscriminately
> today, copying data from the inode. There is at least one attribute
> (STATX_CHANGE_COOKIE) that can have side effects when it is reported,
> and we're looking at adding more with the addition of multigrain
> timestamps.
>
> Add a request_mask argument to generic_fillattr and have most callers
> just pass in the value that is passed to getattr. Have other callers
> (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
> STATX_CHANGE_COOKIE into generic_fillattr.
>
> Signed-off-by: Jeff Layton 
> ---
>  fs/9p/vfs_inode.c   |  4 ++--
>  fs/9p/vfs_inode_dotl.c  |  4 ++--
>  fs/afs/inode.c  |  2 +-
>  fs/btrfs/inode.c|  2 +-
>  fs/ceph/inode.c |  2 +-
>  fs/coda/inode.c |  3 ++-
>  fs/ecryptfs/inode.c |  5 +++--
>  fs/erofs/inode.c|  2 +-
>  fs/exfat/file.c |  2 +-
>  fs/ext2/inode.c |  2 +-
>  fs/ext4/inode.c |  2 +-
>  fs/f2fs/file.c  |  2 +-
>  fs/fat/file.c   |  2 +-
>  fs/fuse/dir.c   |  2 +-
>  fs/gfs2/inode.c |  2 +-
>  fs/hfsplus/inode.c  |  2 +-
>  fs/kernfs/inode.c   |  2 +-
>  fs/libfs.c  |  4 ++--
>  fs/minix/inode.c|  2 +-
>  fs/nfs/inode.c  |  2 +-
>  fs/nfs/namespace.c  |  3 ++-
>  fs/ntfs3/file.c |  2 +-
>  fs/ocfs2/file.c |  2 +-
>  fs/orangefs/inode.c |  2 +-
>  fs/proc/base.c  |  4 ++--
>  fs/proc/fd.c|  2 +-
>  fs/proc/generic.c   |  2 +-
>  fs/proc/proc_net.c  |  2 +-
>  fs/proc/proc_sysctl.c   |  2 +-
>  fs/proc/root.c  |  3 ++-
>  fs/smb/client/inode.c   |  2 +-
>  fs/smb/server/smb2pdu.c | 22 +++---
>  fs/smb/server/vfs.c |  3 ++-
>  fs/stat.c   | 18 ++
>  fs/sysv/itree.c |  3 ++-
>  fs/ubifs/dir.c  |  2 +-
>  fs/udf/symlink.c|  2 +-
>  fs/vboxsf/utils.c   |  2 +-
>  include/linux/fs.h  |  2 +-
>  mm/shmem.c  |  2 +-
>  40 files changed, 70 insertions(+), 62 deletions(-)
>
> [...]
>
> diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
> index 218f03dd3f52..93fe43789d7a 100644
> --- a/fs/smb/client/inode.c
> +++ b/fs/smb/client/inode.c
> @@ -2540,7 +2540,7 @@ int cifs_getattr(struct mnt_idmap *idmap, const struct 
> path *path,
>   return rc;
>   }
>  
> - generic_fillattr(_mnt_idmap, inode, stat);
> + generic_fillattr(_mnt_idmap, request_mask, inode, stat);
>   stat->blksize = cifs_sb->ctx->bsize;
>   stat->ino = CIFS_I(inode)->uniqueid;

Reviewed-by: Paulo Alcantara (SUSE) 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 5/7] xfs: switch to multigrain timestamps

2023-08-02 Thread Jeff Layton
On Wed, 2023-08-02 at 10:48 -0700, Darrick J. Wong wrote:
> On Tue, Jul 25, 2023 at 10:58:18AM -0400, Jeff Layton wrote:
> > Enable multigrain timestamps, which should ensure that there is an
> > apparent change to the timestamp whenever it has been written after
> > being actively observed via getattr.
> > 
> > Also, anytime the mtime changes, the ctime must also change, and those
> > are now the only two options for xfs_trans_ichgtime. Have that function
> > unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
> > always set.
> > 
> > Signed-off-by: Jeff Layton 
> > ---
> >  fs/xfs/libxfs/xfs_trans_inode.c | 6 +++---
> >  fs/xfs/xfs_iops.c   | 4 ++--
> >  fs/xfs/xfs_super.c  | 2 +-
> >  3 files changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c 
> > b/fs/xfs/libxfs/xfs_trans_inode.c
> > index 6b2296ff248a..ad22656376d3 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -62,12 +62,12 @@ xfs_trans_ichgtime(
> > ASSERT(tp);
> > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> >  
> > -   tv = current_time(inode);
> > +   /* If the mtime changes, then ctime must also change */
> > +   ASSERT(flags & XFS_ICHGTIME_CHG);
> >  
> > +   tv = inode_set_ctime_current(inode);
> > if (flags & XFS_ICHGTIME_MOD)
> > inode->i_mtime = tv;
> > -   if (flags & XFS_ICHGTIME_CHG)
> > -   inode_set_ctime_to_ts(inode, tv);
> > if (flags & XFS_ICHGTIME_CREATE)
> > ip->i_crtime = tv;
> >  }
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 3a9363953ef2..3f89ef5a2820 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -573,10 +573,10 @@ xfs_vn_getattr(
> > stat->gid = vfsgid_into_kgid(vfsgid);
> > stat->ino = ip->i_ino;
> > stat->atime = inode->i_atime;
> > -   stat->mtime = inode->i_mtime;
> > -   stat->ctime = inode_get_ctime(inode);
> > stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
> >  
> > +   fill_mg_cmtime(request_mask, inode, stat);
> 
> Huh.  I would've thought @stat would come first since that's what we're
> acting upon, but ... eh. :)
> 
> If everyone else is ok with the fill_mg_cmtime signature,
> Acked-by: Darrick J. Wong 
> 
> 

Good point. We can change the signature. I think xfs is the only caller
outside of the generic vfs right now, and it'd be best to do it now.

Christian, would you prefer that I send an updated series, or patches on
top of vfs.ctime that can be folded in?
 
-- 
Jeff Layton 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 5/7] xfs: switch to multigrain timestamps

2023-08-02 Thread Darrick J. Wong
On Tue, Jul 25, 2023 at 10:58:18AM -0400, Jeff Layton wrote:
> Enable multigrain timestamps, which should ensure that there is an
> apparent change to the timestamp whenever it has been written after
> being actively observed via getattr.
> 
> Also, anytime the mtime changes, the ctime must also change, and those
> are now the only two options for xfs_trans_ichgtime. Have that function
> unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
> always set.
> 
> Signed-off-by: Jeff Layton 
> ---
>  fs/xfs/libxfs/xfs_trans_inode.c | 6 +++---
>  fs/xfs/xfs_iops.c   | 4 ++--
>  fs/xfs/xfs_super.c  | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> index 6b2296ff248a..ad22656376d3 100644
> --- a/fs/xfs/libxfs/xfs_trans_inode.c
> +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> @@ -62,12 +62,12 @@ xfs_trans_ichgtime(
>   ASSERT(tp);
>   ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>  
> - tv = current_time(inode);
> + /* If the mtime changes, then ctime must also change */
> + ASSERT(flags & XFS_ICHGTIME_CHG);
>  
> + tv = inode_set_ctime_current(inode);
>   if (flags & XFS_ICHGTIME_MOD)
>   inode->i_mtime = tv;
> - if (flags & XFS_ICHGTIME_CHG)
> - inode_set_ctime_to_ts(inode, tv);
>   if (flags & XFS_ICHGTIME_CREATE)
>   ip->i_crtime = tv;
>  }
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 3a9363953ef2..3f89ef5a2820 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -573,10 +573,10 @@ xfs_vn_getattr(
>   stat->gid = vfsgid_into_kgid(vfsgid);
>   stat->ino = ip->i_ino;
>   stat->atime = inode->i_atime;
> - stat->mtime = inode->i_mtime;
> - stat->ctime = inode_get_ctime(inode);
>   stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
>  
> + fill_mg_cmtime(request_mask, inode, stat);

Huh.  I would've thought @stat would come first since that's what we're
acting upon, but ... eh. :)

If everyone else is ok with the fill_mg_cmtime signature,
Acked-by: Darrick J. Wong 

--D

> +
>   if (xfs_has_v3inodes(mp)) {
>   if (request_mask & STATX_BTIME) {
>   stat->result_mask |= STATX_BTIME;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 818510243130..4b10edb2c972 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -2009,7 +2009,7 @@ static struct file_system_type xfs_fs_type = {
>   .init_fs_context= xfs_init_fs_context,
>   .parameters = xfs_fs_parameters,
>   .kill_sb= kill_block_super,
> - .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> + .fs_flags   = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
>  };
>  MODULE_ALIAS_FS("xfs");
>  
> 
> -- 
> 2.41.0
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v6 1/7] fs: pass the request_mask to generic_fillattr

2023-08-02 Thread Jan Kara
On Tue 25-07-23 10:58:14, Jeff Layton wrote:
> generic_fillattr just fills in the entire stat struct indiscriminately
> today, copying data from the inode. There is at least one attribute
> (STATX_CHANGE_COOKIE) that can have side effects when it is reported,
> and we're looking at adding more with the addition of multigrain
> timestamps.
> 
> Add a request_mask argument to generic_fillattr and have most callers
> just pass in the value that is passed to getattr. Have other callers
> (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
> STATX_CHANGE_COOKIE into generic_fillattr.
> 
> Signed-off-by: Jeff Layton 

Looks good. Feel free to add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/9p/vfs_inode.c   |  4 ++--
>  fs/9p/vfs_inode_dotl.c  |  4 ++--
>  fs/afs/inode.c  |  2 +-
>  fs/btrfs/inode.c|  2 +-
>  fs/ceph/inode.c |  2 +-
>  fs/coda/inode.c |  3 ++-
>  fs/ecryptfs/inode.c |  5 +++--
>  fs/erofs/inode.c|  2 +-
>  fs/exfat/file.c |  2 +-
>  fs/ext2/inode.c |  2 +-
>  fs/ext4/inode.c |  2 +-
>  fs/f2fs/file.c  |  2 +-
>  fs/fat/file.c   |  2 +-
>  fs/fuse/dir.c   |  2 +-
>  fs/gfs2/inode.c |  2 +-
>  fs/hfsplus/inode.c  |  2 +-
>  fs/kernfs/inode.c   |  2 +-
>  fs/libfs.c  |  4 ++--
>  fs/minix/inode.c|  2 +-
>  fs/nfs/inode.c  |  2 +-
>  fs/nfs/namespace.c  |  3 ++-
>  fs/ntfs3/file.c |  2 +-
>  fs/ocfs2/file.c |  2 +-
>  fs/orangefs/inode.c |  2 +-
>  fs/proc/base.c  |  4 ++--
>  fs/proc/fd.c|  2 +-
>  fs/proc/generic.c   |  2 +-
>  fs/proc/proc_net.c  |  2 +-
>  fs/proc/proc_sysctl.c   |  2 +-
>  fs/proc/root.c  |  3 ++-
>  fs/smb/client/inode.c   |  2 +-
>  fs/smb/server/smb2pdu.c | 22 +++---
>  fs/smb/server/vfs.c |  3 ++-
>  fs/stat.c   | 18 ++
>  fs/sysv/itree.c |  3 ++-
>  fs/ubifs/dir.c  |  2 +-
>  fs/udf/symlink.c|  2 +-
>  fs/vboxsf/utils.c   |  2 +-
>  include/linux/fs.h  |  2 +-
>  mm/shmem.c  |  2 +-
>  40 files changed, 70 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 16d85e6033a3..d24d1f20e922 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -1016,7 +1016,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct 
> path *path,
>   p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
>   v9ses = v9fs_dentry2v9ses(dentry);
>   if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
> - generic_fillattr(_mnt_idmap, inode, stat);
> + generic_fillattr(_mnt_idmap, request_mask, inode, stat);
>   return 0;
>   } else if (v9ses->cache & CACHE_WRITEBACK) {
>   if (S_ISREG(inode->i_mode)) {
> @@ -1037,7 +1037,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct 
> path *path,
>   return PTR_ERR(st);
>  
>   v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> - generic_fillattr(_mnt_idmap, d_inode(dentry), stat);
> + generic_fillattr(_mnt_idmap, request_mask, d_inode(dentry), stat);
>  
>   p9stat_free(st);
>   kfree(st);
> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> index 464ea73d1bf8..8e8d5d2a13d8 100644
> --- a/fs/9p/vfs_inode_dotl.c
> +++ b/fs/9p/vfs_inode_dotl.c
> @@ -451,7 +451,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>   p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
>   v9ses = v9fs_dentry2v9ses(dentry);
>   if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
> - generic_fillattr(_mnt_idmap, inode, stat);
> + generic_fillattr(_mnt_idmap, request_mask, inode, stat);
>   return 0;
>   } else if (v9ses->cache) {
>   if (S_ISREG(inode->i_mode)) {
> @@ -476,7 +476,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>   return PTR_ERR(st);
>  
>   v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> - generic_fillattr(_mnt_idmap, d_inode(dentry), stat);
> + generic_fillattr(_mnt_idmap, request_mask, d_inode(dentry), stat);
>   /* Change block size to what the server returned */
>   stat->blksize = st->st_blksize;
>  
> diff --git a/fs/afs/inode.c b/fs/afs/inode.c
> index 6b636f43f548..1c794a1896aa 100644
> --- a/fs/afs/inode.c
> +++ b/fs/afs/inode.c
> @@ -773,7 +773,7 @@ int afs_getattr(struct mnt_idmap *idmap, const struct 
> path *path,
>  
>   do {
>   read_seqbegin_or_lock(>cb_lock, );
> - generic_fillattr(_mnt_idmap, inode, stat);
> + generic_fillattr(_mnt_idmap, request_mask, inode, stat);
>   if (test_bit(AFS_VNODE_SILLY_DELETED, >flags) &&
>   stat->nlink > 0)
>   stat->nlink -= 1;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 

Re: [f2fs-dev] [PATCH 12/12] xfs use fs_holder_ops for the log and RT devices

2023-08-02 Thread Darrick J. Wong
On Wed, Aug 02, 2023 at 05:41:31PM +0200, Christoph Hellwig wrote:
> Use the generic fs_holder_ops to shut down the file system when the
> log or RT device goes away instead of duplicating the logic.
> 
> Signed-off-by: Christoph Hellwig 

Nice cleanup,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/xfs/xfs_super.c | 17 +++--
>  1 file changed, 3 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d5042419ed9997..338eba71ff8667 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -377,17 +377,6 @@ xfs_setup_dax_always(
>   return 0;
>  }
>  
> -static void
> -xfs_bdev_mark_dead(
> - struct block_device *bdev)
> -{
> - xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
> -}
> -
> -static const struct blk_holder_ops xfs_holder_ops = {
> - .mark_dead  = xfs_bdev_mark_dead,
> -};
> -
>  STATIC int
>  xfs_blkdev_get(
>   xfs_mount_t *mp,
> @@ -396,8 +385,8 @@ xfs_blkdev_get(
>  {
>   int error = 0;
>  
> - *bdevp = blkdev_get_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE, mp,
> - _holder_ops);
> + *bdevp = blkdev_get_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE,
> + mp->m_super, _holder_ops);
>   if (IS_ERR(*bdevp)) {
>   error = PTR_ERR(*bdevp);
>   xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
> @@ -412,7 +401,7 @@ xfs_blkdev_put(
>   struct block_device *bdev)
>  {
>   if (bdev)
> - blkdev_put(bdev, mp);
> + blkdev_put(bdev, mp->m_super);
>  }
>  
>  STATIC void
> -- 
> 2.39.2
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 11/12] xfs: drop s_umount over opening the log and RT devices

2023-08-02 Thread Darrick J. Wong
On Wed, Aug 02, 2023 at 05:41:30PM +0200, Christoph Hellwig wrote:
> Just like get_tree_bdev needs to drop s_umount when opening the main
> device, we need to do the same for the xfs log and RT devices to avoid a
> potential lock order reversal with s_unmount for the mark_dead path.
> 
> It might be preferable to just drop s_umount over ->fill_super entirely,
> but that will require a fairly massive audit first, so we'll do the easy
> version here first.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/xfs/xfs_super.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8185102431301d..d5042419ed9997 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -448,17 +448,21 @@ STATIC int
>  xfs_open_devices(
>   struct xfs_mount*mp)
>  {
> - struct block_device *ddev = mp->m_super->s_bdev;
> + struct super_block  *sb = mp->m_super;
> + struct block_device *ddev = sb->s_bdev;
>   struct block_device *logdev = NULL, *rtdev = NULL;
>   int error;
>  
> + /* see get_tree_bdev why this is needed and safe */

Which part of get_tree_bdev?  Is it this?

/*
 * s_umount nests inside open_mutex during
 * __invalidate_device().  blkdev_put() acquires
 * open_mutex and can't be called under s_umount.  Drop
 * s_umount temporarily.  This is safe as we're
 * holding an active reference.
 */
up_write(>s_umount);
blkdev_put(bdev, fc->fs_type);
down_write(>s_umount);



> + up_write(>s_umount);
> +
>   /*
>* Open real time and log devices - order is important.
>*/
>   if (mp->m_logname) {
>   error = xfs_blkdev_get(mp, mp->m_logname, );
>   if (error)
> - return error;
> + goto out_unlock;
>   }
>  
>   if (mp->m_rtname) {
> @@ -496,7 +500,10 @@ xfs_open_devices(
>   mp->m_logdev_targp = mp->m_ddev_targp;
>   }
>  
> - return 0;
> + error = 0;
> +out_unlock:
> + down_write(>s_umount);

Isn't down_write taking s_umount?  I think the label should be
out_relock or something less misleading.

--D

> + return error;
>  
>   out_free_rtdev_targ:
>   if (mp->m_rtdev_targp)
> @@ -508,7 +515,7 @@ xfs_open_devices(
>   out_close_logdev:
>   if (logdev && logdev != ddev)
>   xfs_blkdev_put(mp, logdev);
> - return error;
> + goto out_unlock;
>  }
>  
>  /*
> -- 
> 2.39.2
> 


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 12/12] xfs use fs_holder_ops for the log and RT devices

2023-08-02 Thread Christoph Hellwig
Use the generic fs_holder_ops to shut down the file system when the
log or RT device goes away instead of duplicating the logic.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d5042419ed9997..338eba71ff8667 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -377,17 +377,6 @@ xfs_setup_dax_always(
return 0;
 }
 
-static void
-xfs_bdev_mark_dead(
-   struct block_device *bdev)
-{
-   xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
-}
-
-static const struct blk_holder_ops xfs_holder_ops = {
-   .mark_dead  = xfs_bdev_mark_dead,
-};
-
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -396,8 +385,8 @@ xfs_blkdev_get(
 {
int error = 0;
 
-   *bdevp = blkdev_get_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE, mp,
-   _holder_ops);
+   *bdevp = blkdev_get_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE,
+   mp->m_super, _holder_ops);
if (IS_ERR(*bdevp)) {
error = PTR_ERR(*bdevp);
xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
@@ -412,7 +401,7 @@ xfs_blkdev_put(
struct block_device *bdev)
 {
if (bdev)
-   blkdev_put(bdev, mp);
+   blkdev_put(bdev, mp->m_super);
 }
 
 STATIC void
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 10/12] ext4: use fs_holder_ops for the log device

2023-08-02 Thread Christoph Hellwig
Use the generic fs_holder_ops to shut down the file system when the
log device goes away instead of duplicating the logic.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/super.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2ccb19d345c6dd..063832e2d12a8e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1096,15 +1096,6 @@ void ext4_update_dynamic_rev(struct super_block *sb)
 */
 }
 
-static void ext4_bdev_mark_dead(struct block_device *bdev)
-{
-   ext4_force_shutdown(bdev->bd_holder, EXT4_GOING_FLAGS_NOLOGFLUSH);
-}
-
-static const struct blk_holder_ops ext4_holder_ops = {
-   .mark_dead  = ext4_bdev_mark_dead,
-};
-
 /*
  * Open the external journal device
  */
@@ -1113,7 +1104,7 @@ static struct block_device *ext4_blkdev_get(dev_t dev, 
struct super_block *sb)
struct block_device *bdev;
 
bdev = blkdev_get_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE, sb,
-_holder_ops);
+_holder_ops);
if (IS_ERR(bdev))
goto fail;
return bdev;
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 04/12] btrfs: open block devices after superblock creation

2023-08-02 Thread Christoph Hellwig
Currently btrfs_mount_root opens the block devices before committing to
allocating a super block. That creates problems for restricting the
number of writers to a device, and also leads to a unusual and not very
helpful holder (the fs_type).

Reorganize the code to first look whether the superblock for a
particular fsid does already exist and open the block devices only if it
doesn't, mirror the recent changes to the VFS mount helpers.

Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/super.c   | 51 ++
 fs/btrfs/volumes.c |  4 ++--
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b318bddefd5236..5980b5dcc6b163 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1434,10 +1434,8 @@ static struct dentry *mount_subvol(const char 
*subvol_name, u64 subvol_objectid,
 static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
int flags, const char *device_name, void *data)
 {
-   struct block_device *bdev = NULL;
struct super_block *s;
struct btrfs_device *device = NULL;
-   struct btrfs_fs_devices *fs_devices = NULL;
struct btrfs_fs_info *fs_info = NULL;
void *new_sec_opts = NULL;
int error = 0;
@@ -1483,35 +1481,36 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
error = PTR_ERR(device);
goto error_fs_info;
}
-
-   fs_devices = device->fs_devices;
-   fs_info->fs_devices = fs_devices;
-
-   error = btrfs_open_devices(fs_devices, sb_open_mode(flags), fs_type);
+   fs_info->fs_devices = device->fs_devices;
mutex_unlock(_mutex);
-   if (error)
-   goto error_fs_info;
-
-   if (!(flags & SB_RDONLY) && fs_devices->rw_devices == 0) {
-   error = -EACCES;
-   goto error_close_devices;
-   }
 
-   bdev = fs_devices->latest_dev->bdev;
s = sget(fs_type, btrfs_test_super, btrfs_set_super, flags | SB_NOSEC,
 fs_info);
if (IS_ERR(s)) {
error = PTR_ERR(s);
-   goto error_close_devices;
+   goto error_fs_info;
}
 
if (s->s_root) {
-   btrfs_close_devices(fs_devices);
btrfs_free_fs_info(fs_info);
if ((flags ^ s->s_flags) & SB_RDONLY)
error = -EBUSY;
} else {
-   snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
+   struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
+
+   error = btrfs_open_devices(fs_devices, sb_open_mode(flags),
+  fs_type);
+   if (error)
+   goto out_deactivate;
+
+   if (!(flags & SB_RDONLY) && fs_devices->rw_devices == 0) {
+   error = -EACCES;
+   btrfs_close_devices(fs_info->fs_devices);
+   goto out_deactivate;
+   }
+
+   snprintf(s->s_id, sizeof(s->s_id), "%pg",
+fs_devices->latest_dev->bdev);
shrinker_debugfs_rename(>s_shrink, "sb-%s:%s", fs_type->name,
s->s_id);
btrfs_sb(s)->bdev_holder = fs_type;
@@ -1519,21 +1518,19 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
}
if (!error)
error = security_sb_set_mnt_opts(s, new_sec_opts, 0, NULL);
+   if (error)
+   goto out_deactivate;
security_free_mnt_opts(_sec_opts);
-   if (error) {
-   deactivate_locked_super(s);
-   return ERR_PTR(error);
-   }
-
return dget(s->s_root);
 
-error_close_devices:
-   btrfs_close_devices(fs_devices);
-error_fs_info:
-   btrfs_free_fs_info(fs_info);
+out_deactivate:
+   deactivate_locked_super(s);
 error_sec_opts:
security_free_mnt_opts(_sec_opts);
return ERR_PTR(error);
+error_fs_info:
+   btrfs_free_fs_info(fs_info);
+   goto error_sec_opts;
 }
 
 /*
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8246578c70f55b..88e9daae5e74ed 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1269,7 +1269,6 @@ int btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
 {
int ret;
 
-   lockdep_assert_held(_mutex);
/*
 * The device_list_mutex cannot be taken here in case opening the
 * underlying device takes further locks like open_mutex.
@@ -1277,7 +1276,7 @@ int btrfs_open_devices(struct btrfs_fs_devices 
*fs_devices,
 * We also don't need the lock here as this is called during mount and
 * exclusion is provided by uuid_mutex
 */
-
+   mutex_lock(_mutex);
if (fs_devices->opened) {
fs_devices->opened++;
ret = 0;
@@ -1285,6 +1284,7 @@ int btrfs_open_devices(struct 

[f2fs-dev] [PATCH 07/12] fs: stop using get_super in fs_mark_dead

2023-08-02 Thread Christoph Hellwig
fs_mark_dead currently uses get_super to find the superblock for the
block device that is going away.  This means it is limited to the
main device stored in sb->s_dev, leading to a lot of code duplication
for file systems that can use multiple block devices.

Now that the holder for all block devices used by file systems is set
to the super_block, we can instead look at that holder and then check
if the file system is born and active, so do that instead.

Signed-off-by: Christoph Hellwig 
---
 fs/super.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 09b65ee1a8b737..0cda4af0a7e16c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1209,17 +1209,39 @@ int get_tree_keyed(struct fs_context *fc,
 EXPORT_SYMBOL(get_tree_keyed);
 
 #ifdef CONFIG_BLOCK
+/*
+ * Lock a super block that the callers holds a reference to.
+ *
+ * The caller needs to ensure that the super_block isn't being freed while
+ * calling this function, e.g. by holding a lock over the call to this function
+ * and the place that clears the pointer to the superblock used by this 
function
+ * before freeing the superblock.
+ */
+static bool lock_active_super(struct super_block *sb)
+{
+   down_read(>s_umount);
+   if (!sb->s_root ||
+   (sb->s_flags & (SB_ACTIVE | SB_BORN)) != (SB_ACTIVE | SB_BORN)) {
+   up_read(>s_umount);
+   return false;
+   }
+   return true;
+}
+
 static void fs_mark_dead(struct block_device *bdev)
 {
-   struct super_block *sb;
+   struct super_block *sb = bdev->bd_holder;
 
-   sb = get_super(bdev);
-   if (!sb)
+   /* bd_holder_lock ensures that the sb isn't freed */
+   lockdep_assert_held(>bd_holder_lock);
+
+   if (!lock_active_super(sb))
return;
 
if (sb->s_op->shutdown)
sb->s_op->shutdown(sb);
-   drop_super(sb);
+
+   up_read(>s_umount);
 }
 
 static const struct blk_holder_ops fs_holder_ops = {
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 11/12] xfs: drop s_umount over opening the log and RT devices

2023-08-02 Thread Christoph Hellwig
Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the xfs log and RT devices to avoid a
potential lock order reversal with s_unmount for the mark_dead path.

It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8185102431301d..d5042419ed9997 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -448,17 +448,21 @@ STATIC int
 xfs_open_devices(
struct xfs_mount*mp)
 {
-   struct block_device *ddev = mp->m_super->s_bdev;
+   struct super_block  *sb = mp->m_super;
+   struct block_device *ddev = sb->s_bdev;
struct block_device *logdev = NULL, *rtdev = NULL;
int error;
 
+   /* see get_tree_bdev why this is needed and safe */
+   up_write(>s_umount);
+
/*
 * Open real time and log devices - order is important.
 */
if (mp->m_logname) {
error = xfs_blkdev_get(mp, mp->m_logname, );
if (error)
-   return error;
+   goto out_unlock;
}
 
if (mp->m_rtname) {
@@ -496,7 +500,10 @@ xfs_open_devices(
mp->m_logdev_targp = mp->m_ddev_targp;
}
 
-   return 0;
+   error = 0;
+out_unlock:
+   down_write(>s_umount);
+   return error;
 
  out_free_rtdev_targ:
if (mp->m_rtdev_targp)
@@ -508,7 +515,7 @@ xfs_open_devices(
  out_close_logdev:
if (logdev && logdev != ddev)
xfs_blkdev_put(mp, logdev);
-   return error;
+   goto out_unlock;
 }
 
 /*
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] more blkdev_get and holder work

2023-08-02 Thread Christoph Hellwig
Hi all,

this series sits on top of the vfs.super branch in the VFS tree and does a
few closely related things:

  1) it also converts nilfs2 and btrfs to the new scheme where the file system
 only opens the block devices after we know that a new super_block was
 allocated.
  2) it then makes sure that for all file system openers the super_block is
 stored in bd_holder, and makes use of that fact in the mark_dead method
 so that it doesn't have to fall get_super and thus can also work on
 block devices that sb->s_bdev doesn't point to
  3) it then drops the fs-specific holder ops in ext4 and xfs and uses the
 generic fs_holder_ops there

A git tree is available here:

git://git.infradead.org/users/hch/misc.git fs-holder-rework

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/fs-holder-rework

Diffstat:
 fs/btrfs/super.c   |   67 -
 fs/btrfs/volumes.c |8 ++--
 fs/btrfs/volumes.h |2 -
 fs/ext4/super.c|   18 +++---
 fs/f2fs/super.c|7 +--
 fs/nilfs2/super.c  |   81 -
 fs/super.c |   44 ++--
 fs/xfs/xfs_super.c |   32 +++--
 include/linux/blkdev.h |2 +
 include/linux/fs_context.h |2 +
 10 files changed, 126 insertions(+), 137 deletions(-)


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 08/12] fs: export fs_holder_ops

2023-08-02 Thread Christoph Hellwig
Export fs_holder_ops so that file systems that open additional block
devices can use it as well.

Signed-off-by: Christoph Hellwig 
---
 fs/super.c | 3 ++-
 include/linux/blkdev.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 0cda4af0a7e16c..dac05f96ab9ac8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1244,9 +1244,10 @@ static void fs_mark_dead(struct block_device *bdev)
up_read(>s_umount);
 }
 
-static const struct blk_holder_ops fs_holder_ops = {
+const struct blk_holder_ops fs_holder_ops = {
.mark_dead  = fs_mark_dead,
 };
+EXPORT_SYMBOL_GPL(fs_holder_ops);
 
 static int set_bdev_super(struct super_block *s, void *data)
 {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed44a997f629f5..83262702eea71a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1464,6 +1464,8 @@ struct blk_holder_ops {
void (*mark_dead)(struct block_device *bdev);
 };
 
+extern const struct blk_holder_ops fs_holder_ops;
+
 /*
  * Return the correct open flags for blkdev_get_by_* for super block flags
  * as stored in sb->s_flags.
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 01/12] fs: export setup_bdev_super

2023-08-02 Thread Christoph Hellwig
We'll want to use setup_bdev_super instead of duplicating it in nilfs2.

Signed-off-by: Christoph Hellwig 
---
 fs/super.c | 3 ++-
 include/linux/fs_context.h | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 3ef39df5bec506..6aaa275fa8630d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1243,7 +1243,7 @@ static int test_bdev_super_fc(struct super_block *s, 
struct fs_context *fc)
s->s_dev == *(dev_t *)fc->sget_key;
 }
 
-static int setup_bdev_super(struct super_block *sb, int sb_flags,
+int setup_bdev_super(struct super_block *sb, int sb_flags,
struct fs_context *fc)
 {
blk_mode_t mode = sb_open_mode(sb_flags);
@@ -1295,6 +1295,7 @@ static int setup_bdev_super(struct super_block *sb, int 
sb_flags,
sb_set_blocksize(sb, block_size(bdev));
return 0;
 }
+EXPORT_SYMBOL_GPL(setup_bdev_super);
 
 /**
  * get_tree_bdev - Get a superblock based on a single block device
diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h
index ff6341e09925bc..58ef8433a94b8c 100644
--- a/include/linux/fs_context.h
+++ b/include/linux/fs_context.h
@@ -158,6 +158,8 @@ extern int get_tree_keyed(struct fs_context *fc,
   struct fs_context *fc),
 void *key);
 
+int setup_bdev_super(struct super_block *sb, int sb_flags,
+   struct fs_context *fc);
 extern int get_tree_bdev(struct fs_context *fc,
   int (*fill_super)(struct super_block *sb,
 struct fs_context *fc));
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 09/12] ext4: drop s_umount over opening the log device

2023-08-02 Thread Christoph Hellwig
Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the ext4 log device to avoid a
potential lock order reversal with s_unmount for the mark_dead path.

It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/super.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 193d665813b611..2ccb19d345c6dd 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5854,7 +5854,10 @@ static journal_t *ext4_get_dev_journal(struct 
super_block *sb,
if (WARN_ON_ONCE(!ext4_has_feature_journal(sb)))
return NULL;
 
+   /* see get_tree_bdev why this is needed and safe */
+   up_write(>s_umount);
bdev = ext4_blkdev_get(j_dev, sb);
+   down_write(>s_umount);
if (bdev == NULL)
return NULL;
 
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 06/12] fs: use the super_block as holder when mounting file systems

2023-08-02 Thread Christoph Hellwig
The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance.  Pass the super_block instead
which is useful when passed back to the file system driver.

Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/super.c | 7 ++-
 fs/f2fs/super.c  | 7 +++
 fs/super.c   | 8 
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 5980b5dcc6b163..8a47c7f2690880 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -69,8 +69,6 @@ static const struct super_operations btrfs_super_ops;
  * requested by subvol=/path. That way the callchain is straightforward and we
  * don't have to play tricks with the mount options and recursive calls to
  * btrfs_mount.
- *
- * The new btrfs_root_fs_type also servers as a tag for the bdev_holder.
  */
 static struct file_system_type btrfs_fs_type;
 static struct file_system_type btrfs_root_fs_type;
@@ -1498,8 +1496,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
} else {
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
 
-   error = btrfs_open_devices(fs_devices, sb_open_mode(flags),
-  fs_type);
+   error = btrfs_open_devices(fs_devices, sb_open_mode(flags), s);
if (error)
goto out_deactivate;
 
@@ -1513,7 +1510,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
 fs_devices->latest_dev->bdev);
shrinker_debugfs_rename(>s_shrink, "sb-%s:%s", fs_type->name,
s->s_id);
-   btrfs_sb(s)->bdev_holder = fs_type;
+   fs_info->bdev_holder = s;
error = btrfs_fill_super(s, fs_devices, data);
}
if (!error)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index ca31163da00a55..05c90fdb7a6cca 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1561,7 +1561,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
int i;
 
for (i = 0; i < sbi->s_ndevs; i++) {
-   blkdev_put(FDEV(i).bdev, sbi->sb->s_type);
+   blkdev_put(FDEV(i).bdev, sbi->sb);
 #ifdef CONFIG_BLK_DEV_ZONED
kvfree(FDEV(i).blkz_seq);
 #endif
@@ -4198,7 +4198,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
/* Single zoned block device mount */
FDEV(0).bdev =
blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev, mode,
- sbi->sb->s_type, NULL);
+ sbi->sb, NULL);
} else {
/* Multi-device mount */
memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
@@ -4217,8 +4217,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
sbi->log_blocks_per_seg) - 1;
}
FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path, mode,
- sbi->sb->s_type,
- NULL);
+ sbi->sb, NULL);
}
if (IS_ERR(FDEV(i).bdev))
return PTR_ERR(FDEV(i).bdev);
diff --git a/fs/super.c b/fs/super.c
index 6aaa275fa8630d..09b65ee1a8b737 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1249,7 +1249,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
blk_mode_t mode = sb_open_mode(sb_flags);
struct block_device *bdev;
 
-   bdev = blkdev_get_by_dev(sb->s_dev, mode, sb->s_type, _holder_ops);
+   bdev = blkdev_get_by_dev(sb->s_dev, mode, sb, _holder_ops);
if (IS_ERR(bdev)) {
if (fc)
errorf(fc, "%s: Can't open blockdev", fc->source);
@@ -1262,7 +1262,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
 * writable from userspace even for a read-only block device.
 */
if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
-   blkdev_put(bdev, sb->s_type);
+   blkdev_put(bdev, sb);
return -EACCES;
}
 
@@ -1278,7 +1278,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
mutex_unlock(>bd_fsfreeze_mutex);
if (fc)
warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
-   blkdev_put(bdev, sb->s_type);
+   blkdev_put(bdev, sb);
return -EBUSY;
}
spin_lock(_lock);
@@ -1418,7 +1418,7 @@ void kill_block_super(struct super_block *sb)
if (bdev) {
bdev->bd_super = NULL;
sync_blockdev(bdev);
-   blkdev_put(bdev, sb->s_type);
+

[f2fs-dev] [PATCH 02/12] nilfs2: use setup_bdev_super to de-duplicate the mount code

2023-08-02 Thread Christoph Hellwig
Use the generic setup_bdev_super helper to open the main block device
and do various bits of superblock setup instead of duplicating the
logic.  This includes moving to the new scheme implemented in common
code that only opens the block device after the superblock has allocated.

It does not yet convert nilfs2 to the new mount API, but doing so will
become a bit simpler after this first step.

Signed-off-by: Christoph Hellwig 
---
 fs/nilfs2/super.c | 81 ++-
 1 file changed, 30 insertions(+), 51 deletions(-)

diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 0ef8c71bde8e5f..a5d1fa4e7552f6 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "nilfs.h"
 #include "export.h"
 #include "mdt.h"
@@ -1216,7 +1217,6 @@ static int nilfs_remount(struct super_block *sb, int 
*flags, char *data)
 }
 
 struct nilfs_super_data {
-   struct block_device *bdev;
__u64 cno;
int flags;
 };
@@ -1283,64 +1283,49 @@ static int nilfs_identify(char *data, struct 
nilfs_super_data *sd)
 
 static int nilfs_set_bdev_super(struct super_block *s, void *data)
 {
-   s->s_bdev = data;
-   s->s_dev = s->s_bdev->bd_dev;
+   s->s_dev = *(dev_t *)data;
return 0;
 }
 
 static int nilfs_test_bdev_super(struct super_block *s, void *data)
 {
-   return (void *)s->s_bdev == data;
+   return !(s->s_iflags & SB_I_RETIRED) && s->s_dev == *(dev_t *)data;
 }
 
 static struct dentry *
 nilfs_mount(struct file_system_type *fs_type, int flags,
 const char *dev_name, void *data)
 {
-   struct nilfs_super_data sd;
+   struct nilfs_super_data sd = { .flags = flags };
struct super_block *s;
-   struct dentry *root_dentry;
-   int err, s_new = false;
+   dev_t dev;
+   int err;
 
-   sd.bdev = blkdev_get_by_path(dev_name, sb_open_mode(flags), fs_type,
-NULL);
-   if (IS_ERR(sd.bdev))
-   return ERR_CAST(sd.bdev);
+   if (nilfs_identify(data, ))
+   return ERR_PTR(-EINVAL);
 
-   sd.cno = 0;
-   sd.flags = flags;
-   if (nilfs_identify((char *)data, )) {
-   err = -EINVAL;
-   goto failed;
-   }
+   err = lookup_bdev(dev_name, );
+   if (err)
+   return ERR_PTR(err);
 
-   /*
-* once the super is inserted into the list by sget, s_umount
-* will protect the lockfs code from trying to start a snapshot
-* while we are mounting
-*/
-   mutex_lock(>bd_fsfreeze_mutex);
-   if (sd.bdev->bd_fsfreeze_count > 0) {
-   mutex_unlock(>bd_fsfreeze_mutex);
-   err = -EBUSY;
-   goto failed;
-   }
s = sget(fs_type, nilfs_test_bdev_super, nilfs_set_bdev_super, flags,
-sd.bdev);
-   mutex_unlock(>bd_fsfreeze_mutex);
-   if (IS_ERR(s)) {
-   err = PTR_ERR(s);
-   goto failed;
-   }
+);
+   if (IS_ERR(s))
+   return ERR_CAST(s);
 
if (!s->s_root) {
-   s_new = true;
-
-   /* New superblock instance created */
-   snprintf(s->s_id, sizeof(s->s_id), "%pg", sd.bdev);
-   sb_set_blocksize(s, block_size(sd.bdev));
-
-   err = nilfs_fill_super(s, data, flags & SB_SILENT ? 1 : 0);
+   /*
+* We drop s_umount here because we need to open the bdev and
+* bdev->open_mutex ranks above s_umount (blkdev_put() ->
+* __invalidate_device()). It is safe because we have active sb
+* reference and SB_BORN is not set yet.
+*/
+   up_write(>s_umount);
+   err = setup_bdev_super(s, flags, NULL);
+   down_write(>s_umount);
+   if (!err)
+   err = nilfs_fill_super(s, data,
+  flags & SB_SILENT ? 1 : 0);
if (err)
goto failed_super;
 
@@ -1366,24 +1351,18 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
}
 
if (sd.cno) {
+   struct dentry *root_dentry;
+
err = nilfs_attach_snapshot(s, sd.cno, _dentry);
if (err)
goto failed_super;
-   } else {
-   root_dentry = dget(s->s_root);
+   return root_dentry;
}
 
-   if (!s_new)
-   blkdev_put(sd.bdev, fs_type);
-
-   return root_dentry;
+   return dget(s->s_root);
 
  failed_super:
deactivate_locked_super(s);
-
- failed:
-   if (!s_new)
-   blkdev_put(sd.bdev, fs_type);
return ERR_PTR(err);
 }
 
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net

[f2fs-dev] [PATCH 03/12] btrfs: always open the device read-only in btrfs_scan_one_device

2023-08-02 Thread Christoph Hellwig
btrfs_scan_one_device opens the block device only to read the super
block.  Instead of passing a blk_mode_t argument to sometimes open
it for writing, just hard code BLK_OPEN_READ as it will never write
to the device or hand the block_device out to someone else.

Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/super.c   | 15 +++
 fs/btrfs/volumes.c |  4 ++--
 fs/btrfs/volumes.h |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f1dd172d8d5bd7..b318bddefd5236 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -849,7 +849,7 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
  * All other options will be parsed on much later in the mount process and
  * only when we need to allocate a new super block.
  */
-static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
+static int btrfs_parse_device_options(const char *options)
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
@@ -883,7 +883,7 @@ static int btrfs_parse_device_options(const char *options, 
blk_mode_t flags)
error = -ENOMEM;
goto out;
}
-   device = btrfs_scan_one_device(device_name, flags);
+   device = btrfs_scan_one_device(device_name);
kfree(device_name);
if (IS_ERR(device)) {
error = PTR_ERR(device);
@@ -1440,7 +1440,6 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
struct btrfs_fs_devices *fs_devices = NULL;
struct btrfs_fs_info *fs_info = NULL;
void *new_sec_opts = NULL;
-   blk_mode_t mode = sb_open_mode(flags);
int error = 0;
 
if (data) {
@@ -1472,13 +1471,13 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
}
 
mutex_lock(_mutex);
-   error = btrfs_parse_device_options(data, mode);
+   error = btrfs_parse_device_options(data);
if (error) {
mutex_unlock(_mutex);
goto error_fs_info;
}
 
-   device = btrfs_scan_one_device(device_name, mode);
+   device = btrfs_scan_one_device(device_name);
if (IS_ERR(device)) {
mutex_unlock(_mutex);
error = PTR_ERR(device);
@@ -1488,7 +1487,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
fs_devices = device->fs_devices;
fs_info->fs_devices = fs_devices;
 
-   error = btrfs_open_devices(fs_devices, mode, fs_type);
+   error = btrfs_open_devices(fs_devices, sb_open_mode(flags), fs_type);
mutex_unlock(_mutex);
if (error)
goto error_fs_info;
@@ -2190,7 +2189,7 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
switch (cmd) {
case BTRFS_IOC_SCAN_DEV:
mutex_lock(_mutex);
-   device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
+   device = btrfs_scan_one_device(vol->name);
ret = PTR_ERR_OR_ZERO(device);
mutex_unlock(_mutex);
break;
@@ -2204,7 +2203,7 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
break;
case BTRFS_IOC_DEVICES_READY:
mutex_lock(_mutex);
-   device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
+   device = btrfs_scan_one_device(vol->name);
if (IS_ERR(device)) {
mutex_unlock(_mutex);
ret = PTR_ERR(device);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73f9ea7672dbda..8246578c70f55b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1356,7 +1356,7 @@ int btrfs_forget_devices(dev_t devt)
  * and we are not allowed to call set_blocksize during the scan. The superblock
  * is read via pagecache
  */
-struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
+struct btrfs_device *btrfs_scan_one_device(const char *path)
 {
struct btrfs_super_block *disk_super;
bool new_device_added = false;
@@ -1384,7 +1384,7 @@ struct btrfs_device *btrfs_scan_one_device(const char 
*path, blk_mode_t flags)
 * values temporarily, as the device paths of the fsid are the only
 * required information for assembling the volume.
 */
-   bdev = blkdev_get_by_path(path, flags, NULL, NULL);
+   bdev = blkdev_get_by_path(path, BLK_OPEN_READ, NULL, NULL);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index b8c51f16ba867f..824161c6dd063e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -611,7 +611,7 @@ struct btrfs_block_group *btrfs_create_chunk(struct 
btrfs_trans_handle *trans,
 void 

[f2fs-dev] [PATCH 05/12] ext4: make the IS_EXT2_SB/IS_EXT3_SB checks more robust

2023-08-02 Thread Christoph Hellwig
Check for sb->s_type which is the right place to look at the file system
type, not the holder, which is just an implementation detail in the VFS
helpers.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c94ebf704616e5..193d665813b611 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -140,7 +140,7 @@ static struct file_system_type ext2_fs_type = {
 };
 MODULE_ALIAS_FS("ext2");
 MODULE_ALIAS("ext2");
-#define IS_EXT2_SB(sb) ((sb)->s_bdev->bd_holder == _fs_type)
+#define IS_EXT2_SB(sb) ((sb)->s_type == _fs_type)
 #else
 #define IS_EXT2_SB(sb) (0)
 #endif
@@ -156,7 +156,7 @@ static struct file_system_type ext3_fs_type = {
 };
 MODULE_ALIAS_FS("ext3");
 MODULE_ALIAS("ext3");
-#define IS_EXT3_SB(sb) ((sb)->s_bdev->bd_holder == _fs_type)
+#define IS_EXT3_SB(sb) ((sb)->s_type == _fs_type)
 
 
 static inline void __ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags,
-- 
2.39.2



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Only lfs mode is allowed with zoned block device feature

2023-08-02 Thread Chao Yu

On 2023/6/20 22:10, Chunhai Guo wrote:

Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.

Signed-off-by: Chunhai Guo 
---
  fs/f2fs/super.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c917fa771f0e..6249483be905 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -861,12 +861,14 @@ static int parse_options(struct super_block *sb, char 
*options, bool is_remount)
  
  			if (!name)

return -ENOMEM;
+
+   if (strcmp(name, "lfs") && f2fs_sb_has_blkzoned(sbi)) {
+   f2fs_warn(sbi, "Only lfs mode is allowed with zoned 
block device feature");
+   kfree(name);
+   return -EINVAL;
+   }


What about:

From f6e71f1d33c2cca543ebb41734c7a95af5190767 Mon Sep 17 00:00:00 2001
From: Chunhai Guo 
Date: Sun, 30 Jul 2023 10:06:50 +0800
Subject: [PATCH] f2fs: only lfs mode is allowed with zoned block device
 feature

Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.

Signed-off-by: Chunhai Guo 
---
 fs/f2fs/super.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 099e126c61e1..1d0593358125 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -864,11 +864,6 @@ static int parse_options(struct super_block *sb, char 
*options, bool is_remount)
if (!name)
return -ENOMEM;
if (!strcmp(name, "adaptive")) {
-   if (f2fs_sb_has_blkzoned(sbi)) {
-   f2fs_warn(sbi, "adaptive mode is not allowed 
with zoned block device feature");
-   kfree(name);
-   return -EINVAL;
-   }
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
} else if (!strcmp(name, "lfs")) {
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
@@ -1333,6 +1328,10 @@ static int parse_options(struct super_block *sb, char 
*options, bool is_remount)
F2FS_OPTION(sbi).discard_unit =
DISCARD_UNIT_SECTION;
}
+   if (F2FS_OPTION(sbi).fs_mode != FS_MODE_LFS) {
+   f2fs_info(sbi, "Only lfs mode is allowed with zoned block 
device feature");
+   return -EINVAL;
+   }
 #else
f2fs_err(sbi, "Zoned block device support is not enabled");
return -EINVAL;
--
2.40.1




+
if (!strcmp(name, "adaptive")) {
-   if (f2fs_sb_has_blkzoned(sbi)) {
-   f2fs_warn(sbi, "adaptive mode is not allowed 
with zoned block device feature");
-   kfree(name);
-   return -EINVAL;
-   }
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
} else if (!strcmp(name, "lfs")) {
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] f2fs: introduce two helper functions for the largest cached extent

2023-08-02 Thread Chao Yu

On 2023/7/26 14:16, Chao Liu wrote:

On 7月 26 09:24, Chao Yu wrote:

On 2023/7/25 9:36, Chao Liu wrote:

From: Chao Liu 

This patch is a cleanup:
1. Merge __drop_largest_extent() since it has only one caller.
2. Introduce __unlock_tree_with_checking_largest() and
 __drop_largest_extent() to help manage largest and largest_update
 in extent_tree.

Signed-off-by: Chao Liu 
---
v2: Make sure et->largest_updated gets updated within >lock.
  Thanks to Chao Yu for pointing out.
---
   fs/f2fs/extent_cache.c | 66 --
   fs/f2fs/f2fs.h |  4 +--
   2 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index 0e2d49140c07f..cfc69621a8a26 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -19,6 +19,12 @@
   #include "node.h"
   #include 
+static void __drop_largest_extent(struct extent_tree *et)
+{
+   et->largest.len = 0;
+   et->largest_updated = true;
+}
+
   bool sanity_check_extent_cache(struct inode *inode)
   {
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -35,8 +41,7 @@ bool sanity_check_extent_cache(struct inode *inode)
/* Let's drop, if checkpoint got corrupted. */
if (is_set_ckpt_flags(sbi, CP_ERROR_FLAG)) {
-   ei->len = 0;
-   et->largest_updated = true;
+   __drop_largest_extent(et);


__drop_largest_extent_force(et);


return true;
}
@@ -310,6 +315,8 @@ static void __detach_extent_node(struct f2fs_sb_info *sbi,
if (et->cached_en == en)
et->cached_en = NULL;
+
+   /* keep the largest as we can still use it */


The comments doesn't match below code?



Sorry for not explaining this earlier.

It's just a hint and has nothing to do with the code below. It


So, it should be near to related code..., otherwise, IMO, it makes the code
more confused.

Thanks,


simply explains that we don't need to disable the largest here, which
makes the whole code logic of the largest more clear. :)

If it's not fitting, please let me know, and I'll drop them.


kmem_cache_free(extent_node_slab, en);
   }
@@ -385,15 +392,6 @@ static unsigned int __free_extent_tree(struct f2fs_sb_info 
*sbi,
return count - atomic_read(>node_cnt);
   }
-static void __drop_largest_extent(struct extent_tree *et,
-   pgoff_t fofs, unsigned int len)
-{
-   if (fofs < et->largest.fofs + et->largest.len &&
-   fofs + len > et->largest.fofs) {
-   et->largest.len = 0;
-   et->largest_updated = true;
-   }
-}


What about:

static void __drop_largest_extent_cond(struct extent_tree *et,
pgoff_t fofs, unsigned int len,
bool force)
{
if (force || (fofs < et->largest.fofs + et->largest.len &&
fofs + len > et->largest.fofs)) {
et->largest.len = 0;
et->largest_updated = true;
}
}

static void __drop_largest_extent_force(struct extent_tree *et)
{
__drop_largest_extent_cond(et, 0, 0, true);
}



Thank you. I feel it matches better with the existing code
organization style. Let me apply it in v3.


   void f2fs_init_read_extent_tree(struct inode *inode, struct page *ipage)
   {
@@ -601,6 +599,19 @@ static struct extent_node *__insert_extent_tree(struct 
f2fs_sb_info *sbi,
return en;
   }
+static void __unlock_tree_with_checking_largest(struct extent_tree *et,
+   struct inode *inode)
+{
+   if (et->type == EX_READ && et->largest_updated) {
+   et->largest_updated = false;
+   write_unlock(>lock);
+   f2fs_mark_inode_dirty_sync(inode, true);
+   return;
+   }
+
+   write_unlock(>lock);
+}
+
   static void __update_extent_tree_range(struct inode *inode,
struct extent_info *tei, enum extent_type type)
   {
@@ -612,7 +623,6 @@ static void __update_extent_tree_range(struct inode *inode,
struct rb_node **insert_p = NULL, *insert_parent = NULL;
unsigned int fofs = tei->fofs, len = tei->len;
unsigned int end = fofs + len;
-   bool updated = false;
bool leftmost = false;
if (!et)
@@ -636,11 +646,10 @@ static void __update_extent_tree_range(struct inode 
*inode,
prev = et->largest;
dei.len = 0;
-   /*
-* drop largest extent before lookup, in case it's already
-* been shrunk from extent tree
-*/
-   __drop_largest_extent(et, fofs, len);


__drop_largest_extent_cond(et, fofs, len, false);


+   /* updates may cause largest extent cache to become invalid */
+   if (fofs < et->largest.fofs + et->largest.len &&
+   fofs + len > 

Re: [f2fs-dev] [syzbot] [f2fs?] general protection fault in f2fs_drop_extent_tree

2023-08-02 Thread Aleksandr Nogikh via Linux-f2fs-devel
On Wed, Aug 2, 2023 at 2:41 AM syzbot
 wrote:
>
> syzbot suspects this issue was fixed by commit:
>
> commit 458c15dfbce62c35fefd9ca637b20a051309c9f1
> Author: Chao Yu 
> Date:   Tue May 23 03:58:22 2023 +
>
> f2fs: don't reset unchangable mount option in f2fs_remount()
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=12461d31a8
> start commit:   a92b7d26c743 Merge tag 'drm-fixes-2023-06-23' of git://ano..
> git tree:   upstream
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2cbd298d0aff1140
> dashboard link: https://syzkaller.appspot.com/bug?extid=f4649be1be739e030111
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1564afb0a8
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=166928c728
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
> #syz fix: f2fs: don't reset unchangable mount option in f2fs_remount()

Looks reasonable.
#syz fix: f2fs: don't reset unchangable mount option in f2fs_remount()

>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: Only lfs mode is allowed with zoned block device feature

2023-08-02 Thread Chunhai Guo via Linux-f2fs-devel

Hi Chao & Jaegeuk,

Could you please help to review this patch?

Thanks.

On 2023/6/20 22:10, 郭纯海 wrote:

Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.

Signed-off-by: Chunhai Guo 
---
  fs/f2fs/super.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c917fa771f0e..6249483be905 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -861,12 +861,14 @@ static int parse_options(struct super_block *sb, char 
*options, bool is_remount)
  
  			if (!name)

return -ENOMEM;
+
+   if (strcmp(name, "lfs") && f2fs_sb_has_blkzoned(sbi)) {
+   f2fs_warn(sbi, "Only lfs mode is allowed with zoned 
block device feature");
+   kfree(name);
+   return -EINVAL;
+   }
+
if (!strcmp(name, "adaptive")) {
-   if (f2fs_sb_has_blkzoned(sbi)) {
-   f2fs_warn(sbi, "adaptive mode is not allowed 
with zoned block device feature");
-   kfree(name);
-   return -EINVAL;
-   }
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
} else if (!strcmp(name, "lfs")) {
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel