Re: [Cluster-devel] [PATCH 01/35] fscache: Remove unused ->now_uncached callback

2017-06-01 Thread David Howells
Jan Kara  wrote:

> The callback doesn't ever get called. Remove it.

Hmmm...  I should perhaps be calling this.  I'm not sure why I never did.

At the moment, it doesn't strictly matter as ops on pages marked with
PG_fscache get ignored if the cache has suffered an I/O error or has been
withdrawn - but it will incur a performance penalty (the PG_fscache flag is
checked in the netfs before calling into fscache).

The downside of calling this is that when a cache is removed, fscache would go
through all the cookies for that cache and iterate over all the pages
associated with those cookies - which could cause a performance dip in the
system.

David



Re: [Cluster-devel] [PATCH 1/2] gfs2: Convert gfs2 to fs_context

2019-03-18 Thread David Howells
Andrew Price  wrote:

> sget() is still used instead of sget_fc() as there doesn't seem to be an
> obvious replacement for the bdev pointer propagation to the test/set
> functions (yet?)

Umm...  What about the fs_context struct?  Why can't that be used to propagate
the bdev pointer?  That's kind of what it's for...

struct super_block *sget_fc(
struct fs_context *fc,
int (*test)(struct super_block *, struct fs_context *),
int (*set)(struct super_block *, struct fs_context *))

It looks like you should be able to stash the bdev pointer in the gfs2_args
struct.

> + fsparam_s32("commit", Opt_commit),
> + fsparam_s32("statfs_quantum", Opt_statfs_quantum),
> + fsparam_s32("statfs_percent", Opt_statfs_percent),

Why s32?  Why not u32?

David
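
As a rough illustration of the suggestion above (carrying the bdev pointer
through the fs_context so that test/set callbacks with the sget_fc() signature
can reach it), here is a minimal userspace sketch.  The struct definitions are
simplified stand-ins rather than the kernel's, and the ar_bdev member,
gfs2_args_sketch and sget_fc_sketch() are hypothetical names invented for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct block_device { int dev; };
struct super_block { struct block_device *s_bdev; };
struct fs_context { void *fs_private; };

/* Hypothetical: a cut-down gfs2_args with an added bdev member. */
struct gfs2_args_sketch {
    struct block_device *ar_bdev;
};

/* test(): does an existing superblock sit on the device we were asked for? */
static int test_gfs2_super_fc(struct super_block *s, struct fs_context *fc)
{
    struct gfs2_args_sketch *args = fc->fs_private;

    return s->s_bdev == args->ar_bdev;
}

/* set(): initialise a fresh superblock from what the context carries. */
static int set_gfs2_super_fc(struct super_block *s, struct fs_context *fc)
{
    struct gfs2_args_sketch *args = fc->fs_private;

    s->s_bdev = args->ar_bdev;
    return 0;
}

/* Toy stand-in for sget_fc(): reuse a matching superblock or set up 'fresh'. */
static struct super_block *sget_fc_sketch(struct fs_context *fc,
        struct super_block *existing, size_t n,
        struct super_block *fresh,
        int (*test)(struct super_block *, struct fs_context *),
        int (*set)(struct super_block *, struct fs_context *))
{
    assert(fc && fresh && test && set);
    for (size_t i = 0; i < n; i++)
        if (test(&existing[i], fc))
            return &existing[i];
    return set(fresh, fc) == 0 ? fresh : NULL;
}
```

The point of the sketch is only that neither callback needs a separate void
pointer argument: everything they compare or copy is reachable from the
context.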



Re: [Cluster-devel] [PATCH 1/2] gfs2: Convert gfs2 to fs_context

2019-03-18 Thread David Howells
Andrew Price  wrote:

> > Umm...  What about the fs_context struct?  Why can't that be used to
> > propagate the bdev pointer?  That's kind of what it's for...
> 
> It would be useful to have the block device pointer in the fs_context since so
> many of the filesystems use them and it makes for an obvious API migration.

That may be so.  I've argued also that we should put a net-namespace pointer
in there, but Al disagreed.

However, I think most bdev-based filesystems use mount_bdev() which gfs2 does
not.

> > It looks like you should be able to stash the bdev pointer in the gfs2_args
> > struct.
> 
> Sure, but since the new API is young I figured I'd hold off until we had this
> conversation because adding it to the fs_context might be agreeable :)

Note that I have patches to kill off sget_userns() and sget() will hopefully
soon follow.  Have a look at:


https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=mount-api-viro

David



Re: [Cluster-devel] [PATCH v2 1/2] gfs2: Convert gfs2 to fs_context

2019-03-19 Thread David Howells
Andrew Price  wrote:

> + pr_warn("-o debug and -o errors=panic are mutually exclusive\n");
> + return -EINVAL;

return invalf(fc, "gfs2: -o debug and -o errors=panic are mutually exclusive"); 

(Note: no "\n")

> + if (result.int_32 > 0)
> + args->ar_quota = opt_quota_values[result.int_32];
> + else if (result.negated)
> + args->ar_quota = GFS2_QUOTA_OFF;
> + else
> + args->ar_quota = GFS2_QUOTA_ON;

I recommend checking result.negated first.

> + /* Not allowed to change locking details */
> + if (strcmp(newargs->ar_lockproto, oldargs->ar_lockproto) ||
> + strcmp(newargs->ar_locktable, oldargs->ar_locktable) ||
> + strcmp(newargs->ar_hostdata, oldargs->ar_hostdata))
> + return -EINVAL;

Use errorf().  (Not invalf - the parameter isn't exactly invalid, it's just
that you're not allowed to do this operation).

> + error = gfs2_make_fs_ro(sdp);
> + else
> + error = gfs2_make_fs_rw(sdp);
> + if (error)
> + return error;

Might want to call errorf() here too.

> - s = sget(&gfs2_fs_type, test_gfs2_super, set_meta_super, flags,
> + s = sget(&gfs2_fs_type, test_meta_super, set_meta_super, flags,

Try and use sget_fc() please.  If you look at the fuse patchset I cc'd you on,
there's a commit there that adds a ->bdev and ->bdev_mode to fs_context that
may be of use to you.

Can you use vfs_get_block_super()?  Would it be of use to export
test_bdev_super_fc() and set_bdev_super_fc()?

David
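
The recommendation in the review above to check result.negated first can be
sketched in userspace as follows.  The parse_result_sketch struct, the enum
values and the contents of opt_quota_values[] are assumptions made for the
example; only the field names negated and int_32 and the GFS2_QUOTA_* names
come from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the kernel's definitions may differ. */
enum { GFS2_QUOTA_OFF, GFS2_QUOTA_ACCOUNT, GFS2_QUOTA_ON };

/* Cut-down stand-in for the parser result: just the two fields used here. */
struct parse_result_sketch {
    bool negated;   /* option was given as "noquota" */
    int  int_32;    /* index of the matched enum string, 0 if none */
};

/* Assumed table layout: 0 = unset, 1 = "off", 2 = "account", 3 = "on". */
static const int opt_quota_values[] = {
    GFS2_QUOTA_OFF, GFS2_QUOTA_OFF, GFS2_QUOTA_ACCOUNT, GFS2_QUOTA_ON,
};

static int quota_from_result(const struct parse_result_sketch *result)
{
    assert(result->int_32 >= 0 && result->int_32 < 4);

    /* Negation is tested first, so it wins whatever int_32 holds. */
    if (result->negated)
        return GFS2_QUOTA_OFF;
    if (result->int_32 > 0)
        return opt_quota_values[result->int_32];
    return GFS2_QUOTA_ON;   /* bare "quota" with no value */
}
```

With this ordering the outcome for a negated option no longer depends on
whatever value happens to be left in int_32.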



Re: [Cluster-devel] [PATCH v2 1/2] gfs2: Convert gfs2 to fs_context

2019-03-21 Thread David Howells
Andrew Price  wrote:

> Would this fly?

Can you update the comment to document what it's for?

David



Re: [Cluster-devel] [PATCH] vfs: Allow selection of fs root independent of sb

2019-03-21 Thread David Howells
That's not what I meant.  There's a kerneldoc comment on vfs_get_block_super()
that you need to modify.  I would in any case roll your patch into mine.

> This is intended for gfs2 to select between the normal root and the
> 'meta' root, which must be done whether a root already exists for a
> superblock or not, i.e. it cannot be decided in fill_super().

Can this be done in gfs2_get_tree, where it sets fc->root?

David



[Cluster-devel] [RFC PATCH 00/68] VFS: Convert a bunch of filesystems to the new mount API

2019-03-27 Thread David Howells


Hi Al,

Here's a set of patches that converts a bunch of filesystems (but not yet all!)
to the new mount API.  To this end, it makes the following changes:

 (1) Provides a convenience member in struct fs_context that is OR'd into
 sb->s_iflags by sget_fc().

 (2) Provides a convenience helper function, vfs_init_pseudo_fs_context(),
 for doing most of the work in mounting a pseudo filesystem.

 (3) Provides a convenience helper function, vfs_get_block_super(), for
 doing the work in setting up a block-based superblock.

 (4) Improves the handling of fd-type parameters.

 (5) Moves some of the subtype handling into fuse.

 (6) Provides a convenience helper function, vfs_get_mtd_super(), for
 doing the work in setting up an MTD device-based superblock.

 (7) Kills off mount_pseudo(), mount_pseudo_xattr(), mount_ns(),
 sget_userns(), mount_mtd(), mount_single().

 (8) Converts a slew of filesystems to use the mount API.

 (9) Fixes a bug in hypfs.

The patches can be found here also:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git

on branch:

mount-api-viro

David
---
Andrew Price (1):
  gfs2: Convert gfs2 to fs_context

David Howells (66):
  vfs: Update mount API docs
  vfs: Fix refcounting of filenames in fs_parser
  vfs: Provide sb->s_iflags settings in fs_context struct
  vfs: Provide a mount_pseudo-replacement for the new mount API
  vfs: Convert aio to use the new mount API
  vfs: Convert anon_inodes to use the new mount API
  vfs: Convert bdev to use the new mount API
  vfs: Convert nsfs to use the new mount API
  vfs: Convert pipe to use the new mount API
  vfs: Convert zsmalloc to use the new mount API
  vfs: Convert sockfs to use the new mount API
  vfs: Convert dax to use the new mount API
  vfs: Convert drm to use the new mount API
  vfs: Convert ia64 perfmon to use the new mount API
  vfs: Convert cxl to use the new mount API
  vfs: Convert ocxlflash to use the new mount API
  vfs: Convert virtio_balloon to use the new mount API
  vfs: Convert btrfs_test to use the new mount API
  vfs: Kill off mount_pseudo() and mount_pseudo_xattr()
  vfs: Use sget_fc() for pseudo-filesystems
  vfs: Convert binderfs to use the new mount API
  vfs: Convert nfsctl to use the new mount API
  vfs: Convert rpc_pipefs to use the new mount API
  vfs: Kill mount_ns()
  vfs: Kill sget_userns()
  vfs: Convert binfmt_misc to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert fusectl to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert smackfs to use the new mount API
  vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API
  vfs: Create fs_context-aware mount_bdev() replacement
  vfs: Make fs_parse() handle fs_param_is_fd-type params better
  vfs: Convert fuse to use the new mount API
  vfs: Move the subtype parameter into fuse
  mtd: Provide fs_context-aware mount_mtd() replacement
  vfs: Convert romfs to use the new mount API
  vfs: Convert cramfs to use the new mount API
  vfs: Convert jffs2 to use the new mount API
  mtd: Kill mount_mtd()
  vfs: Convert squashfs to use the new mount API
  vfs: Convert ceph to use the new mount API
  vfs: Convert functionfs to use the new mount API
  vfs: Add a single-or-reconfig keying to vfs_get_super()
  vfs: Convert debugfs to use the new mount API
  vfs: Convert tracefs to use the new mount API
  vfs: Convert pstore to use the new mount API
  hypfs: Fix error number left in struct pointer member
  vfs: Convert hypfs to use the new mount API
  vfs: Convert spufs to use the new mount API
  vfs: Kill mount_single()
  vfs: Convert coda to use the new mount API
  vfs: Convert autofs to use the new mount API
  vfs: Convert devpts to use the new mount API
  vfs: Convert bpf to use the new mount API
  vfs: Convert ubifs to use the new mount API
  vfs: Convert orangefs to use the new mount API

Masahiro Yamada (1):
  kbuild: skip sub-make for in-tree build with GNU Make 4.x


 Documentation/filesystems/mount_api.txt   |  367 ---
 Documentation/filesystems/vfs.txt |4 
 Makefile  |   31 +
 arch/ia64/kernel/perfmon.c|   14 -
 arch/powerpc/platforms/cell/spuf

[Cluster-devel] [RFC PATCH 68/68] gfs2: Convert gfs2 to fs_context

2019-03-27 Thread David Howells
From: Andrew Price 

Convert gfs2 and gfs2meta to fs_context. Removes the duplicated vfs code
from gfs2_mount and instead uses the new vfs_get_block_super() before
switching the ->root to the appropriate dentry.

The mount option parsing has been converted to the new API and error
reporting for invalid options has been made more precise at the same
time.

All of the mount/remount code has been moved into ops_fstype.c

Signed-off-by: Andrew Price 
Signed-off-by: David Howells 
cc: cluster-devel@redhat.com
---

 fs/gfs2/incore.h |8 -
 fs/gfs2/ops_fstype.c |  495 ++
 fs/gfs2/super.c  |  335 --
 fs/gfs2/super.h  |3 
 4 files changed, 382 insertions(+), 459 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index cdf07b408f54..10febb298da5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -587,10 +587,10 @@ struct gfs2_args {
unsigned int ar_rgrplvb:1;  /* use lvbs for rgrp info */
unsigned int ar_loccookie:1;/* use location based readdir
   cookies */
-   int ar_commit;  /* Commit interval */
-   int ar_statfs_quantum;  /* The fast statfs interval */
-   int ar_quota_quantum;   /* The quota interval */
-   int ar_statfs_percent;  /* The % change to force sync */
+   s32 ar_commit;  /* Commit interval */
+   s32 ar_statfs_quantum;  /* The fast statfs interval */
+   s32 ar_quota_quantum;   /* The quota interval */
+   s32 ar_statfs_percent;  /* The % change to force sync */
 };
 
 struct gfs2_tune {
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index b041cb8ae383..8e8d624170a1 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1024,16 +1025,17 @@ void gfs2_online_uevent(struct gfs2_sbd *sdp)
 }
 
 /**
- * fill_super - Read in superblock
+ * gfs2_fill_super - Read in superblock
  * @sb: The VFS superblock
- * @data: Mount options
+ * @args: Mount options
  * @silent: Don't complain if it's not a GFS2 filesystem
  *
- * Returns: errno
+ * Returns: -errno
  */
-
-static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent)
+static int gfs2_fill_super(struct super_block *sb, struct fs_context *fc)
 {
+   struct gfs2_args *args = fc->fs_private;
+   int silent = fc->sb_flags & SB_SILENT;
struct gfs2_sbd *sdp;
struct gfs2_holder mount_gh;
int error;
@@ -1200,161 +1202,415 @@ static int fill_super(struct super_block *sb, struct gfs2_args *args, int silent
return error;
 }
 
-static int set_gfs2_super(struct super_block *s, void *data)
+/**
+ * gfs2_get_tree - Get the GFS2 superblock and root directory
+ * @fc: The filesystem context
+ *
+ * Returns: 0 or -errno on error
+ */
+static int gfs2_get_tree(struct fs_context *fc)
 {
-   s->s_bdev = data;
-   s->s_dev = s->s_bdev->bd_dev;
-   s->s_bdi = bdi_get(s->s_bdev->bd_bdi);
+   struct gfs2_args *args = fc->fs_private;
+   struct gfs2_sbd *sdp;
+   int error;
+
+   error = vfs_get_block_super(fc, gfs2_fill_super);
+   if (error)
+   return error;
+
+   sdp = fc->root->d_sb->s_fs_info;
+   dput(fc->root);
+   if (args->ar_meta)
+   fc->root = dget(sdp->sd_master_dir);
+   else
+   fc->root = dget(sdp->sd_root_dir);
return 0;
 }
 
-static int test_gfs2_super(struct super_block *s, void *ptr)
+static void gfs2_fc_free(struct fs_context *fc)
 {
-   struct block_device *bdev = ptr;
-   return (bdev == s->s_bdev);
+   struct gfs2_args *args = fc->fs_private;
+
+   kfree(args);
 }
 
-/**
- * gfs2_mount - Get the GFS2 superblock
- * @fs_type: The GFS2 filesystem type
- * @flags: Mount flags
- * @dev_name: The name of the device
- * @data: The mount arguments
- *
- * Q. Why not use get_sb_bdev() ?
- * A. We need to select one of two root directories to mount, independent
- *of whether this is the initial, or subsequent, mount of this sb
- *
- * Returns: 0 or -ve on error
- */
+enum gfs2_param {
+   Opt_lockproto,
+   Opt_locktable,
+   Opt_hostdata,
+   Opt_spectator,
+   Opt_ignore_local_fs,
+   Opt_localflocks,
+   Opt_localcaching,
+   Opt_debug,
+   Opt_upgrade,
+   Opt_acl,
+   Opt_quota,
+   Opt_suiddir,
+   Opt_data,
+   Opt_meta,
+   Opt_discard,
+   Opt_commit,
+   Opt_errors,
+   Opt_statfs_quantum,
+   Opt_statfs_percent,
+   Opt_quota_quantum,
+   Opt_barrier,
+   Opt_rgrplvb,
+   Opt_loccookie,
+

Re: [Cluster-devel] [RFC][PATCH] wake_up_var() memory ordering

2019-06-25 Thread David Howells
Peter Zijlstra  wrote:

> I tried using wake_up_var() today and accidentally noticed that it
> didn't imply an smp_mb() and specifically requires it through
> wake_up_bit() / waitqueue_active().

Thinking about it again, I'm not sure why you need to add the barrier when
wake_up() (which this is a wrapper around) is required to impose a barrier at
the front if there's anything to wake up (ie. the wait queue isn't empty).

If this is insufficient, does it make sense just to have wake_up*() functions
do an unconditional release or full barrier right at the front, rather than it
being conditional on something being woken up?

> @@ -619,9 +614,7 @@ static int dvb_usb_fe_sleep(struct dvb_frontend *fe)
>  err:
>   if (!adap->suspend_resume_active) {
>   adap->active_fe = -1;

I'm wondering if there's a missing barrier here.  Should the clear_bit() on
the next line be clear_bit_unlock() or clear_bit_release()?

> - clear_bit(ADAP_SLEEP, &adap->state_bits);
> - smp_mb__after_atomic();
> - wake_up_bit(&adap->state_bits, ADAP_SLEEP);
> + clear_and_wake_up_bit(ADAP_SLEEP, &adap->state_bits);
>   }
>  
>   dev_dbg(&d->udev->dev, "%s: ret=%d\n", __func__, ret);
> diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c
> index cfe62b154f68..377ee07d5f76 100644
> --- a/fs/afs/fs_probe.c
> +++ b/fs/afs/fs_probe.c
> @@ -18,6 +18,7 @@ static bool afs_fs_probe_done(struct afs_server *server)
>  
>   wake_up_var(&server->probe_outstanding);
>   clear_bit_unlock(AFS_SERVER_FL_PROBING, &server->flags);
> + smp_mb__after_atomic();
>   wake_up_bit(&server->flags, AFS_SERVER_FL_PROBING);
>   return true;
>  }

Looking at this and the dvb one, does it make sense to stick the release
semantics of clear_bit_unlock() into clear_and_wake_up_bit()?

Also, should clear_bit_unlock() be renamed to clear_bit_release() (and
similarly test_and_set_bit_lock() -> test_and_set_bit_acquire()) if we seem to
be trying to standardise on that terminology.

David
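
The ordering question raised above (a release on the bit clear, followed by a
full barrier before looking for waiters) can be modelled in userspace with C11
atomics.  This is only an analogy under stated assumptions: bit_waitq_sketch,
its has_waiter flag (standing in for a non-empty wait queue) and
clear_and_wake_up_bit_sketch() are hypothetical, and the C11 fences are not
claimed to map exactly onto the kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a flags word plus its wait queue. */
struct bit_waitq_sketch {
    atomic_uint  flags;       /* the word whose bit is being cleared */
    atomic_bool  has_waiter;  /* stand-in for "the wait queue isn't empty" */
    unsigned int wakeups;     /* how many wake-ups were issued */
};

/* Clear a bit with release semantics, then wake a waiter if there is one. */
static bool clear_and_wake_up_bit_sketch(unsigned int bit,
                                         struct bit_waitq_sketch *wq)
{
    assert(bit < 32);

    /* Release: stores made before this point are ordered before the clear. */
    atomic_fetch_and_explicit(&wq->flags, ~(1u << bit), memory_order_release);

    /* Full fence: the clear must not be reordered after the waiter check. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&wq->has_waiter, memory_order_relaxed)) {
        wq->wakeups++;
        return true;
    }
    return false;
}
```

Folding the release into the helper, as the message suggests, means callers
cannot forget it; the cost is a barrier even when nobody is waiting.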



Re: [Cluster-devel] [PATCH 29/33] rxrpc_sock_set_min_security_level

2020-05-13 Thread David Howells
Christoph Hellwig  wrote:

> +int rxrpc_sock_set_min_security_level(struct sock *sk, unsigned int val);
> +

Looks good - but you do need to add this to Documentation/networking/rxrpc.txt
also, thanks.

David



Re: [Cluster-devel] [PATCH 21/33] ipv4: add ip_sock_set_mtu_discover

2020-05-13 Thread David Howells
Christoph Hellwig  wrote:

> + ip_sock_set_mtu_discover(conn->params.local->socket->sk,
> + IP_PMTUDISC_DONT);

Um... The socket in question could be an AF_INET6 socket, not an AF_INET4
socket - I presume it will work in that case.  If so:

Reviewed-by: David Howells  [rxrpc bits]



Re: [Cluster-devel] [PATCH 20/33] ipv4: add ip_sock_set_recverr

2020-05-13 Thread David Howells
Christoph Hellwig  wrote:

> Add a helper to directly set the IP_RECVERR sockopt from kernel space
> without going through a fake uaccess.

It looks like if this is an AF_INET6 socket, it will just pass the message
straight through to AF_INET4, so:

Reviewed-by: David Howells 



Re: [Cluster-devel] [PATCH 23/33] ipv6: add ip6_sock_set_recverr

2020-05-13 Thread David Howells
Christoph Hellwig  wrote:

> Add a helper to directly set the IPV6_RECVERR sockopt from kernel space
> without going through a fake uaccess.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: David Howells 



Re: [Cluster-devel] [PATCH 06/33] net: add sock_set_timestamps

2020-05-13 Thread David Howells
Christoph Hellwig  wrote:

> Add a helper to directly set the SO_TIMESTAMP* sockopts from kernel space
> without going through a fake uaccess.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: David Howells 



Re: [Cluster-devel] [PATCH 29/33] rxrpc_sock_set_min_security_level

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > Looks good - but you do need to add this to Documentation/networking/rxrpc.txt
> > also, thanks.
> 
> That file doesn't exist, instead we now have a
> Documentation/networking/rxrpc.rst in weird markup.

Yeah - that's only in net/next thus far.

> Where do you want this to be added, and with what text?  Remember I don't
> really know what this thing does, I just provide a shortcut.

The document itself describes what each rxrpc sockopt does.  Just look for
RXRPC_MIN_SECURITY_LEVEL in there;-)

Anyway, see the attached.  This also fixes a couple of errors in the doc that
I noticed.

David
---
diff --git a/Documentation/networking/rxrpc.rst b/Documentation/networking/rxrpc.rst
index 5ad35113d0f4..68552b92dc44 100644
--- a/Documentation/networking/rxrpc.rst
+++ b/Documentation/networking/rxrpc.rst
@@ -477,7 +477,7 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
 Encrypted checksum plus packet padded and first eight bytes of packet
 encrypted - which includes the actual packet length.
 
- (c) RXRPC_SECURITY_ENCRYPTED
+ (c) RXRPC_SECURITY_ENCRYPT
 
 Encrypted checksum plus entire packet padded and encrypted, including
 actual packet length.
@@ -578,7 +578,7 @@ A client would issue an operation by:
  This issues a request_key() to get the key representing the security
  context.  The minimum security level can be set::
 
-   unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
+   unsigned int sec = RXRPC_SECURITY_ENCRYPT;
setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
   &sec, sizeof(sec));
 
@@ -1090,6 +1090,15 @@ The kernel interface functions are as follows:
  jiffies).  In the event of the timeout occurring, the call will be
  aborted and -ETIME or -ETIMEDOUT will be returned.
 
+ (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within in the
+ kernel::
+
+   int rxrpc_sock_set_min_security_level(struct sock *sk,
+unsigned int val);
+
+ This specifies the minimum security level required for calls on this
+ socket.
+
 
 Configurable Parameters
 ===
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 7dfcbd58da85..e313dae01674 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -57,7 +57,7 @@ int afs_open_socket(struct afs_net *net)
srx.transport.sin6.sin6_port= htons(AFS_CM_PORT);
 
ret = rxrpc_sock_set_min_security_level(socket->sk,
-   RXRPC_SECURITY_ENCRYPT);
+   RXRPC_SECURITY_ENCRYPT);
if (ret < 0)
goto error_2;
 



Re: [Cluster-devel] [PATCH 21/33] ipv4: add ip_sock_set_mtu_discover

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > > + ip_sock_set_mtu_discover(conn->params.local->socket->sk,
> > > + IP_PMTUDISC_DONT);
> > 
> > Um... The socket in question could be an AF_INET6 socket, not an AF_INET4
> > socket - I presume it will work in that case.  If so:
> 
> Yes, the implementation of that sockopt, including the inet_sock
> structure where these options are set is shared between ipv4 and ipv6.

Great!  Could you note that either in the patch description or in the
kerneldoc attached to the function?

Thanks,
David



Re: [Cluster-devel] [PATCH 27/33] sctp: export sctp_setsockopt_bindx

2020-05-15 Thread David Howells
Christoph Hellwig  wrote:

> > The advantage on using kernel_setsockopt here is that sctp module will
> > only be loaded if dlm actually creates a SCTP socket.  With this
> > change, sctp will be loaded on setups that may not be actually using
> > it. It's a quite big module and might expose the system.
> 
> True.  Note that the intent is to kill kernel space callers of setsockopt,
> as I plan to remove the set_fs address space override used for it.

For getsockopt, does it make sense to have the core kernel load optval/optlen
into a buffer before calling the protocol driver?  Then the driver need not
see the userspace pointer at all.

Similar could be done for setsockopt - allocate a buffer of the size requested
by the user inside the kernel and pass it into the driver, then copy the data
back afterwards.

David
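
The idea floated above, that the core does the user copy into a kernel-side
buffer so the protocol driver never sees a userspace pointer, might look
roughly like this.  It is a userspace sketch only: memcpy() stands in for
copy_from_user()/copy_to_user(), and every name here (the *_sketch functions,
the callback types, SOCKOPT_MAX_SKETCH and the demo driver) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SOCKOPT_MAX_SKETCH 4096   /* arbitrary cap chosen for the example */

/* Hypothetical driver hooks: they only ever receive core-owned buffers. */
typedef int (*setopt_fn)(int optname, const void *kbuf, size_t len);
typedef int (*getopt_fn)(int optname, void *kbuf, size_t *len);

/* setsockopt path: copy in, then hand the private copy to the driver. */
static int core_setsockopt_sketch(setopt_fn drv, int optname,
                                  const void *uptr, size_t ulen)
{
    void *kbuf;
    int ret;

    assert(drv && uptr);
    if (ulen == 0 || ulen > SOCKOPT_MAX_SKETCH)
        return -EINVAL;
    kbuf = malloc(ulen);
    if (!kbuf)
        return -ENOMEM;
    memcpy(kbuf, uptr, ulen);          /* copy_from_user() in a real kernel */
    ret = drv(optname, kbuf, ulen);
    free(kbuf);
    return ret;
}

/* getsockopt path: driver fills a core buffer, core copies it back out. */
static int core_getsockopt_sketch(getopt_fn drv, int optname,
                                  void *uptr, size_t *ulen)
{
    size_t cap = *ulen, len = *ulen;
    void *kbuf;
    int ret;

    assert(drv && uptr);
    if (cap == 0 || cap > SOCKOPT_MAX_SKETCH)
        return -EINVAL;
    kbuf = calloc(1, cap);
    if (!kbuf)
        return -ENOMEM;
    ret = drv(optname, kbuf, &len);
    if (ret == 0 && len > cap)
        ret = -EINVAL;                 /* driver must not overrun the buffer */
    if (ret == 0) {
        memcpy(uptr, kbuf, len);       /* copy_to_user() in a real kernel */
        *ulen = len;
    }
    free(kbuf);
    return ret;
}

/* Demo driver holding a single int option, for illustration only. */
static int demo_value;

static int demo_set(int optname, const void *kbuf, size_t len)
{
    (void)optname;
    if (len != sizeof(demo_value))
        return -EINVAL;
    memcpy(&demo_value, kbuf, len);
    return 0;
}

static int demo_get(int optname, void *kbuf, size_t *len)
{
    (void)optname;
    if (*len < sizeof(demo_value))
        return -EINVAL;
    memcpy(kbuf, &demo_value, sizeof(demo_value));
    *len = sizeof(demo_value);
    return 0;
}
```

The trade-off is an allocation and a copy on every call in exchange for
drivers that need no address-space override.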



Re: [Cluster-devel] [PATCH 29/33] rxrpc: add rxrpc_sock_set_min_security_level

2020-05-21 Thread David Howells
Christoph Hellwig  wrote:

> Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from
> kernel space without going through a fake uaccess.
> 
> Thanks to David Howells for the documentation updates.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David Howells 



Re: [Cluster-devel] [PATCH 05/23] afs: Convert afs_writepages_region() to use filemap_get_folios_tag()

2022-10-14 Thread David Howells
Vishal Moola (Oracle)  wrote:

> Convert to use folios throughout. This function is in preparation to
> remove find_get_pages_range_tag().
> 
> Also modified this function to write the whole batch one at a time,
> rather than calling for a new set every single write.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Tested-by: David Howells 



Re: [Cluster-devel] [PATCH v1 2/3] Treewide: Stop corrupting socket's task_frag

2022-11-21 Thread David Howells


Benjamin Coddington  wrote:

> Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
> GFP_NOIO flag on sk_allocation which the networking system uses to decide
> when it is safe to use current->task_frag.

Um, what's task_frag?

David



Re: [Cluster-devel] [PATCH] filelock: move file locking definitions to separate header file

2022-11-21 Thread David Howells
Jeff Layton  wrote:

> The file locking definitions have lived in fs.h since the dawn of time,
> but they are only used by a small subset of the source files that
> include it.
> 
> Move the file locking definitions to a new header file, and add the
> appropriate #include directives to the source files that need them. By
> doing this we trim down fs.h a bit and limit the amount of rebuilding
> that has to be done when we make changes to the file locking APIs.
> 
> Signed-off-by: Jeff Layton 

Reviewed-by: David Howells 



[Cluster-devel] [RFC PATCH 26/28] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-03-17 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a9b14f81d655..9c0c691b6106 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1394,8 +1394,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-   const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
struct writequeue_entry *e;
+   struct bio_vec bvec;
+   struct msghdr msg = {
+   .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+   };
int len, offset, ret;
 
spin_lock_bh(&con->writequeue_lock);
@@ -1411,8 +1414,9 @@ static int send_to_sock(struct connection *con)
WARN_ON_ONCE(len == 0 && e->users == 0);
spin_unlock_bh(&con->writequeue_lock);
 
-   ret = kernel_sendpage(con->sock, e->page, offset, len,
- msg_flags);
+   bvec_set_page(&bvec, e->page, len, offset);
+   iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+   ret = sock_sendmsg(con->sock, &msg);
trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
lock_sock(con->sock->sk);



[Cluster-devel] [RFC PATCH v2 39/48] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-03-29 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a9b14f81d655..9c0c691b6106 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1394,8 +1394,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-   const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
struct writequeue_entry *e;
+   struct bio_vec bvec;
+   struct msghdr msg = {
+   .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+   };
int len, offset, ret;
 
spin_lock_bh(&con->writequeue_lock);
@@ -1411,8 +1414,9 @@ static int send_to_sock(struct connection *con)
WARN_ON_ONCE(len == 0 && e->users == 0);
spin_unlock_bh(&con->writequeue_lock);
 
-   ret = kernel_sendpage(con->sock, e->page, offset, len,
- msg_flags);
+   bvec_set_page(&bvec, e->page, len, offset);
+   iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+   ret = sock_sendmsg(con->sock, &msg);
trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
lock_sock(con->sock->sk);



[Cluster-devel] [PATCH v3 47/55] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-03-31 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a9b14f81d655..9c0c691b6106 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1394,8 +1394,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-   const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
struct writequeue_entry *e;
+   struct bio_vec bvec;
+   struct msghdr msg = {
+   .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+   };
int len, offset, ret;
 
spin_lock_bh(&con->writequeue_lock);
@@ -1411,8 +1414,9 @@ static int send_to_sock(struct connection *con)
WARN_ON_ONCE(len == 0 && e->users == 0);
spin_unlock_bh(&con->writequeue_lock);
 
-   ret = kernel_sendpage(con->sock, e->page, offset, len,
- msg_flags);
+   bvec_set_page(&bvec, e->page, len, offset);
+   iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+   ret = sock_sendmsg(con->sock, &msg);
trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
lock_sock(con->sock->sk);



Re: [Cluster-devel] [linux-next:master] [splice] 2cb1e08985: stress-ng.sendfile.ops_per_sec 11.6% improvement

2023-06-12 Thread David Howells
kernel test robot  wrote:

> kernel test robot noticed a 11.6% improvement of stress-ng.sendfile.ops_per_sec on:

If it's sending to a socket, this is entirely feasible.  The
splice_to_socket() function now sends multiple pages in one go to the network
protocol's sendmsg() method to process instead of using sendpage to send one
page at a time.

David
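
A userspace analogy for the batching described above: handing several buffers
to one sendmsg() call instead of issuing one send() per buffer.  This uses the
ordinary POSIX socket API, not the in-kernel splice_to_socket() path or
MSG_SPLICE_PAGES, and the helper names are made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* One call per buffer: returns the number of calls made, or -1 on error. */
static int send_one_at_a_time(int fd, const struct iovec *iov, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        ssize_t ret = send(fd, iov[i].iov_base, iov[i].iov_len, 0);

        if (ret != (ssize_t)iov[i].iov_len)
            return -1;
    }
    return n;
}

/* All buffers in a single sendmsg(): returns bytes sent, or -1 on error. */
static ssize_t send_batched(int fd, struct iovec *iov, int n)
{
    struct msghdr msg;

    assert(n > 0);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = n;
    return sendmsg(fd, &msg, 0);
}
```

Both helpers put the same bytes on the wire; the batched form simply crosses
into the protocol once rather than once per buffer.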



[Cluster-devel] [PATCH net-next 09/17] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-06-16 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-   const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
struct writequeue_entry *e;
+   struct bio_vec bvec;
+   struct msghdr msg = {
+   .msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+   };
int len, offset, ret;
 
spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
WARN_ON_ONCE(len == 0 && e->users == 0);
spin_unlock_bh(&con->writequeue_lock);
 
-   ret = kernel_sendpage(con->sock, e->page, offset, len,
- msg_flags);
+   bvec_set_page(&bvec, e->page, len, offset);
+   iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+   ret = sock_sendmsg(con->sock, &msg);
trace_dlm_send(con->nodeid, ret);
if (ret == -EAGAIN || ret == 0) {
lock_sock(con->sock->sk);



[Cluster-devel] [PATCH net-next v2 09/17] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-06-17 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-	const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 	struct writequeue_entry *e;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+	};
 	int len, offset, ret;
 
 	spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
 	WARN_ON_ONCE(len == 0 && e->users == 0);
 	spin_unlock_bh(&con->writequeue_lock);
 
-	ret = kernel_sendpage(con->sock, e->page, offset, len,
-			      msg_flags);
+	bvec_set_page(&bvec, e->page, len, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(con->sock, &msg);
 	trace_dlm_send(con->nodeid, ret);
 	if (ret == -EAGAIN || ret == 0) {
 		lock_sock(con->sock->sk);
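
For readers less familiar with the pattern in this patch (describe one
(buffer, offset, len) slice as a single-segment vector, hang it off a msghdr
and make one sendmsg call), here is a rough userspace analogue.  It is only a
sketch and not kernel code: send_slice() and demo_roundtrip() are hypothetical
names, struct iovec stands in for struct bio_vec, and MSG_SPLICE_PAGES is not
used since, as far as I know, it is internal to the kernel.  Note also that
userspace sendmsg(2) takes the flags as its third argument, whereas the patch
sets them in msg.msg_flags for sock_sendmsg().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes starting at offset within buf using a
 * single-segment vector, loosely mirroring the bvec + msghdr setup above.
 */
ssize_t send_slice(int fd, char *buf, size_t offset, size_t len)
{
	struct iovec iov = {
		.iov_base = buf + offset,
		.iov_len = len,
	};
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
	};

	/* Userspace passes the flags here rather than in msg.msg_flags. */
	return sendmsg(fd, &msg, MSG_DONTWAIT | MSG_NOSIGNAL);
}

/* Hypothetical demo: push a slice through a socketpair and read it back. */
int demo_roundtrip(void)
{
	char page[64] = "....hello....";
	char out[8] = { 0 };
	ssize_t sent, got;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Bytes 4..8 of the buffer hold "hello". */
	sent = send_slice(sv[0], page, 4, 5);
	assert(sent == 5);

	got = read(sv[1], out, sizeof(out) - 1);
	close(sv[0]);
	close(sv[1]);

	return (got == 5 && memcmp(out, "hello", 5) == 0) ? 0 : -1;
}
```

Calling demo_roundtrip() should return 0 on a Linux system.  In the kernel the
iterator refers to the page itself rather than to a userspace buffer, which is
what lets the lower layer splice it instead of copying.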



[Cluster-devel] [PATCH net-next v3 09/18] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-06-20 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-	const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 	struct writequeue_entry *e;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+	};
 	int len, offset, ret;
 
 	spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
 	WARN_ON_ONCE(len == 0 && e->users == 0);
 	spin_unlock_bh(&con->writequeue_lock);
 
-	ret = kernel_sendpage(con->sock, e->page, offset, len,
-			      msg_flags);
+	bvec_set_page(&bvec, e->page, len, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(con->sock, &msg);
 	trace_dlm_send(con->nodeid, ret);
 	if (ret == -EAGAIN || ret == 0) {
 		lock_sock(con->sock->sk);



[Cluster-devel] [PATCH net-next v4 06/15] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-06-23 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-	const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 	struct writequeue_entry *e;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+	};
 	int len, offset, ret;
 
 	spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
 	WARN_ON_ONCE(len == 0 && e->users == 0);
 	spin_unlock_bh(&con->writequeue_lock);
 
-	ret = kernel_sendpage(con->sock, e->page, offset, len,
-			      msg_flags);
+	bvec_set_page(&bvec, e->page, len, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(con->sock, &msg);
 	trace_dlm_send(con->nodeid, ret);
 	if (ret == -EAGAIN || ret == 0) {
 		lock_sock(con->sock->sk);



[Cluster-devel] [PATCH net-next v5 06/16] dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

2023-06-23 Thread David Howells
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than using
sendpage.  This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.

Signed-off-by: David Howells 
cc: Christine Caulfield 
cc: David Teigland 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Jens Axboe 
cc: Matthew Wilcox 
cc: cluster-devel@redhat.com
cc: net...@vger.kernel.org
---
 fs/dlm/lowcomms.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 3d3802c47b8b..5c12d8cdfc16 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1395,8 +1395,11 @@ int dlm_lowcomms_resend_msg(struct dlm_msg *msg)
 /* Send a message */
 static int send_to_sock(struct connection *con)
 {
-	const int msg_flags = MSG_DONTWAIT | MSG_NOSIGNAL;
 	struct writequeue_entry *e;
+	struct bio_vec bvec;
+	struct msghdr msg = {
+		.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL,
+	};
 	int len, offset, ret;
 
 	spin_lock_bh(&con->writequeue_lock);
@@ -1412,8 +1415,9 @@ static int send_to_sock(struct connection *con)
 	WARN_ON_ONCE(len == 0 && e->users == 0);
 	spin_unlock_bh(&con->writequeue_lock);
 
-	ret = kernel_sendpage(con->sock, e->page, offset, len,
-			      msg_flags);
+	bvec_set_page(&bvec, e->page, len, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+	ret = sock_sendmsg(con->sock, &msg);
 	trace_dlm_send(con->nodeid, ret);
 	if (ret == -EAGAIN || ret == 0) {
 		lock_sock(con->sock->sk);