Re: [Linux-cachefs] [RFC PATCH 02/53] netfs: Track the fpos above which the server has no data

2023-10-16 Thread David Howells
Jeff Layton  wrote:

> >  (7) If stored data is culled from the local cache, we must set zero_point
> >  above that if the data also got written to the server.
> 
> When you say culled here, it sounds like you're just throwing out the
> dirty cache without writing the data back. That shouldn't be allowed
> though, so I must be misunderstanding what you mean here. Can you
> explain?

I meant fscache specifically.  Too many caches - and some of them with the
same names!

> >  (8) If dirty data is written back to the server, but not the local cache,
> >  we must set zero_point above that.
> 
> How do you write back without writing to the local cache? I'm guessing
> this means you're doing a non-buffered write?

I meant fscache.  fscache can decline to honour a request to store data.

> > +   if (size != i_size) {
> > +   truncate_pagecache(&vnode->netfs.inode, size);
> > +   netfs_resize_file(&vnode->netfs, size);
> > +   fscache_resize_cookie(afs_vnode_cache(vnode), size);
> > +   }
> 
> Isn't this an existing bug? AFS is not setting remote_i_size in the
> setattr path currently? I think this probably ought to be done in a
> preliminary AFS patch.

It is being set.  afs_apply_status() sets it.  This is called by
afs_vnode_commit_status() which is called from afs_setattr_success().  The
value isn't updated until we get the return status from the server that
includes the new value.

> > +   loff_t  zero_point; /* Size after which we assume there's no data
> > +* on the server */
> 
> While I understand the concept, I'm not yet sure I understand how this
> new value will be used. It might be better to merge this patch in with
> the patch that adds the first user of this data.

I'll consider it.  At least it might make sense to move them adjacent to each
other in the series.
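
To illustrate the intent (a sketch only — it assumes zero_point ends up in
struct netfs_inode as the quoted hunk suggests, and the helper name is made
up):

/* Sketch: a read whose span lies entirely at or above zero_point needs
 * no RPC; the region can simply be zero-filled locally.
 */
static bool example_region_is_all_zeros(struct netfs_inode *ctx, loff_t start)
{
        return start >= ctx->zero_point;
}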

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [RFC PATCH 08/53] netfs: Add rsize to netfs_io_request

2023-10-16 Thread David Howells
Jeff Layton  wrote:

> > +   rreq->rsize = 4 * 1024 * 1024;
> > return 0;
> ...
> > +   rreq->rsize = 1024 * 1024;
> > +
> 
> Holy magic numbers, batman! I think this deserves a comment that
> explains how you came up with these values.

Actually, that should be set to something like the object size for ceph.

> Also, do 9p and cifs not need this for some reason?

At this point, cifs doesn't use netfslib, so that's implemented in a later
patch in this series.

9p does need it set, but I haven't tested that yet.  It probably needs to be
set to 1MiB as I think that's the maximum the 9p transport can handle.

But in the case of cifs, this is actually dynamic, depending on how many
credits we can obtain.  The same may be true of ceph, though I'm not entirely
clear on that as yet.

For afs, the maximum [rw]size the protocol supports is actually something like
281350422593565 (ie. (65535-28) * (2^32-1)) minus a few bytes, but that's
probably not a good idea.  It might be best to set it to something like 256KiB
as that's what OpenAFS uses.
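
By way of illustration, the kind of per-filesystem choice being discussed (a
sketch; the 256KiB figure is the OpenAFS-derived suggestion above, not a
settled value):

/* Sketch of an ->init_request() hook picking the maximum read size:
 * 256KiB for afs to match OpenAFS; 9p might use 1MiB and ceph its
 * object size, per the discussion above.
 */
static int example_init_request(struct netfs_io_request *rreq, struct file *file)
{
        rreq->rsize = 256 * 1024;
        return 0;
}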

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [RFC PATCH 09/53] netfs: Implement unbuffered/DIO vs buffered I/O locking

2023-10-16 Thread David Howells
Jeff Layton  wrote:

> It's nice to see this go into common code, but why not go ahead and
> convert ceph (and possibly NFS) to use this? Is there any reason not to?

I'm converting ceph on a follow-on branch, so this will be dealt with there.

I could do NFS round about here, I suppose.
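
For reference, the shape this takes when afs is converted later in the series
(patch 42): buffered reads are bracketed by the new shared-I/O lock helpers
and O_DIRECT reads are diverted to the unbuffered path:

static ssize_t example_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
        struct inode *inode = file_inode(iocb->ki_filp);
        ssize_t ret;

        if (iocb->ki_flags & IOCB_DIRECT)
                return netfs_unbuffered_read_iter(iocb, iter);

        ret = netfs_start_io_read(inode);       /* excludes unbuffered/DIO writers */
        if (ret < 0)
                return ret;
        ret = netfs_file_read_iter(iocb, iter); /* buffered read proper */
        netfs_end_io_read(inode);
        return ret;
}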

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 41/53] netfs: Rearrange netfs_io_subrequest to put request pointer first

2023-10-13 Thread David Howells
Rearrange the netfs_io_subrequest struct to put the netfs_io_request
pointer (rreq) first.  This then allows netfs_io_subrequest to be put in a
union with a pointer to a wrapper around netfs_io_request for cifs.
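
For context, this is the overlay that the cifs conversion later in this
series (patch 48) makes of it — with rreq first, the request pointer and a
pointer to the cifs wrapper around it can share the same slot:

struct cifs_io_subrequest {
        union {
                struct netfs_io_subrequest subreq;
                struct netfs_io_request *rreq;  /* == subreq.rreq */
                struct cifs_io_request *req;    /* wrapper; rreq is its first member */
        };
        /* cifs-specific fields follow... */
};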

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 include/linux/netfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index c416645649e1..ff4f86ae64e4 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -209,8 +209,8 @@ struct netfs_cache_resources {
  * the pages it points to can be relied on to exist for the duration.
  */
 struct netfs_io_subrequest {
-   struct work_struct  work;
struct netfs_io_request *rreq;  /* Supervising I/O request */
+   struct work_struct  work;
struct list_headrreq_link;  /* Link in rreq->subrequests */
struct iov_iter io_iter;/* Iterator for this subrequest */
loff_t  start;  /* Where to start the I/O */
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 51/53] cifs: Remove some code that's no longer used, part 1

2023-10-13 Thread David Howells
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a
hard-to-read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsglob.h  |  12 -
 fs/smb/client/cifsproto.h |  21 --
 fs/smb/client/file.c  | 639 --
 fs/smb/client/fscache.c   | 111 ---
 fs/smb/client/fscache.h   |  58 
 5 files changed, 841 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index a5e114eeeb8b..01ea1206ec7e 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1443,18 +1443,6 @@ struct cifs_io_subrequest {
struct smbd_mr  *mr;
 #endif
struct cifs_credits credits;
-
-#if 0 // TODO: Remove following elements
-   struct list_headlist;
-   struct completion   done;
-   struct work_struct  work;
-   struct cifsFileInfo *cfile;
-   struct address_space*mapping;
-   struct cifs_aio_ctx *ctx;
-   enum writeback_sync_modes   sync_mode;
-   booluncached;
-   struct bio_vec  *bv;
-#endif
 };
 
 /*
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 52ff5e889af2..25985b56cd7f 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -580,32 +580,11 @@ void __cifs_put_smb_ses(struct cifs_ses *ses);
 extern struct cifs_ses *
 cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx);
 
-#if 0 // TODO Remove
-void cifs_readdata_release(struct cifs_io_subrequest *rdata);
-static inline void cifs_put_readdata(struct cifs_io_subrequest *rdata)
-{
-   if (refcount_dec_and_test(&rdata->subreq.ref))
-   cifs_readdata_release(rdata);
-}
-#endif
 int cifs_async_readv(struct cifs_io_subrequest *rdata);
 int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry 
*mid);
 
 int cifs_async_writev(struct cifs_io_subrequest *wdata);
 void cifs_writev_complete(struct work_struct *work);
-#if 0 // TODO Remove
-struct cifs_io_subrequest *cifs_writedata_alloc(work_func_t complete);
-void cifs_writedata_release(struct cifs_io_subrequest *rdata);
-static inline void cifs_get_writedata(struct cifs_io_subrequest *wdata)
-{
-   refcount_inc(&wdata->subreq.ref);
-}
-static inline void cifs_put_writedata(struct cifs_io_subrequest *wdata)
-{
-   if (refcount_dec_and_test(&wdata->subreq.ref))
-   cifs_writedata_release(wdata);
-}
-#endif
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
  struct cifs_sb_info *cifs_sb,
  const unsigned char *path, char *pbuf,
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 4c9125a98d18..2c64dccdc81d 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -411,133 +411,6 @@ const struct netfs_request_ops cifs_req_ops = {
.create_write_requests  = cifs_create_write_requests,
 };
 
-#if 0 // TODO remove 397
-/*
- * Remove the dirty flags from a span of pages.
- */
-static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len)
-{
-   struct address_space *mapping = inode->i_mapping;
-   struct folio *folio;
-   pgoff_t end;
-
-   XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
-
-   rcu_read_lock();
-
-   end = (start + len - 1) / PAGE_SIZE;
-   xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) {
-   if (xas_retry(&xas, folio))
-   continue;
-   xas_pause(&xas);
-   rcu_read_unlock();
-   folio_lock(folio);
-   folio_clear_dirty_for_io(folio);
-   folio_unlock(folio);
-   rcu_read_lock();
-   }
-
-   rcu_read_unlock();
-}
-
-/*
- * Completion of write to server.
- */
-void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
-{
-   struct address_space *mapping = inode->i_mapping;
-   struct folio *folio;
-   pgoff_t end;
-
-   XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
-
-   if (!len)
-   return;
-
-   rcu_read_lock();
-
-   end = (start + len - 1) / PAGE_SIZE;
-   xas_for_each(&xas, folio, end) {
-   if (xas_retry(&xas, folio))
-   continue;
-   if (!folio_test_writeback(folio)) {
-   WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
- len, start, folio_index(folio), end);
-   continue;
-   }
-
-   folio_detach_private(folio);
- 

[Linux-cachefs] [RFC PATCH 53/53] cifs: Remove some code that's no longer used, part 3

2023-10-13 Thread David Howells
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a
hard-to-read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/file.c | 1003 --
 1 file changed, 1003 deletions(-)

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index f6b148aa184c..be2c786e7c52 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2700,470 +2700,6 @@ int cifs_flush(struct file *file, fl_owner_t id)
return rc;
 }
 
-#if 0 // TODO remove 3594
-static void collect_uncached_write_data(struct cifs_aio_ctx *ctx);
-
-static void
-cifs_uncached_writev_complete(struct work_struct *work)
-{
-   struct cifs_io_subrequest *wdata = container_of(work,
-   struct cifs_io_subrequest, work);
-   struct inode *inode = d_inode(wdata->cfile->dentry);
-   struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
-   spin_lock(&inode->i_lock);
-   cifs_update_eof(cifsi, wdata->subreq.start, wdata->subreq.len);
-   if (cifsi->netfs.remote_i_size > inode->i_size)
-   i_size_write(inode, cifsi->netfs.remote_i_size);
-   spin_unlock(&inode->i_lock);
-
-   complete(&wdata->done);
-   collect_uncached_write_data(wdata->ctx);
-   /* the below call can possibly free the last ref to aio ctx */
-   cifs_put_writedata(wdata);
-}
-
-static int
-cifs_resend_wdata(struct cifs_io_subrequest *wdata, struct list_head *wdata_list,
-   struct cifs_aio_ctx *ctx)
-{
-   size_t wsize;
-   struct cifs_credits credits;
-   int rc;
-   struct TCP_Server_Info *server = wdata->server;
-
-   do {
-   if (wdata->cfile->invalidHandle) {
-   rc = cifs_reopen_file(wdata->cfile, false);
-   if (rc == -EAGAIN)
-   continue;
-   else if (rc)
-   break;
-   }
-
-
-   /*
-* Wait for credits to resend this wdata.
-* Note: we are attempting to resend the whole wdata not in
-* segments
-*/
-   do {
-   rc = server->ops->wait_mtu_credits(server, wdata->subreq.len,
-      &wsize, &credits);
-   if (rc)
-   goto fail;
-
-   if (wsize < wdata->subreq.len) {
-   add_credits_and_wake_if(server, &credits, 0);
-   msleep(1000);
-   }
-   } while (wsize < wdata->subreq.len);
-   wdata->credits = credits;
-
-   rc = adjust_credits(server, &wdata->credits, wdata->subreq.len);
-
-   if (!rc) {
-   if (wdata->cfile->invalidHandle)
-   rc = -EAGAIN;
-   else {
-#ifdef CONFIG_CIFS_SMB_DIRECT
-   if (wdata->mr) {
-   wdata->mr->need_invalidate = true;
-   smbd_deregister_mr(wdata->mr);
-   wdata->mr = NULL;
-   }
-#endif
-   rc = server->ops->async_writev(wdata);
-   }
-   }
-
-   /* If the write was successfully sent, we are done */
-   if (!rc) {
-   list_add_tail(&wdata->list, wdata_list);
-   return 0;
-   }
-
-   /* Roll back credits and retry if needed */
-   add_credits_and_wake_if(server, &wdata->credits, 0);
-   } while (rc == -EAGAIN);
-
-fail:
-   cifs_put_writedata(wdata);
-   return rc;
-}
-
-/*
- * Select span of a bvec iterator we're going to use.  Limit it by both maximum
- * size and maximum number of segments.
- */
-static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size,
-size_t max_segs, unsigned int *_nsegs)
-{
-   const struct bio_vec *bvecs = iter->bvec;
-   unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0;
-   size_t len, span = 0, n = iter->count;
-   size_t skip = iter->iov_offset;
-
-   if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0)
-   return 0;
-
-   while (n && ix < nbv && skip) {
-   len = bvecs[ix].bv_len;
-   if (skip < len)
-   break;
-   skip -= len;
- 

[Linux-cachefs] [RFC PATCH 50/53] cifs: Cut over to using netfslib

2023-10-13 Thread David Howells
Make the cifs filesystem use netfslib to handle reading and writing on
behalf of cifs.  The changes include:

 (1) Various read_iter/write_iter type functions are turned into wrappers
 around netfslib API functions or are pointed directly at those
 functions:

cifs_file_direct{,_nobrl}_ops switch to use
netfs_unbuffered_read_iter and netfs_unbuffered_write_iter.

Large pieces of code that will be removed are #if'd out and will be removed
in subsequent patches.

[?] Why does cifs mark the page dirty in the destination buffer of a DIO
read?  Should that happen automatically?  Does netfs need to do that?

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/io.c |   7 +-
 fs/smb/client/cifsfs.c|   8 +--
 fs/smb/client/cifsfs.h|   8 +--
 fs/smb/client/cifsglob.h  |   3 +-
 fs/smb/client/cifsproto.h |   4 ++
 fs/smb/client/cifssmb.c   |  45 +++-
 fs/smb/client/file.c  | 130 ++
 fs/smb/client/fscache.c   |   2 +
 fs/smb/client/fscache.h   |   4 ++
 fs/smb/client/inode.c |  19 -
 fs/smb/client/smb2pdu.c   |  98 --
 fs/smb/client/trace.h | 144 +-
 fs/smb/client/transport.c |   3 +
 13 files changed, 326 insertions(+), 149 deletions(-)

diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 14a9f3312d3b..112fa0548f22 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -351,8 +351,13 @@ static void netfs_rreq_assess_dio(struct netfs_io_request 
*rreq)
unsigned int i;
size_t transferred = 0;
 
-   for (i = 0; i < rreq->direct_bv_count; i++)
+   for (i = 0; i < rreq->direct_bv_count; i++) {
flush_dcache_page(rreq->direct_bv[i].bv_page);
+   // TODO: cifs marks pages in the destination buffer
+   // dirty under some circumstances after a read.  Do we
+   // need to do that too?
+   set_page_dirty(rreq->direct_bv[i].bv_page);
+   }
 
list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
if (subreq->error || subreq->transferred == 0)
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 0c19b65206f6..26b6ea9eb53e 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1352,8 +1352,8 @@ const struct file_operations cifs_file_strict_ops = {
 };
 
 const struct file_operations cifs_file_direct_ops = {
-   .read_iter = cifs_direct_readv,
-   .write_iter = cifs_direct_writev,
+   .read_iter = netfs_unbuffered_read_iter,
+   .write_iter = netfs_file_write_iter,
.open = cifs_open,
.release = cifs_close,
.lock = cifs_lock,
@@ -1408,8 +1408,8 @@ const struct file_operations cifs_file_strict_nobrl_ops = 
{
 };
 
 const struct file_operations cifs_file_direct_nobrl_ops = {
-   .read_iter = cifs_direct_readv,
-   .write_iter = cifs_direct_writev,
+   .read_iter = netfs_unbuffered_read_iter,
+   .write_iter = netfs_file_write_iter,
.open = cifs_open,
.release = cifs_close,
.fsync = cifs_fsync,
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index 24d5bac07f87..6bbb26a462db 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -85,6 +85,7 @@ extern const struct inode_operations 
cifs_namespace_inode_operations;
 
 
 /* Functions related to files and directories */
+extern const struct netfs_request_ops cifs_req_ops;
 extern const struct file_operations cifs_file_ops;
 extern const struct file_operations cifs_file_direct_ops; /* if directio mnt */
 extern const struct file_operations cifs_file_strict_ops; /* if strictio mnt */
@@ -94,11 +95,7 @@ extern const struct file_operations 
cifs_file_strict_nobrl_ops;
 extern int cifs_open(struct inode *inode, struct file *file);
 extern int cifs_close(struct inode *inode, struct file *file);
 extern int cifs_closedir(struct inode *inode, struct file *file);
-extern ssize_t cifs_user_readv(struct kiocb *iocb, struct iov_iter *to);
-extern ssize_t cifs_direct_readv(struct kiocb *iocb, struct iov_iter *to);
 extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct iov_iter *to);
-extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
-extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
 ssize_t cifs_file_write_iter(struct kiocb *iocb, struct iov_iter *from);
 ssize_t cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter);
@@ -112,9 +109,6 @@ extern int cifs_file_strict_mmap(struct file *file, struct 
vm_area_struct *vma);
 extern const struct file_operations cifs_dir_ops;
 extern int cifs_dir_open(struct inode *inode, struct file *file);
 extern i

[Linux-cachefs] [RFC PATCH 52/53] cifs: Remove some code that's no longer used, part 2

2023-10-13 Thread David Howells
Remove some code that was #if'd out with the netfslib conversion.  This is
split into parts for file.c as the diff generator otherwise produces a
hard-to-read diff for part of it where a big chunk is cut out.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/file.c | 696 +--
 1 file changed, 1 insertion(+), 695 deletions(-)

diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 2c64dccdc81d..f6b148aa184c 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2574,701 +2574,6 @@ cifs_get_readable_path(struct cifs_tcon *tcon, const 
char *name,
return -ENOENT;
 }
 
-#if 0 // TODO remove 2773
-void
-cifs_writedata_release(struct cifs_io_subrequest *wdata)
-{
-   if (wdata->uncached)
-   kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
-#ifdef CONFIG_CIFS_SMB_DIRECT
-   if (wdata->mr) {
-   smbd_deregister_mr(wdata->mr);
-   wdata->mr = NULL;
-   }
-#endif
-
-   if (wdata->cfile)
-   cifsFileInfo_put(wdata->cfile);
-
-   kfree(wdata);
-}
-
-/*
- * Write failed with a retryable error. Resend the write request. It's also
- * possible that the page was redirtied so re-clean the page.
- */
-static void
-cifs_writev_requeue(struct cifs_io_subrequest *wdata)
-{
-   int rc = 0;
-   struct inode *inode = d_inode(wdata->cfile->dentry);
-   struct TCP_Server_Info *server;
-   unsigned int rest_len = wdata->subreq.len;
-   loff_t fpos = wdata->subreq.start;
-
-   server = tlink_tcon(wdata->cfile->tlink)->ses->server;
-   do {
-   struct cifs_io_subrequest *wdata2;
-   unsigned int wsize, cur_len;
-
-   wsize = server->ops->wp_retry_size(inode);
-   if (wsize < rest_len) {
-   if (wsize < PAGE_SIZE) {
-   rc = -EOPNOTSUPP;
-   break;
-   }
-   cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
-   } else {
-   cur_len = rest_len;
-   }
-
-   wdata2 = cifs_writedata_alloc(cifs_writev_complete);
-   if (!wdata2) {
-   rc = -ENOMEM;
-   break;
-   }
-
-   wdata2->sync_mode = wdata->sync_mode;
-   wdata2->subreq.start= fpos;
-   wdata2->subreq.len  = cur_len;
-   wdata2->subreq.io_iter = wdata->subreq.io_iter;
-
-   iov_iter_advance(&wdata2->subreq.io_iter, fpos - wdata->subreq.start);
-   iov_iter_truncate(&wdata2->subreq.io_iter, wdata2->subreq.len);
-
-   if (iov_iter_is_xarray(&wdata2->subreq.io_iter))
-   /* Check for pages having been redirtied and clean
-* them.  We can do this by walking the xarray.  If
-* it's not an xarray, then it's a DIO and we shouldn't
-* be mucking around with the page bits.
-*/
-   cifs_undirty_folios(inode, fpos, cur_len);
-
-   rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
-   &wdata2->cfile);
-   if (!wdata2->cfile) {
-   cifs_dbg(VFS, "No writable handle to retry writepages rc=%d\n",
-rc);
-   if (!is_retryable_error(rc))
-   rc = -EBADF;
-   } else {
-   wdata2->pid = wdata2->cfile->pid;
-   rc = server->ops->async_writev(wdata2);
-   }
-
-   cifs_put_writedata(wdata2);
-   if (rc) {
-   if (is_retryable_error(rc))
-   continue;
-   fpos += cur_len;
-   rest_len -= cur_len;
-   break;
-   }
-
-   fpos += cur_len;
-   rest_len -= cur_len;
-   } while (rest_len > 0);
-
-   /* Clean up remaining pages from the original wdata */
-   if (iov_iter_is_xarray(&wdata->subreq.io_iter))
-   cifs_pages_write_failed(inode, fpos, rest_len);
-
-   if (rc != 0 && !is_retryable_error(rc))
-   mapping_set_error(inode->i_mapping, rc);
-   cifs_put_writedata(wdata);
-}
-
-void
-cifs_writev_complete(struct work_struct *work)
-{
-   struct cifs_io_subrequest *wdata = container_of(work,
-   struct cifs_io_subrequest, work);
-   struct inode *inode = d_inod

[Linux-cachefs] [RFC PATCH 49/53] cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c

2023-10-13 Thread David Howells
Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c so that
they are colocated with similar functions rather than being split between
file.c and cifsfs.c.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsfs.c | 55 --
 fs/smb/client/cifsfs.h |  2 ++
 fs/smb/client/file.c   | 53 
 3 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 85799e9e0f4c..0c19b65206f6 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -982,61 +982,6 @@ cifs_smb3_do_mount(struct file_system_type *fs_type,
return root;
 }
 
-
-static ssize_t
-cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
-{
-   ssize_t rc;
-   struct inode *inode = file_inode(iocb->ki_filp);
-
-   if (iocb->ki_flags & IOCB_DIRECT)
-   return cifs_user_readv(iocb, iter);
-
-   rc = cifs_revalidate_mapping(inode);
-   if (rc)
-   return rc;
-
-   return generic_file_read_iter(iocb, iter);
-}
-
-static ssize_t cifs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
-{
-   struct inode *inode = file_inode(iocb->ki_filp);
-   struct cifsInodeInfo *cinode = CIFS_I(inode);
-   ssize_t written;
-   int rc;
-
-   if (iocb->ki_filp->f_flags & O_DIRECT) {
-   written = cifs_user_writev(iocb, from);
-   if (written > 0 && CIFS_CACHE_READ(cinode)) {
-   cifs_zap_mapping(inode);
-   cifs_dbg(FYI,
-"Set no oplock for inode=%p after a write operation\n",
-inode);
-   cinode->oplock = 0;
-   }
-   return written;
-   }
-
-   written = cifs_get_writer(cinode);
-   if (written)
-   return written;
-
-   written = generic_file_write_iter(iocb, from);
-
-   if (CIFS_CACHE_WRITE(CIFS_I(inode)))
-   goto out;
-
-   rc = filemap_fdatawrite(inode->i_mapping);
-   if (rc)
-   cifs_dbg(FYI, "cifs_file_write_iter: %d rc on %p inode\n",
-rc, inode);
-
-out:
-   cifs_put_writer(cinode);
-   return written;
-}
-
 static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
 {
struct cifsFileInfo *cfile = file->private_data;
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index 41daebd220ff..24d5bac07f87 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -100,6 +100,8 @@ extern ssize_t cifs_strict_readv(struct kiocb *iocb, struct 
iov_iter *to);
 extern ssize_t cifs_user_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_direct_writev(struct kiocb *iocb, struct iov_iter *from);
 extern ssize_t cifs_strict_writev(struct kiocb *iocb, struct iov_iter *from);
+ssize_t cifs_file_write_iter(struct kiocb *iocb, struct iov_iter *from);
+ssize_t cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 extern int cifs_flock(struct file *pfile, int cmd, struct file_lock *plock);
 extern int cifs_lock(struct file *, int, struct file_lock *);
 extern int cifs_fsync(struct file *, loff_t, loff_t, int);
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index 6c7b91728dd4..3112233c4835 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -4584,6 +4584,59 @@ ssize_t cifs_user_readv(struct kiocb *iocb, struct 
iov_iter *to)
return __cifs_readv(iocb, to, false);
 }
 
+ssize_t cifs_loose_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   ssize_t rc;
+   struct inode *inode = file_inode(iocb->ki_filp);
+
+   if (iocb->ki_flags & IOCB_DIRECT)
+   return cifs_user_readv(iocb, iter);
+
+   rc = cifs_revalidate_mapping(inode);
+   if (rc)
+   return rc;
+
+   return generic_file_read_iter(iocb, iter);
+}
+
+ssize_t cifs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct inode *inode = file_inode(iocb->ki_filp);
+   struct cifsInodeInfo *cinode = CIFS_I(inode);
+   ssize_t written;
+   int rc;
+
+   if (iocb->ki_filp->f_flags & O_DIRECT) {
+   written = cifs_user_writev(iocb, from);
+   if (written > 0 && CIFS_CACHE_READ(cinode)) {
+   cifs_zap_mapping(inode);
+   cifs_dbg(FYI,
+"Set no oplock for inode=%p after a write operation\n",
+inode);
+   cinode->oplock = 0;
+   }
+   return written;
+   }
+
+  

[Linux-cachefs] [RFC PATCH 45/53] cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest

2023-10-13 Thread David Howells
Replace the cifs_writedata struct with the same wrapper around
netfs_io_subrequest that was used to replace cifs_readdata.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsglob.h  | 30 +++
 fs/smb/client/cifsproto.h | 16 ++--
 fs/smb/client/cifssmb.c   |  9 ++---
 fs/smb/client/file.c  | 79 ---
 fs/smb/client/smb2pdu.c   |  9 ++---
 fs/smb/client/smb2proto.h |  3 +-
 6 files changed, 58 insertions(+), 88 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 1943d035b8d3..0b1835751bda 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -238,7 +238,6 @@ struct cifs_fattr;
 struct smb3_fs_context;
 struct cifs_fid;
 struct cifs_io_subrequest;
-struct cifs_writedata;
 struct cifs_io_parms;
 struct cifs_search_info;
 struct cifsInodeInfo;
@@ -413,8 +412,7 @@ struct smb_version_operations {
/* async read from the server */
int (*async_readv)(struct cifs_io_subrequest *);
/* async write to the server */
-   int (*async_writev)(struct cifs_writedata *,
-   void (*release)(struct kref *));
+   int (*async_writev)(struct cifs_io_subrequest *);
/* sync read from the server */
int (*sync_read)(const unsigned int, struct cifs_fid *,
 struct cifs_io_parms *, unsigned int *, char **,
@@ -1438,35 +1436,17 @@ struct cifs_io_subrequest {
 #endif
struct cifs_credits credits;
 
-   // TODO: Remove following elements
-   struct list_headlist;
-   struct completion   done;
-   struct work_struct  work;
-   struct iov_iter iter;
-   __u64   offset;
-   unsigned intbytes;
-};
+   enum writeback_sync_modes   sync_mode;
+   booluncached;
+   struct bio_vec  *bv;
 
-/* asynchronous write support */
-struct cifs_writedata {
-   struct kref refcount;
+   // TODO: Remove following elements
struct list_headlist;
struct completion   done;
-   enum writeback_sync_modes   sync_mode;
struct work_struct  work;
-   struct cifsFileInfo *cfile;
-   struct cifs_aio_ctx *ctx;
struct iov_iter iter;
-   struct bio_vec  *bv;
__u64   offset;
-   pid_t   pid;
unsigned intbytes;
-   int result;
-   struct TCP_Server_Info  *server;
-#ifdef CONFIG_CIFS_SMB_DIRECT
-   struct smbd_mr  *mr;
-#endif
-   struct cifs_credits credits;
 };
 
 /*
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 7748fe148fb4..561dac1576a5 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -589,11 +589,19 @@ static inline void cifs_put_readdata(struct 
cifs_io_subrequest *rdata)
 int cifs_async_readv(struct cifs_io_subrequest *rdata);
 int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry 
*mid);
 
-int cifs_async_writev(struct cifs_writedata *wdata,
- void (*release)(struct kref *kref));
+int cifs_async_writev(struct cifs_io_subrequest *wdata);
 void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(work_func_t complete);
-void cifs_writedata_release(struct kref *refcount);
+struct cifs_io_subrequest *cifs_writedata_alloc(work_func_t complete);
+void cifs_writedata_release(struct cifs_io_subrequest *rdata);
+static inline void cifs_get_writedata(struct cifs_io_subrequest *wdata)
+{
+   refcount_inc(&wdata->subreq.ref);
+}
+static inline void cifs_put_writedata(struct cifs_io_subrequest *wdata)
+{
+   if (refcount_dec_and_test(&wdata->subreq.ref))
+   cifs_writedata_release(wdata);
+}
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
  struct cifs_sb_info *cifs_sb,
  const unsigned char *path, char *pbuf,
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 76005b3d5ffe..14fca3fa3e08 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1610,7 +1610,7 @@ CIFSSMBWrite(const unsigned int xid, struct cifs_io_parms 
*io_parms,
 static void
 cifs_writev_callback(struct mid_q_entry *mid)
 {
-   struct cifs_writedata *wdata = mid->callback_data;
+   struct cifs_io_subrequest *wdata = mid->callback_data;
struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
un

[Linux-cachefs] [RFC PATCH 48/53] cifs: Implement netfslib hooks

2023-10-13 Thread David Howells
Provide implementation of the netfslib hooks that will be used by netfslib
to ask cifs to set up and perform operations.  Of particular note are

 (*) cifs_clamp_length() - This is used to negotiate the size of the next
 subrequest in a read request, taking into account the credit available
 and the rsize.  The credits are attached to the subrequest.

 (*) cifs_req_issue_read() - This is used to issue a subrequest that has
 been set up and clamped.

 (*) cifs_create_write_requests() - This is used to break the given span of
 file positions into suboperations according to cifs's wsize and
 available credits.  As each subop is created, it can be dispatched or
 queued for dispatch.

At this point, cifs is not wired up to actually *use* netfslib; that will
be done in a subsequent patch.
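
As an illustration of what the clamp hook has to do (a simplified sketch, not
the exact function from the diff, which also handles errors and channel
selection; it assumes rdata->server has already been set):

static bool example_clamp_length(struct netfs_io_subrequest *subreq)
{
        struct cifs_io_subrequest *rdata =
                container_of(subreq, struct cifs_io_subrequest, subreq);
        struct TCP_Server_Info *server = rdata->server;
        size_t rsize;

        /* Negotiate credits for up to rsize bytes and attach them to the
         * subrequest, then clamp the subrequest to what was granted.
         */
        if (server->ops->wait_mtu_credits(server, subreq->rreq->rsize,
                                          &rsize, &rdata->credits) < 0)
                return false;

        subreq->len = min_t(size_t, subreq->len, rsize);
        return true;
}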

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c|   3 +
 fs/smb/client/Kconfig|   1 +
 fs/smb/client/cifsglob.h |  26 ++-
 fs/smb/client/file.c | 373 +++
 include/linux/netfs.h|   1 +
 include/trace/events/netfs.h |   1 +
 6 files changed, 397 insertions(+), 8 deletions(-)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 6657dbd07b9d..c2f7dc99ff92 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -373,6 +373,9 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct 
iov_iter *iter,
} while (iov_iter_count(iter));
 
 out:
+   if (likely(written) && ctx->ops->post_modify)
+   ctx->ops->post_modify(inode);
+
if (unlikely(wreq)) {
ret = netfs_end_writethrough(wreq, iocb);
wbc_detach_inode();
diff --git a/fs/smb/client/Kconfig b/fs/smb/client/Kconfig
index 2927bd174a88..2517dc242386 100644
--- a/fs/smb/client/Kconfig
+++ b/fs/smb/client/Kconfig
@@ -2,6 +2,7 @@
 config CIFS
tristate "SMB3 and CIFS support (advanced network filesystem)"
depends on INET
+   select NETFS_SUPPORT
select NLS
select NLS_UCS2_UTILS
select CRYPTO
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 73367fc3a77c..a215c092725a 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1420,15 +1420,23 @@ struct cifs_aio_ctx {
booldirect_io;
 };
 
+struct cifs_io_request {
+   struct netfs_io_request rreq;
+   struct cifsFileInfo *cfile;
+};
+
 /* asynchronous read support */
 struct cifs_io_subrequest {
-   struct netfs_io_subrequest  subreq;
-   struct cifsFileInfo *cfile;
-   struct address_space*mapping;
-   struct cifs_aio_ctx *ctx;
+   union {
+   struct netfs_io_subrequest subreq;
+   struct netfs_io_request *rreq;
+   struct cifs_io_request *req;
+   };
ssize_t got_bytes;
pid_t   pid;
+   unsigned intxid;
int result;
+   boolhave_credits;
struct kvec iov[2];
struct TCP_Server_Info  *server;
 #ifdef CONFIG_CIFS_SMB_DIRECT
@@ -1436,14 +1444,16 @@ struct cifs_io_subrequest {
 #endif
struct cifs_credits credits;
 
-   enum writeback_sync_modes   sync_mode;
-   booluncached;
-   struct bio_vec  *bv;
-
// TODO: Remove following elements
struct list_headlist;
struct completion   done;
struct work_struct  work;
+   struct cifsFileInfo *cfile;
+   struct address_space*mapping;
+   struct cifs_aio_ctx *ctx;
+   enum writeback_sync_modes   sync_mode;
+   booluncached;
+   struct bio_vec  *bv;
 };
 
 /*
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index dd5e52d5e8d0..6c7b91728dd4 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -36,6 +36,379 @@
 #include "fs_context.h"
 #include "cifs_ioctl.h"
 #include "cached_dir.h"
+#include 
+
+static int cifs_reopen_file(struct cifsFileInfo *cfile, bool can_flush);
+
+static void cifs_upload_to_server(struct netfs_io_subrequest *subreq)
+{
+   struct cifs_io_subrequest *wdata =
+   container_of(subreq, struct cifs_io_subrequest, subreq);
+   ssize_t rc;
+
+   trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
+
+   if (wdata->req->cfile->invalidHandle)
+   rc = -EAGAIN;
+   else
+  

[Linux-cachefs] [RFC PATCH 47/53] cifs: Make wait_mtu_credits take size_t args

2023-10-13 Thread David Howells
Make the wait_mtu_credits functions use size_t for the size and num
arguments rather than unsigned int as netfslib uses size_t/ssize_t for
arguments and return values to allow for extra capacity.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsglob.h  |  4 ++--
 fs/smb/client/cifsproto.h |  2 +-
 fs/smb/client/file.c  | 18 ++
 fs/smb/client/smb2ops.c   |  4 ++--
 fs/smb/client/transport.c |  4 ++--
 5 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index c7f04f9853c5..73367fc3a77c 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -507,8 +507,8 @@ struct smb_version_operations {
/* writepages retry size */
unsigned int (*wp_retry_size)(struct inode *);
/* get mtu credits */
-   int (*wait_mtu_credits)(struct TCP_Server_Info *, unsigned int,
-   unsigned int *, struct cifs_credits *);
+   int (*wait_mtu_credits)(struct TCP_Server_Info *, size_t,
+   size_t *, struct cifs_credits *);
/* adjust previously taken mtu credits to request size */
int (*adjust_credits)(struct TCP_Server_Info *server,
  struct cifs_credits *credits,
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 561dac1576a5..735337e8326c 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -121,7 +121,7 @@ extern struct mid_q_entry *cifs_setup_async_request(struct 
TCP_Server_Info *,
 extern int cifs_check_receive(struct mid_q_entry *mid,
struct TCP_Server_Info *server, bool log_error);
 extern int cifs_wait_mtu_credits(struct TCP_Server_Info *server,
-unsigned int size, unsigned int *num,
+size_t size, size_t *num,
 struct cifs_credits *credits);
 extern int SendReceive2(const unsigned int /* xid */ , struct cifs_ses *,
struct kvec *, int /* nvec to send */,
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index c70d106a413f..dd5e52d5e8d0 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2733,9 +2733,9 @@ static ssize_t cifs_write_back_from_locked_folio(struct 
address_space *mapping,
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
struct cifsFileInfo *cfile = NULL;
-   unsigned int xid, wsize, len;
+   unsigned int xid, len;
loff_t i_size = i_size_read(inode);
-   size_t max_len;
+   size_t max_len, wsize;
long count = wbc->nr_to_write;
int rc;
 
@@ -3248,7 +3248,7 @@ static int
 cifs_resend_wdata(struct cifs_io_subrequest *wdata, struct list_head 
*wdata_list,
struct cifs_aio_ctx *ctx)
 {
-   unsigned int wsize;
+   size_t wsize;
struct cifs_credits credits;
int rc;
struct TCP_Server_Info *server = wdata->server;
@@ -3382,7 +3382,8 @@ cifs_write_from_iter(loff_t fpos, size_t len, struct 
iov_iter *from,
do {
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
-   unsigned int wsize, nsegs = 0;
+   unsigned int nsegs = 0;
+   size_t wsize;
 
if (signal_pending(current)) {
rc = -EINTR;
@@ -3819,7 +3820,7 @@ static int cifs_resend_rdata(struct cifs_io_subrequest 
*rdata,
struct list_head *rdata_list,
struct cifs_aio_ctx *ctx)
 {
-   unsigned int rsize;
+   size_t rsize;
struct cifs_credits credits;
int rc;
struct TCP_Server_Info *server;
@@ -3893,10 +3894,10 @@ cifs_send_async_read(loff_t fpos, size_t len, struct 
cifsFileInfo *open_file,
 struct cifs_aio_ctx *ctx)
 {
struct cifs_io_subrequest *rdata;
-   unsigned int rsize, nsegs, max_segs = INT_MAX;
+   unsigned int nsegs, max_segs = INT_MAX;
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
-   size_t cur_len, max_len;
+   size_t cur_len, max_len, rsize;
int rc;
pid_t pid;
struct TCP_Server_Info *server;
@@ -4492,12 +4493,13 @@ static void cifs_readahead(struct readahead_control 
*ractl)
 * Chop the readahead request up into rsize-sized read requests.
 */
while ((nr_pages = ra_pages)) {
-   unsigned int i, rsize;
+   unsigned int i;
struct cifs_io_subrequest *rdata;
struct cifs_credits credits_on_stack;
struct cifs_credits *credits = &credits_on_stack;
struct folio 

[Linux-cachefs] [RFC PATCH 44/53] cifs: Share server EOF pos with netfslib

2023-10-13 Thread David Howells
Use cifsi->netfs.remote_i_size instead of cifsi->server_eof so that
netfslib can refer to it too.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsfs.c   |  2 +-
 fs/smb/client/cifsglob.h |  1 -
 fs/smb/client/file.c |  8 
 fs/smb/client/inode.c|  6 +++---
 fs/smb/client/smb2ops.c  | 10 +-
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 22869cda1356..85799e9e0f4c 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -395,7 +395,7 @@ cifs_alloc_inode(struct super_block *sb)
spin_lock_init(&cifs_inode->writers_lock);
cifs_inode->writers = 0;
cifs_inode->netfs.inode.i_blkbits = 14;  /* 2**14 = CIFS_MAX_MSGSIZE */
-   cifs_inode->server_eof = 0;
+   cifs_inode->netfs.remote_i_size = 0;
cifs_inode->uniqueid = 0;
cifs_inode->createtime = 0;
cifs_inode->epoch = 0;
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 22fa98428845..1943d035b8d3 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1527,7 +1527,6 @@ struct cifsInodeInfo {
spinlock_t writers_lock;
unsigned int writers;   /* Number of writers on this inode */
unsigned long time; /* jiffies of last update of inode */
-   u64  server_eof;/* current file size on server -- protected by i_lock */
u64  uniqueid;  /* server inode number */
u64  createtime;/* creation time on server */
__u8 lease_key[SMB2_LEASE_KEY_SIZE];/* lease key for this inode */
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b4f16ef62115..0383ce61ac35 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -2117,8 +2117,8 @@ cifs_update_eof(struct cifsInodeInfo *cifsi, loff_t 
offset,
 {
loff_t end_of_write = offset + bytes_written;
 
-   if (end_of_write > cifsi->server_eof)
-   cifsi->server_eof = end_of_write;
+   if (end_of_write > cifsi->netfs.remote_i_size)
+   netfs_resize_file(&cifsi->netfs, end_of_write);
 }
 
 static ssize_t
@@ -3246,8 +3246,8 @@ cifs_uncached_writev_complete(struct work_struct *work)
 
spin_lock(&inode->i_lock);
cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
-   if (cifsi->server_eof > inode->i_size)
-   i_size_write(inode, cifsi->server_eof);
+   if (cifsi->netfs.remote_i_size > inode->i_size)
+   i_size_write(inode, cifsi->netfs.remote_i_size);
spin_unlock(&inode->i_lock);
 
complete(&wdata->done);
diff --git a/fs/smb/client/inode.c b/fs/smb/client/inode.c
index d7c302442c1e..6815b50ec56c 100644
--- a/fs/smb/client/inode.c
+++ b/fs/smb/client/inode.c
@@ -102,7 +102,7 @@ cifs_revalidate_cache(struct inode *inode, struct 
cifs_fattr *fattr)
 /* revalidate if mtime or size have changed */
fattr->cf_mtime = timestamp_truncate(fattr->cf_mtime, inode);
if (timespec64_equal(&inode->i_mtime, &fattr->cf_mtime) &&
-   cifs_i->server_eof == fattr->cf_eof) {
+   cifs_i->netfs.remote_i_size == fattr->cf_eof) {
cifs_dbg(FYI, "%s: inode %llu is unchanged\n",
 __func__, cifs_i->uniqueid);
return;
@@ -191,7 +191,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr 
*fattr)
else
clear_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags);
 
-   cifs_i->server_eof = fattr->cf_eof;
+   cifs_i->netfs.remote_i_size = fattr->cf_eof;
/*
 * Can't safely change the file size here if the client is writing to
 * it due to potential races.
@@ -2776,7 +2776,7 @@ cifs_set_file_size(struct inode *inode, struct iattr 
*attrs,
 
 set_size_out:
if (rc == 0) {
-   cifsInode->server_eof = attrs->ia_size;
+   netfs_resize_file(&cifsInode->netfs, attrs->ia_size);
cifs_setsize(inode, attrs->ia_size);
/*
 * i_blocks is not related to (i_size / i_blksize), but instead
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index dc18130db9b3..e7f765673246 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -3554,7 +3554,7 @@ static long smb3_simple_falloc(struct file *file, struct 
cifs_tcon *tcon,
rc = SMB2_set_eof(xid, tcon, cfile->fid.persistent_fid,
  cfile->fid.volatile_fid, cfile->pid, &eof);
if (rc == 0) {
-   cifsi->server_eof = off + len;
+ 

[Linux-cachefs] [RFC PATCH 46/53] cifs: Use more fields from netfs_io_subrequest

2023-10-13 Thread David Howells
Use more fields from netfs_io_subrequest instead of those incorporated into
cifs_io_subrequest from cifs_readdata and cifs_writedata.

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsglob.h  |   3 -
 fs/smb/client/cifssmb.c   |  52 +-
 fs/smb/client/file.c  | 112 +++---
 fs/smb/client/smb2ops.c   |   4 +-
 fs/smb/client/smb2pdu.c   |  52 +-
 fs/smb/client/transport.c |   6 +-
 6 files changed, 113 insertions(+), 116 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 0b1835751bda..c7f04f9853c5 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1444,9 +1444,6 @@ struct cifs_io_subrequest {
struct list_headlist;
struct completion   done;
struct work_struct  work;
-   struct iov_iter iter;
-   __u64   offset;
-   unsigned intbytes;
 };
 
 /*
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 14fca3fa3e08..112a5a2d95b8 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -1267,12 +1267,12 @@ cifs_readv_callback(struct mid_q_entry *mid)
struct TCP_Server_Info *server = tcon->ses->server;
struct smb_rqst rqst = { .rq_iov = rdata->iov,
 .rq_nvec = 2,
-.rq_iter = rdata->iter };
+.rq_iter = rdata->subreq.io_iter };
struct cifs_credits credits = { .value = 1, .instance = 0 };
 
-   cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n",
+   cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%zu\n",
 __func__, mid->mid, mid->mid_state, rdata->result,
-rdata->bytes);
+rdata->subreq.len);
 
switch (mid->mid_state) {
case MID_RESPONSE_RECEIVED:
@@ -1320,14 +1320,14 @@ cifs_async_readv(struct cifs_io_subrequest *rdata)
struct smb_rqst rqst = { .rq_iov = rdata->iov,
 .rq_nvec = 2 };
 
-   cifs_dbg(FYI, "%s: offset=%llu bytes=%u\n",
-__func__, rdata->offset, rdata->bytes);
+   cifs_dbg(FYI, "%s: offset=%llu bytes=%zu\n",
+__func__, rdata->subreq.start, rdata->subreq.len);
 
if (tcon->ses->capabilities & CAP_LARGE_FILES)
wct = 12;
else {
wct = 10; /* old style read */
-   if ((rdata->offset >> 32) > 0)  {
+   if ((rdata->subreq.start >> 32) > 0)  {
/* can not handle this big offset for old */
return -EIO;
}
@@ -1342,12 +1342,12 @@ cifs_async_readv(struct cifs_io_subrequest *rdata)
 
smb->AndXCommand = 0xFF;/* none */
smb->Fid = rdata->cfile->fid.netfid;
-   smb->OffsetLow = cpu_to_le32(rdata->offset & 0xFFFFFFFF);
+   smb->OffsetLow = cpu_to_le32(rdata->subreq.start & 0xFFFFFFFF);
if (wct == 12)
-   smb->OffsetHigh = cpu_to_le32(rdata->offset >> 32);
+   smb->OffsetHigh = cpu_to_le32(rdata->subreq.start >> 32);
smb->Remaining = 0;
-   smb->MaxCount = cpu_to_le16(rdata->bytes & 0xFFFF);
-   smb->MaxCountHigh = cpu_to_le32(rdata->bytes >> 16);
+   smb->MaxCount = cpu_to_le16(rdata->subreq.len & 0xFFFF);
+   smb->MaxCountHigh = cpu_to_le32(rdata->subreq.len >> 16);
if (wct == 12)
smb->ByteCount = 0;
else {
@@ -1631,13 +1631,13 @@ cifs_writev_callback(struct mid_q_entry *mid)
 * client. OS/2 servers are known to set incorrect
 * CountHigh values.
 */
-   if (written > wdata->bytes)
+   if (written > wdata->subreq.len)
written &= 0xFFFF;
 
-   if (written < wdata->bytes)
+   if (written < wdata->subreq.len)
wdata->result = -ENOSPC;
else
-   wdata->bytes = written;
+   wdata->subreq.len = written;
break;
case MID_REQUEST_SUBMITTED:
case MID_RETRY_NEEDED:
@@ -1668,7 +1668,7 @@ cifs_async_writev(struct cifs_io_subrequest *wdata)
wct = 14;
} else {
wct = 12;
-   if (wdata->offset >> 32 > 0) {
+   if (wdata->subreq.start >> 32 > 0) {

[Linux-cachefs] [RFC PATCH 42/53] afs: Use the netfs write helpers

2023-10-13 Thread David Howells
Make afs use the netfs write helpers.

Signed-off-by: David Howells 
cc: Marc Dionne 
cc: Jeff Layton 
cc: linux-...@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/file.c  |  65 +++-
 fs/afs/internal.h  |  10 +-
 fs/afs/write.c | 704 ++---
 include/trace/events/afs.h |  23 --
 4 files changed, 81 insertions(+), 721 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 5bb78d874292..586a573b1a9b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -34,7 +34,7 @@ const struct file_operations afs_file_operations = {
.release= afs_release,
.llseek = generic_file_llseek,
.read_iter  = afs_file_read_iter,
-   .write_iter = afs_file_write,
+   .write_iter = netfs_file_write_iter,
.mmap   = afs_file_mmap,
.splice_read= afs_file_splice_read,
.splice_write   = iter_file_splice_write,
@@ -50,16 +50,15 @@ const struct inode_operations afs_file_inode_operations = {
 };
 
 const struct address_space_operations afs_file_aops = {
+   .direct_IO  = noop_direct_IO,
.read_folio = netfs_read_folio,
.readahead  = netfs_readahead,
.dirty_folio= afs_dirty_folio,
-   .launder_folio  = afs_launder_folio,
+   .launder_folio  = netfs_launder_folio,
.release_folio  = netfs_release_folio,
.invalidate_folio = netfs_invalidate_folio,
-   .write_begin= afs_write_begin,
-   .write_end  = afs_write_end,
-   .writepages = afs_writepages,
.migrate_folio  = filemap_migrate_folio,
+   .writepages = afs_writepages,
 };
 
 const struct address_space_operations afs_symlink_aops = {
@@ -355,8 +354,10 @@ static int afs_symlink_read_folio(struct file *file, 
struct folio *folio)
 
 static int afs_init_request(struct netfs_io_request *rreq, struct file *file)
 {
-   rreq->netfs_priv = key_get(afs_file_key(file));
+   if (file)
+   rreq->netfs_priv = key_get(afs_file_key(file));
rreq->rsize = 4 * 1024 * 1024;
+   rreq->wsize = 16 * 1024;
return 0;
 }
 
@@ -373,12 +374,37 @@ static void afs_free_request(struct netfs_io_request 
*rreq)
key_put(rreq->netfs_priv);
 }
 
+static void afs_update_i_size(struct inode *inode, loff_t new_i_size)
+{
+   struct afs_vnode *vnode = AFS_FS_I(inode);
+   loff_t i_size;
+
+   write_seqlock(&vnode->cb_lock);
+   i_size = i_size_read(&vnode->netfs.inode);
+   if (new_i_size > i_size) {
+   i_size_write(&vnode->netfs.inode, new_i_size);
+   inode_set_bytes(&vnode->netfs.inode, new_i_size);
+   }
+   write_sequnlock(&vnode->cb_lock);
+   fscache_update_cookie(afs_vnode_cache(vnode), NULL, &new_i_size);
+}
+
+static void afs_netfs_invalidate_cache(struct netfs_io_request *wreq)
+{
+   struct afs_vnode *vnode = AFS_FS_I(wreq->inode);
+
+   afs_invalidate_cache(vnode, 0);
+}
+
 const struct netfs_request_ops afs_req_ops = {
.init_request   = afs_init_request,
.free_request   = afs_free_request,
.begin_cache_operation  = fscache_begin_cache_operation,
.check_write_begin  = afs_check_write_begin,
.issue_read = afs_issue_read,
+   .update_i_size  = afs_update_i_size,
+   .invalidate_cache   = afs_netfs_invalidate_cache,
+   .create_write_requests  = afs_create_write_requests,
 };
 
 int afs_write_inode(struct inode *inode, struct writeback_control *wbc)
@@ -453,28 +479,39 @@ static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, 
pgoff_t start_pgoff, pg
 
 static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 {
-   struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp));
+   struct inode *inode = file_inode(iocb->ki_filp);
+   struct afs_vnode *vnode = AFS_FS_I(inode);
struct afs_file *af = iocb->ki_filp->private_data;
int ret;
 
-   ret = afs_validate(vnode, af->key);
+   if (iocb->ki_flags & IOCB_DIRECT)
+   return netfs_unbuffered_read_iter(iocb, iter);
+
+   ret = netfs_start_io_read(inode);
if (ret < 0)
return ret;
-
-   return generic_file_read_iter(iocb, iter);
+   ret = afs_validate(vnode, af->key);
+   if (ret == 0)
+   ret = netfs_file_read_iter(iocb, iter);
+   netfs_end_io_read(inode);
+   return ret;
 }
 
 static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags)
 {
-   struct afs_vnode *vnode = AFS_FS_I(file_inode(in));
+   struct inode *inode = file_inode(in);
+   struct afs_vnode *vnode = AFS_FS_I(inode);
struct afs_file *af = in->private

[Linux-cachefs] [RFC PATCH 43/53] cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest

2023-10-13 Thread David Howells
Netfslib has a facility whereby the allocation for netfs_io_subrequest can
be increased so that filesystem-specific data can be tagged on the end.

Prepare to use this by making a struct, cifs_io_subrequest, that wraps
netfs_io_subrequest, and absorb struct cifs_readdata into it.
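
A sketch of the shape this gives (the io_subrequest_size field used to ask
netfslib for the enlarged allocation is an assumption here; the container_of()
recovery is how the later cifs hooks actually get at the wrapper):

struct cifs_io_subrequest {
        struct netfs_io_subrequest subreq;      /* must be first */
        ssize_t got_bytes;
        pid_t pid;
        /* ...other cifs-specific fields... */
};

static void example_handle_subreq(struct netfs_io_subrequest *subreq)
{
        struct cifs_io_subrequest *rdata =
                container_of(subreq, struct cifs_io_subrequest, subreq);
        /* ...cifs-specific processing using rdata... */
}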

Signed-off-by: David Howells 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/smb/client/cifsglob.h  | 22 ++
 fs/smb/client/cifsproto.h |  9 ++--
 fs/smb/client/cifssmb.c   | 11 -
 fs/smb/client/file.c  | 48 ++-
 fs/smb/client/smb2ops.c   |  2 +-
 fs/smb/client/smb2pdu.c   | 13 ++-
 fs/smb/client/smb2proto.h |  2 +-
 fs/smb/client/transport.c |  4 ++--
 8 files changed, 56 insertions(+), 55 deletions(-)

diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index 02082621d8e0..22fa98428845 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -237,7 +237,7 @@ struct dfs_info3_param;
 struct cifs_fattr;
 struct smb3_fs_context;
 struct cifs_fid;
-struct cifs_readdata;
+struct cifs_io_subrequest;
 struct cifs_writedata;
 struct cifs_io_parms;
 struct cifs_search_info;
@@ -411,7 +411,7 @@ struct smb_version_operations {
/* send a flush request to the server */
int (*flush)(const unsigned int, struct cifs_tcon *, struct cifs_fid *);
/* async read from the server */
-   int (*async_readv)(struct cifs_readdata *);
+   int (*async_readv)(struct cifs_io_subrequest *);
/* async write to the server */
int (*async_writev)(struct cifs_writedata *,
void (*release)(struct kref *));
@@ -1423,26 +1423,28 @@ struct cifs_aio_ctx {
 };
 
 /* asynchronous read support */
-struct cifs_readdata {
-   struct kref refcount;
-   struct list_headlist;
-   struct completion   done;
+struct cifs_io_subrequest {
+   struct netfs_io_subrequest  subreq;
struct cifsFileInfo *cfile;
struct address_space*mapping;
struct cifs_aio_ctx *ctx;
-   __u64   offset;
ssize_t got_bytes;
-   unsigned intbytes;
pid_t   pid;
int result;
-   struct work_struct  work;
-   struct iov_iter iter;
struct kvec iov[2];
struct TCP_Server_Info  *server;
 #ifdef CONFIG_CIFS_SMB_DIRECT
struct smbd_mr  *mr;
 #endif
struct cifs_credits credits;
+
+   // TODO: Remove following elements
+   struct list_headlist;
+   struct completion   done;
+   struct work_struct  work;
+   struct iov_iter iter;
+   __u64   offset;
+   unsigned intbytes;
 };
 
 /* asynchronous write support */
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 0c37eefa18a5..7748fe148fb4 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -580,8 +580,13 @@ void __cifs_put_smb_ses(struct cifs_ses *ses);
 extern struct cifs_ses *
 cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx);
 
-void cifs_readdata_release(struct kref *refcount);
-int cifs_async_readv(struct cifs_readdata *rdata);
+void cifs_readdata_release(struct cifs_io_subrequest *rdata);
+static inline void cifs_put_readdata(struct cifs_io_subrequest *rdata)
+{
+   if (refcount_dec_and_test(&rdata->subreq.ref))
+   cifs_readdata_release(rdata);
+}
+int cifs_async_readv(struct cifs_io_subrequest *rdata);
 int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry 
*mid);
 
 int cifs_async_writev(struct cifs_writedata *wdata,
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 25503f1a4fd2..76005b3d5ffe 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "cifspdu.h"
 #include "cifsfs.h"
 #include "cifsglob.h"
@@ -1260,12 +1262,11 @@ CIFS_open(const unsigned int xid, struct 
cifs_open_parms *oparms, int *oplock,
 static void
 cifs_readv_callback(struct mid_q_entry *mid)
 {
-   struct cifs_readdata *rdata = mid->callback_data;
+   struct cifs_io_subrequest *rdata = mid->callback_data;
struct cifs_tcon *tcon = tlink_tcon(rdata->cfile->tlink);
struct TCP_Server_Info *server = tcon->ses->server;
struct smb_rqst rqst = { .rq_iov = rdata->iov,
 .rq_nvec = 2,
-  

[Linux-cachefs] [RFC PATCH 39/53] netfs: Provide a launder_folio implementation

2023-10-13 Thread David Howells
Provide a launder_folio implementation for netfslib.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c| 71 
 fs/netfs/main.c  |  1 +
 include/linux/netfs.h|  2 +
 include/trace/events/netfs.h |  3 ++
 4 files changed, 77 insertions(+)

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index b81d807f89f0..5695bc3acf6c 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -1101,3 +1101,74 @@ int netfs_writepages(struct address_space *mapping,
return ret;
 }
 EXPORT_SYMBOL(netfs_writepages);
+
+/*
+ * Deal with the disposition of a laundered folio.
+ */
+static void netfs_cleanup_launder_folio(struct netfs_io_request *wreq)
+{
+   if (wreq->error) {
+   pr_notice("R=%08x Laundering error %d\n", wreq->debug_id, wreq->error);
+   mapping_set_error(wreq->mapping, wreq->error);
+   }
+}
+
+/**
+ * netfs_launder_folio - Clean up a dirty folio that's being invalidated
+ * @folio: The folio to clean
+ *
+ * This is called to write back a folio that's being invalidated when an inode
+ * is getting torn down.  Ideally, writepages would be used instead.
+ */
+int netfs_launder_folio(struct folio *folio)
+{
+   struct netfs_io_request *wreq;
+   struct address_space *mapping = folio->mapping;
+   struct netfs_folio *finfo;
+   struct bio_vec bvec;
+   unsigned long long i_size = i_size_read(mapping->host);
+   unsigned long long start = folio_pos(folio);
+   size_t offset = 0, len;
+   int ret = 0;
+
+   finfo = netfs_folio_info(folio);
+   if (finfo) {
+   offset = finfo->dirty_offset;
+   start += offset;
+   len = finfo->dirty_len;
+   } else {
+   len = folio_size(folio);
+   }
+   len = min_t(unsigned long long, len, i_size - start);
+
+   wreq = netfs_alloc_request(mapping, NULL, start, len, NETFS_LAUNDER_WRITE);
+   if (IS_ERR(wreq)) {
+   ret = PTR_ERR(wreq);
+   goto out;
+   }
+
+   if (!folio_clear_dirty_for_io(folio))
+   goto out_put;
+
+   trace_netfs_folio(folio, netfs_folio_trace_launder);
+
+   _debug("launder %llx-%llx", start, start + len - 1);
+
+   /* Speculatively write to the cache.  We have to fix this up later if
+* the store fails.
+*/
+   wreq->cleanup = netfs_cleanup_launder_folio;
+
+   bvec_set_folio(&bvec, folio, len, offset);
+   iov_iter_bvec(&wreq->iter, ITER_SOURCE, &bvec, 1, len);
+   __set_bit(NETFS_RREQ_UPLOAD_TO_SERVER, &wreq->flags);
+   ret = netfs_begin_write(wreq, true, netfs_write_trace_launder);
+
+out_put:
+   netfs_put_request(wreq, false, netfs_rreq_trace_put_return);
+out:
+   folio_wait_fscache(folio);
+   _leave(" = %d", ret);
+   return ret;
+}
+EXPORT_SYMBOL(netfs_launder_folio);
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index b335e6a50f9c..577c8a9fc0f2 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -33,6 +33,7 @@ static const char *netfs_origins[nr__netfs_io_origin] = {
[NETFS_READPAGE]= "RP",
[NETFS_READ_FOR_WRITE]  = "RW",
[NETFS_WRITEBACK]   = "WB",
+   [NETFS_LAUNDER_WRITE]   = "LW",
[NETFS_RMW_READ]= "RM",
[NETFS_UNBUFFERED_WRITE]= "UW",
[NETFS_DIO_READ]= "DR",
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 9661ae24120f..d4a1073cc541 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -234,6 +234,7 @@ enum netfs_io_origin {
NETFS_READPAGE, /* This read is a synchronous read */
NETFS_READ_FOR_WRITE,   /* This read is to prepare a write */
NETFS_WRITEBACK,/* This write was triggered by writepages */
+   NETFS_LAUNDER_WRITE,/* This is triggered by ->launder_folio() */
NETFS_RMW_READ, /* This is an unbuffered read for RMW */
NETFS_UNBUFFERED_WRITE, /* This is an unbuffered write */
NETFS_DIO_READ, /* This is a direct I/O read */
@@ -422,6 +423,7 @@ int netfs_writepages(struct address_space *mapping,
 struct writeback_control *wbc);
 void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 bool netfs_release_folio(struct folio *folio, gfp_t gfp);
+int netfs_launder_folio(struct folio *folio);
 
 /* VMA operations API. */
vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 825946f510ee

[Linux-cachefs] [RFC PATCH 37/53] netfs: Support decryption on unbuffered/DIO read

2023-10-13 Thread David Howells
Support unbuffered and direct I/O reads from an encrypted file.  This may
require making a larger read than is required into a bounce buffer and
copying out the required bits.  We don't decrypt in-place in the user
buffer lest userspace interfere and muck up the decryption.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/direct_read.c | 10 ++
 fs/netfs/internal.h| 17 +
 2 files changed, 27 insertions(+)
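
To make the in-place-vs-copy decision concrete, a minimal sketch of the
alignment test for a 4KiB crypto block (min_bshift = 12), assuming nothing
beyond what this patch adds:

	/* Sketch: the user buffer is "crypto aligned" if its least
	 * aligned segment sits on a crypto block boundary; otherwise we
	 * must decrypt in place in the bounce buffer and copy out.
	 */
	unsigned long mask = (1UL << 12) - 1;	/* crypto block 4KiB */
	bool in_place = (iov_iter_alignment(iter) & mask) != 0;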

diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index 52ad8fa66dd5..158719b56900 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -181,6 +181,16 @@ static ssize_t netfs_unbuffered_read_iter_locked(struct 
kiocb *iocb, struct iov_
iov_iter_advance(iter, orig_count);
}
 
+   /* If we're going to do decryption or decompression, we're going to
+* need a bounce buffer - and if the data is misaligned for the crypto
+* algorithm, we decrypt in place and then copy.
+*/
+   if (test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &rreq->flags)) {
+   if (!netfs_is_crypto_aligned(rreq, iter))
+   __set_bit(NETFS_RREQ_CRYPT_IN_PLACE, &rreq->flags);
+   __set_bit(NETFS_RREQ_USE_BOUNCE_BUFFER, &rreq->flags);
+   }
+
/* If we're going to use a bounce buffer, we need to set it up.  We
 * will then need to pad the request out to the minimum block size.
 */
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 8dc68a75d6cd..7dd37d3aff3f 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -196,6 +196,23 @@ static inline void netfs_put_group_many(struct netfs_group 
*netfs_group, int nr)
netfs_group->free(netfs_group);
 }
 
+/*
+ * Check to see if a buffer aligns with the crypto unit block size.  If it
+ * doesn't the crypto layer is going to copy all the data - in which case
+ * relying on the crypto op for a free copy is pointless.
+ */
+static inline bool netfs_is_crypto_aligned(struct netfs_io_request *rreq,
+  struct iov_iter *iter)
+{
+   struct netfs_inode *ctx = netfs_inode(rreq->inode);
+   unsigned long align, mask = (1UL << ctx->min_bshift) - 1;
+
+   if (!ctx->min_bshift)
+   return true;
+   align = iov_iter_alignment(iter);
+   return (align & mask) == 0;
+}
+
 /*/
 /*
  * debug tracing
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 40/53] netfs: Implement a write-through caching option

2023-10-13 Thread David Howells
Provide a flag whereby a filesystem may request that netfs_perform_write()
perform write-through caching.  This involves putting pages directly into
writeback rather than dirty and attaching them to a write operation as we
go.

Further, the writes being made are limited to the byte range being written
rather than whole folios being written.  This can be used by cifs, for
example, to deal with strict byte-range locking.

This can't be used with content encryption as that may require expansion of
the write RPC beyond the write being made.

This doesn't affect writes via mmap - those are written back in the normal
way; similarly failed writethrough writes are marked dirty and left to
writeback to retry.  Another option would be to simply invalidate them, but
the contents can be simultaneously accessed by read() and through mmap.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c| 66 ++
 fs/netfs/internal.h  |  3 ++
 fs/netfs/main.c  |  1 +
 fs/netfs/objects.c   |  1 +
 fs/netfs/output.c| 90 
 include/linux/netfs.h|  2 +
 include/trace/events/netfs.h |  8 +++-
 7 files changed, 159 insertions(+), 12 deletions(-)
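
A filesystem opts in per-inode by setting the new flag on the netfs context;
a minimal sketch (the "myfs" open hook and its predicate are hypothetical):

	static int myfs_open(struct inode *inode, struct file *file)
	{
		struct netfs_inode *ictx = netfs_inode(inode);

		/* e.g. when strict byte-range locks demand write-through */
		if (myfs_needs_writethrough(inode))	/* hypothetical */
			set_bit(NETFS_ICTX_WRITETHROUGH, &ictx->flags);
		return generic_file_open(inode, file);
	}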

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 5695bc3acf6c..6657dbd07b9d 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -26,6 +26,8 @@ enum netfs_how_to_modify {
NETFS_FLUSH_CONTENT,/* Flush incompatible content. */
 };
 
+static void netfs_cleanup_buffered_write(struct netfs_io_request *wreq);
+
 static void netfs_set_group(struct folio *folio, struct netfs_group 
*netfs_group)
 {
if (netfs_group && !folio_get_private(folio))
@@ -135,6 +137,14 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct 
iov_iter *iter,
struct inode *inode = file_inode(file);
struct address_space *mapping = inode->i_mapping;
struct netfs_inode *ctx = netfs_inode(inode);
+   struct writeback_control wbc = {
+   .sync_mode  = WB_SYNC_NONE,
+   .for_sync   = true,
+   .nr_to_write= LONG_MAX,
+   .range_start= iocb->ki_pos,
+   .range_end  = iocb->ki_pos + iter->count,
+   };
+   struct netfs_io_request *wreq = NULL;
struct netfs_folio *finfo;
struct folio *folio;
enum netfs_how_to_modify howto;
@@ -145,6 +155,30 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct 
iov_iter *iter,
size_t max_chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER;
bool maybe_trouble = false;
 
+   if (unlikely(test_bit(NETFS_ICTX_WRITETHROUGH, &ctx->flags) ||
+iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC))
+   ) {
+   if (pos < i_size_read(inode)) {
+   ret = filemap_write_and_wait_range(mapping, pos, pos + iter->count);
+   if (ret < 0) {
+   goto out;
+   }
+   }
+
+   wbc_attach_fdatawrite_inode(&wbc, mapping->host);
+
+   wreq = netfs_begin_writethrough(iocb, iter->count);
+   if (IS_ERR(wreq)) {
+   wbc_detach_inode(&wbc);
+   ret = PTR_ERR(wreq);
+   wreq = NULL;
+   goto out;
+   }
+   if (!is_sync_kiocb(iocb))
+   wreq->iocb = iocb;
+   wreq->cleanup = netfs_cleanup_buffered_write;
+   }
+
do {
size_t flen;
size_t offset;  /* Offset into pagecache folio */
@@ -314,7 +348,22 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct 
iov_iter *iter,
}
written += copied;
 
-   folio_mark_dirty(folio);
+   if (likely(!wreq)) {
+   folio_mark_dirty(folio);
+   } else {
+   if (folio_test_dirty(folio))
+   /* Sigh.  mmap. */
+   folio_clear_dirty_for_io(folio);
+   /* We make multiple writes to the folio... */
+   if (!folio_start_writeback(folio)) {
+   if (wreq->iter.count == 0)
+   trace_netfs_folio(folio, netfs_folio_trace_wthru);
+   else
+   trace_netfs_folio(folio, netfs_folio_trace_wthru_plus);
+   }
+   netfs_advance_writethrough(wreq, copied,
+  offset + copied == flen);
+   }
retry:
folio_unlock(folio);
folio_put(foli

[Linux-cachefs] [RFC PATCH 38/53] netfs: Support encryption on Unbuffered/DIO write

2023-10-13 Thread David Howells
Support unbuffered and direct I/O writes to an encrypted file.  This may
require making an RMW cycle if the write is not appropriately aligned with
respect to the crypto blocks.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/direct_read.c   |   2 +-
 fs/netfs/direct_write.c  | 210 ++-
 fs/netfs/internal.h  |   8 ++
 fs/netfs/io.c| 117 +++
 fs/netfs/main.c  |   1 +
 include/linux/netfs.h|   4 +
 include/trace/events/netfs.h |   1 +
 7 files changed, 337 insertions(+), 6 deletions(-)
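
For the misaligned case, the bounds of the RMW cycle work out as in this
sketch (variable names illustrative only):

	/* A write of [pos, pos + len) with crypto blocks of
	 * 1 << min_bshift must read, modify and rewrite the enclosing
	 * block-aligned span:
	 */
	unsigned long long bsize  = 1ULL << ctx->min_bshift;
	unsigned long long rstart = round_down(pos, bsize);
	unsigned long long rend   = round_up(pos + len, bsize);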

diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index 158719b56900..c01cbe42db8a 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -88,7 +88,7 @@ static int netfs_copy_xarray_to_iter(struct netfs_io_request 
*rreq,
  * If we did a direct read to a bounce buffer (say we needed to decrypt it),
  * copy the data obtained to the destination iterator.
  */
-static int netfs_dio_copy_bounce_to_dest(struct netfs_io_request *rreq)
+int netfs_dio_copy_bounce_to_dest(struct netfs_io_request *rreq)
 {
struct iov_iter *dest_iter = &rreq->iter;
struct kiocb *iocb = rreq->iocb;
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index b1a4921ac4a2..f9dea801d6dd 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -23,6 +23,100 @@ static void netfs_cleanup_dio_write(struct netfs_io_request 
*wreq)
}
 }
 
+/*
+ * Allocate a bunch of pages and add them into the xarray buffer starting at
+ * the given index.
+ */
+static int netfs_alloc_buffer(struct xarray *xa, pgoff_t index, unsigned int 
nr_pages)
+{
+   struct page *page;
+   unsigned int n;
+   int ret = 0;
+   LIST_HEAD(list);
+
+   n = alloc_pages_bulk_list(GFP_NOIO, nr_pages, &list);
+   if (n < nr_pages) {
+   ret = -ENOMEM;
+   }
+
+   while ((page = list_first_entry_or_null(&list, struct page, lru))) {
+   list_del(&page->lru);
+   page->index = index;
+   ret = xa_insert(xa, index++, page, GFP_NOIO);
+   if (ret < 0)
+   break;
+   }
+
+   while ((page = list_first_entry_or_null(&list, struct page, lru))) {
+   list_del(&page->lru);
+   __free_page(page);
+   }
+   return ret;
+}
+
+/*
+ * Copy all of the data from the source iterator into folios in the destination
+ * xarray.  We cannot step through and kmap the source iterator if it's an
+ * iovec, so we have to step through the xarray and drop the RCU lock each
+ * time.
+ */
+static int netfs_copy_iter_to_xarray(struct iov_iter *src, struct xarray *xa,
+unsigned long long start)
+{
+   struct folio *folio;
+   void *base;
+   pgoff_t index = start / PAGE_SIZE;
+   size_t len, copied, count = iov_iter_count(src);
+
+   XA_STATE(xas, xa, index);
+
+   _enter("%zx", count);
+
+   if (!count)
+   return -EIO;
+
+   len = PAGE_SIZE - offset_in_page(start);
+   rcu_read_lock();
+   xas_for_each(&xas, folio, ULONG_MAX) {
+   size_t offset;
+
+   if (xas_retry(&xas, folio))
+   continue;
+
+   /* There shouldn't be a need to call xas_pause() as no one else
+* can see the xarray we're iterating over.
+*/
+   rcu_read_unlock();
+
+   offset = offset_in_folio(folio, start);
+   _debug("folio %lx +%zx [%llx]", folio->index, offset, start);
+
+   while (offset < folio_size(folio)) {
+   len = min(count, len);
+
+   base = kmap_local_folio(folio, offset);
+   copied = copy_from_iter(base, len, src);
+   kunmap_local(base);
+   if (copied != len)
+   goto out;
+   count -= len;
+   if (count == 0)
+   goto out;
+
+   start += len;
+   offset += len;
+   len = PAGE_SIZE;
+   }
+
+   rcu_read_lock();
+   }
+
+   rcu_read_unlock();
+out:
+   _leave(" = %zx", count);
+   return count ? -EIO : 0;
+}
+
 /*
  * Perform an unbuffered write where we may have to do an RMW operation on an
  * encrypted file.  This can also be used for direct I/O writes.
@@ -31,20 +125,47 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb 
*iocb, struct iov_iter *
   struct netfs_group *netfs_group)
 {
struct netfs_io_request *wreq;
+   struct netfs_inode *ctx = netfs_inode(file_inode(iocb->ki_filp));
+   unsigned long long real_size = ctx->

[Linux-cachefs] [RFC PATCH 34/53] netfs: Make netfs_skip_folio_read() take account of blocksize

2023-10-13 Thread David Howells
Make netfs_skip_folio_read() take account of blocksize such as crypto
blocksize.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_read.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)
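
A worked example of the new rounding, assuming a 16KiB crypto block
(min_bshift = 14):

	/* A 4KiB write at pos 0x5000 is judged against the enclosing
	 * crypto block, not the folio:
	 */
	loff_t low  = round_down(0x5000, 0x4000);		/* 0x4000 */
	loff_t high = round_up(0x5000 + 0x1000, 0x4000);	/* 0x8000 */
	/* The write [0x5000, 0x6000) doesn't cover [low, high), so the
	 * read can only be skipped if low >= i_size.
	 */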

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index e06461ef0bfa..de696aaaefbd 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -337,6 +337,7 @@ EXPORT_SYMBOL(netfs_read_folio);
 
 /*
  * Prepare a folio for writing without reading first
+ * @ctx: File context
  * @folio: The folio being prepared
  * @pos: starting position for the write
  * @len: length of write
@@ -350,32 +351,41 @@ EXPORT_SYMBOL(netfs_read_folio);
  * If any of these criteria are met, then zero out the unwritten parts
  * of the folio and return true. Otherwise, return false.
  */
-static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
-bool always_fill)
+static bool netfs_skip_folio_read(struct netfs_inode *ctx, struct folio *folio,
+ loff_t pos, size_t len, bool always_fill)
 {
struct inode *inode = folio_inode(folio);
-   loff_t i_size = i_size_read(inode);
+   loff_t i_size = i_size_read(inode), low, high;
size_t offset = offset_in_folio(folio, pos);
size_t plen = folio_size(folio);
+   size_t min_bsize = 1UL << ctx->min_bshift;
+
+   if (likely(min_bsize == 1)) {
+   low = folio_file_pos(folio);
+   high = low + plen;
+   } else {
+   low = round_down(pos, min_bsize);
+   high = round_up(pos + len, min_bsize);
+   }
 
if (unlikely(always_fill)) {
-   if (pos - offset + len <= i_size)
-   return false; /* Page entirely before EOF */
+   if (low < i_size)
+   return false; /* Some part of the block before EOF */
zero_user_segment(&folio->page, 0, plen);
folio_mark_uptodate(folio);
return true;
}
 
-   /* Full folio write */
-   if (offset == 0 && len >= plen)
+   /* Full page write */
+   if (pos == low && high == pos + len)
return true;
 
-   /* Page entirely beyond the end of the file */
-   if (pos - offset >= i_size)
+   /* pos beyond last page in the file */
+   if (low >= i_size)
goto zero_out;
 
/* Write that covers from the start of the folio to EOF or beyond */
-   if (offset == 0 && (pos + len) >= i_size)
+   if (pos == low && (pos + len) >= i_size)
goto zero_out;
 
return false;
@@ -454,7 +464,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
 * to preload the granule.
 */
if (!netfs_is_cache_enabled(ctx) &&
-   netfs_skip_folio_read(folio, pos, len, false)) {
+   netfs_skip_folio_read(ctx, folio, pos, len, false)) {
netfs_stat(&netfs_n_rh_write_zskip);
goto have_folio_no_wait;
}
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 33/53] netfs: Provide minimum blocksize parameter

2023-10-13 Thread David Howells
Add a parameter for minimum blocksize in the netfs_i_context struct.  This
can be used, for instance, to force I/O alignment for content encryption.
It also requires the use of an RMW cycle if a write we want to do doesn't
meet the block alignment requirements.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_read.c  | 26 ++
 fs/netfs/buffered_write.c |  3 ++-
 fs/netfs/direct_read.c|  3 ++-
 include/linux/netfs.h |  2 ++
 4 files changed, 28 insertions(+), 6 deletions(-)
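
A filesystem would set this during inode initialisation; a minimal sketch
(the "myfs" names and ops table are placeholders):

	static void myfs_set_netfs_context(struct inode *inode)
	{
		struct netfs_inode *ictx = netfs_inode(inode);

		netfs_inode_init(ictx, &myfs_netfs_ops);	/* ops table assumed */
		ictx->min_bshift = 12;	/* force 4KiB I/O alignment */
	}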

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index ab9f8e123245..e06461ef0bfa 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -527,14 +527,26 @@ int netfs_prefetch_for_write(struct file *file, struct 
folio *folio,
struct address_space *mapping = folio_file_mapping(folio);
struct netfs_inode *ctx = netfs_inode(mapping->host);
unsigned long long start = folio_pos(folio);
-   size_t flen = folio_size(folio);
+   unsigned long long i_size, rstart, end;
+   size_t rlen;
int ret;
 
-   _enter("%zx @%llx", flen, start);
+   DEFINE_READAHEAD(ractl, file, NULL, mapping, folio_index(folio));
+
+   _enter("%zx @%llx", len, start);
 
ret = -ENOMEM;
 
-   rreq = netfs_alloc_request(mapping, file, start, flen,
+   i_size = i_size_read(mapping->host);
+   end = round_up(start + len, 1U << ctx->min_bshift);
+   if (end > i_size) {
+   unsigned long long limit = round_up(start + len, PAGE_SIZE);
+   end = max(limit, round_up(i_size, PAGE_SIZE));
+   }
+   rstart = round_down(start, 1U << ctx->min_bshift);
+   rlen   = end - rstart;
+
+   rreq = netfs_alloc_request(mapping, file, rstart, rlen,
   NETFS_READ_FOR_WRITE);
if (IS_ERR(rreq)) {
ret = PTR_ERR(rreq);
@@ -548,7 +560,13 @@ int netfs_prefetch_for_write(struct file *file, struct 
folio *folio,
goto error_put;
 
netfs_stat(&netfs_n_rh_write_begin);
-   trace_netfs_read(rreq, start, flen, netfs_read_trace_prefetch_for_write);
+   trace_netfs_read(rreq, rstart, rlen, netfs_read_trace_prefetch_for_write);
+
+   /* Expand the request to meet caching requirements and download
+* preferences.
+*/
+   ractl._nr_pages = folio_nr_pages(folio);
+   netfs_rreq_expand(rreq, &ractl);

/* Set up the output buffer */
iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index d5a5a315fbd3..7163fcc05206 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -80,7 +80,8 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct 
netfs_inode *ctx,
if (file->f_mode & FMODE_READ)
return NETFS_JUST_PREFETCH;
 
-   if (netfs_is_cache_enabled(ctx))
+   if (netfs_is_cache_enabled(ctx) ||
+   ctx->min_bshift > 0)
return NETFS_JUST_PREFETCH;
 
if (!finfo)
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
index 1d26468aafd9..52ad8fa66dd5 100644
--- a/fs/netfs/direct_read.c
+++ b/fs/netfs/direct_read.c
@@ -185,7 +185,8 @@ static ssize_t netfs_unbuffered_read_iter_locked(struct 
kiocb *iocb, struct iov_
 * will then need to pad the request out to the minimum block size.
 */
if (test_bit(NETFS_RREQ_USE_BOUNCE_BUFFER, &rreq->flags)) {
-   start = rreq->start;
+   min_bsize = 1ULL << ctx->min_bshift;
+   start = round_down(rreq->start, min_bsize);
end = min_t(unsigned long long,
round_up(rreq->start + rreq->len, min_bsize),
ctx->remote_i_size);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index fb4f4f826b93..6244f7a9a44a 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -141,6 +141,7 @@ struct netfs_inode {
unsigned long   flags;
#define NETFS_ICTX_ODIRECT 0   /* The file has DIO in progress */
#define NETFS_ICTX_UNBUFFERED  1   /* I/O should not use the pagecache */
+   unsigned char   min_bshift; /* log2 min block size for bounding box or 0 */
 };
 
 /*
@@ -462,6 +463,7 @@ static inline void netfs_inode_init(struct netfs_inode *ctx,
ctx->remote_i_size = i_size_read(>inode);
ctx->zero_point = ctx->remote_i_size;
ctx->flags = 0;
+   ctx->min_bshift = 0;
 #if IS_ENABLED(CONFIG_FSCACHE)
ctx->cache = NULL;
 #endif
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 36/53] netfs: Decrypt encrypted content

2023-10-13 Thread David Howells
Implement a facility to provide decryption for encrypted content to a whole
read-request in one go (which might have been stitched together from
disparate sources with divisions that don't match page boundaries).

Note that this doesn't necessarily gain the best throughput if the crypto
block size is equal to or less than the size of a page (in which case we
might do better decrypting pages as they are read), but it will handle crypto
blocks larger than the size of a page.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/crypto.c| 59 
 fs/netfs/internal.h  |  1 +
 fs/netfs/io.c|  6 +++-
 include/linux/netfs.h|  3 ++
 include/trace/events/netfs.h |  2 ++
 5 files changed, 70 insertions(+), 1 deletion(-)
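
The library drives the filesystem through its ->decrypt_block() op as seen
in the loop below; a sketch of the shape of such an implementation (the
crypto internals and the "myfs" helper are hypothetical):

	static int myfs_decrypt_block(struct netfs_io_request *rreq,
				      loff_t pos, size_t len,
				      struct scatterlist *source_sg,
				      unsigned int n_source,
				      struct scatterlist *dest_sg,
				      unsigned int n_dest)
	{
		/* One logical crypto block per call: derive the IV from
		 * pos and run a synchronous skcipher over the sglists.
		 */
		return myfs_run_skcipher(rreq->inode, pos, len,
					 source_sg, dest_sg, /*encrypt=*/false);
	}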

diff --git a/fs/netfs/crypto.c b/fs/netfs/crypto.c
index 943d01f430e2..6729bcda4f47 100644
--- a/fs/netfs/crypto.c
+++ b/fs/netfs/crypto.c
@@ -87,3 +87,62 @@ bool netfs_encrypt(struct netfs_io_request *wreq)
wreq->error = ret;
return false;
 }
+
+/*
+ * Decrypt the result of a read request.
+ */
+void netfs_decrypt(struct netfs_io_request *rreq)
+{
+   struct netfs_inode *ctx = netfs_inode(rreq->inode);
+   struct scatterlist source_sg[16], dest_sg[16];
+   unsigned int n_source;
+   size_t n, chunk, bsize = 1UL << ctx->crypto_bshift;
+   loff_t pos;
+   int ret;
+
+   trace_netfs_rreq(rreq, netfs_rreq_trace_decrypt);
+   if (rreq->start >= rreq->i_size)
+   return;
+
+   n = min_t(unsigned long long, rreq->len, rreq->i_size - rreq->start);
+
+   _debug("DECRYPT %llx-%llx f=%lx",
+  rreq->start, rreq->start + n, rreq->flags);
+
+   pos = rreq->start;
+   for (; n > 0; n -= chunk, pos += chunk) {
+   chunk = min(n, bsize);
+
+   ret = netfs_iter_to_sglist(&rreq->io_iter, chunk,
+  source_sg, ARRAY_SIZE(source_sg));
+   if (ret < 0)
+   goto error;
+   n_source = ret;
+
+   if (test_bit(NETFS_RREQ_CRYPT_IN_PLACE, &rreq->flags)) {
+   ret = ctx->ops->decrypt_block(rreq, pos, chunk,
+ source_sg, n_source,
+ source_sg, n_source);
+   } else {
+   ret = netfs_iter_to_sglist(&rreq->iter, chunk,
+  dest_sg, ARRAY_SIZE(dest_sg));
+   if (ret < 0)
+   goto error;
+   ret = ctx->ops->decrypt_block(rreq, pos, chunk,
+ source_sg, n_source,
+ dest_sg, ret);
+   }
+
+   if (ret < 0)
+   goto error_failed;
+   }
+
+   return;
+
+error_failed:
+   trace_netfs_failure(rreq, NULL, ret, netfs_fail_decryption);
+error:
+   rreq->error = ret;
+   set_bit(NETFS_RREQ_FAILED, &rreq->flags);
+   return;
+}
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 3f4e64968623..8dc68a75d6cd 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -26,6 +26,7 @@ int netfs_prefetch_for_write(struct file *file, struct folio 
*folio,
  * crypto.c
  */
 bool netfs_encrypt(struct netfs_io_request *wreq);
+void netfs_decrypt(struct netfs_io_request *rreq);
 
 /*
  * direct_write.c
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 36a3f720193a..9887b22e4cb3 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -398,6 +398,9 @@ static void netfs_rreq_assess(struct netfs_io_request 
*rreq, bool was_async)
return;
}
 
+   if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
+   test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &rreq->flags))
+   netfs_decrypt(rreq);
if (rreq->origin != NETFS_DIO_READ)
netfs_rreq_unlock_folios(rreq);
else
@@ -427,7 +430,8 @@ static void netfs_rreq_work(struct work_struct *work)
 static void netfs_rreq_terminated(struct netfs_io_request *rreq,
  bool was_async)
 {
-   if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) &&
+   if ((test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) ||
+test_bit(NETFS_RREQ_CONTENT_ENCRYPTION, &rreq->flags)) &&
was_async) {
if (!queue_work(system_unbound_wq, >work))
BUG();
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index cdb471938225..524e6f5ff3fd 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -326,6 +326,9 @@ struct netfs_request_ops {
int (*encrypt_block)

[Linux-cachefs] [RFC PATCH 35/53] netfs: Perform content encryption

2023-10-13 Thread David Howells
When dealing with an encrypted file, we gather together sufficient pages
from the pagecache to constitute a logical crypto block, allocate a bounce
buffer and then ask the filesystem to encrypt between the buffers.  The
bounce buffer is then passed to the filesystem to upload.

The network filesystem must set a flag to indicate what service is desired
and what the logical blocksize will be.

The netfs library iterates through each block to be processed, providing a
pair of scatterlists to describe the start and end buffers.

Note that it should be possible in future to encrypt DIO writes also by
this same mechanism.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/Makefile|  1 +
 fs/netfs/buffered_write.c|  3 +-
 fs/netfs/crypto.c| 89 
 fs/netfs/internal.h  |  5 ++
 fs/netfs/objects.c   |  2 +
 fs/netfs/output.c|  7 ++-
 include/linux/netfs.h| 11 +
 include/trace/events/netfs.h |  2 +
 8 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 fs/netfs/crypto.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index d5c2809fc029..5ea852ac276c 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
buffered_read.o \
buffered_write.o \
+   crypto.o \
direct_read.o \
direct_write.o \
io.o \
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 7163fcc05206..b81d807f89f0 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -77,7 +77,8 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct 
netfs_inode *ctx,
if (!maybe_trouble && offset == 0 && len >= flen)
return NETFS_WHOLE_FOLIO_MODIFY;
 
-   if (file->f_mode & FMODE_READ)
+   if (file->f_mode & FMODE_READ ||
+   test_bit(NETFS_ICTX_ENCRYPTED, &ctx->flags))
return NETFS_JUST_PREFETCH;
 
if (netfs_is_cache_enabled(ctx) ||
diff --git a/fs/netfs/crypto.c b/fs/netfs/crypto.c
new file mode 100644
index ..943d01f430e2
--- /dev/null
+++ b/fs/netfs/crypto.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem content encryption support.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+/*
+ * Populate a scatterlist from the next bufferage of an I/O iterator.
+ */
+static int netfs_iter_to_sglist(const struct iov_iter *iter, size_t len,
+   struct scatterlist *sg, unsigned int n_sg)
+{
+   struct iov_iter tmp_iter = *iter;
+   struct sg_table sgtable = { .sgl = sg };
+   ssize_t ret;
+
+   _enter("%zx/%zx", len, iov_iter_count(iter));
+
+   sg_init_table(sg, n_sg);
+   ret = extract_iter_to_sg(&tmp_iter, len, &sgtable, n_sg, 0);
+   if (ret < 0)
+   return ret;
+   sg_mark_end(&sg[sgtable.nents - 1]);
+   return sgtable.nents;
+}
+
+/*
+ * Prepare a write request for writing.  We encrypt in/into the bounce buffer.
+ */
+bool netfs_encrypt(struct netfs_io_request *wreq)
+{
+   struct netfs_inode *ctx = netfs_inode(wreq->inode);
+   struct scatterlist source_sg[16], dest_sg[16];
+   unsigned int n_dest;
+   size_t n, chunk, bsize = 1UL << ctx->crypto_bshift;
+   loff_t pos;
+   int ret;
+
+   _enter("");
+
+   trace_netfs_rreq(wreq, netfs_rreq_trace_encrypt);
+
+   pos = wreq->start;
+   n = wreq->len;
+   _debug("ENCRYPT %llx-%llx", pos, pos + n - 1);
+
+   for (; n > 0; n -= chunk, pos += chunk) {
+   chunk = min(n, bsize);
+
+   ret = netfs_iter_to_sglist(&wreq->io_iter, chunk,
+  dest_sg, ARRAY_SIZE(dest_sg));
+   if (ret < 0)
+   goto error;
+   n_dest = ret;
+
+   if (test_bit(NETFS_RREQ_CRYPT_IN_PLACE, &wreq->flags)) {
+   ret = ctx->ops->encrypt_block(wreq, pos, chunk,
+ dest_sg, n_dest,
+ dest_sg, n_dest);
+   } else {
+   ret = netfs_iter_to_sglist(&wreq->iter, chunk,
+  source_sg, ARRAY_SIZE(source_sg));
+   if (ret < 0)
+   goto error;
+   ret = ctx->ops->encrypt_block(wreq, pos, chunk,
+ source_sg, ret,
+ dest_sg, n_dest);
+   }
+
+   if (ret < 0)
+

[Linux-cachefs] [RFC PATCH 32/53] netfs: Provide a writepages implementation

2023-10-13 Thread David Howells
Provide an implementation of writepages for network filesystems to delegate
to.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c | 627 ++
 include/linux/netfs.h |   2 +
 2 files changed, 629 insertions(+)
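
netfs_writepages() has the standard ->writepages prototype, so besides being
pointed at by an aops table it can be exercised directly; a sketch of a
ranged flush using it (the mapping is assumed to be set up):

	struct writeback_control wbc = {
		.sync_mode	= WB_SYNC_ALL,
		.nr_to_write	= LONG_MAX,
		.range_start	= 0,
		.range_end	= LLONG_MAX,
	};
	int ret = netfs_writepages(mapping, &wbc);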

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 3c1f26f32351..d5a5a315fbd3 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -32,6 +32,18 @@ static void netfs_set_group(struct folio *folio, struct 
netfs_group *netfs_group
folio_attach_private(folio, netfs_get_group(netfs_group));
 }
 
+#if IS_ENABLED(CONFIG_FSCACHE)
+static void netfs_folio_start_fscache(bool caching, struct folio *folio)
+{
+   if (caching)
+   folio_start_fscache(folio);
+}
+#else
+static void netfs_folio_start_fscache(bool caching, struct folio *folio)
+{
+}
+#endif
+
 /*
  * Decide how we should modify a folio.  We might be attempting to do
  * write-streaming, in which case we don't want to a local RMW cycle if we can
@@ -472,3 +484,618 @@ vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, 
struct netfs_group *netfs_gr
return ret;
 }
 EXPORT_SYMBOL(netfs_page_mkwrite);
+
+/*
+ * Kill all the pages in the given range
+ */
+static void netfs_kill_pages(struct address_space *mapping,
+loff_t start, loff_t len)
+{
+   struct folio *folio;
+   pgoff_t index = start / PAGE_SIZE;
+   pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
+
+   _enter("%llx-%llx", start, start + len - 1);
+
+   do {
+   _debug("kill %lx (to %lx)", index, last);
+
+   folio = filemap_get_folio(mapping, index);
+   if (IS_ERR(folio)) {
+   next = index + 1;
+   continue;
+   }
+
+   next = folio_next_index(folio);
+
+   folio_clear_uptodate(folio);
+   folio_end_writeback(folio);
+   folio_lock(folio);
+   trace_netfs_folio(folio, netfs_folio_trace_kill);
+   generic_error_remove_page(mapping, &folio->page);
+   folio_unlock(folio);
+   folio_put(folio);
+
+   } while (index = next, index <= last);
+
+   _leave("");
+}
+
+/*
+ * Redirty all the pages in a given range.
+ */
+static void netfs_redirty_pages(struct address_space *mapping,
+   loff_t start, loff_t len)
+{
+   struct folio *folio;
+   pgoff_t index = start / PAGE_SIZE;
+   pgoff_t last = (start + len - 1) / PAGE_SIZE, next;
+
+   _enter("%llx-%llx", start, start + len - 1);
+
+   do {
+   _debug("redirty %llx @%llx", len, start);
+
+   folio = filemap_get_folio(mapping, index);
+   if (IS_ERR(folio)) {
+   next = index + 1;
+   continue;
+   }
+
+   next = folio_next_index(folio);
+   trace_netfs_folio(folio, netfs_folio_trace_redirty);
+   filemap_dirty_folio(mapping, folio);
+   folio_end_writeback(folio);
+   folio_put(folio);
+   } while (index = next, index <= last);
+
+   balance_dirty_pages_ratelimited(mapping);
+
+   _leave("");
+}
+
+/*
+ * Completion of write to server
+ */
+static void netfs_pages_written_back(struct netfs_io_request *wreq)
+{
+   struct address_space *mapping = wreq->mapping;
+   struct netfs_folio *finfo;
+   struct netfs_group *group = NULL;
+   struct folio *folio;
+   pgoff_t last;
+   int gcount = 0;
+
+   XA_STATE(xas, &mapping->i_pages, wreq->start / PAGE_SIZE);
+
+   _enter("%llx-%llx", wreq->start, wreq->start + wreq->len);
+
+   rcu_read_lock();
+
+   last = (wreq->start + wreq->len - 1) / PAGE_SIZE;
+   xas_for_each(&xas, folio, last) {
+   WARN(!folio_test_writeback(folio),
+"bad %zx @%llx page %lx %lx\n",
+wreq->len, wreq->start, folio_index(folio), last);
+
+   if ((finfo = netfs_folio_info(folio))) {
+   /* Streaming writes cannot be redirtied whilst under
+* writeback, so discard the streaming record.
+*/
+   folio_detach_private(folio);
+   group = finfo->netfs_group;
+   gcount++;
+   trace_netfs_folio(folio, netfs_folio_trace_clear_s);
+   } else if ((group = netfs_folio_group(folio))) {
+   /* Need to detach the group pointer if the page didn't
+* get redirtied.  If it has been redirtied, then it
+* must be within the same group.
+ 

[Linux-cachefs] [RFC PATCH 31/53] netfs: Provide netfs_file_read_iter()

2023-10-13 Thread David Howells
Provide a top-level-ish function that can be pointed to directly by
->read_iter file op.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_read.c | 33 +
 include/linux/netfs.h|  1 +
 2 files changed, 34 insertions(+)
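
A converted filesystem then points its file ops straight at the helper; a
minimal sketch ("myfs" is a placeholder; the write_iter helper comes from an
earlier patch in this series):

	static const struct file_operations myfs_file_ops = {
		.llseek		= generic_file_llseek,
		.read_iter	= netfs_file_read_iter,
		.write_iter	= netfs_file_write_iter,
		.mmap		= generic_file_mmap,
	};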

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 374707df6575..ab9f8e123245 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -564,3 +564,36 @@ int netfs_prefetch_for_write(struct file *file, struct 
folio *folio,
_leave(" = %d", ret);
return ret;
 }
+
+/**
+ * netfs_file_read_iter - Generic filesystem read routine
+ * @iocb: kernel I/O control block
+ * @iter: destination for the data read
+ *
+ * This is the ->read_iter() routine for all filesystems that can use the page
+ * cache directly.
+ *
+ * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall be
+ * returned when no data can be read without waiting for I/O requests to
+ * complete; it doesn't prevent readahead.
+ *
+ * The IOCB_NOIO flag in iocb->ki_flags indicates that no new I/O requests
+ * shall be made for the read or for readahead.  When no data can be read,
+ * -EAGAIN shall be returned.  When readahead would be triggered, a partial,
+ * possibly empty read shall be returned.
+ *
+ * Return:
+ * * number of bytes copied, even for partial reads
+ * * negative error code (or 0 if IOCB_NOIO) if nothing was read
+ */
+ssize_t netfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+   struct netfs_inode *ictx = netfs_inode(iocb->ki_filp->f_mapping->host);
+
+   if ((iocb->ki_flags & IOCB_DIRECT) ||
+   test_bit(NETFS_ICTX_UNBUFFERED, &ictx->flags))
+   return netfs_unbuffered_read_iter(iocb, iter);
+
+   return filemap_read(iocb, iter, 0);
+}
+EXPORT_SYMBOL(netfs_file_read_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index e2a5a441b7fc..6e02a68a51f7 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -384,6 +384,7 @@ struct netfs_cache_ops {
 
 /* High-level read API. */
 ssize_t netfs_unbuffered_read_iter(struct kiocb *iocb, struct iov_iter *iter);
+ssize_t netfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 
 /* High-level write API */
 ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 30/53] netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()

2023-10-13 Thread David Howells
Provide an entry point to delegate a filesystem's ->page_mkwrite() to.
This checks for conflicting writes, then attaches any netfs-specific group
marking (e.g. ceph snap) to the page to be considered dirty.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c | 59 +++
 include/linux/netfs.h |  4 +++
 2 files changed, 63 insertions(+)
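
A filesystem wraps it to supply its own dirty-grouping (or NULL if it doesn't
use groups); a sketch with hypothetical "myfs" names:

	static vm_fault_t myfs_page_mkwrite(struct vm_fault *vmf)
	{
		return netfs_page_mkwrite(vmf, NULL);	/* or the active group */
	}

	static const struct vm_operations_struct myfs_vm_ops = {
		.fault		= filemap_fault,
		.map_pages	= filemap_map_pages,
		.page_mkwrite	= myfs_page_mkwrite,
	};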

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 60e7da53cbd2..3c1f26f32351 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -413,3 +413,62 @@ ssize_t netfs_file_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 EXPORT_SYMBOL(netfs_file_write_iter);
+
+/*
+ * Notification that a previously read-only page is about to become writable.
+ * Note that the caller indicates a single page of a multipage folio.
+ */
+vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group)
+{
+   struct folio *folio = page_folio(vmf->page);
+   struct file *file = vmf->vma->vm_file;
+   struct inode *inode = file_inode(file);
+   vm_fault_t ret = VM_FAULT_RETRY;
+   int err;
+
+   _enter("%lx", folio->index);
+
+   sb_start_pagefault(inode->i_sb);
+
+   if (folio_wait_writeback_killable(folio))
+   goto out;
+
+   if (folio_lock_killable(folio) < 0)
+   goto out;
+
+   /* Can we see a streaming write here? */
+   if (WARN_ON(!folio_test_uptodate(folio))) {
+   ret = VM_FAULT_SIGBUS | VM_FAULT_LOCKED;
+   goto out;
+   }
+
+   if (netfs_folio_group(folio) != netfs_group) {
+   folio_unlock(folio);
+   err = filemap_fdatawait_range(inode->i_mapping,
+ folio_pos(folio),
+ folio_pos(folio) + folio_size(folio));
+   switch (err) {
+   case 0:
+   ret = VM_FAULT_RETRY;
+   goto out;
+   case -ENOMEM:
+   ret = VM_FAULT_OOM;
+   goto out;
+   default:
+   ret = VM_FAULT_SIGBUS;
+   goto out;
+   }
+   }
+
+   if (folio_test_dirty(folio))
+   trace_netfs_folio(folio, netfs_folio_trace_mkwrite_plus);
+   else
+   trace_netfs_folio(folio, netfs_folio_trace_mkwrite);
+   netfs_set_group(folio, netfs_group);
+   file_update_time(file);
+   ret = VM_FAULT_LOCKED;
+out:
+   sb_end_pagefault(inode->i_sb);
+   return ret;
+}
+EXPORT_SYMBOL(netfs_page_mkwrite);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index d1dc7ba62f17..e2a5a441b7fc 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -403,6 +403,10 @@ int netfs_write_begin(struct netfs_inode *, struct file *,
 void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length);
 bool netfs_release_folio(struct folio *folio, gfp_t gfp);
 
+/* VMA operations API. */
+vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group);
+
+/* (Sub)request management API. */
 void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool);
 void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
  enum netfs_sreq_ref_trace what);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 29/53] netfs: Implement buffered write API

2023-10-13 Thread David Howells
Institute a netfs write helper, netfs_file_write_iter(), to be pointed at
by the network filesystem ->write_iter() call.  Make it handle buffered
writes by calling the previously defined netfs_perform_write() to copy the
source data into the pagecache.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c | 83 +++
 include/linux/netfs.h |  3 ++
 2 files changed, 86 insertions(+)
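
A filesystem that must take its own locks can call the _locked variant
directly; a sketch mirroring the pattern netfs_file_write_iter() uses below
(the "myfs" wrapper is hypothetical):

	static ssize_t myfs_write_iter(struct kiocb *iocb, struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t ret;

		inode_lock(inode);
		ret = generic_write_checks(iocb, from);
		if (ret > 0)
			ret = netfs_buffered_write_iter_locked(iocb, from, NULL);
		inode_unlock(inode);
		if (ret > 0)
			ret = generic_write_sync(iocb, ret);
		return ret;
	}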

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 4de6a12149e4..60e7da53cbd2 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -330,3 +330,86 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct 
iov_iter *iter,
goto out;
 }
 EXPORT_SYMBOL(netfs_perform_write);
+
+/**
+ * netfs_buffered_write_iter_locked - write data to a file
+ * @iocb:  IO state structure (file, offset, etc.)
+ * @from:  iov_iter with data to write
+ * @netfs_group: Grouping for dirty pages (eg. ceph snaps).
+ *
+ * This function does all the work needed for actually writing data to a
+ * file. It does all basic checks, removes SUID from the file, updates
+ * modification times and calls proper subroutines depending on whether we
+ * do direct IO or a standard buffered write.
+ *
+ * The caller must hold appropriate locks around this function and have called
+ * generic_write_checks() already.  The caller is also responsible for doing
+ * any necessary syncing afterwards.
+ *
+ * This function does *not* take care of syncing data in case of O_SYNC write.
+ * A caller has to handle it. This is mainly due to the fact that we want to
+ * avoid syncing under i_rwsem.
+ *
+ * Return:
+ * * number of bytes written, even for truncated writes
+ * * negative error code if no data has been written at all
+ */
+ssize_t netfs_buffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *from,
+struct netfs_group *netfs_group)
+{
+   struct file *file = iocb->ki_filp;
+   ssize_t ret;
+
+   trace_netfs_write_iter(iocb, from);
+
+   ret = file_remove_privs(file);
+   if (ret)
+   return ret;
+
+   ret = file_update_time(file);
+   if (ret)
+   return ret;
+
+   return netfs_perform_write(iocb, from, netfs_group);
+}
+EXPORT_SYMBOL(netfs_buffered_write_iter_locked);
+
+/**
+ * netfs_file_write_iter - write data to a file
+ * @iocb: IO state structure
+ * @from: iov_iter with data to write
+ *
+ * Perform a write to a file, writing into the pagecache if possible and doing
+ * an unbuffered write instead if not.
+ *
+ * Return:
+ * * Negative error code if no data has been written at all or
+ *   vfs_fsync_range() failed for a synchronous write
+ * * Number of bytes written, even for truncated writes
+ */
+ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct file *file = iocb->ki_filp;
+   struct inode *inode = file->f_mapping->host;
+   struct netfs_inode *ictx = netfs_inode(inode);
+   ssize_t ret;
+
+   _enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
+
+   if ((iocb->ki_flags & IOCB_DIRECT) ||
+   test_bit(NETFS_ICTX_UNBUFFERED, &ictx->flags))
+   return netfs_unbuffered_write_iter(iocb, from);
+
+   ret = netfs_start_io_write(inode);
+   if (ret < 0)
+   return ret;
+
+   ret = generic_write_checks(iocb, from);
+   if (ret > 0)
+   ret = netfs_buffered_write_iter_locked(iocb, from, NULL);
+   netfs_end_io_write(inode);
+   if (ret > 0)
+   ret = generic_write_sync(iocb, ret);
+   return ret;
+}
+EXPORT_SYMBOL(netfs_file_write_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 052d62625796..d1dc7ba62f17 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -388,7 +388,10 @@ ssize_t netfs_unbuffered_read_iter(struct kiocb *iocb, 
struct iov_iter *iter);
 /* High-level write API */
 ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
struct netfs_group *netfs_group);
+ssize_t netfs_buffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *from,
+struct netfs_group *netfs_group);
 ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from);
+ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from);
 
 /* Address operations API */
 struct readahead_control;
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 28/53] netfs: Implement unbuffered/DIO write support

2023-10-13 Thread David Howells
Implement support for unbuffered writes and direct I/O writes.  If the
write is misaligned with respect to the fscrypt block size, then RMW cycles
are performed if necessary.  DIO writes are a special case of unbuffered
writes with extra restrictions imposed, such as block size alignment
requirements.

Also provide a field that can tell the code to add some extra space onto
the bounce buffer for use by the filesystem in the case of a
content-encrypted file.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/inode.c   |   2 +-
 fs/netfs/Makefile|   1 +
 fs/netfs/direct_write.c  | 159 +++
 fs/netfs/internal.h  |   6 ++
 fs/netfs/io.c|   2 +-
 fs/netfs/main.c  |  12 +--
 fs/netfs/objects.c   |   6 +-
 fs/netfs/output.c|  24 ++
 include/linux/netfs.h|   4 +
 include/trace/events/netfs.h |   4 +-
 10 files changed, 210 insertions(+), 10 deletions(-)
 create mode 100644 fs/netfs/direct_write.c
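
The optional ->update_i_size() hook referenced in the cleanup path below
might look like this sketch (the extra bookkeeping is whatever the
filesystem needs; "myfs" is a placeholder):

	static void myfs_update_i_size(struct inode *inode, loff_t new_i_size)
	{
		if (new_i_size > i_size_read(inode))
			i_size_write(inode, new_i_size);
		/* plus any filesystem-private size/version tracking */
	}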

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 46bc5574d6f5..a8f4301aca9a 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -250,7 +250,7 @@ static void afs_apply_status(struct afs_operation *op,
 * what's on the server.
 */
vnode->netfs.remote_i_size = status->size;
-   if (change_size) {
+   if (change_size || status->size > i_size_read(inode)) {
afs_set_i_size(vnode, status->size);
vnode->netfs.zero_point = status->size;
inode_set_ctime_to_ts(inode, t);
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 27643557b443..d5c2809fc029 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -4,6 +4,7 @@ netfs-y := \
buffered_read.o \
buffered_write.o \
direct_read.o \
+   direct_write.o \
io.o \
iterator.o \
locking.o \
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
new file mode 100644
index ..b1a4921ac4a2
--- /dev/null
+++ b/fs/netfs/direct_write.c
@@ -0,0 +1,159 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Unbuffered and direct write support.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include "internal.h"
+
+static void netfs_cleanup_dio_write(struct netfs_io_request *wreq)
+{
+   struct inode *inode = wreq->inode;
+   unsigned long long end = wreq->start + wreq->len;
+
+   if (!wreq->error &&
+   i_size_read(inode) < end) {
+   if (wreq->netfs_ops->update_i_size)
+   wreq->netfs_ops->update_i_size(inode, end);
+   else
+   i_size_write(inode, end);
+   }
+}
+
+/*
+ * Perform an unbuffered write where we may have to do an RMW operation on an
+ * encrypted file.  This can also be used for direct I/O writes.
+ */
+ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *iter,
+  struct netfs_group *netfs_group)
+{
+   struct netfs_io_request *wreq;
+   unsigned long long start = iocb->ki_pos;
+   unsigned long long end = start + iov_iter_count(iter);
+   ssize_t ret, n;
+   bool async = !is_sync_kiocb(iocb);
+
+   _enter("");
+
+   /* We're going to need a bounce buffer if what we transmit is going to
+* be different in some way to the source buffer, e.g. because it gets
+* encrypted/compressed or because it needs expanding to a block size.
+*/
+   // TODO
+
+   _debug("uw %llx-%llx", start, end);
+
+   wreq = netfs_alloc_request(iocb->ki_filp->f_mapping, iocb->ki_filp,
+  start, end - start,
+  iocb->ki_flags & IOCB_DIRECT ?
+  NETFS_DIO_WRITE : NETFS_UNBUFFERED_WRITE);
+   if (IS_ERR(wreq))
+   return PTR_ERR(wreq);
+
+   {
+   /* If this is an async op and we're not using a bounce buffer,
+* we have to save the source buffer as the iterator is only
+* good until we return.  In such a case, extract an iterator
+* to represent as much of the output buffer as we can
+* manage.  Note that the extraction might not be able to
+* allocate a sufficiently large bvec array and may shorten the
+* request.
+*/
+   if (async || user_backed_iter(iter)) {
+   n = netfs_extract_user_iter(iter, wreq->len, &wreq->iter, 0);
+   if (n < 0) {
+  

[Linux-cachefs] [RFC PATCH 27/53] netfs: Implement support for unbuffered/DIO read

2023-10-13 Thread David Howells
Implement support for unbuffered and DIO reads in the netfs library,
utilising the existing read helper code to do block splitting and
individual queuing.  The code also handles extraction of the destination
buffer from the supplied iterator, allowing async unbuffered reads to take
place.

The read will be split up according to the rsize setting and, if supplied,
the ->clamp_length() method.  Note that the next subrequest will be issued
as soon as issue_op returns, without waiting for previous ones to finish.
The network filesystem needs to pause or handle queuing them if it doesn't
want to fire them all at the server simultaneously.

Once all the subrequests have finished, the state will be assessed and the
amount of data to be indicated as having been obtained will be
determined.  As the subrequests may finish in any order, if an intermediate
subrequest is short, any further subrequests may be copied into the buffer
and then abandoned.

In the future, this will also take care of doing an unbuffered read from
encrypted content, with the decryption being done by the library.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/Makefile|   2 +-
 fs/netfs/direct_read.c   | 252 +++
 fs/netfs/internal.h  |   1 +
 fs/netfs/io.c|  78 +--
 fs/netfs/main.c  |   1 +
 fs/netfs/objects.c   |   3 +-
 fs/netfs/stats.c |   4 +-
 include/linux/netfs.h|   6 +
 include/trace/events/netfs.h |   7 +-
 9 files changed, 342 insertions(+), 12 deletions(-)
 create mode 100644 fs/netfs/direct_read.c
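
For reference, the sort of userspace I/O this path serves - an O_DIRECT read
into a suitably aligned buffer, which the library then splits per rsize and
->clamp_length() (illustrative only; error handling elided):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		void *buf;
		int fd = open("/mnt/netfs/file", O_RDONLY | O_DIRECT);

		posix_memalign(&buf, 4096, 1048576);
		read(fd, buf, 1048576);
		close(fd);
		return 0;
	}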

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 5c450db29932..27643557b443 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,7 +3,7 @@
 netfs-y := \
buffered_read.o \
buffered_write.o \
-   crypto.o \
+   direct_read.o \
io.o \
iterator.o \
locking.o \
diff --git a/fs/netfs/direct_read.c b/fs/netfs/direct_read.c
new file mode 100644
index ..1d26468aafd9
--- /dev/null
+++ b/fs/netfs/direct_read.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Direct I/O support.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+/*
+ * Copy all of the data from the folios in the source xarray into the
+ * destination iterator.  We cannot step through and kmap the dest iterator if
+ * it's an iovec, so we have to step through the xarray and drop the RCU lock
+ * each time.
+ */
+static int netfs_copy_xarray_to_iter(struct netfs_io_request *rreq,
+struct xarray *xa, struct iov_iter *dst,
+unsigned long long start, size_t avail)
+{
+   struct folio *folio;
+   void *base;
+   pgoff_t index = start / PAGE_SIZE;
+   size_t len, copied, count = min(avail, iov_iter_count(dst));
+
+   XA_STATE(xas, xa, index);
+
+   _enter("%zx", count);
+
+   if (!count) {
+   trace_netfs_failure(rreq, NULL, -EIO, netfs_fail_dio_read_zero);
+   return -EIO;
+   }
+
+   len = PAGE_SIZE - offset_in_page(start);
+   rcu_read_lock();
+   xas_for_each(&xas, folio, ULONG_MAX) {
+   size_t offset;
+
+   if (xas_retry(&xas, folio))
+   continue;
+
+   /* There shouldn't be a need to call xas_pause() as no one else
+* should be modifying the xarray we're iterating over.
+* Really, we only need the RCU readlock to keep lockdep happy
+* inside xas_for_each().
+*/
+   rcu_read_unlock();
+
+   offset = offset_in_folio(folio, start);
+   kdebug("folio %lx +%zx [%llx]", folio->index, offset, start);
+
+   while (offset < folio_size(folio)) {
+   len = min(count, len);
+
+   base = kmap_local_folio(folio, offset);
+   copied = copy_to_iter(base, len, dst);
+   kunmap_local(base);
+   if (copied != len)
+   goto out;
+   count -= len;
+   if (count == 0)
+   goto out;
+
+   start += len;
+   offset += len;
+   len = PAGE_SIZE;
+   }
+
+   rcu_read_lock();
+   }
+
+   rcu_read_unlock();
+out:
+   _leave(" = %zx", count);
+   return count ? -EFAULT : 0;
+}
+
+/*
+ * If we did a direct read to a bounce buffer (say we needed to decrypt it),
+ * copy the data o

[Linux-cachefs] [RFC PATCH 26/53] netfs: Allocate multipage folios in the writepath

2023-10-13 Thread David Howells
Allocate a multipage folio when copying data into the pagecache, if possible
and if there's sufficient data to warrant it.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_write.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)
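
To illustrate the order selection: with 4KiB pages, a page-aligned 64KiB copy
gives fgf_set_order(64 * 1024), which encodes folio order 4, so
__filemap_get_folio() will try one 64KiB folio rather than sixteen single
pages (falling back if allocation fails or the mapping caps the order):

	fgf_t fgp_flags = FGP_WRITEBEGIN | fgf_set_order(64 * 1024);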

diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index 406c3f3666fa..4de6a12149e4 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -84,14 +84,19 @@ static enum netfs_how_to_modify netfs_how_to_modify(struct 
netfs_inode *ctx,
 }
 
 /*
- * Grab a folio for writing and lock it.
+ * Grab a folio for writing and lock it.  Attempt to allocate as large a folio
+ * as possible to hold as much of the remaining length as possible in one go.
  */
 static struct folio *netfs_grab_folio_for_write(struct address_space *mapping,
loff_t pos, size_t part)
 {
pgoff_t index = pos / PAGE_SIZE;
+   fgf_t fgp_flags = FGP_WRITEBEGIN;
 
-   return __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+   if (mapping_large_folio_support(mapping))
+   fgp_flags |= fgf_set_order(pos % PAGE_SIZE + part);
+
+   return __filemap_get_folio(mapping, index, fgp_flags,
   mapping_gfp_mask(mapping));
 }
 
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 23/53] netfs: Dispatch write requests to process a writeback slice

2023-10-13 Thread David Howells
Dispatch one or more write requests to process a writeback slice, where a
slice is tailored more to logical block divisions within the file (such as
crypto blocks, an object layout or cache granules) than the protocol RPC
maximum capacity.

The dispatch doesn't happen until throttling allows, at which point the
entire writeback slice is processed and queued.  A slice may be written to
multiple destinations (one or more servers and the local cache) and the
writes to each destination might be split up along different lines.

The writeback slice holds the required folios pinned.  An iov_iter is
provided in netfs_write_request that describes the buffer to be used.  This
may be part of the pagecache, may have auxiliary padding pages attached or
may be a bounce buffer resulting from crypto or compression.  Consequently,
the filesystem must not twiddle the folio markings directly.

The following API is available to the filesystem:

 (1) The ->create_write_requests() method is called to ask the filesystem
 to create the requests it needs.  This is passed the writeback slice
 to be processed.

 (2) The filesystem should then call netfs_create_write_request() to create
 the requests it needs (see the sketch following the diffstat below).

 (3) Once a request is initialised, netfs_queue_write_request() can be
 called to dispatch it asynchronously, if not completed immediately.

 (4) netfs_write_request_completed() should be called to note the
 completion of a request.

 (5) netfs_get_write_request() and netfs_put_write_request() are provided
 to refcount a request.  These take constants from the netfs_wreq_trace
 enum for logging into ftrace.

 (6) The ->free_write_request method is called to ask the filesystem to
 clean up a request.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/Makefile|   3 +-
 fs/netfs/internal.h  |   6 +
 fs/netfs/output.c| 366 +++
 include/linux/netfs.h|  13 ++
 include/trace/events/netfs.h |  50 -
 5 files changed, 435 insertions(+), 3 deletions(-)
 create mode 100644 fs/netfs/output.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 647ce1935674..ce1197713276 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -7,7 +7,8 @@ netfs-y := \
locking.o \
main.o \
misc.o \
-   objects.o
+   objects.o \
+   output.o
 
 netfs-$(CONFIG_NETFS_STATS) += stats.o
 
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 83418a918ee1..30ec8949ebcd 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -87,6 +87,12 @@ static inline void netfs_see_request(struct netfs_io_request 
*rreq,
trace_netfs_rreq_ref(rreq->debug_id, refcount_read(>ref), what);
 }
 
+/*
+ * output.c
+ */
+int netfs_begin_write(struct netfs_io_request *wreq, bool may_wait,
+ enum netfs_write_trace what);
+
 /*
  * stats.c
  */
diff --git a/fs/netfs/output.c b/fs/netfs/output.c
new file mode 100644
index ..e93453f4372d
--- /dev/null
+++ b/fs/netfs/output.c
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem high-level write support.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+/**
+ * netfs_create_write_request - Create a write operation.
+ * @wreq: The write request this is storing from.
+ * @dest: The destination type
+ * @start: Start of the region this write will modify
+ * @len: Length of the modification
+ * @worker: The worker function to handle the write(s)
+ *
+ * Allocate a write operation, set it up and add it to the list on a write
+ * request.
+ */
+struct netfs_io_subrequest *netfs_create_write_request(struct netfs_io_request *wreq,
+						       enum netfs_io_source dest,
+						       loff_t start, size_t len,
+						       work_func_t worker)
+{
+   struct netfs_io_subrequest *subreq;
+
+   subreq = netfs_alloc_subrequest(wreq);
+   if (subreq) {
+		INIT_WORK(&subreq->work, worker);
+   subreq->source  = dest;
+   subreq->start   = start;
+   subreq->len = len;
+   subreq->debug_index = wreq->subreq_counter++;
+
+   switch (subreq->source) {
+   case NETFS_UPLOAD_TO_SERVER:
+			netfs_stat(&netfs_n_wh_upload);
+   break;
+   case NETFS_WRITE_TO_CACHE:
+			netfs_stat(&netfs_n_wh_write);
+   break;
+   default:
+   BUG();
+   }
+
+   subreq->io_iter = wreq->io_iter;

[Linux-cachefs] [RFC PATCH 25/53] netfs: Make netfs_read_folio() handle streaming-write pages

2023-10-13 Thread David Howells
netfs_read_folio() needs to handle partially-valid pages that are marked
dirty, but not uptodate, in the event that someone tries to read a page that
was used to cache data by a streaming write.

In such a case, make netfs_read_folio() set up a bvec iterator that points
to the parts of the folio that need filling and to a sink page for the data
that should be discarded and use that instead of i_pages as the iterator to
be written to.

This requires netfs_rreq_unlock_folios() to convert the page into a normal
dirty uptodate page, getting rid of the partial write record and bumping
the group pointer over to folio->private.
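
To make the layout concrete: for a 4KiB folio in which a streaming write
dirtied bytes 1024-3071 (finfo->dirty_offset = 1024, finfo->dirty_len = 2048),
the iterator below would be built from three bio_vecs (illustrative values
only):

	bvec[0]: folio, len 1024, offset 0	<- head gap, filled from the server
	bvec[1]: sink,  len 2048, offset 0	<- dirty span, downloaded data discarded
	bvec[2]: folio, len 1024, offset 3072	<- tail gap, filled from the server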

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_read.c | 61 ++--
 include/trace/events/netfs.h |  2 ++
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 2f06344bba21..374707df6575 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -16,6 +16,7 @@
 void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 {
struct netfs_io_subrequest *subreq;
+   struct netfs_folio *finfo;
struct folio *folio;
pgoff_t start_page = rreq->start / PAGE_SIZE;
pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
@@ -86,6 +87,15 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
 
if (!pg_failed) {
flush_dcache_folio(folio);
+   finfo = netfs_folio_info(folio);
+   if (finfo) {
+				trace_netfs_folio(folio, netfs_folio_trace_filled_gaps);
+				if (finfo->netfs_group)
+					folio_change_private(folio, finfo->netfs_group);
+   else
+   folio_detach_private(folio);
+   kfree(finfo);
+   }
folio_mark_uptodate(folio);
}
 
@@ -245,6 +255,7 @@ int netfs_read_folio(struct file *file, struct folio *folio)
struct address_space *mapping = folio_file_mapping(folio);
struct netfs_io_request *rreq;
struct netfs_inode *ctx = netfs_inode(mapping->host);
+   struct folio *sink = NULL;
int ret;
 
_enter("%lx", folio_index(folio));
@@ -265,12 +276,56 @@ int netfs_read_folio(struct file *file, struct folio *folio)
 	trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
 
/* Set up the output buffer */
-	iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
-			rreq->start, rreq->len);
+   if (folio_test_dirty(folio)) {
+   /* Handle someone trying to read from an unflushed streaming
+* write.  We fiddle the buffer so that a gap at the beginning
+* and/or a gap at the end get copied to, but the middle is
+* discarded.
+*/
+   struct netfs_folio *finfo = netfs_folio_info(folio);
+   struct bio_vec *bvec;
+   unsigned int from = finfo->dirty_offset;
+   unsigned int to = from + finfo->dirty_len;
+   unsigned int off = 0, i = 0;
+   size_t flen = folio_size(folio);
+   size_t nr_bvec = flen / PAGE_SIZE + 2;
+   size_t part;
+
+   ret = -ENOMEM;
+   bvec = kmalloc_array(nr_bvec, sizeof(*bvec), GFP_KERNEL);
+   if (!bvec)
+   goto discard;
+
+   sink = folio_alloc(GFP_KERNEL, 0);
+   if (!sink)
+   goto discard;
+
+   trace_netfs_folio(folio, netfs_folio_trace_read_gaps);
+
+   rreq->direct_bv = bvec;
+   rreq->direct_bv_count = nr_bvec;
+		if (from > 0) {
+			bvec_set_folio(&bvec[i++], folio, from, 0);
+			off = from;
+		}
+		while (off < to) {
+			part = min_t(size_t, to - off, PAGE_SIZE);
+			bvec_set_folio(&bvec[i++], sink, part, 0);
+			off += part;
+		}
+		if (to < flen)
+			bvec_set_folio(&bvec[i++], folio, flen - to, to);
+		iov_iter_bvec(&rreq->iter, ITER_DEST, bvec, i, rreq->len);
+	} else {
+		iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
+				rreq->start, rreq->len);
+	}
 
ret = netfs_begin_read(rreq, true);
+   if (sink)
+   folio_put(sink);
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
-   return ret;
+   return ret < 0 ? ret : 0;
 
 discard:
 

[Linux-cachefs] [RFC PATCH 24/53] netfs: Provide func to copy data to pagecache for buffered write

2023-10-13 Thread David Howells
Provide a netfs write helper, netfs_perform_write(), to buffer data to be
written in the pagecache and mark the modified folios dirty.

It will perform "streaming writes" for folios that aren't currently
resident, if possible, storing data in partially modified folios that are
marked dirty, but not uptodate.  It will also tag pages as belonging to
fs-specific write groups if so directed by the filesystem.

This is derived from generic_perform_write(), but doesn't use
->write_begin() and ->write_end(), having that logic rolled in instead.
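
As a sketch of the intended use (hedged: "myfs" is a stand-in, the helper is
assumed to take the iocb, the iterator and an optional netfs_group, and the
locking shown is the bare minimum), a filesystem's ->write_iter() might
become:

	static ssize_t myfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t ret;

		inode_lock(inode);
		ret = generic_write_checks(iocb, from);
		if (ret > 0)
			/* NULL: no fs-specific write grouping in use */
			ret = netfs_perform_write(iocb, from, NULL);
		inode_unlock(inode);
		return ret;
	}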

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/Makefile|   2 +
 fs/netfs/buffered_read.c |  48 +
 fs/netfs/buffered_write.c| 327 +++
 fs/netfs/internal.h  |   2 +
 include/linux/netfs.h|   5 +
 include/trace/events/netfs.h |  70 
 6 files changed, 454 insertions(+)
 create mode 100644 fs/netfs/buffered_write.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index ce1197713276..5c450db29932 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -2,6 +2,8 @@
 
 netfs-y := \
buffered_read.o \
+   buffered_write.o \
+   crypto.o \
io.o \
iterator.o \
locking.o \
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 05824f73cfc7..2f06344bba21 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -461,3 +461,51 @@ int netfs_write_begin(struct netfs_inode *ctx,
return ret;
 }
 EXPORT_SYMBOL(netfs_write_begin);
+
+/*
+ * Preload the data into a page we're proposing to write into.
+ */
+int netfs_prefetch_for_write(struct file *file, struct folio *folio,
+size_t offset, size_t len)
+{
+   struct netfs_io_request *rreq;
+   struct address_space *mapping = folio_file_mapping(folio);
+   struct netfs_inode *ctx = netfs_inode(mapping->host);
+   unsigned long long start = folio_pos(folio);
+   size_t flen = folio_size(folio);
+   int ret;
+
+   _enter("%zx @%llx", flen, start);
+
+   ret = -ENOMEM;
+
+   rreq = netfs_alloc_request(mapping, file, start, flen,
+  NETFS_READ_FOR_WRITE);
+   if (IS_ERR(rreq)) {
+   ret = PTR_ERR(rreq);
+   goto error;
+   }
+
+   rreq->no_unlock_folio = folio_index(folio);
+	__set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags);
+   ret = netfs_begin_cache_operation(rreq, ctx);
+   if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
+   goto error_put;
+
+	netfs_stat(&netfs_n_rh_write_begin);
+	trace_netfs_read(rreq, start, flen, netfs_read_trace_prefetch_for_write);
+
+   /* Set up the output buffer */
+	iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
+			rreq->start, rreq->len);
+
+   ret = netfs_begin_read(rreq, true);
+   netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
+   return ret;
+
+error_put:
+   netfs_put_request(rreq, false, netfs_rreq_trace_put_discard);
+error:
+   _leave(" = %d", ret);
+   return ret;
+}
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
new file mode 100644
index ..406c3f3666fa
--- /dev/null
+++ b/fs/netfs/buffered_write.c
@@ -0,0 +1,327 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Network filesystem high-level write support.
+ *
+ * Copyright (C) 2023 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+/*
+ * Determined write method.  Adjust netfs_folio_traces if this is changed.
+ */
+enum netfs_how_to_modify {
+	NETFS_FOLIO_IS_UPTODATE,	/* Folio is uptodate already */
+	NETFS_JUST_PREFETCH,		/* We have to read the folio anyway */
+	NETFS_WHOLE_FOLIO_MODIFY,	/* We're going to overwrite the whole folio */
+	NETFS_MODIFY_AND_CLEAR,		/* We can assume there is no data to be downloaded. */
+	NETFS_STREAMING_WRITE,		/* Store incomplete data in non-uptodate page. */
+	NETFS_STREAMING_WRITE_CONT,	/* Continue streaming write. */
+	NETFS_FLUSH_CONTENT,		/* Flush incompatible content. */
+};
+
+static void netfs_set_group(struct folio *folio, struct netfs_group *netfs_group)
+{
+   if (netfs_group && !folio_get_private(folio))
+   folio_attach_private(folio, netfs_get_group(netfs_group));
+}
+
+/*
+ * Decide how we should modify a folio.  We might be attempting to do
+ * write-streaming, in which case we don't want to do a local RMW cycle if we
+ * can avoid it.  If we're doing local caching or content crypto, we award
+ * that priority over avoiding RMW.  If the file is open readably, then w

[Linux-cachefs] [RFC PATCH 22/53] netfs: Prep to use folio->private for write grouping and streaming write

2023-10-13 Thread David Howells
Prepare to use folio->private to hold information about write grouping and
streaming write.  These are implemented in the same commit as they both
make use of folio->private and will be both checked at the same time in
several places.

"Write grouping" involves ordering the writeback of groups of writes, such
as is needed for ceph snaps.  A group is represented by a
filesystem-supplied object which must contain a netfs_group struct.  This
contains just a refcount and a pointer to a destructor.

"Streaming write" is the storage of data in folios that are marked dirty,
but not uptodate, to avoid unnecessary reads of data.  This is represented
by a netfs_folio struct.  This contains the offset and length of the
modified region plus the otherwise displaced write grouping pointer.

The way folio->private is multiplexed is:

 (1) If private is NULL then neither is in operation on a dirty folio.

 (2) If private is set, with bit 0 clear, then this points to a group.

 (3) If private is set, with bit 0 set, then this points to a netfs_folio
 struct (with bit 0 AND'ed out).
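
Roughly, the decoding then looks like this (a sketch only; the helpers this
patch actually adds for the job are netfs_folio_info() and
netfs_folio_group()):

	static struct netfs_folio *folio_info_sketch(struct folio *folio)
	{
		unsigned long priv = (unsigned long)folio_get_private(folio);

		if (priv & 1UL)		/* case (3): a netfs_folio struct */
			return (struct netfs_folio *)(priv & ~1UL);
		return NULL;		/* case (1) or (2) */
	}

	static struct netfs_group *folio_group_sketch(struct folio *folio)
	{
		struct netfs_folio *finfo = folio_info_sketch(folio);

		if (finfo)		/* case (3): group is held inside */
			return finfo->netfs_group;
		return folio_get_private(folio);	/* (2), or NULL for (1) */
	}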

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/internal.h   | 28 ++
 fs/netfs/misc.c   | 46 +++
 include/linux/netfs.h | 41 ++
 3 files changed, 115 insertions(+)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 46183dad4d50..83418a918ee1 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -147,6 +147,34 @@ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx)
 #endif
 }
 
+/*
+ * Get a ref on a netfs group attached to a dirty page (e.g. a ceph snap).
+ */
+static inline struct netfs_group *netfs_get_group(struct netfs_group *netfs_group)
+{
+	if (netfs_group)
+		refcount_inc(&netfs_group->ref);
+	return netfs_group;
+}
+
+/*
+ * Dispose of a netfs group attached to a dirty page (e.g. a ceph snap).
+ */
+static inline void netfs_put_group(struct netfs_group *netfs_group)
+{
+	if (netfs_group && refcount_dec_and_test(&netfs_group->ref))
+   netfs_group->free(netfs_group);
+}
+
+/*
+ * Dispose of a netfs group attached to a dirty page (e.g. a ceph snap).
+ */
+static inline void netfs_put_group_many(struct netfs_group *netfs_group, int nr)
+{
+	if (netfs_group && refcount_sub_and_test(nr, &netfs_group->ref))
+   netfs_group->free(netfs_group);
+}
+
 /*****************************************************************************/
 /*
  * debug tracing
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index c70f856f3129..8a2a56f1f623 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -159,9 +159,55 @@ void netfs_clear_buffer(struct xarray *buffer)
  */
 void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
 {
+   struct netfs_folio *finfo = NULL;
+   size_t flen = folio_size(folio);
+
_enter("{%lx},%zx,%zx", folio_index(folio), offset, length);
 
folio_wait_fscache(folio);
+
+   if (!folio_test_private(folio))
+   return;
+
+   finfo = netfs_folio_info(folio);
+
+   if (offset == 0 && length >= flen)
+   goto erase_completely;
+
+   if (finfo) {
+   /* We have a partially uptodate page from a streaming write. */
+   unsigned int fstart = finfo->dirty_offset;
+   unsigned int fend = fstart + finfo->dirty_len;
+   unsigned int end = offset + length;
+
+   if (offset >= fend)
+   return;
+   if (end <= fstart)
+   return;
+   if (offset <= fstart && end >= fend)
+   goto erase_completely;
+   if (offset <= fstart && end > fstart)
+   goto reduce_len;
+   if (offset > fstart && end >= fend)
+   goto move_start;
+   /* A partial write was split.  The caller has already zeroed
+* it, so just absorb the hole.
+*/
+   }
+   return;
+
+erase_completely:
+   netfs_put_group(netfs_folio_group(folio));
+   folio_detach_private(folio);
+   folio_clear_uptodate(folio);
+   kfree(finfo);
+   return;
+reduce_len:
+   finfo->dirty_len = offset + length - finfo->dirty_offset;
+   return;
+move_start:
+   finfo->dirty_len -= offset - finfo->dirty_offset;
+   finfo->dirty_offset = offset;
 }
 EXPORT_SYMBOL(netfs_invalidate_folio);
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 39b3eeefa03c..11a073506f98 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -142,6 +142,47 @@ struct netfs_inode {
 #define NETFS_ICTX_ODIRECT 0 

[Linux-cachefs] [RFC PATCH 21/53] netfs: Make the refcounting of netfs_begin_read() easier to use

2023-10-13 Thread David Howells
Make the refcounting of netfs_begin_read() easier to use by not eating the
caller's ref on the netfs_io_request it's given.  This makes it easier to
use when we need to look in the request struct after.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/buffered_read.c |  6 +-
 fs/netfs/io.c| 28 +---
 include/trace/events/netfs.h |  9 +
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 3b7eb706f2fe..05824f73cfc7 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -217,6 +217,7 @@ void netfs_readahead(struct readahead_control *ractl)
;
 
netfs_begin_read(rreq, false);
+   netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
return;
 
 cleanup_free:
@@ -267,7 +268,9 @@ int netfs_read_folio(struct file *file, struct folio *folio)
 	iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
 			rreq->start, rreq->len);
 
-   return netfs_begin_read(rreq, true);
+   ret = netfs_begin_read(rreq, true);
+   netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
+   return ret;
 
 discard:
netfs_put_request(rreq, false, netfs_rreq_trace_put_discard);
@@ -436,6 +439,7 @@ int netfs_write_begin(struct netfs_inode *ctx,
ret = netfs_begin_read(rreq, true);
if (ret < 0)
goto error;
+   netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
 
 have_folio:
ret = folio_wait_fscache_killable(folio);
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index c80b8eed1209..1795f8679be9 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -362,6 +362,7 @@ static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async)
 
netfs_rreq_unlock_folios(rreq);
 
+   trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip);
 	clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 	wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
 
@@ -657,7 +658,6 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 
if (rreq->len == 0) {
pr_err("Zero-sized read [R=%x]\n", rreq->debug_id);
-   netfs_put_request(rreq, false, netfs_rreq_trace_put_zero_len);
return -EIO;
}
 
@@ -669,12 +669,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 
 	INIT_WORK(&rreq->work, netfs_rreq_work);
 
-   if (sync)
-   netfs_get_request(rreq, netfs_rreq_trace_get_hold);
-
/* Chop the read into slices according to what the cache and the netfs
 * want and submit each one.
 */
+	netfs_get_request(rreq, netfs_rreq_trace_get_for_outstanding);
 	atomic_set(&rreq->nr_outstanding, 1);
io_iter = rreq->io_iter;
do {
@@ -684,25 +682,25 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
} while (rreq->submitted < rreq->len);
 
if (sync) {
-		/* Keep nr_outstanding incremented so that the ref always belongs to
-		 * us, and the service code isn't punted off to a random thread pool to
-		 * process.
+		/* Keep nr_outstanding incremented so that the ref always
+		 * belongs to us, and the service code isn't punted off to a
+		 * random thread pool to process.  Note that this might start
+		 * further work, such as writing to the cache.
 		 */
-		for (;;) {
-			wait_var_event(&rreq->nr_outstanding,
-				       atomic_read(&rreq->nr_outstanding) == 1);
+		wait_var_event(&rreq->nr_outstanding,
+			       atomic_read(&rreq->nr_outstanding) == 1);
+		if (atomic_dec_and_test(&rreq->nr_outstanding))
 			netfs_rreq_assess(rreq, false);
-			if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags))
-				break;
-			cond_resched();
-		}
+
+		trace_netfs_rreq(rreq, netfs_rreq_trace_wait_ip);
+		wait_on_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS,
+			    TASK_UNINTERRUPTIBLE);
 
ret = rreq->error;
if (ret == 0 && rreq->submitted < rreq->len) {
 			trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read);
ret = -EIO;
}
-   netfs_put_request(rreq, false, netfs_rreq_trace_put_hold);
} else {
 		/* If we decrement nr_outstanding to 0, the ref belongs to us. */
 		if (atomic_dec_and_test(&rreq->nr_outstanding))
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 4ea4

[Linux-cachefs] [RFC PATCH 20/53] fscache: Add a function to begin a cache op from a netfslib request

2023-10-13 Thread David Howells
Add a function to begin a cache read or write operation from a netfslib
I/O request.  This function can then be pointed to directly by the network
filesystem's netfs_request_ops::begin_cache_operation op pointer.

Ideally, netfslib would just call into fscache directly, but that would
cause dependency cycles as fscache calls into netfslib directly.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/9p/vfs_addr.c| 18 ++
 fs/afs/file.c   | 14 +-
 fs/ceph/addr.c  |  2 +-
 fs/ceph/cache.h | 12 
 fs/fscache/io.c | 42 +
 include/linux/fscache.h |  6 ++
 6 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 18a666c43e4a..516572bad412 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -82,25 +83,10 @@ static void v9fs_free_request(struct netfs_io_request *rreq)
p9_fid_put(fid);
 }
 
-/**
- * v9fs_begin_cache_operation - Begin a cache operation for a read
- * @rreq: The read request
- */
-static int v9fs_begin_cache_operation(struct netfs_io_request *rreq)
-{
-#ifdef CONFIG_9P_FSCACHE
-   struct fscache_cookie *cookie = v9fs_inode_cookie(V9FS_I(rreq->inode));
-
-	return fscache_begin_read_operation(&rreq->cache_resources, cookie);
-#else
-   return -ENOBUFS;
-#endif
-}
-
 const struct netfs_request_ops v9fs_req_ops = {
.init_request   = v9fs_init_request,
.free_request   = v9fs_free_request,
-   .begin_cache_operation  = v9fs_begin_cache_operation,
+   .begin_cache_operation  = fscache_begin_cache_operation,
.issue_read = v9fs_issue_read,
 };
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3e39a2ebcad6..5bb78d874292 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -360,18 +360,6 @@ static int afs_init_request(struct netfs_io_request *rreq, struct file *file)
return 0;
 }
 
-static int afs_begin_cache_operation(struct netfs_io_request *rreq)
-{
-#ifdef CONFIG_AFS_FSCACHE
-   struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
-
-	return fscache_begin_read_operation(&rreq->cache_resources,
-					    afs_vnode_cache(vnode));
-#else
-   return -ENOBUFS;
-#endif
-}
-
 static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
 struct folio **foliop, void **_fsdata)
 {
@@ -388,7 +376,7 @@ static void afs_free_request(struct netfs_io_request *rreq)
 const struct netfs_request_ops afs_req_ops = {
.init_request   = afs_init_request,
.free_request   = afs_free_request,
-   .begin_cache_operation  = afs_begin_cache_operation,
+   .begin_cache_operation  = fscache_begin_cache_operation,
.check_write_begin  = afs_check_write_begin,
.issue_read = afs_issue_read,
 };
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 92a5ddcd9a76..4841b06df78c 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -488,7 +488,7 @@ static void ceph_netfs_free_request(struct netfs_io_request *rreq)
 const struct netfs_request_ops ceph_netfs_ops = {
.init_request   = ceph_init_request,
.free_request   = ceph_netfs_free_request,
-   .begin_cache_operation  = ceph_begin_cache_operation,
+   .begin_cache_operation  = fscache_begin_cache_operation,
.issue_read = ceph_netfs_issue_read,
.expand_readahead   = ceph_netfs_expand_readahead,
.clamp_length   = ceph_netfs_clamp_length,
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index dc502daac49a..b804f1094764 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -57,13 +57,6 @@ static inline int ceph_fscache_dirty_folio(struct address_space *mapping,
return fscache_dirty_folio(mapping, folio, ceph_fscache_cookie(ci));
 }
 
-static inline int ceph_begin_cache_operation(struct netfs_io_request *rreq)
-{
-	struct fscache_cookie *cookie = ceph_fscache_cookie(ceph_inode(rreq->inode));
-
-	return fscache_begin_read_operation(&rreq->cache_resources, cookie);
-}
-
 static inline bool ceph_is_cache_enabled(struct inode *inode)
 {
return fscache_cookie_enabled(ceph_fscache_cookie(ceph_inode(inode)));
@@ -135,11 +128,6 @@ static inline bool ceph_is_cache_enabled(struct inode *inode)
return false;
 }
 
-static inline int ceph_begin_cache_operation(struct netfs_io_request *rreq)
-{
-   return -ENOBUFS;
-}
-
 static inline void ceph_fscache_note_page_release(struct inode *inode)
 {
 }
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 0d2b8dec8f82..cb602dd651e6 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -158,6 +158,48 @@ int __fscache_begin_write_operation(struct 
net

[Linux-cachefs] [RFC PATCH 18/53] netfs: Add a hook to tell the netfs to update its i_size

2023-10-13 Thread David Howells
Add a hook for netfslib's write helpers to call to tell the network
filesystem that it should update its i_size.
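
As an illustration (hedged; "myfs" is a stand-in), an implementation can be as
simple as:

	static void myfs_update_i_size(struct inode *inode, loff_t i_size)
	{
		/* A real filesystem might also update its record of the
		 * server-side file size here. */
		i_size_write(inode, i_size);
	}

which would then be hooked up through the new .update_i_size pointer in the
filesystem's netfs_request_ops.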

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 include/linux/netfs.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 4115274e3129..39b3eeefa03c 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -250,6 +250,7 @@ struct netfs_request_ops {
void (*free_subrequest)(struct netfs_io_subrequest *rreq);
int (*begin_cache_operation)(struct netfs_io_request *rreq);
 
+   /* Read request handling */
void (*expand_readahead)(struct netfs_io_request *rreq);
bool (*clamp_length)(struct netfs_io_subrequest *subreq);
void (*issue_read)(struct netfs_io_subrequest *subreq);
@@ -257,6 +258,9 @@ struct netfs_request_ops {
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
 struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
+
+   /* Modification handling */
+   void (*update_i_size)(struct inode *inode, loff_t i_size);
 };
 
 /*
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 19/53] netfs: Make netfs_put_request() handle a NULL pointer

2023-10-13 Thread David Howells
Make netfs_put_request() just return if given a NULL request pointer.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/objects.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 30ec42566966..7a78c1665bc9 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -111,19 +111,22 @@ static void netfs_free_request(struct work_struct *work)
 void netfs_put_request(struct netfs_io_request *rreq, bool was_async,
   enum netfs_rreq_ref_trace what)
 {
-   unsigned int debug_id = rreq->debug_id;
+   unsigned int debug_id;
bool dead;
int r;
 
-	dead = __refcount_dec_and_test(&rreq->ref, &r);
-	trace_netfs_rreq_ref(debug_id, r - 1, what);
-	if (dead) {
-		if (was_async) {
-			rreq->work.func = netfs_free_request;
-			if (!queue_work(system_unbound_wq, &rreq->work))
-				BUG();
-		} else {
-			netfs_free_request(&rreq->work);
+	if (rreq) {
+		debug_id = rreq->debug_id;
+		dead = __refcount_dec_and_test(&rreq->ref, &r);
+		trace_netfs_rreq_ref(debug_id, r - 1, what);
+		if (dead) {
+			if (was_async) {
+				rreq->work.func = netfs_free_request;
+				if (!queue_work(system_unbound_wq, &rreq->work))
+					BUG();
+			} else {
+				netfs_free_request(&rreq->work);
+			}
}
}
 }
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 17/53] netfs: Extend the netfs_io_*request structs to handle writes

2023-10-13 Thread David Howells
Modify the netfs_io_request struct to act as a point around which writes
can be coordinated.  It represents and pins a range of pages that need
writing and a list of regions of dirty data in that range of pages.

If RMW is required, the original data can be downloaded into the bounce
buffer, decrypted if necessary, the modifications made, then the modified
data can be reencrypted/recompressed and sent back to the server.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/internal.h  |  6 ++
 fs/netfs/main.c  |  3 ++-
 fs/netfs/objects.c   |  6 ++
 fs/netfs/stats.c | 18 ++
 include/linux/netfs.h| 15 ++-
 include/trace/events/netfs.h |  8 ++--
 6 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 00e01278316f..46183dad4d50 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -109,6 +109,12 @@ extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
 extern atomic_t netfs_n_rh_write_zskip;
+extern atomic_t netfs_n_wh_upload;
+extern atomic_t netfs_n_wh_upload_done;
+extern atomic_t netfs_n_wh_upload_failed;
+extern atomic_t netfs_n_wh_write;
+extern atomic_t netfs_n_wh_write_done;
+extern atomic_t netfs_n_wh_write_failed;
 
 
 static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 0f0c6e70aa44..e990738c2213 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -28,10 +28,11 @@ MODULE_PARM_DESC(netfs_debug, "Netfs support debugging 
mask");
 LIST_HEAD(netfs_io_requests);
 DEFINE_SPINLOCK(netfs_proc_lock);
 
-static const char *netfs_origins[] = {
+static const char *netfs_origins[nr__netfs_io_origin] = {
[NETFS_READAHEAD]   = "RA",
[NETFS_READPAGE]= "RP",
[NETFS_READ_FOR_WRITE]  = "RW",
+   [NETFS_WRITEBACK]   = "WB",
 };
 
 /*
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 9b965a509e5a..30ec42566966 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -20,6 +20,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
struct inode *inode = file ? file_inode(file) : mapping->host;
struct netfs_inode *ctx = netfs_inode(inode);
struct netfs_io_request *rreq;
+   bool cached = netfs_is_cache_enabled(ctx);
int ret;
 
 	rreq = kzalloc(ctx->ops->io_request_size ?: sizeof(struct netfs_io_request),
@@ -38,7 +39,10 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
 	xa_init(&rreq->bounce);
 	INIT_LIST_HEAD(&rreq->subrequests);
 	refcount_set(&rreq->ref, 1);
+
 	__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+	if (cached)
+		__set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
 	if (file && file->f_flags & O_NONBLOCK)
 		__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
if (rreq->netfs_ops->init_request) {
@@ -50,6 +54,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
}
}
 
+   trace_netfs_rreq_ref(rreq->debug_id, 1, netfs_rreq_trace_new);
netfs_proc_add_rreq(rreq);
 	netfs_stat(&netfs_n_rh_rreq);
return rreq;
@@ -134,6 +139,7 @@ struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq
 					     sizeof(struct netfs_io_subrequest),
 					     GFP_KERNEL);
 	if (subreq) {
+		INIT_WORK(&subreq->work, NULL);
 		INIT_LIST_HEAD(&subreq->rreq_link);
 		refcount_set(&subreq->ref, 2);
subreq->rreq = rreq;
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index 5510a7a14a40..ce2a1a983280 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -27,6 +27,12 @@ atomic_t netfs_n_rh_write_begin;
 atomic_t netfs_n_rh_write_done;
 atomic_t netfs_n_rh_write_failed;
 atomic_t netfs_n_rh_write_zskip;
+atomic_t netfs_n_wh_upload;
+atomic_t netfs_n_wh_upload_done;
+atomic_t netfs_n_wh_upload_failed;
+atomic_t netfs_n_wh_write;
+atomic_t netfs_n_wh_write_done;
+atomic_t netfs_n_wh_write_failed;
 
 void netfs_stats_show(struct seq_file *m)
 {
@@ -50,9 +56,13 @@ void netfs_stats_show(struct seq_file *m)
 		   atomic_read(&netfs_n_rh_read),
 		   atomic_read(&netfs_n_rh_read_done),
 		   atomic_read(&netfs_n_rh_read_failed));
-	seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n",
-		   atomic_read(&netfs_n_rh_write),
-		   atomic_read(&netfs_n_rh_write_done),
-		   atomic_read(&netfs_n_rh_write_failed));
+	seq_printf(m, "WrHelp : UL=%u us=%u uf=%u\n",
+		   atomic_read(&netfs_n_wh_upload),
+		   atomic_read(&netfs_n_wh_upload_done),
+

[Linux-cachefs] [RFC PATCH 16/53] netfs: Export netfs_put_subrequest() and some tracepoints

2023-10-13 Thread David Howells
Export netfs_put_subrequest() and the netfs_rreq and netfs_sreq
tracepoints.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/main.c| 3 +++
 fs/netfs/objects.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 21f814eee6af..0f0c6e70aa44 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -17,6 +17,9 @@ MODULE_DESCRIPTION("Network fs support");
 MODULE_AUTHOR("Red Hat, Inc.");
 MODULE_LICENSE("GPL");
 
+EXPORT_TRACEPOINT_SYMBOL(netfs_rreq);
+EXPORT_TRACEPOINT_SYMBOL(netfs_sreq);
+
 unsigned netfs_debug;
 module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask");
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 0782a284dda8..9b965a509e5a 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -180,3 +180,4 @@ void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async,
if (dead)
netfs_free_subrequest(subreq, was_async);
 }
+EXPORT_SYMBOL(netfs_put_subrequest);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 14/53] netfs: Add func to calculate pagecount/size-limited span of an iterator

2023-10-13 Thread David Howells
Add a function to work out how much of an ITER_BVEC or ITER_XARRAY iterator
we can use in a pagecount-limited and size-limited span.  This will be
used, for example, to limit the number of segments in a subrequest to the
maximum number of elements that an RDMA transfer can handle.
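
For example (a sketch with invented limits, not values from this patch), a
filesystem could ask how much of an iterator fits into one wire operation:

	static size_t myfs_one_rdma_span(const struct iov_iter *iter)
	{
		/* Suppose one RDMA op carries at most 1MiB spread over at
		 * most 16 DMA segments. */
		return netfs_limit_iter(iter, 0, 1024 * 1024, 16);
	}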

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/iterator.c   | 97 +++
 include/linux/netfs.h |  2 +
 2 files changed, 99 insertions(+)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 2ff07ba655a0..b781bbbf1d8d 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -101,3 +101,100 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
+
+/*
+ * Select the span of a bvec iterator we're going to use.  Limit it by both
+ * maximum size and maximum number of segments.  Returns the size of the span
+ * in bytes.
+ */
+static size_t netfs_limit_bvec(const struct iov_iter *iter, size_t start_offset,
+			       size_t max_size, size_t max_segs)
+{
+   const struct bio_vec *bvecs = iter->bvec;
+   unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0;
+   size_t len, span = 0, n = iter->count;
+   size_t skip = iter->iov_offset + start_offset;
+
+   if (WARN_ON(!iov_iter_is_bvec(iter)) ||
+   WARN_ON(start_offset > n) ||
+   n == 0)
+   return 0;
+
+   while (n && ix < nbv && skip) {
+   len = bvecs[ix].bv_len;
+   if (skip < len)
+   break;
+   skip -= len;
+   n -= len;
+   ix++;
+   }
+
+   while (n && ix < nbv) {
+   len = min3(n, bvecs[ix].bv_len - skip, max_size);
+   span += len;
+   nsegs++;
+   ix++;
+   if (span >= max_size || nsegs >= max_segs)
+   break;
+   skip = 0;
+   n -= len;
+   }
+
+   return min(span, max_size);
+}
+
+/*
+ * Select the span of an xarray iterator we're going to use.  Limit it by both
+ * maximum size and maximum number of segments.  It is assumed that segments
+ * can be larger than a page in size, provided they're physically contiguous.
+ * Returns the size of the span in bytes.
+ */
+static size_t netfs_limit_xarray(const struct iov_iter *iter, size_t start_offset,
+				 size_t max_size, size_t max_segs)
+{
+   struct folio *folio;
+   unsigned int nsegs = 0;
+   loff_t pos = iter->xarray_start + iter->iov_offset;
+   pgoff_t index = pos / PAGE_SIZE;
+   size_t span = 0, n = iter->count;
+
+   XA_STATE(xas, iter->xarray, index);
+
+   if (WARN_ON(!iov_iter_is_xarray(iter)) ||
+   WARN_ON(start_offset > n) ||
+   n == 0)
+   return 0;
+   max_size = min(max_size, n - start_offset);
+
+   rcu_read_lock();
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		size_t offset, flen, len;
+		if (xas_retry(&xas, folio))
+   continue;
+   if (WARN_ON(xa_is_value(folio)))
+   break;
+   if (WARN_ON(folio_test_hugetlb(folio)))
+   break;
+
+   flen = folio_size(folio);
+   offset = offset_in_folio(folio, pos);
+   len = min(max_size, flen - offset);
+   span += len;
+   nsegs++;
+   if (span >= max_size || nsegs >= max_segs)
+   break;
+   }
+
+   rcu_read_unlock();
+   return min(span, max_size);
+}
+
+size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
+   size_t max_size, size_t max_segs)
+{
+   if (iov_iter_is_bvec(iter))
+   return netfs_limit_bvec(iter, start_offset, max_size, max_segs);
+	if (iov_iter_is_xarray(iter))
+		return netfs_limit_xarray(iter, start_offset, max_size, max_segs);
+   BUG();
+}
+EXPORT_SYMBOL(netfs_limit_iter);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index a7220e906287..2b5e04ea4db2 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -328,6 +328,8 @@ void netfs_stats_show(struct seq_file *);
 ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
struct iov_iter *new,
iov_iter_extraction_t extraction_flags);
+size_t netfs_limit_iter(const struct iov_iter *iter, size_t start_offset,
+   size_t max_size, size_t max_segs);
 
 int netfs_start_io_read(struct inode *inode);
 void netfs_end_io_read(struct inode *inode);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 15/53] netfs: Limit subrequest by size or number of segments

2023-10-13 Thread David Howells
Limit a subrequest to a maximum size and/or a maximum number of contiguous
physical regions.  This permits, for instance, a subrequest's iterator to be
limited to the number of DMA'able segments that a large RDMA request can
handle.
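
A filesystem would opt in from its ->clamp_length() hook, something like this
sketch (the 16-segment figure is invented):

	static bool myfs_clamp_length(struct netfs_io_subrequest *subreq)
	{
		/* Tell the core to cap this subrequest's iterator at the
		 * adapter's SGE limit; netfs_rreq_prepare_read() then applies
		 * netfs_limit_iter() on our behalf. */
		subreq->max_nr_segs = 16;
		return true;
	}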

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/io.c| 18 ++
 include/linux/netfs.h|  1 +
 include/trace/events/netfs.h |  1 +
 3 files changed, 20 insertions(+)

diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index d8e9cd6ce338..c80b8eed1209 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -525,6 +525,7 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
struct iov_iter *io_iter)
 {
enum netfs_io_source source;
+   size_t lsize;
 
 	_enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size);
 
@@ -547,13 +548,30 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq,
source = NETFS_INVALID_READ;
goto out;
}
+
+   if (subreq->max_nr_segs) {
+   lsize = netfs_limit_iter(io_iter, 0, subreq->len,
+subreq->max_nr_segs);
+   if (subreq->len > lsize) {
+   subreq->len = lsize;
+				trace_netfs_sreq(subreq, netfs_sreq_trace_limited);
+   }
+   }
}
 
+   if (subreq->len > rreq->len)
+   pr_warn("R=%08x[%u] SREQ>RREQ %zx > %zx\n",
+   rreq->debug_id, subreq->debug_index,
+   subreq->len, rreq->len);
+
if (WARN_ON(subreq->len == 0)) {
source = NETFS_INVALID_READ;
goto out;
}
 
+   subreq->source = source;
+   trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
+
subreq->io_iter = *io_iter;
 	iov_iter_truncate(&subreq->io_iter, subreq->len);
iov_iter_advance(io_iter, subreq->len);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 2b5e04ea4db2..aaf1c1d4de51 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -163,6 +163,7 @@ struct netfs_io_subrequest {
refcount_t  ref;
short   error;  /* 0 or error that occurred */
 	unsigned short		debug_index;	/* Index in list (for debugging output) */
+	unsigned int		max_nr_segs;	/* 0 or max number of segments in an iterator */
 	enum netfs_io_source	source;		/* Where to read from/write to */
unsigned long   flags;
 #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index beec534cbaab..fce6d0bc78e5 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -44,6 +44,7 @@
 #define netfs_sreq_traces  \
EM(netfs_sreq_trace_download_instead,   "RDOWN")\
EM(netfs_sreq_trace_free,   "FREE ")\
+   EM(netfs_sreq_trace_limited,"LIMIT")\
EM(netfs_sreq_trace_prepare,"PREP ")\
EM(netfs_sreq_trace_resubmit_short, "SHORT")\
EM(netfs_sreq_trace_submit, "SUBMT")\
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 13/53] netfs: Add bounce buffering support

2023-10-13 Thread David Howells
Add a second xarray struct to netfs_io_request for the purposes of holding
a bounce buffer for when we have to deal with encrypted/compressed data or
if we have to up/download data in blocks larger than we were asked for.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/io.c | 6 +-
 fs/netfs/objects.c| 3 +++
 include/linux/netfs.h | 2 ++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index e9d408e211b8..d8e9cd6ce338 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -643,7 +643,11 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
return -EIO;
}
 
-   rreq->io_iter = rreq->iter;
+	if (test_bit(NETFS_RREQ_USE_BOUNCE_BUFFER, &rreq->flags))
+		iov_iter_xarray(&rreq->io_iter, ITER_DEST, &rreq->bounce,
+				rreq->start, rreq->len);
+   else
+   rreq->io_iter = rreq->iter;
 
 	INIT_WORK(&rreq->work, netfs_rreq_work);
 
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 4396318081bf..0782a284dda8 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -35,6 +35,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
rreq->inode = inode;
 	rreq->i_size	= i_size_read(inode);
 	rreq->debug_id	= atomic_inc_return(&debug_ids);
+	xa_init(&rreq->bounce);
 	INIT_LIST_HEAD(&rreq->subrequests);
 	refcount_set(&rreq->ref, 1);
 	__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
@@ -43,6 +44,7 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
if (rreq->netfs_ops->init_request) {
ret = rreq->netfs_ops->init_request(rreq, file);
if (ret < 0) {
+			xa_destroy(&rreq->bounce);
kfree(rreq);
return ERR_PTR(ret);
}
@@ -96,6 +98,7 @@ static void netfs_free_request(struct work_struct *work)
}
kvfree(rreq->direct_bv);
}
+	netfs_clear_buffer(&rreq->bounce);
kfree_rcu(rreq, rcu);
 	netfs_stat_d(&netfs_n_rh_rreq);
 }
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index e8d702ac6968..a7220e906287 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -196,6 +196,7 @@ struct netfs_io_request {
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
 	struct bio_vec		*direct_bv;	/* DIO buffer list (when handling iovec-iter) */
+	struct xarray		bounce;		/* Bounce buffer (eg. for crypto/compression) */
 	void			*netfs_priv;	/* Private data for the netfs */
 	unsigned int		direct_bv_count; /* Number of elements in bv[] */
unsigned intdebug_id;
@@ -220,6 +221,7 @@ struct netfs_io_request {
 #define NETFS_RREQ_IN_PROGRESS	5	/* Unlocked when the request completes */
 #define NETFS_RREQ_NONBLOCK	6	/* Don't block if possible (O_NONBLOCK) */
 #define NETFS_RREQ_BLOCKED 7   /* We blocked */
+#define NETFS_RREQ_USE_BOUNCE_BUFFER   8   /* Use bounce buffer */
const struct netfs_request_ops *netfs_ops;
 };
 
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 12/53] netfs: Provide tools to create a buffer in an xarray

2023-10-13 Thread David Howells
Provide tools to create a buffer in an xarray, with a function to add new
folios with a mark.  This will be used to create a bounce buffer and can be
used more easily to create a list of folios the span of which would require
more than a page's worth of bio_vec structs.
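
Sketch of the intended lifecycle ("myfs" and the page range are illustrative):

	static int myfs_demo_bounce(struct address_space *mapping)
	{
		struct xarray buffer;
		int ret;

		xa_init(&buffer);
		/* Fill pages 0-15 with fresh folios; they get
		 * NETFS_BUF_PUT_MARK so the teardown knows to put them. */
		ret = netfs_add_folios_to_buffer(&buffer, mapping, 0, 15,
						 GFP_KERNEL);
		if (ret == 0) {
			/* ... point an ITER_XARRAY iov_iter at "buffer" and
			 * do the I/O or crypto ... */
		}
		netfs_clear_buffer(&buffer);
		return ret;
	}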

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/internal.h   |  16 +
 fs/netfs/misc.c   | 140 ++
 include/linux/netfs.h |   4 ++
 3 files changed, 160 insertions(+)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 1f067aa96c50..00e01278316f 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -52,6 +52,22 @@ static inline void netfs_proc_add_rreq(struct netfs_io_request *rreq) {}
 static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {}
 #endif
 
+/*
+ * misc.c
+ */
+int netfs_xa_store_and_mark(struct xarray *xa, unsigned long index,
+   struct folio *folio, bool put_mark,
+   bool pagecache_mark, gfp_t gfp_mask);
+int netfs_add_folios_to_buffer(struct xarray *buffer,
+  struct address_space *mapping,
+  pgoff_t index, pgoff_t to, gfp_t gfp_mask);
+int netfs_set_up_buffer(struct xarray *buffer,
+   struct address_space *mapping,
+   struct readahead_control *ractl,
+   struct folio *keep,
+   pgoff_t have_index, unsigned int have_folios);
+void netfs_clear_buffer(struct xarray *buffer);
+
 /*
  * objects.c
  */
diff --git a/fs/netfs/misc.c b/fs/netfs/misc.c
index c3baf2b247d9..c70f856f3129 100644
--- a/fs/netfs/misc.c
+++ b/fs/netfs/misc.c
@@ -8,6 +8,146 @@
 #include 
 #include "internal.h"
 
+/*
+ * Attach a folio to the buffer and maybe set marks on it to say that we need
+ * to put the folio later and twiddle the pagecache flags.
+ */
+int netfs_xa_store_and_mark(struct xarray *xa, unsigned long index,
+   struct folio *folio, bool put_mark,
+   bool pagecache_mark, gfp_t gfp_mask)
+{
+   XA_STATE_ORDER(xas, xa, index, folio_order(folio));
+
+retry:
+	xas_lock(&xas);
+	for (;;) {
+		xas_store(&xas, folio);
+		if (!xas_error(&xas))
+			break;
+		xas_unlock(&xas);
+		if (!xas_nomem(&xas, gfp_mask))
+			return xas_error(&xas);
+		goto retry;
+	}
+
+	if (put_mark)
+		xas_set_mark(&xas, NETFS_BUF_PUT_MARK);
+	if (pagecache_mark)
+		xas_set_mark(&xas, NETFS_BUF_PAGECACHE_MARK);
+	xas_unlock(&xas);
+	return xas_error(&xas);
+}
+
+/*
+ * Create the specified range of folios in the buffer attached to the read
+ * request.  The folios are marked with NETFS_BUF_PUT_MARK so that we know that
+ * these need freeing later.
+ */
+int netfs_add_folios_to_buffer(struct xarray *buffer,
+  struct address_space *mapping,
+  pgoff_t index, pgoff_t to, gfp_t gfp_mask)
+{
+   struct folio *folio;
+   int ret;
+
+   if (to + 1 == index) /* Page range is inclusive */
+   return 0;
+
+   do {
+   /* TODO: Figure out what order folio can be allocated here */
+   folio = filemap_alloc_folio(readahead_gfp_mask(mapping), 0);
+   if (!folio)
+   return -ENOMEM;
+   folio->index = index;
+   ret = netfs_xa_store_and_mark(buffer, index, folio,
+ true, false, gfp_mask);
+   if (ret < 0) {
+   folio_put(folio);
+   return ret;
+   }
+
+   index += folio_nr_pages(folio);
+   } while (index <= to && index != 0);
+
+   return 0;
+}
+
+/*
+ * Set up a buffer into which data will be read or decrypted/decompressed.
+ * The folios to be read into are attached to this buffer and the gaps filled
+ * in to form a continuous region.
+ */
+int netfs_set_up_buffer(struct xarray *buffer,
+   struct address_space *mapping,
+   struct readahead_control *ractl,
+   struct folio *keep,
+   pgoff_t have_index, unsigned int have_folios)
+{
+   struct folio *folio;
+   gfp_t gfp_mask = readahead_gfp_mask(mapping);
+   unsigned int want_folios = have_folios;
+   pgoff_t want_index = have_index;
+   int ret;
+
+   ret = netfs_add_folios_to_buffer(buffer, mapping, want_index,
+have_index - 1, gfp_mask);
+   if (ret < 0)
+   return ret;
+   have_folios += have_index - want_index;
+
+   ret = netfs_add_folios_to_buffer(buffer, mapping,
+

[Linux-cachefs] [RFC PATCH 11/53] netfs: Add support for DIO buffering

2023-10-13 Thread David Howells
Add a bvec array pointer and an iterator to netfs_io_request for either
holding a copy of a DIO iterator or a list of all the bits of buffer
pointed to by a DIO iterator.

There are two problems:  Firstly, if an iovec-class iov_iter is passed to
->read_iter() or ->write_iter(), this cannot be passed directly to
kernel_sendmsg() or kernel_recvmsg() as that may cause locking recursion if
a fault is generated, so we need to keep track of the pages involved
separately.

Secondly, if the I/O is asynchronous, we must copy the iov_iter describing
the buffer before returning to the caller as it may be immediately
deallocated.
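
A sketch of how the extraction side fits together (this mirrors, but is not a
copy of, what later patches in the series do with these fields):

	static ssize_t myfs_grab_dio_buffer(struct netfs_io_request *rreq,
					    struct iov_iter *iter)
	{
		ssize_t n;

		/* Decant the user iterator into a bvec array so that faults
		 * can't recurse into the fs, and keep it pinned for async
		 * completion. */
		n = netfs_extract_user_iter(iter, iov_iter_count(iter),
					    &rreq->iter, 0);
		if (n < 0)
			return n;
		rreq->direct_bv = (struct bio_vec *)rreq->iter.bvec;
		rreq->direct_bv_count = n;
		rreq->direct_bv_unpin = iov_iter_extract_will_pin(iter);
		return n;
	}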

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/objects.c| 10 ++
 include/linux/netfs.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 8e92b8401aaa..4396318081bf 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -78,6 +78,7 @@ static void netfs_free_request(struct work_struct *work)
 {
struct netfs_io_request *rreq =
container_of(work, struct netfs_io_request, work);
+   unsigned int i;
 
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
netfs_proc_del_rreq(rreq);
@@ -86,6 +87,15 @@ static void netfs_free_request(struct work_struct *work)
rreq->netfs_ops->free_request(rreq);
 	if (rreq->cache_resources.ops)
 		rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
+	if (rreq->direct_bv) {
+		for (i = 0; i < rreq->direct_bv_count; i++) {
+			if (rreq->direct_bv[i].bv_page) {
+				if (rreq->direct_bv_unpin)
+					unpin_user_page(rreq->direct_bv[i].bv_page);
+			}
+		}
+		kvfree(rreq->direct_bv);
+	}
kfree_rcu(rreq, rcu);
netfs_stat_d(_n_rh_rreq);
 }
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index bd0437088f0e..66479a61ad00 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -191,7 +191,9 @@ struct netfs_io_request {
 	struct list_head	subrequests;	/* Contributory I/O operations */
 	struct iov_iter		iter;		/* Unencrypted-side iterator */
 	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
+	struct bio_vec		*direct_bv;	/* DIO buffer list (when handling iovec-iter) */
 	void			*netfs_priv;	/* Private data for the netfs */
+	unsigned int		direct_bv_count; /* Number of elements in bv[] */
 	unsigned int		debug_id;
 	unsigned int		rsize;		/* Maximum read size (0 for none) */
 	atomic_t		nr_outstanding;	/* Number of ops in progress */
@@ -200,6 +202,7 @@ struct netfs_io_request {
 	size_t			len;		/* Length of the request */
 	short			error;		/* 0 or error that occurred */
 	enum netfs_io_origin	origin;		/* Origin of the request */
+	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
 	loff_t			i_size;		/* Size of the file */
 	loff_t			start;		/* Start position */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH 10/53] netfs: Add iov_iters to (sub)requests to describe various buffers

2023-10-13 Thread David Howells
Add three iov_iter structs:

 (1) Add an iov_iter (->iter) to the I/O request to describe the
 unencrypted-side buffer.

 (2) Add an iov_iter (->io_iter) to the I/O request to describe the
 encrypted-side I/O buffer.  This may be a different size to the buffer
 in (1).

 (3) Add an iov_iter (->io_iter) to the I/O subrequest to describe the part
 of the I/O buffer for that subrequest.

This will allow future patches to point to a bounce buffer instead for
purposes of handling oversize writes, decryption (where we want to save the
encrypted data to the cache) and decompression.

These iov_iters persist for the lifetime of the (sub)request, and so can be
accessed multiple times without worrying about them being deallocated upon
return to the caller.

The network filesystem must appropriately advance the iterator before
terminating the request.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/file.c|  6 +---
 fs/netfs/buffered_read.c | 13 
 fs/netfs/io.c| 69 +---
 include/linux/netfs.h|  3 ++
 4 files changed, 67 insertions(+), 24 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3d2e1913ea27..3e39a2ebcad6 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -323,11 +323,7 @@ static void afs_issue_read(struct netfs_io_subrequest 
*subreq)
fsreq->len  = subreq->len   - subreq->transferred;
fsreq->key  = key_get(subreq->rreq->netfs_priv);
fsreq->vnode= vnode;
-	fsreq->iter	= &fsreq->def_iter;
-
-	iov_iter_xarray(&fsreq->def_iter, ITER_DEST,
-			&fsreq->vnode->netfs.inode.i_mapping->i_pages,
-			fsreq->pos, fsreq->len);
+	fsreq->iter	= &subreq->io_iter;
 
afs_fetch_data(fsreq->vnode, fsreq);
afs_put_read(fsreq);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index a2852fa64ad0..3b7eb706f2fe 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -206,6 +206,10 @@ void netfs_readahead(struct readahead_control *ractl)
 
netfs_rreq_expand(rreq, ractl);
 
+   /* Set up the output buffer */
+	iov_iter_xarray(&rreq->iter, ITER_DEST, &ractl->mapping->i_pages,
+			rreq->start, rreq->len);
+
/* Drop the refs on the folios here rather than in the cache or
 * filesystem.  The locks will be dropped in netfs_rreq_unlock().
 */
@@ -258,6 +262,11 @@ int netfs_read_folio(struct file *file, struct folio 
*folio)
 
 	netfs_stat(&netfs_n_rh_readpage);
 	trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
+
+   /* Set up the output buffer */
+	iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
+			rreq->start, rreq->len);
+
return netfs_begin_read(rreq, true);
 
 discard:
@@ -415,6 +424,10 @@ int netfs_write_begin(struct netfs_inode *ctx,
ractl._nr_pages = folio_nr_pages(folio);
 	netfs_rreq_expand(rreq, &ractl);
 
+   /* Set up the output buffer */
+	iov_iter_xarray(&rreq->iter, ITER_DEST, &mapping->i_pages,
+			rreq->start, rreq->len);
+
/* We hold the folio locks, so we can drop the references */
folio_get(folio);
while (readahead_folio())
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index 7f753380e047..e9d408e211b8 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -21,12 +21,7 @@
  */
 static void netfs_clear_unread(struct netfs_io_subrequest *subreq)
 {
-	struct iov_iter iter;
-
-	iov_iter_xarray(&iter, ITER_DEST, &subreq->rreq->mapping->i_pages,
-			subreq->start + subreq->transferred,
-			subreq->len   - subreq->transferred);
-	iov_iter_zero(iov_iter_count(&iter), &iter);
+	iov_iter_zero(iov_iter_count(&subreq->io_iter), &subreq->io_iter);
 }
 
 static void netfs_cache_read_terminated(void *priv, ssize_t 
transferred_or_error,
@@ -46,14 +41,9 @@ static void netfs_read_from_cache(struct netfs_io_request *rreq,
 				  enum netfs_read_from_hole read_hole)
 {
 	struct netfs_cache_resources *cres = &rreq->cache_resources;
-	struct iov_iter iter;
 
 	netfs_stat(&netfs_n_rh_read);
-	iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages,
-			subreq->start + subreq->transferred,
-			subreq->len   - subreq->transferred);
-
-	cres->ops->read(cres, subreq->start, &iter, read_hole,
+	cres->ops->read(cres, subreq->start, &subreq->io_iter, read_hole,
 			netfs_cache_read_terminated, subreq);
 }
 
@@ -88,6 +78,11 @@ static void netfs_read_from_server(struct netfs_io_request *rreq,
   struct netfs_io_subrequest *subreq)
 {
n

[Linux-cachefs] [RFC PATCH 07/53] netfs: Provide invalidate_folio and release_folio calls

2023-10-13 Thread David Howells
Provide default invalidate_folio and release_folio calls.  These will need
to interact with invalidation correctly at some point.  They will be needed
if netfslib is to make use of folio->private for its own purposes.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/9p/vfs_addr.c  | 33 ++-
 fs/afs/file.c | 53 ---
 fs/ceph/addr.c| 24 ++--
 fs/netfs/Makefile |  1 +
 fs/netfs/misc.c   | 51 +
 include/linux/netfs.h |  6 +++--
 6 files changed, 64 insertions(+), 104 deletions(-)
 create mode 100644 fs/netfs/misc.c

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 8a635999a7d6..18a666c43e4a 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -104,35 +104,6 @@ const struct netfs_request_ops v9fs_req_ops = {
.issue_read = v9fs_issue_read,
 };
 
-/**
- * v9fs_release_folio - release the private state associated with a folio
- * @folio: The folio to be released
- * @gfp: The caller's allocation restrictions
- *
- * Returns true if the page can be released, false otherwise.
- */
-
-static bool v9fs_release_folio(struct folio *folio, gfp_t gfp)
-{
-   if (folio_test_private(folio))
-   return false;
-#ifdef CONFIG_9P_FSCACHE
-   if (folio_test_fscache(folio)) {
-   if (current_is_kswapd() || !(gfp & __GFP_FS))
-   return false;
-   folio_wait_fscache(folio);
-   }
-	fscache_note_page_release(v9fs_inode_cookie(V9FS_I(folio_inode(folio))));
-#endif
-   return true;
-}
-
-static void v9fs_invalidate_folio(struct folio *folio, size_t offset,
-size_t length)
-{
-   folio_wait_fscache(folio);
-}
-
 #ifdef CONFIG_9P_FSCACHE
 static void v9fs_write_to_cache_done(void *priv, ssize_t transferred_or_error,
 bool was_async)
@@ -355,8 +326,8 @@ const struct address_space_operations v9fs_addr_operations 
= {
.writepage = v9fs_vfs_writepage,
.write_begin = v9fs_write_begin,
.write_end = v9fs_write_end,
-   .release_folio = v9fs_release_folio,
-   .invalidate_folio = v9fs_invalidate_folio,
+   .release_folio = netfs_release_folio,
+   .invalidate_folio = netfs_invalidate_folio,
.launder_folio = v9fs_launder_folio,
.direct_IO = v9fs_direct_IO,
 };
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 0c49b3b6f214..3fea5cd8ef13 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -20,9 +20,6 @@
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
 static int afs_symlink_read_folio(struct file *file, struct folio *folio);
-static void afs_invalidate_folio(struct folio *folio, size_t offset,
-  size_t length);
-static bool afs_release_folio(struct folio *folio, gfp_t gfp_flags);
 
 static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter);
 static ssize_t afs_file_splice_read(struct file *in, loff_t *ppos,
@@ -57,8 +54,8 @@ const struct address_space_operations afs_file_aops = {
.readahead  = netfs_readahead,
.dirty_folio= afs_dirty_folio,
.launder_folio  = afs_launder_folio,
-   .release_folio  = afs_release_folio,
-   .invalidate_folio = afs_invalidate_folio,
+   .release_folio  = netfs_release_folio,
+   .invalidate_folio = netfs_invalidate_folio,
.write_begin= afs_write_begin,
.write_end  = afs_write_end,
.writepages = afs_writepages,
@@ -67,8 +64,8 @@ const struct address_space_operations afs_file_aops = {
 
 const struct address_space_operations afs_symlink_aops = {
.read_folio = afs_symlink_read_folio,
-   .release_folio  = afs_release_folio,
-   .invalidate_folio = afs_invalidate_folio,
+   .release_folio  = netfs_release_folio,
+   .invalidate_folio = netfs_invalidate_folio,
.migrate_folio  = filemap_migrate_folio,
 };
 
@@ -405,48 +402,6 @@ int afs_write_inode(struct inode *inode, struct 
writeback_control *wbc)
return 0;
 }
 
-/*
- * invalidate part or all of a page
- * - release a page and clean up its private data if offset is 0 (indicating
- *   the entire page)
- */
-static void afs_invalidate_folio(struct folio *folio, size_t offset,
-  size_t length)
-{
-   _enter("{%lu},%zu,%zu", folio->index, offset, length);
-
-   folio_wait_fscache(folio);
-   _leave("");
-}
-
-/*
- * release a page and clean up its private state if it's not busy
- * - return true if the page can now be released, false if not
- */
-static bool afs_release_folio(struct folio *folio, gfp_t gfp)
-{
-   struct afs_vnode *vnode = AFS_FS_I(folio_inode(folio));
-
-   _enter("{{%llx:%llu}[%lu],%lx},%x",
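
The fs/netfs/misc.c hunk is cut off above.  Reconstructed from the v9fs and
afs code it replaces, the new generic helpers plausibly read as follows (a
sketch, not the exact patch text):

void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
{
	/* Wait for any in-flight write-to-cache on this folio to finish. */
	folio_wait_fscache(folio);
}
EXPORT_SYMBOL(netfs_invalidate_folio);

bool netfs_release_folio(struct folio *folio, gfp_t gfp)
{
	struct netfs_inode *ctx = netfs_inode(folio_inode(folio));

	/* Can't release whilst the netfs still has private data attached. */
	if (folio_test_private(folio))
		return false;
	if (folio_test_fscache(folio)) {
		if (current_is_kswapd() || !(gfp & __GFP_FS))
			return false;
		folio_wait_fscache(folio);
	}
	fscache_note_page_release(netfs_i_cookie(ctx));
	return true;
}
EXPORT_SYMBOL(netfs_release_folio);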
-  

[Linux-cachefs] [RFC PATCH 09/53] netfs: Implement unbuffered/DIO vs buffered I/O locking

2023-10-13 Thread David Howells
Borrow NFS's direct-vs-buffered I/O locking into netfslib.  Similar code is
also used in ceph.

Modify it to have the correct checker annotations for i_rwsem lock
acquisition/release and to return -ERESTARTSYS if waits are interrupted.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/Makefile |   1 +
 fs/netfs/locking.c| 209 ++
 include/linux/netfs.h |  10 ++
 3 files changed, 220 insertions(+)
 create mode 100644 fs/netfs/locking.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index cd22554d9048..647ce1935674 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -4,6 +4,7 @@ netfs-y := \
buffered_read.o \
io.o \
iterator.o \
+   locking.o \
main.o \
misc.o \
objects.o
diff --git a/fs/netfs/locking.c b/fs/netfs/locking.c
new file mode 100644
index ..fecca8ea6322
--- /dev/null
+++ b/fs/netfs/locking.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O and data path helper functionality.
+ *
+ * Borrowed from NFS Copyright (c) 2016 Trond Myklebust
+ */
+
+#include <linux/fs.h>
+#include <linux/netfs.h>
+
+/*
+ * inode_dio_wait_interruptible - wait for outstanding DIO requests to finish
+ * @inode: inode to wait for
+ *
+ * Waits for all pending direct I/O requests to finish so that we can
+ * proceed with a truncate or equivalent operation.
+ *
+ * Must be called under a lock that serializes taking new references
+ * to i_dio_count, usually by inode->i_mutex.
+ */
+static int inode_dio_wait_interruptible(struct inode *inode)
+{
+   if (!atomic_read(&inode->i_dio_count))
+   return 0;
+
+   wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
+   DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
+
+   for (;;) {
+   prepare_to_wait(wq, &q.wq_entry, TASK_INTERRUPTIBLE);
+   if (!atomic_read(&inode->i_dio_count))
+   break;
+   if (signal_pending(current))
+   break;
+   schedule();
+   }
+   finish_wait(wq, &q.wq_entry);
+
+   return atomic_read(&inode->i_dio_count) ? -ERESTARTSYS : 0;
+}
+
+/* Call with exclusively locked inode->i_rwsem */
+static int netfs_block_o_direct(struct netfs_inode *ictx)
+{
+   if (!test_bit(NETFS_ICTX_ODIRECT, &ictx->flags))
+   return 0;
+   clear_bit(NETFS_ICTX_ODIRECT, &ictx->flags);
+   return inode_dio_wait_interruptible(&ictx->inode);
+}
+
+/**
+ * netfs_start_io_read - declare the file is being used for buffered reads
+ * @inode: file inode
+ *
+ * Declare that a buffered read operation is about to start, and ensure
+ * that we block all direct I/O.
+ * On exit, the function ensures that the NETFS_ICTX_ODIRECT flag is unset,
+ * and holds a shared lock on inode->i_rwsem to ensure that the flag
+ * cannot be changed.
+ * In practice, this means that buffered read operations are allowed to
+ * execute in parallel, thanks to the shared lock, whereas direct I/O
+ * operations need to wait to grab an exclusive lock in order to set
+ * NETFS_ICTX_ODIRECT.
+ * Note that buffered writes and truncates both take a write lock on
+ * inode->i_rwsem, meaning that those are serialised w.r.t. the reads.
+ */
+int netfs_start_io_read(struct inode *inode)
+   __acquires(inode->i_rwsem)
+{
+   struct netfs_inode *ictx = netfs_inode(inode);
+
+   /* Be an optimist! */
+   if (down_read_interruptible(&inode->i_rwsem) < 0)
+   return -ERESTARTSYS;
+   if (test_bit(NETFS_ICTX_ODIRECT, &ictx->flags) == 0)
+   return 0;
+   up_read(&inode->i_rwsem);
+
+   /* Slow path */
+   if (down_write_killable(&inode->i_rwsem) < 0)
+   return -ERESTARTSYS;
+   if (netfs_block_o_direct(ictx) < 0) {
+   up_write(&inode->i_rwsem);
+   return -ERESTARTSYS;
+   }
+   downgrade_write(&inode->i_rwsem);
+   return 0;
+}
+
+/**
+ * netfs_end_io_read - declare that the buffered read operation is done
+ * @inode: file inode
+ *
+ * Declare that a buffered read operation is done, and release the shared
+ * lock on inode->i_rwsem.
+ */
+void netfs_end_io_read(struct inode *inode)
+   __releases(inode->i_rwsem)
+{
+   up_read(&inode->i_rwsem);
+}
+
+/**
+ * netfs_start_io_write - declare the file is being used for buffered writes
+ * @inode: file inode
+ *
+ * Declare that a buffered write operation is about to start, and ensure
+ * that we block all direct I/O.
+ */
+int netfs_start_io_write(struct inode *inode)
+   __acquires(inode->i_rwsem)
+{
+   struct netfs_inode *ictx = netfs_inode(inode);
+
+   if (down_write_killable(&inode->i_rwsem) < 0)
+   return -ERESTARTSYS;
+   if (netfs_block_o_direct(ictx) < 0) {
+   up_write(&inode->i_rwsem);
+   return -ERESTARTSYS;
+   }
+   return 0;
+}
+
+/**
+ * net
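
For illustration, a filesystem adopting these helpers might bracket its
buffered read path like this (hypothetical caller, not part of the patch):

static ssize_t example_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	struct inode *inode = file_inode(iocb->ki_filp);
	ssize_t ret;

	/* Take i_rwsem shared and drain any outstanding O_DIRECT I/O. */
	ret = netfs_start_io_read(inode);
	if (ret < 0)
		return ret;	/* -ERESTARTSYS if the wait was interrupted */

	ret = filemap_read(iocb, to, 0);

	netfs_end_io_read(inode);
	return ret;
}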

[Linux-cachefs] [RFC PATCH 08/53] netfs: Add rsize to netfs_io_request

2023-10-13 Thread David Howells
Add an rsize parameter to netfs_io_request to be filled in by the network
filesystem when the request is initialised.  This indicates the maximum
size of a read request that the netfs will honour in that region.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/file.c | 1 +
 fs/ceph/addr.c| 2 ++
 include/linux/netfs.h | 1 +
 3 files changed, 4 insertions(+)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3fea5cd8ef13..3d2e1913ea27 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -360,6 +360,7 @@ static int afs_symlink_read_folio(struct file *file, struct 
folio *folio)
 static int afs_init_request(struct netfs_io_request *rreq, struct file *file)
 {
rreq->netfs_priv = key_get(afs_file_key(file));
+   rreq->rsize = 4 * 1024 * 1024;
return 0;
 }
 
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index ced19ff08988..92a5ddcd9a76 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -419,6 +419,8 @@ static int ceph_init_request(struct netfs_io_request *rreq, 
struct file *file)
struct ceph_netfs_request_data *priv;
int ret = 0;
 
+   rreq->rsize = 1024 * 1024;
+
if (rreq->origin != NETFS_READAHEAD)
return 0;
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index daa431c4148d..02e888c170da 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -188,6 +188,7 @@ struct netfs_io_request {
   struct list_head    subrequests;    /* Contributory I/O operations */
   void                *netfs_priv;    /* Private data for the netfs */
   unsigned int        debug_id;
+  unsigned int        rsize;          /* Maximum read size (0 for none) */
   atomic_t            nr_outstanding; /* Number of ops in progress */
   atomic_t            nr_copy_ops;    /* Number of copy-to-cache ops in progress */
   size_t              submitted;      /* Amount submitted for I/O so far */
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs
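
The patch above only adds the field; nothing consumes it yet.  A later
consumer would presumably clamp each slice of a read against it, along these
lines (a sketch; taking rsize == 0 to mean "no limit" is an assumption):

static void example_clamp_subrequest(struct netfs_io_request *rreq,
				     struct netfs_io_subrequest *subreq)
{
	/* Honour the netfs's maximum read size, if one was set. */
	if (rreq->rsize)
		subreq->len = min_t(size_t, subreq->len, rreq->rsize);
}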



[Linux-cachefs] [RFC PATCH 05/53] netfs: Add a ->free_subrequest() op

2023-10-13 Thread David Howells
Add a ->free_subrequest() op so that the netfs can clean up data attached
to a subrequest.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/objects.c| 2 ++
 include/linux/netfs.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 2f1865ff7cce..8e92b8401aaa 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -147,6 +147,8 @@ static void netfs_free_subrequest(struct 
netfs_io_subrequest *subreq,
struct netfs_io_request *rreq = subreq->rreq;
 
trace_netfs_sreq(subreq, netfs_sreq_trace_free);
+   if (rreq->netfs_ops->free_subrequest)
+   rreq->netfs_ops->free_subrequest(subreq);
kfree(subreq);
netfs_stat_d(_n_rh_sreq);
netfs_put_request(rreq, was_async, netfs_rreq_trace_put_subreq);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 6942b8cf03dc..ed64d1034afa 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -218,6 +218,7 @@ struct netfs_request_ops {
   unsigned int        io_subrequest_size; /* Alloc size for netfs_io_subrequest struct */
int (*init_request)(struct netfs_io_request *rreq, struct file *file);
void (*free_request)(struct netfs_io_request *rreq);
+   void (*free_subrequest)(struct netfs_io_subrequest *rreq);
int (*begin_cache_operation)(struct netfs_io_request *rreq);
 
void (*expand_readahead)(struct netfs_io_request *rreq);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs
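
Combined with the io_subrequest_size field from patch 04, this lets a
filesystem hang per-subrequest state off the allocation and release it here.
A hypothetical user might look like this (all example_ names are invented
for illustration):

struct example_io_subrequest {
	struct netfs_io_subrequest subreq;	/* Must be the first member */
	struct example_server *server;		/* Ref taken at init time */
};

static void example_free_subrequest(struct netfs_io_subrequest *subreq)
{
	struct example_io_subrequest *esub =
		container_of(subreq, struct example_io_subrequest, subreq);

	example_server_put(esub->server);	/* Drop the per-subrequest ref */
}

static const struct netfs_request_ops example_req_ops = {
	.io_subrequest_size	= sizeof(struct example_io_subrequest),
	.free_subrequest	= example_free_subrequest,
	/* ... other ops ... */
};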



[Linux-cachefs] [RFC PATCH 06/53] afs: Don't use folio->private to record partial modification

2023-10-13 Thread David Howells
AFS currently uses folio->private to store the range of bytes within a
folio that have been modified - the idea being that if we have, say, a 2MiB
folio and someone writes a single byte, we only have to write back that
single page and not the whole 2MiB folio - thereby saving on network
bandwidth.

Remove this, at least for now, and accept the extra network load (which
doesn't matter in the common case of writing a whole file at a time from
beginning to end).

This makes folio->private available for netfslib to use.

Signed-off-by: David Howells 
cc: Marc Dionne 
cc: Jeff Layton 
cc: linux-...@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/file.c  |  67 -
 fs/afs/internal.h  |  56 ---
 fs/afs/write.c | 188 -
 include/trace/events/afs.h |  16 +---
 4 files changed, 42 insertions(+), 285 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index d37dd201752b..0c49b3b6f214 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -405,63 +405,6 @@ int afs_write_inode(struct inode *inode, struct 
writeback_control *wbc)
return 0;
 }
 
-/*
- * Adjust the dirty region of the page on truncation or full invalidation,
- * getting rid of the markers altogether if the region is entirely invalidated.
- */
-static void afs_invalidate_dirty(struct folio *folio, size_t offset,
-size_t length)
-{
-   struct afs_vnode *vnode = AFS_FS_I(folio_inode(folio));
-   unsigned long priv;
-   unsigned int f, t, end = offset + length;
-
-   priv = (unsigned long)folio_get_private(folio);
-
-   /* we clean up only if the entire page is being invalidated */
-   if (offset == 0 && length == folio_size(folio))
-   goto full_invalidate;
-
-/* If the page was dirtied by page_mkwrite(), the PTE stays writable
- * and we don't get another notification to tell us to expand it
- * again.
- */
-   if (afs_is_folio_dirty_mmapped(priv))
-   return;
-
-   /* We may need to shorten the dirty region */
-   f = afs_folio_dirty_from(folio, priv);
-   t = afs_folio_dirty_to(folio, priv);
-
-   if (t <= offset || f >= end)
-   return; /* Doesn't overlap */
-
-   if (f < offset && t > end)
-   return; /* Splits the dirty region - just absorb it */
-
-   if (f >= offset && t <= end)
-   goto undirty;
-
-   if (f < offset)
-   t = offset;
-   else
-   f = end;
-   if (f == t)
-   goto undirty;
-
-   priv = afs_folio_dirty(folio, f, t);
-   folio_change_private(folio, (void *)priv);
-   trace_afs_folio_dirty(vnode, tracepoint_string("trunc"), folio);
-   return;
-
-undirty:
-   trace_afs_folio_dirty(vnode, tracepoint_string("undirty"), folio);
-   folio_clear_dirty_for_io(folio);
-full_invalidate:
-   trace_afs_folio_dirty(vnode, tracepoint_string("inval"), folio);
-   folio_detach_private(folio);
-}
-
 /*
  * invalidate part or all of a page
  * - release a page and clean up its private data if offset is 0 (indicating
@@ -472,11 +415,6 @@ static void afs_invalidate_folio(struct folio *folio, 
size_t offset,
 {
_enter("{%lu},%zu,%zu", folio->index, offset, length);
 
-   BUG_ON(!folio_test_locked(folio));
-
-   if (folio_get_private(folio))
-   afs_invalidate_dirty(folio, offset, length);
-
folio_wait_fscache(folio);
_leave("");
 }
@@ -504,11 +442,6 @@ static bool afs_release_folio(struct folio *folio, gfp_t 
gfp)
fscache_note_page_release(afs_vnode_cache(vnode));
 #endif
 
-   if (folio_test_private(folio)) {
-   trace_afs_folio_dirty(vnode, tracepoint_string("rel"), folio);
-   folio_detach_private(folio);
-   }
-
/* Indicate that the folio can be released */
_leave(" = T");
return true;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 469a717467a4..03fed7ecfab9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -892,62 +892,6 @@ static inline void afs_invalidate_cache(struct afs_vnode 
*vnode, unsigned int fl
   i_size_read(&vnode->netfs.inode), flags);
 }
 
-/*
- * We use folio->private to hold the amount of the folio that we've written to,
- * splitting the field into two parts.  However, we need to represent a range
- * 0...FOLIO_SIZE, so we reduce the resolution if the size of the folio
- * exceeds what we can encode.
- */
-#ifdef CONFIG_64BIT
-#define __AFS_FOLIO_PRIV_MASK  0x7fffUL
-#define __AFS_FOLIO_PRIV_SHIFT 32
-#define __AFS_FOLIO_PRIV_MMAPPED   0x8000UL
-#else
-#define __AFS_FOLIO_PRIV_MASK  0x7fffUL
-#define __AFS_FOLIO_PRIV_

[Linux-cachefs] [RFC PATCH 03/53] netfs: Note nonblockingness in the netfs_io_request struct

2023-10-13 Thread David Howells
Allow O_NONBLOCK to be noted in the netfs_io_request struct.  Also add a
flag, NETFS_RREQ_BLOCKED to record if we did block.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/objects.c| 2 ++
 include/linux/netfs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index 85f428fc52e6..e41f9fc9bdd2 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -37,6 +37,8 @@ struct netfs_io_request *netfs_alloc_request(struct 
address_space *mapping,
INIT_LIST_HEAD(&rreq->subrequests);
refcount_set(&rreq->ref, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
+   if (file && file->f_flags & O_NONBLOCK)
+   __set_bit(NETFS_RREQ_NONBLOCK, >flags);
if (rreq->netfs_ops->init_request) {
ret = rreq->netfs_ops->init_request(rreq, file);
if (ret < 0) {
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 282511090ead..b92e982ac4a0 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -205,6 +205,8 @@ struct netfs_io_request {
 #define NETFS_RREQ_DONT_UNLOCK_FOLIOS  3   /* Don't unlock the folios on completion */
 #define NETFS_RREQ_FAILED              4   /* The request failed */
 #define NETFS_RREQ_IN_PROGRESS         5   /* Unlocked when the request completes */
+#define NETFS_RREQ_NONBLOCK            6   /* Don't block if possible (O_NONBLOCK) */
+#define NETFS_RREQ_BLOCKED             7   /* We blocked */
const struct netfs_request_ops *netfs_ops;
 };
 
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs
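
Nothing tests the new flags in this patch.  A consumer later in the series
would presumably use them in a submission path roughly like this (a sketch;
the example_ resource helpers are invented for illustration):

static int example_get_io_resources(struct netfs_io_request *rreq)
{
	if (example_resources_available(rreq))	/* Hypothetical check */
		return 0;
	if (test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags))
		return -EAGAIN;			/* Don't sleep for O_NONBLOCK */
	__set_bit(NETFS_RREQ_BLOCKED, &rreq->flags);	/* Record that we blocked */
	return example_wait_for_resources(rreq);	/* Hypothetical sleep */
}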



[Linux-cachefs] [RFC PATCH 04/53] netfs: Allow the netfs to make the io (sub)request alloc larger

2023-10-13 Thread David Howells
Allow the network filesystem to specify extra space to be allocated on the
end of the io (sub)request.  This allows cifs, for example, to use this
space rather than allocating its own cifs_readdata struct.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/objects.c| 7 +--
 include/linux/netfs.h | 2 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index e41f9fc9bdd2..2f1865ff7cce 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -22,7 +22,8 @@ struct netfs_io_request *netfs_alloc_request(struct 
address_space *mapping,
struct netfs_io_request *rreq;
int ret;
 
-   rreq = kzalloc(sizeof(struct netfs_io_request), GFP_KERNEL);
+   rreq = kzalloc(ctx->ops->io_request_size ?: sizeof(struct 
netfs_io_request),
+  GFP_KERNEL);
if (!rreq)
return ERR_PTR(-ENOMEM);
 
@@ -116,7 +117,9 @@ struct netfs_io_subrequest *netfs_alloc_subrequest(struct 
netfs_io_request *rreq
 {
struct netfs_io_subrequest *subreq;
 
-   subreq = kzalloc(sizeof(struct netfs_io_subrequest), GFP_KERNEL);
+   subreq = kzalloc(rreq->netfs_ops->io_subrequest_size ?:
+sizeof(struct netfs_io_subrequest),
+GFP_KERNEL);
if (subreq) {
+   INIT_LIST_HEAD(&subreq->rreq_link);
+   refcount_set(&subreq->ref, 2);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index b92e982ac4a0..6942b8cf03dc 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -214,6 +214,8 @@ struct netfs_io_request {
  * Operations the network filesystem can/must provide to the helpers.
  */
 struct netfs_request_ops {
+  unsigned int        io_request_size;    /* Alloc size for netfs_io_request struct */
+  unsigned int        io_subrequest_size; /* Alloc size for netfs_io_subrequest struct */
int (*init_request)(struct netfs_io_request *rreq, struct file *file);
void (*free_request)(struct netfs_io_request *rreq);
int (*begin_cache_operation)(struct netfs_io_request *rreq);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs
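
As a usage sketch, a filesystem would embed the netfs struct as the first
member of its own and declare the enlarged size (example_ names are
hypothetical):

struct example_io_request {
	struct netfs_io_request rreq;	/* Must be first, for container_of() */
	unsigned int credits;		/* Filesystem-private per-request state */
};

static const struct netfs_request_ops example_ops = {
	.io_request_size = sizeof(struct example_io_request),
	/* netfs_alloc_request()'s kzalloc() above then sizes the allocation */
};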



[Linux-cachefs] [RFC PATCH 02/53] netfs: Track the fpos above which the server has no data

2023-10-13 Thread David Howells
Track the file position above which the server is not expected to have any
data and preemptively assume that we can simply fill blocks with zeroes
locally rather than attempting to download them - even if we've written
data back to the server.  Assume that any data that was written back above
that position is held in the local cache.  Call this the "zero point".

Make use of this to optimise away some reads from the server.  We need to
set the zero point in the following circumstances:

 (1) When we see an extant remote inode and have no cache for it, we set
 the zero_point to i_size.

 (2) On local inode creation, we set zero_point to 0.

 (3) On local truncation down, we reduce zero_point to the new i_size if
 the new i_size is lower.

 (4) On local truncation up, we don't change zero_point.

 (5) On local modification, we don't change zero_point.

 (6) On remote invalidation, we set zero_point to the new i_size.

 (7) If stored data is culled from the local cache, we must set zero_point
 above that if the data also got written to the server.

 (8) If dirty data is written back to the server, but not the local cache,
 we must set zero_point above that.

Assuming the above, any read from the server at or above the zero_point
position will return all zeroes.

The zero_point value can be stored in the cache, provided the above rules
are applied to it by any code that culls part of the local cache.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/afs/inode.c   | 13 +++--
 fs/netfs/buffered_read.c | 40 +---
 include/linux/netfs.h|  5 +
 3 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 1c794a1896aa..46bc5574d6f5 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -252,6 +252,7 @@ static void afs_apply_status(struct afs_operation *op,
vnode->netfs.remote_i_size = status->size;
if (change_size) {
afs_set_i_size(vnode, status->size);
+   vnode->netfs.zero_point = status->size;
inode_set_ctime_to_ts(inode, t);
inode->i_atime = t;
}
@@ -865,17 +866,17 @@ static void afs_setattr_success(struct afs_operation *op)
 static void afs_setattr_edit_file(struct afs_operation *op)
 {
struct afs_vnode_param *vp = >file[0];
-   struct inode *inode = &vp->vnode->netfs.inode;
+   struct afs_vnode *vnode = vp->vnode;
 
if (op->setattr.attr->ia_valid & ATTR_SIZE) {
loff_t size = op->setattr.attr->ia_size;
loff_t i_size = op->setattr.old_i_size;
 
-   if (size < i_size)
-   truncate_pagecache(inode, size);
-   if (size != i_size)
-   fscache_resize_cookie(afs_vnode_cache(vp->vnode),
- vp->scb.status.size);
+   if (size != i_size) {
+   truncate_pagecache(&vnode->netfs.inode, size);
+   netfs_resize_file(&vnode->netfs, size);
+   fscache_resize_cookie(afs_vnode_cache(vnode), size);
+   }
}
 }
 
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 2cd3ccf4c439..a2852fa64ad0 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -147,6 +147,22 @@ static void netfs_rreq_expand(struct netfs_io_request 
*rreq,
}
 }
 
+/*
+ * Begin an operation, and fetch the stored zero point value from the cookie if
+ * available.
+ */
+static int netfs_begin_cache_operation(struct netfs_io_request *rreq,
+  struct netfs_inode *ctx)
+{
+   int ret = -ENOBUFS;
+
+   if (ctx->ops->begin_cache_operation) {
+   ret = ctx->ops->begin_cache_operation(rreq);
+   /* TODO: Get the zero point value from the cache */
+   }
+   return ret;
+}
+
 /**
  * netfs_readahead - Helper to manage a read request
  * @ractl: The description of the readahead request
@@ -180,11 +196,9 @@ void netfs_readahead(struct readahead_control *ractl)
if (IS_ERR(rreq))
return;
 
-   if (ctx->ops->begin_cache_operation) {
-   ret = ctx->ops->begin_cache_operation(rreq);
-   if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
-   goto cleanup_free;
-   }
+   ret = netfs_begin_cache_operation(rreq, ctx);
+   if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
+   goto cleanup_free;
 
netfs_stat(&netfs_n_rh_readahead);
trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
@@ -238,11 +252,9 @@ int netfs_read_folio(struct file *file, struct folio 

[Linux-cachefs] [RFC PATCH 01/53] netfs: Add a procfile to list in-progress requests

2023-10-13 Thread David Howells
Add a procfile, /proc/fs/netfs/requests, to list in-progress netfslib I/O
requests.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---
 fs/netfs/internal.h   | 22 +++
 fs/netfs/main.c   | 91 +++
 fs/netfs/objects.c|  4 +-
 include/linux/netfs.h |  6 ++-
 4 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 43fac1b14e40..1f067aa96c50 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -29,6 +29,28 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool 
sync);
  * main.c
  */
 extern unsigned int netfs_debug;
+extern struct list_head netfs_io_requests;
+extern spinlock_t netfs_proc_lock;
+
+#ifdef CONFIG_PROC_FS
+static inline void netfs_proc_add_rreq(struct netfs_io_request *rreq)
+{
+   spin_lock(&netfs_proc_lock);
+   list_add_tail_rcu(&rreq->proc_link, &netfs_io_requests);
+   spin_unlock(&netfs_proc_lock);
+}
+static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq)
+{
+   if (!list_empty(&rreq->proc_link)) {
+   spin_lock(&netfs_proc_lock);
+   list_del_rcu(&rreq->proc_link);
+   spin_unlock(&netfs_proc_lock);
+   }
+}
+#else
+static inline void netfs_proc_add_rreq(struct netfs_io_request *rreq) {}
+static inline void netfs_proc_del_rreq(struct netfs_io_request *rreq) {}
+#endif
 
 /*
  * objects.c
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 068568702957..21f814eee6af 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -7,6 +7,8 @@
 
 #include <linux/module.h>
 #include <linux/export.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 #include "internal.h"
 #define CREATE_TRACE_POINTS
 #include <trace/events/netfs.h>
@@ -18,3 +20,92 @@ MODULE_LICENSE("GPL");
 unsigned netfs_debug;
 module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask");
+
+#ifdef CONFIG_PROC_FS
+LIST_HEAD(netfs_io_requests);
+DEFINE_SPINLOCK(netfs_proc_lock);
+
+static const char *netfs_origins[] = {
+   [NETFS_READAHEAD]   = "RA",
+   [NETFS_READPAGE]= "RP",
+   [NETFS_READ_FOR_WRITE]  = "RW",
+};
+
+/*
+ * Generate a list of I/O requests in /proc/fs/netfs/requests
+ */
+static int netfs_requests_seq_show(struct seq_file *m, void *v)
+{
+   struct netfs_io_request *rreq;
+
+   if (v == &netfs_io_requests) {
+   seq_puts(m,
+"REQUEST  OR REF FL ERR  OPS COVERAGE\n"
+"======== == === == ==== === =========\n"
+);
+   return 0;
+   }
+
+   rreq = list_entry(v, struct netfs_io_request, proc_link);
+   seq_printf(m,
+  "%08x %s %3d %2lx %4d %3d @%04llx %zx/%zx",
+  rreq->debug_id,
+  netfs_origins[rreq->origin],
+  refcount_read(&rreq->ref),
+  rreq->flags,
+  rreq->error,
+  atomic_read(&rreq->nr_outstanding),
+  rreq->start, rreq->submitted, rreq->len);
+   seq_putc(m, '\n');
+   return 0;
+}
+
+static void *netfs_requests_seq_start(struct seq_file *m, loff_t *_pos)
+   __acquires(rcu)
+{
+   rcu_read_lock();
+   return seq_list_start_head(&netfs_io_requests, *_pos);
+}
+
+static void *netfs_requests_seq_next(struct seq_file *m, void *v, loff_t *_pos)
+{
+   return seq_list_next(v, &netfs_io_requests, _pos);
+}
+
+static void netfs_requests_seq_stop(struct seq_file *m, void *v)
+   __releases(rcu)
+{
+   rcu_read_unlock();
+}
+
+static const struct seq_operations netfs_requests_seq_ops = {
+   .start  = netfs_requests_seq_start,
+   .next   = netfs_requests_seq_next,
+   .stop   = netfs_requests_seq_stop,
+   .show   = netfs_requests_seq_show,
+};
+#endif /* CONFIG_PROC_FS */
+
+static int __init netfs_init(void)
+{
+   if (!proc_mkdir("fs/netfs", NULL))
+   goto error;
+
+   if (!proc_create_seq("fs/netfs/requests", S_IFREG | 0444, NULL,
+&netfs_requests_seq_ops))
+   goto error_proc;
+
+   return 0;
+
+error_proc:
+   remove_proc_entry("fs/netfs", NULL);
+error:
+   return -ENOMEM;
+}
+fs_initcall(netfs_init);
+
+static void __exit netfs_exit(void)
+{
+   remove_proc_entry("fs/netfs", NULL);
+}
+module_exit(netfs_exit);
diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c
index e17cdf53f6a7..85f428fc52e6 100644
--- a/fs/netfs/objects.c
+++ b/fs/netfs/objects.c
@@ -45,6 +45,7 @@ struct netfs_io_request *netfs_alloc_request(struct 
address_space *mapping,
}
}
 
+   netfs_proc_add_rreq(rreq);
netfs_stat(&netfs_n_rh_rreq);
return rreq;
 }
@@ -76,12 +77,13 @@ static void netfs_free_request(struct work_struct *work)
container_of(work, s
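
For reference, output from the new procfile would follow the seq_printf()
format above; an illustrative (made-up) listing:

REQUEST  OR REF FL ERR  OPS COVERAGE
======== == === == ==== === =========
00000042 RA   2 20    0   1 @0000 0/40000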


[Linux-cachefs] [PATCH v2] netfs: Only call folio_start_fscache() one time for each folio

2023-09-18 Thread David Howells
Hi Linus,

Could you apply this please?

Thanks,
David
---
From: Dave Wysochanski 

If a network filesystem using netfs implements a clamp_length()
function, it can set subrequest lengths smaller than a page size.
When we loop through the folios in netfs_rreq_unlock_folios() to
set any folios to be written back, we need to make sure we only
call folio_start_fscache() once for each folio.  Otherwise,
this simple testcase:

  mount -o fsc,rsize=1024,wsize=1024 127.0.0.1:/export /mnt/nfs
  dd if=/dev/zero of=/mnt/nfs/file.bin bs=4096 count=1
  1+0 records in
  1+0 records out
  4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0126359 s, 324 kB/s
  echo 3 > /proc/sys/vm/drop_caches
  cat /mnt/nfs/file.bin > /dev/null

will trigger an oops similar to the following:

...
 page dumped because: VM_BUG_ON_FOLIO(folio_test_private_2(folio))
 ------------[ cut here ]------------
 kernel BUG at include/linux/netfs.h:44!
...
 CPU: 5 PID: 134 Comm: kworker/u16:5 Kdump: loaded Not tainted 6.4.0-rc5
...
 RIP: 0010:netfs_rreq_unlock_folios+0x68e/0x730 [netfs]
...
 Call Trace:
  
  netfs_rreq_assess+0x497/0x660 [netfs]
  netfs_subreq_terminated+0x32b/0x610 [netfs]
  nfs_netfs_read_completion+0x14e/0x1a0 [nfs]
  nfs_read_completion+0x2f9/0x330 [nfs]
  rpc_free_task+0x72/0xa0 [sunrpc]
  rpc_async_release+0x46/0x70 [sunrpc]
  process_one_work+0x3bd/0x710
  worker_thread+0x89/0x610
  kthread+0x181/0x1c0
  ret_from_fork+0x29/0x50

Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2210612
Signed-off-by: Dave Wysochanski 
Reviewed-by: Jeff Layton 
Signed-off-by: David Howells 
Link: https://lore.kernel.org/r/20230608214137.856006-1-dwyso...@redhat.com/ # 
v1
Link: https://lore.kernel.org/r/20230915185704.1082982-1-dwyso...@redhat.com/ # 
v2
---
 fs/netfs/buffered_read.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 3404707ddbe7..2cd3ccf4c439 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -47,12 +47,14 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
xas_for_each(&xas, folio, last_page) {
loff_t pg_end;
bool pg_failed = false;
+   bool folio_started;

if (xas_retry(&xas, folio))
continue;
 
pg_end = folio_pos(folio) + folio_size(folio) - 1;
 
+   folio_started = false;
for (;;) {
loff_t sreq_end;
 
@@ -60,8 +62,10 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
pg_failed = true;
break;
}
-   if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
+   if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
folio_start_fscache(folio);
+   folio_started = true;
+   }
pg_failed |= subreq_failed;
sreq_end = subreq->start + subreq->len - 1;
if (pg_end < sreq_end)
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH] netfs: Only call folio_start_fscache() one time for each folio

2023-09-15 Thread David Howells
Okay, this looks reasonable.  Should I apply Jeff's suggestion before I send
it to Linus?

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH -next] fscache: Remove duplicated include

2023-08-14 Thread David Howells
GUO Zihua  wrote:

> Remove duplicated include for linux/uio.h. Resolves checkincludes
> message.
> 
> Signed-off-by: GUO Zihua 

Acked-by: David Howells 
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs





[Linux-cachefs] [PATCH v7 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

2023-06-28 Thread David Howells
Hi Andrew,

Should this go through the mm tree?

This fixes an optimisation in fscache whereby we don't read from the cache
for a particular file until we know that there's data there that we don't
have in the pagecache.  The problem is that I'm no longer using PG_fscache
(aka PG_private_2) to indicate that the page is cached and so I don't get a
notification when a cached page is dropped from the pagecache.

The first patch merges some folio_has_private() and filemap_release_folio()
pairs and introduces a helper, folio_needs_release(), to indicate if a
release is required.

The second patch is the actual fix.  Following Willy's suggestions[1], it
adds an AS_RELEASE_ALWAYS flag to an address_space that will make
filemap_release_folio() always call ->release_folio(), even if
PG_private/PG_private_2 aren't set.  folio_needs_release() is altered to
add a check for this.

David

Changes:

ver #7)
 - Make NFS set AS_RELEASE_ALWAYS.

ver #6)
 - Drop the third patch which removes a duplicate check in vmscan().

ver #5)
 - Rebased on linus/master.  try_to_release_page() has now been entirely
   replaced by filemap_release_folio(), barring one comment.
 - Cleaned up some pairs in ext4.

ver #4)
 - Split has_private/release call pairs into own patch.
 - Moved folio_needs_release() to mm/internal.h and removed open-coded
   version from filemap_release_folio().
 - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
 - Added experimental patch to reduce shrink_folio_list().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

Link: https://lore.kernel.org/r/Yk9V/03wgdyi6...@casper.infradead.org/ [1]
Link: 
https://lore.kernel.org/r/164928630577.457102.8519251179327601178.st...@warthog.procyon.org.uk/
 # v1
Link: 
https://lore.kernel.org/r/166844174069.1124521.10890506360974169994.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/166869495238.3720468.4878151409085146764.st...@warthog.procyon.org.uk/
 # v3
Link: https://lore.kernel.org/r/1459152.1669208...@warthog.procyon.org.uk/ # v3 
also
Link: 
https://lore.kernel.org/r/166924370539.1772793.13730698360771821317.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/167172131368.2334525.8569808925687731937.st...@warthog.procyon.org.uk/
 # v5
Link: https://lore.kernel.org/r/20230216150701.3654894-1-dhowe...@redhat.com/ # 
v6

David Howells (2):
  mm: Merge folio_has_private()/filemap_release_folio() call pairs
  mm, netfs, fscache: Stop read optimisation when folio removed from
pagecache

 fs/9p/cache.c   |  2 ++
 fs/afs/internal.h   |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c |  2 ++
 fs/ext4/move_extent.c   | 12 
 fs/nfs/fscache.c|  3 +++
 fs/smb/client/fscache.c |  2 ++
 fs/splice.c |  3 +--
 include/linux/pagemap.h | 16 
 mm/filemap.c|  2 ++
 mm/huge_memory.c|  3 +--
 mm/internal.h   | 11 +++
 mm/khugepaged.c |  3 +--
 mm/memory-failure.c |  8 +++-
 mm/migrate.c|  3 +--
 mm/truncate.c   |  6 ++
 mm/vmscan.c |  8 
 17 files changed, 59 insertions(+), 29 deletions(-)

--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH v7 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs

2023-06-28 Thread David Howells
Make filemap_release_folio() check folio_has_private().  Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.

There are a couple of sites in mm/vmscan.c where this can't so easily be
done.  In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.

In shrink_active_list(), we don't have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.

A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory.  This will acquire
additional parts to the condition in a future patch.

After this, the only remaining caller of folio_has_private() outside of mm/
is a check in fuse.

Reported-by: Rohith Surabattula 
Suggested-by: Matthew Wilcox 
Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: Linus Torvalds 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Dave Wysochanski 
cc: Dominique Martinet 
cc: Ilya Dryomov 
cc: "Theodore Ts'o" 
cc: Andreas Dilger 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-...@lists.infradead.org
cc: v9fs-develo...@lists.sourceforge.net
cc: ceph-de...@vger.kernel.org
cc: linux-...@vger.kernel.org
cc: linux-e...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---

Notes:
ver #5)
 - Rebased on linus/master.  try_to_release_page() has now been entirely
   replaced by filemap_release_folio(), barring one comment.
 - Cleaned up some pairs in ext4.

ver #4)
 - Split from fscache fix.
 - Moved folio_needs_release() to mm/internal.h and removed open-coded
   version from filemap_release_folio().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

 fs/ext4/move_extent.c | 12 
 fs/splice.c   |  3 +--
 mm/filemap.c  |  2 ++
 mm/huge_memory.c  |  3 +--
 mm/internal.h |  8 
 mm/khugepaged.c   |  3 +--
 mm/memory-failure.c   |  8 +++-
 mm/migrate.c  |  3 +--
 mm/truncate.c |  6 ++
 mm/vmscan.c   |  8 
 10 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index b5af2fc03b2f..251584a23d05 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -340,10 +340,8 @@ move_extent_per_page(struct file *o_filp, struct inode 
*donor_inode,
ext4_double_up_write_data_sem(orig_inode, donor_inode);
goto data_copy;
}
-   if ((folio_has_private(folio[0]) &&
-!filemap_release_folio(folio[0], 0)) ||
-   (folio_has_private(folio[1]) &&
-!filemap_release_folio(folio[1], 0))) {
+   if (!filemap_release_folio(folio[0], 0) ||
+   !filemap_release_folio(folio[1], 0)) {
*err = -EBUSY;
goto drop_data_sem;
}
@@ -362,10 +360,8 @@ move_extent_per_page(struct file *o_filp, struct inode 
*donor_inode,
 
/* At this point all buffers in range are uptodate, old mapping layout
 * is no longer required, try to drop it now. */
-   if ((folio_has_private(folio[0]) &&
-   !filemap_release_folio(folio[0], 0)) ||
-   (folio_has_private(folio[1]) &&
-   !filemap_release_folio(folio[1], 0))) {
+   if (!filemap_release_folio(folio[0], 0) ||
+   !filemap_release_folio(folio[1], 0)) {
*err = -EBUSY;
goto unlock_folios;
}
diff --git a/fs/splice.c b/fs/splice.c
index 7a9565d8ec4f..6412848891df 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -82,8 +82,7 @@ static bool page_cache_pipe_buf_try_steal(struct 
pipe_inode_info *pipe,
 */
folio_wait_writeback(folio);
 
-   if (folio_has_private(folio) &&
-   !filemap_release_folio(folio, GFP_KERNEL))
+   if (!filemap_release_folio(folio, GFP_KERNEL))
goto out_unlock;
 
/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 00f01d8ead47..31d07c2f8d32 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4134,6 +4134,8 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
struct address_space * const mapping = folio->mapping;
 
BUG_ON(!folio_test_locked(folio));
+   if (!folio_needs_release(folio))
+   return true;
if (folio_test_writeback(folio))
return false;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 624671aaa60d..a14b
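
The mm/internal.h hunk is truncated above; per the description, the wrapper
added at this stage is presumably just the following (patch 2/2 then extends
the condition):

/* Sketch of the helper this patch adds to mm/internal.h. */
static inline bool folio_needs_release(struct folio *folio)
{
	return folio_has_private(folio);
}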

[Linux-cachefs] [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

2023-06-28 Thread David Howells
Fscache has an optimisation by which reads from the cache are skipped until
we know that (a) there's data there to be read and (b) that data isn't
entirely covered by pages resident in the netfs pagecache.  This is done
with two flags manipulated by fscache_note_page_release():

if (...
test_bit(FSCACHE_COOKIE_HAVE_DATA, >flags) &&
test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, >flags))
clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, >flags);

where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
that netfslib should download from the server or clear the page instead.

The fscache_note_page_release() function is intended to be called from
->releasepage() - but that only gets called if PG_private or PG_private_2
is set - and currently the former is at the discretion of the network
filesystem and the latter is only set whilst a page is being written to the
cache, so sometimes we miss clearing the optimisation.

Fix this by following Willy's suggestion[1] and adding an address_space
flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
->release_folio() if it's set, even if PG_private or PG_private_2 aren't
set.

Note that this would require folio_test_private() and page_has_private() to
become more complicated.  To avoid that, in the places[*] where these are
used to conditionalise calls to filemap_release_folio() and
try_to_release_page(), the tests are removed and those functions are just
jumped to unconditionally, with the test performed there instead.

[*] There are some exceptions in vmscan.c where the check guards more than
just a call to the releaser.  I've added a function, folio_needs_release()
to wrap all the checks for that.

AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
is called.

Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
and the optimisation cancelled if a cachefiles object already contains data
when we open it.

Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a 
page")
Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
Reported-by: Rohith Surabattula 
Suggested-by: Matthew Wilcox 
Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: Linus Torvalds 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Dave Wysochanski 
cc: Dominique Martinet 
cc: Ilya Dryomov 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-...@lists.infradead.org
cc: v9fs-develo...@lists.sourceforge.net
cc: ceph-de...@vger.kernel.org
cc: linux-...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---

Notes:
ver #7)
 - Make NFS set AS_RELEASE_ALWAYS.

ver #4)
 - Split out merging of folio_has_private()/filemap_release_folio() call
   pairs into a preceding patch.
 - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

 fs/9p/cache.c   |  2 ++
 fs/afs/internal.h   |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c |  2 ++
 fs/nfs/fscache.c|  3 +++
 fs/smb/client/fscache.c |  2 ++
 include/linux/pagemap.h | 16 
 mm/internal.h   |  5 -
 8 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index cebba4eaa0b5..12c0ae29f185 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -68,6 +68,8 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
  &path, sizeof(path),
  &version, sizeof(version),
  i_size_read(&v9inode->netfs.inode));
+   if (v9inode->netfs.cache)
+   mapping_set_release_always(inode->i_mapping);
 
p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
 inode, v9fs_inode_cookie(v9inode));
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9d3d64921106..da73b97e19a9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -681,6 +681,8 @@ static inline void afs_vnode_set_cache(struct afs_vnode 
*vnode,
 {
 #ifdef CONFIG_AFS_FSCACHE
vnode->netfs.cache = cookie;
+   if (cookie)
+   mapping_set_release_always(vnode->netfs.inode.i_mapping);
 #endif
 }
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d9d22d0ec38a..7bf7a5fcc045 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -585,6 +585,8 @@ static bool cachefiles_open_file(struct cachefiles_object 
*object,
if (ret < 0)
goto check_failed;
 
+	clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &object->cookie->flags);
+
object->file = file;
 

Re: [Linux-cachefs] [PATCH] cachefiles: allocate static minor for /dev/cachefiles

2023-06-22 Thread David Howells
Marcel Holtmann  wrote:

> The cachefiles misc character device uses MISC_DYNAMIC_MINOR and thus
> doesn't support module auto-loading. Assign a static minor number for it
> and provide appropriate module aliases for it. This is enough for kmod to
> create the /dev/cachefiles device node on startup and facility module
> auto-loading.

Why?  The systemd unit file for it just modprobe's the module first.  It's a
specialist device file only for use by the appropriate daemon.

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next v3 04/10] Move netfs_extract_iter_to_sg() to lib/scatterlist.c

2023-06-06 Thread David Howells
Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).
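
For illustration, a kernel-side caller might fill a scatterlist along these
lines (a hypothetical sketch modelled on the smbdirect usage further down;
sgl and max_sg are assumed to be supplied by the caller):

	struct sg_table sgt = { .sgl = sgl };
	ssize_t ret;

	memset(sgl, 0, max_sg * sizeof(struct scatterlist));
	ret = extract_iter_to_sg(iter, iov_iter_count(iter), &sgt, max_sg, 0);
	if (ret < 0)
		return ret;
	if (sgt.nents > 0)
		sg_mark_end(&sgt.sgl[sgt.nents - 1]);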

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c   | 267 -
 include/linux/netfs.h |   4 -
 include/linux/uio.h   |   5 +
 lib/scatterlist.c | 269 ++
 4 files changed, 274 insertions(+), 271 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 9f09dc30ceb6..2ff07ba655a0 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -101,270 +101,3 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, 
size_t orig_len,
return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
-
-/*
- * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
- * iterators, and add them to the scatterlist.
- */
-static ssize_t extract_user_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   struct page **pages;
-   unsigned int npages;
-   ssize_t ret = 0, res;
-   size_t len, off;
-
-   /* We decant the page list into the tail of the scatterlist */
-   pages = (void *)sgtable->sgl +
-   array_size(sg_max, sizeof(struct scatterlist));
-   pages -= sg_max;
-
-   do {
-		res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max,
-					     extraction_flags, &off);
-   if (res < 0)
-   goto failed;
-
-   len = res;
-   maxsize -= len;
-   ret += len;
-   npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
-   sg_max -= npages;
-
-   for (; npages > 0; npages--) {
-   struct page *page = *pages;
-   size_t seg = min_t(size_t, PAGE_SIZE - off, len);
-
-   *pages++ = NULL;
-   sg_set_page(sg, page, seg, off);
-   sgtable->nents++;
-   sg++;
-   len -= seg;
-   off = 0;
-   }
-   } while (maxsize > 0 && sg_max > 0);
-
-   return ret;
-
-failed:
-   while (sgtable->nents > sgtable->orig_nents)
-		put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
-   return res;
-}
-
-/*
- * Extract up to sg_max pages from a BVEC-type iterator and add them to the
- * scatterlist.  The pages are not pinned.
- */
-static ssize_t extract_bvec_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   const struct bio_vec *bv = iter->bvec;
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   unsigned long start = iter->iov_offset;
-   unsigned int i;
-   ssize_t ret = 0;
-
-   for (i = 0; i < iter->nr_segs; i++) {
-   size_t off, len;
-
-   len = bv[i].bv_len;
-   if (start >= len) {
-   start -= len;
-   continue;
-   }
-
-   len = min_t(size_t, maxsize, len - start);
-   off = bv[i].bv_offset + start;
-
-   sg_set_page(sg, bv[i].bv_page, len, off);
-   sgtable->nents++;
-   sg++;
-   sg_max--;
-
-   ret += len;
-   maxsize -= len;
-   if (maxsize <= 0 || sg_max == 0)
-   break;
-   start = 0;
-   }
-
-   if (ret > 0)
-   iov_iter_advance(iter, ret);
-   return ret;
-}
-
-/*
- * Extract up to sg_max pages from a KVEC-type iterator and add them to the
- * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
- * static buffers.  The pages are not pinned.
- */
-static ssize_t extract_kvec_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   const struc

[Linux-cachefs] [PATCH net-next v3 02/10] Fix a couple of spelling mistakes

2023-06-06 Thread David Howells
Fix a couple of spelling mistakes in a comment.

Suggested-by: Simon Horman 
Link: https://lore.kernel.org/r/zhh2msrqel4gs...@corigine.com/
Link: https://lore.kernel.org/r/zhh1nqzwogzxl...@corigine.com/
Signed-off-by: David Howells 
Reviewed-by: Simon Horman 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f8eba3de1a97..f41a37bca1e8 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -312,7 +312,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
 }
 
 /**
- * extract_iter_to_sg - Extract pages from an iterator and add ot an sglist
+ * extract_iter_to_sg - Extract pages from an iterator and add to an sglist
  * @iter: The iterator to extract from
  * @maxsize: The amount of iterator to copy
  * @sgtable: The scatterlist table to fill in
@@ -332,7 +332,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
  * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
  * be allowed on the pages extracted.
  *
- * If successul, @sgtable->nents is updated to include the number of elements
+ * If successful, @sgtable->nents is updated to include the number of elements
  * added and the number of bytes added is returned.  @sgtable->orig_nents is
  * left unaltered.
  *
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next v3 03/10] Wrap lines at 80

2023-06-06 Thread David Howells
Wrap a line at 80 to stop checkpatch complaining.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: Simon Horman 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f41a37bca1e8..9f09dc30ceb6 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -119,7 +119,8 @@ static ssize_t extract_user_to_sg(struct iov_iter *iter,
size_t len, off;
 
/* We decant the page list into the tail of the scatterlist */
-	pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist));
+   pages = (void *)sgtable->sgl +
+   array_size(sg_max, sizeof(struct scatterlist));
pages -= sg_max;
 
do {
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next v3 01/10] Drop the netfs_ prefix from netfs_extract_iter_to_sg()

2023-06-06 Thread David Howells
Rename netfs_extract_iter_to_sg() and its auxiliary functions to drop the
netfs_ prefix.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "Matthew Wilcox (Oracle)" 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---

Notes:
ver #3)
 - Deal with fs/cifs/ moving.
 - Reimpose the ALG_MAX_PAGES limit in hash_sendmsg() for kernel iters.

ver #2:
 - Put the "netfs_" prefix removal first to shorten lines and avoid
   checkpatch 80-char warnings.

 fs/netfs/iterator.c   | 66 +++
 fs/smb/client/smb2ops.c   |  4 +--
 fs/smb/client/smbdirect.c |  2 +-
 include/linux/netfs.h |  6 ++--
 4 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 8a4c86687429..f8eba3de1a97 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -106,11 +106,11 @@ EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
  * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
  * iterators, and add them to the scatterlist.
  */
-static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_user_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
struct page **pages;
@@ -159,11 +159,11 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
  * Extract up to sg_max pages from a BVEC-type iterator and add them to the
  * scatterlist.  The pages are not pinned.
  */
-static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_bvec_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
const struct bio_vec *bv = iter->bvec;
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
@@ -205,11 +205,11 @@ static ssize_t netfs_extract_bvec_to_sg(struct iov_iter 
*iter,
  * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
  * static buffers.  The pages are not pinned.
  */
-static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_kvec_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
const struct kvec *kv = iter->kvec;
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
@@ -266,11 +266,11 @@ static ssize_t netfs_extract_kvec_to_sg(struct iov_iter 
*iter,
  * Extract up to sg_max folios from an XARRAY-type iterator and add them to
  * the scatterlist.  The pages are not pinned.
  */
-static ssize_t netfs_extract_xarray_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
-					  iov_iter_extraction_t extraction_flags)
+static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
+   ssize_t maxsize,
+   struct sg_table *sgtable,
+   unsigned int sg_max,
+   iov_iter_extraction_t extraction_flags)
 {
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
struct xarray *xa = iter->xarray;
@@ -3

[Linux-cachefs] [PATCH net-next v2 04/10] Move netfs_extract_iter_to_sg() to lib/scatterlist.c

2023-05-30 Thread David Howells
Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c   | 267 -
 include/linux/netfs.h |   4 -
 include/linux/uio.h   |   5 +
 lib/scatterlist.c | 269 ++
 4 files changed, 274 insertions(+), 271 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 9f09dc30ceb6..2ff07ba655a0 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -101,270 +101,3 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, 
size_t orig_len,
return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
-
-/*
- * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
- * iterators, and add them to the scatterlist.
- */
-static ssize_t extract_user_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   struct page **pages;
-   unsigned int npages;
-   ssize_t ret = 0, res;
-   size_t len, off;
-
-   /* We decant the page list into the tail of the scatterlist */
-   pages = (void *)sgtable->sgl +
-   array_size(sg_max, sizeof(struct scatterlist));
-   pages -= sg_max;
-
-   do {
-		res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max,
-					     extraction_flags, &off);
-   if (res < 0)
-   goto failed;
-
-   len = res;
-   maxsize -= len;
-   ret += len;
-   npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
-   sg_max -= npages;
-
-   for (; npages > 0; npages--) {
-   struct page *page = *pages;
-   size_t seg = min_t(size_t, PAGE_SIZE - off, len);
-
-   *pages++ = NULL;
-   sg_set_page(sg, page, seg, off);
-   sgtable->nents++;
-   sg++;
-   len -= seg;
-   off = 0;
-   }
-   } while (maxsize > 0 && sg_max > 0);
-
-   return ret;
-
-failed:
-   while (sgtable->nents > sgtable->orig_nents)
-		put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
-   return res;
-}
-
-/*
- * Extract up to sg_max pages from a BVEC-type iterator and add them to the
- * scatterlist.  The pages are not pinned.
- */
-static ssize_t extract_bvec_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   const struct bio_vec *bv = iter->bvec;
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   unsigned long start = iter->iov_offset;
-   unsigned int i;
-   ssize_t ret = 0;
-
-   for (i = 0; i < iter->nr_segs; i++) {
-   size_t off, len;
-
-   len = bv[i].bv_len;
-   if (start >= len) {
-   start -= len;
-   continue;
-   }
-
-   len = min_t(size_t, maxsize, len - start);
-   off = bv[i].bv_offset + start;
-
-   sg_set_page(sg, bv[i].bv_page, len, off);
-   sgtable->nents++;
-   sg++;
-   sg_max--;
-
-   ret += len;
-   maxsize -= len;
-   if (maxsize <= 0 || sg_max == 0)
-   break;
-   start = 0;
-   }
-
-   if (ret > 0)
-   iov_iter_advance(iter, ret);
-   return ret;
-}
-
-/*
- * Extract up to sg_max pages from a KVEC-type iterator and add them to the
- * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
- * static buffers.  The pages are not pinned.
- */
-static ssize_t extract_kvec_to_sg(struct iov_iter *iter,
- ssize_t maxsize,
- struct sg_table *sgtable,
- unsigned int sg_max,
- iov_iter_extraction_t extraction_flags)
-{
-   const struc

[Linux-cachefs] [PATCH net-next v2 03/10] Wrap lines at 80

2023-05-30 Thread David Howells
Wrap a line at 80 to stop checkpatch complaining.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: Simon Horman 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f41a37bca1e8..9f09dc30ceb6 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -119,7 +119,8 @@ static ssize_t extract_user_to_sg(struct iov_iter *iter,
size_t len, off;
 
/* We decant the page list into the tail of the scatterlist */
-	pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist));
+   pages = (void *)sgtable->sgl +
+   array_size(sg_max, sizeof(struct scatterlist));
pages -= sg_max;
 
do {
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next v2 02/10] Fix a couple of spelling mistakes

2023-05-30 Thread David Howells
Fix a couple of spelling mistakes in a comment.

Suggested-by: Simon Horman 
Link: https://lore.kernel.org/r/zhh2msrqel4gs...@corigine.com/
Link: https://lore.kernel.org/r/zhh1nqzwogzxl...@corigine.com/
Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: Matthew Wilcox 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f8eba3de1a97..f41a37bca1e8 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -312,7 +312,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
 }
 
 /**
- * extract_iter_to_sg - Extract pages from an iterator and add ot an sglist
+ * extract_iter_to_sg - Extract pages from an iterator and add to an sglist
  * @iter: The iterator to extract from
  * @maxsize: The amount of iterator to copy
  * @sgtable: The scatterlist table to fill in
@@ -332,7 +332,7 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
  * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
  * be allowed on the pages extracted.
  *
- * If successul, @sgtable->nents is updated to include the number of elements
+ * If successful, @sgtable->nents is updated to include the number of elements
  * added and the number of bytes added is returned.  @sgtable->orig_nents is
  * left unaltered.
  *
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next v2 01/10] Drop the netfs_ prefix from netfs_extract_iter_to_sg()

2023-05-30 Thread David Howells
Rename netfs_extract_iter_to_sg() and its auxiliary functions to drop the
netfs_ prefix.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "Matthew Wilcox (Oracle)" 
cc: "David S. Miller" 
cc: Eric Dumazet 
cc: Jakub Kicinski 
cc: Paolo Abeni 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---

Notes:
ver #2:
 - Put the "netfs_" prefix removal first to shorten lines and avoid
   checkpatch 80-char warnings.

 fs/cifs/smb2ops.c |  4 +--
 fs/cifs/smbdirect.c   |  2 +-
 fs/netfs/iterator.c   | 66 +--
 include/linux/netfs.h |  6 ++--
 4 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 5065398665f1..2a0e0a7f009c 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4334,8 +4334,8 @@ static void *smb2_get_aead_req(struct crypto_aead *tfm, 
struct smb_rqst *rqst,
}
sgtable.orig_nents = sgtable.nents;
 
-	rc = netfs_extract_iter_to_sg(iter, count, &sgtable,
-				      num_sgs - sgtable.nents, 0);
+	rc = extract_iter_to_sg(iter, count, &sgtable,
+				num_sgs - sgtable.nents, 0);
iov_iter_revert(iter, rc);
sgtable.orig_nents = sgtable.nents;
}
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 0362ebd4fa0f..223e17c16b60 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2227,7 +2227,7 @@ static int smbd_iter_to_mr(struct smbd_connection *info,
 
memset(sgt->sgl, 0, max_sg * sizeof(struct scatterlist));
 
-	ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), sgt, max_sg, 0);
+   ret = extract_iter_to_sg(iter, iov_iter_count(iter), sgt, max_sg, 0);
WARN_ON(ret < 0);
if (sgt->nents > 0)
		sg_mark_end(&sgt->sgl[sgt->nents - 1]);
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 8a4c86687429..f8eba3de1a97 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -106,11 +106,11 @@ EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
  * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
  * iterators, and add them to the scatterlist.
  */
-static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_user_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
struct page **pages;
@@ -159,11 +159,11 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
  * Extract up to sg_max pages from a BVEC-type iterator and add them to the
  * scatterlist.  The pages are not pinned.
  */
-static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_bvec_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
const struct bio_vec *bv = iter->bvec;
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
@@ -205,11 +205,11 @@ static ssize_t netfs_extract_bvec_to_sg(struct iov_iter 
*iter,
  * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
  * static buffers.  The pages are not pinned.
  */
-static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_kvec_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t 

Re: [Linux-cachefs] [PATCH net-next 1/8] Move netfs_extract_iter_to_sg() to lib/scatterlist.c

2023-05-28 Thread David Howells
If it comes to a respin, I'll stick in an extra patch to fix the spellings -
and if not, I'll submit the patch separately.  The fix shouldn't be mixed in
with the movement of the code, so as to give git's analysis a better chance
of tracking the movement.

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH net-next 2/8] Drop the netfs_ prefix from netfs_extract_iter_to_sg()

2023-05-26 Thread David Howells
Rename netfs_extract_iter_to_sg() and its auxiliary functions to drop the
netfs_ prefix.

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/cifs/smb2ops.c   |  4 +--
 fs/cifs/smbdirect.c |  2 +-
 include/linux/uio.h |  6 ++---
 lib/scatterlist.c   | 66 ++---
 4 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index a295e4c2d54e..196bc49e73b8 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -4335,8 +4335,8 @@ static void *smb2_get_aead_req(struct crypto_aead *tfm, 
struct smb_rqst *rqst,
}
sgtable.orig_nents = sgtable.nents;
 
-	rc = netfs_extract_iter_to_sg(iter, count, &sgtable,
-				      num_sgs - sgtable.nents, 0);
+	rc = extract_iter_to_sg(iter, count, &sgtable,
+				num_sgs - sgtable.nents, 0);
iov_iter_revert(iter, rc);
sgtable.orig_nents = sgtable.nents;
}
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 0362ebd4fa0f..223e17c16b60 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2227,7 +2227,7 @@ static int smbd_iter_to_mr(struct smbd_connection *info,
 
memset(sgt->sgl, 0, max_sg * sizeof(struct scatterlist));
 
-	ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter), sgt, max_sg, 0);
+   ret = extract_iter_to_sg(iter, iov_iter_count(iter), sgt, max_sg, 0);
WARN_ON(ret < 0);
if (sgt->nents > 0)
		sg_mark_end(&sgt->sgl[sgt->nents - 1]);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 09b8b107956e..0ccb983cf645 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -434,8 +434,8 @@ static inline bool iov_iter_extract_will_pin(const struct 
iov_iter *iter)
 }
 
 struct sg_table;
-ssize_t netfs_extract_iter_to_sg(struct iov_iter *iter, size_t len,
-struct sg_table *sgtable, unsigned int sg_max,
-iov_iter_extraction_t extraction_flags);
+ssize_t extract_iter_to_sg(struct iov_iter *iter, size_t len,
+  struct sg_table *sgtable, unsigned int sg_max,
+  iov_iter_extraction_t extraction_flags);
 
 #endif
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 31ef86e6a33a..8612b9deaa7e 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1101,11 +1101,11 @@ EXPORT_SYMBOL(sg_zero_buffer);
  * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
  * iterators, and add them to the scatterlist.
  */
-static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_user_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
struct page **pages;
@@ -1154,11 +1154,11 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
  * Extract up to sg_max pages from a BVEC-type iterator and add them to the
  * scatterlist.  The pages are not pinned.
  */
-static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
+static ssize_t extract_bvec_to_sg(struct iov_iter *iter,
+ ssize_t maxsize,
+ struct sg_table *sgtable,
+ unsigned int sg_max,
+ iov_iter_extraction_t extraction_flags)
 {
const struct bio_vec *bv = iter->bvec;
struct scatterlist *sg = sgtable->sgl + sgtable->nents;
@@ -1200,11 +1200,11 @@ static ssize_t netfs_extract_bvec_to_sg(struct iov_iter 
*iter,
  * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
  * static buffers.  The pages are not pinned.
  */
-static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-

[Linux-cachefs] [PATCH net-next 1/8] Move netfs_extract_iter_to_sg() to lib/scatterlist.c

2023-05-26 Thread David Howells
Move netfs_extract_iter_to_sg() to lib/scatterlist.c as it's going to be
used by more than just network filesystems (AF_ALG, for example).

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Jens Axboe 
cc: Herbert Xu 
cc: "David S. Miller" 
cc: linux-cry...@vger.kernel.org
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: net...@vger.kernel.org
---
 fs/netfs/iterator.c | 266 ---
 include/linux/netfs.h   |   4 -
 include/linux/scatterlist.h |   1 +
 include/linux/uio.h |   5 +
 lib/scatterlist.c   | 267 
 5 files changed, 273 insertions(+), 270 deletions(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 8a4c86687429..2ff07ba655a0 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -101,269 +101,3 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, 
size_t orig_len,
return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
-
-/*
- * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
- * iterators, and add them to the scatterlist.
- */
-static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
-{
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   struct page **pages;
-   unsigned int npages;
-   ssize_t ret = 0, res;
-   size_t len, off;
-
-   /* We decant the page list into the tail of the scatterlist */
-	pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist));
-   pages -= sg_max;
-
-   do {
-		res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max,
-					     extraction_flags, &off);
-   if (res < 0)
-   goto failed;
-
-   len = res;
-   maxsize -= len;
-   ret += len;
-   npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
-   sg_max -= npages;
-
-   for (; npages > 0; npages--) {
-   struct page *page = *pages;
-   size_t seg = min_t(size_t, PAGE_SIZE - off, len);
-
-   *pages++ = NULL;
-   sg_set_page(sg, page, seg, off);
-   sgtable->nents++;
-   sg++;
-   len -= seg;
-   off = 0;
-   }
-   } while (maxsize > 0 && sg_max > 0);
-
-   return ret;
-
-failed:
-   while (sgtable->nents > sgtable->orig_nents)
-		put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
-   return res;
-}
-
-/*
- * Extract up to sg_max pages from a BVEC-type iterator and add them to the
- * scatterlist.  The pages are not pinned.
- */
-static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
-   iov_iter_extraction_t extraction_flags)
-{
-   const struct bio_vec *bv = iter->bvec;
-   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
-   unsigned long start = iter->iov_offset;
-   unsigned int i;
-   ssize_t ret = 0;
-
-   for (i = 0; i < iter->nr_segs; i++) {
-   size_t off, len;
-
-   len = bv[i].bv_len;
-   if (start >= len) {
-   start -= len;
-   continue;
-   }
-
-   len = min_t(size_t, maxsize, len - start);
-   off = bv[i].bv_offset + start;
-
-   sg_set_page(sg, bv[i].bv_page, len, off);
-   sgtable->nents++;
-   sg++;
-   sg_max--;
-
-   ret += len;
-   maxsize -= len;
-   if (maxsize <= 0 || sg_max == 0)
-   break;
-   start = 0;
-   }
-
-   if (ret > 0)
-   iov_iter_advance(iter, ret);
-   return ret;
-}
-
-/*
- * Extract up to sg_max pages from a KVEC-type iterator and add them to the
- * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
- * static buffers.  The pages are not pinned.
- */
-static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
-   ssize_t maxsize,
-   struct sg_table *sgtable,
-   unsigned int sg_max,
- 

[Linux-cachefs] [PATCH] cachefilesd: Allow the daemon to run as a non-root user

2023-05-19 Thread David Howells

Allow the daemon to run as a non-root user after opening the control device
- which will also make the kernel driver run as the same non-root user
since it borrows the daemon's credentials.

This requires a fix to the cachefiles kernel driver to make it set the mode
on files it creates to 0600.

This also requires the SELinux policy to be changed so that cachefilesd can
access /etc/passwd, otherwise only numeric uids and gids can be set.
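
By way of example, a configuration making use of the new commands might look
like this (the user and group names here are illustrative only):

	dir /var/cache/fscache
	uid cachefiles
	gid cachefiles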

Signed-off-by: David Howells 
---
 cachefilesd.c  |   59 +++--
 cachefilesd.conf.5 |7 ++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/cachefilesd.c b/cachefilesd.c
index 6c435f6..81bb87d 100644
--- a/cachefilesd.c
+++ b/cachefilesd.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -121,6 +122,8 @@ static unsigned long long brun, bcull, bstop, frun, fcull, 
fstop;
 static unsigned long long b_resume_threshold = ULLONG_MAX;
 static unsigned long long f_resume_threshold = 5;
 
+static uid_t daemon_uid;
+static gid_t daemon_gid;
 static const gid_t group_list[0];
 
 #define cachefd 3
@@ -489,6 +492,47 @@ int main(int argc, char *argv[])
continue;
}
 
+   /* Note UID to run as. */
+   if (memcmp(cp, "uid", 3) == 0 && isspace(cp[3])) {
+   struct passwd *pwd;
+   char *end;
+
+   for (cp += 3; isspace(*cp); cp++) {;}
+   if (!*cp)
+   cfgerror("Error parsing username/uid");
+
+			daemon_uid = strtoul(cp, &end, 10);
+   if (*end) {
+   pwd = getpwnam(cp);
+   if (!pwd)
+   oserror("Couldn't look up username/uid 
'%s'", cp);
+   daemon_uid = pwd->pw_uid;
+   daemon_gid = pwd->pw_gid;
+   } else {
+   daemon_gid = -1;
+   }
+   continue;
+   }
+
+   /* Note GID to run as. */
+   if (memcmp(cp, "gid", 3) == 0 && isspace(cp[3])) {
+   struct group *grp;
+   char *end;
+
+   for (cp += 3; isspace(*cp); cp++) {;}
+   if (!*cp)
+   cfgerror("Error parsing group name/gid");
+
+			daemon_gid = strtoul(cp, &end, 10);
+   if (*end) {
+   grp = getgrnam(cp);
+   if (!grp)
+   oserror("Couldn't look up group 
name/gid '%s'", cp);
+   daemon_gid = grp->gr_gid;
+   }
+   continue;
+   }
+
/* note the dir command */
if (memcmp(cp, "dir", 3) == 0 && isspace(cp[3])) {
char *sp;
@@ -545,13 +589,24 @@ int main(int argc, char *argv[])
if (nullfd != 1)
dup2(nullfd, 1);
 
-   for (loop = 4; loop < open_max; loop++)
-   close(loop);
+   if (close_range(4, open_max, 0) == -1) {
+   for (loop = 4; loop < open_max; loop++)
+   close(loop);
+   }
 
/* set up a connection to syslog whilst we still can (the bind command
 * will give us our own namespace with no /dev/log */
openlog("cachefilesd", LOG_PID, LOG_DAEMON);
xopenedlog = true;
+
+   if (daemon_uid || daemon_gid) {
+   info("Setting credentials");
+   if (setresgid(daemon_gid, daemon_gid, daemon_gid) < 0)
+   oserror("Unable to set GID to %d", daemon_gid);
+   if (setresuid(daemon_uid, daemon_uid, daemon_uid) < 0)
+   oserror("Unable to set UID to %d", daemon_uid);
+   }
+
info("About to bind cache");
 
/* now issue the bind command */
diff --git a/cachefilesd.conf.5 b/cachefilesd.conf.5
index b108bdc..534b8f0 100644
--- a/cachefilesd.conf.5
+++ b/cachefilesd.conf.5
@@ -35,6 +35,13 @@ access the cache.  The default is to use cachefilesd's 
security context.  Files
 will be created in the cache with the label of directory specified to the 'dir'
 command.
 .TP
+.B uid 
+.TP
+.B gid 
+Set the UID or GID that the daemon runs as to the specified ID.  The ID can be
+given as a number or as a name.  The base cache directory and all the
+directories and files under it must be owned by these IDs.
+.TP
 .B brun %
 .TP
 .B bcull %
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH] cachefilesd: Remove pointer poisoning code as it is likely to fail under ASLR

2023-05-19 Thread David Howells

The pointer checking code assumes that addresses in the range 0x6000 to
0x6fff are not going to be encountered and can thus be used to poison
dead pointers.  Unfortunately, this assumption breaks occasionally on
systems with address space layout randomisation.

Remove the poisoning and, in particular, the poison checking which will cause
the process to abort with no message as to why.

Signed-off-by: David Howells 
---
 cachefilesd.c |   25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/cachefilesd.c b/cachefilesd.c
index d4d236f..6c435f6 100644
--- a/cachefilesd.c
+++ b/cachefilesd.c
@@ -1092,7 +1092,6 @@ static void put_object(struct object *object)
 
parent = object->parent;
 
-   memset(object, 0x6d, sizeof(struct object));
free(object);
 
if (parent)
@@ -1213,7 +1212,6 @@ static void insert_into_cull_table(struct object *object)
 
/* newest object in table will be displaced by this one */
put_object(cullbuild[0]);
-   cullbuild[0] = (void *)(0x6b00 | __LINE__);
object->usage++;
 
/* place directly in first slot if second is older */
@@ -1391,7 +1389,7 @@ next:
 
if (loop == nr_in_ready_table - 1) {
/* child was oldest object */
-   cullready[--nr_in_ready_table] = (void 
*)(0x6b00 | __LINE__);
+   cullready[--nr_in_ready_table] = NULL;
put_object(child);
goto removed;
}
@@ -1400,7 +1398,7 @@ next:
			memmove(&cullready[loop],
				&cullready[loop + 1],
				(nr_in_ready_table - (loop + 1)) * sizeof(cullready[0]));
-			cullready[--nr_in_ready_table] = (void *)(0x6b00 | __LINE__);
+   cullready[--nr_in_ready_table] = NULL;
put_object(child);
goto removed;
}
@@ -1411,7 +1409,7 @@ next:
 
if (loop == nr_in_build_table - 1) {
/* child was oldest object */
-			cullbuild[--nr_in_build_table] = (void *)(0x6b00 | __LINE__);
+   cullbuild[--nr_in_build_table] = NULL;
put_object(child);
}
else if (loop < nr_in_build_table - 1) {
@@ -1419,7 +1417,7 @@ next:
			memmove(&cullbuild[loop],
				&cullbuild[loop + 1],
				(nr_in_build_table - (loop + 1)) * sizeof(cullbuild[0]));
-			cullbuild[--nr_in_build_table] = (void *)(0x6b00 | __LINE__);
+   cullbuild[--nr_in_build_table] = NULL;
put_object(child);
}
 
@@ -1531,10 +1529,10 @@ static void decant_cull_table(void)
 
n = copy * sizeof(cullready[0]);
memcpy(cullready, cullbuild, n);
-   memset(cullbuild, 0x6e, n);
+   memset(cullbuild, 0, n);
nr_in_ready_table = nr_in_build_table;
nr_in_build_table = 0;
-   goto check;
+   return;
}
 
/* decant some of the build table if there's space */
@@ -1542,7 +1540,7 @@ static void decant_cull_table(void)
error("Less than zero space in ready table");
space = culltable_size - nr_in_ready_table;
if (space == 0)
-   goto check;
+   return;
 
/* work out how much of the build table we can copy */
copy = avail = nr_in_build_table;
@@ -1559,16 +1557,11 @@ static void decant_cull_table(void)
nr_in_ready_table += copy;
 
	memcpy(&cullready[0], &cullbuild[leave], copy * sizeof(cullready[0]));
-	memset(&cullbuild[leave], 0x6b, copy * sizeof(cullbuild[0]));
+	memset(&cullbuild[leave], 0, copy * sizeof(cullbuild[0]));
nr_in_build_table = leave;
 
if (copy + leave > culltable_size)
error("Scan table exceeded (%d+%d)", copy, leave);
-
-check:
-   for (loop = 0; loop < nr_in_ready_table; loop++)
-   if (((long)cullready[loop] & 0xf000) == 0x6000)
-   abort();
 }
 
 /*/
@@ -1645,6 +1638,6 @@ static void cull_objects(void)
 
if (cullready[nr_in_ready_table - 1]->cullable) {
cull_object(cullready[nr_in_ready_table - 1]);
-   cullready[--nr_in_ready_table] = (void *)(0x6b00 | 
__LINE__);
+   cullready[--nr_in_ready_table] = NULL;
}
 }
--
Linux-cachefs mailing list
Linux-cac

[Linux-cachefs] [PATCH] cachefiles: Allow the cache to be non-root

2023-05-19 Thread David Howells

Set mode 0600 on files in the cache so that cachefilesd can run as an
unprivileged user rather than leaving the files all with mode 0.  Directories
are already set to 0700.

Userspace then needs to set the uid and gid before issuing the "bind"
command and the cache must've been chown'd to those IDs.

Signed-off-by: David Howells 
cc: David Howells 
cc: Jeff Layton 
cc: linux-cachefs@redhat.com
cc: linux-er...@lists.ozlabs.org
cc: linux-fsde...@vger.kernel.org
---
 fs/cachefiles/namei.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 82219a8f6084..66482c193e86 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -451,7 +451,8 @@ struct file *cachefiles_create_tmpfile(struct 
cachefiles_object *object)
 
ret = cachefiles_inject_write_error();
if (ret == 0) {
-	file = vfs_tmpfile_open(&nop_mnt_idmap, &parentpath, S_IFREG,
+	file = vfs_tmpfile_open(&nop_mnt_idmap, &parentpath,
+				S_IFREG | 0600,
O_RDWR | O_LARGEFILE | O_DIRECT,
cache->cache_cred);
ret = PTR_ERR_OR_ZERO(file);
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [BUG] fscache writing but not reading

2023-05-19 Thread David Howells
Chris Chilvers  wrote:

> While testing the fscache performance fixes [1] that were merged into 6.4-rc1
> it appears that the caching no longer works. The client will write to the 
> cache
> but never reads.

Can you try reading from afs?  You would need to enable CONFIG_AFS_FS in your
kernel if it's not already set.

Install kafs-client and do:

systemctl enable afs.mount
md5sum /afs/openafs.org/software/openafs/1.9.1/openafs-1.9.1-doc.tar.bz2
cat /proc/fs/fscache/stats
umount /afs/openafs.org
md5sum /afs/openafs.org/software/openafs/1.9.1/openafs-1.9.1-doc.tar.bz2
cat /proc/fs/fscache/stats

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V5 5/5] cachefiles: add restore command to recover inflight ondemand read requests

2023-04-14 Thread David Howells
Jia Zhu  wrote:

> +int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args)
> +{
> + struct cachefiles_req *req;
> +
> +	XA_STATE(xas, &cache->reqs, 0);
> +
> +	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
> + return -EOPNOTSUPP;
> +
> + /*
> +  * Reset the requests to CACHEFILES_REQ_NEW state, so that the
> +  * requests have been processed halfway before the crash of the
> +  * user daemon could be reprocessed after the recovery.
> +  */
> +	xas_lock(&xas);
> +	xas_for_each(&xas, req, ULONG_MAX)
> +		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
> +	xas_unlock(&xas);
> +
> +	wake_up_all(&cache->daemon_pollwq);
> + return 0;
> +}

Should there be a check to see if this is needed?

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V5 2/5] cachefiles: extract ondemand info field from cachefiles_object

2023-04-14 Thread David Howells
Jia Zhu  wrote:

>  #define CACHEFILES_OBJECT_STATE_FUNCS(_state, _STATE)\
>  static inline bool							\
>  cachefiles_ondemand_object_is_##_state(const struct cachefiles_object *object) \
>  {									\
> -	return object->state == CACHEFILES_ONDEMAND_OBJSTATE_##_STATE;	\
> +	return object->ondemand->state == CACHEFILES_ONDEMAND_OBJSTATE_##_STATE; \
>  }									\
>  									\
>  static inline void							\
>  cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \
>  {									\
> -	object->state = CACHEFILES_ONDEMAND_OBJSTATE_##_STATE;		\
> +	object->ondemand->state = CACHEFILES_ONDEMAND_OBJSTATE_##_STATE; \
>  }

I wonder if those need barriers - smp_load_acquire() and smp_store_release().
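
Something like this, say (illustrative only; shown expanded for the "open"
state, using the V5 field and constant names):

	static inline bool
	cachefiles_ondemand_object_is_open(const struct cachefiles_object *object)
	{
		return smp_load_acquire(&object->ondemand->state) ==
			CACHEFILES_ONDEMAND_OBJSTATE_OPEN;
	}

	static inline void
	cachefiles_ondemand_set_object_open(struct cachefiles_object *object)
	{
		smp_store_release(&object->ondemand->state,
				  CACHEFILES_ONDEMAND_OBJSTATE_OPEN);
	}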

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V5 4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode

2023-04-14 Thread David Howells
Jia Zhu  wrote:

>   if (cachefiles_in_ondemand_mode(cache)) {
> -	if (!xa_empty(&cache->reqs))
> - mask |= EPOLLIN;
> + if (!xa_empty(xa)) {
> + rcu_read_lock();
> + xa_for_each_marked(xa, index, req, CACHEFILES_REQ_NEW) {
> +			if (!cachefiles_ondemand_is_reopening_read(req)) {
> + mask |= EPOLLIN;
> + break;
> + }
> + }
> + rcu_read_unlock();

You should probably use xas_for_each_marked() instead of xa_for_each_marked()
as the former should perform better.
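
For instance (untested sketch; xa and req as in the patch quoted above):

	XA_STATE(xas, xa, 0);

	rcu_read_lock();
	xas_for_each_marked(&xas, req, ULONG_MAX, CACHEFILES_REQ_NEW) {
		if (!cachefiles_ondemand_is_reopening_read(req)) {
			mask |= EPOLLIN;
			break;
		}
	}
	rcu_read_unlock();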

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH] netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC

2023-04-12 Thread David Howells
Hi Linus,

Could you apply this, please?  It doesn't affect anything yet, but I have
patches in the works that will use it.

Thanks,
David
---
netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC

Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the size
of the page to the part of the page extracted, not the remaining amount of
data in the extracted page array at that point.

This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.

Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
---
 fs/netfs/iterator.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index e9a45dea748a..8a4c86687429 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -139,7 +139,7 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
size_t seg = min_t(size_t, PAGE_SIZE - off, len);
 
*pages++ = NULL;
-   sg_set_page(sg, page, len, off);
+   sg_set_page(sg, page, seg, off);
sgtable->nents++;
sg++;
len -= seg;
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH v3 01/55] netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC

2023-03-31 Thread David Howells
Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the size
of the page to the part of the page extracted, not the remaining amount of
data in the extracted page array at that point.

This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.

Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
---
 fs/netfs/iterator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index e9a45dea748a..8a4c86687429 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -139,7 +139,7 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
size_t seg = min_t(size_t, PAGE_SIZE - off, len);
 
*pages++ = NULL;
-   sg_set_page(sg, page, len, off);
+   sg_set_page(sg, page, seg, off);
sgtable->nents++;
sg++;
len -= seg;
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [RFC PATCH v2 01/48] netfs: Fix netfs_extract_iter_to_sg() for ITER_UBUF/IOVEC

2023-03-29 Thread David Howells
Fix netfs_extract_iter_to_sg() for ITER_UBUF and ITER_IOVEC to set the size
of the page to the part of the page extracted, not the remaining amount of
data in the extracted page array at that point.

This doesn't yet affect anything as cifs, the only current user, only
passes in non-user-backed iterators.

Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
---
 fs/netfs/iterator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index e9a45dea748a..8a4c86687429 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -139,7 +139,7 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
size_t seg = min_t(size_t, PAGE_SIZE - off, len);
 
*pages++ = NULL;
-   sg_set_page(sg, page, len, off);
+   sg_set_page(sg, page, seg, off);
sgtable->nents++;
sg++;
len -= seg;
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V4 4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode

2023-03-28 Thread David Howells
Jia Zhu  wrote:

> + if (!xa_empty(xa)) {
> + xa_lock(xa);
> + xa_for_each_marked(xa, index, req, CACHEFILES_REQ_NEW) {
> +		if (!cachefiles_ondemand_is_reopening_read(req)) {
> + mask |= EPOLLIN;
> + break;
> + }
> + }
> + xa_unlock(xa);
> + }

I wonder if there's a more efficient way to do this.  I guess it depends on
how many reqs you expect to get in a queue.  It might be worth taking the
rcu_read_lock before calling xa_lock() and holding it over the whole loop.

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V4 3/5] cachefiles: resend an open request if the read request's object is closed

2023-03-28 Thread David Howells
Jia Zhu  wrote:

> + struct cachefiles_object *object =
> + ((struct cachefiles_ondemand_info *)work)->object;

container_of().

> + continue;
> + } else if (cachefiles_ondemand_object_is_reopening(object)) {

The "else" is unnecessary.

> +static void ondemand_object_worker(struct work_struct *work)
> +{
> + struct cachefiles_object *object =
> + ((struct cachefiles_ondemand_info *)work)->object;
> +
> + cachefiles_ondemand_init_object(object);
> +}

I can't help but feel there's some missing exclusion/locking.  This feels like
it really ought to be driven from the fscache object state machine.

--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V4 2/5] cachefiles: extract ondemand info field from cachefiles_object

2023-03-28 Thread David Howells
Jia Zhu  wrote:

> @@ -65,10 +71,7 @@ struct cachefiles_object {
> 	enum cachefiles_content	content_info:8;	/* Info about content presence */
> 	unsigned long		flags;
> #define CACHEFILES_OBJECT_USING_TMPFILE	0	/* Have an unlinked tmpfile */
> -#ifdef CONFIG_CACHEFILES_ONDEMAND
> -	int			ondemand_id;
> -	enum cachefiles_object_state	state;
> -#endif
> +	struct cachefiles_ondemand_info	*private;

Why is this no longer inside "#ifdef CONFIG_CACHEFILES_ONDEMAND"?

Also, please don't call it "private", but rather something like "ondemand" or
"ondemand_info".

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



Re: [Linux-cachefs] [PATCH V4 1/5] cachefiles: introduce object ondemand state

2023-03-28 Thread David Howells
Jia Zhu  wrote:

> +enum cachefiles_object_state {
> +	CACHEFILES_ONDEMAND_OBJSTATE_close, /* Anonymous fd closed by daemon or initial state */
> +	CACHEFILES_ONDEMAND_OBJSTATE_open, /* Anonymous fd associated with object is available */

That looks weird.  Maybe make them all-lowercase?

> @@ -296,6 +302,21 @@ extern void cachefiles_ondemand_clean_object(struct 
> cachefiles_object *object);
>  extern int cachefiles_ondemand_read(struct cachefiles_object *object,
>   loff_t pos, size_t len);
>  
> +#define CACHEFILES_OBJECT_STATE_FUNCS(_state)\
> +static inline bool							\
> +cachefiles_ondemand_object_is_##_state(const struct cachefiles_object *object) \
> +{									\
> +	return object->state == CACHEFILES_ONDEMAND_OBJSTATE_##_state;	\
> +}									\
> +									\
> +static inline void							\
> +cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \
> +{									\
> +	object->state = CACHEFILES_ONDEMAND_OBJSTATE_##_state;		\
> +}
> +
> +CACHEFILES_OBJECT_STATE_FUNCS(open);
> +CACHEFILES_OBJECT_STATE_FUNCS(close);

Or just get rid of the macroisation?  If there are only two states, it doesn't
save you that much and it means that "make TAGS" won't generate refs for those
functions and grep won't find them.

David
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH 2/2] iov: Fix netfs_extract_user_to_sg()

2023-02-27 Thread David Howells
Fix the loop check in netfs_extract_user_to_sg() for extraction from
user-backed iterators to do the body if npages > 0, not if npages < 0
(which it can never be).

This isn't currently used by cifs, which only ever extracts data from BVEC,
KVEC and XARRAY iterators at this level, user-backed iterators having being
decanted into BVEC iterators at a higher level to accommodate the work
being done in a kernel thread.

Found by smatch:
fs/netfs/iterator.c:139 netfs_extract_user_to_sg() warn: unsigned 
'npages' is never less than zero.

Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
Signed-off-by: David Howells 
cc: Steve French 
cc: Jeff Layton 
cc: linux-c...@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/oe-kbuild-all/202302261115.p3tqi1zo-...@intel.com/
---
 fs/netfs/iterator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index f00d43b8ac0a..e9a45dea748a 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -134,7 +134,7 @@ static ssize_t netfs_extract_user_to_sg(struct iov_iter 
*iter,
npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
sg_max -= npages;
 
-   for (; npages < 0; npages--) {
+   for (; npages > 0; npages--) {
struct page *page = *pages;
size_t seg = min_t(size_t, PAGE_SIZE - off, len);
 
--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs



[Linux-cachefs] [PATCH 09/17] netfs: Add a function to extract an iterator into a scatterlist

2023-02-16 Thread David Howells
Provide a function for filling in a scatterlist from the list of pages
contained in an iterator.

If the iterator is UBUF- or IOVEC-type, the pages have a pin taken on them
(as FOLL_PIN).

If the iterator is BVEC-, KVEC- or XARRAY-type, no pin is taken on the
pages and it is left to the caller to manage their lifetime.  It cannot be
assumed that a ref can be validly taken, particularly in the case of a KVEC
iterator.
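
So, for the user-backed case, a caller's cleanup might look something like
this (sketch only; it assumes every entry in sgtable came from this
extraction):

	if (iov_iter_extract_will_pin(iter)) {
		unsigned int i;

		for (i = 0; i < sgtable->nents; i++)
			unpin_user_page(sg_page(&sgtable->sgl[i]));
	}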

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
---
 fs/netfs/iterator.c   | 268 ++
 include/linux/netfs.h |   4 +
 mm/vmalloc.c  |   1 +
 3 files changed, 273 insertions(+)

diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
index 6f0d79080abc..80d7ff440cac 100644
--- a/fs/netfs/iterator.c
+++ b/fs/netfs/iterator.c
@@ -7,7 +7,9 @@
 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include "internal.h"
 
@@ -101,3 +103,269 @@ ssize_t netfs_extract_user_iter(struct iov_iter *orig, 
size_t orig_len,
return npages;
 }
 EXPORT_SYMBOL_GPL(netfs_extract_user_iter);
+
+/*
+ * Extract and pin a list of up to sg_max pages from UBUF- or IOVEC-class
+ * iterators, and add them to the scatterlist.
+ */
+static ssize_t netfs_extract_user_to_sg(struct iov_iter *iter,
+   ssize_t maxsize,
+   struct sg_table *sgtable,
+   unsigned int sg_max,
+   iov_iter_extraction_t extraction_flags)
+{
+   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+   struct page **pages;
+   unsigned int npages;
+   ssize_t ret = 0, res;
+   size_t len, off;
+
+   /* We decant the page list into the tail of the scatterlist */
+	pages = (void *)sgtable->sgl + array_size(sg_max, sizeof(struct scatterlist));
+   pages -= sg_max;
+
+   do {
+		res = iov_iter_extract_pages(iter, &pages, maxsize, sg_max,
+					     extraction_flags, &off);
+   if (res < 0)
+   goto failed;
+
+   len = res;
+   maxsize -= len;
+   ret += len;
+   npages = DIV_ROUND_UP(off + len, PAGE_SIZE);
+   sg_max -= npages;
+
+   for (; npages < 0; npages--) {
+   struct page *page = *pages;
+   size_t seg = min_t(size_t, PAGE_SIZE - off, len);
+
+   *pages++ = NULL;
+   sg_set_page(sg, page, len, off);
+   sgtable->nents++;
+   sg++;
+   len -= seg;
+   off = 0;
+   }
+   } while (maxsize > 0 && sg_max > 0);
+
+   return ret;
+
+failed:
+   while (sgtable->nents > sgtable->orig_nents)
+		put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
+   return res;
+}
+
+/*
+ * Extract up to sg_max pages from a BVEC-type iterator and add them to the
+ * scatterlist.  The pages are not pinned.
+ */
+static ssize_t netfs_extract_bvec_to_sg(struct iov_iter *iter,
+   ssize_t maxsize,
+   struct sg_table *sgtable,
+   unsigned int sg_max,
+   iov_iter_extraction_t extraction_flags)
+{
+   const struct bio_vec *bv = iter->bvec;
+   struct scatterlist *sg = sgtable->sgl + sgtable->nents;
+   unsigned long start = iter->iov_offset;
+   unsigned int i;
+   ssize_t ret = 0;
+
+   for (i = 0; i < iter->nr_segs; i++) {
+   size_t off, len;
+
+   len = bv[i].bv_len;
+   if (start >= len) {
+   start -= len;
+   continue;
+   }
+
+   len = min_t(size_t, maxsize, len - start);
+   off = bv[i].bv_offset + start;
+
+   sg_set_page(sg, bv[i].bv_page, len, off);
+   sgtable->nents++;
+   sg++;
+   sg_max--;
+
+   ret += len;
+   maxsize -= len;
+   if (maxsize <= 0 || sg_max == 0)
+   break;
+   start = 0;
+   }
+
+   if (ret > 0)
+   iov_iter_advance(iter, ret);
+   return ret;
+}
+
+/*
+ * Extract up to sg_max pages from a KVEC-type iterator and add them to the
+ * scatterlist.  This can deal with vmalloc'd buffers as well as kmalloc'd or
+ * static buffers.  The pages are not pinned.
+ */
+static ssize_t netfs_extract_kvec_to_sg(struct iov_iter *iter,
+   ssize_t maxsize,
+ 

[Linux-cachefs] [PATCH 08/17] netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator

2023-02-16 Thread David Howells
Add a function to extract the pages from a user-space supplied iterator
(UBUF- or IOVEC-type) into a BVEC-type iterator, retaining the pages by
getting a pin on them (as FOLL_PIN) as we go.

This is useful in three situations:

 (1) A userspace thread may have a sibling that unmaps or remaps the
 process's VM during the operation, changing the assignment of the
 pages and potentially causing an error.  Retaining the pages keeps
 some pages around, even if this occurs; further, we find out at the
 point of extraction if EFAULT is going to be incurred.

 (2) Pages might get swapped out/discarded if not retained, so we want to
 retain them to avoid the reload causing a deadlock due to a DIO
 from/to an mmapped region on the same file.

 (3) The iterator may get passed to sendmsg() by the filesystem.  If a
 fault occurs, we may get a short write to a TCP stream that's then
 tricky to recover from.

We don't deal with other types of iterator here, leaving it to other
mechanisms to retain the pages (eg. PG_locked, PG_writeback and the pipe
lock).
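
To illustrate the intended calling pattern, a minimal sketch (the function
and variable names here are invented, not part of the patch):

	static ssize_t example_begin_dio(struct iov_iter *user_iter,
					 size_t len, struct iov_iter *pinned)
	{
		ssize_t nr_bvecs;

		nr_bvecs = netfs_extract_user_iter(user_iter, len, pinned, 0);
		if (nr_bvecs < 0)
			return nr_bvecs;

		/* pinned is now a BVEC iterator over pages held as FOLL_PIN
		 * and survives the caller's VM being remapped.  On I/O
		 * completion the pages must be unpinned and the bvec array
		 * (pinned->bvec) freed, e.g. with kvfree().
		 */
		return 0;
	}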

Signed-off-by: David Howells 
cc: Jeff Layton 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
---
 fs/netfs/Makefile |   1 +
 fs/netfs/iterator.c   | 103 ++
 include/linux/netfs.h |   4 ++
 3 files changed, 108 insertions(+)
 create mode 100644 fs/netfs/iterator.c

diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index f684c0cd1ec5..386d6fb92793 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -3,6 +3,7 @@
 netfs-y := \
buffered_read.o \
io.o \
+   iterator.o \
main.o \
objects.o
 
diff --git a/fs/netfs/iterator.c b/fs/netfs/iterator.c
new file mode 100644
index ..6f0d79080abc
--- /dev/null
+++ b/fs/netfs/iterator.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Iterator helpers.
+ *
+ * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/**
+ * netfs_extract_user_iter - Extract the pages from a user iterator into a bvec
+ * @orig: The original iterator
+ * @orig_len: The amount of iterator to copy
+ * @new: The iterator to be set up
+ * @extraction_flags: Flags to qualify the request
+ *
+ * Extract the page fragments from the given amount of the source iterator and
+ * build up a second iterator that refers to all of those bits.  This allows
+ * the original iterator to be disposed of.
+ *
+ * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
+ * be allowed on the pages extracted.
+ *
+ * On success, the number of elements in the bvec is returned, the original
+ * iterator will have been advanced by the amount extracted.
+ *
+ * The iov_iter_extract_mode() function should be used to query how cleanup
+ * should be performed.
+ */
+ssize_t netfs_extract_user_iter(struct iov_iter *orig, size_t orig_len,
+   struct iov_iter *new,
+   iov_iter_extraction_t extraction_flags)
+{
+   struct bio_vec *bv = NULL;
+   struct page **pages;
+   unsigned int cur_npages;
+   unsigned int max_pages;
+   unsigned int npages = 0;
+   unsigned int i;
+   ssize_t ret;
+   size_t count = orig_len, offset, len;
+   size_t bv_size, pg_size;
+
+   if (WARN_ON_ONCE(!iter_is_ubuf(orig) && !iter_is_iovec(orig)))
+   return -EIO;
+
+   max_pages = iov_iter_npages(orig, INT_MAX);
+   bv_size = array_size(max_pages, sizeof(*bv));
+   bv = kvmalloc(bv_size, GFP_KERNEL);
+   if (!bv)
+   return -ENOMEM;
+
+   /* Put the page list at the end of the bvec list storage.  bvec
+* elements are larger than page pointers, so as long as we work
+* 0->last, we should be fine.
+*/
+   pg_size = array_size(max_pages, sizeof(*pages));
+   pages = (void *)bv + bv_size - pg_size;
+
+   while (count && npages < max_pages) {
+   ret = iov_iter_extract_pages(orig, &pages, count,
+                                max_pages - npages, extraction_flags,
+                                &offset);
+   if (ret < 0) {
+   pr_err("Couldn't get user pages (rc=%zd)\n", ret);
+   break;
+   }
+
+   if (ret > count) {
+   pr_err("get_pages rc=%zd more than %zu\n", ret, count);
+   break;
+   }
+
+   count -= ret;
+   ret += offset;
+   cur_npages = DIV_ROUND_UP(ret, PAGE_SIZE);
+
+   if (npages + cur_npages > max_pages) {
+   pr_err("Out of bvec array capacity

[Linux-cachefs] [PATCH v6 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

2023-02-16 Thread David Howells
Fscache has an optimisation by which reads from the cache are skipped until
we know that (a) there's data there to be read and (b) that data isn't
entirely covered by pages resident in the netfs pagecache.  This is done
with two flags manipulated by fscache_note_page_release():

if (...
    test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
	clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);

where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
that netfslib should download from the server or clear the page instead.

The fscache_note_page_release() function is intended to be called from
->releasepage() - but that only gets called if PG_private or PG_private_2
is set - and currently the former is at the discretion of the network
filesystem and the latter is only set whilst a page is being written to the
cache, so sometimes we miss clearing the optimisation.
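
That is, a typical call site before this series is guarded along the lines
of (the label is hypothetical):

	if (folio_has_private(folio) &&
	    !filemap_release_folio(folio, gfp))
		goto cannot_release;

so a folio with neither flag set never reaches ->release_folio() and the
release notification is lost.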

Fix this by following Willy's suggestion[1] and adding an address_space
flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
->release_folio() if it's set, even if PG_private or PG_private_2 aren't
set.

Note that this would require folio_test_private() and page_has_private() to
become more complicated.  To avoid that, in the places[*] where these are
used to conditionalise calls to filemap_release_folio() and
try_to_release_page(), the tests are removed, those functions are called
unconditionally and the test is performed inside them.

[*] There are some exceptions in vmscan.c where the check guards more than
just a call to the releaser.  I've added a function, folio_needs_release()
to wrap all the checks for that.
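
With this patch applied, the wrapper checks the new mapping flag as well as
the private flags; roughly, modulo the exact form that lands in
mm/internal.h (mapping_release_always() tests the new flag; see the helper
sketch below):

	static inline bool folio_needs_release(struct folio *folio)
	{
		struct address_space *mapping = folio_mapping(folio);

		return folio_has_private(folio) ||
			(mapping && mapping_release_always(mapping));
	}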

AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
is called.
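
The flag is driven through a trio of small helpers in
include/linux/pagemap.h; in sketch form, assuming the merged shape:

	static inline void mapping_set_release_always(struct address_space *mapping)
	{
		set_bit(AS_RELEASE_ALWAYS, &mapping->flags);
	}

	static inline void mapping_clear_release_always(struct address_space *mapping)
	{
		clear_bit(AS_RELEASE_ALWAYS, &mapping->flags);
	}

	static inline bool mapping_release_always(const struct address_space *mapping)
	{
		return test_bit(AS_RELEASE_ALWAYS, &mapping->flags);
	}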

Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
and the optimisation cancelled if a cachefiles object already contains data
when we open it.

Reported-by: Rohith Surabattula 
Suggested-by: Matthew Wilcox 
Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: Linus Torvalds 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Dave Wysochanski 
cc: Dominique Martinet 
cc: Ilya Dryomov 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-...@lists.infradead.org
cc: v9fs-develo...@lists.sourceforge.net
cc: ceph-de...@vger.kernel.org
cc: linux-...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---

Notes:
ver #4)
 - Split out merging of folio_has_private()/filemap_release_folio() call
   pairs into a preceding patch.
 - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

 fs/9p/cache.c   |  2 ++
 fs/afs/internal.h   |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c |  2 ++
 fs/cifs/fscache.c   |  2 ++
 include/linux/pagemap.h | 16 
 mm/internal.h   |  5 -
 7 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index cebba4eaa0b5..12c0ae29f185 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -68,6 +68,8 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
   &path, sizeof(path),
   &version, sizeof(version),
   i_size_read(&v9inode->netfs.inode));
+   if (v9inode->netfs.cache)
+   mapping_set_release_always(inode->i_mapping);
 
p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
 inode, v9fs_inode_cookie(v9inode));
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index fd8567b98e2b..2d7e06fcb77f 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -680,6 +680,8 @@ static inline void afs_vnode_set_cache(struct afs_vnode *vnode,
 {
 #ifdef CONFIG_AFS_FSCACHE
vnode->netfs.cache = cookie;
+   if (cookie)
+   mapping_set_release_always(vnode->netfs.inode.i_mapping);
 #endif
 }
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 03ca8f2f657a..50b2ee163af6 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -584,6 +584,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
if (ret < 0)
goto check_failed;
 
+   clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &object->cookie->flags);
+
object->file = file;
 
/* Always update the atime on an object we've just looked up (this is
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 177d8e8d73fe..de1dee46d3df 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ 

[Linux-cachefs] [PATCH v6 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs

2023-02-16 Thread David Howells
Make filemap_release_folio() check folio_has_private().  Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.

There are a couple of sites in mm/vmscan.c where this can't so easily be
done.  In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.

In shrink_active_list(), we don't have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.

A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory.  The condition it
checks will gain additional parts in a future patch.
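
At this point in the series the wrapper is trivial; as a sketch of its
shape (the exact comment and placement may differ):

	/* mm/internal.h */
	static inline bool folio_needs_release(struct folio *folio)
	{
		return folio_has_private(folio);
	}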

After this, the only remaining caller of folio_has_private() outside of mm/
is a check in fuse.

Reported-by: Rohith Surabattula 
Suggested-by: Matthew Wilcox 
Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: Linus Torvalds 
cc: Steve French 
cc: Shyam Prasad N 
cc: Rohith Surabattula 
cc: Dave Wysochanski 
cc: Dominique Martinet 
cc: Ilya Dryomov 
cc: "Theodore Ts'o" 
cc: Andreas Dilger 
cc: linux-cachefs@redhat.com
cc: linux-c...@vger.kernel.org
cc: linux-...@lists.infradead.org
cc: v9fs-develo...@lists.sourceforge.net
cc: ceph-de...@vger.kernel.org
cc: linux-...@vger.kernel.org
cc: linux-e...@vger.kernel.org
cc: linux-fsde...@vger.kernel.org
cc: linux...@kvack.org
---

Notes:
ver #5)
 - Rebased on linus/master.  try_to_release_page() has now been entirely
   replaced by filemap_release_folio(), barring one comment.
 - Cleaned up some pairs in ext4.

ver #4)
 - Split from fscache fix.
 - Moved folio_needs_release() to mm/internal.h and removed open-coded
   version from filemap_release_folio().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

 fs/ext4/move_extent.c | 12 
 fs/splice.c   |  3 +--
 mm/filemap.c  |  2 ++
 mm/huge_memory.c  |  3 +--
 mm/internal.h |  8 
 mm/khugepaged.c   |  3 +--
 mm/memory-failure.c   |  8 +++-
 mm/migrate.c  |  3 +--
 mm/truncate.c |  6 ++
 mm/vmscan.c   |  8 
 10 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 8dbb87edf24c..dedc9d445f24 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -339,10 +339,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
ext4_double_up_write_data_sem(orig_inode, donor_inode);
goto data_copy;
}
-   if ((folio_has_private(folio[0]) &&
-!filemap_release_folio(folio[0], 0)) ||
-   (folio_has_private(folio[1]) &&
-!filemap_release_folio(folio[1], 0))) {
+   if (!filemap_release_folio(folio[0], 0) ||
+   !filemap_release_folio(folio[1], 0)) {
*err = -EBUSY;
goto drop_data_sem;
}
@@ -361,10 +359,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 
/* At this point all buffers in range are uptodate, old mapping layout
 * is no longer required, try to drop it now. */
-   if ((folio_has_private(folio[0]) &&
-   !filemap_release_folio(folio[0], 0)) ||
-   (folio_has_private(folio[1]) &&
-   !filemap_release_folio(folio[1], 0))) {
+   if (!filemap_release_folio(folio[0], 0) ||
+   !filemap_release_folio(folio[1], 0)) {
*err = -EBUSY;
goto unlock_folios;
}
diff --git a/fs/splice.c b/fs/splice.c
index 5969b7a1d353..e69eddaf9d7c 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -65,8 +65,7 @@ static bool page_cache_pipe_buf_try_steal(struct pipe_inode_info *pipe,
 */
folio_wait_writeback(folio);
 
-   if (folio_has_private(folio) &&
-   !filemap_release_folio(folio, GFP_KERNEL))
+   if (!filemap_release_folio(folio, GFP_KERNEL))
goto out_unlock;
 
/*
diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace9cc70..344146c170b0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3960,6 +3960,8 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
struct address_space * const mapping = folio->mapping;
 
BUG_ON(!folio_test_locked(folio));
+   if (!folio_needs_release(folio))
+   return true;
if (folio_test_writeback(folio))
return false;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index abe6cfd92ffa..8490

[Linux-cachefs] [PATCH v6 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

2023-02-16 Thread David Howells
Hi Willy,

Is this okay by you?  You said you wanted to look at the remaining uses of
page_has_private(), of which there are then three after these patches, not
counting folio_has_private():

arch/s390/kernel/uv.c:  if (page_has_private(page))
mm/khugepaged.c:1 + page_mapcount(page) + page_has_private(page)) {
mm/migrate_device.c:extra += 1 + page_has_private(page);

--
I've split the folio_has_private()/filemap_release_folio() call pair
merging into its own patch, separate from the actual bugfix and pulled out
the folio_needs_release() function into mm/internal.h and made
filemap_release_folio() use it.  I've also got rid of the bit clearances
from the network filesystem evict_inode functions as they don't seem to
be necessary.

Note that the last vestiges of try_to_release_page() got swept away, so I
rebased and dealt with that.  One comment remained, which is removed by the
first patch.

David

Changes:

ver #6)
 - Drop the third patch which removes a duplicate check in vmscan().

ver #5)
 - Rebased on linus/master.  try_to_release_page() has now been entirely
   replaced by filemap_release_folio(), barring one comment.
 - Cleaned up some pairs in ext4.

ver #4)
 - Split has_private/release call pairs into own patch.
 - Moved folio_needs_release() to mm/internal.h and removed open-coded
   version from filemap_release_folio().
 - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
 - Added experimental patch to reduce shrink_folio_list().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

Link: https://lore.kernel.org/r/Yk9V/03wgdyi6...@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/164928630577.457102.8519251179327601178.st...@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166844174069.1124521.10890506360974169994.st...@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166869495238.3720468.4878151409085146764.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/1459152.1669208...@warthog.procyon.org.uk/ # v3 also
Link: https://lore.kernel.org/r/166924370539.1772793.13730698360771821317.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167172131368.2334525.8569808925687731937.st...@warthog.procyon.org.uk/ # v5
---
%(shortlog)s
%(diffstat)s

David Howells (2):
  mm: Merge folio_has_private()/filemap_release_folio() call pairs
  mm, netfs, fscache: Stop read optimisation when folio removed from
pagecache

 fs/9p/cache.c   |  2 ++
 fs/afs/internal.h   |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c |  2 ++
 fs/cifs/fscache.c   |  2 ++
 fs/ext4/move_extent.c   | 12 
 fs/splice.c |  3 +--
 include/linux/pagemap.h | 16 
 mm/filemap.c|  2 ++
 mm/huge_memory.c|  3 +--
 mm/internal.h   | 11 +++
 mm/khugepaged.c |  3 +--
 mm/memory-failure.c |  8 +++-
 mm/migrate.c|  3 +--
 mm/truncate.c   |  6 ++
 mm/vmscan.c |  8 
 16 files changed, 56 insertions(+), 29 deletions(-)

--
Linux-cachefs mailing list
Linux-cachefs@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-cachefs