Re: [PATCH 3/6] fs: Convert block_read_full_page to be synchronous

2020-10-23 Thread Matthew Wilcox
On Fri, Oct 23, 2020 at 09:13:35AM -0700, Eric Biggers wrote:
> On Fri, Oct 23, 2020 at 02:21:38PM +0100, Matthew Wilcox wrote:
> > I wonder about allocating bios that can accommodate more bvecs.  Not sure
> > how often filesystems have adjacent blocks which go into non-adjacent
> > sub-page blocks.  It's certainly possible that a filesystem might have
> > a page consisting of DhDh ('D' for Data, 'h' for hole), but how
> > likely is it to have written the two data chunks next to each other?
> > Maybe with O_SYNC?
> 
> I think that's a rare case that's not very important to optimize.  And there's
> already a lot of code where filesystems *could* submit a single bio in that case
> but don't.  For example, both fs/direct-io.c and fs/iomap/direct-io.c only
> submit bios that contain logically contiguous data.

True.  iomap/buffered-io.c will do it though.

> If you do implement this optimization, note that it wouldn't work when a
> bio_crypt_ctx is set, since the data must be logically contiguous in that case.
> To handle that you'd need to call fscrypt_mergeable_bio_bh() when adding each
> block, and submit the bio if it returns false.  (In contrast, with your current
> proposal, calling fscrypt_mergeable_bio_bh() isn't necessary because each bio
> only contains logically contiguous data within one page.)

Oh, that's disappointing.  I had assumed that you'd set up the dun for
the logical block corresponding to the start of the page and then you'd
be able to decrypt any range in the page.


Re: [PATCH 3/6] fs: Convert block_read_full_page to be synchronous

2020-10-23 Thread Eric Biggers
On Fri, Oct 23, 2020 at 02:21:38PM +0100, Matthew Wilcox wrote:
> > 
> > The following is needed to set the bio encryption context for the
> > '-o inlinecrypt' case on ext4:
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 95c338e2b99c..546a08c5003b 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -2237,6 +2237,7 @@ static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
> >  				submit_bio(bio);
> >  			}
> >  			bio = bio_alloc(GFP_NOIO, 1);
> > +			fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
> >  			bio_set_dev(bio, bh->b_bdev);
> >  			bio->bi_iter.bi_sector = sector;
> >  			bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> 
> Thanks!  I saw that and had every intention of copying it across.
> And then I forgot.  I'll add that.  I'm also going to do:
> 
> -   __bio_try_merge_page(bio, bh->b_page, bh->b_size,
> -   bh_offset(bh), &same_page))
> +   bio_add_page(bio, bh->b_page, bh->b_size,
> +   bh_offset(bh)))
> 
> I wonder about allocating bios that can accommodate more bvecs.  Not sure
> how often filesystems have adjacent blocks which go into non-adjacent
> sub-page blocks.  It's certainly possible that a filesystem might have
> a page consisting of DhDh ('D' for Data, 'h' for hole), but how
> likely is it to have written the two data chunks next to each other?
> Maybe with O_SYNC?
> 

I think that's a rare case that's not very important to optimize.  And there's
already a lot of code where filesystems *could* submit a single bio in that case
but don't.  For example, both fs/direct-io.c and fs/iomap/direct-io.c only
submit bios that contain logically contiguous data.

If you do implement this optimization, note that it wouldn't work when a
bio_crypt_ctx is set, since the data must be logically contiguous in that case.
To handle that you'd need to call fscrypt_mergeable_bio_bh() when adding each
block, and submit the bio if it returns false.  (In contrast, with your current
proposal, calling fscrypt_mergeable_bio_bh() isn't necessary because each bio
only contains logically contiguous data within one page.)

- Eric
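
To make the contiguity rule above concrete, here is a minimal user-space sketch, not kernel code.  It assumes the data unit number (DUN) of a block simply equals its logical block number, which is a simplification: the real DUN also depends on the IV policy, and the real check also compares keys and inodes.  The names model_bio, model_mergeable() and model_count_bios() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a bio that carries an encryption context. */
struct model_bio {
	uint64_t first_dun;	/* DUN of the first block in the bio */
	unsigned int nr_blocks;	/* number of blocks added so far */
};

/*
 * Models only the contiguity part of the mergeability check: the next
 * block's DUN must follow directly after the last DUN in the bio.
 * Assumption: DUN == logical block number.
 */
static bool model_mergeable(const struct model_bio *bio, uint64_t next_lblk)
{
	return bio->first_dun + bio->nr_blocks == next_lblk;
}

/*
 * Count the bios needed to read the logical blocks in lblks[], in order,
 * starting a new bio whenever the mergeability check fails.
 */
static unsigned int model_count_bios(const uint64_t *lblks, size_t nr)
{
	struct model_bio bio = { 0, 0 };
	unsigned int nr_bios = 0;
	size_t i;

	assert(lblks != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (nr_bios && model_mergeable(&bio, lblks[i])) {
			bio.nr_blocks++;
			continue;
		}
		/* "Submit" the previous bio, if any, and start a new one. */
		bio.first_dun = lblks[i];
		bio.nr_blocks = 1;
		nr_bios++;
	}
	return nr_bios;
}
```

In this model, logical blocks 8, 9, 10, 11 fit in one bio, while blocks 8 and 10 need two bios even if they happen to sit next to each other on disk.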


Re: [PATCH 3/6] fs: Convert block_read_full_page to be synchronous

2020-10-23 Thread Matthew Wilcox
On Thu, Oct 22, 2020 at 04:40:11PM -0700, Eric Biggers wrote:
> On Thu, Oct 22, 2020 at 10:22:25PM +0100, Matthew Wilcox (Oracle) wrote:
> > +static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
> > +		unsigned int nr, struct buffer_head **bhs)
> > +{
> > +	struct bio *bio = NULL;
> > +	unsigned int i;
> > +	int err;
> > +
> > +	blk_completion_init(cmpl, nr);
> > +
> > +	for (i = 0; i < nr; i++) {
> > +		struct buffer_head *bh = bhs[i];
> > +		sector_t sector = bh->b_blocknr * (bh->b_size >> 9);
> > +		bool same_page;
> > +
> > +		if (buffer_uptodate(bh)) {
> > +			end_buffer_async_read(bh, 1);
> > +			blk_completion_sub(cmpl, BLK_STS_OK, 1);
> > +			continue;
> > +		}
> > +		if (bio) {
> > +			if (bio_end_sector(bio) == sector &&
> > +			    __bio_try_merge_page(bio, bh->b_page, bh->b_size,
> > +					bh_offset(bh), &same_page))
> > +				continue;
> > +			submit_bio(bio);
> > +		}
> > +		bio = bio_alloc(GFP_NOIO, 1);
> > +		bio_set_dev(bio, bh->b_bdev);
> > +		bio->bi_iter.bi_sector = sector;
> > +		bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> > +		bio->bi_end_io = readpage_end_bio;
> > +		bio->bi_private = cmpl;
> > +		/* Take care of bh's that straddle the end of the device */
> > +		guard_bio_eod(bio);
> > +	}
> 
> The following is needed to set the bio encryption context for the
> '-o inlinecrypt' case on ext4:
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 95c338e2b99c..546a08c5003b 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2237,6 +2237,7 @@ static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
>  				submit_bio(bio);
>  			}
>  			bio = bio_alloc(GFP_NOIO, 1);
> +			fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
>  			bio_set_dev(bio, bh->b_bdev);
>  			bio->bi_iter.bi_sector = sector;
>  			bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));

Thanks!  I saw that and had every intention of copying it across.
And then I forgot.  I'll add that.  I'm also going to do:

-   __bio_try_merge_page(bio, bh->b_page, bh->b_size,
-   bh_offset(bh), &same_page))
+   bio_add_page(bio, bh->b_page, bh->b_size,
+   bh_offset(bh)))

I wonder about allocating bios that can accommodate more bvecs.  Not sure
how often filesystems have adjacent blocks which go into non-adjacent
sub-page blocks.  It's certainly possible that a filesystem might have
a page consisting of DhDh ('D' for Data, 'h' for hole), but how
likely is it to have written the two data chunks next to each other?
Maybe with O_SYNC?

Anyway, this patchset needs some more thought because I've just seen
the path from mpage_readahead() to block_read_full_page() that should
definitely not be synchronous.
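
For illustration only, the effect of the bvec limit can be modelled in user space.  The sketch below is not kernel code: model_block and model_count_read_bios() are hypothetical names, holes and uptodate blocks are simply skipped, and a block is treated as joining the last bvec only when it is the next sub-page slot in the page.  The sector calculation mirrors the one in the patch, blocknr * (block_size >> 9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sub-page block within a page. */
struct model_block {
	bool needs_read;	/* false for a hole or an uptodate block */
	uint64_t blocknr;	/* on-disk block number, in block_size units */
};

/*
 * Count the bios needed to read one page when each bio may hold at most
 * max_bvecs bvecs.  A block can join the current bio only if it starts at
 * the sector where the bio ends; it then either extends the last bvec
 * (next slot in the page) or consumes a new bvec if one is available.
 */
static unsigned int model_count_read_bios(const struct model_block *blks,
					  size_t nr, unsigned int block_size,
					  unsigned int max_bvecs)
{
	unsigned int sectors_per_block = block_size >> 9;
	unsigned int nr_bios = 0, nr_bvecs = 0;
	uint64_t end_sector = 0;
	size_t last = 0, i;

	assert(block_size >= 512 && max_bvecs >= 1);

	for (i = 0; i < nr; i++) {
		uint64_t sector;
		bool merged = false;

		if (!blks[i].needs_read)
			continue;
		sector = blks[i].blocknr * sectors_per_block;

		if (nr_bios && end_sector == sector) {
			if (last + 1 == i) {
				merged = true;	/* extends the last bvec */
			} else if (nr_bvecs < max_bvecs) {
				nr_bvecs++;	/* disk-adjacent, new bvec */
				merged = true;
			}
		}
		if (!merged) {
			nr_bios++;		/* start a new bio */
			nr_bvecs = 1;
		}
		end_sector = sector + sectors_per_block;
		last = i;
	}
	return nr_bios;
}
```

With 1024-byte blocks and a page laid out as data, hole, data, hole, where the two data blocks are disk blocks 100 and 101, this model needs two bios with max_bvecs = 1 and one bio with max_bvecs = 2.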


Re: [PATCH 3/6] fs: Convert block_read_full_page to be synchronous

2020-10-22 Thread Eric Biggers
On Thu, Oct 22, 2020 at 10:22:25PM +0100, Matthew Wilcox (Oracle) wrote:
> +static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
> +		unsigned int nr, struct buffer_head **bhs)
> +{
> +	struct bio *bio = NULL;
> +	unsigned int i;
> +	int err;
> +
> +	blk_completion_init(cmpl, nr);
> +
> +	for (i = 0; i < nr; i++) {
> +		struct buffer_head *bh = bhs[i];
> +		sector_t sector = bh->b_blocknr * (bh->b_size >> 9);
> +		bool same_page;
> +
> +		if (buffer_uptodate(bh)) {
> +			end_buffer_async_read(bh, 1);
> +			blk_completion_sub(cmpl, BLK_STS_OK, 1);
> +			continue;
> +		}
> +		if (bio) {
> +			if (bio_end_sector(bio) == sector &&
> +			    __bio_try_merge_page(bio, bh->b_page, bh->b_size,
> +					bh_offset(bh), &same_page))
> +				continue;
> +			submit_bio(bio);
> +		}
> +		bio = bio_alloc(GFP_NOIO, 1);
> +		bio_set_dev(bio, bh->b_bdev);
> +		bio->bi_iter.bi_sector = sector;
> +		bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
> +		bio->bi_end_io = readpage_end_bio;
> +		bio->bi_private = cmpl;
> +		/* Take care of bh's that straddle the end of the device */
> +		guard_bio_eod(bio);
> +	}

The following is needed to set the bio encryption context for the
'-o inlinecrypt' case on ext4:

diff --git a/fs/buffer.c b/fs/buffer.c
index 95c338e2b99c..546a08c5003b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2237,6 +2237,7 @@ static int readpage_submit_bhs(struct page *page, struct blk_completion *cmpl,
 				submit_bio(bio);
 			}
 			bio = bio_alloc(GFP_NOIO, 1);
+			fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 			bio_set_dev(bio, bh->b_bdev);
 			bio->bi_iter.bi_sector = sector;
 			bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));


Re: [PATCH 3/6] fs: Convert block_read_full_page to be synchronous

2020-10-22 Thread Eric Biggers
On Thu, Oct 22, 2020 at 10:22:25PM +0100, Matthew Wilcox (Oracle) wrote:
> Use the new blk_completion infrastructure to wait for multiple I/Os.
> Also coalesce adjacent buffer heads into a single BIO instead of
> submitting one BIO per buffer head.  This doesn't work for fscrypt yet,
> so keep the old code around for now.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  fs/buffer.c | 90 +
>  1 file changed, 90 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 1b0ba1d59966..ccb90081117c 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2249,6 +2249,87 @@ int block_is_partially_uptodate(struct page *page, unsigned long from,
>  }
>  EXPORT_SYMBOL(block_is_partially_uptodate);
>  
> +static void readpage_end_bio(struct bio *bio)
> +{
> +	struct bio_vec *bvec;
> +	struct page *page;
> +	struct buffer_head *bh;
> +	int i, nr = 0;
> +
> +	bio_for_each_bvec_all(bvec, bio, i) {

Shouldn't this technically be bio_for_each_segment_all()?  This wants to iterate
over the pages, not the bvecs -- and in general, each bvec might contain
multiple pages.

Now, in this case, each bio has only 1 page and 1 bvec, so it doesn't really
matter.  But if we're going to use an iterator, it seems we should use the right
kind.

Likewise in decrypt_bio() in patch 6.

- Eric
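
As a rough user-space illustration of why the two iterators differ (not kernel code), the sketch below counts how many single-page segments one multi-page bvec covers.  It assumes a 4096-byte page; model_bvec, MODEL_PAGE_SIZE and both functions are hypothetical names for this example.  Iterating by bvec visits each entry once, while iterating by segment visits every page the entry touches.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical model of a (possibly multi-page) bvec. */
struct model_bvec {
	unsigned int offset;	/* byte offset from the start of the first page */
	unsigned int len;	/* length in bytes */
};

/* Number of single-page segments covered by one bvec. */
static unsigned int model_segments_in_bvec(const struct model_bvec *bv)
{
	unsigned int first_page, last_page;

	assert(bv->len > 0);
	first_page = bv->offset / MODEL_PAGE_SIZE;
	last_page = (bv->offset + bv->len - 1) / MODEL_PAGE_SIZE;
	return last_page - first_page + 1;
}

/* Total per-page segments across all bvecs of a modelled bio. */
static unsigned int model_count_segments(const struct model_bvec *bvecs,
					 size_t nr_bvecs)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < nr_bvecs; i++)
		total += model_segments_in_bvec(&bvecs[i]);
	return total;
}
```

For a bio that holds one bvec covering one page, as in this patch, both counts are 1, which is why the choice of iterator does not change behaviour here.  Once a bvec covers 8192 bytes, the per-bvec loop body runs once but there are two pages to process.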