Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs
On Mon, Nov 26, 2018 at 01:58:42PM +0100, Christoph Hellwig wrote: > > + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) + > > + bv->bv_offset + bv->bv_len; > > The name is a little confusing, as the real end addr would be -1. Maybe > throw the -1 in here, and adjust for it in the contiguous check below? Yeah, it makes sense. thanks, Ming
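For readers following the arithmetic being discussed, the merge check can be modelled in plain user-space C. This is only an illustrative sketch: PAGE_SIZE, the helper name `can_merge`, and the sample addresses are assumptions for the demo, not part of the patch. `vec_end_addr` is kept as "one past the last byte", with the explicit `-1` in the same-page test compensating, which is exactly the naming wrinkle Christoph points out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))

/*
 * Model of __bio_try_merge_page()'s checks: vec_end_addr is one past the
 * last byte of the bio's current last bvec; page_addr + off is where the
 * new data starts.
 */
static bool can_merge(uintptr_t vec_end_addr, uintptr_t page_addr,
		      unsigned int off, bool same_page)
{
	/* the new data must be physically contiguous with the last bvec */
	if (vec_end_addr != page_addr + off)
		return false;
	/*
	 * With same_page, the merged data may not start a new page; the -1
	 * turns "one past the end" back into the address of the last byte.
	 */
	if (same_page && ((vec_end_addr - 1) & PAGE_MASK) != page_addr)
		return false;
	return true;
}
```

With the suggested rename, `vec_end_addr` would itself carry the `-1` and the same-page test would drop it; the logic is unchanged either way.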
Re: [Cluster-devel] [PATCH] gfs2: get rid of potential double free
On Tue, Nov 27, 2018 at 12:19:48AM +0100, Andreas Gruenbacher wrote: > Pan, > > On Mon, 26 Nov 2018 at 09:44, Pan Bian wrote: > > > __gfs2_set_acl will not increase the reference count of acl if it fails. > > In this case, posix_acl_release is called twice, one after > > __gfs2_set_acl and one in the error handling block. This patch does not > > release acl when __gfs2_set_acl fails. > > > > thanks for the bug report and patch; I think this should be fixed more > clearly as in the patch I've posted earlier today. I have seen the patch. It is really much clearer. Thank you! Pan > > Andreas
Re: [Cluster-devel] [PATCH] gfs2: get rid of potential double free
Pan, On Mon, 26 Nov 2018 at 09:44, Pan Bian wrote: > __gfs2_set_acl will not increase the reference count of acl if it fails. > In this case, posix_acl_release is called twice, one after > __gfs2_set_acl and one in the error handling block. This patch does not > release acl when __gfs2_set_acl fails. > thanks for the bug report and patch; I think this should be fixed more clearly as in the patch I've posted earlier today. Andreas
Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs
On Mon, Nov 26, 2018 at 10:17:16AM +0800, Ming Lei wrote: > This patch pulls the trigger for multi-page bvecs. Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > block/bio.c | 22 +++--- > fs/iomap.c | 4 ++-- > fs/xfs/xfs_aops.c | 4 ++-- > include/linux/bio.h | 2 +- > 4 files changed, 20 insertions(+), 12 deletions(-) > > diff --git a/block/bio.c b/block/bio.c > index 75fde30af51f..8bf9338d4783 100644 > --- a/block/bio.c > +++ b/block/bio.c > @@ -753,6 +753,8 @@ EXPORT_SYMBOL(bio_add_pc_page); > * @page: page to add > * @len: length of the data to add > * @off: offset of the data in @page > + * @same_page: if %true only merge if the new data is in the same physical > + * page as the last segment of the bio. > * > * Try to add the data at @page + @off to the last bvec of @bio. This is a > * a useful optimisation for file systems with a block size smaller than the > @@ -761,19 +763,25 @@ EXPORT_SYMBOL(bio_add_pc_page); > * Return %true on success or %false on failure. > */ > bool __bio_try_merge_page(struct bio *bio, struct page *page, > - unsigned int len, unsigned int off) > + unsigned int len, unsigned int off, bool same_page) > { > if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) > return false; > > if (bio->bi_vcnt > 0) { > struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1]; > + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) + > + bv->bv_offset + bv->bv_len; > + phys_addr_t page_addr = page_to_phys(page); > > - if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) { > - bv->bv_len += len; > - bio->bi_iter.bi_size += len; > - return true; > - } > + if (vec_end_addr != page_addr + off) > + return false; > + if (same_page && ((vec_end_addr - 1) & PAGE_MASK) != page_addr) > + return false; > + > + bv->bv_len += len; > + bio->bi_iter.bi_size += len; > + return true; > } > return false; > } > @@ -819,7 +827,7 @@ EXPORT_SYMBOL_GPL(__bio_add_page); > int bio_add_page(struct bio *bio, struct page *page, >unsigned int len, unsigned int offset) > { > - 
if (!__bio_try_merge_page(bio, page, len, offset)) { > + if (!__bio_try_merge_page(bio, page, len, offset, false)) { > if (bio_full(bio)) > return 0; > __bio_add_page(bio, page, len, offset); > diff --git a/fs/iomap.c b/fs/iomap.c > index 1f648d098a3b..ec5527b0fba4 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -313,7 +313,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, > loff_t length, void *data, >*/ > sector = iomap_sector(iomap, pos); > if (ctx->bio && bio_end_sector(ctx->bio) == sector) { > - if (__bio_try_merge_page(ctx->bio, page, plen, poff)) > + if (__bio_try_merge_page(ctx->bio, page, plen, poff, true)) > goto done; > is_contig = true; > } > @@ -344,7 +344,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, > loff_t length, void *data, > ctx->bio->bi_end_io = iomap_read_end_io; > } > > - __bio_add_page(ctx->bio, page, plen, poff); > + bio_add_page(ctx->bio, page, plen, poff); > done: > /* >* Move the caller beyond our range so that it keeps making progress. 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 1f1829e506e8..b9fd44168f61 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -616,12 +616,12 @@ xfs_add_to_ioend( > bdev, sector); > } > > - if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) { > + if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff, true)) { > if (iop) > atomic_inc(&iop->write_count); > if (bio_full(wpc->ioend->io_bio)) > xfs_chain_bio(wpc->ioend, wbc, bdev, sector); > - __bio_add_page(wpc->ioend->io_bio, page, len, poff); > + bio_add_page(wpc->ioend->io_bio, page, len, poff); > } > > wpc->ioend->io_size += len; > diff --git a/include/linux/bio.h b/include/linux/bio.h > index c35997dd02c2..5505f74aef8b 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -441,7 +441,7 @@ extern int bio_add_page(struct bio *, struct page *, > unsigned int,unsigned int); > extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page > *, > unsigned int, unsigned int); > bool __bio_try_merge_page(struct bio *bio, struct page *page, > - unsigned int len, unsigned int off); > + unsigned int len, unsigned int off, bool same_page); > void
Re: [Cluster-devel] [PATCH V12 02/20] btrfs: look at bi_size for repair decisions
On Mon, Nov 26, 2018 at 10:17:02AM +0800, Ming Lei wrote: > From: Christoph Hellwig > > bio_readpage_error currently uses bi_vcnt to decide if it is worth > retrying an I/O. But the vector count is mostly an implementation > artifact - it really should figure out if there is more than a > single sector worth retrying. Use bi_size for that and shift by > PAGE_SHIFT. This really should be blocks/sectors, but given that > btrfs doesn't support a sector size different from the PAGE_SIZE > using the page size keeps the changes to a minimum. > > Reviewed-by: David Sterba Reviewed-by: Omar Sandoval > Signed-off-by: Christoph Hellwig > --- > fs/btrfs/extent_io.c | 2 +- > include/linux/bio.h | 6 -- > 2 files changed, 1 insertion(+), 7 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 15fd46582bb2..40751e86a2a9 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2368,7 +2368,7 @@ static int bio_readpage_error(struct bio *failed_bio, > u64 phy_offset, > int read_mode = 0; > blk_status_t status; > int ret; > - unsigned failed_bio_pages = bio_pages_all(failed_bio); > + unsigned failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; > > BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 056fb627edb3..6f6bc331a5d1 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -263,12 +263,6 @@ static inline void bio_get_last_bvec(struct bio *bio, > struct bio_vec *bv) > bv->bv_len = iter.bi_bvec_done; > } > > -static inline unsigned bio_pages_all(struct bio *bio) > -{ > - WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); > - return bio->bi_vcnt; > -} > - > static inline struct bio_vec *bio_first_bvec_all(struct bio *bio) > { > WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)); > -- > 2.9.5 >
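The repair decision above boils down to byte-size arithmetic: with 4K pages (the assumption btrfs makes here, per the commit message), `bi_size >> PAGE_SHIFT` gives the page count without consulting `bi_vcnt`. A minimal user-space sketch; the standalone helpers and constants are illustrative, not kernel code:

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* 4K pages, matching the btrfs assumption above */

/* pages covered by a page-aligned bio of bi_size bytes */
static unsigned int failed_bio_pages(unsigned int bi_size)
{
	return bi_size >> PAGE_SHIFT;
}

/* retrying different mirrors only makes sense for multi-page I/O */
static int worth_retrying(unsigned int bi_size)
{
	return failed_bio_pages(bi_size) > 1;
}
```

The point of the patch is that this count stays correct whether the bio holds one page per vector or many pages in one multi-page bvec, which `bi_vcnt` no longer tells you.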
Re: [Cluster-devel] [PATCH V12 17/20] block: always define BIO_MAX_PAGES as 256
On Mon, Nov 26, 2018 at 10:17:17AM +0800, Ming Lei wrote: > Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to > increase BIO_MAX_PAGES for it. > > CONFIG_THP_SWAP needs to split one THP into normal pages and adds > them all to one bio. With multipage-bvec, it just takes one bvec to > hold them all. > > Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > include/linux/bio.h | 8 > 1 file changed, 8 deletions(-) > > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 5505f74aef8b..7be48c55b14a 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -34,15 +34,7 @@ > #define BIO_BUG_ON > #endif > > -#ifdef CONFIG_THP_SWAP > -#if HPAGE_PMD_NR > 256 > -#define BIO_MAX_PAGES HPAGE_PMD_NR > -#else > #define BIO_MAX_PAGES 256 > -#endif > -#else > -#define BIO_MAX_PAGES 256 > -#endif > > #define bio_prio(bio) (bio)->bi_ioprio > #define bio_set_prio(bio, prio) ((bio)->bi_ioprio = prio) > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V12 18/20] block: document usage of bio iterator helpers
On Mon, Nov 26, 2018 at 10:17:18AM +0800, Ming Lei wrote: > Now multi-page bvec is supported, some helpers may return page by > page, meantime some may return segment by segment, this patch > documents the usage. Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > Documentation/block/biovecs.txt | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt > index 25689584e6e0..ce6eccaf5df7 100644 > --- a/Documentation/block/biovecs.txt > +++ b/Documentation/block/biovecs.txt > @@ -117,3 +117,28 @@ Other implications: > size limitations and the limitations of the underlying devices. Thus > there's no need to define ->merge_bvec_fn() callbacks for individual block > drivers. > + > +Usage of helpers: > +================= > + > +* The following helpers whose names have the suffix of "_all" can only be > used > +on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers > +shouldn't use them because the bio may have been split before it reached the > +driver. > + > + bio_for_each_segment_all() > + bio_first_bvec_all() > + bio_first_page_all() > + bio_last_bvec_all() > + > +* The following helpers iterate over single-page segment. The passed 'struct > +bio_vec' will contain a single-page IO vector during the iteration > + > + bio_for_each_segment() > + bio_for_each_segment_all() > + > +* The following helpers iterate over multi-page bvec. The passed 'struct > +bio_vec' will contain a multi-page IO vector during the iteration > + > + bio_for_each_bvec() > + rq_for_each_bvec() > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V12 15/20] block: allow bio_for_each_segment_all() to iterate over multi-page bvec
On Mon, Nov 26, 2018 at 10:17:15AM +0800, Ming Lei wrote: > This patch introduces one extra iterator variable to > bio_for_each_segment_all(), > then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. > > Given it is just one mechanical & simple change on all > bio_for_each_segment_all() > users, this patch does tree-wide change in one single patch, so that we can > avoid using a temporary helper for this conversion. > > Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > block/bio.c | 27 ++- > block/bounce.c| 6 -- > drivers/md/bcache/btree.c | 3 ++- > drivers/md/dm-crypt.c | 3 ++- > drivers/md/raid1.c| 3 ++- > drivers/staging/erofs/data.c | 3 ++- > drivers/staging/erofs/unzip_vle.c | 3 ++- > fs/block_dev.c| 6 -- > fs/btrfs/compression.c| 3 ++- > fs/btrfs/disk-io.c| 3 ++- > fs/btrfs/extent_io.c | 9 ++--- > fs/btrfs/inode.c | 6 -- > fs/btrfs/raid56.c | 3 ++- > fs/crypto/bio.c | 3 ++- > fs/direct-io.c| 4 +++- > fs/exofs/ore.c| 3 ++- > fs/exofs/ore_raid.c | 3 ++- > fs/ext4/page-io.c | 3 ++- > fs/ext4/readpage.c| 3 ++- > fs/f2fs/data.c| 9 ++--- > fs/gfs2/lops.c| 6 -- > fs/gfs2/meta_io.c | 3 ++- > fs/iomap.c| 6 -- > fs/mpage.c| 3 ++- > fs/xfs/xfs_aops.c | 5 +++-- > include/linux/bio.h | 11 +-- > include/linux/bvec.h | 30 ++ > 27 files changed, 125 insertions(+), 45 deletions(-)
Re: [Cluster-devel] [PATCH V12 14/20] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()
On Mon, Nov 26, 2018 at 10:17:14AM +0800, Ming Lei wrote: > bch_bio_alloc_pages() is always called on one new bio, so it is safe > to access the bvec table directly. Given it is the only kind of this > case, open code the bvec table access since bio_for_each_segment_all() > will be changed to support for iterating over multipage bvec. > > Acked-by: Coly Li > Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > drivers/md/bcache/util.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c > index 20eddeac1531..62fb917f7a4f 100644 > --- a/drivers/md/bcache/util.c > +++ b/drivers/md/bcache/util.c > @@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask) > int i; > struct bio_vec *bv; > > - bio_for_each_segment_all(bv, bio, i) { > + /* > + * This is called on freshly new bio, so it is safe to access the > + * bvec table directly. > + */ > + for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) { > bv->bv_page = alloc_page(gfp_mask); > if (!bv->bv_page) { > while (--bv >= bio->bi_io_vec) > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V12 09/20] block: use bio_for_each_bvec() to compute multi-page bvec count
On Mon, Nov 26, 2018 at 10:17:09AM +0800, Ming Lei wrote: > First it is more efficient to use bio_for_each_bvec() in both > blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how > many multi-page bvecs there are in the bio. > > Secondly once bio_for_each_bvec() is used, the bvec may need to be > split because its length can be much longer than the max segment size, > so we have to split the big bvec into several segments. > > Thirdly when splitting multi-page bvec into segments, the max segment > limit may be reached, so the bio split needs to be considered under > this situation too. Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 100 > +++--- > 1 file changed, 80 insertions(+), 20 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 51ec6ca56a0a..2d8f388d43de 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -161,6 +161,70 @@ static inline unsigned get_max_io_size(struct > request_queue *q, > return sectors; > } > > +static unsigned get_max_segment_size(struct request_queue *q, > + unsigned offset) > +{ > + unsigned long mask = queue_segment_boundary(q); > + > + return min_t(unsigned long, mask - (mask & offset) + 1, > + queue_max_segment_size(q)); > +} > + > +/* > + * Split the bvec @bv into segments, and update all kinds of > + * variables. > + */ > +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv, > + unsigned *nsegs, unsigned *last_seg_size, > + unsigned *front_seg_size, unsigned *sectors) > +{ > + unsigned len = bv->bv_len; > + unsigned total_len = 0; > + unsigned new_nsegs = 0, seg_size = 0; > + > + /* > + * Multipage bvec may be too big to hold in one segment, > + * so the current bvec has to be split into multiple > + * segments. 
> + */ > + while (len && new_nsegs + *nsegs < queue_max_segments(q)) { > + seg_size = get_max_segment_size(q, bv->bv_offset + total_len); > + seg_size = min(seg_size, len); > + > + new_nsegs++; > + total_len += seg_size; > + len -= seg_size; > + > + if ((bv->bv_offset + total_len) & queue_virt_boundary(q)) > + break; > + } > + > + if (!new_nsegs) > + return !!len; > + > + /* update front segment size */ > + if (!*nsegs) { > + unsigned first_seg_size; > + > + if (new_nsegs == 1) > + first_seg_size = get_max_segment_size(q, bv->bv_offset); > + else > + first_seg_size = queue_max_segment_size(q); > + > + if (*front_seg_size < first_seg_size) > + *front_seg_size = first_seg_size; > + } > + > + /* update other variables */ > + *last_seg_size = seg_size; > + *nsegs += new_nsegs; > + if (sectors) > + *sectors += total_len >> 9; > + > + /* split in the middle of the bvec if len != 0 */ > + return !!len; > +} > + > static struct bio *blk_bio_segment_split(struct request_queue *q, >struct bio *bio, >struct bio_set *bs, > @@ -174,7 +238,7 @@ static struct bio *blk_bio_segment_split(struct > request_queue *q, > struct bio *new = NULL; > const unsigned max_sectors = get_max_io_size(q, bio); > > - bio_for_each_segment(bv, bio, iter) { > + bio_for_each_bvec(bv, bio, iter) { > /* >* If the queue doesn't support SG gaps and adding this >* offset would create a gap, disallow it. 
> @@ -189,8 +253,12 @@ static struct bio *blk_bio_segment_split(struct > request_queue *q, >*/ > if (nsegs < queue_max_segments(q) && > sectors < max_sectors) { > - nsegs++; > - sectors = max_sectors; > + /* split in the middle of bvec */ > + bv.bv_len = (max_sectors - sectors) << 9; > + bvec_split_segs(q, &bv, &nsegs, > + &seg_size, > + &front_seg_size, > + &sectors); > + } > goto split; > } > @@ -212,14 +280,12 @@ static struct bio *blk_bio_segment_split(struct > request_queue *q, > if (nsegs == queue_max_segments(q)) > goto split; > > - if (nsegs == 1 && seg_size > front_seg_size) > - front_seg_size = seg_size; > - > - nsegs++; > bvprv = bv; > bvprvp = &bvprv; > - seg_size = bv.bv_len; > - sectors += bv.bv_len >> 9;
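A user-space model of bvec_split_segs()'s inner loop may help: one multi-page bvec is carved into hardware segments, each capped by the max segment size and forbidden from crossing the segment boundary mask. The queue limits below (64K boundary, 32K max segment) and the helper names are invented for the example; only the splitting arithmetic mirrors the patch:

```c
#include <assert.h>

#define SEG_BOUNDARY_MASK 0xffffu	/* segments must not cross 64K */
#define MAX_SEGMENT_SIZE  0x8000u	/* at most 32K per segment */

/* room left before the next boundary, capped by the max segment size */
static unsigned int max_seg_size_at(unsigned int offset)
{
	unsigned int to_boundary =
		SEG_BOUNDARY_MASK - (SEG_BOUNDARY_MASK & offset) + 1;

	return to_boundary < MAX_SEGMENT_SIZE ? to_boundary : MAX_SEGMENT_SIZE;
}

/* how many segments one bvec of bv_len bytes at bv_offset splits into */
static unsigned int split_segs(unsigned int bv_offset, unsigned int bv_len)
{
	unsigned int total = 0, nsegs = 0;

	while (bv_len) {
		unsigned int seg = max_seg_size_at(bv_offset + total);

		if (seg > bv_len)
			seg = bv_len;	/* last, short segment */
		nsegs++;
		total += seg;
		bv_len -= seg;
	}
	return nsegs;
}
```

An unaligned start (third assertion below) forces a short middle segment so that no segment straddles the 64K boundary, which is the case the old one-segment-per-bvec accounting could not express.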
Re: [Cluster-devel] [PATCH V12 13/20] block: loop: pass multi-page bvec to iov_iter
On Mon, Nov 26, 2018 at 10:17:13AM +0800, Ming Lei wrote: > iov_iter is implemented on bvec iterator helpers, so it is safe to pass > multi-page bvec to it, and this way is much more efficient than passing one > page in each bvec. > > Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > drivers/block/loop.c | 20 ++-- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index 176ab1f28eca..e3683211f12d 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -510,21 +510,22 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, >loff_t pos, bool rw) > { > struct iov_iter iter; > + struct req_iterator rq_iter; > struct bio_vec *bvec; > struct request *rq = blk_mq_rq_from_pdu(cmd); > struct bio *bio = rq->bio; > struct file *file = lo->lo_backing_file; > + struct bio_vec tmp; > unsigned int offset; > - int segments = 0; > + int nr_bvec = 0; > int ret; > > + rq_for_each_bvec(tmp, rq, rq_iter) > + nr_bvec++; > + > if (rq->bio != rq->biotail) { > - struct req_iterator iter; > - struct bio_vec tmp; > > - __rq_for_each_bio(bio, rq) > - segments += bio_segments(bio); > - bvec = kmalloc_array(segments, sizeof(struct bio_vec), > + bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec), >GFP_NOIO); > if (!bvec) > return -EIO; > @@ -533,10 +534,10 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, > /* >* The bios of the request may be started from the middle of >* the 'bvec' because of bio splitting, so we can't directly > - * copy bio->bi_iov_vec to new bvec. The rq_for_each_segment > + * copy bio->bi_iov_vec to new bvec. The rq_for_each_bvec >* API will take care of all details for us. 
>*/ > - rq_for_each_segment(tmp, rq, iter) { > + rq_for_each_bvec(tmp, rq, rq_iter) { > *bvec = tmp; > bvec++; > } > @@ -550,11 +551,10 @@ static int lo_rw_aio(struct loop_device *lo, struct > loop_cmd *cmd, >*/ > offset = bio->bi_iter.bi_bvec_done; > bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter); > - segments = bio_segments(bio); > } > atomic_set(&cmd->ref, 2); > > - iov_iter_bvec(&iter, rw, bvec, segments, blk_rq_bytes(rq)); > + iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq)); > iter.iov_offset = offset; > > cmd->iocb.ki_pos = pos; > -- > 2.9.5 >
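The reason counting rq_for_each_bvec() entries suffices: one multi-page bvec can cover many pages, so the number of bvecs is never larger than the old per-page segment count, and the array the loop driver allocates only shrinks. A toy model; the struct and helper are invented for the demo, with the page-count formula matching how a contiguous run at a given offset spans pages:

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* one contiguous multi-page bvec, as the loop driver now sees them */
struct bvec {
	unsigned int offset;	/* offset into the first page */
	unsigned int len;	/* total bytes, possibly spanning pages */
};

/* single-page segments this one bvec would have been split into before */
static unsigned int bvec_pages(const struct bvec *bv)
{
	unsigned int first = bv->offset / PAGE_SIZE;
	unsigned int last = (bv->offset + bv->len - 1) / PAGE_SIZE;

	return last - first + 1;
}
```

So a 12K contiguous run is one iov_iter element instead of three, which is where the efficiency claim in the commit message comes from.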
Re: [Cluster-devel] [PATCH V12 01/20] btrfs: remove various bio_offset arguments
On Mon, Nov 26, 2018 at 10:17:01AM +0800, Ming Lei wrote: > From: Christoph Hellwig > > The btrfs write path passes a bio_offset argument through some deep > callchains including async offloading. In the end this is easily > calculable using page_offset plus the bvec offset for the first > page in the bio, and only actually used by a single function. > Just move the calculation of the offset there. > > Reviewed-by: David Sterba > Signed-off-by: Christoph Hellwig > --- > fs/btrfs/disk-io.c | 21 + > fs/btrfs/disk-io.h | 2 +- > fs/btrfs/extent_io.c | 9 ++--- > fs/btrfs/extent_io.h | 5 ++--- > fs/btrfs/inode.c | 17 - > 5 files changed, 18 insertions(+), 36 deletions(-) [snip] > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 9ea4c6f0352f..c576b3fcaea7 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1920,8 +1920,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned > long offset, > * At IO completion time the cums attached on the ordered extent record > * are inserted into the btree > */ > -static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio > *bio, > - u64 bio_offset) > +static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio > *bio) > { > struct inode *inode = private_data; > blk_status_t ret = 0; > @@ -1973,8 +1972,7 @@ blk_status_t btrfs_submit_bio_done(void *private_data, > struct bio *bio, > *c-3) otherwise:async submit > */ > static blk_status_t btrfs_submit_bio_hook(void *private_data, struct bio > *bio, > - int mirror_num, unsigned long bio_flags, > - u64 bio_offset) > + int mirror_num, unsigned long bio_flags) > { > struct inode *inode = private_data; > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > @@ -2011,8 +2009,7 @@ static blk_status_t btrfs_submit_bio_hook(void > *private_data, struct bio *bio, > goto mapit; > /* we're doing a write, do the async checksumming */ > ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, bio_flags, > - bio_offset, inode, > - 
btrfs_submit_bio_start); > + inode, btrfs_submit_bio_start); > goto out; > } else if (!skip_sum) { > ret = btrfs_csum_one_bio(inode, bio, 0, 0); > @@ -8123,10 +8120,13 @@ static void btrfs_endio_direct_write(struct bio *bio) > } > > static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data, > - struct bio *bio, u64 offset) > + struct bio *bio) > { > struct inode *inode = private_data; > + struct bio_vec *bvec = bio_first_bvec_all(bio); > + u64 offset = page_offset(bvec->bv_page) + bvec->bv_offset; Hm, but for direct I/O, these will be user pages (or the zero page), so page_offset() won't be valid? > blk_status_t ret; > + > ret = btrfs_csum_one_bio(inode, bio, offset, 1); > BUG_ON(ret); /* -ENOMEM */ > return 0;
Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers
On Mon, Nov 26, 2018 at 10:17:06AM +0800, Ming Lei wrote: > We will support multi-page bvec soon, and have to deal with > single-page vs multi-page bvec. This patch follows Christoph's > suggestion to rename all the following helpers: > > for_each_bvec > bvec_iter_bvec > bvec_iter_len > bvec_iter_page > bvec_iter_offset > > into: > for_each_segment > segment_iter_bvec > segment_iter_len > segment_iter_page > segment_iter_offset > > so that these helpers named with 'segment' only deal with single-page > bvec, or called segment. We will introduce helpers named with 'bvec' > for multi-page bvec. > > bvec_iter_advance() isn't renamed because this helper is always operated > on real bvec even though multi-page bvec is supported. > > Suggested-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > .clang-format | 2 +- > drivers/md/dm-integrity.c | 2 +- > drivers/md/dm-io.c | 4 ++-- > drivers/nvdimm/blk.c | 4 ++-- > drivers/nvdimm/btt.c | 4 ++-- > include/linux/bio.h| 10 +- > include/linux/bvec.h | 20 +++- > include/linux/ceph/messenger.h | 2 +- > lib/iov_iter.c | 2 +- > net/ceph/messenger.c | 14 +++--- > 10 files changed, 33 insertions(+), 31 deletions(-)
Re: [Cluster-devel] [PATCH V12 05/20] block: remove bvec_iter_rewind()
On Mon, Nov 26, 2018 at 10:17:05AM +0800, Ming Lei wrote: > Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removes > bio_rewind_iter(), then no one uses bvec_iter_rewind() any more, > so remove it. Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei > --- > include/linux/bvec.h | 24 > 1 file changed, 24 deletions(-) > > diff --git a/include/linux/bvec.h b/include/linux/bvec.h > index 02c73c6aa805..ba0ae40e77c9 100644 > --- a/include/linux/bvec.h > +++ b/include/linux/bvec.h > @@ -92,30 +92,6 @@ static inline bool bvec_iter_advance(const struct bio_vec > *bv, > return true; > } > > -static inline bool bvec_iter_rewind(const struct bio_vec *bv, > - struct bvec_iter *iter, > - unsigned int bytes) > -{ > - while (bytes) { > - unsigned len = min(bytes, iter->bi_bvec_done); > - > - if (iter->bi_bvec_done == 0) { > - if (WARN_ONCE(iter->bi_idx == 0, > - "Attempted to rewind iter beyond " > - "bvec's boundaries\n")) { > - return false; > - } > - iter->bi_idx--; > - iter->bi_bvec_done = __bvec_iter_bvec(bv, > *iter)->bv_len; > - continue; > - } > - bytes -= len; > - iter->bi_size += len; > - iter->bi_bvec_done -= len; > - } > - return true; > -} > - > #define for_each_bvec(bvl, bio_vec, iter, start) \ > for (iter = (start);\ >(iter).bi_size && \ > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V12 04/20] block: don't use bio->bi_vcnt to figure out segment number
On Mon, Nov 26, 2018 at 10:17:04AM +0800, Ming Lei wrote: > It is wrong to use bio->bi_vcnt to figure out how many segments > there are in the bio even though CLONED flag isn't set on this bio, > because this bio may be split or advanced. > > So always use bio_segments() in blk_recount_segments(), and it shouldn't > cause any performance loss now because the physical segment number is figured > out in blk_queue_split() and BIO_SEG_VALID is set meantime since > bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"). > > Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval > Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max > segments") > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 8 +--- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index e69d8f8ba819..51ec6ca56a0a 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -367,13 +367,7 @@ void blk_recalc_rq_segments(struct request *rq) > > void blk_recount_segments(struct request_queue *q, struct bio *bio) > { > - unsigned short seg_cnt; > - > - /* estimate segment number by bi_vcnt for non-cloned bio */ > - if (bio_flagged(bio, BIO_CLONED)) > - seg_cnt = bio_segments(bio); > - else > - seg_cnt = bio->bi_vcnt; > + unsigned short seg_cnt = bio_segments(bio); > > if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) && > (seg_cnt < queue_max_segments(q))) > -- > 2.9.5 >
Re: [Cluster-devel] [PATCH V12 03/20] block: remove the "cluster" flag
On Mon, Nov 26, 2018 at 10:17:03AM +0800, Ming Lei wrote: > From: Christoph Hellwig > > The cluster flag implements some very old SCSI behavior. As far as I > can tell the original intent was to enable or disable any kind of > segment merging. But the actually visible effect to the LLDD is that > it limits each segments to be inside a single page, which we can > also affect by setting the maximum segment size and the segment > boundary. Reviewed-by: Omar Sandoval One comment typo below. > Signed-off-by: Christoph Hellwig > > Replace virt boundary with segment boundary limit. > > Signed-off-by: Ming Lei > --- > block/blk-merge.c | 20 > block/blk-settings.c| 3 --- > block/blk-sysfs.c | 5 + > drivers/scsi/scsi_lib.c | 20 > include/linux/blkdev.h | 6 -- > 5 files changed, 25 insertions(+), 29 deletions(-) > [snip] > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 0df15cb738d2..78d6d05992b0 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1810,6 +1810,8 @@ static int scsi_map_queues(struct blk_mq_tag_set *set) > void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) > { > struct device *dev = shost->dma_dev; > + unsigned max_segment_size = dma_get_max_seg_size(dev); > + unsigned long segment_boundary = shost->dma_boundary; > > /* >* this limit is imposed by hardware restrictions > @@ -1828,13 +1830,23 @@ void __scsi_init_queue(struct Scsi_Host *shost, > struct request_queue *q) > blk_queue_max_hw_sectors(q, shost->max_sectors); > if (shost->unchecked_isa_dma) > blk_queue_bounce_limit(q, BLK_BOUNCE_ISA); > - blk_queue_segment_boundary(q, shost->dma_boundary); > dma_set_seg_boundary(dev, shost->dma_boundary); > > - blk_queue_max_segment_size(q, dma_get_max_seg_size(dev)); > + /* > + * Clustering is a really old concept from the stone age of Linux > + * SCSI support. But the basic idea is that we never give the > + * driver a segment that spans multiple pages. 
For that we need > + * to limit the segment size, and set the segment boundary so that > + * we never merge a second segment which is no page aligned. Typo, "which is not page aligned".
[Cluster-devel] [PATCH] gfs2: Get rid of potential double-freeing in gfs2_create_inode
In gfs2_create_inode, after setting and releasing the acl / default_acl, the acl / default_acl pointers are not set to NULL as they should be. In that state, when the function reaches label fail_free_acls, gfs2_create_inode will try to release the same acls again. Fix that by setting the pointers to NULL after releasing the acls. Slightly simplify the logic. Also, posix_acl_release checks for NULL already, so there is no need to duplicate those checks here. Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure") Reported-by: Pan Bian Cc: Christoph Hellwig Cc: sta...@vger.kernel.org # v4.9+ Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson --- fs/gfs2/inode.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 648f0ca1ad57..998051c4aea7 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -744,17 +744,19 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry, the gfs2 structures. */ if (default_acl) { error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT); + if (error) + goto fail_gunlock3; posix_acl_release(default_acl); + default_acl = NULL; } if (acl) { - if (!error) - error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); + error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); + if (error) + goto fail_gunlock3; posix_acl_release(acl); + acl = NULL; } - if (error) - goto fail_gunlock3; - error = security_inode_init_security(&ip->i_inode, &dip->i_inode, name, &gfs2_initxattrs, NULL); if (error) @@ -789,10 +791,8 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry, } gfs2_rsqa_delete(ip, NULL); fail_free_acls: - if (default_acl) - posix_acl_release(default_acl); - if (acl) - posix_acl_release(acl); + posix_acl_release(default_acl); + posix_acl_release(acl); fail_gunlock: gfs2_dir_no_add(&da); gfs2_glock_dq_uninit(ghs); -- 2.17.2
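The double-free pattern this patch fixes can be shown with a toy reference counter. This is only a model: the acl struct, the simulate() harness, and the failure flag are invented, but the shape mirrors the patch, release the reference on success and immediately NULL the pointer so the common cleanup path (which tolerates NULL, like posix_acl_release()) cannot drop it a second time:

```c
#include <assert.h>
#include <stddef.h>

struct acl {
	int refcount;
};

/* like posix_acl_release(): NULL is a no-op, otherwise drop one reference */
static void acl_release(struct acl *a)
{
	if (a)
		a->refcount--;
}

/* returns the final refcount after the create path and its cleanup */
static int simulate(int set_acl_fails)
{
	struct acl storage = { .refcount = 1 };
	struct acl *acl = &storage;

	if (!set_acl_fails) {
		/* success path: drop our reference now, then clear the
		 * pointer so the later cleanup becomes a no-op */
		acl_release(acl);
		acl = NULL;
	}
	/* fail_free_acls-style common cleanup, runs on every path */
	acl_release(acl);
	return storage.refcount;
}
```

Without the `acl = NULL` line, the success path would drop the reference twice, which is the bug Pan Bian reported.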
Re: [Cluster-devel] [PATCH V12 10/20] block: use bio_for_each_bvec() to map sg
Looks good, Reviewed-by: Christoph Hellwig
Re: [Cluster-devel] [PATCH V12 09/20] block: use bio_for_each_bvec() to compute multi-page bvec count
Looks fine: Reviewed-by: Christoph Hellwig Nitpick below: > +{ > + unsigned len = bv->bv_len; > + unsigned total_len = 0; > + unsigned new_nsegs = 0, seg_size = 0; > + > + /* > + * Multipage bvec may be too big to hold in one segment, > + * so the current bvec has to be split into multiple > + * segments. > + */ Please use up all 80 chars in a line.
Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs
> + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) + > + bv->bv_offset + bv->bv_len; The name is a little confusing, as the real end addr would be -1. Maybe throw the -1 in here, and adjust for it in the contiguous check below?
Re: [Cluster-devel] [PATCH V12 18/20] block: document usage of bio iterator helpers
On Mon, Nov 26, 2018 at 10:17:18AM +0800, Ming Lei wrote: > Now multi-page bvec is supported, some helpers may return page by > page, meantime some may return segment by segment, this patch > documents the usage. > > Signed-off-by: Ming Lei Looks fine, Reviewed-by: Christoph Hellwig
Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers
On Mon, Nov 26, 2018 at 10:17:06AM +0800, Ming Lei wrote: > We will support multi-page bvec soon, and have to deal with > single-page vs multi-page bvec. This patch follows Christoph's > suggestion to rename all the following helpers: > > for_each_bvec > bvec_iter_bvec > bvec_iter_len > bvec_iter_page > bvec_iter_offset > > into: > for_each_segment > segment_iter_bvec > segment_iter_len > segment_iter_page > segment_iter_offset > > so that these helpers named with 'segment' only deal with single-page > bvec, or called segment. We will introduce helpers named with 'bvec' > for multi-page bvec. > > bvec_iter_advance() isn't renamed because this helper is always operated > on real bvec even though multi-page bvec is supported. > > Suggested-by: Christoph Hellwig > Signed-off-by: Ming Lei Looks fine, Reviewed-by: Christoph Hellwig
Re: [Cluster-devel] [PATCH V12 08/20] block: introduce bio_for_each_bvec() and rq_for_each_bvec()
On Mon, Nov 26, 2018 at 10:17:08AM +0800, Ming Lei wrote: > bio_for_each_bvec() is used for iterating over multi-page bvec for bio > split & merge code. > > rq_for_each_bvec() can be used for drivers which may handle the > multi-page bvec directly, so far loop is one perfect use case. > > Reviewed-by: Omar Sandoval > Signed-off-by: Ming Lei Looks fine, Reviewed-by: Christoph Hellwig
Re: [Cluster-devel] [PATCH V12 07/20] block: introduce multi-page bvec helpers
Looks fine, Reviewed-by: Christoph Hellwig
Re: [Cluster-devel] [PATCH V12 05/20] block: remove bvec_iter_rewind()
On Mon, Nov 26, 2018 at 10:17:05AM +0800, Ming Lei wrote: > Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removes > bio_rewind_iter(), then no one uses bvec_iter_rewind() any more, > so remove it. > > Signed-off-by: Ming Lei Looks good, Reviewed-by: Christoph Hellwig
[Cluster-devel] [PATCH] gfs2: get rid of potential double free
__gfs2_set_acl will not increase the reference count of acl if it fails. In this case, posix_acl_release is called twice, one after __gfs2_set_acl and one in the error handling block. This patch does not release acl when __gfs2_set_acl fails. Fixes: e01580bf9e4 ("gfs2: use generic posix ACL infrastructure") Signed-off-by: Pan Bian --- fs/gfs2/inode.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 648f0ca..3a9041c 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -744,12 +744,13 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry, the gfs2 structures. */ if (default_acl) { error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT); - posix_acl_release(default_acl); + if (!error) + posix_acl_release(default_acl); } - if (acl) { + if (acl && !error) { + error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); if (!error) - error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS); - posix_acl_release(acl); + posix_acl_release(acl); } if (error) -- 2.7.4
Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers
On Mon, Nov 26, 2018 at 3:20 AM Ming Lei wrote: > > We will support multi-page bvec soon, and have to deal with > single-page vs multi-page bvec. This patch follows Christoph's > suggestion to rename all the following helpers: > > for_each_bvec > bvec_iter_bvec > bvec_iter_len > bvec_iter_page > bvec_iter_offset > > into: > for_each_segment > segment_iter_bvec > segment_iter_len > segment_iter_page > segment_iter_offset > > so that these helpers named with 'segment' only deal with single-page > bvec, or called segment. We will introduce helpers named with 'bvec' > for multi-page bvec. > > bvec_iter_advance() isn't renamed because this helper is always operated > on real bvec even though multi-page bvec is supported. > > Suggested-by: Christoph Hellwig > Signed-off-by: Ming Lei > --- > .clang-format | 2 +- Acked-by: Miguel Ojeda Cheers, Miguel