Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs

2018-11-26 Thread Ming Lei
On Mon, Nov 26, 2018 at 01:58:42PM +0100, Christoph Hellwig wrote:
> > +   phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
> > +   bv->bv_offset + bv->bv_len;
> 
> The name is a little confusing, as the real end addr would be -1.  Maybe
> throw the -1 in here, and adjust for it in the contiguous check below?

Yeah, it makes sense.
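
Something like the following, perhaps (untested sketch, folding the -1 into
vec_end_addr and adjusting the contiguity check accordingly):

	phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
			bv->bv_offset + bv->bv_len - 1;
	phys_addr_t page_addr = page_to_phys(page);

	/* the new data must start exactly one byte past the last bvec */
	if (vec_end_addr + 1 != page_addr + off)
		return false;
	/* same_page: the last byte so far must sit inside @page itself */
	if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
		return false;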

thanks,
Ming



Re: [Cluster-devel] [PATCH] gfs2: get rid of potential double free

2018-11-26 Thread PanBian
On Tue, Nov 27, 2018 at 12:19:48AM +0100, Andreas Gruenbacher wrote:
> Pan,
> 
> On Mon, 26 Nov 2018 at 09:44, Pan Bian  wrote:
> 
> > __gfs2_set_acl will not increase the reference count of acl if it fails.
> > In this case, posix_acl_release is called twice, once after
> > __gfs2_set_acl and once in the error handling block. This patch does not
> > release acl when __gfs2_set_acl fails.
> >
> 
> thanks for the bug report and patch; I think this should be fixed more
> clearly as in the patch I've posted earlier today.

I have seen the patch. It is really much clearer. Thank you!

Pan


> 
> Andreas



Re: [Cluster-devel] [PATCH] gfs2: get rid of potential double free

2018-11-26 Thread Andreas Gruenbacher
Pan,

On Mon, 26 Nov 2018 at 09:44, Pan Bian  wrote:

> __gfs2_set_acl will not increase the reference count of acl if it fails.
> In this case, posix_acl_release is called twice, once after
> __gfs2_set_acl and once in the error handling block. This patch does not
> release acl when __gfs2_set_acl fails.
>

thanks for the bug report and patch; I think this should be fixed more
clearly as in the patch I've posted earlier today.

Andreas


Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:16AM +0800, Ming Lei wrote:
> This patch pulls the trigger for multi-page bvecs.

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  block/bio.c | 22 +++---
>  fs/iomap.c  |  4 ++--
>  fs/xfs/xfs_aops.c   |  4 ++--
>  include/linux/bio.h |  2 +-
>  4 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 75fde30af51f..8bf9338d4783 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -753,6 +753,8 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * @page: page to add
>   * @len: length of the data to add
>   * @off: offset of the data in @page
> + * @same_page: if %true only merge if the new data is in the same physical
> + *   page as the last segment of the bio.
>   *
>   * Try to add the data at @page + @off to the last bvec of @bio.  This is a
>   * useful optimisation for file systems with a block size smaller than the
> @@ -761,19 +763,25 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * Return %true on success or %false on failure.
>   */
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> - unsigned int len, unsigned int off)
> + unsigned int len, unsigned int off, bool same_page)
>  {
>   if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>   return false;
>  
>   if (bio->bi_vcnt > 0) {
>   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
> + bv->bv_offset + bv->bv_len;
> + phys_addr_t page_addr = page_to_phys(page);
>  
> - if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> - bv->bv_len += len;
> - bio->bi_iter.bi_size += len;
> - return true;
> - }
> + if (vec_end_addr != page_addr + off)
> + return false;
> + if (same_page && ((vec_end_addr - 1) & PAGE_MASK) != page_addr)
> + return false;
> +
> + bv->bv_len += len;
> + bio->bi_iter.bi_size += len;
> + return true;
>   }
>   return false;
>  }
> @@ -819,7 +827,7 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
>  int bio_add_page(struct bio *bio, struct page *page,
>unsigned int len, unsigned int offset)
>  {
> - if (!__bio_try_merge_page(bio, page, len, offset)) {
> + if (!__bio_try_merge_page(bio, page, len, offset, false)) {
>   if (bio_full(bio))
>   return 0;
>   __bio_add_page(bio, page, len, offset);
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 1f648d098a3b..ec5527b0fba4 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -313,7 +313,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>*/
>   sector = iomap_sector(iomap, pos);
>   if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
> - if (__bio_try_merge_page(ctx->bio, page, plen, poff))
> + if (__bio_try_merge_page(ctx->bio, page, plen, poff, true))
>   goto done;
>   is_contig = true;
>   }
> @@ -344,7 +344,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   ctx->bio->bi_end_io = iomap_read_end_io;
>   }
>  
> - __bio_add_page(ctx->bio, page, plen, poff);
> + bio_add_page(ctx->bio, page, plen, poff);
>  done:
>   /*
>* Move the caller beyond our range so that it keeps making progress.
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 1f1829e506e8..b9fd44168f61 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -616,12 +616,12 @@ xfs_add_to_ioend(
>   bdev, sector);
>   }
>  
> - if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
> + if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff, true)) {
>   if (iop)
>   atomic_inc(&iop->write_count);
>   if (bio_full(wpc->ioend->io_bio))
>   xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
> - __bio_add_page(wpc->ioend->io_bio, page, len, poff);
> + bio_add_page(wpc->ioend->io_bio, page, len, poff);
>   }
>  
>   wpc->ioend->io_size += len;
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index c35997dd02c2..5505f74aef8b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -441,7 +441,7 @@ extern int bio_add_page(struct bio *, struct page *, 
> unsigned int,unsigned int);
>  extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page 
> *,
>  unsigned int, unsigned int);
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> - unsigned int len, unsigned int off);
> + unsigned int len, unsigned int off, bool same_page);
>  void 

Re: [Cluster-devel] [PATCH V12 02/20] btrfs: look at bi_size for repair decisions

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:02AM +0800, Ming Lei wrote:
> From: Christoph Hellwig 
> 
> bio_readpage_error currently uses bi_vcnt to decide if it is worth
> retrying an I/O.  But the vector count is mostly an implementation
> artifact - it really should figure out if there is more than a
> single sector worth retrying.  Use bi_size for that and shift by
> PAGE_SHIFT.  This really should be blocks/sectors, but given that
> btrfs doesn't support a sector size different from PAGE_SIZE,
> using the page size keeps the changes to a minimum.
> 
> Reviewed-by: David Sterba 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Christoph Hellwig 
> ---
>  fs/btrfs/extent_io.c | 2 +-
>  include/linux/bio.h  | 6 --
>  2 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 15fd46582bb2..40751e86a2a9 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2368,7 +2368,7 @@ static int bio_readpage_error(struct bio *failed_bio, 
> u64 phy_offset,
>   int read_mode = 0;
>   blk_status_t status;
>   int ret;
> - unsigned failed_bio_pages = bio_pages_all(failed_bio);
> + unsigned failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT;
>  
>   BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>  
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 056fb627edb3..6f6bc331a5d1 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -263,12 +263,6 @@ static inline void bio_get_last_bvec(struct bio *bio, 
> struct bio_vec *bv)
>   bv->bv_len = iter.bi_bvec_done;
>  }
>  
> -static inline unsigned bio_pages_all(struct bio *bio)
> -{
> - WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
> - return bio->bi_vcnt;
> -}
> -
>  static inline struct bio_vec *bio_first_bvec_all(struct bio *bio)
>  {
>   WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 17/20] block: always define BIO_MAX_PAGES as 256

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:17AM +0800, Ming Lei wrote:
> Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to
> increase BIO_MAX_PAGES for it.
> 
> CONFIG_THP_SWAP needs to split one THP into normal pages and add
> them all to one bio. With multi-page bvecs, it just takes one bvec to
> hold them all.
> 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  include/linux/bio.h | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 5505f74aef8b..7be48c55b14a 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -34,15 +34,7 @@
>  #define BIO_BUG_ON
>  #endif
>  
> -#ifdef CONFIG_THP_SWAP
> -#if HPAGE_PMD_NR > 256
> -#define BIO_MAX_PAGES	HPAGE_PMD_NR
> -#else
>  #define BIO_MAX_PAGES	256
> -#endif
> -#else
> -#define BIO_MAX_PAGES	256
> -#endif
>  
>  #define bio_prio(bio)	(bio)->bi_ioprio
>  #define bio_set_prio(bio, prio)  ((bio)->bi_ioprio = prio)
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 18/20] block: document usage of bio iterator helpers

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:18AM +0800, Ming Lei wrote:
> Now that multi-page bvec is supported, some helpers may return page by
> page while others may return segment by segment; this patch
> documents the usage.

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  Documentation/block/biovecs.txt | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
> index 25689584e6e0..ce6eccaf5df7 100644
> --- a/Documentation/block/biovecs.txt
> +++ b/Documentation/block/biovecs.txt
> @@ -117,3 +117,28 @@ Other implications:
> size limitations and the limitations of the underlying devices. Thus
> there's no need to define ->merge_bvec_fn() callbacks for individual block
> drivers.
> +
> +Usage of helpers:
> +=
> +
> +* The following helpers whose names have the suffix of "_all" can only be 
> used
> +on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers
> +shouldn't use them because the bio may have been split before it reached the
> +driver.
> +
> + bio_for_each_segment_all()
> + bio_first_bvec_all()
> + bio_first_page_all()
> + bio_last_bvec_all()
> +
> +* The following helpers iterate over single-page segment. The passed 'struct
> +bio_vec' will contain a single-page IO vector during the iteration
> +
> + bio_for_each_segment()
> + bio_for_each_segment_all()
> +
> +* The following helpers iterate over multi-page bvec. The passed 'struct
> +bio_vec' will contain a multi-page IO vector during the iteration
> +
> + bio_for_each_bvec()
> + rq_for_each_bvec()
> -- 
> 2.9.5
> 
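
To illustrate the two granularities documented above, a minimal sketch
(given a struct bio *bio; process_page() and process_extent() are
made-up helpers, not from the patch):

	struct bio_vec bv;
	struct bvec_iter iter;

	/* single-page granularity: bv.bv_len never exceeds PAGE_SIZE */
	bio_for_each_segment(bv, bio, iter)
		process_page(bv.bv_page, bv.bv_offset, bv.bv_len);

	/*
	 * multi-page granularity: one bv may cover several physically
	 * contiguous pages, so bv.bv_len can exceed PAGE_SIZE
	 */
	bio_for_each_bvec(bv, bio, iter)
		process_extent(page_to_phys(bv.bv_page) + bv.bv_offset,
			       bv.bv_len);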



Re: [Cluster-devel] [PATCH V12 15/20] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:15AM +0800, Ming Lei wrote:
> This patch introduces one extra iterator variable to
> bio_for_each_segment_all(), so that bio_for_each_segment_all() can
> iterate over multi-page bvecs.
> 
> Given it is just a mechanical & simple change on all
> bio_for_each_segment_all() users, this patch does the tree-wide change
> in one single patch, so that we can avoid using a temporary helper for
> this conversion.
> 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  block/bio.c   | 27 ++-
>  block/bounce.c|  6 --
>  drivers/md/bcache/btree.c |  3 ++-
>  drivers/md/dm-crypt.c |  3 ++-
>  drivers/md/raid1.c|  3 ++-
>  drivers/staging/erofs/data.c  |  3 ++-
>  drivers/staging/erofs/unzip_vle.c |  3 ++-
>  fs/block_dev.c|  6 --
>  fs/btrfs/compression.c|  3 ++-
>  fs/btrfs/disk-io.c|  3 ++-
>  fs/btrfs/extent_io.c  |  9 ++---
>  fs/btrfs/inode.c  |  6 --
>  fs/btrfs/raid56.c |  3 ++-
>  fs/crypto/bio.c   |  3 ++-
>  fs/direct-io.c|  4 +++-
>  fs/exofs/ore.c|  3 ++-
>  fs/exofs/ore_raid.c   |  3 ++-
>  fs/ext4/page-io.c |  3 ++-
>  fs/ext4/readpage.c|  3 ++-
>  fs/f2fs/data.c|  9 ++---
>  fs/gfs2/lops.c|  6 --
>  fs/gfs2/meta_io.c |  3 ++-
>  fs/iomap.c|  6 --
>  fs/mpage.c|  3 ++-
>  fs/xfs/xfs_aops.c |  5 +++--
>  include/linux/bio.h   | 11 +--
>  include/linux/bvec.h  | 30 ++
>  27 files changed, 125 insertions(+), 45 deletions(-)



Re: [Cluster-devel] [PATCH V12 14/20] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:14AM +0800, Ming Lei wrote:
> bch_bio_alloc_pages() is always called on a new bio, so it is safe
> to access the bvec table directly. Given it is the only case of this
> kind, open code the bvec table access, since bio_for_each_segment_all()
> will be changed to support iterating over multi-page bvecs.
> 
> Acked-by: Coly Li 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  drivers/md/bcache/util.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index 20eddeac1531..62fb917f7a4f 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
>   int i;
>   struct bio_vec *bv;
>  
> - bio_for_each_segment_all(bv, bio, i) {
> + /*
> +  * This is called on a freshly allocated bio, so it is safe to access the
> +  * bvec table directly.
> +  */
> + for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
>   bv->bv_page = alloc_page(gfp_mask);
>   if (!bv->bv_page) {
>   while (--bv >= bio->bi_io_vec)
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 09/20] block: use bio_for_each_bvec() to compute multi-page bvec count

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:09AM +0800, Ming Lei wrote:
> First, it is more efficient to use bio_for_each_bvec() in both
> blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
> many multi-page bvecs there are in the bio.
> 
> Secondly, once bio_for_each_bvec() is used, the bvec may need to be
> split because its length can be much longer than the max segment size,
> so we have to split the big bvec into several segments.
> 
> Thirdly, when splitting a multi-page bvec into segments, the max segment
> limit may be reached, so the bio split needs to be considered in this
> situation too.

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  block/blk-merge.c | 100 
> +++---
>  1 file changed, 80 insertions(+), 20 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 51ec6ca56a0a..2d8f388d43de 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -161,6 +161,70 @@ static inline unsigned get_max_io_size(struct 
> request_queue *q,
>   return sectors;
>  }
>  
> +static unsigned get_max_segment_size(struct request_queue *q,
> +  unsigned offset)
> +{
> + unsigned long mask = queue_segment_boundary(q);
> +
> + return min_t(unsigned long, mask - (mask & offset) + 1,
> +  queue_max_segment_size(q));
> +}
> +
> +/*
> + * Split the bvec @bv into segments, and update all kinds of
> + * variables.
> + */
> +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
> + unsigned *nsegs, unsigned *last_seg_size,
> + unsigned *front_seg_size, unsigned *sectors)
> +{
> + unsigned len = bv->bv_len;
> + unsigned total_len = 0;
> + unsigned new_nsegs = 0, seg_size = 0;
> +
> + /*
> +  * Multipage bvec may be too big to hold in one segment,
> +  * so the current bvec has to be split into multiple
> +  * segments.
> +  */
> + while (len && new_nsegs + *nsegs < queue_max_segments(q)) {
> + seg_size = get_max_segment_size(q, bv->bv_offset + total_len);
> + seg_size = min(seg_size, len);
> +
> + new_nsegs++;
> + total_len += seg_size;
> + len -= seg_size;
> +
> + if ((bv->bv_offset + total_len) & queue_virt_boundary(q))
> + break;
> + }
> +
> + if (!new_nsegs)
> + return !!len;
> +
> + /* update front segment size */
> + if (!*nsegs) {
> + unsigned first_seg_size;
> +
> + if (new_nsegs == 1)
> + first_seg_size = get_max_segment_size(q, bv->bv_offset);
> + else
> + first_seg_size = queue_max_segment_size(q);
> +
> + if (*front_seg_size < first_seg_size)
> + *front_seg_size = first_seg_size;
> + }
> +
> + /* update other variables */
> + *last_seg_size = seg_size;
> + *nsegs += new_nsegs;
> + if (sectors)
> + *sectors += total_len >> 9;
> +
> + /* split in the middle of the bvec if len != 0 */
> + return !!len;
> +}
> +
>  static struct bio *blk_bio_segment_split(struct request_queue *q,
>struct bio *bio,
>struct bio_set *bs,
> @@ -174,7 +238,7 @@ static struct bio *blk_bio_segment_split(struct 
> request_queue *q,
>   struct bio *new = NULL;
>   const unsigned max_sectors = get_max_io_size(q, bio);
>  
> - bio_for_each_segment(bv, bio, iter) {
> + bio_for_each_bvec(bv, bio, iter) {
>   /*
>* If the queue doesn't support SG gaps and adding this
>* offset would create a gap, disallow it.
> @@ -189,8 +253,12 @@ static struct bio *blk_bio_segment_split(struct 
> request_queue *q,
>*/
>   if (nsegs < queue_max_segments(q) &&
>   sectors < max_sectors) {
> - nsegs++;
> - sectors = max_sectors;
> + /* split in the middle of bvec */
> + bv.bv_len = (max_sectors - sectors) << 9;
> + bvec_split_segs(q, &bv, &nsegs,
> + &seg_size,
> + &front_seg_size,
> + &sectors);
>   }
>   goto split;
>   }
> @@ -212,14 +280,12 @@ static struct bio *blk_bio_segment_split(struct 
> request_queue *q,
>   if (nsegs == queue_max_segments(q))
>   goto split;
>  
> - if (nsegs == 1 && seg_size > front_seg_size)
> - front_seg_size = seg_size;
> -
> - nsegs++;
>   bvprv = bv;
>   bvprvp = &bv;
> - seg_size = bv.bv_len;
> - sectors += bv.bv_len >> 9;

Re: [Cluster-devel] [PATCH V12 13/20] block: loop: pass multi-page bvec to iov_iter

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:13AM +0800, Ming Lei wrote:
> iov_iter is implemented on bvec iterator helpers, so it is safe to pass
> multi-page bvec to it, and this way is much more efficient than passing one
> page in each bvec.
> 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  drivers/block/loop.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 176ab1f28eca..e3683211f12d 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -510,21 +510,22 @@ static int lo_rw_aio(struct loop_device *lo, struct 
> loop_cmd *cmd,
>loff_t pos, bool rw)
>  {
>   struct iov_iter iter;
> + struct req_iterator rq_iter;
>   struct bio_vec *bvec;
>   struct request *rq = blk_mq_rq_from_pdu(cmd);
>   struct bio *bio = rq->bio;
>   struct file *file = lo->lo_backing_file;
> + struct bio_vec tmp;
>   unsigned int offset;
> - int segments = 0;
> + int nr_bvec = 0;
>   int ret;
>  
> + rq_for_each_bvec(tmp, rq, rq_iter)
> + nr_bvec++;
> +
>   if (rq->bio != rq->biotail) {
> - struct req_iterator iter;
> - struct bio_vec tmp;
>  
> - __rq_for_each_bio(bio, rq)
> - segments += bio_segments(bio);
> - bvec = kmalloc_array(segments, sizeof(struct bio_vec),
> + bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
>GFP_NOIO);
>   if (!bvec)
>   return -EIO;
> @@ -533,10 +534,10 @@ static int lo_rw_aio(struct loop_device *lo, struct 
> loop_cmd *cmd,
>   /*
>* The bios of the request may be started from the middle of
>* the 'bvec' because of bio splitting, so we can't directly
> -  * copy bio->bi_iov_vec to new bvec. The rq_for_each_segment
> +  * copy bio->bi_iov_vec to new bvec. The rq_for_each_bvec
>* API will take care of all details for us.
>*/
> - rq_for_each_segment(tmp, rq, iter) {
> + rq_for_each_bvec(tmp, rq, rq_iter) {
>   *bvec = tmp;
>   bvec++;
>   }
> @@ -550,11 +551,10 @@ static int lo_rw_aio(struct loop_device *lo, struct 
> loop_cmd *cmd,
>*/
>   offset = bio->bi_iter.bi_bvec_done;
>   bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> - segments = bio_segments(bio);
>   }
>   atomic_set(&cmd->ref, 2);
>  
> - iov_iter_bvec(&iter, rw, bvec, segments, blk_rq_bytes(rq));
> + iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
>   iter.iov_offset = offset;
>  
>   cmd->iocb.ki_pos = pos;
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 01/20] btrfs: remove various bio_offset arguments

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:01AM +0800, Ming Lei wrote:
> From: Christoph Hellwig 
> 
> The btrfs write path passes a bio_offset argument through some deep
> callchains including async offloading.  In the end this is easily
> calculable using page_offset plus the bvec offset for the first
> page in the bio, and only actually used by a single function.
> Just move the calculation of the offset there.
> 
> Reviewed-by: David Sterba 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/btrfs/disk-io.c   | 21 +
>  fs/btrfs/disk-io.h   |  2 +-
>  fs/btrfs/extent_io.c |  9 ++---
>  fs/btrfs/extent_io.h |  5 ++---
>  fs/btrfs/inode.c | 17 -
>  5 files changed, 18 insertions(+), 36 deletions(-)

[snip]

> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 9ea4c6f0352f..c576b3fcaea7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1920,8 +1920,7 @@ int btrfs_merge_bio_hook(struct page *page, unsigned 
> long offset,
>   * At IO completion time the cums attached on the ordered extent record
>   * are inserted into the btree
>   */
> -static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio 
> *bio,
> - u64 bio_offset)
> +static blk_status_t btrfs_submit_bio_start(void *private_data, struct bio 
> *bio)
>  {
>   struct inode *inode = private_data;
>   blk_status_t ret = 0;
> @@ -1973,8 +1972,7 @@ blk_status_t btrfs_submit_bio_done(void *private_data, 
> struct bio *bio,
>   *c-3) otherwise:async submit
>   */
>  static blk_status_t btrfs_submit_bio_hook(void *private_data, struct bio 
> *bio,
> -  int mirror_num, unsigned long bio_flags,
> -  u64 bio_offset)
> +  int mirror_num, unsigned long bio_flags)
>  {
>   struct inode *inode = private_data;
>   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> @@ -2011,8 +2009,7 @@ static blk_status_t btrfs_submit_bio_hook(void 
> *private_data, struct bio *bio,
>   goto mapit;
>   /* we're doing a write, do the async checksumming */
>   ret = btrfs_wq_submit_bio(fs_info, bio, mirror_num, bio_flags,
> -   bio_offset, inode,
> -   btrfs_submit_bio_start);
> +   inode, btrfs_submit_bio_start);
>   goto out;
>   } else if (!skip_sum) {
>   ret = btrfs_csum_one_bio(inode, bio, 0, 0);
> @@ -8123,10 +8120,13 @@ static void btrfs_endio_direct_write(struct bio *bio)
>  }
>  
>  static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data,
> - struct bio *bio, u64 offset)
> + struct bio *bio)
>  {
>   struct inode *inode = private_data;
> + struct bio_vec *bvec = bio_first_bvec_all(bio);
> + u64 offset = page_offset(bvec->bv_page) + bvec->bv_offset;

Hm, but for direct I/O, these will be user pages (or the zero page), so
page_offset() won't be valid?

>   blk_status_t ret;
> +
>   ret = btrfs_csum_one_bio(inode, bio, offset, 1);
>   BUG_ON(ret); /* -ENOMEM */
>   return 0;



Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:06AM +0800, Ming Lei wrote:
> We will support multi-page bvec soon, and have to deal with
> single-page vs multi-page bvec. This patch follows Christoph's
> suggestion to rename all the following helpers:
> 
>   for_each_bvec
>   bvec_iter_bvec
>   bvec_iter_len
>   bvec_iter_page
>   bvec_iter_offset
> 
> into:
>   for_each_segment
>   segment_iter_bvec
>   segment_iter_len
>   segment_iter_page
>   segment_iter_offset
> 
> so that these helpers named with 'segment' only deal with single-page
> bvecs, also called segments. We will introduce helpers named with 'bvec'
> for multi-page bvecs.
> 
> bvec_iter_advance() isn't renamed because this helper always operates
> on the real bvec even though multi-page bvec is supported.
> 
> Suggested-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  .clang-format  |  2 +-
>  drivers/md/dm-integrity.c  |  2 +-
>  drivers/md/dm-io.c |  4 ++--
>  drivers/nvdimm/blk.c   |  4 ++--
>  drivers/nvdimm/btt.c   |  4 ++--
>  include/linux/bio.h| 10 +-
>  include/linux/bvec.h   | 20 +++-
>  include/linux/ceph/messenger.h |  2 +-
>  lib/iov_iter.c |  2 +-
>  net/ceph/messenger.c   | 14 +++---
>  10 files changed, 33 insertions(+), 31 deletions(-)



Re: [Cluster-devel] [PATCH V12 05/20] block: remove bvec_iter_rewind()

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:05AM +0800, Ming Lei wrote:
> Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removed
> bio_rewind_iter(); since then, no one uses bvec_iter_rewind() any more,
> so remove it.

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  include/linux/bvec.h | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 02c73c6aa805..ba0ae40e77c9 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -92,30 +92,6 @@ static inline bool bvec_iter_advance(const struct bio_vec 
> *bv,
>   return true;
>  }
>  
> -static inline bool bvec_iter_rewind(const struct bio_vec *bv,
> -  struct bvec_iter *iter,
> -  unsigned int bytes)
> -{
> - while (bytes) {
> - unsigned len = min(bytes, iter->bi_bvec_done);
> -
> - if (iter->bi_bvec_done == 0) {
> - if (WARN_ONCE(iter->bi_idx == 0,
> -   "Attempted to rewind iter beyond "
> -   "bvec's boundaries\n")) {
> - return false;
> - }
> - iter->bi_idx--;
> - iter->bi_bvec_done = __bvec_iter_bvec(bv, 
> *iter)->bv_len;
> - continue;
> - }
> - bytes -= len;
> - iter->bi_size += len;
> - iter->bi_bvec_done -= len;
> - }
> - return true;
> -}
> -
>  #define for_each_bvec(bvl, bio_vec, iter, start) \
>   for (iter = (start);\
>(iter).bi_size &&  \
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 04/20] block: don't use bio->bi_vcnt to figure out segment number

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:04AM +0800, Ming Lei wrote:
> It is wrong to use bio->bi_vcnt to figure out how many segments
> there are in the bio even though CLONED flag isn't set on this bio,
> because this bio may be split or advanced.
> 
> So always use bio_segments() in blk_recount_segments(), and it shouldn't
> cause any performance loss now because the physical segment number is figured
> out in blk_queue_split() and BIO_SEG_VALID is set at the same time, since
> bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").
> 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max 
> segments")
> Signed-off-by: Ming Lei 
> ---
>  block/blk-merge.c | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index e69d8f8ba819..51ec6ca56a0a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -367,13 +367,7 @@ void blk_recalc_rq_segments(struct request *rq)
>  
>  void blk_recount_segments(struct request_queue *q, struct bio *bio)
>  {
> - unsigned short seg_cnt;
> -
> - /* estimate segment number by bi_vcnt for non-cloned bio */
> - if (bio_flagged(bio, BIO_CLONED))
> - seg_cnt = bio_segments(bio);
> - else
> - seg_cnt = bio->bi_vcnt;
> + unsigned short seg_cnt = bio_segments(bio);
>  
>   if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
>   (seg_cnt < queue_max_segments(q)))
> -- 
> 2.9.5
> 



Re: [Cluster-devel] [PATCH V12 03/20] block: remove the "cluster" flag

2018-11-26 Thread Omar Sandoval
On Mon, Nov 26, 2018 at 10:17:03AM +0800, Ming Lei wrote:
> From: Christoph Hellwig 
> 
> The cluster flag implements some very old SCSI behavior.  As far as I
> can tell the original intent was to enable or disable any kind of
> segment merging.  But the actually visible effect to the LLDD is that
> it limits each segments to be inside a single page, which we can
> also affect by setting the maximum segment size and the segment
> boundary.

Reviewed-by: Omar Sandoval 

One comment typo below.

> Signed-off-by: Christoph Hellwig 
> 
> Replace virt boundary with segment boundary limit.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-merge.c   | 20 
>  block/blk-settings.c|  3 ---
>  block/blk-sysfs.c   |  5 +
>  drivers/scsi/scsi_lib.c | 20 
>  include/linux/blkdev.h  |  6 --
>  5 files changed, 25 insertions(+), 29 deletions(-)
> 

[snip]

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0df15cb738d2..78d6d05992b0 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1810,6 +1810,8 @@ static int scsi_map_queues(struct blk_mq_tag_set *set)
>  void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>  {
>   struct device *dev = shost->dma_dev;
> + unsigned max_segment_size = dma_get_max_seg_size(dev);
> + unsigned long segment_boundary = shost->dma_boundary;
>  
>   /*
>* this limit is imposed by hardware restrictions
> @@ -1828,13 +1830,23 @@ void __scsi_init_queue(struct Scsi_Host *shost, 
> struct request_queue *q)
>   blk_queue_max_hw_sectors(q, shost->max_sectors);
>   if (shost->unchecked_isa_dma)
>   blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
> - blk_queue_segment_boundary(q, shost->dma_boundary);
>   dma_set_seg_boundary(dev, shost->dma_boundary);
>  
> - blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
> + /*
> +  * Clustering is a really old concept from the stone age of Linux
> +  * SCSI support.  But the basic idea is that we never give the
> +  * driver a segment that spans multiple pages.  For that we need
> +  * to limit the segment size, and set the segment boundary so that
> +  * we never merge a second segment which is no page aligned.

Typo, "which is not page aligned".
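
(For reference: with the flag gone, the old cluster=0 behaviour can be
expressed purely through the queue limits, roughly like this sketch:

	blk_queue_max_segment_size(q, PAGE_SIZE);
	blk_queue_segment_boundary(q, PAGE_SIZE - 1);

i.e. no segment may be larger than a page or cross a page boundary, so
every segment ends up inside a single page.)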



[Cluster-devel] [PATCH] gfs2: Get rid of potential double-freeing in gfs2_create_inode

2018-11-26 Thread Andreas Gruenbacher
In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be.  In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.

Fix that by setting the pointers to NULL after releasing the acls.  Slightly
simplify the logic.  Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.

Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian 
Cc: Christoph Hellwig 
Cc: sta...@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher 
Signed-off-by: Bob Peterson 
---
 fs/gfs2/inode.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 648f0ca1ad57..998051c4aea7 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -744,17 +744,19 @@ static int gfs2_create_inode(struct inode *dir, struct 
dentry *dentry,
   the gfs2 structures. */
if (default_acl) {
error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
+   if (error)
+   goto fail_gunlock3;
posix_acl_release(default_acl);
+   default_acl = NULL;
}
if (acl) {
-   if (!error)
-   error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS);
+   error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS);
+   if (error)
+   goto fail_gunlock3;
posix_acl_release(acl);
+   acl = NULL;
}
 
-   if (error)
-   goto fail_gunlock3;
-
	error = security_inode_init_security(&ip->i_inode, &dip->i_inode, name,
					     &gfs2_initxattrs, NULL);
if (error)
@@ -789,10 +791,8 @@ static int gfs2_create_inode(struct inode *dir, struct 
dentry *dentry,
}
gfs2_rsqa_delete(ip, NULL);
 fail_free_acls:
-   if (default_acl)
-   posix_acl_release(default_acl);
-   if (acl)
-   posix_acl_release(acl);
+   posix_acl_release(default_acl);
+   posix_acl_release(acl);
 fail_gunlock:
	gfs2_dir_no_add(&da);
gfs2_glock_dq_uninit(ghs);
-- 
2.17.2



Re: [Cluster-devel] [PATCH V12 10/20] block: use bio_for_each_bvec() to map sg

2018-11-26 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH V12 09/20] block: use bio_for_each_bvec() to compute multi-page bvec count

2018-11-26 Thread Christoph Hellwig
Looks fine:

Reviewed-by: Christoph Hellwig 

Nitpick below:

> +{
> + unsigned len = bv->bv_len;
> + unsigned total_len = 0;
> + unsigned new_nsegs = 0, seg_size = 0;
> +
> + /*
> +  * Multipage bvec may be too big to hold in one segment,
> +  * so the current bvec has to be split into multiple
> +  * segments.
> +  */

Please use up all 80 chars in a line.



Re: [Cluster-devel] [PATCH V12 16/20] block: enable multipage bvecs

2018-11-26 Thread Christoph Hellwig
> + phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
> + bv->bv_offset + bv->bv_len;

The name is a little confusing, as the real end addr would be -1.  Maybe
throw the -1 in here, and adjust for it in the contiguous check below?



Re: [Cluster-devel] [PATCH V12 18/20] block: document usage of bio iterator helpers

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 10:17:18AM +0800, Ming Lei wrote:
> Now that multi-page bvec is supported, some helpers may return page by
> page while others may return segment by segment; this patch
> documents the usage.
> 
> Signed-off-by: Ming Lei 

Looks fine,

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 10:17:06AM +0800, Ming Lei wrote:
> We will support multi-page bvec soon, and have to deal with
> single-page vs multi-page bvec. This patch follows Christoph's
> suggestion to rename all the following helpers:
> 
>   for_each_bvec
>   bvec_iter_bvec
>   bvec_iter_len
>   bvec_iter_page
>   bvec_iter_offset
> 
> into:
>   for_each_segment
>   segment_iter_bvec
>   segment_iter_len
>   segment_iter_page
>   segment_iter_offset
> 
> so that these helpers named with 'segment' only deal with single-page
> bvecs, also called segments. We will introduce helpers named with 'bvec'
> for multi-page bvecs.
> 
> bvec_iter_advance() isn't renamed because this helper always operates
> on the real bvec even though multi-page bvec is supported.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Ming Lei 

Looks fine,

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH V12 08/20] block: introduce bio_for_each_bvec() and rq_for_each_bvec()

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 10:17:08AM +0800, Ming Lei wrote:
> bio_for_each_bvec() is used for iterating over multi-page bvecs by the
> bio split & merge code.
> 
> rq_for_each_bvec() can be used by drivers which may handle
> multi-page bvecs directly; so far, loop is a perfect use case.
> 
> Reviewed-by: Omar Sandoval 
> Signed-off-by: Ming Lei 

Looks fine,

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH V12 07/20] block: introduce multi-page bvec helpers

2018-11-26 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH V12 05/20] block: remove bvec_iter_rewind()

2018-11-26 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 10:17:05AM +0800, Ming Lei wrote:
> Commit 7759eb23fd980 ("block: remove bio_rewind_iter()") removed
> bio_rewind_iter(); since then, no one uses bvec_iter_rewind() any more,
> so remove it.
> 
> Signed-off-by: Ming Lei 

Looks good,

Reviewed-by: Christoph Hellwig 



[Cluster-devel] [PATCH] gfs2: get rid of potential double free

2018-11-26 Thread Pan Bian
__gfs2_set_acl will not increase the reference count of acl if it fails.
In this case, posix_acl_release is called twice, once after
__gfs2_set_acl and once in the error handling block. This patch does not
release acl when __gfs2_set_acl fails.

Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure")
Signed-off-by: Pan Bian 
---
 fs/gfs2/inode.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 648f0ca..3a9041c 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -744,12 +744,13 @@ static int gfs2_create_inode(struct inode *dir, struct 
dentry *dentry,
   the gfs2 structures. */
if (default_acl) {
error = __gfs2_set_acl(inode, default_acl, ACL_TYPE_DEFAULT);
-   posix_acl_release(default_acl);
+   if (!error)
+   posix_acl_release(default_acl);
}
-   if (acl) {
+   if (acl && !error) {
+   error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS);
if (!error)
-   error = __gfs2_set_acl(inode, acl, ACL_TYPE_ACCESS);
-   posix_acl_release(acl);
+   posix_acl_release(acl);
}
 
if (error)
-- 
2.7.4




Re: [Cluster-devel] [PATCH V12 06/20] block: rename bvec helpers

2018-11-26 Thread Miguel Ojeda
On Mon, Nov 26, 2018 at 3:20 AM Ming Lei  wrote:
>
> We will support multi-page bvec soon, and have to deal with
> single-page vs multi-page bvec. This patch follows Christoph's
> suggestion to rename all the following helpers:
>
> for_each_bvec
> bvec_iter_bvec
> bvec_iter_len
> bvec_iter_page
> bvec_iter_offset
>
> into:
> for_each_segment
> segment_iter_bvec
> segment_iter_len
> segment_iter_page
> segment_iter_offset
>
> so that these helpers named with 'segment' only deal with single-page
> bvecs, also called segments. We will introduce helpers named with 'bvec'
> for multi-page bvecs.
>
> bvec_iter_advance() isn't renamed because this helper always operates
> on the real bvec even though multi-page bvec is supported.
>
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Ming Lei 
> ---
>  .clang-format  |  2 +-

Acked-by: Miguel Ojeda 

Cheers,
Miguel