Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 05:10:25PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:31:36PM +0800, Ming Lei wrote:
> > > But while looking over this I wonder why we even need the max_seg_len
> > > here.  The only thing __bvec_iter_advance does is to move bi_bvec_done
> > > and bi_idx forward, with corresponding decrements of bi_size.  As far
> > > as I can tell the only thing that max_seg_len does is that we need
> > > more iterations of the while loop to achieve the same thing.
> > > 
> > > And actual bvec used by the caller will be obtained using
> > > bvec_iter_bvec or segment_iter_bvec depending on if they want multi-page
> > > or single-page variants.
> > 
> > Right, we let __bvec_iter_advance() serve for both multi-page and
> > single-page case, then we have to tell it via one way or another, now
> > we use the constant of 'max_seg_len'.
> > 
> > Or you suggest to implement two versions of __bvec_iter_advance()?
> 
> No - I think we can always use the code without any segment in
> bvec_iter_advance.  Because bvec_iter_advance only operates on the
> iterator, the generation of an actual single-page or multi-page
> bvec is left to the caller using the bvec_iter_bvec or segment_iter_bvec
> helpers.  The only difference is how many bytes you can move the
> iterator forward in a single loop iteration - so if you pass in
> PAGE_SIZE as the max_seg_len you just will have to loop more often
> for a large enough bytes, but not actually do anything different.

Yeah, I see that.

The difference is made by bio_iter_iovec()/bio_iter_mp_iovec() in
__bio_for_each_segment()/__bio_for_each_bvec().


Thanks,
Ming
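
To illustrate the point agreed on above, here is a minimal userspace sketch (C11), not the kernel code. It assumes simplified stand-ins: `struct vec`, `struct iter` and `iter_advance()` are hypothetical names, and the per-step limit is a plain byte cap rather than the page-boundary arithmetic used by __bvec_iter_advance(). It shows that the cap only changes how many loop iterations are needed, not where the iterator ends up.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's bio_vec and bvec_iter. */
struct vec {
    unsigned int len;       /* like bv_len */
};

struct iter {
    unsigned int size;      /* bytes remaining, like bi_size */
    unsigned int idx;       /* current vector, like bi_idx */
    unsigned int done;      /* bytes consumed in current vector, like bi_bvec_done */
};

/*
 * Advance `it` by `bytes`, consuming at most `max_step` bytes per loop
 * iteration. Returns the number of loop iterations that were needed.
 */
static unsigned int iter_advance(const struct vec *v, struct iter *it,
                                 unsigned int bytes, unsigned int max_step)
{
    unsigned int loops = 0;

    assert(max_step > 0 && bytes <= it->size);

    while (bytes) {
        unsigned int step = v[it->idx].len - it->done;

        if (step > max_step)
            step = max_step;
        if (step > bytes)
            step = bytes;

        bytes -= step;
        it->size -= step;
        it->done += step;

        /* Move to the next vector once the current one is used up. */
        if (it->done == v[it->idx].len) {
            it->done = 0;
            it->idx++;
        }
        loops++;
    }
    return loops;
}
```

Advancing two identical iterators by the same byte count, once with a 4096-byte cap and once with no effective cap, leaves both in the same state; only the returned loop count differs.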



Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 05:08:11PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:06:11PM +0800, Ming Lei wrote:
> > bvec_iter_* is used for single-page bvec in current linus tree, and
> > there are lots of users now:
> > 
> > [linux]$ git grep -n "bvec_iter_*" ./ | wc
> > 191 995   13242
> > 
> > If we have to switch it first, it can be a big change, just wondering
> > if Jens is happy with that?
> 
> Your above grep statement seems to catch every use of struct bvec_iter,
> due to the *.
> 
> Most uses of bvec_iter_ are either in the block headers, or are
> ceph wrappers that match the above and can easily be redefined.

OK, looks like you are right; it seems not to be so widely used:

$ git grep -n -w -E "bvec_iter_len|bvec_iter_bvec|bvec_iter_advance|bvec_iter_page|bvec_iter_offset" ./ | wc
 36 194 2907

I will switch to that, given that the affected drivers are only dm, nvdimm and ceph.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 15/19] block: enable multipage bvecs

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 03:55:02PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:23:23AM +0800, Ming Lei wrote:
> > if (bio->bi_vcnt > 0) {
> > -   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> > +   struct bio_vec bv;
> > +   struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1];
> >  
> > -   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> > -   bv->bv_len += len;
> > +   bvec_last_segment(seg, &bv);
> > +
> > +   if (page == bv.bv_page && off == bv.bv_offset + bv.bv_len) {
> 
> I think we can simplify the try-to-merge-into-bio case a bit,
> and also document it better with something like this:
> 
> diff --git a/block/bio.c b/block/bio.c
> index 854676edc438..cc913281a723 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -822,54 +822,40 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * @page: page to add
>   * @len: length of the data to add
>   * @off: offset of the data in @page
> + * @same_page: if %true only merge if the new data is in the same physical
> + *   page as the last segment of the bio.
>   *
> - * Try to add the data at @page + @off to the last page of @bio.  This is a
> + * Try to add the data at @page + @off to the last bvec of @bio.  This is a
>   * a useful optimisation for file systems with a block size smaller than the
>   * page size.
>   *
>   * Return %true on success or %false on failure.
>   */
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> - unsigned int len, unsigned int off)
> + unsigned int len, unsigned int off, bool same_page)
>  {
>   if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>   return false;
>  
>   if (bio->bi_vcnt > 0) {
> - struct bio_vec bv;
> - struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1];
> -
> - bvec_last_segment(seg, &bv);
> -
> - if (page == bv.bv_page && off == bv.bv_offset + bv.bv_len) {
> - seg->bv_len += len;
> - bio->bi_iter.bi_size += len;
> - return true;
> - }
> + struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> + phys_addr_t vec_addr = page_to_phys(bv->bv_page);
> + phys_addr_t page_addr = page_to_phys(page);
> +
> + if (vec_addr + bv->bv_offset + bv->bv_len != page_addr + off)
> + return false;
> + if (same_page &&
> + (vec_addr & PAGE_SIZE) != (page_addr & PAGE_SIZE))
> + return false;

I guess the correct check should be:

end_addr = vec_addr + bv->bv_offset + bv->bv_len;
if (same_page &&
(end_addr & PAGE_MASK) != (page_addr & PAGE_MASK))
return false;

And this approach is good, will take it in V12.

Thanks,
Ming
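
The difference between masking with PAGE_SIZE and with PAGE_MASK can be checked in isolation. The following is a minimal userspace sketch (C11), assuming 4 KiB pages; `DEMO_PAGE_SIZE`, `DEMO_PAGE_MASK`, `same_page_frame()` and `same_bit12()` are hypothetical names for illustration and not kernel API. It only demonstrates the mask arithmetic and does not model the full merge decision in __bio_try_merge_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL                    /* assumed 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))    /* clears the in-page offset bits */

static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* True when both physical addresses fall inside the same page frame. */
static bool same_page_frame(uint64_t a, uint64_t b)
{
    return (a & DEMO_PAGE_MASK) == (b & DEMO_PAGE_MASK);
}

/*
 * The mistaken variant: masking with the page size itself keeps a single
 * bit (bit 12 for 4 KiB pages), so unrelated pages can compare equal.
 */
static bool same_bit12(uint64_t a, uint64_t b)
{
    return (a & DEMO_PAGE_SIZE) == (b & DEMO_PAGE_SIZE);
}
```

For example, addresses 0x1000 and 0x3000 lie in different page frames, yet both have bit 12 set, so the size-based comparison wrongly reports a match while the mask-based comparison does not.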



Re: [Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 03:33:55PM +0100, Christoph Hellwig wrote:
> > +   non-cluster.o
> 
> Do we really need a new source file for these few functions?
> 
> > default:
> > +   if (!blk_queue_cluster(q)) {
> > +   blk_queue_non_cluster_bio(q, bio);
> > +   return;
> 
> I'd name this blk_bio_segment_split_singlepage or similar.

OK.

> 
> > +static __init int init_non_cluster_bioset(void)
> > +{
> > +   WARN_ON(bioset_init(&non_cluster_bio_set, BIO_POOL_SIZE, 0,
> > +  BIOSET_NEED_BVECS));
> > +   WARN_ON(bioset_integrity_create(&non_cluster_bio_set, BIO_POOL_SIZE));
> > +   WARN_ON(bioset_init(&non_cluster_bio_split, BIO_POOL_SIZE, 0, 0));
> 
> Please only allocate the resources once a queue without the cluster
> flag is registered, there are only very few modern drivers that do that.

OK.

> 
> > +static void non_cluster_end_io(struct bio *bio)
> > +{
> > +   struct bio *bio_orig = bio->bi_private;
> > +
> > +   bio_orig->bi_status = bio->bi_status;
> > +   bio_endio(bio_orig);
> > +   bio_put(bio);
> > +}
> 
> Why can't we use bio_chain for the split bios?

The parent bio uses multi-page bvecs, so we can't submit it to a non-cluster queue.

> 
> > +   bio_for_each_segment(from, *bio_orig, iter) {
> > +   if (i++ < max_segs)
> > +   sectors += from.bv_len >> 9;
> > +   else
> > +   break;
> > +   }
> 
> The easy to read way would be:
> 
>   bio_for_each_segment(from, *bio_orig, iter) {
>   if (i++ == max_segs)
>   break;
>   sectors += from.bv_len >> 9;
>   }

OK.

> 
> > +   if (sectors < bio_sectors(*bio_orig)) {
> > +   bio = bio_split(*bio_orig, sectors, GFP_NOIO,
> > +   &non_cluster_bio_split);
> > +   bio_chain(bio, *bio_orig);
> > +   generic_make_request(*bio_orig);
> > +   *bio_orig = bio;
> 
> I don't think this is very efficient, as this means we now
> clone the bio twice, first to split it at the sector boundary,
> and then again when converting it to single-page bio_vec.

That is exactly what the bounce code does. The problem is actually the same
for both bounce and non-cluster, because the bvec table itself has to be
changed.

> 
> I think this could be something like this (totally untested):
> 
> diff --git a/block/non-cluster.c b/block/non-cluster.c
> index 9c2910be9404..60389f275c43 100644
> --- a/block/non-cluster.c
> +++ b/block/non-cluster.c
> @@ -13,58 +13,59 @@
>  
>  #include "blk.h"
>  
> -static struct bio_set non_cluster_bio_set, non_cluster_bio_split;
> +static struct bio_set non_cluster_bio_set;
>  
>  static __init int init_non_cluster_bioset(void)
>  {
>   WARN_ON(bioset_init(&non_cluster_bio_set, BIO_POOL_SIZE, 0,
>  BIOSET_NEED_BVECS));
>   WARN_ON(bioset_integrity_create(&non_cluster_bio_set, BIO_POOL_SIZE));
> - WARN_ON(bioset_init(&non_cluster_bio_split, BIO_POOL_SIZE, 0, 0));
>  
>   return 0;
>  }
>  __initcall(init_non_cluster_bioset);
>  
> -static void non_cluster_end_io(struct bio *bio)
> -{
> - struct bio *bio_orig = bio->bi_private;
> -
> - bio_orig->bi_status = bio->bi_status;
> - bio_endio(bio_orig);
> - bio_put(bio);
> -}
> -
>  void blk_queue_non_cluster_bio(struct request_queue *q, struct bio 
> **bio_orig)
>  {
> - struct bio *bio;
>   struct bvec_iter iter;
> - struct bio_vec from;
> - unsigned i = 0;
> - unsigned sectors = 0;
> - unsigned short max_segs = min_t(unsigned short, BIO_MAX_PAGES,
> - queue_max_segments(q));
> + struct bio *bio;
> + struct bio_vec bv;
> + unsigned short max_segs, segs = 0;
> +
> + bio = bio_alloc_bioset(GFP_NOIO, bio_segments(*bio_orig),
> + &non_cluster_bio_set);

bio_segments(*bio_orig) may be > 256, so bio_alloc_bioset() may fail.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 02:32:44PM +0100, Christoph Hellwig wrote:
> > +#define bio_iter_mp_iovec(bio, iter)   \
> > +   segment_iter_bvec((bio)->bi_io_vec, (iter))
> 
> Besides the mp naming we'd like to get rid of there also is just
> a single user of this macro, please just expand it there.

OK.

> 
> > +#define segment_iter_bvec(bvec, iter)  \
> > +((struct bio_vec) {                                \
> > +   .bv_page= segment_iter_page((bvec), (iter)),\
> > +   .bv_len = segment_iter_len((bvec), (iter)), \
> > +   .bv_offset  = segment_iter_offset((bvec), (iter)),  \
> > +})
> 
> And for this one please keep the segment vs bvec versions of these
> macros close together in the file, right now it follows the
> bvec_iter_bvec variant closely.

OK.

> 
> > +static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> > + unsigned bytes, unsigned max_seg_len)
> >  {
> > iter->bi_sector += bytes >> 9;
> >  
> > if (bio_no_advance_iter(bio))
> > iter->bi_size -= bytes;
> > else
> > -   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
> > +   __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
> > /* TODO: It is reasonable to complete bio with error here. */
> >  }
> >  
> > +static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> > +   unsigned bytes)
> > +{
> > +   __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
> > +}
> 
> Btw, I think the remaining users of bio_advance_iter() in bio.h
> should probably switch to using __bio_advance_iter to make them a little
> more clear to read.

Good point.

> 
> > +/* returns one real segment(multi-page bvec) each time */
> 
> space before the brace, please.

OK.

> 
> > +#define BVEC_MAX_LEN  ((unsigned int)-1)
> 
> > while (bytes) {
> > +   unsigned segment_len = segment_iter_len(bv, *iter);
> >  
> > -   iter->bi_bvec_done += len;
> > +   if (max_seg_len < BVEC_MAX_LEN)
> > +   segment_len = min_t(unsigned, segment_len,
> > +   max_seg_len -
> > +   bvec_iter_offset(bv, *iter));
> > +
> > +   segment_len = min(bytes, segment_len);
> 
> Please stick to passing the magic zero here as can often generate more
> efficient code.

But zero may decrease code readability. Actually the passed
'max_seg_len' is just a constant, and the compiler should generate
equally efficient code for any constant, whether 0 or another value.

> 
> Talking about efficient code - I wonder how much code size we'd save
> by moving this function out of line..

That is good point, see the following diff:

[mingl@hp kernel]$ diff -u inline.size non_inline.size
--- inline.size 2018-11-21 23:24:52.305312076 +0800
+++ non_inline.size 2018-11-21 23:24:59.908393010 +0800
@@ -1,2 +1,2 @@
     text    data     bss      dec     hex filename
-13429213 6893922 4292692 24615827 1779b93 vmlinux.inline
+13429153 6893346 4292692 24615191 1779917 vmlinux.non_inline

vmlinux(non_inline) is built by just moving/exporting __bvec_iter_advance()
into block/bio.c.

The difference is about 636 bytes (60 bytes of text plus 576 bytes of data).

> 
> But while looking over this I wonder why we even need the max_seg_len
> here.  The only thing __bvec_iter_advance does is to move bi_bvec_done
> and bi_idx forward, with corresponding decrements of bi_size.  As far
> as I can tell the only thing that max_seg_len does is that we need
> more iterations of the while loop to achieve the same thing.
> 
> And actual bvec used by the caller will be obtained using
> bvec_iter_bvec or segment_iter_bvec depending on if they want multi-page
> or single-page variants.

Right, we let __bvec_iter_advance() serve for both multi-page and single-page
case, then we have to tell it via one way or another, now we use the constant
of 'max_seg_len'.

Or you suggest to implement two versions of __bvec_iter_advance()?

Thanks,
Ming



Re: [Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-21 Thread Ming Lei
On Wed, Nov 21, 2018 at 02:19:28PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 11:23:10AM +0800, Ming Lei wrote:
> > This patch introduces helpers of 'segment_iter_*' for multipage
> > bvec support.
> > 
> > The introduced helpers treat one bvec as a real multi-page segment,
> > which may include more than one page.
> 
> Unless I'm missing something these bvec vs segment names are exactly
> inverted vs how we use it elsewhere.
> 
> In the iterators we use segment for single-page bvec, and bvec for multi
> page ones, and here it is inverse.  Please switch it around.

bvec_iter_* is used for single-page bvec in current linus tree, and there are
lots of users now:

[linux]$ git grep -n "bvec_iter_*" ./ | wc
191 995   13242

If we have to switch it first, it can be a big change, just wondering if Jens
is happy with that?

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-21 Thread Ming Lei
On Tue, Nov 20, 2018 at 09:35:07PM -0800, Sagi Grimberg wrote:
> 
> > > Wait, I see that the bvec is still a single array per bio. When you said
> > > a table I thought you meant a 2-dimensional array...
> > 
> > I mean a new 1-d table A has to be created for multiple bios in one rq,
> > and build it in the following way
> > 
> > rq_for_each_bvec(tmp, rq, rq_iter)
> >  *A++ = tmp;
> > 
> > Then you can pass A to iov_iter_bvec() & send().
> > 
> > Given it is over TCP, I guess it should be doable for you to preallocate one
> > 256-bvec table in one page for each request, then set the max segment size
> > as (unsigned int)-1, and max segment number as 256, the preallocated table
> > should work anytime.
> 
> 256 bvec table is really a lot to preallocate, especially when it's not
> needed, I can easily initialize the bvec_iter on the bio bvec. If this
> involves preallocation of the worst-case then I don't consider this to
> be an improvement.

If you don't provide one single bvec table, I understand you may not send
this req via one send().

The bvec_iter initialization is easy to do:

bvec_iter = bio->bi_iter

when you move to a new bio; please refer to __bio_for_each_bvec() or
__bio_for_each_segment().

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Ming Lei
On Tue, Nov 20, 2018 at 08:42:04PM -0800, Sagi Grimberg wrote:
> 
> > > Yeah, that is the most common example, given merge is enabled
> > > in most cases. If the driver or device doesn't care about merging,
> > > you can disable it and always get a single-bio request, then the
> > > bio's bvec table can be reused for send().
> > 
> > Does bvec_iter span bvecs with your patches? I didn't see that change?
> 
> Wait, I see that the bvec is still a single array per bio. When you said
> a table I thought you meant a 2-dimensional array...

I mean a new 1-d table A has to be created for multiple bios in one rq,
and build it in the following way

   rq_for_each_bvec(tmp, rq, rq_iter)
       *A++ = tmp;

Then you can pass A to iov_iter_bvec() & send().

Given it is over TCP, I guess it should be doable for you to preallocate one
256-bvec table in one page for each request, then set the max segment size as
(unsigned int)-1, and max segment number as 256, the preallocated table
should work anytime.


Thanks,
Ming
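
The table-building step described above can be sketched in userspace C. This is only an illustration under simplifying assumptions: `struct demo_bvec`, `struct demo_bio` and `flatten_bvecs()` are hypothetical stand-ins, the bios are walked through a plain `next` pointer, and partially completed bios (which the real iterators handle through bi_iter) are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_BVECS 256      /* table size suggested in the thread */

struct demo_bvec {
    unsigned long page;         /* stand-in for bv_page */
    unsigned int len;           /* like bv_len */
    unsigned int offset;        /* like bv_offset */
};

struct demo_bio {
    const struct demo_bvec *vecs;   /* this bio's own bvec array */
    size_t nr_vecs;
    struct demo_bio *next;          /* next bio in the same request */
};

/*
 * Copy every bvec of every bio chained from `head` into the flat `table`.
 * Returns the number of entries written, or -1 if they do not fit in `cap`.
 */
static int flatten_bvecs(const struct demo_bio *head,
                         struct demo_bvec *table, size_t cap)
{
    size_t n = 0;

    assert(table != NULL);

    for (const struct demo_bio *bio = head; bio; bio = bio->next) {
        for (size_t i = 0; i < bio->nr_vecs; i++) {
            if (n == cap)
                return -1;      /* preallocated table is too small */
            table[n++] = bio->vecs[i];
        }
    }
    return (int)n;
}
```

The -1 return mirrors the concern raised in this thread: a fixed-size table only works if the queue limits guarantee that a request never carries more bvecs than the table can hold.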



Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Ming Lei
On Tue, Nov 20, 2018 at 07:20:45PM -0800, Sagi Grimberg wrote:
> 
> > Not sure I understand the 'blocking' problem in this case.
> > 
> > We can build a bvec table from this req, and send them all
> > in send(),
> 
> I would like to avoid growing bvec tables and keep everything
> preallocated. Plus, a bvec_iter operates on a bvec which means
> we'll need a table there as well... Not liking it so far...

In case of bios in one request, we can't know how many bvecs there
are except for calling rq_bvecs(), so it may not be suitable to
preallocate the table. If you have to send the IO request in one send(),
runtime allocation may be inevitable.

If you don't need to send the IO request in one send(), you may send
one bio at a time, and just use the bio's bvec table directly,
such as the single bio case in lo_rw_aio().

> 
> > can this way avoid your blocking issue? You may see this
> > example in branch 'rq->bio != rq->biotail' of lo_rw_aio().
> 
> This is exactly an example of not ignoring the bios...

Yeah, that is the most common example, given merge is enabled
in most cases. If the driver or device doesn't care about merging,
you can disable it and always get a single-bio request, then the
bio's bvec table can be reused for send().

> 
> > If this way is what you need, I think you are right, even we may
> > introduce the following helpers:
> > 
> > rq_for_each_bvec()
> > rq_bvecs()
> 
> I'm not sure how this helps me either. Unless we can set a bvec_iter to
> span bvecs or have an abstract bio crossing when we re-initialize the
> bvec_iter I don't see how I can ignore bios completely...

rq_for_each_bvec() will iterate over all bvecs from all bios, so you
don't need to look at any bio in this req.

rq_bvecs() will return how many bvecs there are in this request (covering
all bios in this req).

> 
> > So looks nvme-tcp host driver might be the 2nd driver which benefits
> > from multi-page bvec directly.
> > 
> > The multi-page bvec V11 has passed my tests and addressed almost
> > all the comments during review on V10. I removed bio_vecs() in V11,
> > but it won't be big deal, we can introduce them anytime when there
> > is the requirement.
> 
> multipage-bvecs and nvme-tcp are going to conflict, so it would be good
> to coordinate on this. I think that nvme-tcp host needs some adjustments
> as setting a bvec_iter. I'm under the impression that the change is rather
> small and self-contained, but I'm not sure I have the full
> picture here.

I guess I may not get your exact requirement on block io iterator from nvme-tcp
too, :-(

thanks,
Ming



[Cluster-devel] [PATCH V11 19/19] block: kill BLK_MQ_F_SG_MERGE

2018-11-20 Thread Ming Lei
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 block/blk-mq-debugfs.c   | 1 -
 drivers/block/loop.c | 2 +-
 drivers/block/nbd.c  | 2 +-
 drivers/block/rbd.c  | 2 +-
 drivers/block/skd_main.c | 1 -
 drivers/block/xen-blkfront.c | 2 +-
 drivers/md/dm-rq.c   | 2 +-
 drivers/mmc/core/queue.c | 3 +--
 drivers/scsi/scsi_lib.c  | 2 +-
 include/linux/blk-mq.h   | 1 -
 10 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index d752fe4461af..a6ec055b54fa 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -249,7 +249,6 @@ static const char *const alloc_policy_name[] = {
 static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(SHOULD_MERGE),
HCTX_FLAG_NAME(TAG_SHARED),
-   HCTX_FLAG_NAME(SG_MERGE),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index e3683211f12d..4cf5486689de 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1906,7 +1906,7 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-   lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
lo->tag_set.driver_data = lo;
 
err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 08696f5f00bb..999c94de78e5 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1570,7 +1570,7 @@ static int nbd_dev_add(int index)
nbd->tag_set.numa_node = NUMA_NO_NODE;
nbd->tag_set.cmd_size = sizeof(struct nbd_cmd);
nbd->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
-   BLK_MQ_F_SG_MERGE | BLK_MQ_F_BLOCKING;
+   BLK_MQ_F_BLOCKING;
nbd->tag_set.driver_data = nbd;
 
err = blk_mq_alloc_tag_set(&nbd->tag_set);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8e5140bbf241..3dfd300b5283 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3988,7 +3988,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
rbd_dev->tag_set.ops = &rbd_mq_ops;
rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth;
rbd_dev->tag_set.numa_node = NUMA_NO_NODE;
-   rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
rbd_dev->tag_set.nr_hw_queues = 1;
rbd_dev->tag_set.cmd_size = sizeof(struct work_struct);
 
diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index a10d5736d8f7..a7040f9a1b1b 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -2843,7 +2843,6 @@ static int skd_cons_disk(struct skd_device *skdev)
skdev->sgs_per_request * sizeof(struct scatterlist);
skdev->tag_set.numa_node = NUMA_NO_NODE;
skdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
-   BLK_MQ_F_SG_MERGE |
BLK_ALLOC_POLICY_TO_MQ_FLAG(BLK_TAG_ALLOC_FIFO);
skdev->tag_set.driver_data = skdev;
rc = blk_mq_alloc_tag_set(&skdev->tag_set);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 0ed4b200fa58..d43a5677ccbc 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -977,7 +977,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 
sector_size,
} else
info->tag_set.queue_depth = BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
-   info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
info->tag_set.cmd_size = sizeof(struct blkif_req);
info->tag_set.driver_data = info;
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1f1fe9a618ea..afbac62a02a2 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -536,7 +536,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, 
struct dm_table *t)
md->tag_set->ops = &dm_mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
-   md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
md->tag_set->driver_data = md;
 
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 35cc138b096d..cc19e71c71d4 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -410,8 +410,7 @@ int mmc_init_queue(struct mmc_queue *mq,

[Cluster-devel] [PATCH V11 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-11-20 Thread Ming Lei
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
physical segment number is mainly figured out in blk_queue_split() for
fast path, and the flag of BIO_SEG_VALID is set there too.

Now only blk_recount_segments() and blk_recalc_rq_segments() use this
flag.

Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID
is set in blk_queue_split().

For another user of blk_recalc_rq_segments():

- run in partial completion branch of blk_update_request, which is an unusual
case

- run in blk_cloned_rq_check_limits(), still not a big problem if the flag is
killed since dm-rq is the only user.

Multi-page bvec is enabled now, not doing S/G merging is rather pointless with
the current setup of the I/O path, as it isn't going to save you a significant
amount of cycles.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 block/blk-merge.c  | 31 ++-
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c |  3 ---
 drivers/md/dm-table.c  | 13 -
 include/linux/blkdev.h |  1 -
 5 files changed, 6 insertions(+), 43 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7c44216c1b58..8fcac7855a45 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -343,8 +343,7 @@ void blk_queue_split(struct request_queue *q, struct bio 
**bio)
 EXPORT_SYMBOL(blk_queue_split);
 
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-struct bio *bio,
-bool no_sg_merge)
+struct bio *bio)
 {
struct bio_vec bv, bvprv = { NULL };
int cluster, prev = 0;
@@ -371,13 +370,6 @@ static unsigned int __blk_recalc_rq_segments(struct 
request_queue *q,
nr_phys_segs = 0;
for_each_bio(bio) {
bio_for_each_bvec(bv, bio, iter) {
-   /*
-* If SG merging is disabled, each bio vector is
-* a segment
-*/
-   if (no_sg_merge)
-   goto new_segment;
-
if (prev && cluster) {
if (seg_size + bv.bv_len
> queue_max_segment_size(q))
@@ -412,27 +404,16 @@ static unsigned int __blk_recalc_rq_segments(struct 
request_queue *q,
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-   bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
-   &rq->q->queue_flags);
-
-   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
-   no_sg_merge);
+   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
 }
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-   unsigned short seg_cnt = bio_segments(bio);
-
-   if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
-   (seg_cnt < queue_max_segments(q)))
-   bio->bi_phys_segments = seg_cnt;
-   else {
-   struct bio *nxt = bio->bi_next;
+   struct bio *nxt = bio->bi_next;
 
-   bio->bi_next = NULL;
-   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
-   bio->bi_next = nxt;
-   }
+   bio->bi_next = NULL;
+   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
+   bio->bi_next = nxt;
 
bio_set_flag(bio, BIO_SEG_VALID);
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index a32bb79d6c95..d752fe4461af 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -127,7 +127,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
-   QUEUE_FLAG_NAME(NO_SG_MERGE),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b246ed44c0..0375c3bd410e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2755,9 +2755,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
 
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
-   if (!(set->flags & BLK_MQ_F_SG_MERGE))
-   blk_queue_flag_set(QUEUE_FLAG_NO_SG_MERGE, q);
-
q->sg_reserved_size = INT_MAX;
 
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 844f7d0f2ef8..a41832cf0c98 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1698,14 +1698,6 @@ static int device_is_not_random(struct dm_target *ti, 
struct dm_dev *dev,
return q && !blk_queue_add_random(q);
 }
 
-static int queue_supports_sg_merge(struct dm_target *ti, stru

[Cluster-devel] [PATCH V11 17/19] block: document usage of bio iterator helpers

2018-11-20 Thread Ming Lei
Now that multi-page bvec is supported, some helpers return data page by
page while others return it segment by segment. This patch documents
the usage.

Signed-off-by: Ming Lei 
---
 Documentation/block/biovecs.txt | 24 
 1 file changed, 24 insertions(+)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 25689584e6e0..bb008f7afb05 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -117,3 +117,27 @@ Other implications:
size limitations and the limitations of the underlying devices. Thus
there's no need to define ->merge_bvec_fn() callbacks for individual block
drivers.
+
+Usage of helpers:
+=================
+
+* The following helpers whose names have the suffix of "_all" can only be used
+on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers
+shouldn't use them because the bio may have been split before it reached the
+driver.
+
+   bio_for_each_segment_all()
+   bio_first_bvec_all()
+   bio_first_page_all()
+   bio_last_bvec_all()
+
+* The following helpers iterate over single-page bvecs. The passed 'struct
+bio_vec' will contain a single-page IO vector during the iteration
+
+   bio_for_each_segment()
+   bio_for_each_segment_all()
+
+* The following helpers iterate over multi-page bvecs. The passed 'struct
+bio_vec' will contain a multi-page IO vector during the iteration
+
+   bio_for_each_bvec()
-- 
2.9.5
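
The distinction documented above, between helpers that hand out single-page vectors and those that hand out whole multi-page vectors, can be illustrated with a small userspace sketch. It assumes 4 KiB pages, physically consecutive page frames, and an offset that lies inside the first page; `struct demo_bvec` and `split_into_pages()` are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u    /* assumed 4 KiB pages */

struct demo_bvec {
    unsigned long pfn;      /* first page frame number, stand-in for bv_page */
    unsigned int len;       /* length in bytes, may span several pages */
    unsigned int offset;    /* byte offset into the first page */
};

/*
 * Split one multi-page vector into per-page pieces, storing at most `cap`
 * of them in `out`. Returns the total number of pieces, which is what a
 * single-page iterator would visit for this one vector.
 */
static size_t split_into_pages(struct demo_bvec mp,
                               struct demo_bvec *out, size_t cap)
{
    size_t n = 0;

    assert(mp.offset < DEMO_PAGE_SIZE);

    while (mp.len) {
        /* Bytes left in the current page, capped by the remaining length. */
        unsigned int chunk = DEMO_PAGE_SIZE - mp.offset;

        if (chunk > mp.len)
            chunk = mp.len;

        if (n < cap) {
            out[n].pfn = mp.pfn;
            out[n].len = chunk;
            out[n].offset = mp.offset;
        }
        n++;

        /* Later pieces always start at the beginning of the next page. */
        mp.pfn++;
        mp.offset = 0;
        mp.len -= chunk;
    }
    return n;
}
```

A multi-page iterator in the style of bio_for_each_bvec() would return the input vector as a single item, while a single-page iterator in the style of bio_for_each_segment() would yield the individual pieces computed here.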



[Cluster-devel] [PATCH V11 16/19] block: always define BIO_MAX_PAGES as 256

2018-11-20 Thread Ming Lei
Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to
increase BIO_MAX_PAGES for it.

CONFIG_THP_SWAP needs to split one THP into normal pages and adds
them all to one bio. With multipage-bvec, it just takes one bvec to
hold them all.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 include/linux/bio.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7edad188568a..e5b975fa0558 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -34,15 +34,7 @@
 #define BIO_BUG_ON
 #endif
 
-#ifdef CONFIG_THP_SWAP
-#if HPAGE_PMD_NR > 256
-#define BIO_MAX_PAGES  HPAGE_PMD_NR
-#else
 #define BIO_MAX_PAGES  256
-#endif
-#else
-#define BIO_MAX_PAGES  256
-#endif
 
 #define bio_prio(bio)  (bio)->bi_ioprio
 #define bio_set_prio(bio, prio)((bio)->bi_ioprio = prio)
-- 
2.9.5



[Cluster-devel] [PATCH V11 15/19] block: enable multipage bvecs

2018-11-20 Thread Ming Lei
This patch pulls the trigger for multi-page bvecs.

Signed-off-by: Ming Lei 
---
 block/bio.c   | 32 +++-
 fs/iomap.c|  2 +-
 fs/xfs/xfs_aops.c |  2 +-
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 0f1635b9ec50..854676edc438 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -823,7 +823,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
  * @len: length of the data to add
  * @off: offset of the data in @page
  *
- * Try to add the data at @page + @off to the last bvec of @bio.  This is a
+ * Try to add the data at @page + @off to the last page of @bio.  This is a
  * a useful optimisation for file systems with a block size smaller than the
  * page size.
  *
@@ -836,10 +836,13 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
*page,
return false;
 
if (bio->bi_vcnt > 0) {
-   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+   struct bio_vec bv;
+   struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
-   bv->bv_len += len;
+   bvec_last_segment(seg, &bv);
+
+   if (page == bv.bv_page && off == bv.bv_offset + bv.bv_len) {
+   seg->bv_len += len;
bio->bi_iter.bi_size += len;
return true;
}
@@ -848,6 +851,25 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
*page,
 }
 EXPORT_SYMBOL_GPL(__bio_try_merge_page);
 
+static bool bio_try_merge_segment(struct bio *bio, struct page *page,
+ unsigned int len, unsigned int off)
+{
+   if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
+   return false;
+
+   if (bio->bi_vcnt > 0) {
+   struct bio_vec *seg = &bio->bi_io_vec[bio->bi_vcnt - 1];
+
+   if (page_to_phys(seg->bv_page) + seg->bv_offset + seg->bv_len ==
+   page_to_phys(page) + off) {
+   seg->bv_len += len;
+   bio->bi_iter.bi_size += len;
+   return true;
+   }
+   }
+   return false;
+}
+
 /**
  * __bio_add_page - add page to a bio in a new segment
  * @bio: destination bio
@@ -888,7 +910,7 @@ EXPORT_SYMBOL_GPL(__bio_add_page);
 int bio_add_page(struct bio *bio, struct page *page,
 unsigned int len, unsigned int offset)
 {
-   if (!__bio_try_merge_page(bio, page, len, offset)) {
+   if (!bio_try_merge_segment(bio, page, len, offset)) {
if (bio_full(bio))
return 0;
__bio_add_page(bio, page, len, offset);
diff --git a/fs/iomap.c b/fs/iomap.c
index f5fb8bf75cc8..ccc2ba115f4d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -344,7 +344,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
loff_t length, void *data,
ctx->bio->bi_end_io = iomap_read_end_io;
}
 
-   __bio_add_page(ctx->bio, page, plen, poff);
+   bio_add_page(ctx->bio, page, plen, poff);
 done:
/*
 * Move the caller beyond our range so that it keeps making progress.
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1f1829e506e8..5c2190216614 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -621,7 +621,7 @@ xfs_add_to_ioend(
atomic_inc(&iop->write_count);
if (bio_full(wpc->ioend->io_bio))
xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
-   __bio_add_page(wpc->ioend->io_bio, page, len, poff);
+   bio_add_page(wpc->ioend->io_bio, page, len, poff);
}
 
wpc->ioend->io_size += len;
-- 
2.9.5
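
The merge rule that bio_try_merge_segment() adds can be illustrated with a small userspace sketch. This is not kernel code: `model_bio`, `model_bvec`, `model_phys()`, `MODEL_MAX_VECS` and the 4 KiB `MODEL_PAGE_SIZE` are hypothetical stand-ins, and a page is modelled as an index whose physical address is assumed to be index * page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define MODEL_MAX_VECS  4u      /* assumption: tiny bvec table for the demo */

/* Hypothetical stand-ins for struct bio_vec and struct bio. */
struct model_bvec { size_t page; unsigned offset; unsigned len; };
struct model_bio {
    struct model_bvec vec[MODEL_MAX_VECS];
    unsigned vcnt;   /* number of bvecs in use */
    unsigned size;   /* total bytes in the bio */
};

/* Assumed page-to-physical mapping: pages are laid out back to back. */
static unsigned long long model_phys(size_t page)
{
    return (unsigned long long)page * MODEL_PAGE_SIZE;
}

/* Extend the last bvec if the new data starts exactly where it ends. */
static bool model_try_merge_segment(struct model_bio *bio, size_t page,
                                    unsigned len, unsigned off)
{
    if (bio->vcnt > 0) {
        struct model_bvec *seg = &bio->vec[bio->vcnt - 1];

        if (model_phys(seg->page) + seg->offset + seg->len ==
            model_phys(page) + off) {
            seg->len += len;
            bio->size += len;
            return true;
        }
    }
    return false;
}

/* Returns the number of bytes added, or 0 if the table is full. */
static unsigned model_add_page(struct model_bio *bio, size_t page,
                               unsigned len, unsigned off)
{
    assert(off + len <= MODEL_PAGE_SIZE);

    if (!model_try_merge_segment(bio, page, len, off)) {
        if (bio->vcnt == MODEL_MAX_VECS)
            return 0;
        bio->vec[bio->vcnt].page = page;
        bio->vec[bio->vcnt].offset = off;
        bio->vec[bio->vcnt].len = len;
        bio->vcnt++;
        bio->size += len;
    }
    return len;
}
```

Under these assumptions, adding pages 7 and 8 back to back produces one 8 KiB bvec, while a following page 10 starts a new bvec because it is not physically contiguous with page 8.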



[Cluster-devel] [PATCH V11 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-20 Thread Ming Lei
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
so that bio_for_each_segment_all() can iterate over multi-page bvecs.

Given that this is just a simple, mechanical change to all
bio_for_each_segment_all() users, this patch makes the tree-wide change in a
single patch, so that we can avoid using a temporary helper for this conversion.

Signed-off-by: Ming Lei 
---
 block/bio.c   | 27 ++-
 block/bounce.c|  6 --
 drivers/md/bcache/btree.c |  3 ++-
 drivers/md/dm-crypt.c |  3 ++-
 drivers/md/raid1.c|  3 ++-
 drivers/staging/erofs/data.c  |  3 ++-
 drivers/staging/erofs/unzip_vle.c |  3 ++-
 fs/block_dev.c|  6 --
 fs/btrfs/compression.c|  3 ++-
 fs/btrfs/disk-io.c|  3 ++-
 fs/btrfs/extent_io.c  | 12 
 fs/btrfs/inode.c  |  6 --
 fs/btrfs/raid56.c |  3 ++-
 fs/crypto/bio.c   |  3 ++-
 fs/direct-io.c|  4 +++-
 fs/exofs/ore.c|  3 ++-
 fs/exofs/ore_raid.c   |  3 ++-
 fs/ext4/page-io.c |  3 ++-
 fs/ext4/readpage.c|  3 ++-
 fs/f2fs/data.c|  9 ++---
 fs/gfs2/lops.c|  6 --
 fs/gfs2/meta_io.c |  3 ++-
 fs/iomap.c|  6 --
 fs/mpage.c|  3 ++-
 fs/xfs/xfs_aops.c |  5 +++--
 include/linux/bio.h   | 11 +--
 include/linux/bvec.h  | 31 +++
 27 files changed, 128 insertions(+), 46 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 4f4d9884443b..2680aa42a625 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1073,8 +1073,9 @@ static int bio_copy_from_iter(struct bio *bio, struct 
iov_iter *iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_from_iter(bvec->bv_page,
@@ -1104,8 +1105,9 @@ static int bio_copy_to_iter(struct bio *bio, struct 
iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_to_iter(bvec->bv_page,
@@ -1127,8 +1129,9 @@ void bio_free_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all(bvec, bio, i, iter_all)
__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1295,6 +1298,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
struct bio *bio;
int ret;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
@@ -1368,7 +1372,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio;
 
  out_unmap:
-   bio_for_each_segment_all(bvec, bio, j) {
+   bio_for_each_segment_all(bvec, bio, j, iter_all) {
put_page(bvec->bv_page);
}
bio_put(bio);
@@ -1379,11 +1383,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
/*
 * make sure we dirty pages we wrote to
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
 
@@ -1475,8 +1480,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
@@ -1585,8 +1591,9 @@ void bio_set_pages_dirty(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
if (!PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
}
@@ -1597,8 +1604,9 @@ static void bio_release_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   

[Cluster-devel] [PATCH V11 14/19] block: handle non-cluster bio out of blk_bio_segment_split

2018-11-20 Thread Ming Lei
We will enable multi-page bvecs soon, but a non-cluster queue can't
handle multi-page bvecs at all. This patch borrows the bounce code's
idea of cloning a new single-page bio for a non-cluster queue, and moves
its handling out of blk_bio_segment_split().

Signed-off-by: Ming Lei 
---
 block/Makefile  |  3 ++-
 block/blk-merge.c   |  6 -
 block/blk.h |  2 ++
 block/non-cluster.c | 70 +
 4 files changed, 79 insertions(+), 2 deletions(-)
 create mode 100644 block/non-cluster.c

diff --git a/block/Makefile b/block/Makefile
index eee1b4ceecf9..e07d59438c4b 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_BLOCK) := bio.o elevator.o blk-core.o blk-sysfs.o \
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
genhd.o partition-generic.o ioprio.o \
-   badblocks.o partitions/ blk-rq-qos.o
+   badblocks.o partitions/ blk-rq-qos.o \
+   non-cluster.o
 
 obj-$(CONFIG_BOUNCE)   += bounce.o
 obj-$(CONFIG_BLK_SCSI_REQUEST) += scsi_ioctl.o
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8829c51b4e75..7c44216c1b58 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -247,7 +247,7 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
goto split;
}
 
-   if (bvprvp && blk_queue_cluster(q)) {
+   if (bvprvp) {
if (seg_size + bv.bv_len > queue_max_segment_size(q))
goto new_segment;
if (!biovec_phys_mergeable(q, bvprvp, &bv))
@@ -307,6 +307,10 @@ void blk_queue_split(struct request_queue *q, struct bio 
**bio)
split = blk_bio_write_same_split(q, *bio, &q->bio_split, 
&nsegs);
break;
default:
+   if (!blk_queue_cluster(q)) {
+   blk_queue_non_cluster_bio(q, bio);
+   return;
+   }
split = blk_bio_segment_split(q, *bio, &q->bio_split, &nsegs);
break;
}
diff --git a/block/blk.h b/block/blk.h
index 31c0e45aba3a..6fc5821ced55 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -338,6 +338,8 @@ struct bio *blk_next_bio(struct bio *bio, unsigned int 
nr_pages, gfp_t gfp);
 
 struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask, struct 
bio_set *bs);
 
+void blk_queue_non_cluster_bio(struct request_queue *q, struct bio **bio_orig);
+
 #ifdef CONFIG_BLK_DEV_ZONED
 void blk_queue_free_zone_bitmaps(struct request_queue *q);
 #else
diff --git a/block/non-cluster.c b/block/non-cluster.c
new file mode 100644
index ..9c2910be9404
--- /dev/null
+++ b/block/non-cluster.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* non-cluster handling for block devices */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "blk.h"
+
+static struct bio_set non_cluster_bio_set, non_cluster_bio_split;
+
+static __init int init_non_cluster_bioset(void)
+{
+   WARN_ON(bioset_init(&non_cluster_bio_set, BIO_POOL_SIZE, 0,
+  BIOSET_NEED_BVECS));
+   WARN_ON(bioset_integrity_create(&non_cluster_bio_set, BIO_POOL_SIZE));
+   WARN_ON(bioset_init(&non_cluster_bio_split, BIO_POOL_SIZE, 0, 0));
+
+   return 0;
+}
+__initcall(init_non_cluster_bioset);
+
+static void non_cluster_end_io(struct bio *bio)
+{
+   struct bio *bio_orig = bio->bi_private;
+
+   bio_orig->bi_status = bio->bi_status;
+   bio_endio(bio_orig);
+   bio_put(bio);
+}
+
+void blk_queue_non_cluster_bio(struct request_queue *q, struct bio **bio_orig)
+{
+   struct bio *bio;
+   struct bvec_iter iter;
+   struct bio_vec from;
+   unsigned i = 0;
+   unsigned sectors = 0;
+   unsigned short max_segs = min_t(unsigned short, BIO_MAX_PAGES,
+   queue_max_segments(q));
+
+   bio_for_each_segment(from, *bio_orig, iter) {
+   if (i++ < max_segs)
+   sectors += from.bv_len >> 9;
+   else
+   break;
+   }
+
+   if (sectors < bio_sectors(*bio_orig)) {
+   bio = bio_split(*bio_orig, sectors, GFP_NOIO,
+   &non_cluster_bio_split);
+   bio_chain(bio, *bio_orig);
+   generic_make_request(*bio_orig);
+   *bio_orig = bio;
+   }
+   bio = bio_clone_bioset(*bio_orig, GFP_NOIO, &non_cluster_bio_set);
+
+   bio->bi_phys_segments = bio_segments(bio);
+   bio_set_flag(bio, BIO_SEG_VALID);
+   bio->bi_end_io = non_cluster_end_io;
+
+   bio->bi_private = *bio_orig;
+   *bio_orig = bio;
+}
-- 
2.9.5
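
The counting loop at the start of blk_queue_non_cluster_bio() decides whether the bio has to be split. A minimal userspace sketch of that loop follows; `model_sectors_within()` and its array-of-lengths input are hypothetical, standing in for the walk over single-page segments done with bio_for_each_segment().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the 512-byte sectors covered by at most max_segs segments.
 * seg_len[] holds the byte length of each single-page segment
 * (a hypothetical flattening of what bio_for_each_segment() yields).
 */
static unsigned model_sectors_within(const unsigned *seg_len, size_t nr,
                                     unsigned max_segs)
{
    unsigned i = 0, sectors = 0;
    size_t k;

    assert(seg_len != NULL || nr == 0);

    for (k = 0; k < nr; k++) {
        if (i++ < max_segs)
            sectors += seg_len[k] >> 9;
        else
            break;
    }
    return sectors;
}
```

If the returned value is smaller than the bio's total sector count, the original code splits the bio at that point and resubmits the remainder. If it is equal, the whole bio fits within the segment limit and only the clone is needed.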



[Cluster-devel] [PATCH V11 13/19] block: move bounce_clone_bio into bio.c

2018-11-20 Thread Ming Lei
We will reuse bounce_clone_bio() for cloning bio in case of
!blk_queue_cluster(q), so move this helper into bio.c and
rename it as bio_clone_bioset().

No functional change.

Signed-off-by: Ming Lei 
---
 block/bio.c| 69 +
 block/blk.h|  2 ++
 block/bounce.c | 70 +-
 3 files changed, 72 insertions(+), 69 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 2680aa42a625..0f1635b9ec50 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -647,6 +647,75 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t 
gfp_mask, struct bio_set *bs)
 }
 EXPORT_SYMBOL(bio_clone_fast);
 
+/* block core only helper */
+struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
+struct bio_set *bs)
+{
+   struct bvec_iter iter;
+   struct bio_vec bv;
+   struct bio *bio;
+
+   /*
+* Pre immutable biovecs, __bio_clone() used to just do a memcpy from
+* bio_src->bi_io_vec to bio->bi_io_vec.
+*
+* We can't do that anymore, because:
+*
+*  - The point of cloning the biovec is to produce a bio with a biovec
+*the caller can modify: bi_idx and bi_bvec_done should be 0.
+*
+*  - The original bio could've had more than BIO_MAX_PAGES biovecs; if
+*we tried to clone the whole thing bio_alloc_bioset() would fail.
+*But the clone should succeed as long as the number of biovecs we
+*actually need to allocate is fewer than BIO_MAX_PAGES.
+*
+*  - Lastly, bi_vcnt should not be looked at or relied upon by code
+*that does not own the bio - reason being drivers don't use it for
+*iterating over the biovec anymore, so expecting it to be kept up
+*to date (i.e. for clones that share the parent biovec) is just
+*asking for trouble and would force extra work on
+*__bio_clone_fast() anyways.
+*/
+
+   bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
+   if (!bio)
+   return NULL;
+   bio->bi_disk= bio_src->bi_disk;
+   bio->bi_opf = bio_src->bi_opf;
+   bio->bi_ioprio  = bio_src->bi_ioprio;
+   bio->bi_write_hint  = bio_src->bi_write_hint;
+   bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
+   bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
+
+   switch (bio_op(bio)) {
+   case REQ_OP_DISCARD:
+   case REQ_OP_SECURE_ERASE:
+   case REQ_OP_WRITE_ZEROES:
+   break;
+   case REQ_OP_WRITE_SAME:
+   bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
+   break;
+   default:
+   bio_for_each_segment(bv, bio_src, iter)
+   bio->bi_io_vec[bio->bi_vcnt++] = bv;
+   break;
+   }
+
+   if (bio_integrity(bio_src)) {
+   int ret;
+
+   ret = bio_integrity_clone(bio, bio_src, gfp_mask);
+   if (ret < 0) {
+   bio_put(bio);
+   return NULL;
+   }
+   }
+
+   bio_clone_blkcg_association(bio, bio_src);
+
+   return bio;
+}
+
 /**
  * bio_add_pc_page -   attempt to add page to bio
  * @q: the target queue
diff --git a/block/blk.h b/block/blk.h
index 816a9abb87cd..31c0e45aba3a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -336,6 +336,8 @@ static inline int blk_iolatency_init(struct request_queue 
*q) { return 0; }
 
 struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
 
+struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask, struct 
bio_set *bs);
+
 #ifdef CONFIG_BLK_DEV_ZONED
 void blk_queue_free_zone_bitmaps(struct request_queue *q);
 #else
diff --git a/block/bounce.c b/block/bounce.c
index 7338041e3042..4947c36173b2 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -215,74 +215,6 @@ static void bounce_end_io_read_isa(struct bio *bio)
__bounce_end_io_read(bio, &isa_page_pool);
 }
 
-static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
-   struct bio_set *bs)
-{
-   struct bvec_iter iter;
-   struct bio_vec bv;
-   struct bio *bio;
-
-   /*
-* Pre immutable biovecs, __bio_clone() used to just do a memcpy from
-* bio_src->bi_io_vec to bio->bi_io_vec.
-*
-* We can't do that anymore, because:
-*
-*  - The point of cloning the biovec is to produce a bio with a biovec
-*the caller can modify: bi_idx and bi_bvec_done should be 0.
-*
-*  - The original bio could've had more than BIO_MAX_PAGES biovecs; if
-*we tried to clone the whole thing bio_alloc_bioset() would fail.
-*But the clone should succeed as long as the number of biovecs we
-

[Cluster-devel] [PATCH V11 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-20 Thread Ming Lei
bch_bio_alloc_pages() is always called on a new bio, so it is safe
to access the bvec table directly. Given that this is the only case of
its kind, open code the bvec table access, since bio_for_each_segment_all()
will be changed to support iterating over multi-page bvecs.

Acked-by: Coly Li 
Signed-off-by: Ming Lei 
---
 drivers/md/bcache/util.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 20eddeac1531..62fb917f7a4f 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -270,7 +270,11 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;
 
-   bio_for_each_segment_all(bv, bio, i) {
+   /*
+* This is called on freshly new bio, so it is safe to access the
+* bvec table directly.
+*/
+   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++, i++) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
-- 
2.9.5



[Cluster-devel] [PATCH V11 10/19] block: loop: pass multi-page bvec to iov_iter

2018-11-20 Thread Ming Lei
iov_iter is implemented on top of the bvec iterator helpers, so it is safe to
pass multi-page bvecs to it, and doing so is much more efficient than passing
one page in each bvec.

Signed-off-by: Ming Lei 
---
 drivers/block/loop.c   | 20 ++--
 include/linux/blkdev.h |  4 
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 176ab1f28eca..e3683211f12d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -510,21 +510,22 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
 loff_t pos, bool rw)
 {
struct iov_iter iter;
+   struct req_iterator rq_iter;
struct bio_vec *bvec;
struct request *rq = blk_mq_rq_from_pdu(cmd);
struct bio *bio = rq->bio;
struct file *file = lo->lo_backing_file;
+   struct bio_vec tmp;
unsigned int offset;
-   int segments = 0;
+   int nr_bvec = 0;
int ret;
 
+   rq_for_each_bvec(tmp, rq, rq_iter)
+   nr_bvec++;
+
if (rq->bio != rq->biotail) {
-   struct req_iterator iter;
-   struct bio_vec tmp;
 
-   __rq_for_each_bio(bio, rq)
-   segments += bio_segments(bio);
-   bvec = kmalloc_array(segments, sizeof(struct bio_vec),
+   bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 GFP_NOIO);
if (!bvec)
return -EIO;
@@ -533,10 +534,10 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
/*
 * The bios of the request may be started from the middle of
 * the 'bvec' because of bio splitting, so we can't directly
-* copy bio->bi_iov_vec to new bvec. The rq_for_each_segment
+* copy bio->bi_iov_vec to new bvec. The rq_for_each_bvec
 * API will take care of all details for us.
 */
-   rq_for_each_segment(tmp, rq, iter) {
+   rq_for_each_bvec(tmp, rq, rq_iter) {
*bvec = tmp;
bvec++;
}
@@ -550,11 +551,10 @@ static int lo_rw_aio(struct loop_device *lo, struct 
loop_cmd *cmd,
 */
offset = bio->bi_iter.bi_bvec_done;
bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-   segments = bio_segments(bio);
}
atomic_set(&cmd->ref, 2);
 
-   iov_iter_bvec(&iter, rw, bvec, segments, blk_rq_bytes(rq));
+   iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
iter.iov_offset = offset;
 
cmd->iocb.ki_pos = pos;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1ad6eafc43f2..a281b6737b61 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -805,6 +805,10 @@ struct req_iterator {
__rq_for_each_bio(_iter.bio, _rq)   \
bio_for_each_segment(bvl, _iter.bio, _iter.iter)
 
+#define rq_for_each_bvec(bvl, _rq, _iter)  \
+   __rq_for_each_bio(_iter.bio, _rq)   \
+   bio_for_each_bvec(bvl, _iter.bio, _iter.iter)
+
 #define rq_iter_last(bvec, _iter)  \
(_iter.bio->bi_next == NULL &&  \
 bio_iter_last(bvec, _iter.iter))
-- 
2.9.5



[Cluster-devel] [PATCH V11 09/19] btrfs: move bio_pages_all() to btrfs

2018-11-20 Thread Ming Lei
BTRFS is the only user of this helper, so move this helper into
BTRFS, and implement it via bio_for_each_segment_all(), since
bio->bi_vcnt may not equal the number of pages after multi-page bvecs
are enabled.

Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 14 +-
 include/linux/bio.h  |  6 --
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5d5965297e7e..874bb9aeebdc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2348,6 +2348,18 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, 
struct bio *failed_bio,
return bio;
 }
 
+static unsigned btrfs_bio_pages_all(struct bio *bio)
+{
+   unsigned i;
+   struct bio_vec *bv;
+
+   WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+   bio_for_each_segment_all(bv, bio, i)
+   ;
+   return i;
+}
+
 /*
  * this is a generic handler for readpage errors (default
  * readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2368,7 +2380,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 
phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
-   unsigned failed_bio_pages = bio_pages_all(failed_bio);
+   unsigned failed_bio_pages = btrfs_bio_pages_all(failed_bio);
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7560209d6a8a..9d6284f53c07 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -282,12 +282,6 @@ static inline void bio_get_last_bvec(struct bio *bio, 
struct bio_vec *bv)
bv->bv_len = iter.bi_bvec_done;
 }
 
-static inline unsigned bio_pages_all(struct bio *bio)
-{
-   WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
-   return bio->bi_vcnt;
-}
-
 static inline struct bio_vec *bio_first_bvec_all(struct bio *bio)
 {
WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
-- 
2.9.5
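
The reason bi_vcnt stops being a page count can be shown with a short userspace sketch. The names `model_bvec`, `model_pages_all()` and the 4 KiB `MODEL_PAGE_SIZE` are hypothetical. The kernel helper counts by iterating with bio_for_each_segment_all(), whereas this model computes the same number arithmetically, on the assumption that the iteration visits each single-page segment once.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct bio_vec, without the page pointer. */
struct model_bvec { unsigned offset; unsigned len; };

/* Count the pages touched by a table of possibly multi-page bvecs. */
static unsigned model_pages_all(const struct model_bvec *bv, size_t vcnt)
{
    unsigned pages = 0;
    size_t i;

    for (i = 0; i < vcnt; i++) {
        unsigned start = bv[i].offset % MODEL_PAGE_SIZE;

        assert(bv[i].len > 0);
        /* Round the in-page start plus length up to whole pages. */
        pages += (start + bv[i].len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
    }
    return pages;
}
```

With one 12 KiB bvec and one 4 KiB bvec that starts 512 bytes into a page, the table holds two entries but covers five pages.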



[Cluster-devel] [PATCH V11 07/19] fs/buffer.c: use bvec iterator to truncate the bio

2018-11-20 Thread Ming Lei
Once multi-page bvecs are enabled, the last bvec may include more than one
page, so this patch uses bvec_last_segment() to truncate the bio.

Reviewed-by: Omar Sandoval 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ming Lei 
---
 fs/buffer.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1286c2b95498..fa37ad52e962 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3032,7 +3032,10 @@ void guard_bio_eod(int op, struct bio *bio)
 
/* ..and clear the end of the buffer for reads */
if (op == REQ_OP_READ) {
-   zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+   struct bio_vec bv;
+
+   bvec_last_segment(bvec, &bv);
+   zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
truncated_bytes);
}
 }
-- 
2.9.5



[Cluster-devel] [PATCH V11 08/19] btrfs: use bvec_last_segment to get bio's last page

2018-11-20 Thread Ming Lei
Preparing for supporting multi-page bvec.

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d228f706ff3e..5d5965297e7e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2720,11 +2720,12 @@ static int __must_check submit_one_bio(struct bio *bio, 
int mirror_num,
 {
blk_status_t ret = 0;
struct bio_vec *bvec = bio_last_bvec_all(bio);
-   struct page *page = bvec->bv_page;
+   struct bio_vec bv;
struct extent_io_tree *tree = bio->bi_private;
u64 start;
 
-   start = page_offset(page) + bvec->bv_offset;
+   bvec_last_segment(bvec, &bv);
+   start = page_offset(bv.bv_page) + bv.bv_offset;
 
bio->bi_private = NULL;
 
-- 
2.9.5



[Cluster-devel] [PATCH V11 06/19] block: introduce bvec_last_segment()

2018-11-20 Thread Ming Lei
BTRFS and guard_bio_eod() need to get the last single-page segment
from a multi-page bvec, so introduce this helper to make them happy.

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index b279218c5c4d..b37d13a79a7d 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -173,4 +173,26 @@ static inline bool bvec_iter_advance(const struct bio_vec 
*bv,
.bi_bvec_done   = 0,\
 }
 
+/*
+ * Get the last single-page segment from the multi-page bvec and store it
+ * in @seg
+ */
+static inline void bvec_last_segment(const struct bio_vec *bvec,
+struct bio_vec *seg)
+{
+   unsigned total = bvec->bv_offset + bvec->bv_len;
+   unsigned last_page = (total - 1) / PAGE_SIZE;
+
+   seg->bv_page = nth_page(bvec->bv_page, last_page);
+
+   /* the whole segment is inside the last page */
+   if (bvec->bv_offset >= last_page * PAGE_SIZE) {
+   seg->bv_offset = bvec->bv_offset % PAGE_SIZE;
+   seg->bv_len = bvec->bv_len;
+   } else {
+   seg->bv_offset = 0;
+   seg->bv_len = total - last_page * PAGE_SIZE;
+   }
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.9.5
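
The arithmetic in bvec_last_segment() is easy to check in userspace. The sketch below mirrors it with hypothetical names (`model_bvec`, `model_last_segment()`), models a page as a plain index instead of a struct page pointer, and assumes 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct bio_vec: the page is an index. */
struct model_bvec {
    size_t   page;    /* index of the first page */
    unsigned offset;  /* byte offset from the start of that page */
    unsigned len;     /* length in bytes, may span several pages */
};

/* Return the last single-page segment of a multi-page bvec. */
static struct model_bvec model_last_segment(struct model_bvec bvec)
{
    struct model_bvec seg;
    unsigned total, last_page;

    assert(bvec.len > 0);

    total = bvec.offset + bvec.len;
    last_page = (total - 1) / MODEL_PAGE_SIZE;
    seg.page = bvec.page + last_page;

    if (bvec.offset >= last_page * MODEL_PAGE_SIZE) {
        /* The whole bvec lies inside its last page. */
        seg.offset = bvec.offset % MODEL_PAGE_SIZE;
        seg.len = bvec.len;
    } else {
        /* The last segment starts at the beginning of the last page. */
        seg.offset = 0;
        seg.len = total - last_page * MODEL_PAGE_SIZE;
    }
    return seg;
}
```

For example, a bvec that starts 512 bytes into page 10 and is 8192 bytes long ends 512 bytes into page 12, so the last segment is (page 12, offset 0, length 512).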



[Cluster-devel] [PATCH V11 05/19] block: use bio_for_each_bvec() to map sg

2018-11-20 Thread Ming Lei
It is more efficient to use bio_for_each_bvec() to map sg. At the same time,
we have to consider splitting multi-page bvecs, as done in blk_bio_segment_split().

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 68 +++
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index ec0b93fa1ff8..8829c51b4e75 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -455,6 +455,52 @@ static int blk_phys_contig_segment(struct request_queue 
*q, struct bio *bio,
return biovec_phys_mergeable(q, &end_bv, &nxt_bv);
 }
 
+static struct scatterlist *blk_next_sg(struct scatterlist **sg,
+   struct scatterlist *sglist)
+{
+   if (!*sg)
+   return sglist;
+
+   /*
+* If the driver previously mapped a shorter list, we could see a
+* termination bit prematurely unless it fully inits the sg table
+* on each mapping. We KNOW that there must be more entries here
+* or the driver would be buggy, so force clear the termination bit
+* to avoid doing a full sg_init_table() in drivers for each command.
+*/
+   sg_unmark_end(*sg);
+   return sg_next(*sg);
+}
+
+static unsigned blk_bvec_map_sg(struct request_queue *q,
+   struct bio_vec *bvec, struct scatterlist *sglist,
+   struct scatterlist **sg)
+{
+   unsigned nbytes = bvec->bv_len;
+   unsigned nsegs = 0, total = 0;
+
+   while (nbytes > 0) {
+   unsigned seg_size;
+   struct page *pg;
+   unsigned offset, idx;
+
+   *sg = blk_next_sg(sg, sglist);
+
+   seg_size = min(nbytes, queue_max_segment_size(q));
+   offset = (total + bvec->bv_offset) % PAGE_SIZE;
+   idx = (total + bvec->bv_offset) / PAGE_SIZE;
+   pg = nth_page(bvec->bv_page, idx);
+
+   sg_set_page(*sg, pg, seg_size, offset);
+
+   total += seg_size;
+   nbytes -= seg_size;
+   nsegs++;
+   }
+
+   return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -472,25 +518,7 @@ __blk_segment_map_sg(struct request_queue *q, struct 
bio_vec *bvec,
(*sg)->length += nbytes;
} else {
 new_segment:
-   if (!*sg)
-   *sg = sglist;
-   else {
-   /*
-* If the driver previously mapped a shorter
-* list, we could see a termination bit
-* prematurely unless it fully inits the sg
-* table on each mapping. We KNOW that there
-* must be more entries here or the driver
-* would be buggy, so force clear the
-* termination bit to avoid doing a full
-* sg_init_table() in drivers for each command.
-*/
-   sg_unmark_end(*sg);
-   *sg = sg_next(*sg);
-   }
-
-   sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-   (*nsegs)++;
+   (*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
}
*bvprv = *bvec;
 }
@@ -512,7 +540,7 @@ static int __blk_bios_map_sg(struct request_queue *q, 
struct bio *bio,
int cluster = blk_queue_cluster(q), nsegs = 0;
 
for_each_bio(bio)
-   bio_for_each_segment(bvec, bio, iter)
+   bio_for_each_bvec(bvec, bio, iter)
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 &nsegs, &cluster);
 
-- 
2.9.5
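
The loop in blk_bvec_map_sg() chops one multi-page bvec into scatterlist entries no larger than the queue's max segment size. A userspace sketch of that loop follows. `model_sg`, `model_bvec_map_sg()` and the output array are hypothetical replacements for struct scatterlist and sg_set_page(), and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for one scatterlist entry. */
struct model_sg { size_t page; unsigned offset; unsigned len; };

/*
 * Split a bvec (first_page, bv_offset, bv_len) into entries of at most
 * max_seg_size bytes. Returns the number of entries written to out[].
 */
static unsigned model_bvec_map_sg(size_t first_page, unsigned bv_offset,
                                  unsigned bv_len, unsigned max_seg_size,
                                  struct model_sg *out, unsigned out_cap)
{
    unsigned nbytes = bv_len, total = 0, nsegs = 0;

    assert(max_seg_size > 0);

    while (nbytes > 0) {
        unsigned seg_size = nbytes < max_seg_size ? nbytes : max_seg_size;

        assert(nsegs < out_cap);
        /* Locate the page and in-page offset where this entry starts. */
        out[nsegs].page = first_page + (total + bv_offset) / MODEL_PAGE_SIZE;
        out[nsegs].offset = (total + bv_offset) % MODEL_PAGE_SIZE;
        out[nsegs].len = seg_size;

        total += seg_size;
        nbytes -= seg_size;
        nsegs++;
    }
    return nsegs;
}
```

With a 20 KiB bvec starting 1 KiB into page 0 and an 8 KiB max segment size, the sketch yields three entries: two of 8 KiB and one of 4 KiB.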



[Cluster-devel] [PATCH V11 04/19] block: use bio_for_each_bvec() to compute multi-page bvec count

2018-11-20 Thread Ming Lei
First, it is more efficient to use bio_for_each_bvec() in both
blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
many multi-page bvecs there are in the bio.

Second, once bio_for_each_bvec() is used, a bvec may need to be
split because its length can be much larger than the max segment size,
so we have to split the big bvec into several segments.

Third, when splitting a multi-page bvec into segments, the max segment
limit may be reached, so the bio split needs to be considered in
this situation too.

Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 87 +++
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index f52400ce2187..ec0b93fa1ff8 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -161,6 +161,54 @@ static inline unsigned get_max_io_size(struct 
request_queue *q,
return sectors;
 }
 
+/*
+ * Split the bvec @bv into segments, and update all kinds of
+ * variables.
+ */
+static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
+   unsigned *nsegs, unsigned *last_seg_size,
+   unsigned *front_seg_size, unsigned *sectors)
+{
+   unsigned len = bv->bv_len;
+   unsigned total_len = 0;
+   unsigned new_nsegs = 0, seg_size = 0;
+
+   /*
+* Multipage bvec may be too big to hold in one segment,
+* so the current bvec has to be splitted as multiple
+* segments.
+*/
+   while (len && new_nsegs + *nsegs < queue_max_segments(q)) {
+   seg_size = min(queue_max_segment_size(q), len);
+
+   new_nsegs++;
+   total_len += seg_size;
+   len -= seg_size;
+
+   if ((bv->bv_offset + total_len) & queue_virt_boundary(q))
+   break;
+   }
+
+   /* update front segment size */
+   if (!*nsegs) {
+   unsigned first_seg_size = seg_size;
+
+   if (new_nsegs > 1)
+   first_seg_size = queue_max_segment_size(q);
+   if (*front_seg_size < first_seg_size)
+   *front_seg_size = first_seg_size;
+   }
+
+   /* update other varibles */
+   *last_seg_size = seg_size;
+   *nsegs += new_nsegs;
+   if (sectors)
+   *sectors += total_len >> 9;
+
+   /* split in the middle of the bvec if len != 0 */
+   return !!len;
+}
+
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 struct bio *bio,
 struct bio_set *bs,
@@ -174,7 +222,7 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
struct bio *new = NULL;
const unsigned max_sectors = get_max_io_size(q, bio);
 
-   bio_for_each_segment(bv, bio, iter) {
+   bio_for_each_bvec(bv, bio, iter) {
/*
 * If the queue doesn't support SG gaps and adding this
 * offset would create a gap, disallow it.
@@ -189,8 +237,12 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
 */
if (nsegs < queue_max_segments(q) &&
sectors < max_sectors) {
-   nsegs++;
-   sectors = max_sectors;
+   /* split in the middle of bvec */
+   bv.bv_len = (max_sectors - sectors) << 9;
+   bvec_split_segs(q, &bv, &nsegs,
+   &seg_size,
+   &front_seg_size,
+   &sectors);
}
goto split;
}
@@ -212,14 +264,12 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
if (nsegs == queue_max_segments(q))
goto split;
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
-
-   nsegs++;
bvprv = bv;
bvprvp = &bvprv;
-   seg_size = bv.bv_len;
-   sectors += bv.bv_len >> 9;
+
+   if (bvec_split_segs(q, &bv, &nsegs, &seg_size,
+   &front_seg_size, &sectors))
+   goto split;
 
}
 
@@ -233,8 +283,6 @@ static struct bio *blk_bio_segment_split(struct 
request_queue *q,
bio = new;
}
 
-   if (nsegs == 1 && seg_size > front_seg_size)
-   front_seg_size = seg_size;
bio->bi_seg_front_size = front_seg_size;
if (seg_size > bio->bi_seg_back_size)
bio->bi_seg_back_size = seg_size;
@@ -297,6 +345,7 @@ static unsigned int __blk_recalc_rq_segments(struct 
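
The accounting done by bvec_split_segs() can be exercised in userspace. The sketch below follows the loop shown in the hunk above, with hypothetical names: `model_limits` replaces the request_queue limit accessors, and the bvec is passed as a plain offset and length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue limits the helper consults. */
struct model_limits {
    unsigned max_segments;
    unsigned max_segment_size;
    unsigned virt_boundary_mask;
};

/* Returns true if the bvec could not be consumed completely. */
static bool model_split_segs(const struct model_limits *lim,
                             unsigned bv_offset, unsigned bv_len,
                             unsigned *nsegs, unsigned *last_seg_size,
                             unsigned *front_seg_size, unsigned *sectors)
{
    unsigned len = bv_len, total_len = 0;
    unsigned new_nsegs = 0, seg_size = 0;

    assert(lim->max_segment_size > 0);

    /* Carve off segments until the bvec or the segment budget runs out. */
    while (len && new_nsegs + *nsegs < lim->max_segments) {
        seg_size = len < lim->max_segment_size ? len : lim->max_segment_size;

        new_nsegs++;
        total_len += seg_size;
        len -= seg_size;

        if ((bv_offset + total_len) & lim->virt_boundary_mask)
            break;
    }

    /* Update the front segment size only for the first bvec. */
    if (!*nsegs) {
        unsigned first_seg_size = seg_size;

        if (new_nsegs > 1)
            first_seg_size = lim->max_segment_size;
        if (*front_seg_size < first_seg_size)
            *front_seg_size = first_seg_size;
    }

    *last_seg_size = seg_size;
    *nsegs += new_nsegs;
    if (sectors)
        *sectors += total_len >> 9;

    return len != 0;
}
```

With a 64 KiB max segment size, a 200000-byte bvec needs four segments. If only two segments are allowed, the helper stops after 128 KiB and reports that the bio must be split in the middle of the bvec.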

[Cluster-devel] [PATCH V11 03/19] block: introduce bio_for_each_bvec()

2018-11-20 Thread Ming Lei
This helper is used for iterating over multi-page bvecs in the bio
split & merge code.

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 include/linux/bio.h  | 25 ++---
 include/linux/bvec.h | 36 +---
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 056fb627edb3..7560209d6a8a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -76,6 +76,9 @@
 #define bio_data_dir(bio) \
(op_is_write(bio_op(bio)) ? WRITE : READ)
 
+#define bio_iter_mp_iovec(bio, iter)   \
+   segment_iter_bvec((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -135,18 +138,24 @@ static inline bool bio_full(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)  \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes, unsigned max_seg_len)
 {
iter->bi_sector += bytes >> 9;
 
if (bio_no_advance_iter(bio))
iter->bi_size -= bytes;
else
-   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_seg_len);
/* TODO: It is reasonable to complete bio with error here. */
 }
 
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
+}
+
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
for (iter = (start);\
 (iter).bi_size &&  \
@@ -156,6 +165,16 @@ static inline void bio_advance_iter(struct bio *bio, 
struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)   \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_bvec(bvl, bio, iter, start) \
+   for (iter = (start);\
+(iter).bi_size &&  \
+   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
+__bio_advance_iter((bio), &(iter), (bvl).bv_len, BVEC_MAX_LEN))
+
+/* returns one real segment(multi-page bvec) each time */
+#define bio_for_each_bvec(bvl, bio, iter)  \
+   __bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index ed90bbf4c9c9..b279218c5c4d 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#define BVEC_MAX_LEN  ((unsigned int)-1)
+
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
  */
@@ -87,8 +89,15 @@ struct bvec_iter {
.bv_offset  = bvec_iter_offset((bvec), (iter)), \
 })
 
-static inline bool bvec_iter_advance(const struct bio_vec *bv,
-   struct bvec_iter *iter, unsigned bytes)
+#define segment_iter_bvec(bvec, iter)  \
+((struct bio_vec) {\
+   .bv_page= segment_iter_page((bvec), (iter)),\
+   .bv_len = segment_iter_len((bvec), (iter)), \
+   .bv_offset  = segment_iter_offset((bvec), (iter)),  \
+})
+
+static inline bool __bvec_iter_advance(const struct bio_vec *bv,
+   struct bvec_iter *iter, unsigned bytes, unsigned max_seg_len)
 {
if (WARN_ONCE(bytes > iter->bi_size,
 "Attempted to advance past end of bvec iter\n")) {
@@ -97,12 +106,18 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
}
 
while (bytes) {
-   unsigned iter_len = bvec_iter_len(bv, *iter);
-   unsigned len = min(bytes, iter_len);
+   unsigned segment_len = segment_iter_len(bv, *iter);
 
-   bytes -= len;
-   iter->bi_size -= len;
-   iter->bi_bvec_done += len;
+   if (max_seg_len < BVEC_MAX_LEN)
+   segment_len = min_t(unsigned, segment_len,
+   max_seg_len -
+   bvec_iter_offset(bv, *iter));
+
+   segment_len = min(bytes, segment_len);
+
+   bytes -= segment_len;
+   iter->bi_size -=
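
The point made in the review of this patch, that max_seg_len only changes
how many loop iterations the advance takes and not where the iterator ends
up, can be checked with a small userspace model. This is a sketch, not the
kernel code: struct model_bvec, struct model_iter, model_advance() and the
MODEL_* constants are hypothetical stand-ins for struct bio_vec, struct
bvec_iter, __bvec_iter_advance(), PAGE_SIZE and BVEC_MAX_LEN, and a 4096-byte
page is assumed.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE    4096u
#define MODEL_BVEC_MAX_LEN ((unsigned int)-1)

/* Simplified stand-in for struct bio_vec (page pointer omitted). */
struct model_bvec {
	unsigned int len;	/* bytes covered, may span several pages */
	unsigned int offset;	/* byte offset into the first page */
};

/* Simplified stand-in for struct bvec_iter. */
struct model_iter {
	unsigned int size;	/* bytes left to iterate (bi_size) */
	unsigned int idx;	/* index of the current bvec (bi_idx) */
	unsigned int done;	/* bytes consumed in that bvec (bi_bvec_done) */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Advance 'it' by 'bytes'. Returns the number of loop iterations used so
 * that callers can compare the single-page and multi-page step sizes.
 */
unsigned int model_advance(const struct model_bvec *bv, struct model_iter *it,
			   unsigned int bytes, unsigned int max_seg_len)
{
	unsigned int loops = 0;

	if (bytes > it->size) {
		/* The kernel warns here; the model just empties the iterator. */
		it->size = 0;
		return 0;
	}

	while (bytes) {
		const struct model_bvec *cur = &bv[it->idx];
		unsigned int seg_len = min_u(it->size, cur->len - it->done);

		/* Single-page stepping: never cross a page boundary. */
		if (max_seg_len < MODEL_BVEC_MAX_LEN) {
			unsigned int off = (cur->offset + it->done) % MODEL_PAGE_SIZE;

			seg_len = min_u(seg_len, max_seg_len - off);
		}
		seg_len = min_u(bytes, seg_len);

		bytes -= seg_len;
		it->size -= seg_len;
		it->done += seg_len;

		/* Move to the next stored bvec once this one is used up. */
		if (it->done == cur->len) {
			it->done = 0;
			it->idx++;
		}
		loops++;
	}
	return loops;
}
```

Advancing the same iterator by the same byte count with MODEL_PAGE_SIZE and
with MODEL_BVEC_MAX_LEN should leave idx, done and size identical; only the
returned loop count differs.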

[Cluster-devel] [PATCH V11 02/19] block: introduce multi-page bvec helpers

2018-11-20 Thread Ming Lei
This patch introduces the 'segment_iter_*' helpers for multi-page
bvec support.

The new helpers treat one bvec as a real multi-page segment, which
may include more than one page.

The existing bvec_iter_* helpers are the interfaces for the current
bvec iterator, which drivers, filesystems, dm, etc. treat as
single-page. With this patch they build single-page bvecs on the fly,
so current bio/bvec users are not broken and need no changes.

Some multi-page bvec background:

- bvecs stored in bio->bi_io_vec are always in multi-page style

- a bvec (struct bio_vec) represents one physically contiguous I/O
  buffer. With multi-page bvec support the buffer may include more
  than one page, and all pages represented by one bvec are physically
  contiguous. Before multi-page bvec support, one bvec included at
  most one page; we call that a single-page bvec.

- .bv_page of the bvec points to the first page in the multi-page bvec

- .bv_offset of the bvec is the offset of the buffer from the start of
  that first page

The effect on current drivers/filesystems/dm/bcache/...:

- almost everyone assumes that one bvec includes only a single page,
  so the single-page interface is kept unchanged; for example,
  bio_for_each_segment() still returns single-page bvecs

- bio_for_each_segment_all() will return single-page bvecs too

- during iteration, the iterator variable (struct bvec_iter) is always
  updated in multi-page bvec style, and bvec_iter_advance() is left
  unchanged

- the returned (copied) single-page bvec is built on the fly by the
  bvec helpers from the stored multi-page bvec

Reviewed-by: Omar Sandoval 
Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 02c73c6aa805..ed90bbf4c9c9 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
@@ -50,16 +51,35 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)   (&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter) \
+#define segment_iter_page(bvec, iter)  \
(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)  \
+#define segment_iter_len(bvec, iter)   \
min((iter).bi_size, \
__bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)   \
+#define segment_iter_offset(bvec, iter)\
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define segment_iter_page_idx(bvec, iter)  \
+   (segment_iter_offset((bvec), (iter)) / PAGE_SIZE)
+
+/*
+ *  of single-page segment.
+ *
+ * This helpers are for building single-page bvec in flight.
+ */
+#define bvec_iter_offset(bvec, iter)   \
+   (segment_iter_offset((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len(bvec, iter)  \
+   min_t(unsigned, segment_iter_len((bvec), (iter)),   \
+ PAGE_SIZE - bvec_iter_offset((bvec), (iter)))
+
+#define bvec_iter_page(bvec, iter) \
+   nth_page(segment_iter_page((bvec), (iter)), \
+segment_iter_page_idx((bvec), (iter)))
+
 #define bvec_iter_bvec(bvec, iter) \
 ((struct bio_vec) {\
.bv_page= bvec_iter_page((bvec), (iter)),   \
-- 
2.9.5
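
How the single-page view is derived from a stored multi-page bvec can be
sketched in userspace as follows. This is only a model under stated
assumptions: a page is identified by a frame number instead of a struct page
pointer, a 4096-byte page is assumed, and the model_* / seg_* / sp_* names
are hypothetical counterparts of the segment_iter_* and bvec_iter_* macros
in the patch.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u

/* Stand-in for struct bio_vec; first_pfn plays the role of bv_page. */
struct model_bvec {
	unsigned long first_pfn;	/* frame number of the first page */
	unsigned int len;		/* total bytes, may span several pages */
	unsigned int offset;		/* byte offset into the first page */
};

/* Stand-in for struct bvec_iter. */
struct model_iter {
	unsigned int size;	/* bi_size */
	unsigned int idx;	/* bi_idx */
	unsigned int done;	/* bi_bvec_done */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Multi-page ("segment") view: offset may be larger than a page. */
unsigned int seg_offset(const struct model_bvec *bv, struct model_iter it)
{
	return bv[it.idx].offset + it.done;
}

unsigned int seg_len(const struct model_bvec *bv, struct model_iter it)
{
	return min_u(it.size, bv[it.idx].len - it.done);
}

/* Single-page view, computed on the fly from the segment view. */
unsigned int sp_offset(const struct model_bvec *bv, struct model_iter it)
{
	return seg_offset(bv, it) % MODEL_PAGE_SIZE;
}

unsigned int sp_len(const struct model_bvec *bv, struct model_iter it)
{
	return min_u(seg_len(bv, it), MODEL_PAGE_SIZE - sp_offset(bv, it));
}

unsigned long sp_pfn(const struct model_bvec *bv, struct model_iter it)
{
	/* Counterpart of nth_page(): step forward by whole pages. */
	return bv[it.idx].first_pfn + seg_offset(bv, it) / MODEL_PAGE_SIZE;
}
```

For a bvec of 10000 bytes starting 3000 bytes into frame 100, the first
single-page piece is the 1096 bytes up to the end of that page, and the
next piece starts at offset 0 of frame 101.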



[Cluster-devel] [PATCH V11 01/19] block: don't use bio->bi_vcnt to figure out segment number

2018-11-20 Thread Ming Lei
It is wrong to use bio->bi_vcnt to figure out how many segments
there are in the bio, even if the CLONED flag isn't set on this bio,
because the bio may have been split or advanced.

So always use bio_segments() in blk_recount_segments(). This shouldn't
cause any performance loss now, because the physical segment number is
figured out in blk_queue_split() and BIO_SEG_VALID is set at the same
time since bdced438acd83ad83a6c ("block: setup bi_phys_segments after
splitting").

Reviewed-by: Christoph Hellwig 
Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments")
Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b1df622cbd85..f52400ce2187 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -368,13 +368,7 @@ void blk_recalc_rq_segments(struct request *rq)
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-   unsigned short seg_cnt;
-
-   /* estimate segment number by bi_vcnt for non-cloned bio */
-   if (bio_flagged(bio, BIO_CLONED))
-   seg_cnt = bio_segments(bio);
-   else
-   seg_cnt = bio->bi_vcnt;
+   unsigned short seg_cnt = bio_segments(bio);
 
if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
(seg_cnt < queue_max_segments(q)))
-- 
2.9.5
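
Why bi_vcnt can overstate the segment count is easy to see in a small
userspace model. This is a sketch under assumptions: every stored bvec is
taken to be at most one page (the pre-multi-page situation), and
model_count_segments() and struct model_iter are hypothetical stand-ins for
bio_segments() and struct bvec_iter.

```c
#include <assert.h>

/* Stand-in for struct bvec_iter. */
struct model_iter {
	unsigned int size;	/* bytes still covered by this bio (bi_size) */
	unsigned int idx;	/* first bvec still in use (bi_idx) */
	unsigned int done;	/* bytes already consumed in it (bi_bvec_done) */
};

/*
 * Count the bvecs actually covered by the iterator, walking a copy of it.
 * bv_len[] holds the length of each stored bvec.
 */
unsigned int model_count_segments(const unsigned int *bv_len,
				  struct model_iter it)
{
	unsigned int segs = 0;

	while (it.size) {
		unsigned int len = bv_len[it.idx] - it.done;

		if (len > it.size)
			len = it.size;
		it.size -= len;
		it.done = 0;
		it.idx++;
		segs++;
	}
	return segs;
}
```

With a table of four one-page bvecs (so bi_vcnt would be 4), a bio that was
split to cover only the last two pages, or advanced into the third one,
spans two segments, not four.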



Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Ming Lei
On Tue, Nov 20, 2018 at 12:11:35PM -0800, Sagi Grimberg wrote:
> 
> > > > The only user in your final tree seems to be the loop driver, and
> > > > even that one only uses the helper for read/write bios.
> > > > 
> > > > I think something like this would be much simpler in the end:
> > > 
> > > The recently submitted nvme-tcp host driver should also be a user
> > > of this. Does it make sense to keep it as a helper then?
> > 
> > I did take a brief look at the code, and I really don't understand
> > why the heck it even deals with bios to start with.  Like all the
> > other nvme transports it is a blk-mq driver and should iterate
> > over segments in a request and more or less ignore bios.  Something
> > is horribly wrong in the design.
> 
> Can you explain a little more? I'm more than happy to change that but
> I'm not completely clear how...
> 
> Before we begin a data transfer, we need to set our own iterator that
> will advance with the progression of the data transfer. We also need to
> keep in mind that all the data transfer (both send and recv) are
> completely non blocking (and zero-copy when we send).
> 
> That means that every data movement needs to be able to suspend
> and resume asynchronously. i.e. we cannot use the following pattern:
> rq_for_each_segment(bvec, rq, rq_iter) {
>   iov_iter_bvec(&iov_iter, WRITE, &bvec, 1, bvec.bv_len);
>   send(sock, iov_iter);
> }

I'm not sure I understand the 'blocking' problem in this case.

We can build a bvec table from this request and send all of it in one
send(); would that avoid your blocking issue? You can see an example
in the 'rq->bio != rq->biotail' branch of lo_rw_aio().

If that is what you need, I think you are right, and we may even
introduce the following helpers:

rq_for_each_bvec()
rq_bvecs()

So it looks like the nvme-tcp host driver might be the second driver
that benefits from multi-page bvecs directly.

Multi-page bvec V11 has passed my tests and addresses almost all the
review comments on V10. I removed bio_bvecs() in V11, but that is no
big deal; we can introduce such helpers whenever there is a
requirement.

Thanks,
Ming
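
The "build a bvec table from the request" idea can be sketched in userspace
as a count pass followed by a fill pass. This is not the loop driver code:
struct model_bio, struct model_bvec and model_flatten() are hypothetical
names, the chain of bios is a plain linked list, and every bio is assumed
to be unconsumed (its iterator still at the start).

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct bio_vec; first_pfn plays the role of bv_page. */
struct model_bvec {
	unsigned long first_pfn;
	unsigned int len;
	unsigned int offset;
};

/* Stand-in for a bio in a request's bio chain. */
struct model_bio {
	const struct model_bvec *vecs;	/* stored (multi-page) bvecs */
	unsigned int nr;		/* number of entries in vecs */
	const struct model_bio *next;	/* next bio of the request, or NULL */
};

/*
 * Copy all bvecs of a bio chain into one flat, heap-allocated table.
 * Returns NULL on allocation failure or for an empty chain; the number
 * of bvecs found is stored in *nr_out. The caller frees the table.
 */
struct model_bvec *model_flatten(const struct model_bio *head,
				 unsigned int *nr_out)
{
	const struct model_bio *bio;
	struct model_bvec *table, *p;
	unsigned int n = 0, i;

	/* Pass 1: count, so the table can be allocated in one go. */
	for (bio = head; bio; bio = bio->next)
		n += bio->nr;
	*nr_out = n;
	if (!n)
		return NULL;

	table = malloc(n * sizeof(*table));
	if (!table)
		return NULL;

	/* Pass 2: copy each bvec into the flat table. */
	p = table;
	for (bio = head; bio; bio = bio->next)
		for (i = 0; i < bio->nr; i++)
			*p++ = bio->vecs[i];

	return table;
}
```

The resulting table could then be handed to a single vectored send, which
is the pattern the reply above points at.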



Re: [Cluster-devel] [PATCH V10 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:58:03PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 15, 2018 at 04:53:05PM +0800, Ming Lei wrote:
> > Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after 
> > splitting"),
> > physical segment number is mainly figured out in blk_queue_split() for
> > fast path, and the flag of BIO_SEG_VALID is set there too.
> > 
> > Now only blk_recount_segments() and blk_recalc_rq_segments() use this
> > flag.
> > 
> > Basically blk_recount_segments() is bypassed in fast path given 
> > BIO_SEG_VALID
> > is set in blk_queue_split().
> > 
> > For another user of blk_recalc_rq_segments():
> > 
> > - run in partial completion branch of blk_update_request, which is an 
> > unusual case
> > 
> > - run in blk_cloned_rq_check_limits(), still not a big problem if the flag 
> > is killed
> > since dm-rq is the only user.
> > 
> > Multi-page bvec is enabled now, QUEUE_FLAG_NO_SG_MERGE doesn't make sense 
> > any more.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  block/blk-merge.c  | 31 ++-
> >  block/blk-mq-debugfs.c |  1 -
> >  block/blk-mq.c |  3 ---
> >  drivers/md/dm-table.c  | 13 -
> >  include/linux/blkdev.h |  1 -
> >  5 files changed, 6 insertions(+), 43 deletions(-)
> > 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 153a659fde74..06be298be332 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -351,8 +351,7 @@ void blk_queue_split(struct request_queue *q, struct 
> > bio **bio)
> >  EXPORT_SYMBOL(blk_queue_split);
> >  
> >  static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
> > -struct bio *bio,
> > -bool no_sg_merge)
> > +struct bio *bio)
> >  {
> > struct bio_vec bv, bvprv = { NULL };
> > int cluster, prev = 0;
> > @@ -379,13 +378,6 @@ static unsigned int __blk_recalc_rq_segments(struct 
> > request_queue *q,
> > nr_phys_segs = 0;
> > for_each_bio(bio) {
> > bio_for_each_bvec(bv, bio, iter) {
> > -   /*
> > -* If SG merging is disabled, each bio vector is
> > -* a segment
> > -*/
> > -   if (no_sg_merge)
> > -   goto new_segment;
> > -
> > if (prev && cluster) {
> > if (seg_size + bv.bv_len
> > > queue_max_segment_size(q))
> > @@ -420,27 +412,16 @@ static unsigned int __blk_recalc_rq_segments(struct 
> > request_queue *q,
> >  
> >  void blk_recalc_rq_segments(struct request *rq)
> >  {
> > -   bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
> > -   &rq->q->queue_flags);
> > -
> > -   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
> > -   no_sg_merge);
> > +   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
> 
> Can we rename __blk_recalc_rq_segments to blk_recalc_rq_segments
> and kill the old blk_recalc_rq_segments now?

Sure.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 06:18:11PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:53:05PM +0800, Ming Lei wrote:
> > Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after 
> > splitting"),
> > physical segment number is mainly figured out in blk_queue_split() for
> > fast path, and the flag of BIO_SEG_VALID is set there too.
> > 
> > Now only blk_recount_segments() and blk_recalc_rq_segments() use this
> > flag.
> > 
> > Basically blk_recount_segments() is bypassed in fast path given 
> > BIO_SEG_VALID
> > is set in blk_queue_split().
> > 
> > For another user of blk_recalc_rq_segments():
> > 
> > - run in partial completion branch of blk_update_request, which is an 
> > unusual case
> > 
> > - run in blk_cloned_rq_check_limits(), still not a big problem if the flag 
> > is killed
> > since dm-rq is the only user.
> > 
> > Multi-page bvec is enabled now, QUEUE_FLAG_NO_SG_MERGE doesn't make sense 
> > any more.
> 
> This commit message wasn't very clear. Is it the case that
> QUEUE_FLAG_NO_SG_MERGE is no longer set by any drivers?

OK, I will add the explanation to the commit log in the next version.

05f1dd53152173 ("block: add queue flag for disabling SG merging") introduced
this flag purely for NVMe performance, so that merging bvecs into segments
could be bypassed for NVMe.

In practice this optimization has been bypassed since 54efd50bfd873e2d
("block: make generic_make_request handle arbitrarily sized bios") and
bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").

Now segment computation is very quick, given that most of the time one bvec
can be treated as one segment, so we can remove the flag.

thanks, 
Ming



Re: [Cluster-devel] [PATCH V10 17/19] block: don't use bio->bi_vcnt to figure out segment number

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 06:11:40PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:53:04PM +0800, Ming Lei wrote:
> > It is wrong to use bio->bi_vcnt to figure out how many segments
> > there are in the bio even though CLONED flag isn't set on this bio,
> > because this bio may be splitted or advanced.
> > 
> > So always use bio_segments() in blk_recount_segments(), and it shouldn't
> > cause any performance loss now because the physical segment number is 
> > figured
> > out in blk_queue_split() and BIO_SEG_VALID is set meantime since
> > bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Fixes: 7f60dcaaf91 ("block: blk-merge: fix blk_recount_segments()")
> 
> From what I can tell, the problem was originally introduced by
> 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max 
> segments")
> 
> Is that right?

Indeed, will update it in next version.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 15/19] block: always define BIO_MAX_PAGES as 256

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 05:59:36PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:53:02PM +0800, Ming Lei wrote:
> > Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to
> > increase BIO_MAX_PAGES for it.
> 
> You mentioned to it in the cover letter, but this needs more explanation
> in the commit message. Why did CONFIG_THP_SWAP require > 256? Why does
> multipage bvecs remove that requirement?

CONFIG_THP_SWAP needs to split one THP (transparent huge page) into normal
pages and add them all to one bio. With multi-page bvecs, a single bvec is
enough to hold them all.

thanks,
Ming
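
The arithmetic behind this can be written down as a small sketch. The sizes
are assumptions for illustration only (4096-byte base pages and a 2 MiB huge
page, as is typical on x86-64), and bvecs_single_page() and
bvecs_multi_page() are hypothetical helper names.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE     4096ul
#define MODEL_BIO_MAX_PAGES 256u

/*
 * bvecs needed for one physically contiguous buffer when each bvec may
 * cover at most one page.
 */
unsigned int bvecs_single_page(unsigned long bytes, unsigned long offset)
{
	return (unsigned int)((offset + bytes + MODEL_PAGE_SIZE - 1) /
			      MODEL_PAGE_SIZE);
}

/*
 * bvecs needed for the same buffer when one bvec may span any number of
 * physically contiguous pages.
 */
unsigned int bvecs_multi_page(unsigned long bytes)
{
	return bytes ? 1 : 0;
}
```

Under these assumptions a 2 MiB huge page needs 512 single-page bvecs, which
exceeds a limit of 256, but only one multi-page bvec.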



Re: [Cluster-devel] [PATCH V10 14/19] block: enable multipage bvecs

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:53:08PM +0100, Christoph Hellwig wrote:
> > -
> > -   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> > -   bv->bv_len += len;
> > -   bio->bi_iter.bi_size += len;
> > -   return true;
> > -   }
> > +   struct request_queue *q = NULL;
> > +
> > +   if (page == bv->bv_page && off == (bv->bv_offset + bv->bv_len)
> > +   && (off + len) <= PAGE_SIZE)
> 
> How could the page struct be the same, but the range beyond PAGE_SIZE
> (at least with the existing callers)?
> 
> Also no need for the inner braces, and the && always goes on the
> first line.

OK.

> 
> > +   if (bio->bi_disk)
> > +   q = bio->bi_disk->queue;
> > +
> > +   /* disable multi-page bvec too if cluster isn't enabled */
> > +   if (!q || !blk_queue_cluster(q) ||
> > +   ((page_to_phys(bv->bv_page) + bv->bv_offset + bv->bv_len) !=
> > +(page_to_phys(page) + off)))
> > +   return false;
> > + merge:
> > +   bv->bv_len += len;
> > +   bio->bi_iter.bi_size += len;
> > +   return true;
> 
> Ok, this is scary, as it will give different results depending on when
> bi_disk is assigned.

It is just a question of whether we merge or not; both cases are handled
correctly now.

> But then again we shouldn't really do the cluster
> check here, but rather when splitting the bio for the actual low-level
> driver.

Yeah, I thought of doing it this way too, but it may cause tons of bio
splits for non-clustering queues, and there are quite a few SCSI devices
which require clustering to be disabled.

[linux]$ git grep -n DISABLE_CLUSTERING ./drivers/scsi/ | wc -l
 28

Or we may introduce bio_split_to_single_page_bvec() to allocate a
single-page bvec table and convert to it for non-clustering queues; I will
try this approach in the next version.

> 
> (and eventually we should kill this clustering setting off in favor
> of our normal segment limits).

Yeah, it is already on my post-multi-page todo list, :-)

thanks,
Ming
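
The merge decision discussed in this thread can be modelled in userspace
roughly as follows. This is a sketch of the V10 logic under assumptions, not
the kernel function: pages are identified by frame numbers, a 4096-byte page
is assumed, the 'cluster' argument stands in for blk_queue_cluster(q), and
model_try_merge() is a hypothetical name.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096ull

/* Stand-in for struct bio_vec; first_pfn plays the role of bv_page. */
struct model_bvec {
	unsigned long first_pfn;
	unsigned int len;
	unsigned int offset;
};

/*
 * Try to append (pfn, off, len) to the last bvec of a bio.
 * Returns 1 and grows 'last' if the data was merged, 0 otherwise.
 */
int model_try_merge(struct model_bvec *last, unsigned long pfn,
		    unsigned int len, unsigned int off, int cluster)
{
	unsigned long long last_end =
		last->first_pfn * MODEL_PAGE_SIZE + last->offset + last->len;
	unsigned long long new_start = pfn * MODEL_PAGE_SIZE + off;

	/* Original case: same page, data directly follows the last bvec. */
	int same_page = pfn == last->first_pfn &&
			off == last->offset + last->len;

	/* Multi-page case: physically contiguous and clustering allowed. */
	int phys_contig = cluster && last_end == new_start;

	if (!same_page && !phys_contig)
		return 0;

	last->len += len;
	return 1;
}
```

The same-page case keeps working even when clustering is off, which matches
the remark above that the original sub-page merge is unaffected.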



Re: [Cluster-devel] [PATCH V10 14/19] block: enable multipage bvecs

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 05:56:27PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:53:01PM +0800, Ming Lei wrote:
> > This patch pulls the trigger for multi-page bvecs.
> > 
> > Now any request queue which supports queue cluster will see multi-page
> > bvecs.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  block/bio.c | 24 ++--
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 6486722d4d4b..ed6df6f8e63d 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> 
> This comment above __bio_try_merge_page() doesn't make sense after this
> change:
> 
>  This is a
>  a useful optimisation for file systems with a block size smaller than the
>  page size.
> 
> Can you please get rid of it in this patch?

As I understand it, __bio_try_merge_page() still works for the original
cases, so the optimization for sub-page-size blocks is still there too,
isn't it?

> 
> > @@ -767,12 +767,24 @@ bool __bio_try_merge_page(struct bio *bio, struct 
> > page *page,
> >  
> > if (bio->bi_vcnt > 0) {
> > struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> > -
> > -   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> > -   bv->bv_len += len;
> > -   bio->bi_iter.bi_size += len;
> > -   return true;
> > -   }
> > +   struct request_queue *q = NULL;
> > +
> > +   if (page == bv->bv_page && off == (bv->bv_offset + bv->bv_len)
> > +   && (off + len) <= PAGE_SIZE)
> > +   goto merge;
> 
> The parentheses around (bv->bv_offset + bv->bv_len) and (off + len) are
> unnecessary noise.
> 
> What's the point of the new (off + len) <= PAGE_SIZE check?

Yeah, I don't know why I did it, :-(, the check is always true.

> 
> > +
> > +   if (bio->bi_disk)
> > +   q = bio->bi_disk->queue;
> > +
> > +   /* disable multi-page bvec too if cluster isn't enabled */
> > +   if (!q || !blk_queue_cluster(q) ||
> > +   ((page_to_phys(bv->bv_page) + bv->bv_offset + bv->bv_len) !=
> > +(page_to_phys(page) + off)))
> 
> More unnecessary parentheses here.

OK.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 05:22:45PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:59PM +0800, Ming Lei wrote:
> > This patch introduces one extra iterator variable to 
> > bio_for_each_segment_all(),
> > then we can allow bio_for_each_segment_all() to iterate over multi-page 
> > bvec.
> > 
> > Given it is just one mechanical & simple change on all
> > bio_for_each_segment_all() users, this patch does the tree-wide change in
> > one single patch, so that we can avoid using a temporary helper for this
> > conversion.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Alexander Viro 
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: linux-bt...@vger.kernel.org
> > Cc: David Sterba 
> > Cc: Darrick J. Wong 
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  block/bio.c   | 27 ++-
> >  block/blk-zoned.c |  1 +
> >  block/bounce.c|  6 --
> >  drivers/md/bcache/btree.c |  3 ++-
> >  drivers/md/dm-crypt.c |  3 ++-
> >  drivers/md/raid1.c|  3 ++-
> >  drivers/staging/erofs/data.c  |  3 ++-
> >  drivers/staging/erofs/unzip_vle.c |  3 ++-
> >  fs/block_dev.c|  6 --
> >  fs/btrfs/compression.c|  3 ++-
> >  fs/btrfs/disk-io.c|  3 ++-
> >  fs/btrfs/extent_io.c  | 12 
> >  fs/btrfs/inode.c  |  6 --
> >  fs/btrfs/raid56.c |  3 ++-
> >  fs/crypto/bio.c   |  3 ++-
> >  fs/direct-io.c|  4 +++-
> >  fs/exofs/ore.c|  3 ++-
> >  fs/exofs/ore_raid.c   |  3 ++-
> >  fs/ext4/page-io.c |  3 ++-
> >  fs/ext4/readpage.c|  3 ++-
> >  fs/f2fs/data.c|  9 ++---
> >  fs/gfs2/lops.c|  6 --
> >  fs/gfs2/meta_io.c |  3 ++-
> >  fs/iomap.c|  6 --
> >  fs/mpage.c|  3 ++-
> >  fs/xfs/xfs_aops.c |  5 +++--
> >  include/linux/bio.h   | 11 +--
> >  include/linux/bvec.h  | 31 +++
> >  28 files changed, 129 insertions(+), 46 deletions(-)
> > 
> 
> [snip]
> 
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 3496c816946e..1a2430a8b89d 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -131,12 +131,19 @@ static inline bool bio_full(struct bio *bio)
> > return bio->bi_vcnt >= bio->bi_max_vecs;
> >  }
> >  
> > +#define bvec_for_each_segment(bv, bvl, i, iter_all)            \
> > +   for (bv = bvec_init_iter_all(&iter_all);                    \
> > +   (iter_all.done < (bvl)->bv_len) &&                          \
> > +   ((bvec_next_segment((bvl), &iter_all)), 1);                 \
> 
> The parentheses around (bvec_next_segment((bvl), &iter_all)) are
> unnecessary.

OK.

> 
> > +   iter_all.done += bv->bv_len, i += 1)
> > +
> >  /*
> >   * drivers should _never_ use the all version - the bio may have been split
> >   * before it got to the driver and the driver won't own all of it
> >   */
> > -#define bio_for_each_segment_all(bvl, bio, i)                  \
> > -   for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> > +#define bio_for_each_segment_all(bvl, bio, i, iter_all)        \
> > +   for (i = 0, iter_all.idx = 0; iter_all.idx < (bio)->bi_vcnt; iter_all.idx++)    \
> > +   bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), i, iter_all)
> 
> Would it be possible to move i into iter_all to streamline this a bit?

That may cause unnecessary conversion work for us, because the local
variable 'i' is defined in the calling function.

> 
> >  static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter 
> > *iter,
> >   unsigned bytes, bool mp)
> > diff --git a/include
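
The nested iteration that the new bio_for_each_segment_all() performs, an
outer walk over the stored multi-page bvecs and an inner walk over the
single-page pieces of each one with a running index i, can be sketched in
userspace like this. It is a model under assumptions, not the macro itself:
pages are frame numbers, a 4096-byte page is assumed, and
model_for_each_segment_all() is a hypothetical function that records the
pieces into an output array instead of running a loop body.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u

/* Stand-in for struct bio_vec; first_pfn plays the role of bv_page. */
struct model_bvec {
	unsigned long first_pfn;
	unsigned int len;
	unsigned int offset;
};

/*
 * Split every stored bvec into single-page pieces. Up to out_cap pieces
 * are written to out[]; the total number of pieces is returned.
 */
unsigned int model_for_each_segment_all(const struct model_bvec *vecs,
					unsigned int vcnt,
					struct model_bvec *out,
					unsigned int out_cap)
{
	unsigned int i = 0;	/* running single-page index, like 'i' */
	unsigned int idx;	/* index of the stored bvec, like iter_all.idx */

	for (idx = 0; idx < vcnt; idx++) {
		unsigned int done = 0;	/* like iter_all.done */

		while (done < vecs[idx].len) {
			unsigned int pos = vecs[idx].offset + done;
			unsigned int off = pos % MODEL_PAGE_SIZE;
			unsigned int len = vecs[idx].len - done;

			/* A piece never crosses a page boundary. */
			if (len > MODEL_PAGE_SIZE - off)
				len = MODEL_PAGE_SIZE - off;

			if (i < out_cap) {
				out[i].first_pfn = vecs[idx].first_pfn +
						   pos / MODEL_PAGE_SIZE;
				out[i].len = len;
				out[i].offset = off;
			}
			done += len;
			i++;
		}
	}
	return i;
}
```

Note that i counts single-page pieces across the whole bio, so it can end up
larger than the number of stored bvecs.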

Re: [Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 01:42:52PM +0100, David Sterba wrote:
> On Thu, Nov 15, 2018 at 04:52:59PM +0800, Ming Lei wrote:
> > diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> > index 13ba2011a306..789b09ae402a 100644
> > --- a/block/blk-zoned.c
> > +++ b/block/blk-zoned.c
> > @@ -123,6 +123,7 @@ static int blk_report_zones(struct gendisk *disk, 
> > sector_t sector,
> > unsigned int z = 0, n, nrz = *nr_zones;
> > sector_t capacity = get_capacity(disk);
> > int ret;
> > +   struct bvec_iter_all iter_all;
> >  
> > while (z < nrz && sector < capacity) {
> > n = nrz - z;
> 
> iter_all is added but not used and I don't see any
> bio_for_each_segment_all for conversion in this function.

Good catch, will fix it in next version.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 13/19] iomap & xfs: only account for new added page

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:49:36PM +0100, Christoph Hellwig wrote:
> I'd much rather have __bio_try_merge_page only do merges in
> the same page, and have a new __bio_try_merge_segment that does
> multi-page merges.  This will keep the accounting a lot simpler.

This approach looks clever; I will do it in the next version.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 13/19] iomap & xfs: only account for new added page

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 05:46:58PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:53:00PM +0800, Ming Lei wrote:
> > After multi-page is enabled, one new page may be merged to a segment
> > even though it is a new added page.
> > 
> > This patch deals with this issue by post-check in case of merge, and
> > only a freshly new added page need to be dealt with for iomap & xfs.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  fs/iomap.c  | 22 ++
> >  fs/xfs/xfs_aops.c   | 10 --
> >  include/linux/bio.h | 11 +++
> >  3 files changed, 33 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index df0212560b36..a1b97a5c726a 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -288,6 +288,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> > loff_t length, void *data,
> > loff_t orig_pos = pos;
> > unsigned poff, plen;
> > sector_t sector;
> > +   bool need_account = false;
> >  
> > if (iomap->type == IOMAP_INLINE) {
> > WARN_ON_ONCE(pos);
> > @@ -313,18 +314,15 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> > loff_t length, void *data,
> >  */
> > sector = iomap_sector(iomap, pos);
> > if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
> > -   if (__bio_try_merge_page(ctx->bio, page, plen, poff))
> > +   if (__bio_try_merge_page(ctx->bio, page, plen, poff)) {
> > +   need_account = iop && bio_is_last_segment(ctx->bio,
> > +   page, plen, poff);
> 
> It's redundant to make this iop && ... since you already check
> iop && need_account below. Maybe rename it to added_page? Also, this
> indentation is wack.

That way we avoid calling bio_is_last_segment() in the !iop case. I will
fix the indentation.

added_page does look like a better name.

> 
> > goto done;
> > +   }
> > is_contig = true;
> > }
> >  
> > -   /*
> > -* If we start a new segment we need to increase the read count, and we
> > -* need to do so before submitting any previous full bio to make sure
> > -* that we don't prematurely unlock the page.
> > -*/
> > -   if (iop)
> > -   atomic_inc(&iop->read_count);
> > +   need_account = true;
> >  
> > if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
> > gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
> > @@ -347,6 +345,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> > loff_t length, void *data,
> > __bio_add_page(ctx->bio, page, plen, poff);
> >  done:
> > /*
> > +* If we add a new page we need to increase the read count, and we
> > +* need to do so before submitting any previous full bio to make sure
> > +* that we don't prematurely unlock the page.
> > +*/
> > +   if (iop && need_account)
> > +   atomic_inc(&iop->read_count);
> > +
> > +   /*
> >  * Move the caller beyond our range so that it keeps making progress.
> >  * For that we have to include any leading non-uptodate ranges, but
> >  * we can skip trailing ones as they will be handled in the next
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index 1f1829e506e8..d8e9cc9f751a 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -603,6 +603,7 @@ xfs_add_to_ioend(
> > unsignedlen = i_blocksize(inode);
> > unsignedpoff = offset & (PAGE_SIZE - 1);
> > sector_tsector;
> > +   boolneed_account;
> >  
> > sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
> > ((of

Re: [Cluster-devel] [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:46:45PM +0100, Christoph Hellwig wrote:
> > -   bio_for_each_segment_all(bv, bio, i) {
> > +   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
> 
> This really needs a comment.  Otherwise it looks fine to me.

OK, will do it in next version.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 11/19] bcache: avoid to use bio_for_each_segment_all() in bch_bio_alloc_pages()

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 04:44:02PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:58PM +0800, Ming Lei wrote:
> > bch_bio_alloc_pages() is always called on one new bio, so it is safe
> > to access the bvec table directly. Given it is the only kind of this
> > case, open code the bvec table access since bio_for_each_segment_all()
> > will be changed to support for iterating over multipage bvec.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Acked-by: Coly Li 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  drivers/md/bcache/util.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> > index 20eddeac1531..8517aebcda2d 100644
> > --- a/drivers/md/bcache/util.c
> > +++ b/drivers/md/bcache/util.c
> > @@ -270,7 +270,7 @@ int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
> > int i;
> > struct bio_vec *bv;
> >  
> > -   bio_for_each_segment_all(bv, bio, i) {
> > +   for (i = 0, bv = bio->bi_io_vec; i < bio->bi_vcnt; bv++) {
> 
> This is missing an i++.

Good catch, will fix it in next version.

thanks,
Ming
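
For reference, the corrected open-coded loop, with both i and bv advanced,
looks like this in a userspace model. This is a sketch under assumptions:
malloc() stands in for page allocation, the allocation budget exists only so
that a failure can be simulated, and model_alloc_pages(), model_alloc_page()
and model_alloc_budget are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct bio_vec; 'page' is a heap buffer, not a struct page. */
struct model_bvec {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Remaining allocations before failure is simulated; -1 means unlimited. */
int model_alloc_budget = -1;

void *model_alloc_page(void)
{
	if (model_alloc_budget == 0)
		return NULL;
	if (model_alloc_budget > 0)
		model_alloc_budget--;
	return malloc(4096);
}

/*
 * Allocate one page per entry of a freshly set up bvec table.
 * Returns 0 on success; on failure frees what was allocated, returns -1.
 */
int model_alloc_pages(struct model_bvec *vecs, unsigned int vcnt)
{
	struct model_bvec *bv;
	unsigned int i;

	/* Both the index and the pointer advance on every iteration. */
	for (i = 0, bv = vecs; i < vcnt; bv++, i++) {
		bv->page = model_alloc_page();
		if (!bv->page) {
			/* Unwind the entries allocated so far. */
			while (i--) {
				free(vecs[i].page);
				vecs[i].page = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

Without the i++ the loop condition never changes, so the walk would run past
the end of the table instead of stopping after vcnt entries.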



Re: [Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:45:41PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 15, 2018 at 04:52:56PM +0800, Ming Lei wrote:
> > There are still cases in which we need to use bio_bvecs() to get the
> > number of multi-page segments, so introduce it.
> 
> The only user in your final tree seems to be the loop driver, and
> even that one only uses the helper for read/write bios.
> 
> I think something like this would be much simpler in the end:
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index d509902a8046..712511815ac6 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -514,16 +514,18 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>   struct request *rq = blk_mq_rq_from_pdu(cmd);
>   struct bio *bio = rq->bio;
>   struct file *file = lo->lo_backing_file;
> + struct bvec_iter bvec_iter;
> + struct bio_vec tmp;
>   unsigned int offset;
>   int nr_bvec = 0;
>   int ret;
>  
> + __rq_for_each_bio(bio, rq)
> + bio_for_each_bvec(tmp, bio, bvec_iter)
> + nr_bvec++;
> +
>   if (rq->bio != rq->biotail) {
> - struct bvec_iter iter;
> - struct bio_vec tmp;
>  
> - __rq_for_each_bio(bio, rq)
> - nr_bvec += bio_bvecs(bio);
>   bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
>GFP_NOIO);
>   if (!bvec)
> @@ -537,7 +539,7 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>* API will take care of all details for us.
>*/
>   __rq_for_each_bio(bio, rq)
> - bio_for_each_bvec(tmp, bio, iter) {
> + bio_for_each_bvec(tmp, bio, bvec_iter) {
>   *bvec = tmp;
>   bvec++;
>   }
> @@ -551,7 +553,6 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>*/
>   offset = bio->bi_iter.bi_bvec_done;
>   bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> - nr_bvec = bio_bvecs(bio);
>   }
>   atomic_set(&cmd->ref, 2);
>  
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index dcad0b69f57a..379440d1ced0 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -200,30 +200,6 @@ static inline unsigned bio_segments(struct bio *bio)
>   }
>  }
>  
> -static inline unsigned bio_bvecs(struct bio *bio)
> -{
> - unsigned bvecs = 0;
> - struct bio_vec bv;
> - struct bvec_iter iter;
> -
> - /*
> -  * We special case discard/write same/write zeroes, because they
> -  * interpret bi_size differently:
> -  */
> - switch (bio_op(bio)) {
> - case REQ_OP_DISCARD:
> - case REQ_OP_SECURE_ERASE:
> - case REQ_OP_WRITE_ZEROES:
> - return 0;
> - case REQ_OP_WRITE_SAME:
> - return 1;
> - default:
> - bio_for_each_bvec(bv, bio, iter)
> - bvecs++;
> - return bvecs;
> - }
> -}
> -
>  /*
>   * get a reference to a bio, so it won't disappear. the intended use is
>   * something like:

OK, will do it in next version.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:38:45PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote:
> > BTRFS is the only user of this helper, so move this helper into
> > BTRFS, and implement it via bio_for_each_segment_all(), since
> > bio->bi_vcnt may not equal the number of pages after multipage bvec
> > is enabled.
> 
> btrfs only uses the value to check if it is larger than 1.  No amount
> of multipage bio merging should ever make bi_vcnt go from 0 to 1 or
> vice versa.

Could you explain a bit why?

Suppose two physically contiguous pages are added to this bio: .bi_vcnt
can be 1 with multi-page bvecs, but it is 2 with single-page
bvecs.
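
A minimal userspace sketch of that counting, under loud assumptions: pages are identified by page frame number, every page is added whole, and no queue limit stops a merge. sketch_vcnt() is a hypothetical name, not a block layer helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count how many table entries (the analogue of bi_vcnt) result from
 * adding @n whole pages, given by page frame number, to an empty bio.
 * With @multipage set, a page that directly follows the previous one
 * in physical memory is merged into the previous entry.
 */
static size_t sketch_vcnt(const unsigned long *pfns, size_t n, bool multipage)
{
	size_t vcnt = 0;
	size_t i;

	assert(n == 0 || pfns != NULL);

	for (i = 0; i < n; i++) {
		bool contiguous = i > 0 && pfns[i] == pfns[i - 1] + 1;

		if (!(multipage && contiguous))
			vcnt++;	/* start a new bvec */
	}
	return vcnt;
}
```

For the two-page case described above, sketch_vcnt() gives 1 with multipage set and 2 without it.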

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 10/19] block: loop: pass multi-page bvec to iov_iter

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 04:40:22PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:57PM +0800, Ming Lei wrote:
> > iov_iter is implemented with bvec iterator, so it is safe to pass
> > multipage bvec to it, and this way is much more efficient than
> > passing one page in each bvec.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> 
> Reviewed-by: Omar Sandoval 
> 
> Comments below.
> 
> > Signed-off-by: Ming Lei 
> > ---
> >  drivers/block/loop.c | 23 ---
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index bf6bc35aaf88..a3fd418ec637 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -515,16 +515,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> > struct bio *bio = rq->bio;
> > struct file *file = lo->lo_backing_file;
> > unsigned int offset;
> > -   int segments = 0;
> > +   int nr_bvec = 0;
> > int ret;
> >  
> > if (rq->bio != rq->biotail) {
> > -   struct req_iterator iter;
> > +   struct bvec_iter iter;
> > struct bio_vec tmp;
> >  
> > __rq_for_each_bio(bio, rq)
> > -   segments += bio_segments(bio);
> > -   bvec = kmalloc_array(segments, sizeof(struct bio_vec),
> > +   nr_bvec += bio_bvecs(bio);
> > +   bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
> >  GFP_NOIO);
> > if (!bvec)
> > return -EIO;
> > @@ -533,13 +533,14 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> > /*
> >  * The bios of the request may be started from the middle of
> >  * the 'bvec' because of bio splitting, so we can't directly
> > -* copy bio->bi_iov_vec to new bvec. The rq_for_each_segment
> > +* copy bio->bi_iov_vec to new bvec. The bio_for_each_bvec
> >  * API will take care of all details for us.
> >  */
> > -   rq_for_each_segment(tmp, rq, iter) {
> > -   *bvec = tmp;
> > -   bvec++;
> > -   }
> > +   __rq_for_each_bio(bio, rq)
> > +   bio_for_each_bvec(tmp, bio, iter) {
> > +   *bvec = tmp;
> > +   bvec++;
> > +   }
> 
> Even if they're not strictly necessary, could you please include the
> curly braces for __rq_for_each_bio() here?

Sure, will do it.

> 
> > bvec = cmd->bvec;
> > offset = 0;
> > } else {
> > @@ -550,11 +551,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> >  */
> > offset = bio->bi_iter.bi_bvec_done;
> > bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
> > -   segments = bio_segments(bio);
> > +   nr_bvec = bio_bvecs(bio);
> 
> This scared me for a second, but it's fine to do here because we haven't
> actually enabled multipage bvecs yet, right?

Well, it is fine: all helpers supporting multi-page bvecs actually work
well when multi-page bvecs aren't enabled, because a single-page bvec is
just one special case that the multi-page bvec helpers have to deal with.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 04:23:56PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:55PM +0800, Ming Lei wrote:
> > BTRFS is the only user of this helper, so move this helper into
> > BTRFS, and implement it via bio_for_each_segment_all(), since
> > bio->bi_vcnt may not equal the number of pages after multipage bvec
> > is enabled.
> 
> Shouldn't you also get rid of bio_pages_all() in this patch?

Good catch!

thanks,
Ming



Re: [Cluster-devel] [PATCH V10 05/19] block: introduce bvec_last_segment()

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 03:23:56PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:52PM +0800, Ming Lei wrote:
> > BTRFS and guard_bio_eod() need to get the last singlepage segment
> > from one multipage bvec, so introduce this helper to make them happy.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> 
> Reviewed-by: Omar Sandoval 
> 
> Minor comments below.
> 
> > Signed-off-by: Ming Lei 
> > ---
> >  include/linux/bvec.h | 25 +
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> > index 3d61352cd8cf..01616a0b6220 100644
> > --- a/include/linux/bvec.h
> > +++ b/include/linux/bvec.h
> > @@ -216,4 +216,29 @@ static inline bool mp_bvec_iter_advance(const struct bio_vec *bv,
> > .bi_bvec_done   = 0,\
> >  }
> >  
> > +/*
> > + * Get the last singlepage segment from the multipage bvec and store it
> > + * in @seg
> > + */
> > +static inline void bvec_last_segment(const struct bio_vec *bvec,
> > +   struct bio_vec *seg)
> 
> Indentation is all messed up here.

Will fix it.

> 
> > +{
> > +   unsigned total = bvec->bv_offset + bvec->bv_len;
> > +   unsigned last_page = total / PAGE_SIZE;
> > +
> > +   if (last_page * PAGE_SIZE == total)
> > +   last_page--;
> 
> I think this could just be
> 
>   unsigned int last_page = (total - 1) / PAGE_SIZE;

This way is really elegant.
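
As a quick check that the one-line form matches the original two-step form, here is a small userspace sketch. SKETCH_PAGE_SIZE is an assumed 4 KiB stand-in for PAGE_SIZE and both function names are hypothetical; total is assumed to be non-zero, as it is for a bvec that holds data.

```c
#include <assert.h>

/* Assumed page size for this userspace sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Form from the patch: divide, then step back on an exact page boundary. */
static unsigned int last_page_two_step(unsigned int total)
{
	unsigned int last_page;

	assert(total > 0);
	last_page = total / SKETCH_PAGE_SIZE;
	if (last_page * SKETCH_PAGE_SIZE == total)
		last_page--;
	return last_page;
}

/* Suggested form: index of the page that holds the last byte. */
static unsigned int last_page_one_step(unsigned int total)
{
	assert(total > 0);	/* total == 0 would wrap around */
	return (total - 1) / SKETCH_PAGE_SIZE;
}
```

Both forms agree for every non-zero total; they differ only in how the exact-multiple case is handled.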

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 07/19] btrfs: use bvec_last_segment to get bio's last page

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:37:10PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 15, 2018 at 04:52:54PM +0800, Ming Lei wrote:
> > index 2955a4ea2fa8..161e14b8b180 100644
> > --- a/fs/btrfs/compression.c
> > +++ b/fs/btrfs/compression.c
> > @@ -400,8 +400,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  static u64 bio_end_offset(struct bio *bio)
> >  {
> > struct bio_vec *last = bio_last_bvec_all(bio);
> > +   struct bio_vec bv;
> >  
> > -   return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
> > +   bvec_last_segment(last, &bv);
> > +
> > +   return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
> 
> I don't think we need this.  If last is a multi-page bvec bv_offset
> will already contain the correct offset from the first page.

Yeah, that is true for this specific case; it looks like we can drop this patch.


thanks,
Ming



Re: [Cluster-devel] [PATCH V10 03/19] block: use bio_for_each_bvec() to compute multi-page bvec count

2018-11-19 Thread Ming Lei
On Thu, Nov 15, 2018 at 12:20:28PM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:50PM +0800, Ming Lei wrote:
> > First it is more efficient to use bio_for_each_bvec() in both
> > blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how
> > many multi-page bvecs there are in the bio.
> > 
> > Secondly once bio_for_each_bvec() is used, the bvec may need to be
> > split because its length can be much longer than max segment size,
> > so we have to split the big bvec into several segments.
> > 
> > Thirdly when splitting multi-page bvec into segments, the max segment
> > limit may be reached, so the bio split needs to be considered under
> > this situation too.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> > Signed-off-by: Ming Lei 
> > ---
> >  block/blk-merge.c | 90 ++-
> >  1 file changed, 76 insertions(+), 14 deletions(-)
> > 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 91b2af332a84..6f7deb94a23f 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -160,6 +160,62 @@ static inline unsigned get_max_io_size(struct request_queue *q,
> > return sectors;
> >  }
> >  
> > +/*
> > + * Split the bvec @bv into segments, and update all kinds of
> > + * variables.
> > + */
> > +static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
> > +   unsigned *nsegs, unsigned *last_seg_size,
> > +   unsigned *front_seg_size, unsigned *sectors)
> > +{
> > +   bool need_split = false;
> > +   unsigned len = bv->bv_len;
> > +   unsigned total_len = 0;
> > +   unsigned new_nsegs = 0, seg_size = 0;
> 
> "unsigned int" here and everywhere else.
> 
> > +   if ((*nsegs >= queue_max_segments(q)) || !len)
> > +   return need_split;
> > +
> > +   /*
> > +* Multipage bvec may be too big to hold in one segment,
> > +* so the current bvec has to be splitted as multiple
> > +* segments.
> > +*/
> > +   while (new_nsegs + *nsegs < queue_max_segments(q)) {
> > +   seg_size = min(queue_max_segment_size(q), len);
> > +
> > +   new_nsegs++;
> > +   total_len += seg_size;
> > +   len -= seg_size;
> > +
> > +   if ((queue_virt_boundary(q) && ((bv->bv_offset +
> > +   total_len) & queue_virt_boundary(q))) || !len)
> > +   break;
> 
> Checking queue_virt_boundary(q) != 0 is superfluous, and the len check
> could just control the loop, i.e.,
> 
>   while (len && new_nsegs + *nsegs < queue_max_segments(q)) {
>   seg_size = min(queue_max_segment_size(q), len);
> 
>   new_nsegs++;
>   total_len += seg_size;
>   len -= seg_size;
> 
>   if ((bv->bv_offset + total_len) & queue_virt_boundary(q))
>   break;
>   }
> 
> And if you rewrite it this way, I _think_ you can get rid of this
> special case:
> 
>   if ((*nsegs >= queue_max_segments(q)) || !len)
>   return need_split;
> 
> above.

Good point, will do in next version.
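
A rough userspace sketch of the loop as rewritten in the review comment above, to make its behaviour easy to try out. struct sketch_limits stands in for the request_queue limits and is hypothetical, as is the function name. Adding new_nsegs to *nsegs at the end is an assumption, since the quoted hunk is cut off before that point.

```c
#include <assert.h>

/* Hypothetical stand-in for the queue limits consulted by the loop. */
struct sketch_limits {
	unsigned int max_segments;
	unsigned int max_segment_size;
	unsigned int virt_boundary;	/* mask; 0 when there is no boundary */
};

/*
 * Carve one bvec of @bv_len bytes starting at @bv_offset into segments.
 * Returns the number of bytes that did not fit; non-zero means the bio
 * has to be split in the middle of this bvec.
 */
static unsigned int sketch_split_segs(const struct sketch_limits *lim,
				      unsigned int bv_offset,
				      unsigned int bv_len,
				      unsigned int *nsegs)
{
	unsigned int len = bv_len;
	unsigned int total_len = 0;
	unsigned int new_nsegs = 0;

	assert(lim->max_segment_size > 0);

	while (len && new_nsegs + *nsegs < lim->max_segments) {
		unsigned int seg_size = len < lim->max_segment_size ?
					len : lim->max_segment_size;

		new_nsegs++;
		total_len += seg_size;
		len -= seg_size;

		/* a zero mask never matches, so no separate check is needed */
		if ((bv_offset + total_len) & lim->virt_boundary)
			break;
	}

	*nsegs += new_nsegs;	/* assumed bookkeeping, see the note above */
	return len;
}
```

With the length test in the loop condition, an empty bvec or an already full segment table simply skips the loop, which is why the early return can go away.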

> 
> > +   }
> > +
> > +   /* split in the middle of the bvec */
> > +   if (len)
> > +   need_split = true;
> 
> need_split is unnecessary, just return len != 0.

OK.

> 
> > +
> > +   /* update front segment size */
> > +   if (!*nsegs) {
> > +   unsigned first_seg_size = seg_size;
> > +
> > +   if (new_nsegs > 1)
> > +   first_seg_size = queue_max_segment_size(q);
> > +   if (*front_seg_size < first_seg_size)
> > +   *front_seg_size = first_seg_size;
> > +   }
> > +
> > +   /* update other varibles */
> > +   *last_seg_size = seg_size;
> > 

Re: [Cluster-devel] [PATCH V10 04/19] block: use bio_for_each_bvec() to map sg

2018-11-19 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:33:14PM +0100, Christoph Hellwig wrote:
> > +   if (!*sg)
> > +   return sglist;
> > +   else {
> 
> No need for an else after an early return.

OK, good catch!

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 02/19] block: introduce bio_for_each_bvec()

2018-11-18 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:30:28PM +0100, Christoph Hellwig wrote:
> > +static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> > + unsigned bytes, bool mp)
> 
> I think these magic 'bool mp' arguments and wrappers over wrappers
> don't help anyone to actually understand the code.  I'd vote for
> removing as many wrappers as we really don't need, and passing the
> actual segment limit instead of the magic bool flag.  Something like
> this untested patch:

I think this approach is fine; just a few small comments.

> 
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 277921ad42e7..dcad0b69f57a 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -138,30 +138,21 @@ static inline bool bio_full(struct bio *bio)
>   bvec_for_each_segment(bvl, &((bio)->bi_io_vec[iter_all.idx]), i, iter_all)
>  
>  static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
> -   unsigned bytes, bool mp)
> +   unsigned bytes, unsigned max_segment)

The new parameter should be named 'max_segment_len' or
'max_seg_len'.

>  {
>   iter->bi_sector += bytes >> 9;
>  
>   if (bio_no_advance_iter(bio))
>   iter->bi_size -= bytes;
>   else
> - if (!mp)
> - bvec_iter_advance(bio->bi_io_vec, iter, bytes);
> - else
> - mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
> + __bvec_iter_advance(bio->bi_io_vec, iter, bytes, max_segment);
>   /* TODO: It is reasonable to complete bio with error here. */
>  }
>  
>  static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
>   unsigned bytes)
>  {
> - __bio_advance_iter(bio, iter, bytes, false);
> -}
> -
> -static inline void bio_advance_mp_iter(struct bio *bio, struct bvec_iter *iter,
> -unsigned bytes)
> -{
> - __bio_advance_iter(bio, iter, bytes, true);
> + __bio_advance_iter(bio, iter, bytes, PAGE_SIZE);
>  }
>  
>  #define __bio_for_each_segment(bvl, bio, iter, start)
> \
> @@ -177,7 +168,7 @@ static inline void bio_advance_mp_iter(struct bio *bio, struct bvec_iter *iter,
>   for (iter = (start);\
>(iter).bi_size &&  \
>   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
> -  bio_advance_mp_iter((bio), &(iter), (bvl).bv_len))
> +  __bio_advance_iter((bio), &(iter), (bvl).bv_len, 0))

We might even pass '-1' for the multi-page case.

>  
>  /* returns one real segment(multipage bvec) each time */
>  #define bio_for_each_bvec(bvl, bio, iter)\
> diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> index 02f26d2b59ad..5e2ed46c1c88 100644
> --- a/include/linux/bvec.h
> +++ b/include/linux/bvec.h
> @@ -138,8 +138,7 @@ struct bvec_iter_all {
>  })
>  
>  static inline bool __bvec_iter_advance(const struct bio_vec *bv,
> -struct bvec_iter *iter,
> -unsigned bytes, bool mp)
> + struct bvec_iter *iter, unsigned bytes, unsigned max_segment)
>  {
>   if (WARN_ONCE(bytes > iter->bi_size,
>"Attempted to advance past end of bvec iter\n")) {
> @@ -148,18 +147,18 @@ static inline bool __bvec_iter_advance(const struct bio_vec *bv,
>   }
>  
>   while (bytes) {
> - unsigned len;
> + unsigned segment_len = mp_bvec_iter_len(bv, *iter);
>  
> - if (mp)
> - len = mp_bvec_iter_len(bv, *iter);
> - else
> - len = bvec_iter_len(bv, *iter);
> + if (max_segment) {
> + max_segment -= bvec_iter_offset(bv, *iter);
> + segment_len = min(segment_len, max_segment);

It looks like 'max_segment' needs to stay constant; it shouldn't be updated
inside the loop.

If '-1' is passed for the multipage case, the above change may become:

	segment_len = min_t(unsigned, segment_len,
			    max_seg_len - bvec_iter_offset(bv, *iter));

This way is cleaner, but it adds the extra cost of the above line in the
multipage case.
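
To make the idea concrete, below is a simplified userspace model of an advance helper that takes a constant max_seg_len and clamps each step with it. All sketch_* names are hypothetical, SKETCH_PAGE_SIZE assumes 4 KiB pages, and SKETCH_NO_LIMIT plays the role of the '-1' above. It is a sketch of the idea, not the kernel's __bvec_iter_advance().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NO_LIMIT  UINT_MAX	/* the '-1' mentioned above */

/* Hypothetical, trimmed-down stand-ins for struct bio_vec / struct bvec_iter. */
struct sketch_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct sketch_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Move @iter forward by @bytes, at most one "segment" per loop iteration. */
static bool sketch_iter_advance(const struct sketch_bvec *bv,
				struct sketch_iter *iter,
				unsigned int bytes, unsigned int max_seg_len)
{
	assert(max_seg_len >= SKETCH_PAGE_SIZE);

	if (bytes > iter->bi_size) {
		iter->bi_size = 0;	/* attempted to advance past the end */
		return false;
	}

	while (bytes) {
		const struct sketch_bvec *cur = &bv[iter->bi_idx];
		unsigned int in_page = (cur->bv_offset + iter->bi_bvec_done) %
				       SKETCH_PAGE_SIZE;
		unsigned int len = sketch_min(cur->bv_len - iter->bi_bvec_done,
					      iter->bi_size);

		/* clamp the step; max_seg_len itself is never modified */
		len = sketch_min(len, max_seg_len - in_page);
		len = sketch_min(len, bytes);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;

		if (iter->bi_bvec_done == cur->bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
	return true;
}
```

In this model a page-sized limit only makes the loop run more often; the iterator ends up in the same state as with no limit.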

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-18 Thread Ming Lei
On Sun, Nov 18, 2018 at 08:10:14PM -0700, Jens Axboe wrote:
> On 11/18/18 7:23 PM, Ming Lei wrote:
> > On Fri, Nov 16, 2018 at 02:13:05PM +0100, Christoph Hellwig wrote:
> >>> -#define bvec_iter_page(bvec, iter)   \
> >>> +#define mp_bvec_iter_page(bvec, iter)\
> >>>   (__bvec_iter_bvec((bvec), (iter))->bv_page)
> >>>  
> >>> -#define bvec_iter_len(bvec, iter)\
> >>> +#define mp_bvec_iter_len(bvec, iter) \
> >>
> >> I'd much prefer if we would stick to the segment naming that
> >> we also use in the higher level helper.
> >>
> >> So segment_iter_page, segment_iter_len, etc.
> > 
> > We discussed the naming problem before. One big problem is that the
> > 'segment' in bio_for_each_segment*() actually means one single-page segment.
> > 
> > If we use segment_iter_page() here for a multi-page segment, it may
> > confuse people.
> > 
> > Of course, I prefer the segment/page naming.
> > 
> > And Jens didn't agree to rename bio_for_each_segment*() before.
> 
> I didn't like frivolous renaming (and I still don't), but mp_
> is horrible imho. Don't name these after the fact that they
> are done in conjunction with supporting multipage bvecs. That
> very fact will be irrelevant very soon

OK, so what is your suggestion for the naming issue?

Are you fine with using segment_iter_page() here? Then the term 'segment'
would mean a multi-page segment here, but a single-page one in
bio_for_each_segment*().

thanks
Ming



Re: [Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-18 Thread Ming Lei
On Thu, Nov 15, 2018 at 10:25:59AM -0800, Omar Sandoval wrote:
> On Thu, Nov 15, 2018 at 04:52:48PM +0800, Ming Lei wrote:
> > This patch introduces helpers of 'mp_bvec_iter_*' for multipage
> > bvec support.
> > 
> > The introduced helpers treat one bvec as a real multi-page segment,
> > which may include more than one page.
> > 
> > The existing bvec_iter_* helpers are interfaces for the current
> > bvec iterator, which drivers, fs, dm, etc. treat as single-page.
> > These introduced helpers will build single-page bvecs on the fly, so
> > this way won't break current bio/bvec users, which need no change.
> > 
> > Cc: Dave Chinner 
> > Cc: Kent Overstreet 
> > Cc: Mike Snitzer 
> > Cc: dm-de...@redhat.com
> > Cc: Alexander Viro 
> > Cc: linux-fsde...@vger.kernel.org
> > Cc: Shaohua Li 
> > Cc: linux-r...@vger.kernel.org
> > Cc: linux-er...@lists.ozlabs.org
> > Cc: David Sterba 
> > Cc: linux-bt...@vger.kernel.org
> > Cc: Darrick J. Wong 
> > Cc: linux-...@vger.kernel.org
> > Cc: Gao Xiang 
> > Cc: Christoph Hellwig 
> > Cc: Theodore Ts'o 
> > Cc: linux-e...@vger.kernel.org
> > Cc: Coly Li 
> > Cc: linux-bca...@vger.kernel.org
> > Cc: Boaz Harrosh 
> > Cc: Bob Peterson 
> > Cc: cluster-devel@redhat.com
> 
> Reviewed-by: Omar Sandoval 
> 
> But a couple of comments below.
> 
> > Signed-off-by: Ming Lei 
> > ---
> >  include/linux/bvec.h | 63 +---
> >  1 file changed, 60 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/bvec.h b/include/linux/bvec.h
> > index 02c73c6aa805..8ef904a50577 100644
> > --- a/include/linux/bvec.h
> > +++ b/include/linux/bvec.h
> > @@ -23,6 +23,44 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +
> > +/*
> > + * What is multi-page bvecs?
> > + *
> > + * - bvecs stored in bio->bi_io_vec is always multi-page(mp) style
> > + *
> > + * - bvec(struct bio_vec) represents one physically contiguous I/O
> > + *   buffer, now the buffer may include more than one pages after
> > + *   multi-page(mp) bvec is supported, and all these pages represented
> > + *   by one bvec is physically contiguous. Before mp support, at most
> > + *   one page is included in one bvec, we call it single-page(sp)
> > + *   bvec.
> > + *
> > + * - .bv_page of the bvec represents the 1st page in the mp bvec
> > + *
> > + * - .bv_offset of the bvec represents offset of the buffer in the bvec
> > + *
> > + * The effect on the current drivers/filesystem/dm/bcache/...:
> > + *
> > + * - almost everyone supposes that one bvec only includes one single
> > + *   page, so we keep the sp interface not changed, for example,
> > + *   bio_for_each_segment() still returns bvec with single page
> > + *
> > + * - bio_for_each_segment*() will be changed to return single-page
> > + *   bvec too
> > + *
> > + * - during iterating, iterator variable(struct bvec_iter) is always
> > + *   updated in multipage bvec style and that means bvec_iter_advance()
> > + *   is kept not changed
> > + *
> > + * - returned(copied) single-page bvec is built in flight by bvec
> > + *   helpers from the stored multipage bvec
> > + *
> > + * - In case that some components(such as iov_iter) need to support
> > + *   multi-page bvec, we introduce new helpers(mp_bvec_iter_*) for
> > + *   them.
> > + */
> 
> This comment sounds more like a commit message (i.e., how were things
> before, and how are we changing them). In a couple of years when I read
> this code, I probably won't care how it was changed, just how it works.
> So I think a comment explaining the concepts of multi-page and
> single-page bvecs is very useful, but please move all of the "foo was
> changed" and "before mp support" type stuff to the commit message.

OK.

> 
> >  /*
> >   * was unsigned short, but we might as well be ready for > 64kB I/O pages
> > @@ -50,16 +88,35 @@ struct bvec_iter {
> >   */
> >  #define __bvec_iter_bvec(bvec, iter)   (&(bvec)[(iter).bi_idx])
> >  
> > -#define bvec_iter_page(bvec, iter) \
> > +#define mp_bvec_iter_page(bvec, iter)  \
> > (__bvec_iter_bvec((bvec), (iter))->bv_page)
> >  
> > -#define bvec_iter_len(bvec, iter)  \
> > +#define mp_bvec_iter_len(bvec, iter)   \
> > 

Re: [Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-18 Thread Ming Lei
On Fri, Nov 16, 2018 at 02:13:05PM +0100, Christoph Hellwig wrote:
> > -#define bvec_iter_page(bvec, iter) \
> > +#define mp_bvec_iter_page(bvec, iter)  \
> > (__bvec_iter_bvec((bvec), (iter))->bv_page)
> >  
> > -#define bvec_iter_len(bvec, iter)  \
> > +#define mp_bvec_iter_len(bvec, iter)   \
> 
> I'd much prefer if we would stick to the segment naming that
> we also use in the higher level helper.
> 
> So segment_iter_page, segment_iter_len, etc.

We discussed the naming problem before. One big problem is that the 'segment'
in bio_for_each_segment*() actually means one single-page segment.

If we use segment_iter_page() here for a multi-page segment, it may
confuse people.

Of course, I prefer the segment/page naming.

And Jens didn't agree to rename bio_for_each_segment*() before.

So which solution should we take to move on?

> 
> > + * This helpers are for building sp bvec in flight.
> 
> Please spell out single page, sp is not easily understandable.

OK.

Thanks,
Ming



Re: [Cluster-devel] [PATCH V10 00/19] block: support multi-page bvec

2018-11-16 Thread Ming Lei
On Fri, Nov 16, 2018 at 03:03:14PM +0100, Christoph Hellwig wrote:
> It seems like bi_phys_segments is still around in this series.
> Shouldn't it be superfluous now?

Even though multi-page bvecs are supported, the segment count still doesn't
equal the actual bvec count. For example, one bvec may be too big to fit
in a single segment.


Thanks,
Ming



[Cluster-devel] [PATCH V10 18/19] block: kill QUEUE_FLAG_NO_SG_MERGE

2018-11-15 Thread Ming Lei
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"),
the physical segment number is mainly figured out in blk_queue_split() for
the fast path, and the BIO_SEG_VALID flag is set there too.

Now only blk_recount_segments() and blk_recalc_rq_segments() use
QUEUE_FLAG_NO_SG_MERGE.

Basically, blk_recount_segments() is bypassed in the fast path, given that
BIO_SEG_VALID is set in blk_queue_split().

The other user, blk_recalc_rq_segments(), is:

- run in the partial completion branch of blk_update_request(), which is an
  unusual case

- run in blk_cloned_rq_check_limits(), which is still not a big problem if
  the flag is killed, since dm-rq is the only user.

Multi-page bvec is enabled now, so QUEUE_FLAG_NO_SG_MERGE doesn't make sense
any more.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/blk-merge.c  | 31 ++-
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c |  3 ---
 drivers/md/dm-table.c  | 13 -
 include/linux/blkdev.h |  1 -
 5 files changed, 6 insertions(+), 43 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 153a659fde74..06be298be332 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -351,8 +351,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio)
 EXPORT_SYMBOL(blk_queue_split);
 
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
-struct bio *bio,
-bool no_sg_merge)
+struct bio *bio)
 {
struct bio_vec bv, bvprv = { NULL };
int cluster, prev = 0;
@@ -379,13 +378,6 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
nr_phys_segs = 0;
for_each_bio(bio) {
bio_for_each_bvec(bv, bio, iter) {
-   /*
-* If SG merging is disabled, each bio vector is
-* a segment
-*/
-   if (no_sg_merge)
-   goto new_segment;
-
if (prev && cluster) {
if (seg_size + bv.bv_len
> queue_max_segment_size(q))
@@ -420,27 +412,16 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-   bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
-   &rq->q->queue_flags);
-
-   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio,
-   no_sg_merge);
+   rq->nr_phys_segments = __blk_recalc_rq_segments(rq->q, rq->bio);
 }
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-   unsigned short seg_cnt = bio_segments(bio);
-
-   if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
-   (seg_cnt < queue_max_segments(q)))
-   bio->bi_phys_segments = seg_cnt;
-   else {
-   struct bio *nxt = bio->bi_next;
+   struct bio *nxt = bio->bi_next;
 
-   bio->bi_next = NULL;
-   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false);
-   bio->bi_next = nxt;
-   }
+   bio->bi_next = NULL;
+   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio);
+   bio->bi_next = nxt;
 
bio_set_flag(bio, BIO_SEG_VALID);
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f021f4817b80..e188b1090759 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -128,7 +128,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
-   QUEUE_FLAG_NAME(NO_SG_MERGE),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 411be60d0cb6..ed484af5744b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2755,9 +2755,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
-   if (!(set->flags & BLK_MQ_F_SG_MERGE))
-   queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
-
q->sg_reserved_size = INT_MAX;
 
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
diff --git

[Cluster-devel] [PATCH V10 17/19] block: don't use bio->bi_vcnt to figure out segment number

2018-11-15 Thread Ming Lei
It is wrong to use bio->bi_vcnt to figure out how many segments
there are in the bio, even when the CLONED flag isn't set on this bio,
because the bio may have been split or advanced.

So always use bio_segments() in blk_recount_segments(). It shouldn't
cause any performance loss now, because the physical segment number is figured
out in blk_queue_split() and BIO_SEG_VALID is set at the same time, since
bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting").

Cc: Dave Chinner 
Cc: Kent Overstreet 
Fixes: 7f60dcaaf91 ("block: blk-merge: fix blk_recount_segments()")
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index cb9f49bcfd36..153a659fde74 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -429,13 +429,7 @@ void blk_recalc_rq_segments(struct request *rq)
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-   unsigned short seg_cnt;
-
-   /* estimate segment number by bi_vcnt for non-cloned bio */
-   if (bio_flagged(bio, BIO_CLONED))
-   seg_cnt = bio_segments(bio);
-   else
-   seg_cnt = bio->bi_vcnt;
+   unsigned short seg_cnt = bio_segments(bio);
 
if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
(seg_cnt < queue_max_segments(q)))
-- 
2.9.5
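
The point made in the commit message of patch 17 above (bi_vcnt counts table
entries, not the segments the iterator still covers) can be illustrated with a
small user-space model. This is only a sketch under stated assumptions: the
names model_bvec, model_iter, model_segments() and model_advance() are
hypothetical stand-ins, not kernel APIs, and every table entry is assumed to
have a non-zero length.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct model_bvec { unsigned offset; unsigned len; };
struct model_iter { unsigned size; unsigned idx; unsigned done; };

/* Count the single-page segments still covered by the iterator (by value). */
static unsigned model_segments(const struct model_bvec *tbl, struct model_iter it)
{
	unsigned segs = 0;

	while (it.size) {
		const struct model_bvec *bv = &tbl[it.idx];
		unsigned off = bv->offset + it.done;
		unsigned left = bv->len - it.done;
		unsigned in_page = MODEL_PAGE_SIZE - off % MODEL_PAGE_SIZE;
		unsigned step = left < in_page ? left : in_page;

		if (step > it.size)
			step = it.size;
		segs++;
		it.size -= step;
		it.done += step;
		if (it.done == bv->len) {
			it.idx++;
			it.done = 0;
		}
	}
	return segs;
}

/* Move the iterator forward, as a split or a partial completion would. */
static void model_advance(const struct model_bvec *tbl, struct model_iter *it,
			  unsigned bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	bytes += it->done;
	while (bytes && bytes >= tbl[it->idx].len) {
		bytes -= tbl[it->idx].len;
		it->idx++;
	}
	it->done = bytes;
}
```

With a table of four full pages, advancing the iterator by two pages leaves
two segments to process, while the table size (the analogue of bi_vcnt) is
still four.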



[Cluster-devel] [PATCH V10 07/19] btrfs: use bvec_last_segment to get bio's last page

2018-11-15 Thread Ming Lei
Preparing for supporting multi-page bvec.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 fs/btrfs/compression.c | 5 -
 fs/btrfs/extent_io.c   | 5 +++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 2955a4ea2fa8..161e14b8b180 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -400,8 +400,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 static u64 bio_end_offset(struct bio *bio)
 {
struct bio_vec *last = bio_last_bvec_all(bio);
+   struct bio_vec bv;
 
-   return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+   bvec_last_segment(last, &bv);
+
+   return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
 }
 
 static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d228f706ff3e..5d5965297e7e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2720,11 +2720,12 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 {
blk_status_t ret = 0;
struct bio_vec *bvec = bio_last_bvec_all(bio);
-   struct page *page = bvec->bv_page;
+   struct bio_vec bv;
struct extent_io_tree *tree = bio->bi_private;
u64 start;
 
-   start = page_offset(page) + bvec->bv_offset;
+   bvec_last_segment(bvec, &bv);
+   start = page_offset(bv.bv_page) + bv.bv_offset;
 
bio->bi_private = NULL;
 
-- 
2.9.5



[Cluster-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-15 Thread Ming Lei
There are still cases in which we need bio_bvecs() to get the
number of multi-page segments, so introduce it.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 include/linux/bio.h | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1f0dcf109841..3496c816946e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -196,7 +196,6 @@ static inline unsigned bio_segments(struct bio *bio)
 * We special case discard/write same/write zeroes, because they
 * interpret bi_size differently:
 */
-
switch (bio_op(bio)) {
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
@@ -205,13 +204,34 @@ static inline unsigned bio_segments(struct bio *bio)
case REQ_OP_WRITE_SAME:
return 1;
default:
-   break;
+   bio_for_each_segment(bv, bio, iter)
+   segs++;
+   return segs;
}
+}
 
-   bio_for_each_segment(bv, bio, iter)
-   segs++;
+static inline unsigned bio_bvecs(struct bio *bio)
+{
+   unsigned bvecs = 0;
+   struct bio_vec bv;
+   struct bvec_iter iter;
 
-   return segs;
+   /*
+* We special case discard/write same/write zeroes, because they
+* interpret bi_size differently:
+*/
+   switch (bio_op(bio)) {
+   case REQ_OP_DISCARD:
+   case REQ_OP_SECURE_ERASE:
+   case REQ_OP_WRITE_ZEROES:
+   return 0;
+   case REQ_OP_WRITE_SAME:
+   return 1;
+   default:
+   bio_for_each_bvec(bv, bio, iter)
+   bvecs++;
+   return bvecs;
+   }
 }
 
 /*
-- 
2.9.5
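
The difference between the two counters in patch 09 above can be sketched in
user space as follows. This is a simplified model, not the kernel code: the
types and the functions model_bio_segments() and model_bio_bvecs() are
hypothetical, only two of the special-cased operations are modelled, and the
bio is assumed not to have been advanced (the whole table is counted).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical subset of the request operations special-cased in the patch. */
enum model_op { MODEL_OP_WRITE, MODEL_OP_DISCARD, MODEL_OP_WRITE_SAME };

struct model_bvec { unsigned offset; unsigned len; };

/* Number of pages touched by one (possibly multi-page) bvec. */
static unsigned model_pages_spanned(const struct model_bvec *bv)
{
	assert(bv->len > 0);
	return (bv->offset + bv->len - 1) / MODEL_PAGE_SIZE -
	       bv->offset / MODEL_PAGE_SIZE + 1;
}

static unsigned model_count(enum model_op op, const struct model_bvec *tbl,
			    size_t n, int per_page)
{
	unsigned count = 0;
	size_t i;

	switch (op) {
	case MODEL_OP_DISCARD:
		return 0;	/* carries no data pages */
	case MODEL_OP_WRITE_SAME:
		return 1;	/* a single payload segment */
	default:
		for (i = 0; i < n; i++)
			count += per_page ? model_pages_spanned(&tbl[i]) : 1;
		return count;
	}
}

/* Analogue of bio_segments(): one count per single-page segment. */
static unsigned model_bio_segments(enum model_op op,
				   const struct model_bvec *tbl, size_t n)
{
	return model_count(op, tbl, n, 1);
}

/* Analogue of bio_bvecs(): one count per multi-page bvec. */
static unsigned model_bio_bvecs(enum model_op op,
				const struct model_bvec *tbl, size_t n)
{
	return model_count(op, tbl, n, 0);
}
```

For a table holding one bvec of 12288 bytes starting at offset 512 plus one
full-page bvec, the model reports two bvecs but five single-page segments.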



[Cluster-devel] [PATCH V10 16/19] block: document usage of bio iterator helpers

2018-11-15 Thread Ming Lei
Now that multi-page bvec is supported, some helpers return data page by
page while others return it segment by segment. This patch
documents the usage.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 Documentation/block/biovecs.txt | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
index 25689584e6e0..bfafb70d0d9e 100644
--- a/Documentation/block/biovecs.txt
+++ b/Documentation/block/biovecs.txt
@@ -117,3 +117,29 @@ Other implications:
size limitations and the limitations of the underlying devices. Thus
there's no need to define ->merge_bvec_fn() callbacks for individual block
drivers.
+
+Usage of helpers:
+=================
+
+* The following helpers whose names have the suffix of "_all" can only be used
+on non-BIO_CLONED bio, and usually they are used by filesystem code, and driver
+shouldn't use them because bio may have been split before they got to the driver:
+
+   bio_for_each_segment_all()
+   bio_first_bvec_all()
+   bio_first_page_all()
+   bio_last_bvec_all()
+
+* The following helpers iterate over single-page bvec, and the local
+variable of 'struct bio_vec' or the reference records single-page IO
+vector during the iteration:
+
+   bio_for_each_segment()
+   bio_for_each_segment_all()
+
+* The following helper iterates over multi-page bvec, and each bvec may
+include multiple physically contiguous pages, and the local variable of
+'struct bio_vec' or the reference records multi-page IO vector during the
+iteration:
+
+   bio_for_each_bvec()
-- 
2.9.5



[Cluster-devel] [PATCH V10 13/19] iomap & xfs: only account for new added page

2018-11-15 Thread Ming Lei
After multi-page bvec is enabled, a page may be merged into an existing
segment even though it is a newly added page.

This patch deals with the issue by a post-check in case of merge, so that
only a freshly added page needs to be accounted for in iomap & xfs.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 fs/iomap.c  | 22 ++
 fs/xfs/xfs_aops.c   | 10 --
 include/linux/bio.h | 11 +++
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index df0212560b36..a1b97a5c726a 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -288,6 +288,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
loff_t orig_pos = pos;
unsigned poff, plen;
sector_t sector;
+   bool need_account = false;
 
if (iomap->type == IOMAP_INLINE) {
WARN_ON_ONCE(pos);
@@ -313,18 +314,15 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 */
sector = iomap_sector(iomap, pos);
if (ctx->bio && bio_end_sector(ctx->bio) == sector) {
-   if (__bio_try_merge_page(ctx->bio, page, plen, poff))
+   if (__bio_try_merge_page(ctx->bio, page, plen, poff)) {
+   need_account = iop && bio_is_last_segment(ctx->bio,
+   page, plen, poff);
goto done;
+   }
is_contig = true;
}
 
-   /*
-* If we start a new segment we need to increase the read count, and we
-* need to do so before submitting any previous full bio to make sure
-* that we don't prematurely unlock the page.
-*/
-   if (iop)
-   atomic_inc(&iop->read_count);
+   need_account = true;
 
if (!ctx->bio || !is_contig || bio_full(ctx->bio)) {
gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL);
@@ -347,6 +345,14 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
__bio_add_page(ctx->bio, page, plen, poff);
 done:
/*
+* If we add a new page we need to increase the read count, and we
+* need to do so before submitting any previous full bio to make sure
+* that we don't prematurely unlock the page.
+*/
+   if (iop && need_account)
+   atomic_inc(&iop->read_count);
+
+   /*
 * Move the caller beyond our range so that it keeps making progress.
 * For that we have to include any leading non-uptodate ranges, but
 * we can skip trailing ones as they will be handled in the next
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1f1829e506e8..d8e9cc9f751a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -603,6 +603,7 @@ xfs_add_to_ioend(
unsigned len = i_blocksize(inode);
unsigned poff = offset & (PAGE_SIZE - 1);
sector_t sector;
+   bool need_account;
 
sector = xfs_fsb_to_db(ip, wpc->imap.br_startblock) +
((offset - XFS_FSB_TO_B(mp, wpc->imap.br_startoff)) >> 9);
@@ -617,13 +618,18 @@ xfs_add_to_ioend(
}
 
if (!__bio_try_merge_page(wpc->ioend->io_bio, page, len, poff)) {
-   if (iop)
-   atomic_inc(&iop->write_count);
+   need_account = true;
if (bio_full(wpc->ioend->io_bio))
xfs_chain_bio(wpc->ioend, wbc, bdev, sector);
__bio_add_page(wpc->ioend->io_bio, page, len, poff);
+   } else {
+   need_account = iop && bio_is_last_segment(wpc->ioend->io_bio,
+   page, len, poff);
}
 
+   if (iop && need_account)
+   atomic_inc(&iop->write_count);
+
wpc->ioend->io_size += len;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1a2430a8b89d..5040e9a2eb09 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -341,6 +341,17 @@ static inline struct bio_vec *bio_last_bvec_all(struct bio *bio)
return &bio->bi_io_vec[bio->bi_vcnt - 1];
 }
 
+/* iomap needs this helper to deal with sub-pagesize bvec */
+static inline bool bio_is_last_segment(struct bio *bio, struct page *page,
+   unsigned int len, unsigned int off)

[Cluster-devel] [PATCH V10 14/19] block: enable multipage bvecs

2018-11-15 Thread Ming Lei
This patch pulls the trigger for multi-page bvecs.

Now any request queue which supports queue cluster will see multi-page
bvecs.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/bio.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 6486722d4d4b..ed6df6f8e63d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -767,12 +767,24 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
-
-   if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
-   bv->bv_len += len;
-   bio->bi_iter.bi_size += len;
-   return true;
-   }
+   struct request_queue *q = NULL;
+
+   if (page == bv->bv_page && off == (bv->bv_offset + bv->bv_len)
+   && (off + len) <= PAGE_SIZE)
+   goto merge;
+
+   if (bio->bi_disk)
+   q = bio->bi_disk->queue;
+
+   /* disable multi-page bvec too if cluster isn't enabled */
+   if (!q || !blk_queue_cluster(q) ||
+   ((page_to_phys(bv->bv_page) + bv->bv_offset + bv->bv_len) !=
+(page_to_phys(page) + off)))
+   return false;
+ merge:
+   bv->bv_len += len;
+   bio->bi_iter.bi_size += len;
+   return true;
}
return false;
 }
-- 
2.9.5
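
The merge decision introduced by patch 14 above can be modelled in user space
roughly as follows. This is a sketch under assumptions, not the kernel
function: pages are represented by a page frame number so that the physical
address is simply pfn * page size + offset, the "queue supports cluster" check
is reduced to a boolean, and all names (model_bvec, model_phys(),
model_try_merge()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096ul

/* Hypothetical bvec: the page is identified by its page frame number. */
struct model_bvec {
	unsigned long pfn;
	unsigned long offset;
	unsigned long len;
};

static unsigned long model_phys(unsigned long pfn, unsigned long offset)
{
	return pfn * MODEL_PAGE_SIZE + offset;
}

/*
 * Try to append (pfn, off, len) to the last bvec of a bio.
 * Returns true and grows last->len when the data can be merged.
 */
static bool model_try_merge(struct model_bvec *last, bool cluster_enabled,
			    unsigned long pfn, unsigned long len,
			    unsigned long off)
{
	bool same_page;

	assert(len > 0);

	/* Old behaviour: the data continues inside the same page. */
	same_page = pfn == last->pfn &&
		    off == last->offset + last->len &&
		    off + len <= MODEL_PAGE_SIZE;

	if (!same_page) {
		/* Multi-page merge needs clustering and physical contiguity. */
		if (!cluster_enabled)
			return false;
		if (model_phys(last->pfn, last->offset) + last->len !=
		    model_phys(pfn, off))
			return false;
	}

	last->len += len;
	return true;
}
```

In this model a full page at pfn 11 merges into a bvec that ends exactly at
the end of pfn 10, but only when clustering is enabled; a page at pfn 13 is
never merged because it is not physically adjacent.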



[Cluster-devel] [PATCH V10 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-15 Thread Ming Lei
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
so that bio_for_each_segment_all() can iterate over multi-page bvecs.

Given that it is just a mechanical and simple change for all
bio_for_each_segment_all() users, this patch does the tree-wide change in a
single patch, so that we can avoid using a temporary helper for this conversion.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: linux-fsde...@vger.kernel.org
Cc: Alexander Viro 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: linux-bt...@vger.kernel.org
Cc: David Sterba 
Cc: Darrick J. Wong 
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/bio.c   | 27 ++-
 block/blk-zoned.c |  1 +
 block/bounce.c|  6 --
 drivers/md/bcache/btree.c |  3 ++-
 drivers/md/dm-crypt.c |  3 ++-
 drivers/md/raid1.c|  3 ++-
 drivers/staging/erofs/data.c  |  3 ++-
 drivers/staging/erofs/unzip_vle.c |  3 ++-
 fs/block_dev.c|  6 --
 fs/btrfs/compression.c|  3 ++-
 fs/btrfs/disk-io.c|  3 ++-
 fs/btrfs/extent_io.c  | 12 
 fs/btrfs/inode.c  |  6 --
 fs/btrfs/raid56.c |  3 ++-
 fs/crypto/bio.c   |  3 ++-
 fs/direct-io.c|  4 +++-
 fs/exofs/ore.c|  3 ++-
 fs/exofs/ore_raid.c   |  3 ++-
 fs/ext4/page-io.c |  3 ++-
 fs/ext4/readpage.c|  3 ++-
 fs/f2fs/data.c|  9 ++---
 fs/gfs2/lops.c|  6 --
 fs/gfs2/meta_io.c |  3 ++-
 fs/iomap.c|  6 --
 fs/mpage.c|  3 ++-
 fs/xfs/xfs_aops.c |  5 +++--
 include/linux/bio.h   | 11 +--
 include/linux/bvec.h  | 31 +++
 28 files changed, 129 insertions(+), 46 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d5368a445561..6486722d4d4b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1072,8 +1072,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_from_iter(bvec->bv_page,
@@ -1103,8 +1104,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_to_iter(bvec->bv_page,
@@ -1126,8 +1128,9 @@ void bio_free_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all(bvec, bio, i, iter_all)
__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1293,6 +1296,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
struct bio *bio;
int ret;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
@@ -1366,7 +1370,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio;
 
  out_unmap:
-   bio_for_each_segment_all(bvec, bio, j) {
+   bio_for_each_segment_all(bvec, bio, j, iter_all) {
put_page(bvec->bv_page);
}
bio_put(bio);
@@ -1377,11 +1381,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
/*
 * make sure we dirty pages we wrote to
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
 
@@ -1473,8 +1478,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
@@ -1583,8 +1589,9 @@ void bio_set_pages_dirty(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+ 

[Cluster-devel] [PATCH V10 00/19] block: support multi-page bvec

2018-11-15 Thread Ming Lei
pers

- redefine BIO_MAX_PAGES as 256 to make the biggest bvec table
accommodated in 4K page

- move bio_alloc_pages() into bcache as suggested by Christoph

V3:
- rebase on v4.13-rc3 with for-next of block tree
- run more xfstests: xfs/ext4 over NVMe, SATA, DM(linear),
MD(raid1), with no regressions triggered
- add Reviewed-by on some btrfs patches
- remove two MD patches because both are merged to linus tree
  already

V2:
- bvec table direct access in raid has been cleaned, so NO_MP
flag is dropped
- rebase on recent Neil Brown's change on bio and bounce code
- reorganize the patchset

V1:
- against v4.10-rc1 and some cleanup in V0 are in -linus already
- handle queue_virt_boundary() in mp bvec change and make NVMe happy
- further BTRFS cleanup
- remove QUEUE_FLAG_SPLIT_MP
- rename for two new helpers of bio_for_each_segment_all()
- fix bounce conversion
- address comments in V0

[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], https://patchwork.kernel.org/patch/9451523/
[3], http://marc.info/?t=14773544711=1=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
[5], http://marc.info/?t=14956948457=1=2
[6], http://marc.info/?t=14982021534=1=2


Ming Lei (19):
  block: introduce multi-page page bvec helpers
  block: introduce bio_for_each_bvec()
  block: use bio_for_each_bvec() to compute multi-page bvec count
  block: use bio_for_each_bvec() to map sg
  block: introduce bvec_last_segment()
  fs/buffer.c: use bvec iterator to truncate the bio
  btrfs: use bvec_last_segment to get bio's last page
  btrfs: move bio_pages_all() to btrfs
  block: introduce bio_bvecs()
  block: loop: pass multi-page bvec to iov_iter
  bcache: avoid to use bio_for_each_segment_all() in
bch_bio_alloc_pages()
  block: allow bio_for_each_segment_all() to iterate over multi-page
bvec
  iomap & xfs: only account for new added page
  block: enable multipage bvecs
  block: always define BIO_MAX_PAGES as 256
  block: document usage of bio iterator helpers
  block: don't use bio->bi_vcnt to figure out segment number
  block: kill QUEUE_FLAG_NO_SG_MERGE
  block: kill BLK_MQ_F_SG_MERGE

 Documentation/block/biovecs.txt   |  26 +
 block/bio.c   |  51 +++---
 block/blk-merge.c | 199 +-
 block/blk-mq-debugfs.c|   2 -
 block/blk-mq.c|   3 -
 block/blk-zoned.c |   1 +
 block/bounce.c|   6 +-
 drivers/block/loop.c  |  25 ++---
 drivers/block/nbd.c   |   2 +-
 drivers/block/rbd.c   |   2 +-
 drivers/block/skd_main.c  |   1 -
 drivers/block/xen-blkfront.c  |   2 +-
 drivers/md/bcache/btree.c |   3 +-
 drivers/md/bcache/util.c  |   2 +-
 drivers/md/dm-crypt.c |   3 +-
 drivers/md/dm-rq.c|   2 +-
 drivers/md/dm-table.c |  13 ---
 drivers/md/raid1.c|   3 +-
 drivers/mmc/core/queue.c  |   3 +-
 drivers/scsi/scsi_lib.c   |   2 +-
 drivers/staging/erofs/data.c  |   3 +-
 drivers/staging/erofs/unzip_vle.c |   3 +-
 fs/block_dev.c|   6 +-
 fs/btrfs/compression.c|   8 +-
 fs/btrfs/disk-io.c|   3 +-
 fs/btrfs/extent_io.c  |  29 --
 fs/btrfs/inode.c  |   6 +-
 fs/btrfs/raid56.c |   3 +-
 fs/buffer.c   |   5 +-
 fs/crypto/bio.c   |   3 +-
 fs/direct-io.c|   4 +-
 fs/exofs/ore.c|   3 +-
 fs/exofs/ore_raid.c   |   3 +-
 fs/ext4/page-io.c |   3 +-
 fs/ext4/readpage.c|   3 +-
 fs/f2fs/data.c|   9 +-
 fs/gfs2/lops.c|   6 +-
 fs/gfs2/meta_io.c |   3 +-
 fs/iomap.c|  28 --
 fs/mpage.c|   3 +-
 fs/xfs/xfs_aops.c |  15 ++-
 include/linux/bio.h   |  94 ++
 include/linux/blk-mq.h|   1 -
 include/linux/blkdev.h|   1 -
 include/linux/bvec.h  | 155 +++--
 45 files changed, 556 insertions(+), 195 deletions(-)

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com


-- 
2.9.5



[Cluster-devel] [PATCH V10 06/19] fs/buffer.c: use bvec iterator to truncate the bio

2018-11-15 Thread Ming Lei
Once multi-page bvec is enabled, the last bvec may include more than one
page, so this patch uses bvec_last_segment() to truncate the bio.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 fs/buffer.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1286c2b95498..fa37ad52e962 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3032,7 +3032,10 @@ void guard_bio_eod(int op, struct bio *bio)
 
/* ..and clear the end of the buffer for reads */
if (op == REQ_OP_READ) {
-   zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+   struct bio_vec bv;
+
+   bvec_last_segment(bvec, &bv);
+   zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
truncated_bytes);
}
 }
-- 
2.9.5



[Cluster-devel] [PATCH V10 19/19] block: kill BLK_MQ_F_SG_MERGE

2018-11-15 Thread Ming Lei
QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/blk-mq-debugfs.c   | 1 -
 drivers/block/loop.c | 2 +-
 drivers/block/nbd.c  | 2 +-
 drivers/block/rbd.c  | 2 +-
 drivers/block/skd_main.c | 1 -
 drivers/block/xen-blkfront.c | 2 +-
 drivers/md/dm-rq.c   | 2 +-
 drivers/mmc/core/queue.c | 3 +--
 drivers/scsi/scsi_lib.c  | 2 +-
 include/linux/blk-mq.h   | 1 -
 10 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index e188b1090759..e1c12358391a 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -250,7 +250,6 @@ static const char *const alloc_policy_name[] = {
 static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(SHOULD_MERGE),
HCTX_FLAG_NAME(TAG_SHARED),
-   HCTX_FLAG_NAME(SG_MERGE),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a3fd418ec637..d509902a8046 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1907,7 +1907,7 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
-   lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
lo->tag_set.driver_data = lo;
 
err = blk_mq_alloc_tag_set(&lo->tag_set);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 08696f5f00bb..999c94de78e5 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1570,7 +1570,7 @@ static int nbd_dev_add(int index)
nbd->tag_set.numa_node = NUMA_NO_NODE;
nbd->tag_set.cmd_size = sizeof(struct nbd_cmd);
nbd->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
-   BLK_MQ_F_SG_MERGE | BLK_MQ_F_BLOCKING;
+   BLK_MQ_F_BLOCKING;
nbd->tag_set.driver_data = nbd;
 
err = blk_mq_alloc_tag_set(&nbd->tag_set);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8e5140bbf241..3dfd300b5283 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3988,7 +3988,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
rbd_dev->tag_set.ops = _mq_ops;
rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth;
rbd_dev->tag_set.numa_node = NUMA_NO_NODE;
-   rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
rbd_dev->tag_set.nr_hw_queues = 1;
rbd_dev->tag_set.cmd_size = sizeof(struct work_struct);
 
diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index a10d5736d8f7..a7040f9a1b1b 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -2843,7 +2843,6 @@ static int skd_cons_disk(struct skd_device *skdev)
skdev->sgs_per_request * sizeof(struct scatterlist);
skdev->tag_set.numa_node = NUMA_NO_NODE;
skdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
-   BLK_MQ_F_SG_MERGE |
BLK_ALLOC_POLICY_TO_MQ_FLAG(BLK_TAG_ALLOC_FIFO);
skdev->tag_set.driver_data = skdev;
rc = blk_mq_alloc_tag_set(&skdev->tag_set);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 56452cabce5b..297412bf23e1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -977,7 +977,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
} else
info->tag_set.queue_depth = BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
-   info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
info->tag_set.cmd_size = sizeof(struct blkif_req);
info->tag_set.driver_data = info;
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 7cd36e4d1310..140ada0b99fc 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -536,7 +536,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
md->tag_set->ops = _mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
-   md->tag_set->

[Cluster-devel] [PATCH V10 15/19] block: always define BIO_MAX_PAGES as 256

2018-11-15 Thread Ming Lei
Now multi-page bvec can cover CONFIG_THP_SWAP, so we don't need to
increase BIO_MAX_PAGES for it.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 include/linux/bio.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5040e9a2eb09..277921ad42e7 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -34,15 +34,7 @@
 #define BIO_BUG_ON
 #endif
 
-#ifdef CONFIG_THP_SWAP
-#if HPAGE_PMD_NR > 256
-#define BIO_MAX_PAGES  HPAGE_PMD_NR
-#else
 #define BIO_MAX_PAGES  256
-#endif
-#else
-#define BIO_MAX_PAGES  256
-#endif
 
 #define bio_prio(bio)  (bio)->bi_ioprio
 #define bio_set_prio(bio, prio)((bio)->bi_ioprio = prio)
-- 
2.9.5
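
The arithmetic behind patch 15 above, together with the cover letter's remark
that a 256-entry bvec table fits in a 4K page, can be checked with a short
user-space sketch. Everything here is an assumption for illustration: the
struct layout mimics a 64-bit build (a pointer-sized page field plus two
32-bit fields), the huge page is taken to be 2 MiB with 4 KiB base pages (512
base pages), and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE     4096u
#define MODEL_BIO_MAX_PAGES 256u
#define MODEL_HPAGE_PMD_NR  512u	/* assumption: 2 MiB / 4 KiB */

/* Hypothetical 16-byte stand-in for struct bio_vec on a 64-bit build. */
struct model_bvec {
	uint64_t page;		/* stands in for struct page * */
	uint32_t len;
	uint32_t offset;
};

/* Size in bytes of a bvec table with nr_vecs entries. */
static size_t model_table_bytes(unsigned nr_vecs)
{
	return (size_t)nr_vecs * sizeof(struct model_bvec);
}

/*
 * Table entries needed to describe nr_pages physically contiguous full
 * pages: one entry per page with single-page bvecs, a single entry once
 * a bvec may span several pages.
 */
static unsigned model_vecs_needed(unsigned nr_pages, int multipage)
{
	assert(nr_pages > 0);
	return multipage ? 1u : nr_pages;
}
```

Under these assumptions a 256-entry table occupies exactly one 4K page. A
huge page would need 512 single-page entries, which exceeds 256, but only one
multi-page entry, which is why the larger limit is no longer required.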



[Cluster-devel] [PATCH V10 02/19] block: introduce bio_for_each_bvec()

2018-11-15 Thread Ming Lei
This helper is used for iterating over multi-page bvec for bio
split & merge code.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 include/linux/bio.h  | 34 +++---
 include/linux/bvec.h | 36 
 2 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 056fb627edb3..1f0dcf109841 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -76,6 +76,9 @@
 #define bio_data_dir(bio) \
(op_is_write(bio_op(bio)) ? WRITE : READ)
 
+#define bio_iter_mp_iovec(bio, iter)   \
+   mp_bvec_iter_bvec((bio)->bi_io_vec, (iter))
+
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
@@ -135,18 +138,33 @@ static inline bool bio_full(struct bio *bio)
 #define bio_for_each_segment_all(bvl, bio, i)  \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
 
-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
-   unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes, bool mp)
 {
iter->bi_sector += bytes >> 9;
 
if (bio_no_advance_iter(bio))
iter->bi_size -= bytes;
else
-   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   if (!mp)
+   bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+   else
+   mp_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
/* TODO: It is reasonable to complete bio with error here. */
 }
 
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+   unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_mp_iter(struct bio *bio, struct bvec_iter *iter,
+  unsigned bytes)
+{
+   __bio_advance_iter(bio, iter, bytes, true);
+}
+
 #define __bio_for_each_segment(bvl, bio, iter, start)  \
for (iter = (start);\
 (iter).bi_size &&  \
@@ -156,6 +174,16 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
 #define bio_for_each_segment(bvl, bio, iter)   \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+#define __bio_for_each_bvec(bvl, bio, iter, start) \
+   for (iter = (start);\
+(iter).bi_size &&  \
+   ((bvl = bio_iter_mp_iovec((bio), (iter))), 1);  \
+bio_advance_mp_iter((bio), &(iter), (bvl).bv_len))
+
+/* returns one real segment(multipage bvec) each time */
+#define bio_for_each_bvec(bvl, bio, iter)  \
+   __bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter)
+
 #define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)
 
 static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 8ef904a50577..3d61352cd8cf 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -124,8 +124,16 @@ struct bvec_iter {
.bv_offset  = bvec_iter_offset((bvec), (iter)), \
 })
 
-static inline bool bvec_iter_advance(const struct bio_vec *bv,
-   struct bvec_iter *iter, unsigned bytes)
+#define mp_bvec_iter_bvec(bvec, iter)  \
+((struct bio_vec) {\
+   .bv_page= mp_bvec_iter_page((bvec), (iter)),\
+   .bv_len = mp_bvec_iter_len((bvec), (iter)), \
+   .bv_offset  = mp_bvec_iter_offset((bvec), (iter)),  \
+})
+
+static inline bool __bvec_iter_advance(const struct bio_vec *bv,
+  struct bvec_iter *iter,
+  unsigned bytes, bool mp)
 {
if (WARN_ONCE(bytes > iter->bi_size,
 "Attempted to advance past end of bvec iter\n")) {
@@ -134,8 +142,14 @@ static inline bool bvec_iter_advance(const struct bio_vec *bv,
}
 
while (bytes) {
-   unsigne

[Cluster-devel] [PATCH V10 05/19] block: introduce bvec_last_segment()

2018-11-15 Thread Ming Lei
BTRFS and guard_bio_eod() need to get the last singlepage segment
from one multipage bvec, so introduce this helper to make them happy.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 3d61352cd8cf..01616a0b6220 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -216,4 +216,29 @@ static inline bool mp_bvec_iter_advance(const struct bio_vec *bv,
.bi_bvec_done   = 0,\
 }
 
+/*
+ * Get the last singlepage segment from the multipage bvec and store it
+ * in @seg
+ */
+static inline void bvec_last_segment(const struct bio_vec *bvec,
+   struct bio_vec *seg)
+{
+   unsigned total = bvec->bv_offset + bvec->bv_len;
+   unsigned last_page = total / PAGE_SIZE;
+
+   if (last_page * PAGE_SIZE == total)
+   last_page--;
+
+   seg->bv_page = nth_page(bvec->bv_page, last_page);
+
+   /* the whole segment is inside the last page */
+   if (bvec->bv_offset >= last_page * PAGE_SIZE) {
+   seg->bv_offset = bvec->bv_offset % PAGE_SIZE;
+   seg->bv_len = bvec->bv_len;
+   } else {
+   seg->bv_offset = 0;
+   seg->bv_len = total - last_page * PAGE_SIZE;
+   }
+}
+
 #endif /* __LINUX_BVEC_ITER_H */
-- 
2.9.5
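
For readers who want to check the arithmetic in bvec_last_segment(), here
is a minimal userspace sketch. It is not kernel code: struct sketch_bvec,
SKETCH_PAGE_SIZE and sketch_last_segment() are hypothetical names, the page
is modelled as a plain index so that nth_page() becomes an addition, and a
4096-byte page and bv_len > 0 are assumed.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical stand-in for struct bio_vec: a page index replaces struct page *. */
struct sketch_bvec {
	size_t page;		/* index of the first page */
	unsigned len;		/* bytes covered by the bvec */
	unsigned offset;	/* byte offset from the start of the first page */
};

/* Same arithmetic as the patch, with nth_page() modelled as index addition. */
static void sketch_last_segment(const struct sketch_bvec *bvec,
				struct sketch_bvec *seg)
{
	unsigned total = bvec->offset + bvec->len;
	unsigned last_page = total / SKETCH_PAGE_SIZE;

	/* An end exactly on a page boundary belongs to the previous page. */
	if (last_page * SKETCH_PAGE_SIZE == total)
		last_page--;

	seg->page = bvec->page + last_page;

	if (bvec->offset >= last_page * SKETCH_PAGE_SIZE) {
		/* the whole bvec sits inside its last page */
		seg->offset = bvec->offset % SKETCH_PAGE_SIZE;
		seg->len = bvec->len;
	} else {
		/* the last page is entered at offset 0 */
		seg->offset = 0;
		seg->len = total - last_page * SKETCH_PAGE_SIZE;
	}
}
```

With a bvec that starts 512 bytes into page 7 and covers three pages' worth
of data, the last single-page segment comes out as page 10, offset 0,
length 512.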



[Cluster-devel] [PATCH V10 08/19] btrfs: move bio_pages_all() to btrfs

2018-11-15 Thread Ming Lei
BTRFS is the only user of this helper, so move this helper into
BTRFS, and implement it via bio_for_each_segment_all(), since
bio->bi_vcnt may not equal the number of pages after multipage bvec
is enabled.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 fs/btrfs/extent_io.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5d5965297e7e..874bb9aeebdc 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2348,6 +2348,18 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
return bio;
 }
 
+static unsigned btrfs_bio_pages_all(struct bio *bio)
+{
+   unsigned i;
+   struct bio_vec *bv;
+
+   WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+   bio_for_each_segment_all(bv, bio, i)
+   ;
+   return i;
+}
+
 /*
  * this is a generic handler for readpage errors (default
  * readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2368,7 +2380,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
-   unsigned failed_bio_pages = bio_pages_all(failed_bio);
+   unsigned failed_bio_pages = btrfs_bio_pages_all(failed_bio);
 
BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
-- 
2.9.5
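
To illustrate why bio->bi_vcnt stops being a page count once bvecs can span
several pages, here is a small userspace sketch. The names
(struct sketch_bvec, sketch_bvec_pages(), sketch_bio_pages_all()) are
hypothetical, a 4096-byte page is assumed, and the closed-form count stands
in for what a single-page walk of every bvec would report.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical stand-in for the length/offset part of struct bio_vec. */
struct sketch_bvec {
	unsigned len;		/* bytes covered by the bvec */
	unsigned offset;	/* byte offset from the start of the first page */
};

/* Number of pages touched by one multipage bvec (assumes len > 0). */
static unsigned sketch_bvec_pages(const struct sketch_bvec *bv)
{
	unsigned first = bv->offset / SKETCH_PAGE_SIZE;
	unsigned last = (bv->offset + bv->len - 1) / SKETCH_PAGE_SIZE;

	return last - first + 1;
}

/* Sum over all stored bvecs; this is what a page-by-page walk would count. */
static unsigned sketch_bio_pages_all(const struct sketch_bvec *vec, size_t vcnt)
{
	unsigned pages = 0;

	for (size_t i = 0; i < vcnt; i++)
		pages += sketch_bvec_pages(&vec[i]);
	return pages;
}
```

For three stored bvecs of (len 8192, offset 0), (len 4096, offset 512) and
(len 200, offset 100), the table holds 3 entries but 5 pages are touched.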



[Cluster-devel] [PATCH V10 04/19] block: use bio_for_each_bvec() to map sg

2018-11-15 Thread Ming Lei
It is more efficient to use bio_for_each_bvec() to map sg; meanwhile,
we have to consider splitting multipage bvecs, as is done in
blk_bio_segment_split().

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/blk-merge.c | 72 +++
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6f7deb94a23f..cb9f49bcfd36 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -473,6 +473,56 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
return biovec_phys_mergeable(q, _bv, _bv);
 }
 
+static struct scatterlist *blk_next_sg(struct scatterlist **sg,
+   struct scatterlist *sglist)
+{
+   if (!*sg)
+   return sglist;
+   else {
+   /*
+* If the driver previously mapped a shorter
+* list, we could see a termination bit
+* prematurely unless it fully inits the sg
+* table on each mapping. We KNOW that there
+* must be more entries here or the driver
+* would be buggy, so force clear the
+* termination bit to avoid doing a full
+* sg_init_table() in drivers for each command.
+*/
+   sg_unmark_end(*sg);
+   return sg_next(*sg);
+   }
+}
+
+static unsigned blk_bvec_map_sg(struct request_queue *q,
+   struct bio_vec *bvec, struct scatterlist *sglist,
+   struct scatterlist **sg)
+{
+   unsigned nbytes = bvec->bv_len;
+   unsigned nsegs = 0, total = 0;
+
+   while (nbytes > 0) {
+   unsigned seg_size;
+   struct page *pg;
+   unsigned offset, idx;
+
+   *sg = blk_next_sg(sg, sglist);
+
+   seg_size = min(nbytes, queue_max_segment_size(q));
+   offset = (total + bvec->bv_offset) % PAGE_SIZE;
+   idx = (total + bvec->bv_offset) / PAGE_SIZE;
+   pg = nth_page(bvec->bv_page, idx);
+
+   sg_set_page(*sg, pg, seg_size, offset);
+
+   total += seg_size;
+   nbytes -= seg_size;
+   nsegs++;
+   }
+
+   return nsegs;
+}
+
 static inline void
 __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
 struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -490,25 +540,7 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
(*sg)->length += nbytes;
} else {
 new_segment:
-   if (!*sg)
-   *sg = sglist;
-   else {
-   /*
-* If the driver previously mapped a shorter
-* list, we could see a termination bit
-* prematurely unless it fully inits the sg
-* table on each mapping. We KNOW that there
-* must be more entries here or the driver
-* would be buggy, so force clear the
-* termination bit to avoid doing a full
-* sg_init_table() in drivers for each command.
-*/
-   sg_unmark_end(*sg);
-   *sg = sg_next(*sg);
-   }
-
-   sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
-   (*nsegs)++;
+   (*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);
}
*bvprv = *bvec;
 }
@@ -530,7 +562,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
int cluster = blk_queue_cluster(q), nsegs = 0;
 
for_each_bio(bio)
-   bio_for_each_segment(bvec, bio, iter)
+   bio_for_each_bvec(bvec, bio, iter)
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
 &nsegs, &cluster);
 
-- 
2.9.5



[Cluster-devel] [PATCH V10 10/19] block: loop: pass multi-page bvec to iov_iter

2018-11-15 Thread Ming Lei
iov_iter is implemented with the bvec iterator, so it is safe to pass
multipage bvecs to it, and doing so is much more efficient than
passing one page in each bvec.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 drivers/block/loop.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf6bc35aaf88..a3fd418ec637 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -515,16 +515,16 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
struct bio *bio = rq->bio;
struct file *file = lo->lo_backing_file;
unsigned int offset;
-   int segments = 0;
+   int nr_bvec = 0;
int ret;
 
if (rq->bio != rq->biotail) {
-   struct req_iterator iter;
+   struct bvec_iter iter;
struct bio_vec tmp;
 
__rq_for_each_bio(bio, rq)
-   segments += bio_segments(bio);
-   bvec = kmalloc_array(segments, sizeof(struct bio_vec),
+   nr_bvec += bio_bvecs(bio);
+   bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
 GFP_NOIO);
if (!bvec)
return -EIO;
@@ -533,13 +533,14 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
/*
 * The bios of the request may be started from the middle of
 * the 'bvec' because of bio splitting, so we can't directly
-* copy bio->bi_iov_vec to new bvec. The rq_for_each_segment
+* copy bio->bi_iov_vec to new bvec. The bio_for_each_bvec
 * API will take care of all details for us.
 */
-   rq_for_each_segment(tmp, rq, iter) {
-   *bvec = tmp;
-   bvec++;
-   }
+   __rq_for_each_bio(bio, rq)
+   bio_for_each_bvec(tmp, bio, iter) {
+   *bvec = tmp;
+   bvec++;
+   }
bvec = cmd->bvec;
offset = 0;
} else {
@@ -550,11 +551,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 */
offset = bio->bi_iter.bi_bvec_done;
bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
-   segments = bio_segments(bio);
+   nr_bvec = bio_bvecs(bio);
}
atomic_set(&cmd->ref, 2);
 
-   iov_iter_bvec(&iter, rw, bvec, segments, blk_rq_bytes(rq));
+   iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
iter.iov_offset = offset;
 
cmd->iocb.ki_pos = pos;
-- 
2.9.5



[Cluster-devel] [PATCH V10 01/19] block: introduce multi-page page bvec helpers

2018-11-15 Thread Ming Lei
This patch introduces helpers of 'mp_bvec_iter_*' for multipage
bvec support.

The introduced helpers treat one bvec as a real multi-page segment,
which may include more than one page.

The existing bvec_iter_* helpers are the interfaces for the current bvec
iterator, which drivers, fs, dm, etc. treat as single-page. These helpers
now build single-page bvecs on the fly, so current bio/bvec users keep
working and need no changes.

Cc: Dave Chinner 
Cc: Kent Overstreet 
Cc: Mike Snitzer 
Cc: dm-de...@redhat.com
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: David Sterba 
Cc: linux-bt...@vger.kernel.org
Cc: Darrick J. Wong 
Cc: linux-...@vger.kernel.org
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 include/linux/bvec.h | 63 +---
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 02c73c6aa805..8ef904a50577 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -23,6 +23,44 @@
 #include 
 #include 
 #include 
+#include 
+
+/*
+ * What is multi-page bvecs?
+ *
+ * - bvecs stored in bio->bi_io_vec is always multi-page(mp) style
+ *
+ * - bvec(struct bio_vec) represents one physically contiguous I/O
+ *   buffer, now the buffer may include more than one pages after
+ *   multi-page(mp) bvec is supported, and all these pages represented
+ *   by one bvec is physically contiguous. Before mp support, at most
+ *   one page is included in one bvec, we call it single-page(sp)
+ *   bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp bvec
+ *
+ * - .bv_offset of the bvec represents offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystem/dm/bcache/...:
+ *
+ * - almost everyone supposes that one bvec only includes one single
+ *   page, so we keep the sp interface not changed, for example,
+ *   bio_for_each_segment() still returns bvec with single page
+ *
+ * - bio_for_each_segment*() will be changed to return single-page
+ *   bvec too
+ *
+ * - during iterating, iterator variable(struct bvec_iter) is always
+ *   updated in multipage bvec style and that means bvec_iter_advance()
+ *   is kept not changed
+ *
+ * - returned(copied) single-page bvec is built in flight by bvec
+ *   helpers from the stored multipage bvec
+ *
+ * - In case that some components(such as iov_iter) need to support
+ *   multi-page bvec, we introduce new helpers(mp_bvec_iter_*) for
+ *   them.
+ */
 
 /*
  * was unsigned short, but we might as well be ready for > 64kB I/O pages
@@ -50,16 +88,35 @@ struct bvec_iter {
  */
 #define __bvec_iter_bvec(bvec, iter)   (&(bvec)[(iter).bi_idx])
 
-#define bvec_iter_page(bvec, iter) \
+#define mp_bvec_iter_page(bvec, iter)  \
(__bvec_iter_bvec((bvec), (iter))->bv_page)
 
-#define bvec_iter_len(bvec, iter)  \
+#define mp_bvec_iter_len(bvec, iter)   \
min((iter).bi_size, \
__bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
 
-#define bvec_iter_offset(bvec, iter)   \
+#define mp_bvec_iter_offset(bvec, iter)\
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
 
+#define mp_bvec_iter_page_idx(bvec, iter)  \
+   (mp_bvec_iter_offset((bvec), (iter)) / PAGE_SIZE)
+
+/*
+ * <page, offset,length> of single-page(sp) segment.
+ *
+ * This helpers are for building sp bvec in flight.
+ */
+#define bvec_iter_offset(bvec, iter)   \
+   (mp_bvec_iter_offset((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len(bvec, iter)  \
+   min_t(unsigned, mp_bvec_iter_len((bvec), (iter)),   \
+   (PAGE_SIZE - (bvec_iter_offset((bvec), (iter)))))
+
+#define bvec_iter_page(bvec, iter) \
+   nth_page(mp_bvec_iter_page((bvec), (iter)), \
+mp_bvec_iter_page_idx((bvec), (iter)))
+
 #define bvec_iter_bvec(bvec, iter) \
 ((struct bio_vec) {\
.bv_page= bvec_iter_page((bvec), (iter)),   \
-- 
2.9.5
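
The relationship between the mp_bvec_iter_* helpers and the single-page
bvec_iter_* helpers built on top of them can be sketched in userspace as
follows. This is an illustration only: the sketch_* names are hypothetical,
functions replace the macros, the page is a plain index so nth_page()
becomes an addition, and a 4096-byte page is assumed. The three iterator
fields mirror bi_size, bi_idx and bi_bvec_done.

```c
#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct sketch_bvec { unsigned long page; unsigned len; unsigned offset; };
struct sketch_iter { unsigned size; unsigned idx; unsigned done; };

static unsigned sketch_min(unsigned a, unsigned b)
{
	return a < b ? a : b;
}

/* Multi-page view: the rest of the current stored bvec. */
static unsigned sketch_mp_len(const struct sketch_bvec *v, struct sketch_iter it)
{
	return sketch_min(it.size, v[it.idx].len - it.done);
}

static unsigned sketch_mp_offset(const struct sketch_bvec *v, struct sketch_iter it)
{
	return v[it.idx].offset + it.done;
}

/* Single-page view, derived from the multi-page view on the fly. */
static unsigned long sketch_sp_page(const struct sketch_bvec *v, struct sketch_iter it)
{
	return v[it.idx].page + sketch_mp_offset(v, it) / SKETCH_PAGE_SIZE;
}

static unsigned sketch_sp_offset(const struct sketch_bvec *v, struct sketch_iter it)
{
	return sketch_mp_offset(v, it) % SKETCH_PAGE_SIZE;
}

static unsigned sketch_sp_len(const struct sketch_bvec *v, struct sketch_iter it)
{
	/* never cross the end of the current page */
	return sketch_min(sketch_mp_len(v, it),
			  SKETCH_PAGE_SIZE - sketch_sp_offset(v, it));
}
```

For one stored bvec of 10000 bytes starting 512 bytes into page 100, the
multi-page view reports the full 10000 bytes, while the single-page view
first reports 3584 bytes on page 100 and, after advancing by that amount,
4096 bytes at offset 0 on page 101.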



[Cluster-devel] [PATCH V9 12/19] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-13 Thread Ming Lei
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
so that bio_for_each_segment_all() can iterate over multi-page bvecs.

Given that this is just one mechanical and simple change to all
bio_for_each_segment_all() users, this patch makes the tree-wide change in a
single patch, so that we can avoid using a temporary helper for this conversion.
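
One way to picture the role of the extra iterator variable is the userspace
model below. It is only a sketch under stated assumptions: struct
sketch_iter_all and its fields are hypothetical and are not taken from the
definition of struct bvec_iter_all in this series, pages are plain indices,
and a 4096-byte page is assumed. The point it shows is that the loop body
receives a pointer to a temporary single-page bvec kept in the iterator,
not a pointer into the stored table.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Hypothetical stand-in for struct bio_vec. */
struct sketch_bvec { size_t page; unsigned len; unsigned offset; };

/* Hypothetical model of the extra iterator state. */
struct sketch_iter_all {
	struct sketch_bvec bv;	/* temporary single-page bvec given to the body */
	size_t idx;		/* index of the current stored (multipage) bvec */
	unsigned done;		/* bytes already consumed in that stored bvec */
};

/* Return the next single-page segment, or NULL when all bvecs are consumed. */
static struct sketch_bvec *sketch_next_segment(const struct sketch_bvec *vec,
					       size_t vcnt,
					       struct sketch_iter_all *ia)
{
	/* move past stored bvecs that are fully consumed (or empty) */
	while (ia->idx < vcnt && ia->done >= vec[ia->idx].len) {
		ia->idx++;
		ia->done = 0;
	}
	if (ia->idx >= vcnt)
		return NULL;

	const struct sketch_bvec *cur = &vec[ia->idx];
	unsigned pos = cur->offset + ia->done;
	unsigned room = SKETCH_PAGE_SIZE - pos % SKETCH_PAGE_SIZE;
	unsigned left = cur->len - ia->done;

	ia->bv.page = cur->page + pos / SKETCH_PAGE_SIZE;
	ia->bv.offset = pos % SKETCH_PAGE_SIZE;
	ia->bv.len = left < room ? left : room;
	ia->done += ia->bv.len;
	return &ia->bv;
}

/* Count the single-page segments a full walk would visit. */
static unsigned sketch_count_segments(const struct sketch_bvec *vec, size_t vcnt)
{
	struct sketch_iter_all ia = { .idx = 0, .done = 0 };
	unsigned n = 0;

	while (sketch_next_segment(vec, vcnt, &ia))
		n++;
	return n;
}
```

Walking two stored bvecs, (page 0, len 8192, offset 0) and (page 10,
len 4096, offset 512), visits four single-page segments even though the
table has only two entries.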

Cc: linux-fsde...@vger.kernel.org
Cc: Alexander Viro 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: linux-bt...@vger.kernel.org
Cc: David Sterba 
Cc: Darrick J. Wong 
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/bio.c   | 27 ++-
 block/blk-zoned.c |  1 +
 block/bounce.c|  6 --
 drivers/md/bcache/btree.c |  3 ++-
 drivers/md/dm-crypt.c |  3 ++-
 drivers/md/raid1.c|  3 ++-
 drivers/staging/erofs/data.c  |  3 ++-
 drivers/staging/erofs/unzip_vle.c |  3 ++-
 fs/block_dev.c|  6 --
 fs/btrfs/compression.c|  3 ++-
 fs/btrfs/disk-io.c|  3 ++-
 fs/btrfs/extent_io.c  | 12 
 fs/btrfs/inode.c  |  6 --
 fs/btrfs/raid56.c |  3 ++-
 fs/crypto/bio.c   |  3 ++-
 fs/direct-io.c|  4 +++-
 fs/exofs/ore.c|  3 ++-
 fs/exofs/ore_raid.c   |  3 ++-
 fs/ext4/page-io.c |  3 ++-
 fs/ext4/readpage.c|  3 ++-
 fs/f2fs/data.c|  9 ++---
 fs/gfs2/lops.c|  6 --
 fs/gfs2/meta_io.c |  3 ++-
 fs/iomap.c|  6 --
 fs/mpage.c|  3 ++-
 fs/xfs/xfs_aops.c |  5 +++--
 include/linux/bio.h   | 11 +--
 include/linux/bvec.h  | 31 +++
 28 files changed, 129 insertions(+), 46 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d5368a445561..6486722d4d4b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1072,8 +1072,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_from_iter(bvec->bv_page,
@@ -1103,8 +1104,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_to_iter(bvec->bv_page,
@@ -1126,8 +1128,9 @@ void bio_free_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all(bvec, bio, i, iter_all)
__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1293,6 +1296,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
struct bio *bio;
int ret;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
@@ -1366,7 +1370,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio;
 
  out_unmap:
-   bio_for_each_segment_all(bvec, bio, j) {
+   bio_for_each_segment_all(bvec, bio, j, iter_all) {
put_page(bvec->bv_page);
}
bio_put(bio);
@@ -1377,11 +1381,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
/*
 * make sure we dirty pages we wrote to
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
 
@@ -1473,8 +1478,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
@@ -1583,8 +1589,9 @@ void bio_set_pages_dirty(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
- 

[Cluster-devel] [PATCH V8 12/18] block: allow bio_for_each_segment_all() to iterate over multi-page bvec

2018-11-09 Thread Ming Lei
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
so that bio_for_each_segment_all() can iterate over multi-page bvecs.

Given that this is just one mechanical and simple change to all
bio_for_each_segment_all() users, this patch makes the tree-wide change in a
single patch, so that we can avoid using a temporary helper for this conversion.

Cc: linux-fsde...@vger.kernel.org
Cc: Alexander Viro 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: linux-er...@lists.ozlabs.org
Cc: linux-bt...@vger.kernel.org
Cc: David Sterba 
Cc: Darrick J. Wong 
Cc: Gao Xiang 
Cc: Christoph Hellwig 
Cc: Theodore Ts'o 
Cc: linux-e...@vger.kernel.org
Cc: Coly Li 
Cc: linux-bca...@vger.kernel.org
Cc: Boaz Harrosh 
Cc: Bob Peterson 
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei 
---
 block/bio.c   | 27 ++-
 block/blk-zoned.c |  1 +
 block/bounce.c|  6 --
 drivers/md/bcache/btree.c |  3 ++-
 drivers/md/dm-crypt.c |  3 ++-
 drivers/md/raid1.c|  3 ++-
 drivers/staging/erofs/data.c  |  3 ++-
 drivers/staging/erofs/unzip_vle.c |  3 ++-
 fs/block_dev.c|  6 --
 fs/btrfs/compression.c|  3 ++-
 fs/btrfs/disk-io.c|  3 ++-
 fs/btrfs/extent_io.c  | 12 
 fs/btrfs/inode.c  |  6 --
 fs/btrfs/raid56.c |  3 ++-
 fs/crypto/bio.c   |  3 ++-
 fs/direct-io.c|  4 +++-
 fs/exofs/ore.c|  3 ++-
 fs/exofs/ore_raid.c   |  3 ++-
 fs/ext4/page-io.c |  3 ++-
 fs/ext4/readpage.c|  3 ++-
 fs/f2fs/data.c|  9 ++---
 fs/gfs2/lops.c|  6 --
 fs/gfs2/meta_io.c |  3 ++-
 fs/iomap.c|  6 --
 fs/mpage.c|  3 ++-
 fs/xfs/xfs_aops.c |  5 +++--
 include/linux/bio.h   | 11 +--
 include/linux/bvec.h  | 31 +++
 28 files changed, 129 insertions(+), 46 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d5368a445561..6486722d4d4b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1072,8 +1072,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_from_iter(bvec->bv_page,
@@ -1103,8 +1104,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 {
int i;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
ssize_t ret;
 
ret = copy_page_to_iter(bvec->bv_page,
@@ -1126,8 +1128,9 @@ void bio_free_pages(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i)
+   bio_for_each_segment_all(bvec, bio, i, iter_all)
__free_page(bvec->bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
@@ -1293,6 +1296,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
struct bio *bio;
int ret;
struct bio_vec *bvec;
+   struct bvec_iter_all iter_all;
 
if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
@@ -1366,7 +1370,7 @@ struct bio *bio_map_user_iov(struct request_queue *q,
return bio;
 
  out_unmap:
-   bio_for_each_segment_all(bvec, bio, j) {
+   bio_for_each_segment_all(bvec, bio, j, iter_all) {
put_page(bvec->bv_page);
}
bio_put(bio);
@@ -1377,11 +1381,12 @@ static void __bio_unmap_user(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
/*
 * make sure we dirty pages we wrote to
 */
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);
 
@@ -1473,8 +1478,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all(bvec, bio, i, iter_all) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
@@ -1583,8 +1589,9 @@ void bio_set_pages_dirty(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all iter_all;
 
- 

[Cluster-devel] [PATCH v3 42/49] gfs2: convert to bio_for_each_segment_all_sp()

2017-08-08 Thread Ming Lei
Cc: Steven Whitehouse <swhit...@redhat.com>
Cc: Bob Peterson <rpete...@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei <ming@redhat.com>
---
 fs/gfs2/lops.c| 3 ++-
 fs/gfs2/meta_io.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 3010f9edd177..d1fd8ed01b9e 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -206,11 +206,12 @@ static void gfs2_end_log_write(struct bio *bio)
struct bio_vec *bvec;
struct page *page;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
fs_err(sdp, "Error %d writing to log\n", bio->bi_status);
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
page = bvec->bv_page;
if (page_has_buffers(page))
gfs2_end_log_write_bh(sdp, bvec, bio->bi_status);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index fabe1614f879..6879b0103539 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -190,8 +190,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct buffer_head *bh = page_buffers(page);
unsigned int len = bvec->bv_len;
-- 
2.9.4



[Cluster-devel] [PATCH v2 44/51] gfs2: convert to bio_for_each_segment_all_sp()

2017-06-26 Thread Ming Lei
Cc: Steven Whitehouse <swhit...@redhat.com>
Cc: Bob Peterson <rpete...@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Ming Lei <ming@redhat.com>
---
 fs/gfs2/lops.c| 3 ++-
 fs/gfs2/meta_io.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index d62939f00d53..294f1926d9be 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -206,11 +206,12 @@ static void gfs2_end_log_write(struct bio *bio)
struct bio_vec *bvec;
struct page *page;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_status)
fs_err(sdp, "Error %d writing to log\n", bio->bi_status);
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
page = bvec->bv_page;
if (page_has_buffers(page))
gfs2_end_log_write_bh(sdp, bvec, bio->bi_status);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index fabe1614f879..6879b0103539 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -190,8 +190,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct buffer_head *bh = page_buffers(page);
unsigned int len = bvec->bv_len;
-- 
2.9.4



[Cluster-devel] [PATCH v1 00/54] block: support multipage bvec

2017-01-05 Thread Ming Lei
Hi,

This patchset brings multipage bvec into block layer. Basic
xfstests(-a auto) over virtio-blk/virtio-scsi have been run
and no regression is found, so it should be good enough
to show the approach now, and any comments are welcome!

1) what is multipage bvec?

Multipage bvec means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the
single page that the Linux kernel has used for a long time.

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As systems' RAM has become much bigger than before, and
huge pages, transparent huge pages and memory compaction
are now widely used, it is fairly common to see physically
contiguous pages coming from the fs in I/O.
On the other hand, from the block layer's view, it isn't
necessary to store intermediate pages in the bvec;
it is enough to store just the physically contiguous
'segment'.

Also, huge pages are being brought to filesystems[2]. To
do I/O one huge page at a time[3], one bio must be able to
transfer at least one huge page at once. It turns out that
simply changing BIO_MAX_PAGES isn't flexible[3]. Multipage
bvec fits this case very well.

With multipage bvec:

- bio size can be increased and it should improve some
high-bandwidth IO case in theory[4].

- Inside block layer, both bio splitting and sg map can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- there is a possibility in the future of reducing the memory
footprint of bvec usage.
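
As a rough illustration of the idea (not of the code in this patchset), the
userspace sketch below shows how physically contiguous pages can collapse
into a single bvec when they are added one by one. All names are
hypothetical, pages are identified by a page frame number, only whole pages
are added, and a 4096-byte page is assumed.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */
#define SKETCH_MAX_VECS  8	/* arbitrary table size for the sketch */

/* Hypothetical stand-ins: a page frame number replaces struct page *. */
struct sketch_bvec { unsigned long pfn; unsigned len; unsigned offset; };
struct sketch_bio  { struct sketch_bvec vec[SKETCH_MAX_VECS]; size_t vcnt; };

/* Add one whole page; returns 0 on success, -1 if the table is full. */
static int sketch_add_page(struct sketch_bio *bio, unsigned long pfn)
{
	if (bio->vcnt > 0) {
		struct sketch_bvec *last = &bio->vec[bio->vcnt - 1];
		unsigned long end = last->offset + last->len;

		/* physically contiguous with the last bvec: just grow it */
		if (end % SKETCH_PAGE_SIZE == 0 &&
		    last->pfn + end / SKETCH_PAGE_SIZE == pfn) {
			last->len += SKETCH_PAGE_SIZE;
			return 0;
		}
	}

	if (bio->vcnt == SKETCH_MAX_VECS)
		return -1;

	/* not contiguous: start a new bvec */
	bio->vec[bio->vcnt++] = (struct sketch_bvec){
		.pfn = pfn, .len = SKETCH_PAGE_SIZE, .offset = 0,
	};
	return 0;
}
```

Adding page frames 10, 11, 12, 20 and 21 in that order fills only two table
entries, one of 12288 bytes and one of 8192 bytes, where a single-page
table would need five.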

3) how is multipage bvec implemented in this patchset?

The first 9 patches add comments on some special cases. As we saw,
most cases turn out to be safe for multipage bvec;
only fs/buffer, MD and btrfs need to be dealt with. Both fs/buffer
and btrfs are handled in the following patches, based on some
new block APIs for multipage bvec.

Given that a little more work is needed to clean up MD, this patchset
introduces QUEUE_FLAG_NO_MP for it, so this component can still
see and use singlepage bvecs. In the future, once the cleanup is done,
the flag can be removed.

The 2nd part(23 ~ 54) implements multipage bvec in block:

- put all tricks into bvec/bio/rq iterators; as long as
drivers and fs use these standard iterators, they are happy
with multipage bvec

- bio_for_each_segment_all() changes
this helper passes a pointer to each bvec directly to the user, so
it has to be changed. Two new helpers (bio_for_each_segment_all_sp()
and bio_for_each_segment_all_mp()) are introduced.

Also convert current bio_for_each_segment_all() users to the
above two.

- bio_clone() changes
By default, bio_clone() still clones a new bio in the multipage bvec
way. A single-page version of bio_clone() is also introduced
for special cases where only single-page bvecs are used
for the newly cloned bio (bio bounce, ...)

- btrfs cleanup
just three patches for avoiding direct access to the bvec table.

These patches can be found in the following git tree:

https://github.com/ming1/linux/commits/mp-bvec-0.6-v4.10-rc

Thanks to Christoph for looking at the early version and providing
very good suggestions, such as introducing bio_init_with_vec_table(),
removing other unnecessary helpers for cleanup, and so on.

TODO:
- cleanup direct access to bvec table for MD

V1:
- against v4.10-rc1 and some cleanup in V0 are in -linus already
- handle queue_virt_boundary() in mp bvec change and make NVMe happy
- further BTRFS cleanup
- remove QUEUE_FLAG_SPLIT_MP
- rename for two new helpers of bio_for_each_segment_all()
- fix bounce conversion
- address comments in V0

[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], https://patchwork.kernel.org/patch/9451523/
[3], http://marc.info/?t=14773544711=1=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (54):
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: comment on bio_iov_iter_get_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
  block: comments on bio_for_each_segment[_all]
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  block: introduce bio_clone_sp()
  bvec_iter: introduce BVEC_ITER_ALL_INIT
  block: bounce: avoid direct access to bvec table
  block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
  block: introduce bio_can_convert_to_sp()
  block: bounce: convert multipage bvecs into singlepage
  bcache: handle bio_clone() & bvec u

[Cluster-devel] [PATCH v1 43/54] gfs2: convert to bio_for_each_segment_all_sp()

2017-01-05 Thread Ming Lei
Signed-off-by: Ming Lei <tom.leim...@gmail.com>
---
 fs/gfs2/lops.c| 3 ++-
 fs/gfs2/meta_io.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index b1f9144b42c7..ddbd1f772cdb 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -208,13 +208,14 @@ static void gfs2_end_log_write(struct bio *bio)
struct bio_vec *bvec;
struct page *page;
int i;
+   struct bvec_iter_all bia;
 
if (bio->bi_error) {
sdp->sd_log_error = bio->bi_error;
fs_err(sdp, "Error %d writing to log\n", bio->bi_error);
}
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
page = bvec->bv_page;
if (page_has_buffers(page))
gfs2_end_log_write_bh(sdp, bvec, bio->bi_error);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 49db8ef13fdf..317cc8ed74ce 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -190,8 +190,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
 {
struct bio_vec *bvec;
int i;
+   struct bvec_iter_all bia;
 
-   bio_for_each_segment_all(bvec, bio, i) {
+   bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct buffer_head *bh = page_buffers(page);
unsigned int len = bvec->bv_len;
-- 
2.7.4



Re: [Cluster-devel] [PATCH 00/60] block: support multipage bvec

2016-11-01 Thread Ming Lei
On Mon, Oct 31, 2016 at 11:25 PM, Christoph Hellwig <h...@infradead.org> wrote:
> Hi Ming,
>
> can you send a first patch just doing the obvious cleanups like
> converting to bio_add_page and replacing direct poking into the
> bio with the proper accessors?  That should help reducing the

OK, that is just the 1st part of the patchset.

> actual series to a sane size, and it should also help to cut
> down the Cc list.
>



Thanks,
Ming Lei



[Cluster-devel] [PATCH 00/60] block: support multipage bvec

2016-10-31 Thread Ming Lei
Hi,

This patchset brings multipage bvec into the block layer. Basic
xfstests (-a auto) runs over virtio-blk/virtio-scsi found no
regressions, so it should be good enough to show the approach
now. Any comments are welcome!

1) what is multipage bvec?

Multipage bvec means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the
single page the Linux kernel has used for a long time.
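
As a rough illustration of the definition above, the userspace C sketch
below models a bvec as a (page frame, length, offset) triple and describes
the same contiguous range either way. The sketch_* names, the page frame
number standing in for the kernel's struct page pointer, and the 4096-byte
page size are all assumptions made for illustration; none of this is code
from the patchset.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Userspace stand-in for the kernel's struct bio_vec. A page frame
 * number replaces the struct page pointer (an assumption of this sketch).
 */
struct sketch_bvec {
	unsigned long pfn;	/* first page frame of the range */
	unsigned int len;	/* length in bytes */
	unsigned int offset;	/* byte offset into the first page */
};

/*
 * Describe nr_pages physically contiguous pages starting at pfn.
 * With multipage set, a single vector covers the whole range; otherwise
 * one vector is written per page. 'out' must have room for nr_pages
 * entries. Returns the number of vectors written.
 */
static size_t sketch_describe(unsigned long pfn, unsigned int nr_pages,
			      int multipage, struct sketch_bvec *out)
{
	unsigned int i;

	assert(out != NULL);

	if (nr_pages == 0)
		return 0;

	if (multipage) {
		out[0].pfn = pfn;
		out[0].len = nr_pages * SKETCH_PAGE_SIZE;
		out[0].offset = 0;
		return 1;
	}

	for (i = 0; i < nr_pages; i++) {
		out[i].pfn = pfn + i;
		out[i].len = SKETCH_PAGE_SIZE;
		out[i].offset = 0;
	}
	return nr_pages;
}
```

In this model, four contiguous pages take four table entries in the
single-page form and one entry in the multipage form.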

2) why is multipage bvec introduced?

Kent proposed the idea[1] first. 

As system RAM becomes much bigger than before, and
at the same time huge pages, transparent huge pages and
memory compaction are widely used, it is now fairly easy
to see physically contiguous pages inside the fs/block stack.
On the other hand, from the block layer's view, it isn't
necessary to store the intermediate pages in the bvec;
it is enough to just store the physically contiguous
'segment'.
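
To make the 'segment' idea concrete, here is a minimal userspace C sketch
that coalesces a list of whole pages into physically contiguous segments.
The names (sketch_seg, sketch_coalesce), the use of page frame numbers and
the 4096-byte page size are assumptions for illustration only; the
patchset itself works on struct bio_vec inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* One physically contiguous run of whole pages (hypothetical type). */
struct sketch_seg {
	unsigned long pfn;	/* first page frame of the run */
	unsigned int len;	/* length of the run in bytes */
};

/*
 * Walk pfns[0..n) in order and merge each page into the previous
 * segment when it directly follows that segment's last page.
 * 'segs' must have room for n entries. Returns the segment count.
 */
static size_t sketch_coalesce(const unsigned long *pfns, size_t n,
			      struct sketch_seg *segs)
{
	size_t nsegs = 0;
	size_t i;

	assert(pfns != NULL && segs != NULL);

	for (i = 0; i < n; i++) {
		if (nsegs > 0) {
			struct sketch_seg *last = &segs[nsegs - 1];

			/* Next frame after the end of the last segment? */
			if (last->pfn + last->len / SKETCH_PAGE_SIZE == pfns[i]) {
				last->len += SKETCH_PAGE_SIZE;
				continue;
			}
		}
		segs[nsegs].pfn = pfns[i];
		segs[nsegs].len = SKETCH_PAGE_SIZE;
		nsegs++;
	}
	return nsegs;
}
```

Code that only needs the physical layout can then loop over the returned
segments rather than over every page.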

Also, huge pages are being brought to filesystems[2], so we
can do IO one huge page at a time[3], which requires that one bio
can transfer at least one huge page at a time. It turns out that
simply changing BIO_MAX_PAGES isn't flexible[3]. Multipage bvec
fits this case very well.

With multipage bvec:

- bio size can be increased, which should in theory improve some
high-bandwidth IO cases[4].

- Inside the block layer, both bio splitting and sg mapping can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- in the future it may be possible to reduce the memory footprint
of bvec usage.

3) how is multipage bvec implemented in this patchset?

The first 22 patches clean up direct access to the bvec table
and comment on some special cases. With this approach,
most cases turn out to be safe for multipage bvec;
only fs/buffer, pktcdvd, dm-io, MD and btrfs need to be dealt
with.

Given that a little more work is needed to clean up pktcdvd,
MD and btrfs, this patchset introduces QUEUE_FLAG_NO_MP for
them, so these components can still see/use singlepage bvec.
In the future, once the cleanup is done, the flag can be killed.

The 2nd part (patches 23 ~ 60) implements multipage bvec in the block layer:

- put all tricks into the bvec/bio/rq iterators; as long as
drivers and filesystems use these standard iterators, they work
with multipage bvec

- bio_for_each_segment_all() changes
This helper passes a pointer to each bvec directly to the user, so
it has to be changed. Two new helpers (bio_for_each_segment_all_rd()
and bio_for_each_segment_all_wt()) are introduced.

- bio_clone() changes
By default bio_clone() still clones the new bio in the multipage bvec
way. A single-page version of bio_clone() is also introduced
for special cases where only single-page bvecs may be used
in the cloned bio (bio bounce, ...)
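
The gfs2 conversions in this thread pass an extra 'struct bvec_iter_all bia'
to the new helpers. One plausible reading, offered here as an assumption
rather than a description of the actual implementation, is that a stored
multipage bvec no longer matches a single page, so a per-page bvec has to be
synthesized into caller-provided storage. The userspace C sketch below shows
that pattern; every sketch_* name, the page frame number standing in for
struct page, and the 4096-byte page size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Userspace stand-in for struct bio_vec (pfn replaces struct page *). */
struct sketch_bvec {
	unsigned long pfn;
	unsigned int len;
	unsigned int offset;
};

/* Hypothetical analogue of the caller-provided iterator state. */
struct sketch_iter_all {
	struct sketch_bvec bv;	/* synthesized single-page view */
	size_t idx;		/* index into the stored table */
	unsigned int done;	/* bytes already consumed of table[idx] */
};

static void sketch_iter_init(struct sketch_iter_all *it)
{
	it->idx = 0;
	it->done = 0;
}

/*
 * Return the next single-page piece of a table of (possibly multipage)
 * bvecs, or NULL when the table is exhausted. The returned pointer
 * refers to it->bv, not to the stored table entry.
 */
static const struct sketch_bvec *
sketch_iter_next(const struct sketch_bvec *table, size_t cnt,
		 struct sketch_iter_all *it)
{
	const struct sketch_bvec *mp;
	unsigned int pos, in_page, remain, room;

	assert(table != NULL && it != NULL);

	/* Skip entries that are fully consumed (or empty). */
	while (it->idx < cnt && it->done >= table[it->idx].len) {
		it->idx++;
		it->done = 0;
	}
	if (it->idx >= cnt)
		return NULL;

	mp = &table[it->idx];
	pos = mp->offset + it->done;		/* bytes from start of first page */
	in_page = pos % SKETCH_PAGE_SIZE;	/* offset inside the current page */
	remain = mp->len - it->done;		/* bytes left in this entry */
	room = SKETCH_PAGE_SIZE - in_page;	/* bytes left in the current page */

	it->bv.pfn = mp->pfn + pos / SKETCH_PAGE_SIZE;
	it->bv.offset = in_page;
	it->bv.len = remain < room ? remain : room;
	it->done += it->bv.len;
	return &it->bv;
}
```

A caller would run sketch_iter_init() once and then call sketch_iter_next()
until it returns NULL, seeing one page-sized piece per call.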

These patches can be found in the following git tree:

https://github.com/ming1/linux/tree/mp-bvec-0.3-v4.9

Thanks to Christoph for looking at the early version and providing
very good suggestions, such as introducing bio_init_with_vec_table(),
removing other unnecessary helpers for cleanup, and so on.

TODO:
- cleanup direct access to bvec table for MD & btrfs


[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], http://lwn.net/Articles/700781/
[3], http://marc.info/?t=14773544711=1=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2


Ming Lei (60):
  block: bio: introduce bio_init_with_vec_table()
  block drivers: convert to bio_init_with_vec_table()
  block: drbd: remove impossible failure handling
  block: floppy: use bio_add_page()
  target: avoid to access .bi_vcnt directly
  bcache: debug: avoid to access .bi_io_vec directly
  dm: crypt: use bio_add_page()
  dm: use bvec iterator helpers to implement .get_page and .next_page
  dm: dm.c: replace 'bio->bi_vcnt == 1' with !bio_multiple_segments
  fs: logfs: convert to bio_add_page() in sync_request()
  fs: logfs: use bio_add_page() in __bdev_writeseg()
  fs: logfs: use bio_add_page() in do_erase()
  fs: logfs: remove unnecesary check
  block: drbd: comment on direct access bvec table
  block: loop: comment on direct access to bvec table
  block: pktcdvd: comment on direct access to bvec table
  kernel/power/swap.c: comment on direct access to bvec table
  mm: page_io.c: comment on direct access to bvec table
  fs/buffer: comment on direct access to bvec table
  f2fs: f2fs_read_end_io: comment on direct access to bvec table
  bcache: comment on direct access to bvec table
  block: comment on bio_alloc_pages()
  block: introduce flag QUEUE_FLAG_NO_MP
  md: set NO_MP for request queue of md
  block: pktcdvd: set NO_MP for pktcdvd request queue
  btrfs: set NO_MP for request queues behind BTRFS
  block: introduce BIO_SP_MAX_SECTORS
  block: introduce QUEUE_FLAG_SPLIT_MP
  dm: limit the max bio size as BIO_SP_MAX_SECTORS << SECTOR_SHIFT
  bcache: set flag of QUEUE_FLAG_SPLIT_MP
  block: introduce multipage/single page bvec helpers
  block: implement sp version of bvec iterator helpers
  block: introduce bio_for_each_segment_mp()
  blo

[Cluster-devel] [PATCH 53/60] gfs2: convert to bio_for_each_segment_all_rd()

2016-10-31 Thread Ming Lei
Signed-off-by: Ming Lei <tom.leim...@gmail.com>
---
 fs/gfs2/lops.c    | 3 ++-
 fs/gfs2/meta_io.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 49d5a1b61b06..f03a52e06ce5 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -208,13 +208,14 @@ static void gfs2_end_log_write(struct bio *bio)
 	struct bio_vec *bvec;
 	struct page *page;
 	int i;
+	struct bvec_iter_all bia;
 
 	if (bio->bi_error) {
 		sdp->sd_log_error = bio->bi_error;
 		fs_err(sdp, "Error %d writing to log\n", bio->bi_error);
 	}
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		page = bvec->bv_page;
 		if (page_has_buffers(page))
 			gfs2_end_log_write_bh(sdp, bvec, bio->bi_error);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 373639a59782..3ab7a8609009 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -191,8 +191,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	int i;
+	struct bvec_iter_all bia;
 
-	bio_for_each_segment_all(bvec, bio, i) {
+	bio_for_each_segment_all_rd(bvec, bio, i, bia) {
 		struct page *page = bvec->bv_page;
 		struct buffer_head *bh = page_buffers(page);
 		unsigned int len = bvec->bv_len;
-- 
2.7.4


