Re: [Cluster-devel] [PATCH V15 00/18] block: support multi-page bvec

2019-02-15 Thread Jens Axboe
On 2/15/19 4:13 AM, Ming Lei wrote:
> Hi,
> 
> This patchset brings multi-page bvec into block layer:
> 
> 1) what is multi-page bvec?
> 
> Multi-page bvec means that one 'struct bio_vec' can hold multiple
> physically contiguous pages, instead of the single page that the Linux
> kernel has used for a long time.
> 
> 2) why is multi-page bvec introduced?
> 
> Kent proposed the idea[1] first. 
> 
> As system RAM grows much bigger than before, and huge pages, transparent
> huge pages and memory compaction are widely used, it is now fairly easy
> to see physically contiguous pages coming from the fs in I/O. On the other
> hand, from the block layer's view, it isn't necessary to store each
> intermediate page in a bvec; it is enough to store the physically
> contiguous 'segment' in each io vector.
> 
> Also, huge pages are being brought to filesystems and swap [2][6], so we
> can do IO on a whole hugepage each time[3], which requires that one bio can
> transfer at least one huge page at a time. It turns out that simply changing
> BIO_MAX_PAGES isn't flexible[3][5]. Multi-page bvec fits this case very well.
> As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured
> much bigger, such as 512, which requires at least two 4K pages for holding
> the bvec table.
> 
> With multi-page bvec:
> 
> - Inside the block layer, both bio splitting and sg mapping can become more
> efficient than before by just traversing the physically contiguous
> 'segment' instead of each page.
> 
> - segment handling in the block layer can be improved a lot in future, since
> it should be quite easy to convert a multi-page bvec into a segment. For
> example, we might just store the segment in each bvec directly in future.
> 
> - bio size can be increased, which should improve some high-bandwidth IO
> cases in theory[4].
> 
> - there is an opportunity in future to improve the memory footprint of bvecs.
> 
> 3) how is multi-page bvec implemented in this patchset?
> 
> Patches 1 ~ 3 prepare for supporting multi-page bvec.
> 
> Patches 4 ~ 14 implement multi-page bvec in block layer:
> 
>   - put all tricks into bvec/bio/rq iterators; as long as
>   drivers and fs use these standard iterators, they are happy
>   with multi-page bvec
> 
>   - introduce bio_for_each_bvec() to iterate over multi-page bvecs for
>   splitting bio and mapping sg
> 
>   - keep current bio_for_each_segment*() to iterate over single-page bvecs
>   and make sure current users won't be broken; especially, convert to this
>   new helper prototype in single patch 21 given it is basically a
>   mechanical conversion
> 
>   - deal with iomap & xfs's sub-pagesize io vec in patch 13
> 
>   - enable multi-page bvec in patch 14
> 
> Patch 15 redefines BIO_MAX_PAGES as 256.
> 
> Patch 16 documents usages of bio iterator helpers.
> 
> Patches 17 ~ 18 kill NO_SG_MERGE.
> 
> These patches can be found in the following git tree:
> 
>   git:  https://github.com/ming1/linux.git  v5.0-blk_mp_bvec_v14
                                                               ^^^

v15?

> Lots of tests (blktests, xfstests, ltp io, ...) have been run with this
> patchset, and no regressions were seen.
> 
> Thanks Christoph for reviewing the early version and providing very good
> suggestions, such as: introduce bio_init_with_vec_table(), remove other
> unnecessary helpers for cleanup and so on.
> 
> Thanks Christoph and Omar for reviewing V10/V11/V12 and providing lots of
> helpful comments.

Applied, thanks Ming. Let's hope it sticks!

-- 
Jens Axboe



Re: [Cluster-devel] [PATCH V15 00/18] block: support multi-page bvec

2019-02-15 Thread Christoph Hellwig
I still don't understand why mp_bvec_last_segment isn't simply
called bvec_last_segment as there is no conflict.  But I don't
want to hold this series up on that as there only are two users
left and we can always just fix it up later.



[Cluster-devel] [PATCH V15 00/18] block: support multi-page bvec

2019-02-15 Thread Ming Lei
Hi,

This patchset brings multi-page bvec into block layer:

1) what is multi-page bvec?

Multi-page bvec means that one 'struct bio_vec' can hold multiple
physically contiguous pages, instead of the single page that the Linux
kernel has used for a long time.
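
To make the idea concrete, here is a minimal user-space sketch, not the
kernel code from this series. The names model_bvec, model_add_page() and
MODEL_PAGE_SIZE are hypothetical, pages are identified by page frame
number, and only whole 4K pages are added. It shows how a page that is
physically adjacent to the previous one can extend the last bvec instead
of taking a new slot:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4K pages */

/* Hypothetical user-space stand-in for the kernel's struct bio_vec. */
struct model_bvec {
	unsigned long pfn;      /* page frame number of the first page */
	unsigned int len;       /* length in bytes; may span several pages */
	unsigned int offset;    /* byte offset into the first page */
};

/*
 * Add one full page to the table. If the page directly follows the end
 * of the last bvec in physical memory, grow that bvec; otherwise use a
 * new slot. Returns the number of slots in use afterwards.
 */
static size_t model_add_page(struct model_bvec *tbl, size_t used,
			     size_t cap, unsigned long pfn)
{
	if (used > 0) {
		struct model_bvec *last = &tbl[used - 1];
		unsigned long end = last->offset + last->len;

		if (end % MODEL_PAGE_SIZE == 0 &&
		    last->pfn + end / MODEL_PAGE_SIZE == pfn) {
			last->len += MODEL_PAGE_SIZE;
			return used;
		}
	}

	assert(used < cap);     /* caller must provide a free slot */
	tbl[used] = (struct model_bvec){
		.pfn = pfn, .len = MODEL_PAGE_SIZE, .offset = 0,
	};
	return used + 1;
}
```

In this model, adding pfns 100, 101, 102 and then 200 uses two slots: one
12K bvec starting at pfn 100 and one 4K bvec at pfn 200.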

2) why is multi-page bvec introduced?

Kent proposed the idea[1] first. 

As system RAM grows much bigger than before, and huge pages, transparent
huge pages and memory compaction are widely used, it is now fairly easy
to see physically contiguous pages coming from the fs in I/O. On the other
hand, from the block layer's view, it isn't necessary to store each
intermediate page in a bvec; it is enough to store the physically
contiguous 'segment' in each io vector.

Also, huge pages are being brought to filesystems and swap [2][6], so we
can do IO on a whole hugepage each time[3], which requires that one bio can
transfer at least one huge page at a time. It turns out that simply changing
BIO_MAX_PAGES isn't flexible[3][5]. Multi-page bvec fits this case very well.
As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured
much bigger, such as 512, which requires at least two 4K pages for holding
the bvec table.
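
The "two 4K pages" figure can be checked with a little arithmetic. The
sketch below is illustrative only: model_bvec_entry and bvec_table_pages()
are hypothetical names, and the entry layout (one pointer plus two 32-bit
fields, 16 bytes on a typical 64-bit build) is an assumption about the
target, not something taken from this series:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed entry layout: one pointer and two 32-bit fields. */
struct model_bvec_entry {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Number of pages needed to hold a table of nr_vecs entries. */
static size_t bvec_table_pages(size_t nr_vecs, size_t page_size)
{
	size_t bytes;

	assert(page_size > 0);
	bytes = nr_vecs * sizeof(struct model_bvec_entry);
	return (bytes + page_size - 1) / page_size;     /* round up */
}
```

With 16-byte entries, 512 entries take 8192 bytes, which is two 4K pages,
while 256 entries take 4096 bytes and fit in a single page.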

With multi-page bvec:

- Inside the block layer, both bio splitting and sg mapping can become more
efficient than before by just traversing the physically contiguous
'segment' instead of each page.

- segment handling in the block layer can be improved a lot in future, since
it should be quite easy to convert a multi-page bvec into a segment. For
example, we might just store the segment in each bvec directly in future.

- bio size can be increased, which should improve some high-bandwidth IO
cases in theory[4].

- there is an opportunity in future to improve the memory footprint of bvecs.

3) how is multi-page bvec implemented in this patchset?

Patches 1 ~ 3 prepare for supporting multi-page bvec.

Patches 4 ~ 14 implement multi-page bvec in block layer:

- put all tricks into bvec/bio/rq iterators; as long as
drivers and fs use these standard iterators, they are happy
with multi-page bvec

- introduce bio_for_each_bvec() to iterate over multi-page bvecs for
splitting bio and mapping sg

- keep current bio_for_each_segment*() to iterate over single-page bvecs
and make sure current users won't be broken; especially, convert to this
new helper prototype in single patch 21 given it is basically a
mechanical conversion

- deal with iomap & xfs's sub-pagesize io vec in patch 13

- enable multi-page bvec in patch 14
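
The difference between the two kinds of iteration listed above can be
illustrated with a small user-space model. This is only a sketch under
assumptions, not the iterator code from the patches: seg_bvec,
split_to_single_pages() and SEG_PAGE_SIZE are hypothetical names, and
pages are identified by page frame number. A bio_for_each_bvec() style
walk would see the multi-page bvec as one unit, while a
bio_for_each_segment() style walk sees the single-page pieces that the
function below produces:

```c
#include <assert.h>
#include <stddef.h>

#define SEG_PAGE_SIZE 4096u     /* assumption: 4K pages */

/* Hypothetical model of a bvec: first page, byte length, byte offset. */
struct seg_bvec {
	unsigned long pfn;
	unsigned int len;
	unsigned int offset;
};

/*
 * Split one multi-page bvec into single-page pieces. At most cap pieces
 * are written to out; the return value is the total number of pieces.
 */
static size_t split_to_single_pages(const struct seg_bvec *mp,
				    struct seg_bvec *out, size_t cap)
{
	unsigned long pfn;
	unsigned int off, left;
	size_t n = 0;

	assert(mp != NULL && (out != NULL || cap == 0));

	pfn = mp->pfn + mp->offset / SEG_PAGE_SIZE;
	off = mp->offset % SEG_PAGE_SIZE;
	left = mp->len;

	while (left > 0) {
		/* A piece never crosses a page boundary. */
		unsigned int chunk = SEG_PAGE_SIZE - off;

		if (chunk > left)
			chunk = left;
		if (n < cap)
			out[n] = (struct seg_bvec){
				.pfn = pfn, .len = chunk, .offset = off,
			};
		n++;
		left -= chunk;
		pfn++;
		off = 0;
	}
	return n;
}
```

For example, a bvec starting at pfn 10 with offset 512 and length 8192
splits into three pieces of 3584, 4096 and 512 bytes on pfns 10, 11 and 12.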

Patch 15 redefines BIO_MAX_PAGES as 256.

Patch 16 documents usages of bio iterator helpers.

Patches 17 ~ 18 kill NO_SG_MERGE.

These patches can be found in the following git tree:

git:  https://github.com/ming1/linux.git  v5.0-blk_mp_bvec_v14

Lots of tests (blktests, xfstests, ltp io, ...) have been run with this
patchset, and no regressions were seen.

Thanks Christoph for reviewing the early version and providing very good
suggestions, such as: introduce bio_init_with_vec_table(), remove other
unnecessary helpers for cleanup and so on.

Thanks Christoph and Omar for reviewing V10/V11/V12 and providing lots of
helpful comments.

V15:
- rename bio_for_each_mp_bvec/rq_for_each_mp_bvec as
  bio_for_each_bvec/rq_for_each_bvec, as suggested by Christoph,
  so the mp_bvec name is only used by bvec helpers

V14:
- drop patch(patch 4 in V13) for renaming bvec helpers, as suggested by
  Jens
- use mp_bvec_* as multi-page bvec helper name
- fix one build issue, which is caused by a missing conversion of
  bio_for_each_segment_all in fs/gfs2
- fix one 32bit ARCH specific issue caused by segment boundary mask
  overflow

V13:
- rebase on v5.0-rc2
- address Omar's comment on patch 1 of V12 by using V11's approach
- rename one local variable in patch 15 as suggested by Christoph

V12:
- deal with non-cluster by max segment size & segment boundary limit
- rename bvec helper's name
- revert new change on bvec_iter_advance() in V11
- introduce rq_for_each_bvec()
- use simpler check on enabling multi-page bvec
- fix Document change

V11:
- address most of the review comments from Omar and Christoph
- rename mp_bvec_* as segment_* helpers
- remove 'mp' parameter from bvec_iter_advance() and related helpers
- cleanup patch on bvec_split_segs() and blk_bio_segment_split(),
  remove unnecessary checks
- simplify bvec_last_segment()
- drop bio_pages_all()
- introduce dedicated functions/file for handling non-cluster bio, to
  avoid checking queue cluster before adding page to bio
- introduce bio_try_merge_segment() for simplifying iomap/xfs page
  accounting code
- Fix Document change

V10:
- no code change, just add more people and lists to the patches' CC list,