Re: [PATCH v7 1/3] bio: limit bio max size

2021-04-19 Thread Changheun Lee
> On 4/18/21 10:49 PM, Changheun Lee wrote:
> >>> @@ -167,6 +168,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
> >>>   max_sectors = round_down(max_sectors,
> >>>limits->logical_block_size >> SECTOR_SHIFT);
> >>>   limits->max_sectors = max_sectors;
> >>> + limits->bio_max_bytes = max_sectors << SECTOR_SHIFT;
> >>>  
> >>>   q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
> >>>  }
> >>
> >> Can the new shift operation overflow? If so, how about using
> >> check_shl_overflow()?
> > 
> > Actually, overflow should not happen in the case of a physical device.
> > But I modified it as below. Please give feedback on this.
> > 
> > @@ -168,6 +169,9 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
> >  limits->logical_block_size >> SECTOR_SHIFT);
> > limits->max_sectors = max_sectors;
> >  
> > +   limits->bio_max_bytes = check_shl_overflow(max_sectors, SECTOR_SHIFT,
> > +   &limits->bio_max_bytes) ? UINT_MAX : max_sectors << SECTOR_SHIFT;
> > +
> > q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
> >  }
> >  EXPORT_SYMBOL(blk_queue_max_hw_sectors);
> 
> If no overflow occurs, check_shl_overflow() stores the result in the
> memory location the third argument points at. So the above expression
> can be simplified into the following:
> 
> if (check_shl_overflow(max_sectors, SECTOR_SHIFT, &limits->bio_max_bytes)) {
>   limits->bio_max_bytes = UINT_MAX;
> }

OK. No problem. It might be more readable.
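To make the result concrete, the assignment in blk_queue_max_hw_sectors()
would then look roughly like this (a sketch against the v7 patch context,
where limits points at &q->limits):

        limits->max_sectors = max_sectors;

        /* check_shl_overflow() writes max_sectors << SECTOR_SHIFT into
         * limits->bio_max_bytes and returns true only if the shift overflowed */
        if (check_shl_overflow(max_sectors, SECTOR_SHIFT, &limits->bio_max_bytes))
                limits->bio_max_bytes = UINT_MAX;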

> 
> >>> diff --git a/include/linux/bio.h b/include/linux/bio.h
> >>> index d0246c92a6e8..e5add63da3af 100644
> >>> --- a/include/linux/bio.h
> >>> +++ b/include/linux/bio.h
> >>> @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
> >>>   return NULL;
> >>>  }
> >>>  
> >>> +extern unsigned int bio_max_size(struct bio *bio);
> >>
> >> You may want to define bio_max_size() as an inline function in bio.h
> >> such that no additional function calls are introduced in the hot path.
> > 
> > I tried, but it is not easy, because the request_queue structure from
> > blkdev.h would have to be referenced in bio.h. I don't think it is a
> > good fit for an inline function.
> 
> Please don't worry about this. Inlining bio_max_size() is not a big
> concern to me.
> 
> Thanks,
> 
> Bart.

I think removing the UINT_MAX setting in blk_stack_limits() might be good,
because the default queue limits for a stacking device are set in
blk_set_stacking_limits(), and blk_set_default_limits() is called
automatically from there. So setting UINT_MAX in blk_stack_limits() is not
needed. Worse, the setting in blk_stack_limits() could overwrite
bio_max_bytes with the default after a stacking driver has already set a
proper bio_max_bytes value. See the sketch below.
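For reference, the flow this relies on looks roughly like the sketch below
(block/blk-settings.c; bodies abbreviated, and the bio_max_bytes comment
assumes the v7 change to blk_set_default_limits()):

        void blk_set_stacking_limits(struct queue_limits *lim)
        {
                blk_set_default_limits(lim);    /* bio_max_bytes = UINT_MAX here */

                /* Inherit limits from component devices */
                lim->max_segments = USHRT_MAX;
                /* ... */
        }

So a stacking device already starts out with bio_max_bytes == UINT_MAX, and
blk_stack_limits() does not need to reset it.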


Thanks,

Changheun Lee.


Re: [PATCH v7 1/3] bio: limit bio max size

2021-04-19 Thread Changheun Lee
> > @@ -167,6 +168,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
> > max_sectors = round_down(max_sectors,
> >  limits->logical_block_size >> SECTOR_SHIFT);
> > limits->max_sectors = max_sectors;
> > +   limits->bio_max_bytes = max_sectors << SECTOR_SHIFT;
> >  
> > q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
> >  }
> 
> Can the new shift operation overflow? If so, how about using
> check_shl_overflow()?

Actually, overflow should not happen in the case of a physical device.
But I modified it as below. Please give feedback on this.

@@ -168,6 +169,9 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
 limits->logical_block_size >> SECTOR_SHIFT);
limits->max_sectors = max_sectors;
 
+   limits->bio_max_bytes = check_shl_overflow(max_sectors, SECTOR_SHIFT,
+   &limits->bio_max_bytes) ? UINT_MAX : max_sectors << SECTOR_SHIFT;
+
q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(blk_queue_max_hw_sectors);

> 
> >>> @@ -538,6 +540,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >>>  {
> >>>   unsigned int top, bottom, alignment, ret = 0;
> >>>  
> >>> + t->bio_max_bytes = min_not_zero(t->bio_max_bytes, b->bio_max_bytes);
> >>> +
> >>>   t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
> >>>   t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
> >>>   t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
> >>
> >> The above will limit bio_max_bytes for all stacked block devices, which
> >> is something we do not want. I propose to set t->bio_max_bytes to
> >> UINT_MAX in blk_stack_limits() and to let the stacked driver (e.g.
> >> dm-crypt) decide whether or not to lower that value.
> > 
> > Actually, bio size should be limited in dm-crypt too, because almost all
> > I/O from user space goes to dm-crypt first. I/O issue timing will be
> > delayed if the bio size is not limited in dm-crypt.
> > Do you have any idea how to decide whether to take the lower bio max size
> > in the stacked driver?
> > Add a flag to decide this in the driver layer, like before?
> > Or insert code manually in each stacked driver where it is needed?
> 
> There will be fewer stacked drivers for which the bio size has to be
> limited than drivers for which it does not have to be limited. Hence the
> proposal to set t->bio_max_bytes to UINT_MAX in blk_stack_limits() and
> to let the stacked driver (e.g. dm-crypt) decide whether or not to lower
> that value.

I see your point. I'll set t->bio_max_bytes to UINT_MAX in
blk_stack_limits() as you suggested. A stacked driver could then lower it
again, as in the sketch below.
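A hypothetical sketch of how a stacked driver could then opt back in - for
example dm-crypt lowering the limit from its existing crypt_io_hints()
callback (the 1MB value is illustrative, not an existing dm-crypt policy):

        static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
        {
                /* ... existing limit adjustments ... */

                /* cap merged bio size so the first bio is submitted early */
                limits->bio_max_bytes = min_not_zero(limits->bio_max_bytes, 1U << 20);
        }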

> 
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index d0246c92a6e8..e5add63da3af 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
> > return NULL;
> >  }
> >  
> > +extern unsigned int bio_max_size(struct bio *bio);
> 
> You may want to define bio_max_size() as an inline function in bio.h
> such that no additional function calls are introduced in the hot path.

I tried, but it is not easy, because the request_queue structure from
blkdev.h would have to be referenced in bio.h. I don't think it is a good
fit for an inline function. For reference, see the sketch below.
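The inline variant would have to look like this hypothetical sketch, which
needs the full struct request_queue and struct gendisk definitions from
blkdev.h - and blkdev.h itself includes bio.h, so this creates a circular
header dependency:

        /* hypothetical inline variant in include/linux/bio.h */
        static inline unsigned int bio_max_size(struct bio *bio)
        {
                return bio->bi_bdev->bd_disk->queue->limits.bio_max_bytes;
        }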


Thanks,

Changheun Lee.


Re: [PATCH v7 1/3] bio: limit bio max size

2021-04-16 Thread Changheun Lee
> On 4/15/21 3:38 AM, Changheun Lee wrote:
> > @@ -167,6 +168,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
> > max_sectors = round_down(max_sectors,
> >  limits->logical_block_size >> SECTOR_SHIFT);
> > limits->max_sectors = max_sectors;
> > +   limits->bio_max_bytes = max_sectors << SECTOR_SHIFT;
> >  
> > q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
> >  }
> 
> Can the new shift operation overflow? If so, how about using
> check_shl_overflow()?

OK, I'll check.

> 
> > @@ -538,6 +540,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  {
> > unsigned int top, bottom, alignment, ret = 0;
> >  
> > +   t->bio_max_bytes = min_not_zero(t->bio_max_bytes, b->bio_max_bytes);
> > +
> > t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
> > t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
> > t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
> 
> The above will limit bio_max_bytes for all stacked block devices, which
> is something we do not want. I propose to set t->bio_max_bytes to
> UINT_MAX in blk_stack_limits() and to let the stacked driver (e.g.
> dm-crypt) decide whether or not to lower that value.
> 

Actually, bio size should be limited in dm-crypt too, because almost all I/O
from user space goes to dm-crypt first. I/O issue timing will be delayed
if the bio size is not limited in dm-crypt.
Do you have any idea how to decide whether to take the lower bio max size
in the stacked driver?
Add a flag to decide this in the driver layer, like before?
Or insert code manually in each stacked driver where it is needed?

> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index d0246c92a6e8..e5add63da3af 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
> > return NULL;
> >  }
> >  
> > +extern unsigned int bio_max_size(struct bio *bio);
> 
> You may want to define bio_max_size() as an inline function in bio.h
> such that no additional function calls are introduced in the hot path.

Thanks, I'll try.

> 
> Thanks,
> 
> Bart.


Re: [PATCH v7 1/3] bio: limit bio max size

2021-04-15 Thread Changheun Lee
> On 4/12/21 7:55 PM, Changheun Lee wrote:
> > +unsigned int bio_max_size(struct bio *bio)
> > +{
> > +   struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> > +
> > +   if (blk_queue_limit_bio_size(q))
> > +   return blk_queue_get_max_sectors(q, bio_op(bio))
> > +   << SECTOR_SHIFT;
> > +
> > +   return UINT_MAX;
> > +}
> 
> This patch adds an if-statement to the hot path and that may have a
> slight negative performance impact. I recommend to follow the approach
> of max_hw_sectors. That means removing QUEUE_FLAG_LIMIT_BIO_SIZE and to
> initialize the maximum bio size to UINT_MAX in blk_set_default_limits().
> 
> Thanks,
> 
> Bart.

I modified it following Bart's approach. Thanks for your advice.
It is simpler than before, and I think it looks good.
Please review below; I'll prepare the new version based on this.


diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..9e5061ecc317 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -255,6 +255,13 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+
+   return q->limits.bio_max_bytes;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -866,7 +873,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index b4aa2f37fab6..b167e8db856b 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -37,6 +37,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
  */
 void blk_set_default_limits(struct queue_limits *lim)
 {
+   lim->bio_max_bytes = UINT_MAX;
lim->max_segments = BLK_MAX_SEGMENTS;
lim->max_discard_segments = 1;
lim->max_integrity_segments = 0;
@@ -167,6 +168,7 @@ void blk_queue_max_hw_sectors(struct request_queue *q, unsigned int max_hw_secto
max_sectors = round_down(max_sectors,
 limits->logical_block_size >> SECTOR_SHIFT);
limits->max_sectors = max_sectors;
+   limits->bio_max_bytes = max_sectors << SECTOR_SHIFT;
 
q->backing_dev_info->io_pages = max_sectors >> (PAGE_SHIFT - 9);
 }
@@ -538,6 +540,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 {
unsigned int top, bottom, alignment, ret = 0;
 
+   t->bio_max_bytes = min_not_zero(t->bio_max_bytes, b->bio_max_bytes);
+
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d0246c92a6e8..e5add63da3af 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -106,6 +106,8 @@ static inline void *bio_data(struct bio *bio)
return NULL;
 }
 
+extern unsigned int bio_max_size(struct bio *bio);
+
 /**
  * bio_full - check if the bio is full
  * @bio:   bio to check
@@ -119,7 +121,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 158aefae1030..c205d60ac611 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -312,6 +312,8 @@ enum blk_zoned_model {
 };
 
 struct queue_limits {
+   unsigned intbio_max_bytes;
+
unsigned long   bounce_pfn;
unsigned long   seg_boundary_mask;
unsigned long   virt_boundary_mask;


Re: [PATCH v7 2/3] ufs: set QUEUE_FLAG_LIMIT_BIO_SIZE

2021-04-13 Thread Changheun Lee
> On 4/12/21 7:55 PM, Changheun Lee wrote:
> > Set the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag to limit the bio max size
> > to the queue max sectors size for UFS devices.
> > 
> > Signed-off-by: Changheun Lee 
> > ---
> >  drivers/scsi/scsi_lib.c   | 2 ++
> >  drivers/scsi/ufs/ufshcd.c | 1 +
> >  include/scsi/scsi_host.h  | 2 ++
> >  3 files changed, 5 insertions(+)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 7d52a11e1b61..73ce6ba7903a 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1838,6 +1838,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> >  * Devices that require a bigger alignment can increase it later.
> >  */
> > blk_queue_dma_alignment(q, max(4, dma_get_cache_alignment()) - 1);
> > +
> > +   blk_queue_set_limit_bio_size(q, shost->limit_bio_size);
> >  }
> >  EXPORT_SYMBOL_GPL(__scsi_init_queue);
> >  
> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > index d3d05e997c13..000eb5ab022e 100644
> > --- a/drivers/scsi/ufs/ufshcd.c
> > +++ b/drivers/scsi/ufs/ufshcd.c
> > @@ -9313,6 +9313,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
> > host->max_channel = UFSHCD_MAX_CHANNEL;
> > host->unique_id = host->host_no;
> > host->max_cmd_len = UFS_CDB_SIZE;
> > +   host->limit_bio_size = true;
> >  
> > hba->max_pwr_info.is_valid = false;
> >  
> > diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> > index e30fd963b97d..486f61588717 100644
> > --- a/include/scsi/scsi_host.h
> > +++ b/include/scsi/scsi_host.h
> > @@ -607,6 +607,8 @@ struct Scsi_Host {
> > unsigned int max_segment_size;
> > unsigned long dma_boundary;
> > unsigned long virt_boundary_mask;
> > +   unsigned int limit_bio_size;
> > +
> > /*
> >  * In scsi-mq mode, the number of hardware queues supported by the LLD.
> >  *
> 
> This patch should have been split into one patch for the SCSI core and
> another patch for the UFS driver.
> 
> I see an issue with this patch: a new attribute has been introduced in
> struct Scsi_Host but it is not used by the SCSI core. That seems weird
> to me.
> 
> Another comment I have about this patch is why to restrict the BIO size
> (blk_queue_set_limit_bio_size(q, shost->limit_bio_size)) only for UFS
> devices? Aren't all block devices, including NVMe devices, expected to
> benefit from limiting the BIO size if no stacking is involved? How about
> the following approach?
> * In blk_queue_max_hw_sectors(), set the maximum BIO size to
> max_hw_sectors. As you may know blk_queue_max_hw_sectors() may be
> called by any block driver (stacking and non-stacking).
> * In blk_stack_limits(), set the maximum BIO size to UINT_MAX. Stacking
> block drivers must call blk_stack_limits().

I think it would be good for all block devices too, but it might not be in
some environments - I got feedback about that, so I restricted it to UFS
only. I see your point, though. I'll try your approach; roughly, it is the
sketch below.
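In other words, roughly (a sketch of the proposed two call sites):

        /* blk_queue_max_hw_sectors(): called by stacking and non-stacking drivers */
        limits->bio_max_bytes = max_sectors << SECTOR_SHIFT;

        /* blk_stack_limits(): only stacking drivers call this, so lift the
         * limit again and let e.g. dm decide whether to lower it */
        t->bio_max_bytes = UINT_MAX;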

> 
> Thanks,
> 
> Bart.


Thanks,

Changheun Lee.


Re: [PATCH v7 3/3] bio: add limit_bio_size sysfs

2021-04-13 Thread Changheun Lee
> On Tue, Apr 13, 2021 at 11:55:02AM +0900, Changheun Lee wrote:
> > Add a limit_bio_size block sysfs node to limit the bio size.
> > The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size
> > is set, and the bio max size will then be limited to the queue max
> > sectors.
> > 
> > Signed-off-by: Changheun Lee 
> > ---
> >  Documentation/ABI/testing/sysfs-block | 10 ++
> >  Documentation/block/queue-sysfs.rst   |  7 +++
> >  block/blk-sysfs.c |  3 +++
> >  3 files changed, 20 insertions(+)
> 
> Isn't it too late to change the sysfs entry after the device has been
> probed and initialized by the kernel as the kernel does not look at this
> value after that?

The kernel looks at this value whenever a page is merged into a bio, so the
bio size is limited dynamically (see the snippet below).
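Concretely, with the v7 patch the check sits on the page-merge path:

        /* in bio_full() and __bio_try_merge_page() */
        if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
                return true;    /* treat the bio as full: stop merging pages */

so a value changed via sysfs takes effect for bios built after the write.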

> 
> Why do you need a userspace knob for this?  What tool is going to ever
> change this, and what logic is it going to use to change it?  Why can't
> the kernel also just "do the right thing" and properly detect this
> option as well as userspace can?

Actually, I didn't consider a userspace knob at first, and there is no tool
or logic that changes it. It is a kind of policy about the bio max size;
a one-time setting at system boot time is enough in practice.

At first I suggested a 1MB bio max size, which matches the bio max size
before multipage bvec was applied. But there were requests for a bigger bio
size in other system environments, and feedback asking for a knob for
utility too. So I made this optional per system, with a knob.
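For example, an init script could write 1 to
/sys/block/<disk>/queue/limit_bio_size once at boot (device name
illustrative) and never change it afterwards.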

> 
> thanks,
> 
> greg k-h


Thanks,

Changheun Lee


Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-12 Thread Changheun Lee
> On Sun, Apr 11, 2021 at 10:13:01PM +, Damien Le Moal wrote:
> > On 2021/04/09 23:47, Bart Van Assche wrote:
> > > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> > >> On 2021/04/07 18:46, Changheun Lee wrote:
> > >>> I'll prepare new patch as you recommand. It will be added setting of
> > >>> limit_bio_size automatically when queue max sectors is determined.
> > >>
> > >> Please do that in the driver for the HW that benefits from it. Do not do 
> > >> this
> > >> for all block devices.
> > > 
> > > Hmm ... is it ever useful to build a bio with a size that exceeds 
> > > max_hw_sectors when submitting a bio directly to a block device, or in 
> > > other words, if no stacked block driver sits between the submitter and 
> > > the block device? Am I perhaps missing something?
> > 
> > Device performance wise, the benefits are certainly not obvious to me 
> > either.
> > But for very fast block devices, I think the CPU overhead of building many
> > smaller BIOs may be significant compared to splitting a large BIO into
> > multiple requests. Though it may be good to revisit this with some
> > benchmark numbers.
> 
> This patch tries to address issue[1] in do_direct_IO(), in which
> Changheun observed that other operations take time between adding pages
> to the bio.
> 
> However, do_direct_IO() does only the following, apart from building and
> submitting the bio:
> 
> - retrieves pages in batches (pinning 64 pages at a time from the VM), and
> 
> - retrieves the block mapping (get_more_blocks()), which is usually done
> very few times for 32MB; for a new mapping, clean_bdev_aliases() may
> take a bit of time.
> 
> If there is no system memory pressure, pinning 64 pages won't be slow, but
> get_more_blocks() may take a bit of time.
> 
> Changheun, can you check whether get_more_blocks() is called multiple
> times when submitting 32MB in your test?

It is called almost exactly once.

> 
> In my 32MB sync dio f2fs test on an x86_64 VM, one buffer_head mapping can
> hold 32MB, but it is a freshly created f2fs.
> 
> I'd suggest to understand the issue completely before figuring out one
> solution.

Thank you for your advice. I'll analyze your point in more detail later. :)
But I think this is different from finding the main time sink in
do_direct_IO(); I think the excessive looping should be controlled.
8,192 iterations of the do_direct_IO() loop - for 32MB - to submit one bio
is too much on a 4KB page system. I want to apply an optional solution to
avoid the excessive looping caused by multipage bvec.
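(For 32MB with 4KB pages that is 32MB / 4KB = 8,192 merge iterations before
the single bio is submitted. With a 1MB cap the same 8,192 pages are still
merged, but as 32 bios of 256 pages each, so the first submit happens after
only 256 iterations.)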

Thanks,

Changheun Lee


[PATCH v7 3/3] bio: add limit_bio_size sysfs

2021-04-12 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit the bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and the bio max size will then be limited to the queue max sectors.

Signed-off-by: Changheun Lee 
---
 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/blk-sysfs.c |  3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index e34cdeeeb9d4..86a7b15410cf 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -316,3 +316,13 @@ Description:
does not complete in this time then the block driver timeout
handler is invoked. That timeout handler can decide to retry
the request, to fail it or to start a device recovery strategy.
+
+What:  /sys/block//queue/limit_bio_size
+Date:  Feb, 2021
+Contact:   Changheun Lee 
+Description:
+   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
+   and the bio max size will then be limited to the queue max
+   sectors. It will be cleared if limit_bio_size is cleared, and
+   the bio max size limit will be removed.
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d499a8..220d183a4c11 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -286,4 +286,11 @@ sequential zones of zoned block devices (devices with a zoned attributed
 that reports "host-managed" or "host-aware"). This value is always 0 for
 regular block devices.
 
+limit_bio_size (RW)
+-------------------
+This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
+QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
+to this node. The bio max size will be limited to the queue max sectors
+when this node is set; clearing the node removes the bio max size limit.
+
 Jens Axboe , February 2009
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0f4f0c8a7825..4625d5319daf 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -294,6 +294,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -625,6 +626,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
&queue_requests_entry.attr,
@@ -659,6 +661,7 @@ static struct attribute *queue_attrs[] = {
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
&queue_stable_writes_entry.attr,
+   &queue_limit_bio_size_entry.attr,
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
-- 
2.29.0



[PATCH v7 1/3] bio: limit bio max size

2021-04-12 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled,
but sometimes it leads to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB are merged into one bio structure if the pages'
physical addresses are contiguous. This delays the submit until the
merge completes, so the bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
current kernel behavior in the do_direct_IO() loop is as below (timeline):

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |---------------- ... ---------------->|
                                        | 8,192 pages merged a bio.
                                        | at this time, first bio submit is done.
                                        | 1 bio is split to 32 read requests and issued.
                                        |--->
                                         |--->
                                          |--->
                                             ......
                                                |--->
                                                 |--->|
        total 19ms elapsed to complete 32MB read done from device. |

If the bio max size is limited to 1MB, the behavior changes as below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bios will be made.
 | total elapsed time is over 2ms. it's the same.
 | but the first bio submit timing is fast: about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
      | 256 pages merged a bio.
      | at this time, first bio submit is done.
      | and 1 read request is issued for 1 bio.
      |--->
       |--->
        |--->
           ......
              |--->
               |--->|
 total 17ms elapsed to complete 32MB read done from device. |

As a result, the read request issue timing is faster if the bio max size is
limited. With current multipage bvec behavior a super large bio can be
created, and it delays the first I/O request issue.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 13 -
 block/blk-settings.c   | 17 +
 include/linux/bio.h|  4 +++-
 include/linux/blkdev.h |  4 
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..4bad97f1d37b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -255,6 +255,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+
+   if (blk_queue_limit_bio_size(q))
+   return blk_queue_get_max_sectors(q, bio_op(bio))
+   << SECTOR_SHIFT;
+
+   return UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -866,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index b4aa2f37fab6..1d94b97cea4f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -928,6 +928,23 @@ void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
 }
 EXPORT_SYMBOL_GPL(blk_queue_set_zoned);
 
+/**
+ * blk_queue_set_limit_bio_size - set limit bio size flag
+ * @q: the request queue for the device
+ * @limit: limit bio size on(true), or off
+ *
+ * bio max size will be limited to queue max sectors size,
+ * if limit is true.
+ */
+void blk_queue_set_limit_bio_size(struct request_queue *q, bool limit)
+{
+   if (limit)
+   blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
+   else
+   blk_queue_flag_clear(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
+}
+EXPORT_SYMBOL_GPL(blk_queue_set_limit_bio_size);
+
 static int __init blk_settings_init(void)
 {
blk_max_low_pfn = max_low_pfn - 1;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d0246c92a6e8..e5add63da3af 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -106,6 +106,8 @@ static inline void *bio_data

[PATCH v7 0/3] limit bio max size

2021-04-12 Thread Changheun Lee
I found an inefficient behavior with multipage bvec: the large chunk DIO
scenario. This patch series could be a solution to improve it.

Changheun Lee (3):
  bio: limit bio max size
  ufs: set QUEUE_FLAG_LIMIT_BIO_SIZE
  bio: add limit_bio_size sysfs

 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/bio.c   | 13 -
 block/blk-settings.c  | 17 +
 block/blk-sysfs.c |  3 +++
 drivers/scsi/scsi_lib.c   |  2 ++
 drivers/scsi/ufs/ufshcd.c |  1 +
 include/linux/bio.h   |  4 +++-
 include/linux/blkdev.h|  4 
 include/scsi/scsi_host.h  |  2 ++
 10 files changed, 61 insertions(+), 2 deletions(-)

-- 
2.29.0



[PATCH v7 2/3] ufs: set QUEUE_FLAG_LIMIT_BIO_SIZE

2021-04-12 Thread Changheun Lee
Set the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag to limit the bio max size to
the queue max sectors size for UFS devices.

Signed-off-by: Changheun Lee 
---
 drivers/scsi/scsi_lib.c   | 2 ++
 drivers/scsi/ufs/ufshcd.c | 1 +
 include/scsi/scsi_host.h  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7d52a11e1b61..73ce6ba7903a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1838,6 +1838,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 * Devices that require a bigger alignment can increase it later.
 */
blk_queue_dma_alignment(q, max(4, dma_get_cache_alignment()) - 1);
+
+   blk_queue_set_limit_bio_size(q, shost->limit_bio_size);
 }
 EXPORT_SYMBOL_GPL(__scsi_init_queue);
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d3d05e997c13..000eb5ab022e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -9313,6 +9313,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
host->max_channel = UFSHCD_MAX_CHANNEL;
host->unique_id = host->host_no;
host->max_cmd_len = UFS_CDB_SIZE;
+   host->limit_bio_size = true;
 
hba->max_pwr_info.is_valid = false;
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index e30fd963b97d..486f61588717 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -607,6 +607,8 @@ struct Scsi_Host {
unsigned int max_segment_size;
unsigned long dma_boundary;
unsigned long virt_boundary_mask;
+   unsigned int limit_bio_size;
+
/*
 * In scsi-mq mode, the number of hardware queues supported by the LLD.
 *
-- 
2.29.0



[PATCH v6 3/3] bio: add limit_bio_size sysfs

2021-04-11 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit the bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and the bio max size will then be limited to the queue max sectors.

Signed-off-by: Changheun Lee 
---
 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/blk-sysfs.c |  3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index e34cdeeeb9d4..86a7b15410cf 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -316,3 +316,13 @@ Description:
does not complete in this time then the block driver timeout
handler is invoked. That timeout handler can decide to retry
the request, to fail it or to start a device recovery strategy.
+
+What:  /sys/block//queue/limit_bio_size
+Date:  Feb, 2021
+Contact:   Changheun Lee 
+Description:
+   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
+   and the bio max size will then be limited to the queue max
+   sectors. It will be cleared if limit_bio_size is cleared, and
+   the bio max size limit will be removed.
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 4dc7f0d499a8..220d183a4c11 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -286,4 +286,11 @@ sequential zones of zoned block devices (devices with a zoned attributed
 that reports "host-managed" or "host-aware"). This value is always 0 for
 regular block devices.
 
+limit_bio_size (RW)
+-------------------
+This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
+QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
+to this node. The bio max size will be limited to the queue max sectors
+when this node is set; clearing the node removes the bio max size limit.
+
 Jens Axboe , February 2009
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0f4f0c8a7825..4625d5319daf 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -294,6 +294,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -625,6 +626,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
&queue_requests_entry.attr,
@@ -659,6 +661,7 @@ static struct attribute *queue_attrs[] = {
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
&queue_stable_writes_entry.attr,
+   &queue_limit_bio_size_entry.attr,
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
-- 
2.29.0



[PATCH v6 2/3] ufs: set QUEUE_FLAG_LIMIT_BIO_SIZE

2021-04-11 Thread Changheun Lee
Set the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag to limit the bio max size to
the queue max sectors size for UFS devices.

Signed-off-by: Changheun Lee 
---
 drivers/scsi/scsi_lib.c   | 2 ++
 drivers/scsi/ufs/ufshcd.c | 1 +
 include/scsi/scsi_host.h  | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7d52a11e1b61..73ce6ba7903a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1838,6 +1838,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 * Devices that require a bigger alignment can increase it later.
 */
blk_queue_dma_alignment(q, max(4, dma_get_cache_alignment()) - 1);
+
+   blk_queue_set_limit_bio_size(q, shost->limit_bio_size);
 }
 EXPORT_SYMBOL_GPL(__scsi_init_queue);
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index d3d05e997c13..000eb5ab022e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -9313,6 +9313,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
host->max_channel = UFSHCD_MAX_CHANNEL;
host->unique_id = host->host_no;
host->max_cmd_len = UFS_CDB_SIZE;
+   host->limit_bio_size = true;
 
hba->max_pwr_info.is_valid = false;
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index e30fd963b97d..486f61588717 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -607,6 +607,8 @@ struct Scsi_Host {
unsigned int max_segment_size;
unsigned long dma_boundary;
unsigned long virt_boundary_mask;
+   unsigned int limit_bio_size;
+
/*
 * In scsi-mq mode, the number of hardware queues supported by the LLD.
 *
-- 
2.29.0



[PATCH v6 1/3] bio: limit bio max size

2021-04-11 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled,
but sometimes it leads to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB are merged into one bio structure if the pages'
physical addresses are contiguous. This delays the submit until the
merge completes, so the bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
current kernel behavior in the do_direct_IO() loop is as below (timeline):

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |---------------- ... ---------------->|
                                        | 8,192 pages merged a bio.
                                        | at this time, first bio submit is done.
                                        | 1 bio is split to 32 read requests and issued.
                                        |--->
                                         |--->
                                          |--->
                                             ......
                                                |--->
                                                 |--->|
        total 19ms elapsed to complete 32MB read done from device. |

If the bio max size is limited to 1MB, the behavior changes as below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bios will be made.
 | total elapsed time is over 2ms. it's the same.
 | but the first bio submit timing is fast: about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
      | 256 pages merged a bio.
      | at this time, first bio submit is done.
      | and 1 read request is issued for 1 bio.
      |--->
       |--->
        |--->
           ......
              |--->
               |--->|
 total 17ms elapsed to complete 32MB read done from device. |

As a result, the read request issue timing is faster if the bio max size is
limited. With current multipage bvec behavior a super large bio can be
created, and it delays the first I/O request issue.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 13 -
 block/blk-settings.c   | 17 +
 include/linux/bio.h|  2 +-
 include/linux/blkdev.h |  4 
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..e4d6169106b6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -255,6 +255,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_disk->queue;
+
+   if (blk_queue_limit_bio_size(q))
+   return blk_queue_get_max_sectors(q, bio_op(bio))
+   << SECTOR_SHIFT;
+
+   return UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -866,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index b4aa2f37fab6..1d94b97cea4f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -928,6 +928,23 @@ void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
 }
 EXPORT_SYMBOL_GPL(blk_queue_set_zoned);
 
+/**
+ * blk_queue_set_limit_bio_size - set limit bio size flag
+ * @q: the request queue for the device
+ * @limit: limit bio size on(true), or off
+ *
+ * bio max size will be limited to queue max sectors size,
+ * if limit is true.
+ */
+void blk_queue_set_limit_bio_size(struct request_queue *q, bool limit)
+{
+   if (limit)
+   blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
+   else
+   blk_queue_flag_clear(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
+}
+EXPORT_SYMBOL_GPL(blk_queue_set_limit_bio_size);
+
 static int __init blk_settings_init(void)
 {
blk_max_low_pfn = max_low_pfn - 1;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d0246c92a6e8..830c784967c0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -119,7 +119,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
  

[PATCH v6 0/3] limit bio max size

2021-04-11 Thread Changheun Lee
I found an inefficient behavior with multipage bvec: the large chunk DIO
scenario. This patch series could be a solution to improve it.

Changheun Lee (3):
  bio: limit bio max size
  ufs: set QUEUE_FLAG_LIMIT_BIO_SIZE
  bio: add limit_bio_size sysfs

 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/bio.c   | 13 -
 block/blk-settings.c  | 17 +
 block/blk-sysfs.c |  3 +++
 drivers/scsi/scsi_lib.c   |  2 ++
 drivers/scsi/ufs/ufshcd.c |  1 +
 include/linux/bio.h   |  2 +-
 include/linux/blkdev.h|  4 
 include/scsi/scsi_host.h  |  2 ++
 10 files changed, 59 insertions(+), 2 deletions(-)

-- 
2.29.0



Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-07 Thread Changheun Lee
> On Wed, Apr 07, 2021 at 03:55:07PM +0900, Changheun Lee wrote:
> > > On Wed, Apr 07, 2021 at 02:06:33PM +0900, Changheun Lee wrote:
> > > > > On Wed, Apr 07, 2021 at 09:16:12AM +0900, Changheun Lee wrote:
> > > > > > > On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
> > > > > > > > > [snip: quoted commit message and timeline, shown in full in
> > > > > > > > > "[RESEND PATCH v5 1/2] bio: limit bio max size" below]

Re: [RESEND PATCH v5 1/2] bio: limit bio max size

2021-04-07 Thread Changheun Lee
> On Wed, Apr 07, 2021 at 02:06:33PM +0900, Changheun Lee wrote:
> > > On Wed, Apr 07, 2021 at 09:16:12AM +0900, Changheun Lee wrote:
> > > > > On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
> > > > > > > [snip: quoted commit message and timeline, shown in full in
> > > > > > > "[RESEND PATCH v5 1/2] bio: limit bio max size" below]

Re: [RESEND PATCH v5 1/2] bio: limit bio max size

2021-04-06 Thread Changheun Lee
> On Wed, Apr 07, 2021 at 09:16:12AM +0900, Changheun Lee wrote:
> > > On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
> > > > > [snip: quoted commit message, timeline, and patch, shown in full in
> > > > > "[RESEND PATCH v5 1/2] bio: limit bio max size" below]

Re: [RESEND PATCH v5 2/2] bio: add limit_bio_size sysfs

2021-04-06 Thread Changheun Lee
> On 3/16/21 12:44 AM, Changheun Lee wrote:
> > Add a limit_bio_size block sysfs node to limit the bio size.
> > The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size
> > is set, and the bio max size will then be limited to the queue max
> > sectors.
> > 
> > Signed-off-by: Changheun Lee 
> > ---
> >  Documentation/ABI/testing/sysfs-block | 10 ++
> >  Documentation/block/queue-sysfs.rst   |  7 +++
> >  block/blk-sysfs.c |  3 +++
> >  3 files changed, 20 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> > index e34cdeeeb9d4..86a7b15410cf 100644
> > --- a/Documentation/ABI/testing/sysfs-block
> > +++ b/Documentation/ABI/testing/sysfs-block
> > @@ -316,3 +316,13 @@ Description:
> > does not complete in this time then the block driver timeout
> > handler is invoked. That timeout handler can decide to retry
> > the request, to fail it or to start a device recovery strategy.
> > +
> > +What:  /sys/block//queue/limit_bio_size
> > +Date:  Feb, 2021
> > +Contact:   Changheun Lee 
> > +Description:
> > +   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
> > +   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
> > +   and the bio max size will then be limited to the queue max
> > +   sectors. It will be cleared if limit_bio_size is cleared, and
> > +   the bio max size limit will be removed.
> > diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
> > index 2638d3446b79..cd371a821855 100644
> > --- a/Documentation/block/queue-sysfs.rst
> > +++ b/Documentation/block/queue-sysfs.rst
> > @@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) and ZAC
> >  do not support zone commands, they will be treated as regular block devices
> >  and zoned will report "none".
> >  
> > +limit_bio_size (RW)
> > +-------------------
> > +This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
> > +QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
> > +to this node. The bio max size will be limited to the queue max sectors
> > +when this node is set; clearing the node removes the bio max size limit.
> > +
> >  Jens Axboe , February 2009
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index b513f1683af0..840d97f427e6 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
> >  QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
> >  QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
> >  QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
> > +QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
> >  #undef QUEUE_SYSFS_BIT_FNS
> >  
> >  static ssize_t queue_zoned_show(struct request_queue *q, char *page)
> > @@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
> >  QUEUE_RW_ENTRY(queue_iostats, "iostats");
> >  QUEUE_RW_ENTRY(queue_random, "add_random");
> >  QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
> > +QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
> >  
> >  static struct attribute *queue_attrs[] = {
> > &queue_requests_entry.attr,
> > @@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
> > &queue_rq_affinity_entry.attr,
> > &queue_iostats_entry.attr,
> > &queue_stable_writes_entry.attr,
> > +   &queue_limit_bio_size_entry.attr,
> > &queue_random_entry.attr,
> > &queue_poll_entry.attr,
> > &queue_wc_entry.attr,
> 
> Has it been considered to introduce a function to set the BIO size limit
> instead of introducing a new sysfs attribute? See also
> blk_queue_max_hw_sectors().

A function to set it has not been considered yet. But I think a sysfs
attribute should be supported as well, because the value can differ
depending on each system environment, including policy.

> 
> Thanks,
> 
> Bart.

Thanks,

Changheun Lee


Re: [RESEND PATCH v5 1/2] bio: limit bio max size

2021-04-06 Thread Changheun Lee
> On Tue, Apr 06, 2021 at 10:31:28AM +0900, Changheun Lee wrote:
> > > [snip: quoted commit message and timeline, shown in full in
> > > "[RESEND PATCH v5 1/2] bio: limit bio max size" below]
> > > 
> > > Signed-off-by: Changheun Lee 
> > > ---
> > >  block/bio.c| 13 -
> > >  include/linux/bio.h|  2 +-
> > >  include/linux/blkdev.h |  3 +++
> > >  3 files changed, 16 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/bio.c b/block/bio.c
> > > index 1f2cc1fbe283..c528e1f944c7 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> > >  }
> > >  EXPORT_SYMBOL(bio_init);
> > >  
> > > +unsigned int bio_max_size(struct bio *bio)
> > > +{
> > > + struct request_queue *q = bio->bi_disk->queue;
> > > +
> > > + if (blk_queue_limit_bio_size(q))
> > > + return blk_queue_get_max_sectors(q, bio_op(bio))
> > > + << SECTOR_SHIFT;
> > > +
> > > + return UINT_MAX;
> > > +}
> > > +
> > >  /**
> > >   * bio_reset - reinitialize a bio
> > >   * @bio: bio to reset
> > > @@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> > >   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> > >  
> > >   if (page_is_mergeable(bv, page, len, off, same_page)) {
> > > - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> > > + if (bio->bi_iter.bi_siz

[RESEND PATCH v5 1/2] bio: limit bio max size

2021-04-05 Thread Changheun Lee
> bio size can grow up to 4GB when multi-page bvec is enabled,
> but sometimes it leads to inefficient behavior.
> In the case of large chunk direct I/O - a 32MB chunk read in user space -
> all pages for the 32MB are merged into one bio structure if the pages'
> physical addresses are contiguous. This delays the submit until the
> merge completes, so the bio max size should be limited to a proper size.
> 
> When a 32MB chunk read with the direct I/O option comes from userspace,
> current kernel behavior in the do_direct_IO() loop is as below (timeline):
> 
>  | bio merge for 32MB. total 8,192 pages are merged.
>  | total elapsed time is over 2ms.
>  |---------------- ... ---------------->|
>                                         | 8,192 pages merged a bio.
>                                         | at this time, first bio submit is done.
>                                         | 1 bio is split to 32 read requests and issued.
>                                         |--->
>                                          |--->
>                                           |--->
>                                              ......
>                                                 |--->
>                                                  |--->|
>         total 19ms elapsed to complete 32MB read done from device. |
> 
> If the bio max size is limited to 1MB, the behavior changes as below.
> 
>  | bio merge for 1MB. 256 pages are merged for each bio.
>  | total 32 bios will be made.
>  | total elapsed time is over 2ms. it's the same.
>  | but the first bio submit timing is fast: about 100us.
>  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
>       | 256 pages merged a bio.
>       | at this time, first bio submit is done.
>       | and 1 read request is issued for 1 bio.
>       |--->
>        |--->
>         |--->
>            ......
>               |--->
>                |--->|
>  total 17ms elapsed to complete 32MB read done from device. |
> 
> As a result, the read request issue timing is faster if the bio max size is
> limited. With current multipage bvec behavior a super large bio can be
> created, and it delays the first I/O request issue.
> 
> Signed-off-by: Changheun Lee 
> ---
>  block/bio.c| 13 -
>  include/linux/bio.h|  2 +-
>  include/linux/blkdev.h |  3 +++
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 1f2cc1fbe283..c528e1f944c7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
>  }
>  EXPORT_SYMBOL(bio_init);
>  
> +unsigned int bio_max_size(struct bio *bio)
> +{
> + struct request_queue *q = bio->bi_disk->queue;
> +
> + if (blk_queue_limit_bio_size(q))
> + return blk_queue_get_max_sectors(q, bio_op(bio))
> + << SECTOR_SHIFT;
> +
> + return UINT_MAX;
> +}
> +
>  /**
>   * bio_reset - reinitialize a bio
>   * @bio: bio to reset
> @@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>   struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>  
>   if (page_is_mergeable(bv, page, len, off, same_page)) {
> - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>   *same_page = false;
>   return false;
>   }
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 1edda614f7ce..13b6f6562a5b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
>   if (bio->bi_vcnt >= bio->bi_max_vecs)
>   return true;
>  
> - if (bio->bi_iter.bi_size > UINT_MAX - len)
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
>   return true;
>  
>   return false;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f94ee3089e01..3aeab9e7e97b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -621,6 +621,7 @@ struct request_queue {
>  
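
The page and bio counts quoted in the timelines above follow from simple
arithmetic. A quick sketch of it, assuming 4 KiB pages (the page size is an
assumption; the 32MB chunk and 1MB cap come from the patch description):

#include <stdio.h>

int main(void)
{
	unsigned int chunk_bytes = 32u << 20;	/* 32 MiB direct I/O chunk */
	unsigned int page_bytes  = 4u << 10;	/* assumed 4 KiB page size */
	unsigned int cap_bytes   = 1u << 20;	/* 1 MiB bio size limit    */

	printf("pages merged per 32MB chunk: %u\n", chunk_bytes / page_bytes); /* 8192 */
	printf("pages merged per capped bio: %u\n", cap_bytes / page_bytes);   /* 256  */
	printf("bios made per 32MB chunk:    %u\n", chunk_bytes / cap_bytes);  /* 32   */
	return 0;
}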

[RESEND PATCH v5 2/2] bio: add limit_bio_size sysfs

2021-03-16 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and bio max size will be limited by the queue max sectors while
QUEUE_FLAG_LIMIT_BIO_SIZE is set.

Signed-off-by: Changheun Lee 
---
 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/blk-sysfs.c |  3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index e34cdeeeb9d4..86a7b15410cf 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -316,3 +316,13 @@ Description:
does not complete in this time then the block driver timeout
handler is invoked. That timeout handler can decide to retry
the request, to fail it or to start a device recovery strategy.
+
+What:  /sys/block//queue/limit_bio_size
+Date:  Feb, 2021
+Contact:   Changheun Lee 
+Description:
+   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
+   and bio max size will be limited by the queue max sectors.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be cleared if limit_bio_size is
+   cleared, and the limit on bio max size will be removed.
diff --git a/Documentation/block/queue-sysfs.rst 
b/Documentation/block/queue-sysfs.rst
index 2638d3446b79..cd371a821855 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) 
and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".
 
+limit_bio_size (RW)
+---
+This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
+QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
+to this node. Setting it limits bio max size to the queue max sectors;
+clearing it removes the limit on bio max size.
+
 Jens Axboe , February 2009
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0
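
To make the sysfs toggle above concrete, here is a minimal userspace sketch
that enables the limit (the device name "sda" is an assumption; error
handling is trimmed):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* "sda" is an example device name; adjust for your system. */
	const char *node = "/sys/block/sda/queue/limit_bio_size";
	int fd = open(node, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Writing "1" sets QUEUE_FLAG_LIMIT_BIO_SIZE; writing "0" clears it. */
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);
	return 0;
}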



[RESEND PATCH v5 1/2] bio: limit bio max size

2021-03-16 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled.
But sometimes it would lead to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into one bio structure if the pages'
physical addresses are contiguous. This adds some delay to submission
until the merge completes. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior in the do_direct_IO() loop is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |-- ... --->|
 | 8,192 pages merged a bio.
 | at this time, first bio submit is done.
 | 1 bio is split to 32 read request and issue.
 |--->
  |--->
   |--->
  ..
    |--->
     |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited to 1MB, the behavior changes as shown below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
  | 256 pages merged a bio.
  | at this time, first bio submit is done.
  | and 1 read request is issued for 1 bio.
  |--->
   |--->
|--->
  ..
 |--->
  |--->|
total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if bio max size is limited.
Under the current kernel behavior with multi-page bvec, a super large bio
can be created, and this delays issuing the first I/O request.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 13 -
 include/linux/bio.h|  2 +-
 include/linux/blkdev.h |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..c528e1f944c7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_disk->queue;
+
+   if (blk_queue_limit_bio_size(q))
+   return blk_queue_get_max_sectors(q, bio_op(bio))
+   << SECTOR_SHIFT;
+
+   return UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..13b6f6562a5b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..3aeab9e7e97b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE 28  /* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT   29 /* device supports NOWAIT */
+#define QUEUE_FLAG_LIMIT_BIO_SIZE 30   /* limit bio size */
 
 #define QUEUE_FLAG_MQ_DEFAULT  ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_SAME_COMP) |  \
@@ -667,6 +668,8 @@ bool blk_queue_flag_test_and_set(unsi

Re: [PATCH v5 1/2] bio: limit bio max size

2021-02-15 Thread Changheun Lee
> 
> On Tue, Feb 16, 2021 at 11:00:32AM +0900, Changheun Lee wrote:
> > Please feedback to me if more modification is needed to apply. :)
> 
> No context here :(
> 

I'm so sorry. I missed it.

> bio size can grow up to 4GB when multi-page bvec is enabled.
> But sometimes it would lead to inefficient behavior.
> In the case of large chunk direct I/O - a 32MB chunk read in user space -
> all pages for the 32MB would be merged into one bio structure if the pages'
> physical addresses are contiguous. This adds some delay to submission
> until the merge completes. bio max size should be limited to a proper size.
> 
> When 32MB chunk read with direct I/O option is coming from userspace,
> kernel behavior is below now in do_direct_IO() loop. it's timeline.
> 
>  | bio merge for 32MB. total 8,192 pages are merged.
>  | total elapsed time is over 2ms.
>  |-- ... --->|
>  | 8,192 pages merged a bio.
>  | at this time, first bio submit is done.
>  | 1 bio is split to 32 read request and issue.
>  |--->
>   |--->
>    |--->
>   ..
>     |--->
>      |--->|
>  total 19ms elapsed to complete 32MB read done from device. |
> 
> If bio max size is limited to 1MB, the behavior changes as shown below.
> 
>  | bio merge for 1MB. 256 pages are merged for each bio.
>  | total 32 bio will be made.
>  | total elapsed time is over 2ms. it's same.
>  | but, first bio submit timing is fast. about 100us.
>  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
>   | 256 pages merged a bio.
>   | at this time, first bio submit is done.
>   | and 1 read request is issued for 1 bio.
>   |--->
>|--->
> |--->
>   ..
>  |--->
>   |--->|
> total 17ms elapsed to complete 32MB read done from device. |
> 
> As a result, read request issue timing is faster if bio max size is limited.
> Under the current kernel behavior with multi-page bvec, a super large bio
> can be created, and this delays issuing the first I/O request.
> 
> Signed-off-by: Changheun Lee 
> ---
>  block/bio.c| 13 -
>  include/linux/bio.h|  2 +-
>  include/linux/blkdev.h |  3 +++
>  3 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 1f2cc1fbe283..c528e1f944c7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
>  }
>  EXPORT_SYMBOL(bio_init);
>  
> +unsigned int bio_max_size(struct bio *bio)
> +{
> + struct request_queue *q = bio->bi_disk->queue;
> +
> + if (blk_queue_limit_bio_size(q))
> + return blk_queue_get_max_sectors(q, bio_op(bio))
> + << SECTOR_SHIFT;
> +
> + return UINT_MAX;
> +}
> +
>  /**
>   * bio_reset - reinitialize a bio
>   * @bio: bio to reset
> @@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>  
>   if (page_is_mergeable(bv, page, len, off, same_page)) {
> - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
>   *same_page = false;
>   return false;
>   }
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 1edda614f7ce..13b6f6562a5b 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
>   if (bio->bi_vcnt >= bio->bi_max_vecs)
>   return true;
>  
> - if (bio->bi_iter.bi_size > UINT_MAX - len)
> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
>   return true;
>  
>   return false;
> diff --g

[PATCH v5 1/2] bio: limit bio max size

2021-02-15 Thread Changheun Lee
Please give me feedback if more modification is needed before applying. :)

---
Changheun Lee


[PATCH v5 2/2] bio: add limit_bio_size sysfs

2021-02-03 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and bio max size will be limited by the queue max sectors while
QUEUE_FLAG_LIMIT_BIO_SIZE is set.

Signed-off-by: Changheun Lee 
---
 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/blk-sysfs.c |  3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index e34cdeeeb9d4..86a7b15410cf 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -316,3 +316,13 @@ Description:
does not complete in this time then the block driver timeout
handler is invoked. That timeout handler can decide to retry
the request, to fail it or to start a device recovery strategy.
+
+What:  /sys/block//queue/limit_bio_size
+Date:  Feb, 2021
+Contact:   Changheun Lee 
+Description:
+   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
+   and bio max size will be limited by the queue max sectors.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be cleared if limit_bio_size is
+   cleared, and the limit on bio max size will be removed.
diff --git a/Documentation/block/queue-sysfs.rst 
b/Documentation/block/queue-sysfs.rst
index 2638d3446b79..cd371a821855 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) 
and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".
 
+limit_bio_size (RW)
+---
+This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
+QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
+to this node. Setting it limits bio max size to the queue max sectors;
+clearing it removes the limit on bio max size.
+
 Jens Axboe , February 2009
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0



[PATCH v5 1/2] bio: limit bio max size

2021-02-03 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled.
But sometimes it would lead to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into one bio structure if the pages'
physical addresses are contiguous. This adds some delay to submission
until the merge completes. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior in the do_direct_IO() loop is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |-- ... --->|
 | 8,192 pages merged a bio.
 | at this time, first bio submit is done.
 | 1 bio is split to 32 read request and issue.
 |--->
  |--->
   |--->
  ..
    |--->
     |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited to 1MB, the behavior changes as shown below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
  | 256 pages merged a bio.
  | at this time, first bio submit is done.
  | and 1 read request is issued for 1 bio.
  |--->
   |--->
|--->
  ..
 |--->
  |--->|
total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if bio max size is limited.
Under the current kernel behavior with multi-page bvec, a super large bio
can be created, and this delays issuing the first I/O request.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 13 -
 include/linux/bio.h|  2 +-
 include/linux/blkdev.h |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..c528e1f944c7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_disk->queue;
+
+   if (blk_queue_limit_bio_size(q))
+   return blk_queue_get_max_sectors(q, bio_op(bio))
+   << SECTOR_SHIFT;
+
+   return UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..13b6f6562a5b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..3aeab9e7e97b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE 28  /* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT   29 /* device supports NOWAIT */
+#define QUEUE_FLAG_LIMIT_BIO_SIZE 30   /* limit bio size */
 
 #define QUEUE_FLAG_MQ_DEFAULT  ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_SAME_COMP) |  \
@@ -667,6 +668,8 @@ bool blk_queue_flag_test_and_set(unsi
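
(As a reminder of the unit conversion in bio_max_size() above: SECTOR_SHIFT
is 9, i.e. one sector is 512 bytes, so shifting the queue's max sectors left
by SECTOR_SHIFT yields the limit in bytes. A tiny sketch:)

#define SECTOR_SHIFT 9	/* one sector == 512 bytes */

static unsigned int sectors_to_bytes(unsigned int max_sectors)
{
	/* e.g. 2048 sectors << 9 == 1048576 bytes == 1 MiB */
	return max_sectors << SECTOR_SHIFT;
}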

[PATCH 2/2] bio: add limit_bio_size sysfs

2021-02-03 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and bio max size will be limited by the queue max sectors while
QUEUE_FLAG_LIMIT_BIO_SIZE is set.

Signed-off-by: Changheun Lee 
Signed-off-by: nanich.lee 
---
 Documentation/ABI/testing/sysfs-block | 10 ++
 Documentation/block/queue-sysfs.rst   |  7 +++
 block/blk-sysfs.c |  3 +++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-block 
b/Documentation/ABI/testing/sysfs-block
index e34cdeeeb9d4..86a7b15410cf 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -316,3 +316,13 @@ Description:
does not complete in this time then the block driver timeout
handler is invoked. That timeout handler can decide to retry
the request, to fail it or to start a device recovery strategy.
+
+What:  /sys/block//queue/limit_bio_size
+Date:  Feb, 2021
+Contact:   Changheun Lee 
+Description:
+   (RW) Toggle to set/clear the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is set,
+   and bio max size will be limited by the queue max sectors.
+   QUEUE_FLAG_LIMIT_BIO_SIZE will be cleared if limit_bio_size is
+   cleared, and the limit on bio max size will be removed.
diff --git a/Documentation/block/queue-sysfs.rst 
b/Documentation/block/queue-sysfs.rst
index 2638d3446b79..cd371a821855 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -273,4 +273,11 @@ devices are described in the ZBC (Zoned Block Commands) 
and ZAC
 do not support zone commands, they will be treated as regular block devices
 and zoned will report "none".
 
+limit_bio_size (RW)
+---
+This indicates the QUEUE_FLAG_LIMIT_BIO_SIZE queue flag value.
+QUEUE_FLAG_LIMIT_BIO_SIZE can be changed by writing 1 (set) or 0 (clear)
+to this node. Setting it limits bio max size to the queue max sectors;
+clearing it removes the limit on bio max size.
+
 Jens Axboe , February 2009
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0



[PATCH v4 2/2] bio: add limit_bio_size sysfs

2021-02-03 Thread Changheun Lee
Add a limit_bio_size block sysfs node to limit bio size.
The queue flag QUEUE_FLAG_LIMIT_BIO_SIZE will be set if limit_bio_size is
set, and bio max size will be limited by the queue max sectors while
QUEUE_FLAG_LIMIT_BIO_SIZE is set.

Signed-off-by: Changheun Lee 
---
 block/blk-sysfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0
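
For readers unfamiliar with the QUEUE_SYSFS_BIT_FNS() pattern used in the
patch above, a flag-backed sysfs attribute boils down to a show/store pair
over one queue flag bit. The following is a simplified illustration, not the
exact macro expansion:

static ssize_t queue_limit_bio_size_show(struct request_queue *q, char *page)
{
	return sprintf(page, "%u\n",
		       test_bit(QUEUE_FLAG_LIMIT_BIO_SIZE, &q->queue_flags) ? 1 : 0);
}

static ssize_t queue_limit_bio_size_store(struct request_queue *q,
					  const char *page, size_t count)
{
	unsigned long val;
	int ret = kstrtoul(page, 10, &val);

	if (ret < 0)
		return ret;
	if (val)
		blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
	else
		blk_queue_flag_clear(QUEUE_FLAG_LIMIT_BIO_SIZE, q);
	return count;
}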



Re: [PATCH v4 1/2] bio: limit bio max size

2021-02-02 Thread Changheun Lee
> On Tue, Feb 02, 2021 at 01:12:04PM +0900, Changheun Lee wrote:
> > > On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > > > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > > > > bio size can grow up to 4GB when multi-page bvec is enabled.
> > > > > > But sometimes it would lead to inefficient behavior.
> > > > > > In the case of large chunk direct I/O - a 32MB chunk read in user space -
> > > > > > all pages for the 32MB would be merged into one bio structure if the pages'
> > > > > > physical addresses are contiguous. This adds some delay to submission
> > > > > > until the merge completes. bio max size should be limited to a proper size.
> > > > > > 
> > > > > > When 32MB chunk read with direct I/O option is coming from 
> > > > > > userspace,
> > > > > > kernel behavior is below now. it's timeline.
> > > > > > 
> > > > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > > > >  | total elapsed time is over 2ms.
> > > > > >  |-- ... --->|
> > > > > >  | 8,192 pages 
> > > > > > merged a bio.
> > > > > >  | at this time, 
> > > > > > first bio submit is done.
> > > > > >  | 1 bio is split 
> > > > > > to 32 read request and issue.
> > > > > >  |--->
> > > > > >   |--->
> > > > > >|--->
> > > > > >   ..
> > > > > >
> > > > > > |--->
> > > > > > 
> > > > > > |--->|
> > > > > >   total 19ms elapsed to complete 32MB read 
> > > > > > done from device. |
> > > > > > 
> > > > > > If bio max size is limited with 1MB, behavior is changed below.
> > > > > > 
> > > bio_iov_iter_get_pages> > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > > > >  | total 32 bio will be made.
> > > > > >  | total elapsed time is over 2ms. it's same.
> > > > > >  | but, first bio submit timing is fast. about 100us.
> > > > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > > > >   | 256 pages merged a bio.
> > > > > >   | at this time, first bio submit is done.
> > > > > >   | and 1 read request is issued for 1 bio.
> > > > > >   |--->
> > > > > >|--->
> > > > > > |--->
> > > > > >   ..
> > > > > >  |--->
> > > > > >   |--->|
> > > > > > total 17ms elapsed to complete 32MB read done from device. |
> > > > > 
> > > > > Can you share us if enabling THP in your application can avoid this 
> > > > > issue? BTW, you
> > > > > need to make the 32MB buffer aligned with huge page size. IMO, THP 
> > > > > perfectly fits
> > > > > your case.
> > > > > 
> > > > 
> > > > THP is enabled already like as below in my environment. It has no 
> > > > effect.
> > > > 
> > > > cat /sys/kernel/mm/transparent_hugepage/enabled
> > > > [always] madvise never
> > > 
> > > The 32MB user buffer needs to be huge page size aligned. If your system
> > > supports bcc/bpftrace, it is quite easy to check if the buffer is
> > > aligned.
> > > 
> > > > 
> > > > This issue was reported from performance benchmark application in open 
> > > 

Re: [PATCH v4 1/2] bio: limit bio max size

2021-02-01 Thread Changheun Lee
> On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote:
> > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > > > bio size can grow up to 4GB when multi-page bvec is enabled.
> > > > But sometimes it would lead to inefficient behavior.
> > > > In the case of large chunk direct I/O - a 32MB chunk read in user space -
> > > > all pages for the 32MB would be merged into one bio structure if the pages'
> > > > physical addresses are contiguous. This adds some delay to submission
> > > > until the merge completes. bio max size should be limited to a proper size.
> > > > 
> > > > When 32MB chunk read with direct I/O option is coming from userspace,
> > > > kernel behavior is below now. it's timeline.
> > > > 
> > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > >  | total elapsed time is over 2ms.
> > > >  |-- ... --->|
> > > >  | 8,192 pages merged a 
> > > > bio.
> > > >  | at this time, first 
> > > > bio submit is done.
> > > >  | 1 bio is split to 32 
> > > > read request and issue.
> > > >  |--->
> > > >   |--->
> > > >|--->
> > > >   ..
> > > >
> > > > |--->
> > > > 
> > > > |--->|
> > > >   total 19ms elapsed to complete 32MB read done 
> > > > from device. |
> > > > 
> > > > If bio max size is limited with 1MB, behavior is changed below.
> > > > 
> bio_iov_iter_get_pages> > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > >  | total 32 bio will be made.
> > > >  | total elapsed time is over 2ms. it's same.
> > > >  | but, first bio submit timing is fast. about 100us.
> > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > >   | 256 pages merged a bio.
> > > >   | at this time, first bio submit is done.
> > > >   | and 1 read request is issued for 1 bio.
> > > >   |--->
> > > >|--->
> > > > |--->
> > > >   ..
> > > >  |--->
> > > >   |--->|
> > > > total 17ms elapsed to complete 32MB read done from device. |
> > > 
> > > Can you share us if enabling THP in your application can avoid this 
> > > issue? BTW, you
> > > need to make the 32MB buffer aligned with huge page size. IMO, THP 
> > > perfectly fits
> > > your case.
> > > 
> > 
> > THP is enabled already like as below in my environment. It has no effect.
> > 
> > cat /sys/kernel/mm/transparent_hugepage/enabled
> > [always] madvise never
> 
> The 32MB user buffer needs to be huge page size aligned. If your system
> supports bcc/bpftrace, it is quite easy to check if the buffer is
> aligned.
> 
> > 
> > This issue was reported from performance benchmark application in open 
> > market.
> > I can't control application's working in open market.
> > It's not only my own case. This issue might be occured in many mobile 
> > environment.
> > At least, I checked this problem in exynos, and qualcomm chipset.
> 
> You just said it takes 2ms for building 32MB bio, but you never investigate 
> the
> reason. I guess it is from get_user_pages_fast(), but maybe others. Can you 
> dig
> further for the reason? Maybe it is one arm64 specific issue.
> 
> BTW, bio_iov_iter_get_pages() just takes ~200us on one x86_64 VM with THP, 
> which is
> observed via bcc/funclatency when running the following workload:
> 

I think you focused on bio_iov_iter_get_pages() because I had only commented
on the page merge delay. Sorry about that. I missed details of this issue.
Actually there a
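
To make the alignment suggestion in this thread concrete, here is a hedged
sketch of allocating a huge-page-aligned buffer for an O_DIRECT read (the
2 MiB huge page size and the file path are assumptions):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define HUGE_ALIGN (2UL << 20)	/* assumed 2 MiB huge page size */
#define CHUNK      (32UL << 20)	/* the 32 MiB chunk discussed above */

int main(void)
{
	void *buf;
	int fd;

	/* Align the buffer so THP can back it with huge pages. */
	if (posix_memalign(&buf, HUGE_ALIGN, CHUNK))
		return 1;

	fd = open("/data/testfile", O_RDONLY | O_DIRECT);	/* example path */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (read(fd, buf, CHUNK) < 0)
		perror("read");
	close(fd);
	free(buf);
	return 0;
}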

Re: [PATCH v4 1/2] bio: limit bio max size

2021-01-31 Thread Changheun Lee
> On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote:
> > bio size can grow up to 4GB when multi-page bvec is enabled.
> > But sometimes it would lead to inefficient behavior.
> > In the case of large chunk direct I/O - a 32MB chunk read in user space -
> > all pages for the 32MB would be merged into one bio structure if the pages'
> > physical addresses are contiguous. This adds some delay to submission
> > until the merge completes. bio max size should be limited to a proper size.
> > 
> > When 32MB chunk read with direct I/O option is coming from userspace,
> > kernel behavior is below now. it's timeline.
> > 
> >  | bio merge for 32MB. total 8,192 pages are merged.
> >  | total elapsed time is over 2ms.
> >  |-- ... --->|
> >  | 8,192 pages merged a bio.
> >  | at this time, first bio 
> > submit is done.
> >  | 1 bio is split to 32 
> > read request and issue.
> >  |--->
> >   |--->
> >|--->
> >   ..
> >
> > |--->
> > 
> > |--->|
> >   total 19ms elapsed to complete 32MB read done 
> > from device. |
> > 
> > If bio max size is limited with 1MB, behavior is changed below.
> > 
> >  | bio merge for 1MB. 256 pages are merged for each bio.
> >  | total 32 bio will be made.
> >  | total elapsed time is over 2ms. it's same.
> >  | but, first bio submit timing is fast. about 100us.
> >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> >   | 256 pages merged a bio.
> >   | at this time, first bio submit is done.
> >   | and 1 read request is issued for 1 bio.
> >   |--->
> >|--->
> > |--->
> >   ..
> >  |--->
> >   |--->|
> > total 17ms elapsed to complete 32MB read done from device. |
> 
> Can you share us if enabling THP in your application can avoid this issue? 
> BTW, you
> need to make the 32MB buffer aligned with huge page size. IMO, THP perfectly 
> fits
> your case.
> 

THP is already enabled in my environment, as shown below. It has no effect.

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

This issue was reported by a performance benchmark application in the open
market. I can't control how applications in the open market work.
It's not only my own case; this issue may occur in many mobile environments.
At least, I have confirmed this problem on Exynos and Qualcomm chipsets.

> 
> Thanks,
> Ming
> 
> 

---
Changheun Lee
Samsung Electronics.


[PATCH v4 2/2] bio: add limit_bio_size sysfs

2021-01-28 Thread Changheun Lee
Add a new sysfs node to limit bio size.

Signed-off-by: Changheun Lee 
---
 block/blk-sysfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0



[PATCH v4 1/2] bio: limit bio max size

2021-01-28 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled.
But sometimes it would lead to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into one bio structure if the pages'
physical addresses are contiguous. This adds some delay to submission
until the merge completes. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |-- ... --->|
 | 8,192 pages merged a bio.
 | at this time, first bio submit is done.
 | 1 bio is split to 32 read request and issue.
 |--->
  |--->
   |--->
  ..
    |--->
     |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited to 1MB, the behavior changes as shown below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
  | 256 pages merged a bio.
  | at this time, first bio submit is done.
  | and 1 read request is issued for 1 bio.
  |--->
   |--->
|--->
  ..
 |--->
  |--->|
total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if bio max size is limited.
Under the current kernel behavior with multi-page bvec, a super large bio
can be created, and this delays issuing the first I/O request.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 13 -
 include/linux/bio.h|  2 +-
 include/linux/blkdev.h |  3 +++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..c528e1f944c7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -287,6 +287,17 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_disk->queue;
+
+   if (blk_queue_limit_bio_size(q))
+   return blk_queue_get_max_sectors(q, bio_op(bio))
+   << SECTOR_SHIFT;
+
+   return UINT_MAX;
+}
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +888,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..13b6f6562a5b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..3aeab9e7e97b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE 28  /* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT   29 /* device supports NOWAIT */
+#define QUEUE_FLAG_LIMIT_BIO_SIZE 30   /* limit bio size */
 
 #define QUEUE_FLAG_MQ_DEFAULT  ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_SAME_COMP) |  \
@@ -667,6 +668,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct 

Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Changheun Lee
> On 2021/01/26 18:37, Changheun Lee wrote:
> > bio size can grow up to 4GB when multi-page bvec is enabled.
> > But sometimes it would lead to inefficient behavior.
> > In the case of large chunk direct I/O - a 32MB chunk read in user space -
> > all pages for the 32MB would be merged into one bio structure if the pages'
> > physical addresses are contiguous. This adds some delay to submission
> > until the merge completes. bio max size should be limited to a proper size.
> > 
> > When 32MB chunk read with direct I/O option is coming from userspace,
> > kernel behavior is below now. it's timeline.
> > 
> >  | bio merge for 32MB. total 8,192 pages are merged.
> >  | total elapsed time is over 2ms.
> >  |-- ... --->|
> >  | 8,192 pages merged a bio.
> >  | at this time, first bio 
> > submit is done.
> >  | 1 bio is split to 32 
> > read request and issue.
> >  |--->
> >   |--->
> >|--->
> >   ..
> >
> > |--->
> > 
> > |--->|
> >   total 19ms elapsed to complete 32MB read done 
> > from device. |
> > 
> > If bio max size is limited with 1MB, behavior is changed below.
> > 
> >  | bio merge for 1MB. 256 pages are merged for each bio.
> >  | total 32 bio will be made.
> >  | total elapsed time is over 2ms. it's same.
> >  | but, first bio submit timing is fast. about 100us.
> >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> >   | 256 pages merged a bio.
> >   | at this time, first bio submit is done.
> >   | and 1 read request is issued for 1 bio.
> >   |--->
> >|--->
> > |--->
> >   ..
> >  |--->
> >   |------->|
> > total 17ms elapsed to complete 32MB read done from device. |
> > 
> > As a result, read request issue timing is faster if bio max size is limited.
> > Under the current kernel behavior with multi-page bvec, a super large bio
> > can be created, and this delays issuing the first I/O request.
> > 
> > Signed-off-by: Changheun Lee 
> > ---
> >  block/bio.c| 17 -
> >  include/linux/bio.h|  4 +++-
> >  include/linux/blkdev.h |  3 +++
> >  3 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 1f2cc1fbe283..ec0281889045 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -287,6 +287,21 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> >  }
> >  EXPORT_SYMBOL(bio_init);
> >  
> > +unsigned int bio_max_size(struct bio *bio)
> > +{
> > +   struct request_queue *q;
> > +
> > +   if (!bio->bi_disk)
> > +   return UINT_MAX;
> > +
> > +   q = bio->bi_disk->queue;
> > +   if (!blk_queue_limit_bio_size(q))
> > +   return UINT_MAX;
> > +
> > +   return blk_queue_get_max_sectors(q, bio_op(bio)) << SECTOR_SHIFT;
> > +}
> > +EXPORT_SYMBOL(bio_max_size);
> 
> Why export this ? This is only used in this file and in bio.h in bio_full().

OK. I'll remove it.

> 
> > +
> >  /**
> >   * bio_reset - reinitialize a bio
> >   * @bio:   bio to reset
> > @@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> > 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> >  
> > if (page_is_mergeable(bv, page, len, off, same_page)) {
> > -   if (bio->bi_iter.bi_size > UINT_MAX - len) {
> > +   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
> > *same_page = false;
> > return false;
> > }
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 1edda614f7ce.

Re: [PATCH 1/1] scsi: sd: use max_xfer_blocks for set rw_max if max_xfer_blocks is available

2021-01-26 Thread Changheun Lee
> Hello Changheun!
> 
> > I want to discuss using max_xfer_blocks instead of opt_xfer_blocks as
> > a optional.  For example, device reports opt_xfer_blocks is 512KB and
> > 1MB as a max_xfer_blocks too. Currently rw_max is set with 512KB only.
> 
> Because that's what the device asks for. If a device explicitly requests
> us to use 512 KB I/Os we should not be sending it 1 MB requests.
> 
> The spec is very clear. It says that if you send a command *larger* than
> opt_xfer_blocks, you should expect *slower* performance. That makes
> max_xfer_blocks a particularly poor choice for setting the default I/O
> size.
> 
> In addition to being slower, max_xfer_blocks could potentially also be
> much, much larger than opt_xfer_blocks. I understand your 512 KB vs. 1
> MB example. But if the max_xfer_blocks limit is reported as 1 GB, is
> that then the right value to use instead of 512 KB? Probably not.
> 
> If a device does not report an opt_xfer_blocks value that suits your
> workload, just override the resulting max_sectors_kb in sysfs. This is
> intentionally a soft limit so it can be adjusted by the user without
> having to change the kernel.
> 
> -- 
> Martin K. PetersenOracle Linux Engineering
> 

I understood what you said. I reviewed the meaning of opt_xfer_blocks in the
SCSI spec again. I think the passage below is what you were referring to.

The OPTIMAL TRANSFER LENGTH field indicates the optimal transfer size in
logical blocks for a single command shown in table 197. If a device server
receives one of these commands with a transfer size greater than this value,
then the device server may incur significant delays in processing the
command. If the OPTIMAL TRANSFER LENGTH field is set to zero, then there
is no reported optimal transfer size.

Thank you for the kind feedback. :)

---
Changheun Lee
Samsung Electronics
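
For reference, the selection logic under discussion reduces to roughly the
following - a simplified sketch of the sd_revalidate_disk() flow, not the
exact upstream code:

/* Prefer the device-reported optimal transfer size; otherwise fall back
 * to the block layer default. dev_max is the cap derived from the
 * device's maximum transfer length. */
sector_t rw_max;

if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
	q->limits.io_opt = logical_to_bytes(sdp, sdkp->opt_xfer_blocks);
	rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
} else {
	rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
			      (sector_t)BLK_DEF_MAX_SECTORS);
}

/* The result is a soft limit; users can still override it later via
 * /sys/block/<dev>/queue/max_sectors_kb. */
q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));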


Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Changheun Lee
> On Tue, Jan 26, 2021 at 06:26:02AM +, Damien Le Moal wrote:
> > On 2021/01/26 15:07, Ming Lei wrote:
> > > On Tue, Jan 26, 2021 at 04:06:06AM +, Damien Le Moal wrote:
> > >> On 2021/01/26 12:58, Ming Lei wrote:
> > >>> On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote:
> > >>>> bio size can grow up to 4GB when multi-page bvec is enabled.
> > >>>> But sometimes it would lead to inefficient behavior.
> > >>>> In the case of large chunk direct I/O - a 32MB chunk read in user space -
> > >>>> all pages for the 32MB would be merged into one bio structure if the pages'
> > >>>> physical addresses are contiguous. This adds some delay to submission
> > >>>> until the merge completes. bio max size should be limited to a proper size.
> > >>>>
> > >>>> When 32MB chunk read with direct I/O option is coming from userspace,
> > >>>> kernel behavior is below now. it's timeline.
> > >>>>
> > >>>>  | bio merge for 32MB. total 8,192 pages are merged.
> > >>>>  | total elapsed time is over 2ms.
> > >>>>  |-- ... --->|
> > >>>>  | 8,192 pages merged 
> > >>>> a bio.
> > >>>>  | at this time, first 
> > >>>> bio submit is done.
> > >>>>  | 1 bio is split to 
> > >>>> 32 read request and issue.
> > >>>>  |--->
> > >>>>   |--->
> > >>>>|--->
> > >>>>   ..
> > >>>>
> > >>>> |--->
> > >>>> 
> > >>>> |--->|
> > >>>>   total 19ms elapsed to complete 32MB read 
> > >>>> done from device. |
> > >>>>
> > >>>> If bio max size is limited with 1MB, behavior is changed below.
> > >>>>
> > >>>>  | bio merge for 1MB. 256 pages are merged for each bio.
> > >>>>  | total 32 bio will be made.
> > >>>>  | total elapsed time is over 2ms. it's same.
> > >>>>  | but, first bio submit timing is fast. about 100us.
> > >>>>  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > >>>>   | 256 pages merged a bio.
> > >>>>   | at this time, first bio submit is done.
> > >>>>   | and 1 read request is issued for 1 bio.
> > >>>>   |--->
> > >>>>|--->
> > >>>> |--->
> > >>>>   ..
> > >>>>  |--->
> > >>>>   |--->|
> > >>>> total 17ms elapsed to complete 32MB read done from device. |
> > >>>>
> > >>>> As a result, read request issue timing is faster if bio max size is 
> > >>>> limited.
> > >>>> Current kernel behavior with multipage bvec, super large bio can be 
> > >>>> created.
> > >>>> And it lead to delay first I/O request issue.
> > >>>>
> > >>>> Signed-off-by: Changheun Lee 
> > >>>> ---
> > >>>>  block/bio.c| 17 -
> > >>>>  include/linux/bio.h|  4 +++-
> > >>>>  include/linux/blkdev.h |  3 +++
> > >>>>  3 files changed, 22 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/block/bio.c b/block/bio.c
> > >>>> index 1f2cc1fbe283..ec0281889045 100644
> > >>>> --- a/block/bio.c
> > >>>> +++ b/block/bio.c
> > >>>> @@ -287,6 +287,21 @@ void bio_init(struct bio *bio, struct bio_vec 
> > >>>

Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Changheun Lee
> On 2021/01/27 9:36, Changheun Lee wrote:
> >>> +
> >>>  /**
> >>>   * bio_reset - reinitialize a bio
> >>>   * @bio: bio to reset
> >>> @@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
> >>> 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> >>>  
> >>>   if (page_is_mergeable(bv, page, len, off, same_page)) {
> >>> - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> >>> + if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
> >>>   *same_page = false;
> >>>   return false;
> >>>   }
> >>> diff --git a/include/linux/bio.h b/include/linux/bio.h
> >>> index 1edda614f7ce..cdb134ca7bf5 100644
> >>> --- a/include/linux/bio.h
> >>> +++ b/include/linux/bio.h
> >>> @@ -100,6 +100,8 @@ static inline void *bio_data(struct bio *bio)
> >>>   return NULL;
> >>>  }
> >>>  
> >>> +extern unsigned int bio_max_size(struct bio *);
> >>
> >> No need for extern.
> > 
It's just to avoid a compile warning in my test environment.
I'll remove it too. But I think a compile warning could appear in another
.c file that includes bio.h. Is that OK?
> 
> Hmmm... not having extern should not generate a compilation warning. There are
> tons of functions declared without extern in header files in the kernel. What
> compiler are you using ?
> 

Compiler information is below.

CROSS_COMPILE: 
android/prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin/aarch64-linux-android-
CC: android/prebuilts/clang/host/linux-x86/clang-r383902/bin/clang
CLANG_TRIPLE: 
android/prebuilts/clang/host/linux-x86/clang-r383902/bin/aarch64-linux-gnu-
CROSS_COMPILE_COMPAT: 
android/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin/arm-linux-androideabi-


> 
> -- 
> Damien Le Moal
> Western Digital Research
> 

---
Changheun Lee
Samsung Electronics
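
As an aside on the extern question discussed above: the two declaration
forms are interchangeable in C, since function declarations at file scope
have external linkage by default.

struct bio;

/* These two lines declare the same thing; "extern" is redundant for
 * functions, which is why many kernel headers omit it. */
extern unsigned int bio_max_size(struct bio *bio);
unsigned int bio_max_size(struct bio *bio);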


[PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled.
But sometimes it would lead to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into one bio structure if the pages'
physical addresses are contiguous. This adds some delay to submission
until the merge completes. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |-- ... --->|
 | 8,192 pages merged a bio.
 | at this time, first bio submit is done.
 | 1 bio is split to 32 read request and issue.
 |--->
  |--->
   |--->
  ..
    |--->
     |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited to 1MB, the behavior changes as shown below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
  | 256 pages merged a bio.
  | at this time, first bio submit is done.
  | and 1 read request is issued for 1 bio.
  |--->
   |--->
|--->
  ..
 |--->
  |--->|
total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if bio max size is limited.
Under the current kernel behavior with multi-page bvec, a super large bio
can be created, and this delays issuing the first I/O request.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 17 -
 include/linux/bio.h|  4 +++-
 include/linux/blkdev.h |  3 +++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..ec0281889045 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -287,6 +287,21 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q;
+
+   if (!bio->bi_disk)
+   return UINT_MAX;
+
+   q = bio->bi_disk->queue;
+   if (!blk_queue_limit_bio_size(q))
+   return UINT_MAX;
+
+   return blk_queue_get_max_sectors(q, bio_op(bio)) << SECTOR_SHIFT;
+}
+EXPORT_SYMBOL(bio_max_size);
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..cdb134ca7bf5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -100,6 +100,8 @@ static inline void *bio_data(struct bio *bio)
return NULL;
 }
 
+extern unsigned int bio_max_size(struct bio *);
+
 /**
  * bio_full - check if the bio is full
  * @bio:   bio to check
@@ -113,7 +115,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..3aeab9e7e97b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE 28  /* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT   29 /* device supports NOWAIT */
+#d

Re: [PATCH 1/1] scsi: sd: use max_xfer_blocks for set rw_max if max_xfer_blocks is available

2021-01-26 Thread Changheun Lee
> Damien,
> 
> >> How about set larger valid value between sdkp->max_xfer_blocks,
> >> and sdkp->opt_xfer_blocks to rw_max?
> >
> > Again, if your device reports an opt_xfer_blocks value that is too
> > small for its own good, that is a problem with this device.
> 
> Correct. It is very much intentional that we do not default to issuing
> the largest commands supported by the physical hardware.
> 
> If the device is not reporting an optimal transfer size, and the block
> layer default is too small, the solution is to adjust max_sectors_kb in
> sysfs (by adding a udev rule, for instance).
>

Sorry, I misspoke.
As others mentioned, it's a device problem if opt_xfer_blocks is wrong.
No kernel modification is needed for that.

I want to discuss using max_xfer_blocks instead of opt_xfer_blocks as an
option. For example, a device reports an opt_xfer_blocks of 512KB and a
max_xfer_blocks of 1MB. Currently rw_max is set to 512KB only.
I want an option to select max_xfer_blocks for rw_max.
Is there any historical problem with setting rw_max to max_xfer_blocks?

How about the approach below, if max_xfer_blocks can be set as rw_max?
It uses a queue flag as an option. Do you have a better idea to suggest?

static bool sd_validate_max_xfer_size(struct scsi_disk *sdkp,
  unsigned int dev_max)
{
struct request_queue *q = sdkp->disk->queue;

if (!blk_queue_allow_sd_rw_max(q))
return false;

if (sdkp->max_xfer_blocks == 0)
return false;

..

return true;
}

static int sd_revalidate_disk(struct gendisk *disk)
{
..

if (sd_validate_max_xfer_size(sdkp, dev_max)) {
q->limits.io_opt = logical_to_bytes(sdp, sdkp->max_xfer_blocks);
rw_max = logical_to_sectors(sdp, sdkp->max_xfer_blocks);
} else if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
q->limits.io_opt = logical_to_bytes(sdp, sdkp->opt_xfer_blocks);
rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
} else {

..
}

> -- 
> Martin K. PetersenOracle Linux Engineering


[PATCH v3 2/2] bio: add limit_bio_size sysfs

2021-01-26 Thread Changheun Lee
Add a new sysfs node to limit bio size.

Signed-off-by: Changheun Lee 
---
 block/blk-sysfs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f1683af0..840d97f427e6 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -288,6 +288,7 @@ QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
 QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
 QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
 QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
+QUEUE_SYSFS_BIT_FNS(limit_bio_size, LIMIT_BIO_SIZE, 0);
 #undef QUEUE_SYSFS_BIT_FNS
 
 static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -615,6 +616,7 @@ QUEUE_RW_ENTRY(queue_nonrot, "rotational");
 QUEUE_RW_ENTRY(queue_iostats, "iostats");
 QUEUE_RW_ENTRY(queue_random, "add_random");
 QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_RW_ENTRY(queue_limit_bio_size, "limit_bio_size");
 
 static struct attribute *queue_attrs[] = {
 	&queue_requests_entry.attr,
@@ -648,6 +650,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_rq_affinity_entry.attr,
 	&queue_iostats_entry.attr,
 	&queue_stable_writes_entry.attr,
+	&queue_limit_bio_size_entry.attr,
 	&queue_random_entry.attr,
 	&queue_poll_entry.attr,
 	&queue_wc_entry.attr,
-- 
2.28.0



[PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled.
But sometimes it would lead to inefficient behavior.
In the case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into one bio structure if the pages'
physical addresses are contiguous. This adds some delay to submission
until the merge completes. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
the current kernel behavior is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |-- ... --->|
 | 8,192 pages merged a bio.
 | at this time, first bio submit is done.
 | 1 bio is split to 32 read request and issue.
 |--->
  |--->
   |--->
  ..
    |--->
     |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited to 1MB, the behavior changes as shown below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
  | 256 pages merged a bio.
  | at this time, first bio submit is done.
  | and 1 read request is issued for 1 bio.
  |--->
   |--->
|--->
  ..
 |--->
  |--->|
total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if bio max size is limited.
Under the current kernel behavior with multi-page bvec, a super large bio
can be created, and this delays issuing the first I/O request.

Signed-off-by: Changheun Lee 
---
 block/bio.c| 17 -
 include/linux/bio.h|  4 +++-
 include/linux/blkdev.h |  3 +++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..ec0281889045 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -287,6 +287,21 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 }
 EXPORT_SYMBOL(bio_init);
 
+unsigned int bio_max_size(struct bio *bio)
+{
+   struct request_queue *q;
+
+   if (!bio->bi_disk)
+   return UINT_MAX;
+
+   q = bio->bi_disk->queue;
+   if (!blk_queue_limit_bio_size(q))
+   return UINT_MAX;
+
+   return blk_queue_get_max_sectors(q, bio_op(bio)) << SECTOR_SHIFT;
+}
+EXPORT_SYMBOL(bio_max_size);
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..cdb134ca7bf5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -100,6 +100,8 @@ static inline void *bio_data(struct bio *bio)
return NULL;
 }
 
+extern unsigned int bio_max_size(struct bio *);
+
 /**
  * bio_full - check if the bio is full
  * @bio:   bio to check
@@ -113,7 +115,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio_max_size(bio) - len)
return true;
 
return false;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..3aeab9e7e97b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -621,6 +621,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE 28  /* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT   29 /* device supports NOWAIT */
+#define QUEUE_FLAG_LIMIT_BIO_SIZE 30	/* limit bio max size */
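
bio_max_size() above also tests blk_queue_limit_bio_size(q). The helper's
definition falls inside the truncated hunk, but following the neighboring
queue-flag helpers in blkdev.h it would plausibly read:

#define blk_queue_limit_bio_size(q)	\
	test_bit(QUEUE_FLAG_LIMIT_BIO_SIZE, &(q)->queue_flags)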

Re: [PATCH v2] bio: limit bio max size

2021-01-24 Thread Changheun Lee
> On Thu, Jan 21, 2021 at 09:58:03AM +0900, Changheun Lee wrote:
> > bio size can grow up to 4GB when muli-page bvec is enabled.
> > but sometimes it would lead to inefficient behaviors.
> > in case of large chunk direct I/O, - 32MB chunk read in user space -
> > all pages for 32MB would be merged to a bio structure if memory address is
> > continued phsycally. it makes some delay to submit until merge complete.
> > bio max size should be limited as a proper size.
> > 
> > When 32MB chunk read with direct I/O option is coming from userspace,
> > kernel behavior is below now. it's timeline.
> 
> IMO, the issue should only exist on sync direct IO and writeback.
> Not sure if writeback cares this small delay because user data
> has been written to page cache already.
> 
> Wrt. your big direct IO case, as I suggested, you may reduce the
> submission delay a lot by applying THP.
> 
> Or you can just hardcode the limit in case of sync dio.

As you said, this small delay does not matter in the writeback case.
It's not common, but there is a need for direct I/O from userspace.
In the v3 patch it will be optional rather than the default, activated
via a queue flag or a sysfs node - see the sketch below.
Please review it again.
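
For illustration, the in-kernel opt-in is a one-line flag set on the
device's request queue (a hypothetical call site - none of the posted
patches sets the flag from a driver):

	/* e.g. in an LLD's queue setup path: opt in to the bio size limit */
	blk_queue_flag_set(QUEUE_FLAG_LIMIT_BIO_SIZE, q);

From userspace, once the sysfs patch is applied, the equivalent is writing
1 to /sys/block/<dev>/queue/limit_bio_size.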

> 
> > 
> >  | bio merge for 32MB. total 8,192 pages are merged.
> >  | total elapsed time is over 2ms.
> >  |---------------- ... --------------->|
> >                                        | 8,192 pages merged a bio.
> >                                        | at this time, first bio submit is done.
> >                                        | 1 bio is split to 32 read request and issue.
> >                                        |--->
> >                                         |--->
> >                                          |--->
> >                                          ..
> >                                                   |--->
> >                                                    |--->|
> >  total 19ms elapsed to complete 32MB read done from device. |
> > 
> > If bio max size is limited with 1MB, behavior is changed below.
> > 
> >  | bio merge for 1MB. 256 pages are merged for each bio.
> >  | total 32 bio will be made.
> >  | total elapsed time is over 2ms. it's same.
> >  | but, first bio submit timing is fast. about 100us.
> >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> >       | 256 pages merged a bio.
> >       | at this time, first bio submit is done.
> >       | and 1 read request is issued for 1 bio.
> >       |--->
> >            |--->
> >                 |--->
> >                 ..
> >                                           |--->
> >                                            |--->|
> >  total 17ms elapsed to complete 32MB read done from device. |
> > 
> > As a result, read request issue timing is faster if bio max size is limited.
> > Current kernel behavior with multipage bvec, super large bio can be created.
> > And it lead to delay first I/O request issue.
> > 
> > Signed-off-by: Changheun Lee 
> > 
> > ---
> >  block/bio.c               | 17 ++++++++++++++++-
> >  include/linux/bio.h       | 13 +++----------
> >  include/linux/blk_types.h |  1 +
> >  3 files changed, 20 insertions(+), 11 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 1f2cc1fbe283..027503c2e2e7 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -284,9 +284,24 @@ void bio_init(struct bio *bio, struct bio_vec *table,
> >  
> > bio->bi_io_vec = table;
> > bio->bi_max_vecs = max_vecs;
> > +   bio->bi_max_size = UINT_MAX;
> >  }
> >  EXPORT_SYMBOL(bio_init);
> >  
> > +void bio_set_dev(struct bio *bio, struct block_device *bdev)
> > +{
> > +   if (bio->bi_disk != bdev->bd_disk)
> > +   bio_clear_flag(bio, BIO_THROTTLED);
> > +
> > +   bio->bi_disk = bdev->bd_disk;
> > +   bio->bi_partno = bdev->bd_partno;
> > +   bio->bi_max_size = blk_queue_get_max_sectors(bio->bi_disk->queue,
> > +   bio_op(bio)) << SECTOR_SHIFT;
> > +
> > +   bio_associate_blkg(bio);
> > +}
> > +

Re: [PATCH 1/1] scsi: sd: use max_xfer_blocks for set rw_max if max_xfer_blocks is available

2021-01-21 Thread Changheun Lee
> On 2021/01/20 15:45, Manjong Lee wrote:
> > Add recipients for more reviews.
> 
> Please resend instead of replying to your own patch. The reply quoting 
> corrupts
> the patch.
> 
> The patch title is very long.
> 
> > 
> >> SCSI device has max_xfer_size and opt_xfer_size,
> >> but current kernel uses only opt_xfer_size.
> >>
> >> It causes the limitation on setting IO chunk size,
> >> although it can support larger one.
> >>
> >> So, I propose this patch to use max_xfer_size in case it has valid value.
> >> It can support to use the larger chunk IO on SCSI device.
> >>
> >> For example,
> >> This patch is effective in case of some SCSI device like UFS
> >> with opt_xfer_size 512KB, queue depth 32 and max_xfer_size over 512KB.
> >>
> >> I expect both the performance improvement
> >> and the efficiency use of smaller command queue depth.
> 
> This can be measured, and this commit message should include results to show 
> how
> effective this change is.
> 
> >>
> >> Signed-off-by: Manjong Lee 
> >> ---
> >> drivers/scsi/sd.c | 56 +++
> >> 1 file changed, 52 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 679c2c025047..de59f01c1304 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -3108,6 +3108,53 @@ static void sd_read_security(struct scsi_disk 
> >> *sdkp, unsigned char *buffer)
> >> sdkp->security = 1;
> >> }
> >>
> >> +static bool sd_validate_max_xfer_size(struct scsi_disk *sdkp,
> >> +unsigned int dev_max)
> >> +{
> >> +  struct scsi_device *sdp = sdkp->device;
> >> +  unsigned int max_xfer_bytes =
> >> +  logical_to_bytes(sdp, sdkp->max_xfer_blocks);
> >> +
> >> +  if (sdkp->max_xfer_blocks == 0)
> >> +  return false;
> >> +
> >> +  if (sdkp->max_xfer_blocks > SD_MAX_XFER_BLOCKS) {
> >> +  sd_first_printk(KERN_WARNING, sdkp,
> >> +  "Maximal transfer size %u logical blocks " \
> >> +  "> sd driver limit (%u logical blocks)\n",
> >> +  sdkp->max_xfer_blocks, SD_DEF_XFER_BLOCKS);
> >> +  return false;
> >> +  }
> >> +
> >> +  if (sdkp->max_xfer_blocks > dev_max) {
> >> +  sd_first_printk(KERN_WARNING, sdkp,
> >> +  "Maximal transfer size %u logical blocks "
> >> +  "> dev_max (%u logical blocks)\n",
> >> +  sdkp->max_xfer_blocks, dev_max);
> >> +  return false;
> >> +  }
> >> +
> >> +  if (max_xfer_bytes < PAGE_SIZE) {
> >> +  sd_first_printk(KERN_WARNING, sdkp,
> >> +  "Maximal transfer size %u bytes < " \
> >> +  "PAGE_SIZE (%u bytes)\n",
> >> +  max_xfer_bytes, (unsigned int)PAGE_SIZE);
> >> +  return false;
> >> +  }
> >> +
> >> +  if (max_xfer_bytes & (sdkp->physical_block_size - 1)) {
> >> +  sd_first_printk(KERN_WARNING, sdkp,
> >> +  "Maximal transfer size %u bytes not a " \
> >> +  "multiple of physical block size (%u bytes)\n",
> >> +  max_xfer_bytes, sdkp->physical_block_size);
> >> +  return false;
> >> +  }
> >> +
> >> +  sd_first_printk(KERN_INFO, sdkp, "Maximal transfer size %u bytes\n",
> >> +  max_xfer_bytes);
> >> +  return true;
> >> +}
> 
> Except for the order of the comparisons against SD_MAX_XFER_BLOCKS and 
> dev_max,
> this function looks identical to sd_validate_opt_xfer_size(), modulo the use 
> of
> max_xfer_blocks instead of opt_xfer_blocks. Can't you turn this into 
> something like:
> 
> static bool sd_validate_max_xfer_size(struct scsi_disk *sdkp,
> const char *name,
> unsigned int xfer_blocks,
> unsigned int dev_max)
> 
> To allow checking both opt_xfer_blocks and max_xfer_blocks ?
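
A sketch of how the merged helper might look, folding the four checks of
the quoted function into one routine with the proposed signature
(illustrative only, not part of any posted patch):

static bool sd_validate_xfer_size(struct scsi_disk *sdkp, const char *name,
				  unsigned int xfer_blocks,
				  unsigned int dev_max)
{
	unsigned int xfer_bytes = logical_to_bytes(sdkp->device, xfer_blocks);

	if (xfer_blocks == 0)
		return false;

	if (xfer_blocks > SD_MAX_XFER_BLOCKS) {
		sd_first_printk(KERN_WARNING, sdkp,
				"%s transfer size %u logical blocks "
				"> sd driver limit (%u logical blocks)\n",
				name, xfer_blocks, SD_MAX_XFER_BLOCKS);
		return false;
	}

	if (xfer_blocks > dev_max) {
		sd_first_printk(KERN_WARNING, sdkp,
				"%s transfer size %u logical blocks "
				"> dev_max (%u logical blocks)\n",
				name, xfer_blocks, dev_max);
		return false;
	}

	if (xfer_bytes < PAGE_SIZE) {
		sd_first_printk(KERN_WARNING, sdkp,
				"%s transfer size %u bytes < "
				"PAGE_SIZE (%u bytes)\n",
				name, xfer_bytes, (unsigned int)PAGE_SIZE);
		return false;
	}

	if (xfer_bytes & (sdkp->physical_block_size - 1)) {
		sd_first_printk(KERN_WARNING, sdkp,
				"%s transfer size %u bytes not a "
				"multiple of physical block size (%u bytes)\n",
				name, xfer_bytes, sdkp->physical_block_size);
		return false;
	}

	return true;
}

Callers would then be sd_validate_xfer_size(sdkp, "Maximal",
sdkp->max_xfer_blocks, dev_max) and sd_validate_xfer_size(sdkp, "Optimal",
sdkp->opt_xfer_blocks, dev_max).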
> 
> >> +
> >> /*
> >> * Determine the device's preferred I/O size for reads and writes
> >> * unless the reported value is unreasonably small, large, not a
> >> @@ -3233,12 +3280,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
> >>
> >> /* Initial block count limit based on CDB TRANSFER LENGTH field size. */
> >> dev_max = sdp->use_16_for_rw ? SD_MAX_XFER_BLOCKS : SD_DEF_XFER_BLOCKS;
> 
> This looks weird: no indentation. Care to resend ?
> 
> >> -
> >> -  /* Some devices report a maximum block count for READ/WRITE requests. */
> >> -  dev_max = min_not_zero(dev_max, sdkp->max_xfer_blocks);
> >> q->limits.max_dev_sectors = logical_to_sectors(sdp, dev_max);
> >>
> >> -  if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
> >> +  if (sd_validate_max_xfer_size(sdkp, dev_max)) {
> >> +  q->limits.io_opt = 0;
> >> +  rw_max = logical_to_sectors(sdp, sdkp->max_xfer_blocks);
> >> +  q->limits.max_dev_sectors = rw_max;
> >> +  } else if (sd_validate_opt_xfer_size(sdkp, dev_max)) {
> 
> This does not look correct to me. This renders the device 

Re: [PATCH v2] bio: limit bio max size

2021-01-21 Thread Changheun Lee
> On Thu, 2021-01-21 at 18:36 +0900, Changheun Lee wrote:
> > > Please drop the "." at the end of the patch title.
> > > 
> > > > bio size can grow up to 4GB when muli-page bvec is enabled.
> > > > but sometimes it would lead to inefficient behaviors.
> > > > in case of large chunk direct I/O, - 32MB chunk read in user space -
> > > > all pages for 32MB would be merged to a bio structure if memory address
> > > > is
> > > > continued phsycally. it makes some delay to submit until merge complete.
> > > 
> > > s/if memory address is continued phsycally/if the pages physical addresses
> > > are
> > > contiguous/
> > > 
> > > > bio max size should be limited as a proper size.
> > > 
> > > s/as/to/
> > 
> > Thank you for advice. :)
> > 
> > > 
> > > > 
> > > > When 32MB chunk read with direct I/O option is coming from userspace,
> > > > kernel behavior is below now. it's timeline.
> > > > 
> > > >  | bio merge for 32MB. total 8,192 pages are merged.
> > > >  | total elapsed time is over 2ms.
> > > >  |---------------- ... --------------->|
> > > >                                        | 8,192 pages merged a bio.
> > > >                                        | at this time, first bio submit is done.
> > > >                                        | 1 bio is split to 32 read request and issue.
> > > >                                        |--->
> > > >                                         |--->
> > > >                                          |--->
> > > >                                          ..
> > > >                                                   |--->
> > > >                                                    |--->|
> > > >  total 19ms elapsed to complete 32MB read done from device. |
> > > > 
> > > > If bio max size is limited with 1MB, behavior is changed below.
> > > > 
> > > >  | bio merge for 1MB. 256 pages are merged for each bio.
> > > >  | total 32 bio will be made.
> > > >  | total elapsed time is over 2ms. it's same.
> > > >  | but, first bio submit timing is fast. about 100us.
> > > >  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
> > > >       | 256 pages merged a bio.
> > > >       | at this time, first bio submit is done.
> > > >       | and 1 read request is issued for 1 bio.
> > > >       |--->
> > > >            |--->
> > > >                 |--->
> > > >                 ..
> > > >                                           |--->
> > > >                                            |--->|
> > > >  total 17ms elapsed to complete 32MB read done from device. |
> > > > 
> > > > As a result, read request issue timing is faster if bio max size is
> > > > limited.
> > > > Current kernel behavior with multipage bvec, super large bio can be
> > > > created.
> > > > And it lead to delay first I/O request issue.
> > > > 
> > > > Signed-off-by: Changheun Lee 
> > > > 
> > > > ---
> > > >  block/bio.c               | 17 ++++++++++++++++-
> > > >  include/linux/bio.h       | 13 +++----------
> > > >  include/linux/blk_types.h |  1 +
> > > >  3 files changed, 20 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/block/bio.c b/block/bio.c
> > > > index 1f2cc1fbe283..027503c2e2e7 100644
> > > > --- a/block/bio.c
> > > > +++ b/block/bio.c
> > > > @@ -284,9 +284,24 @@ void bio_init(struct bio *bio, struct bio_vec
> > > > *table,
> > > >  
> > > > bio->bi_io_vec = table;
> > > > bio->bi_max_vecs = max_vecs;
> > > > +   bio->bi_max_size = UINT_MAX;
> > > >  }
> > > >  EXPORT_SYMBOL(bio_init);

Re: [PATCH v2] bio: limit bio max size

2021-01-21 Thread Changheun Lee
>Please drop the "." at the end of the patch title.
>
>> bio size can grow up to 4GB when muli-page bvec is enabled.
>> but sometimes it would lead to inefficient behaviors.
>> in case of large chunk direct I/O, - 32MB chunk read in user space -
>> all pages for 32MB would be merged to a bio structure if memory address is
>> continued phsycally. it makes some delay to submit until merge complete.
>
>s/if memory address is continued phsycally/if the pages physical addresses are
>contiguous/
>
>> bio max size should be limited as a proper size.
>
>s/as/to/

Thank you for advice. :)

>
>> 
>> When 32MB chunk read with direct I/O option is coming from userspace,
>> kernel behavior is below now. it's timeline.
>> 
>>  | bio merge for 32MB. total 8,192 pages are merged.
>>  | total elapsed time is over 2ms.
>>  |---------------- ... --------------->|
>>                                        | 8,192 pages merged a bio.
>>                                        | at this time, first bio submit is done.
>>                                        | 1 bio is split to 32 read request and issue.
>>                                        |--->
>>                                         |--->
>>                                          |--->
>>                                          ..
>>                                                   |--->
>>                                                    |--->|
>>  total 19ms elapsed to complete 32MB read done from device. |
>> 
>> If bio max size is limited with 1MB, behavior is changed below.
>> 
>>  | bio merge for 1MB. 256 pages are merged for each bio.
>>  | total 32 bio will be made.
>>  | total elapsed time is over 2ms. it's same.
>>  | but, first bio submit timing is fast. about 100us.
>>  |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
>>       | 256 pages merged a bio.
>>       | at this time, first bio submit is done.
>>       | and 1 read request is issued for 1 bio.
>>       |--->
>>            |--->
>>                 |--->
>>                 ..
>>                                           |--->
>>                                            |--->|
>>  total 17ms elapsed to complete 32MB read done from device. |
>> 
>> As a result, read request issue timing is faster if bio max size is limited.
>> Current kernel behavior with multipage bvec, super large bio can be created.
>> And it lead to delay first I/O request issue.
>> 
>> Signed-off-by: Changheun Lee 
>> 
>> ---
>>  block/bio.c               | 17 ++++++++++++++++-
>>  include/linux/bio.h       | 13 +++----------
>>  include/linux/blk_types.h |  1 +
>>  3 files changed, 20 insertions(+), 11 deletions(-)
>> 
>> diff --git a/block/bio.c b/block/bio.c
>> index 1f2cc1fbe283..027503c2e2e7 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -284,9 +284,24 @@ void bio_init(struct bio *bio, struct bio_vec *table,
>>  
>>  bio->bi_io_vec = table;
>>  bio->bi_max_vecs = max_vecs;
>> +bio->bi_max_size = UINT_MAX;
>>  }
>>  EXPORT_SYMBOL(bio_init);
>>  
>> +void bio_set_dev(struct bio *bio, struct block_device *bdev)
>> +{
>> +if (bio->bi_disk != bdev->bd_disk)
>> +bio_clear_flag(bio, BIO_THROTTLED);
>> +
>> +bio->bi_disk = bdev->bd_disk;
>> +bio->bi_partno = bdev->bd_partno;
>> +bio->bi_max_size = blk_queue_get_max_sectors(bio->bi_disk->queue,
>> +bio_op(bio)) << SECTOR_SHIFT;
>> +
>> +bio_associate_blkg(bio);
>> +}
>> +EXPORT_SYMBOL(bio_set_dev);
>> +
>>  /**
>>   * bio_reset - reinitialize a bio
>>   * @bio:bio to reset
>> @@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
>> *page,
>>  	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>  
>>  if (page_is_mergeable(bv, page, len, off, same_page)) {
>> -if (bio->bi_iter.bi_size > UINT_MAX - len) {
>> +		if (bio->bi_iter.bi_size > bio->bi_max_size - len) {

[PATCH v2] bio: limit bio max size.

2021-01-20 Thread Changheun Lee
bio size can grow up to 4GB when multi-page bvec is enabled,
but sometimes it leads to inefficient behavior.
In case of large chunk direct I/O - a 32MB chunk read in user space -
all pages for the 32MB would be merged into a single bio structure if the
pages' physical addresses are contiguous. This delays the submit until
the merge is complete. bio max size should be limited to a proper size.

When a 32MB chunk read with the direct I/O option comes from userspace,
current kernel behavior is shown in the timeline below.

 | bio merge for 32MB. total 8,192 pages are merged.
 | total elapsed time is over 2ms.
 |---------------- ... --------------->|
                                       | 8,192 pages merged a bio.
                                       | at this time, first bio submit is done.
                                       | 1 bio is split to 32 read request and issue.
                                       |--->
                                        |--->
                                         |--->
                                         ..
                                                  |--->
                                                   |--->|
 total 19ms elapsed to complete 32MB read done from device. |

If bio max size is limited with 1MB, behavior is changed below.

 | bio merge for 1MB. 256 pages are merged for each bio.
 | total 32 bio will be made.
 | total elapsed time is over 2ms. it's same.
 | but, first bio submit timing is fast. about 100us.
 |--->|--->|--->|---> ... -->|--->|--->|--->|--->|
      | 256 pages merged a bio.
      | at this time, first bio submit is done.
      | and 1 read request is issued for 1 bio.
      |--->
           |--->
                |--->
                ..
                                          |--->
                                           |--->|
 total 17ms elapsed to complete 32MB read done from device. |

As a result, read request issue timing is faster if the bio max size is
limited. With current kernel behavior and multi-page bvec, a super large
bio can be created, which delays the first I/O request issue.

Signed-off-by: Changheun Lee 

---
 block/bio.c               | 17 ++++++++++++++++-
 include/linux/bio.h       | 13 +++----------
 include/linux/blk_types.h |  1 +
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..027503c2e2e7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -284,9 +284,24 @@ void bio_init(struct bio *bio, struct bio_vec *table,
 
bio->bi_io_vec = table;
bio->bi_max_vecs = max_vecs;
+   bio->bi_max_size = UINT_MAX;
 }
 EXPORT_SYMBOL(bio_init);
 
+void bio_set_dev(struct bio *bio, struct block_device *bdev)
+{
+   if (bio->bi_disk != bdev->bd_disk)
+   bio_clear_flag(bio, BIO_THROTTLED);
+
+   bio->bi_disk = bdev->bd_disk;
+   bio->bi_partno = bdev->bd_partno;
+   bio->bi_max_size = blk_queue_get_max_sectors(bio->bi_disk->queue,
+   bio_op(bio)) << SECTOR_SHIFT;
+
+   bio_associate_blkg(bio);
+}
+EXPORT_SYMBOL(bio_set_dev);
+
 /**
  * bio_reset - reinitialize a bio
  * @bio:   bio to reset
@@ -877,7 +892,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
*page,
	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > bio->bi_max_size - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..b9803e80c259 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -113,7 +113,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > bio->bi_max_size - len)
return true;
 
return false;
@@ -482,20 +482,13 @@ extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned 
long *, mempool_t *);
 extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
 extern unsigned int bvec_nr_vecs(unsigned short idx);
 extern const char *bio_devname(struct bio *bio, char *buffer);
-
-#define bio_set_dev(bio, bdev) \
-do {   \
-   if ((bio)->bi_disk 

Re: [PATCH] bio: limit bio max size.

2021-01-13 Thread Changheun Lee
>On 2021/01/14 12:53, Ming Lei wrote:
>> On Wed, Jan 13, 2021 at 12:02:44PM +, Damien Le Moal wrote:
>>> On 2021/01/13 20:48, Ming Lei wrote:
>>>> On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote:
>>>>> On 2021/01/13 19:25, Ming Lei wrote:
>>>>>> On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote:
>>>>>>> On 2021/01/13 18:19, Ming Lei wrote:
>>>>>>>> On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee 
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>> On 2021/01/12 21:14, Changheun Lee wrote:
>>>>>>>>>>>> On 2021/01/12 17:52, Changheun Lee wrote:
>>>>>>>>>>>>> From: "Changheun Lee" 
>>>>>>>>>>>>>
>>>>>>>>>>>>> bio size can grow up to 4GB when muli-page bvec is enabled.
>>>>>>>>>>>>> but sometimes it would lead to inefficient behaviors.
>>>>>>>>>>>>> in case of large chunk direct I/O, - 64MB chunk read in user 
>>>>>>>>>>>>> space -
>>>>>>>>>>>>> all pages for 64MB would be merged to a bio structure if memory 
>>>>>>>>>>>>> address is
>>>>>>>>>>>>> continued phsycally. it makes some delay to submit until merge 
>>>>>>>>>>>>> complete.
>>>>>>>>>>>>> bio max size should be limited as a proper size.
>>>>>>>>>>>>
>>>>>>>>>>>> But merging physically contiguous pages into the same bvec + later 
>>>>>>>>>>>> automatic bio
>>>>>>>>>>>> split on submit should give you better throughput for large IOs 
>>>>>>>>>>>> compared to
>>>>>>>>>>>> having to issue a bio chain of smaller BIOs that are arbitrarily 
>>>>>>>>>>>> sized and will
>>>>>>>>>>>> likely need splitting anyway (because of DMA boundaries etc).
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have a specific case where you see higher performance with 
>>>>>>>>>>>> this patch
>>>>>>>>>>>> applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary 
>>>>>>>>>>>> and too small
>>>>>>>>>>>> considering that many hardware can execute larger IOs than that.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When I tested 32MB chunk read with O_DIRECT in android, all pages 
>>>>>>>>>>> of 32MB
>>>>>>>>>>> is merged into a bio structure.
>>>>>>>>>>> And elapsed time to merge complete was about 2ms.
>>>>>>>>>>> It means first bio-submit is after 2ms.
>>>>>>>>>>> If bio size is limited with 1MB with this patch, first bio-submit 
>>>>>>>>>>> is about
>>>>>>>>>>> 100us by bio_full operation.
>>>>>>>>>>
>>>>>>>>>> bio_submit() will split the large BIO case into multiple requests 
>>>>>>>>>> while the
>>>>>>>>>> small BIO case will likely result one or two requests only. That 
>>>>>>>>>> likely explain
>>>>>>>>>> the time difference here. However, for the large case, the 2ms will 
>>>>>>>>>> issue ALL
>>>>>>>>>> requests needed for processing the entire 32MB user IO while the 1MB 
>>>>>>>>>> bio case
>>>>>>>>>> will need 32 different bio_submit() calls. So what is the actual 
>>>>>>>>>> total latency
>>>>>>>>>> difference for the entire 32MB user IO ? That is I think what needs 
>>>>>>>>>> to be
>>>>>>>>>> compared here.
>>>>>>>>>>
>>>>>>>>>> Also, what is your device max_sectors_kb and max queue depth ?

Re: Re: [PATCH] bio: limit bio max size.

2021-01-12 Thread Changheun Lee
>On 2021/01/13 13:01, Changheun Lee wrote:
>>> On 2021/01/12 21:14, Changheun Lee wrote:
>>>>> On 2021/01/12 17:52, Changheun Lee wrote:
>>>>>> From: "Changheun Lee" 
>>>>>>
>>>>>> bio size can grow up to 4GB when muli-page bvec is enabled.
>>>>>> but sometimes it would lead to inefficient behaviors.
>>>>>> in case of large chunk direct I/O, - 64MB chunk read in user space -
>>>>>> all pages for 64MB would be merged to a bio structure if memory address 
>>>>>> is
>>>>>> continued phsycally. it makes some delay to submit until merge complete.
>>>>>> bio max size should be limited as a proper size.
>>>>>
>>>>> But merging physically contiguous pages into the same bvec + later 
>>>>> automatic bio
>>>>> split on submit should give you better throughput for large IOs compared 
>>>>> to
>>>>> having to issue a bio chain of smaller BIOs that are arbitrarily sized 
>>>>> and will
>>>>> likely need splitting anyway (because of DMA boundaries etc).
>>>>>
>>>>> Do you have a specific case where you see higher performance with this 
>>>>> patch
>>>>> applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary and 
>>>>> too small
>>>>> considering that many hardware can execute larger IOs than that.
>>>>>
>>>>
>>>> When I tested 32MB chunk read with O_DIRECT in android, all pages of 32MB
>>>> is merged into a bio structure.
>>>> And elapsed time to merge complete was about 2ms.
>>>> It means first bio-submit is after 2ms.
>>>> If bio size is limited with 1MB with this patch, first bio-submit is about
>>>> 100us by bio_full operation.
>>>
>>> bio_submit() will split the large BIO case into multiple requests while the
>>> small BIO case will likely result one or two requests only. That likely 
>>> explain
>>> the time difference here. However, for the large case, the 2ms will issue 
>>> ALL
>>> requests needed for processing the entire 32MB user IO while the 1MB bio 
>>> case
>>> will need 32 different bio_submit() calls. So what is the actual total 
>>> latency
>>> difference for the entire 32MB user IO ? That is I think what needs to be
>>> compared here.
>>>
>>> Also, what is your device max_sectors_kb and max queue depth ?
>>>
>> 
>> 32MB total latency is about 19ms including merge time without this patch.
>> But with this patch, total latency is about 17ms including merge time too.
>> Actually 32MB read time from device is same - about 16.7ms - in driver layer.
>> No need to hold more I/O than max_sectors_kb during bio merge.
>> My device is UFS. and max_sectors_kb is 1MB, queue depth is 32.
>
>This may be due to the CPU being slow: it takes time to build the 32MB BIO,
>during which the device is idling. With the 1MB BIO limit, the first BIO hits
>the drive much more quickly, reducing idle time of the device and leading to
>higher throughput. But there are 31 more BIOs to build and issue after the 
>first
>one... That does create a pipeline of requests keeping the device busy, but 
>that
>also likely keeps your CPU a lot more busy building these additional BIOs.
>Overall, do you see a difference in CPU load for the 32MB BIO case vs the 1MB
>max BIO case ? Any increase in CPU load with the lower BIO size limit ?
>

CPU load is higher than with the 32MB bio size, because bio submission
runs in parallel with the merging. But I tested at the same CPU operating
frequency, so there is no difference in CPU speed.
Logically, the bio max size is 4GB now. It's not realistic, I know, but
merging 4GB into one bio would take a long time even on a fast CPU - a
rough estimate below makes this concrete.
This is why I think the bio max size should be limited.
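
Assuming the merge cost grows linearly with the page count (2ms for 8,192
pages, as measured above):

	2ms / 8,192 pages      ~= 0.24us per merged page
	4GB / 4KB per page      = 1,048,576 pages
	1,048,576 * 0.24us     ~= 256ms of merging before the first bio submit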

>> 
>>>> It's not large delay and can't be observed with low speed device.
>>>> But it's needed to reduce merge delay for high speed device.
>>>> I improved 512MB sequential read performance from 1900MB/s to 2000MB/s
>>>> with this patch on android platform.
>>>> As you said, 1MB might be small for some device.
>>>> But method is needed to re-size, or select the bio max size.
>>>
>>> At the very least, I think that such limit should not be arbitrary as your 
>>> patch
>>> proposes but rely on the device characteristics (e.g.
>>> max_hw_sectors_kb/max_sectors_kb and queue depth).
>>>
>> 
>> 

Re: Re: [PATCH] bio: limit bio max size.

2021-01-12 Thread Changheun Lee
>On 2021/01/12 21:14, Changheun Lee wrote:
>>> On 2021/01/12 17:52, Changheun Lee wrote:
>>>> From: "Changheun Lee" 
>>>>
>>>> bio size can grow up to 4GB when muli-page bvec is enabled.
>>>> but sometimes it would lead to inefficient behaviors.
>>>> in case of large chunk direct I/O, - 64MB chunk read in user space -
>>>> all pages for 64MB would be merged to a bio structure if memory address is
>>>> continued phsycally. it makes some delay to submit until merge complete.
>>>> bio max size should be limited as a proper size.
>>>
>>> But merging physically contiguous pages into the same bvec + later 
>>> automatic bio
>>> split on submit should give you better throughput for large IOs compared to
>>> having to issue a bio chain of smaller BIOs that are arbitrarily sized and 
>>> will
>>> likely need splitting anyway (because of DMA boundaries etc).
>>>
>>> Do you have a specific case where you see higher performance with this patch
>>> applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary and too 
>>> small
>>> considering that many hardware can execute larger IOs than that.
>>>
>> 
>> When I tested 32MB chunk read with O_DIRECT in android, all pages of 32MB
>> is merged into a bio structure.
>> And elapsed time to merge complete was about 2ms.
>> It means first bio-submit is after 2ms.
>> If bio size is limited with 1MB with this patch, first bio-submit is about
>> 100us by bio_full operation.
>
>bio_submit() will split the large BIO case into multiple requests while the
>small BIO case will likely result one or two requests only. That likely explain
>the time difference here. However, for the large case, the 2ms will issue ALL
>requests needed for processing the entire 32MB user IO while the 1MB bio case
>will need 32 different bio_submit() calls. So what is the actual total latency
>difference for the entire 32MB user IO ? That is I think what needs to be
>compared here.
>
>Also, what is your device max_sectors_kb and max queue depth ?
>

Without this patch, total latency for the 32MB read is about 19ms including
merge time. With this patch, total latency is about 17ms, also including
merge time. The actual 32MB read time from the device is the same - about
16.7ms - at the driver layer. There is no need to hold more I/O than
max_sectors_kb during the bio merge.
My device is UFS; max_sectors_kb is 1MB and the queue depth is 32.
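
For reference, with these limits a 32MB user I/O maps exactly onto one full
device queue (simple arithmetic from the numbers above):

	32MB / 1MB (max_sectors_kb) = 32 requests = queue depth (32)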

>> It's not large delay and can't be observed with low speed device.
>> But it's needed to reduce merge delay for high speed device.
>> I improved 512MB sequential read performance from 1900MB/s to 2000MB/s
>> with this patch on android platform.
>> As you said, 1MB might be small for some device.
>> But method is needed to re-size, or select the bio max size.
>
>At the very least, I think that such limit should not be arbitrary as your 
>patch
>proposes but rely on the device characteristics (e.g.
>max_hw_sectors_kb/max_sectors_kb and queue depth).
>

I agree with your opinion; I had the same idea. That needs deeper
research - the proper timing to set the limit, bio structure modification,
and so on. The current patch is a simple one for a default bio max size.
Before multi-page bvec was applied, the bio max size was effectively 1MB
in kernel 4.x via BIO_MAX_PAGES.
So I think a 1MB bio max size is reasonable as a default (see the
arithmetic below).
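
For reference, the kernel 4.x limit follows from the bvec table size,
assuming 4KB pages:

	BIO_MAX_PAGES * PAGE_SIZE = 256 * 4KB = 1MB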

>> 
>>>
>>>>
>>>> Signed-off-by: Changheun Lee 
>>>> ---
>>>>  block/bio.c | 2 +-
>>>>  include/linux/bio.h | 3 ++-
>>>>  2 files changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/bio.c b/block/bio.c
>>>> index 1f2cc1fbe283..dbe14d675f28 100644
>>>> --- a/block/bio.c
>>>> +++ b/block/bio.c
>>>> @@ -877,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
>>>> *page,
>>>> 	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>>>  
>>>>if (page_is_mergeable(bv, page, len, off, same_page)) {
>>>> -  if (bio->bi_iter.bi_size > UINT_MAX - len) {
>>>> +  if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>>>>*same_page = false;
>>>>return false;
>>>>}
>>>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>>>> index 1edda614f7ce..0f49b354b1f6 100644
>>>> --- a/include/linux/bio.h
>>>> +++ b/include/linux/bio.h
>>>> @@ -20,6 +20,7 @@
>>>>  #endif

Re: Re: [PATCH] bio: limit bio max size.

2021-01-12 Thread Changheun Lee
>On 2021/01/12 17:52, Changheun Lee wrote:
>> From: "Changheun Lee" 
>> 
>> bio size can grow up to 4GB when muli-page bvec is enabled.
>> but sometimes it would lead to inefficient behaviors.
>> in case of large chunk direct I/O, - 64MB chunk read in user space -
>> all pages for 64MB would be merged to a bio structure if memory address is
>> continued phsycally. it makes some delay to submit until merge complete.
>> bio max size should be limited as a proper size.
>
>But merging physically contiguous pages into the same bvec + later automatic 
>bio
>split on submit should give you better throughput for large IOs compared to
>having to issue a bio chain of smaller BIOs that are arbitrarily sized and will
>likely need splitting anyway (because of DMA boundaries etc).
>
>Do you have a specific case where you see higher performance with this patch
>applied ? On Intel, BIO_MAX_SIZE would be 1MB... That is arbitrary and too 
>small
>considering that many hardware can execute larger IOs than that.
>

When I tested a 32MB chunk read with O_DIRECT on Android, all pages of the
32MB were merged into one bio structure,
and the elapsed time to complete the merge was about 2ms.
That means the first bio submit happens only after 2ms.
If the bio size is limited to 1MB with this patch, the first bio submit
happens after about 100us, triggered by the bio_full check.
It's not a large delay and can't be observed with a low speed device,
but reducing the merge delay is needed for high speed devices.
I improved 512MB sequential read performance from 1900MB/s to 2000MB/s
with this patch on an Android platform.
As you said, 1MB might be small for some devices,
but a method is needed to resize or select the bio max size.

>
>> 
>> Signed-off-by: Changheun Lee 
>> ---
>>  block/bio.c | 2 +-
>>  include/linux/bio.h | 3 ++-
>>  2 files changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/block/bio.c b/block/bio.c
>> index 1f2cc1fbe283..dbe14d675f28 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -877,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
>> *page,
>>  	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>  
>>  if (page_is_mergeable(bv, page, len, off, same_page)) {
>> -if (bio->bi_iter.bi_size > UINT_MAX - len) {
>> +if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>>  *same_page = false;
>>  return false;
>>  }
>> diff --git a/include/linux/bio.h b/include/linux/bio.h
>> index 1edda614f7ce..0f49b354b1f6 100644
>> --- a/include/linux/bio.h
>> +++ b/include/linux/bio.h
>> @@ -20,6 +20,7 @@
>>  #endif
>>  
>>  #define BIO_MAX_PAGES   256
>> +#define BIO_MAX_SIZE(BIO_MAX_PAGES * PAGE_SIZE)
>>  
>>  #define bio_prio(bio)   (bio)->bi_ioprio
>>  #define bio_set_prio(bio, prio) ((bio)->bi_ioprio = prio)
>> @@ -113,7 +114,7 @@ static inline bool bio_full(struct bio *bio, unsigned 
>> len)
>>  if (bio->bi_vcnt >= bio->bi_max_vecs)
>>  return true;
>>  
>> -if (bio->bi_iter.bi_size > UINT_MAX - len)
>> +if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len)
>>  return true;
>>  
>>  return false;
>> 
>
>
>-- 
>Damien Le Moal
>Western Digital Research


[PATCH] bio: limit bio max size.

2021-01-12 Thread Changheun Lee
From: "Changheun Lee" 

bio size can grow up to 4GB when multi-page bvec is enabled,
but sometimes it leads to inefficient behavior.
In case of large chunk direct I/O - a 64MB chunk read in user space -
all pages for the 64MB would be merged into a single bio structure if the
pages' physical addresses are contiguous. This delays the submit until
the merge is complete. bio max size should be limited to a proper size.

Signed-off-by: Changheun Lee 
---
 block/bio.c | 2 +-
 include/linux/bio.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1f2cc1fbe283..dbe14d675f28 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -877,7 +877,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page 
*page,
	struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
if (page_is_mergeable(bv, page, len, off, same_page)) {
-   if (bio->bi_iter.bi_size > UINT_MAX - len) {
+   if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
*same_page = false;
return false;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1edda614f7ce..0f49b354b1f6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -20,6 +20,7 @@
 #endif
 
 #define BIO_MAX_PAGES  256
+#define BIO_MAX_SIZE   (BIO_MAX_PAGES * PAGE_SIZE)
 
 #define bio_prio(bio)  (bio)->bi_ioprio
 #define bio_set_prio(bio, prio)((bio)->bi_ioprio = prio)
@@ -113,7 +114,7 @@ static inline bool bio_full(struct bio *bio, unsigned len)
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
 
-   if (bio->bi_iter.bi_size > UINT_MAX - len)
+   if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len)
return true;
 
return false;
-- 
2.28.0