Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-09-15 Thread Christoph Hellwig
On Thu, Sep 10, 2020 at 01:15:41PM -0400, Mike Snitzer wrote:
> > I'll move it to blk_register_queue, which should work just fine.
> 
> That'll work for initial DM table load as part of DM device creation
> (dm_setup_md_queue).  But it won't account for DM table reloads that
> might change underlying devices on a live DM device (done using
> __bind).
> 
> Both dm_setup_md_queue() and __bind() call dm_table_set_restrictions()
> to set/update queue_limits.  It feels like __bind() will need to call a
> new block helper to set/update parts of queue_limits (e.g. ra_pages and
> io_pages).
> 
> Any chance you're open to factoring out that block function as an
> exported symbol for use by blk_register_queue() and code like DM's
> __bind()?

I agree with the problem statement.  OTOH, adding an exported helper
for two trivial assignments seems a little silly...

For now I'll just keep the open-coded ->io_pages assignment in
dm.  Note that dm doesn't currently update the ->ra_pages value
based on the underlying devices, so an incremental patch to do that
might be useful as well.
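
A minimal sketch of what that incremental dm patch might add, assuming
it sits next to the open-coded ->io_pages update at the tail of
dm_table_set_restrictions(), once the stacked limits are final (the
placement is an assumption for illustration, not actual dm-table.c
code):

	/* hypothetical: mirror the block layer's read-ahead policy on a
	 * dm device after its stacked limits have been computed */
	q->backing_dev_info->ra_pages =
		max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);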


Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-09-10 Thread Mike Snitzer
On Thu, Sep 10 2020 at  5:28am -0400,
Christoph Hellwig  wrote:

> On Wed, Sep 02, 2020 at 12:20:07PM -0400, Mike Snitzer wrote:
> > On Wed, Sep 02 2020 at 11:11am -0400,
> > Christoph Hellwig  wrote:
> > 
> > > On Wed, Aug 26, 2020 at 06:07:38PM -0400, Mike Snitzer wrote:
> > > > On Sun, Jul 26 2020 at 11:03am -0400,
> > > > Christoph Hellwig  wrote:
> > > > 
> > > > > Drivers shouldn't really mess with the readahead size, as that is a VM
> > > > > concept.  Instead set it based on the optimal I/O size by lifting the
> > > > > algorithm from the md driver when registering the disk.  Also set
> > > > > bdi->io_pages there as well by applying the same scheme based on
> > > > > max_sectors.
> > > > > 
> > > > > Signed-off-by: Christoph Hellwig 
> > > > > ---
> > > > >  block/blk-settings.c         |  5 ++---
> > > > >  block/blk-sysfs.c            |  1 -
> > > > >  block/genhd.c                | 13 +++++++++++--
> > > > >  drivers/block/aoe/aoeblk.c   |  2 --
> > > > >  drivers/block/drbd/drbd_nl.c | 12 +-----------
> > > > >  drivers/md/bcache/super.c    |  4 ----
> > > > >  drivers/md/dm-table.c        |  3 ---
> > > > >  drivers/md/raid0.c           | 16 ----------------
> > > > >  drivers/md/raid10.c          | 24 +-----------------------
> > > > >  drivers/md/raid5.c           | 13 +------------
> > > > >  10 files changed, 16 insertions(+), 77 deletions(-)
> > > > 
> > > > 
> > > > In general these changes need a solid audit relative to stacking
> > > > drivers -- that is, the limits stacking methods (blk_stack_limits)
> > > > vs the lower-level allocation methods (__device_add_disk).
> > > > 
> > > > You optimized for the low-level __device_add_disk establishing the
> > > > bdi's ra_pages and io_pages.  That happens at the beginning of disk
> > > > allocation, well before any build-up of a stacking driver's
> > > > queue_io_opt() -- which was previously done in disk_stack_limits or
> > > > driver-specific methods (e.g. dm_table_set_restrictions) that are
> > > > called _after_ all the limits stacking occurs.
> > > > 
> > > > Setting the bdi's ra_pages and io_pages that early in
> > > > __device_add_disk will break the proper setting of these values
> > > > for at least DM, afaict.
> > > 
> > > ra_pages never got inherited by stacking drivers; check it by modifying
> > > it on an underlying device and then creating a trivial dm or md one.
> > 
> > Sure, not saying that it did.  But if the goal is to set ra_pages based
> > on io_opt, then to do that correctly for stacking drivers it must be done
> > in terms of limits stacking, right?  Or at least done at a point after
> > the limits stacking has occurred?  So should DM just open-code setting
> > ra_pages like it did for io_pages?
> > 
> > Because setting ra_pages in __device_add_disk() is way too early for DM
> > -- given it uses device_add_disk_no_queue_reg via add_disk_no_queue_reg
> > at DM device creation (before stacking all underlying devices' limits).
> 
> I'll move it to blk_register_queue, which should work just fine.

That'll work for initial DM table load as part of DM device creation
(dm_setup_md_queue).  But it won't account for DM table reloads that
might change underlying devices on a live DM device (done using
__bind).

Both dm_setup_md_queue() and __bind() call dm_table_set_restrictions()
to set/update queue_limits.  It feels like __bind() will need to call a
new block helper to set/update parts of queue_limits (e.g. ra_pages and
io_pages).

Any chance you're open to factoring out that block function as an
exported symbol for use by blk_register_queue() and code like DM's
__bind()?

Thanks,
Mike
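
One possible shape for the helper being asked for here -- the name and
placement are illustrative, not anything in this series: a single
exported function that blk_register_queue() and DM's __bind() path
could both call once the queue_limits are final.

void blk_queue_update_readahead(struct request_queue *q)
{
	struct backing_dev_info *bdi = q->backing_dev_info;

	/* read-ahead: at least twice the optimal I/O size */
	bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
			    VM_READAHEAD_PAGES);
	/* io_pages: max_sectors, converted from 512-byte sectors to pages */
	bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
}
EXPORT_SYMBOL_GPL(blk_queue_update_readahead);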



Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-09-10 Thread Christoph Hellwig
On Wed, Sep 02, 2020 at 12:20:07PM -0400, Mike Snitzer wrote:
> On Wed, Sep 02 2020 at 11:11am -0400,
> Christoph Hellwig  wrote:
> 
> > On Wed, Aug 26, 2020 at 06:07:38PM -0400, Mike Snitzer wrote:
> > > On Sun, Jul 26 2020 at 11:03am -0400,
> > > Christoph Hellwig  wrote:
> > > 
> > > > Drivers shouldn't really mess with the readahead size, as that is a VM
> > > > concept.  Instead set it based on the optimal I/O size by lifting the
> > > > algorithm from the md driver when registering the disk.  Also set
> > > > bdi->io_pages there as well by applying the same scheme based on
> > > > max_sectors.
> > > > 
> > > > Signed-off-by: Christoph Hellwig 
> > > > ---
> > > >  block/blk-settings.c         |  5 ++---
> > > >  block/blk-sysfs.c            |  1 -
> > > >  block/genhd.c                | 13 +++++++++++--
> > > >  drivers/block/aoe/aoeblk.c   |  2 --
> > > >  drivers/block/drbd/drbd_nl.c | 12 +-----------
> > > >  drivers/md/bcache/super.c    |  4 ----
> > > >  drivers/md/dm-table.c        |  3 ---
> > > >  drivers/md/raid0.c           | 16 ----------------
> > > >  drivers/md/raid10.c          | 24 +-----------------------
> > > >  drivers/md/raid5.c           | 13 +------------
> > > >  10 files changed, 16 insertions(+), 77 deletions(-)
> > > 
> > > 
> > > In general these changes need a solid audit relative to stacking
> > > drivers -- that is, the limits stacking methods (blk_stack_limits)
> > > vs the lower-level allocation methods (__device_add_disk).
> > > 
> > > You optimized for the low-level __device_add_disk establishing the
> > > bdi's ra_pages and io_pages.  That happens at the beginning of disk
> > > allocation, well before any build-up of a stacking driver's
> > > queue_io_opt() -- which was previously done in disk_stack_limits or
> > > driver-specific methods (e.g. dm_table_set_restrictions) that are
> > > called _after_ all the limits stacking occurs.
> > > 
> > > Setting the bdi's ra_pages and io_pages that early in
> > > __device_add_disk will break the proper setting of these values
> > > for at least DM, afaict.
> > 
> > ra_pages never got inherited by stacking drivers; check it by modifying
> > it on an underlying device and then creating a trivial dm or md one.
> 
> Sure, not saying that it did.  But if the goal is to set ra_pages based
> on io_opt, then to do that correctly for stacking drivers it must be done
> in terms of limits stacking, right?  Or at least done at a point after
> the limits stacking has occurred?  So should DM just open-code setting
> ra_pages like it did for io_pages?
> 
> Because setting ra_pages in __device_add_disk() is way too early for DM
> -- given it uses device_add_disk_no_queue_reg via add_disk_no_queue_reg
> at DM device creation (before stacking all underlying devices' limits).

I'll move it to blk_register_queue, which should work just fine.


Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-09-02 Thread Mike Snitzer
On Wed, Sep 02 2020 at 11:11am -0400,
Christoph Hellwig  wrote:

> On Wed, Aug 26, 2020 at 06:07:38PM -0400, Mike Snitzer wrote:
> > On Sun, Jul 26 2020 at 11:03am -0400,
> > Christoph Hellwig  wrote:
> > 
> > > Drivers shouldn't really mess with the readahead size, as that is a VM
> > > concept.  Instead set it based on the optimal I/O size by lifting the
> > > algorithm from the md driver when registering the disk.  Also set
> > > bdi->io_pages there as well by applying the same scheme based on
> > > max_sectors.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  block/blk-settings.c         |  5 ++---
> > >  block/blk-sysfs.c            |  1 -
> > >  block/genhd.c                | 13 +++++++++++--
> > >  drivers/block/aoe/aoeblk.c   |  2 --
> > >  drivers/block/drbd/drbd_nl.c | 12 +-----------
> > >  drivers/md/bcache/super.c    |  4 ----
> > >  drivers/md/dm-table.c        |  3 ---
> > >  drivers/md/raid0.c           | 16 ----------------
> > >  drivers/md/raid10.c          | 24 +-----------------------
> > >  drivers/md/raid5.c           | 13 +------------
> > >  10 files changed, 16 insertions(+), 77 deletions(-)
> > 
> > 
> > In general these changes need a solid audit relative to stacking
> > drivers -- that is, the limits stacking methods (blk_stack_limits)
> > vs the lower-level allocation methods (__device_add_disk).
> > 
> > You optimized for the low-level __device_add_disk establishing the
> > bdi's ra_pages and io_pages.  That happens at the beginning of disk
> > allocation, well before any build-up of a stacking driver's
> > queue_io_opt() -- which was previously done in disk_stack_limits or
> > driver-specific methods (e.g. dm_table_set_restrictions) that are
> > called _after_ all the limits stacking occurs.
> > 
> > Setting the bdi's ra_pages and io_pages that early in
> > __device_add_disk will break the proper setting of these values
> > for at least DM, afaict.
> 
> ra_pages never got inherited by stacking drivers; check it by modifying
> it on an underlying device and then creating a trivial dm or md one.

Sure, not saying that it did.  But if the goal is to set ra_pages based
on io_opt, then to do that correctly for stacking drivers it must be done
in terms of limits stacking, right?  Or at least done at a point after
the limits stacking has occurred?  So should DM just open-code setting
ra_pages like it did for io_pages?

Because setting ra_pages in __device_add_disk() is way too early for DM
-- given it uses device_add_disk_no_queue_reg via add_disk_no_queue_reg
at DM device creation (before stacking all underlying devices' limits).

> And I think that is a good thing -- in general we shouldn't really mess
> with this from drivers if we can avoid it.  I've kept the legacy
> aoe and md parity raid cases, of which the first looks pretty weird
> and the md one at least remotely sensible.

I don't want drivers, like DM, to have to worry about these.  So I agree
with that goal ;)

> ->io_pages is still inherited in disk_stack_limits, just like before,
> so no change there either.

I'm missing where, but I've only looked closely at this 06/14 patch.
In it I see io_pages is no longer adjusted in disk_stack_limits().

Mike



Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-09-02 Thread Christoph Hellwig
On Wed, Aug 26, 2020 at 06:07:38PM -0400, Mike Snitzer wrote:
> On Sun, Jul 26 2020 at 11:03am -0400,
> Christoph Hellwig  wrote:
> 
> > Drivers shouldn't really mess with the readahead size, as that is a VM
> > concept.  Instead set it based on the optimal I/O size by lifting the
> > algorithm from the md driver when registering the disk.  Also set
> > bdi->io_pages there as well by applying the same scheme based on
> > max_sectors.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  block/blk-settings.c         |  5 ++---
> >  block/blk-sysfs.c            |  1 -
> >  block/genhd.c                | 13 +++++++++++--
> >  drivers/block/aoe/aoeblk.c   |  2 --
> >  drivers/block/drbd/drbd_nl.c | 12 +-----------
> >  drivers/md/bcache/super.c    |  4 ----
> >  drivers/md/dm-table.c        |  3 ---
> >  drivers/md/raid0.c           | 16 ----------------
> >  drivers/md/raid10.c          | 24 +-----------------------
> >  drivers/md/raid5.c           | 13 +------------
> >  10 files changed, 16 insertions(+), 77 deletions(-)
> 
> 
> In general these changes need a solid audit relative to stacking
> drivers -- that is, the limits stacking methods (blk_stack_limits)
> vs the lower-level allocation methods (__device_add_disk).
> 
> You optimized for the low-level __device_add_disk establishing the
> bdi's ra_pages and io_pages.  That happens at the beginning of disk
> allocation, well before any build-up of a stacking driver's
> queue_io_opt() -- which was previously done in disk_stack_limits or
> driver-specific methods (e.g. dm_table_set_restrictions) that are
> called _after_ all the limits stacking occurs.
> 
> Setting the bdi's ra_pages and io_pages that early in
> __device_add_disk will break the proper setting of these values
> for at least DM, afaict.

ra_pages never got inherited by stacking drivers; check it by modifying
it on an underlying device and then creating a trivial dm or md one.
And I think that is a good thing -- in general we shouldn't really mess
with this from drivers if we can avoid it.  I've kept the legacy
aoe and md parity raid cases, of which the first looks pretty weird
and the md one at least remotely sensible.

->io_pages is still inherited in disk_stack_limits, just like before,
so no change there either.


Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-08-26 Thread Mike Snitzer
On Sun, Jul 26 2020 at 11:03am -0400,
Christoph Hellwig  wrote:

> Drivers shouldn't really mess with the readahead size, as that is a VM
> concept.  Instead set it based on the optimal I/O size by lifting the
> algorithm from the md driver when registering the disk.  Also set
> bdi->io_pages there as well by applying the same scheme based on
> max_sectors.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/blk-settings.c         |  5 ++---
>  block/blk-sysfs.c            |  1 -
>  block/genhd.c                | 13 +++++++++++--
>  drivers/block/aoe/aoeblk.c   |  2 --
>  drivers/block/drbd/drbd_nl.c | 12 +-----------
>  drivers/md/bcache/super.c    |  4 ----
>  drivers/md/dm-table.c        |  3 ---
>  drivers/md/raid0.c           | 16 ----------------
>  drivers/md/raid10.c          | 24 +-----------------------
>  drivers/md/raid5.c           | 13 +------------
>  10 files changed, 16 insertions(+), 77 deletions(-)


In general these changes need a solid audit relative to stacking
drivers -- that is, the limits stacking methods (blk_stack_limits)
vs the lower-level allocation methods (__device_add_disk).

You optimized for the low-level __device_add_disk establishing the
bdi's ra_pages and io_pages.  That happens at the beginning of disk
allocation, well before any build-up of a stacking driver's
queue_io_opt() -- which was previously done in disk_stack_limits or
driver-specific methods (e.g. dm_table_set_restrictions) that are
called _after_ all the limits stacking occurs.

Setting the bdi's ra_pages and io_pages that early in
__device_add_disk will break the proper setting of these values
for at least DM, afaict.

Mike
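
To make the ordering problem concrete, here is a toy userspace model of
the patch's calculation (the 4 KiB page size and all names are local to
the demo; VM_READAHEAD_PAGES matches the kernel's 128 KiB default):

#include <stdio.h>

#define PAGE_SIZE          4096UL
#define VM_READAHEAD_PAGES (128 * 1024 / PAGE_SIZE)	/* 32 pages */

/* the patch's policy: read ahead at least twice the optimal I/O size */
static unsigned long ra_pages(unsigned long io_opt)
{
	unsigned long ra = io_opt * 2 / PAGE_SIZE;
	return ra > VM_READAHEAD_PAGES ? ra : VM_READAHEAD_PAGES;
}

int main(void)
{
	/* at __device_add_disk() time, DM has stacked nothing yet */
	printf("io_opt=0:     ra_pages=%lu\n", ra_pages(0));
	/* after blk_stack_limits() pulls io_opt=1 MiB from a member */
	printf("io_opt=1 MiB: ra_pages=%lu\n", ra_pages(1024 * 1024));
	return 0;
}

With io_opt still zero when __device_add_disk() runs, the dm device is
left at the 32-page (128 KiB) default; only after the limits stacking
would the intended 512-page (2 MiB) read-ahead be computed.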


> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 76a7e03bcd6cac..01049e9b998f1d 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
>  void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
>  {
>   blk_limits_io_opt(&q->limits, opt);
> + q->backing_dev_info->ra_pages =
> + max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
>  }
>  EXPORT_SYMBOL(blk_queue_io_opt);
>  
> @@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct 
> block_device *bdev,
>   printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
>  top, bottom);
>   }
> -
> - t->backing_dev_info->io_pages =
> - t->limits.max_sectors >> (PAGE_SHIFT - 9);
>  }
>  EXPORT_SYMBOL(disk_stack_limits);
>  
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 7dda709f3ccb6f..ce418d9128a0b2 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const 
> char *page, size_t count)
>  
>   spin_lock_irq(&q->queue_lock);
>   q->limits.max_sectors = max_sectors_kb << 1;
> - q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
>   spin_unlock_irq(&q->queue_lock);
>  
>   return ret;
> diff --git a/block/genhd.c b/block/genhd.c
> index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, 
> struct gendisk *disk,
> const struct attribute_group **groups,
> bool register_queue)
>  {
> + struct request_queue *q = disk->queue;
>   dev_t devt;
>   int retval;
>  
> @@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, 
> struct gendisk *disk,
>* registration.
>*/
>   if (register_queue)
> - elevator_init_mq(disk->queue);
> + elevator_init_mq(q);
>  
>   /* minors == 0 indicates to use ext devt from part0 and should
>* be accompanied with EXT_DEVT flag.  Make sure all
> @@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, 
> struct gendisk *disk,
>   disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
>   disk->flags |= GENHD_FL_NO_PART_SCAN;
>   } else {
> - struct backing_dev_info *bdi = disk->queue->backing_dev_info;
> + struct backing_dev_info *bdi = q->backing_dev_info;
>   struct device *dev = disk_to_dev(disk);
>   int ret;
>  
> + /*
> +  * For read-ahead of large files to be effective, we need to
> +  * readahead at least twice the optimal I/O size.
> +  */
> + bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
> + VM_READAHEAD_PAGES);
> + bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
> +
>   /* Register BDI before referencing it from bdev */
>   dev->devt = devt;
>   ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index 5ca7216e9e01f3..89b33b402b4e52

Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-28 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 


[PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-26 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c         |  5 ++---
 block/blk-sysfs.c            |  1 -
 block/genhd.c                | 13 +++++++++++--
 drivers/block/aoe/aoeblk.c   |  2 --
 drivers/block/drbd/drbd_nl.c | 12 +-----------
 drivers/md/bcache/super.c    |  4 ----
 drivers/md/dm-table.c        |  3 ---
 drivers/md/raid0.c           | 16 ----------------
 drivers/md/raid10.c          | 24 +-----------------------
 drivers/md/raid5.c           | 13 +------------
 10 files changed, 16 insertions(+), 77 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6cac..01049e9b998f1d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
   top, bottom);
}
-
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb6f..ce418d9128a0b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const char 
*page, size_t count)
 
spin_lock_irq(&q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
-   q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(&q->queue_lock);
 
return ret;
diff --git a/block/genhd.c b/block/genhd.c
index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
  const struct attribute_group **groups,
  bool register_queue)
 {
+   struct request_queue *q = disk->queue;
dev_t devt;
int retval;
 
@@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * registration.
 */
if (register_queue)
-   elevator_init_mq(disk->queue);
+   elevator_init_mq(q);
 
/* minors == 0 indicates to use ext devt from part0 and should
 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, 
struct gendisk *disk,
disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
-   struct backing_dev_info *bdi = disk->queue->backing_dev_info;
+   struct backing_dev_info *bdi = q->backing_dev_info;
struct device *dev = disk_to_dev(disk);
int ret;
 
+   /*
+* For read-ahead of large files to be effective, we need to
+* readahead at least twice the optimal I/O size.
+*/
+   bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
+   VM_READAHEAD_PAGES);
+   bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..89b33b402b4e52 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 650372ee2c7822..212bf711fb6b41 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1360,18 +1360,8 @@ static void drbd_setup_queue_param(struct drbd_device 

[PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-24 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c         |  5 ++---
 block/blk-sysfs.c            |  1 -
 block/genhd.c                | 13 +++++++++++--
 drivers/block/aoe/aoeblk.c   |  2 --
 drivers/block/drbd/drbd_nl.c | 12 +-----------
 drivers/md/bcache/super.c    |  4 ----
 drivers/md/dm-table.c        |  3 ---
 drivers/md/raid0.c           | 16 ----------------
 drivers/md/raid10.c          | 24 +-----------------------
 drivers/md/raid5.c           | 13 +------------
 10 files changed, 16 insertions(+), 77 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6cac..01049e9b998f1d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
   top, bottom);
}
-
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb6f..ce418d9128a0b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const char 
*page, size_t count)
 
spin_lock_irq(&q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
-   q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(&q->queue_lock);
 
return ret;
diff --git a/block/genhd.c b/block/genhd.c
index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
  const struct attribute_group **groups,
  bool register_queue)
 {
+   struct request_queue *q = disk->queue;
dev_t devt;
int retval;
 
@@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * registration.
 */
if (register_queue)
-   elevator_init_mq(disk->queue);
+   elevator_init_mq(q);
 
/* minors == 0 indicates to use ext devt from part0 and should
 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, 
struct gendisk *disk,
disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
-   struct backing_dev_info *bdi = disk->queue->backing_dev_info;
+   struct backing_dev_info *bdi = q->backing_dev_info;
struct device *dev = disk_to_dev(disk);
int ret;
 
+   /*
+* For read-ahead of large files to be effective, we need to
+* readahead at least twice the optimal I/O size.
+*/
+   bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
+   VM_READAHEAD_PAGES);
+   bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..89b33b402b4e52 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 650372ee2c7822..212bf711fb6b41 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1360,18 +1360,8 @@ static void drbd_setup_queue_param(struct drbd_device 

Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-22 Thread Christoph Hellwig
On Wed, Jul 22, 2020 at 07:13:54AM +, Johannes Thumshirn wrote:
> On 22/07/2020 08:27, Christoph Hellwig wrote:
> > +   q->backing_dev_info->ra_pages =
> > +   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
> 
> Dumb question, but wouldn't a '>> PAGE_SHIFT' be better than a potentially
> costly division?
> 
> Or don't we care at all, as it's a) not in the fast path and b) compilers
> can optimize it to a shift?

That's my thinking.  If anyone has a strong preference I can change
it.
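
For what it's worth, the two forms are interchangeable here: PAGE_SIZE
is a compile-time power of two, so the compiler lowers the division to
a shift on its own.  A standalone check (userspace C, with PAGE_SHIFT
and PAGE_SIZE redefined locally just for the demo):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

int main(void)
{
	/* unsigned division by a power of two is exactly a right shift,
	 * so gcc emits the same shr for both (compare with -O2 -S) */
	for (unsigned long x = 0; x < (1UL << 20); x++)
		assert(x / PAGE_SIZE == x >> PAGE_SHIFT);
	printf("x / PAGE_SIZE == x >> PAGE_SHIFT for all tested x\n");
	return 0;
}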


Re: [PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-22 Thread Johannes Thumshirn
On 22/07/2020 08:27, Christoph Hellwig wrote:
> + q->backing_dev_info->ra_pages =
> + max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);

Dumb question, but wouldn't a '>> PAGE_SHIFT' be better than a potentially
costly division?

Or don't we care at all, as it's a) not in the fast path and b) compilers
can optimize it to a shift?


[PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-21 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c         |  5 ++---
 block/blk-sysfs.c            |  1 -
 block/genhd.c                | 13 +++++++++++--
 drivers/block/aoe/aoeblk.c   |  2 --
 drivers/block/drbd/drbd_nl.c | 12 +-----------
 drivers/md/bcache/super.c    |  4 ----
 drivers/md/dm-table.c        |  3 ---
 drivers/md/raid0.c           | 16 ----------------
 drivers/md/raid10.c          | 24 +-----------------------
 drivers/md/raid5.c           | 13 +------------
 10 files changed, 16 insertions(+), 77 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6cac..01049e9b998f1d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
   top, bottom);
}
-
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7dda709f3ccb6f..ce418d9128a0b2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const char 
*page, size_t count)
 
spin_lock_irq(&q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
-   q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(&q->queue_lock);
 
return ret;
diff --git a/block/genhd.c b/block/genhd.c
index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
  const struct attribute_group **groups,
  bool register_queue)
 {
+   struct request_queue *q = disk->queue;
dev_t devt;
int retval;
 
@@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * registration.
 */
if (register_queue)
-   elevator_init_mq(disk->queue);
+   elevator_init_mq(q);
 
/* minors == 0 indicates to use ext devt from part0 and should
 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, 
struct gendisk *disk,
disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
-   struct backing_dev_info *bdi = disk->queue->backing_dev_info;
+   struct backing_dev_info *bdi = q->backing_dev_info;
struct device *dev = disk_to_dev(disk);
int ret;
 
+   /*
+* For read-ahead of large files to be effective, we need to
+* readahead at least twice the optimal I/O size.
+*/
+   bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
+   VM_READAHEAD_PAGES);
+   bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..89b33b402b4e52 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 650372ee2c7822..212bf711fb6b41 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1360,18 +1360,8 @@ static void drbd_setup_queue_param(struct drbd_device 

[PATCH 06/14] block: lift setting the readahead size into the block layer

2020-07-20 Thread Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept.  Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk.  Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors.

Signed-off-by: Christoph Hellwig 
---
 block/blk-settings.c         |  5 ++---
 block/blk-sysfs.c            |  1 -
 block/genhd.c                | 13 +++++++++++--
 drivers/block/aoe/aoeblk.c   |  2 --
 drivers/block/drbd/drbd_nl.c | 12 +-----------
 drivers/md/bcache/super.c    |  4 ----
 drivers/md/dm-table.c        |  3 ---
 drivers/md/raid0.c           | 16 ----------------
 drivers/md/raid10.c          | 24 +-----------------------
 drivers/md/raid5.c           | 13 +------------
 10 files changed, 16 insertions(+), 77 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 76a7e03bcd6cac..01049e9b998f1d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -452,6 +452,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
 void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
 {
blk_limits_io_opt(&q->limits, opt);
+   q->backing_dev_info->ra_pages =
+   max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
 }
 EXPORT_SYMBOL(blk_queue_io_opt);
 
@@ -628,9 +630,6 @@ void disk_stack_limits(struct gendisk *disk, struct 
block_device *bdev,
printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
   top, bottom);
}
-
-   t->backing_dev_info->io_pages =
-   t->limits.max_sectors >> (PAGE_SHIFT - 9);
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index be67952e7be229..d7cf560efa92e5 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,7 +245,6 @@ queue_max_sectors_store(struct request_queue *q, const char 
*page, size_t count)
 
spin_lock_irq(&q->queue_lock);
q->limits.max_sectors = max_sectors_kb << 1;
-   q->backing_dev_info->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
spin_unlock_irq(&q->queue_lock);
 
return ret;
diff --git a/block/genhd.c b/block/genhd.c
index 8b1e9f48957cb5..097d4e4bc0b8a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -775,6 +775,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
  const struct attribute_group **groups,
  bool register_queue)
 {
+   struct request_queue *q = disk->queue;
dev_t devt;
int retval;
 
@@ -785,7 +786,7 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
 * registration.
 */
if (register_queue)
-   elevator_init_mq(disk->queue);
+   elevator_init_mq(q);
 
/* minors == 0 indicates to use ext devt from part0 and should
 * be accompanied with EXT_DEVT flag.  Make sure all
@@ -815,10 +816,18 @@ static void __device_add_disk(struct device *parent, 
struct gendisk *disk,
disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
disk->flags |= GENHD_FL_NO_PART_SCAN;
} else {
-   struct backing_dev_info *bdi = disk->queue->backing_dev_info;
+   struct backing_dev_info *bdi = q->backing_dev_info;
struct device *dev = disk_to_dev(disk);
int ret;
 
+   /*
+* For read-ahead of large files to be effective, we need to
+* readahead at least twice the optimal I/O size.
+*/
+   bdi->ra_pages = max(queue_io_opt(q) * 2 / PAGE_SIZE,
+   VM_READAHEAD_PAGES);
+   bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..89b33b402b4e52 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
-   enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,6 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
-   q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 650372ee2c7822..212bf711fb6b41 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1360,18 +1360,8 @@ static void drbd_setup_queue_param(struct drbd_device