Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-13 Thread Hannes Reinecke
On 08/08/2015 11:02 AM, Kent Overstreet wrote:
> On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
>> Wouldn't it be easier to move both max_write_same_sectors and
>> max_discard sectors to 64 bit (ie to type sector_t) and be done with the
>> overflow?
>> Seems to me this is far too much coding around self-imposed restrictions...
> 
> It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
> suppose wouldn't actually increase the size of struct bio (when sector_t is 64
> bits), since struct bvec_iter has padding right now...
> 
Which I guess we should be doing anyway.
Devices are getting larger and larger, so in the long run in need to
be moved to 64 bits.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-13 Thread Hannes Reinecke
On 08/08/2015 11:02 AM, Kent Overstreet wrote:
 On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
 Wouldn't it be easier to move both max_write_same_sectors and
 max_discard sectors to 64 bit (ie to type sector_t) and be done with the
 overflow?
 Seems to me this is far too much coding around self-imposed restrictions...
 
 It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I
 suppose wouldn't actually increase the size of struct bio (when sector_t is 64
 bits), since struct bvec_iter has padding right now...
 
Which I guess we should be doing anyway.
Devices are getting larger and larger, so in the long run in need to
be moved to 64 bits.

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> That is the benefit.  And when coupled with the new default
Mike> max_discard of 64K (pending change from Jens for 4.3) this 2GB
Mike> upper limit really isn't such a big deal.  Unless I'm missing
Mike> something...

2GB is fine for current SATA due to the stupid range descriptors. But
there are some changes in the pipeline to support descriptors with
bigger ranges so that will change.

However, we currently do 4GB discards on SCSI so the proposed cap will
cut that in half. I'm OK with that as an interim solution. though.

But I'm a bit concerned about what might be lurking in dm thinp if you
trip over partial blocks like in Ming's example...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-11 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike That is the benefit.  And when coupled with the new default
Mike max_discard of 64K (pending change from Jens for 4.3) this 2GB
Mike upper limit really isn't such a big deal.  Unless I'm missing
Mike something...

2GB is fine for current SATA due to the stupid range descriptors. But
there are some changes in the pipeline to support descriptors with
bigger ranges so that will change.

However, we currently do 4GB discards on SCSI so the proposed cap will
cut that in half. I'm OK with that as an interim solution. though.

But I'm a bit concerned about what might be lurking in dm thinp if you
trip over partial blocks like in Ming's example...

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> > Will change it to MAX_BIO_SECTORS.
> > May I add your ACK?
> 
> Yes, please go ahead.

Thanks. I'll send a new version of the series once device-mapper guy
acks.

Hi Mike,

I have updated my tree. Could you pull and re-test?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

The 2 thin-provisioning tests passed.
Hope I can have your ACK soon.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
> Will change it to MAX_BIO_SECTORS.
> May I add your ACK?

Yes, please go ahead.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote:
> On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> > +/*
> > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> > + * it is of the proper granularity as long as the granularity is a power
> > + * of two.
> > + */
> > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
> 
> Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
> to something like Kent's multipage biovecs we'll actually need it for
> regular read/write bios in addition to discard and write same.
> 
> Except for that the patch looks reasonable to me.

Will change it to MAX_BIO_SECTORS.
May I add your ACK?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
> +/*
> + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
> + * it is of the proper granularity as long as the granularity is a power
> + * of two.
> + */
> +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)

Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
to something like Kent's multipage biovecs we'll actually need it for
regular read/write bios in addition to discard and write same.

Except for that the patch looks reasonable to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote:
> > "Mike" == Mike Snitzer  writes:
> 
> Mike> This will translate to all intermediate layers that might split
> Mike> discards needing to worry about granularity/alignment too
> Mike> (e.g. how dm-thinp will have to care because it must generate
> Mike> discard mappings with associated bios based on how blocks were
> Mike> mapped to thinp).
> 
> The fundamental issue here is that alignment and granularity should
> never, ever have been enforced at the top of the stack. Horrendous idea
> from the very beginning.
> 
> For the < handful of braindead devices that get confused when you do
> partial or misaligned blocks we should have had a quirk that did any
> range adjusting at the bottom in sd_setup_discard_cmnd().
> 
> There's a reason I turned discard_zeroes_data off for UNMAP!
> 
> Wrt. the range size I don't have a problem with capping at the 32-bit
> bi_size limit. We probably don't want to send commands much bigger than
> that anyway.

How about below?

commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a
Author: Ming Lin 
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we set max discard sectors to (1<<31)>>9 to ensure
it doesn't overflow bi_size and hopefully it is of the proper
granularity as long as the granularity is a power of two.

Signed-off-by: Ming Lin 
---
 block/blk-lib.c | 47 +++
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U << 31) >> 9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q->limits.discard_granularity >> 9, 1U);
-   alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug();
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
+   req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects < nr_sects &&
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio->bi_iter.bi_sector = sector;

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote:
  Mike == Mike Snitzer snit...@redhat.com writes:
 
 Mike This will translate to all intermediate layers that might split
 Mike discards needing to worry about granularity/alignment too
 Mike (e.g. how dm-thinp will have to care because it must generate
 Mike discard mappings with associated bios based on how blocks were
 Mike mapped to thinp).
 
 The fundamental issue here is that alignment and granularity should
 never, ever have been enforced at the top of the stack. Horrendous idea
 from the very beginning.
 
 For the  handful of braindead devices that get confused when you do
 partial or misaligned blocks we should have had a quirk that did any
 range adjusting at the bottom in sd_setup_discard_cmnd().
 
 There's a reason I turned discard_zeroes_data off for UNMAP!
 
 Wrt. the range size I don't have a problem with capping at the 32-bit
 bi_size limit. We probably don't want to send commands much bigger than
 that anyway.

How about below?

commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a
Author: Ming Lin min...@ssi.samsung.com
Date:   Fri Aug 7 15:07:07 2015 -0700

block: remove split code in blkdev_issue_{discard,write_same}

The split code in blkdev_issue_{discard,write_same} can go away
now that any driver that cares does the split. We have to make
sure bio size doesn't overflow.

For discard, we set max discard sectors to (131)9 to ensure
it doesn't overflow bi_size and hopefully it is of the proper
granularity as long as the granularity is a power of two.

Signed-off-by: Ming Lin min...@ssi.samsung.com
---
 block/blk-lib.c | 47 +++
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS ((1U  31)  9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors, granularity;
-   int alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
 
-   /* Zero-sector (unknown) and one-sector granularities are the same.  */
-   granularity = max(q-limits.discard_granularity  9, 1U);
-   alignment = (bdev_discard_alignment(bdev)  9) % granularity;
-
-   /*
-* Ensure that max_discard_sectors is of the proper
-* granularity, so that requests stay aligned after a split.
-*/
-   max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
-   max_discard_sectors -= max_discard_sectors % granularity;
-   if (unlikely(!max_discard_sectors)) {
-   /* Avoid infinite loop below. Being cautious never hurts. */
-   return -EOPNOTSUPP;
-   }
-
if (flags  BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
blk_start_plug(plug);
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect, tmp;
+   sector_t end_sect;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
break;
}
 
-   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-   /*
-* If splitting a request, and the next starting sector would be
-* misaligned, stop the discard at the previous aligned sector.
-*/
+   req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
end_sect = sector + req_sects;
-   tmp = end_sect;
-   if (req_sects  nr_sects 
-   sector_div(tmp, granularity) != alignment) {
-   end_sect = end_sect - alignment;
-   sector_div(end_sect, granularity);
-   end_sect = end_sect * granularity + alignment;
-   req_sects = end_sect - sector;
-   }
 
bio-bi_iter.bi_sector = sector;
 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
 +/*
 + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
 + * it is of the proper granularity as long as the granularity is a power
 + * of two.
 + */
 +#define MAX_DISCARD_SECTORS ((1U  31)  9)

Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
to something like Kent's multipage biovecs we'll actually need it for
regular read/write bios in addition to discard and write same.

Except for that the patch looks reasonable to me.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote:
 On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote:
  +/*
  + * Ensure that max discard sectors doesn't overflow bi_size and hopefully
  + * it is of the proper granularity as long as the granularity is a power
  + * of two.
  + */
  +#define MAX_DISCARD_SECTORS ((1U  31)  9)
 
 Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS.  If we ever
 to something like Kent's multipage biovecs we'll actually need it for
 regular read/write bios in addition to discard and write same.
 
 Except for that the patch looks reasonable to me.

Will change it to MAX_BIO_SECTORS.
May I add your ACK?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Christoph Hellwig
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
 Will change it to MAX_BIO_SECTORS.
 May I add your ACK?

Yes, please go ahead.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-09 Thread Ming Lin
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote:
 On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote:
  Will change it to MAX_BIO_SECTORS.
  May I add your ACK?
 
 Yes, please go ahead.

Thanks. I'll send a new version of the series once device-mapper guy
acks.

Hi Mike,

I have updated my tree. Could you pull and re-test?
https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req

The 2 thin-provisioning tests passed.
Hope I can have your ACK soon.

Thanks.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike> This will translate to all intermediate layers that might split
Mike> discards needing to worry about granularity/alignment too
Mike> (e.g. how dm-thinp will have to care because it must generate
Mike> discard mappings with associated bios based on how blocks were
Mike> mapped to thinp).

The fundamental issue here is that alignment and granularity should
never, ever have been enforced at the top of the stack. Horrendous idea
from the very beginning.

For the < handful of braindead devices that get confused when you do
partial or misaligned blocks we should have had a quirk that did any
range adjusting at the bottom in sd_setup_discard_cmnd().

There's a reason I turned discard_zeroes_data off for UNMAP!

Wrt. the range size I don't have a problem with capping at the 32-bit
bi_size limit. We probably don't want to send commands much bigger than
that anyway.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Kent Overstreet
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
> Wouldn't it be easier to move both max_write_same_sectors and
> max_discard sectors to 64 bit (ie to type sector_t) and be done with the
> overflow?
> Seems to me this is far too much coding around self-imposed restrictions...

It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I
suppose wouldn't actually increase the size of struct bio (when sector_t is 64
bits), since struct bvec_iter has padding right now...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Hannes Reinecke
On 08/08/2015 01:40 AM, Ming Lin wrote:
> 
> On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
>> I'm for solution 3:
>>
>>  - keep blk_bio_{discard,write_same}_split, but ensure we never built
>>a > 4GB bio in blkdev_issue_{discard,write_same}.
> 
> This has problem as I mentioned in solution 1.
> We need to also make sure max discard size is of proper granularity.
> See below example.
> 
>   4G: 8388608 sectors
> UINT_MAX: 8388607 sectors
> 
> dm-thinp block size = default discard granularity = 128 sectors
> 
> blkdev_issue_discard(sector=0, nr_sectors=8388608)
> 
> 1. Only ensure bi_size not overflow
> 
> It doesn't work.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388606], then dm-thinp splits it to 2 bios
> [0, 8388479]
> [8388480, 8388606] ---> this has problem in process_discard_bio(),
> because the discard size(7 sectors) covers 
> less than a block(128 sectors)
> [8388607, 8388607] ---> same problem 
> 
> 2. Ensure bi_size not overflow and max discard size is of proper granularity
> 
> It works.
> 
> [start_sector, end_sector]
> [0, 8388607]
> [0, 8388479]
> [8388480, 8388607]
> 
> 
> So how about below patch?
> 
> commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
> Author: Ming Lin 
> Date:   Fri Aug 7 15:07:07 2015 -0700
> 
> block: remove split code in blkdev_issue_{discard,write_same}
> 
> The split code in blkdev_issue_{discard,write_same} can go away
> now that any driver that cares does the split. We have to make
> sure bio size doesn't overflow.
> 
> For discard, we ensure max_discard_sectors is of the proper
> granularity. So if discard size > 4G, blkdev_issue_discard() always
> send multiple granularity requests to lower level, except that the
> last one may be not multiple granularity.
> 
> Signed-off-by: Ming Lin 
> ---
>  block/blk-lib.c | 37 +
>  1 file changed, 9 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 7688ee3..e178a07 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   struct request_queue *q = bdev_get_queue(bdev);
>   int type = REQ_WRITE | REQ_DISCARD;
>   unsigned int max_discard_sectors, granularity;
> - int alignment;
>   struct bio_batch bb;
>   struct bio *bio;
>   int ret = 0;
> @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>  
>   /* Zero-sector (unknown) and one-sector granularities are the same.  */
>   granularity = max(q->limits.discard_granularity >> 9, 1U);
> - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
>  
>   /*
> -  * Ensure that max_discard_sectors is of the proper
> -  * granularity, so that requests stay aligned after a split.
> -  */
> - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> +  * Ensure that max_discard_sectors doesn't overflow bi_size and is of
> +  * the proper granularity. So if discard size > 4G, 
> blkdev_issue_discard()
> +  * always split and send multiple granularity requests to lower level,
> +  * except that the last one may be not multiple granularity.
> + */
> + max_discard_sectors = UINT_MAX >> 9;
>   max_discard_sectors -= max_discard_sectors % granularity;
> - if (unlikely(!max_discard_sectors)) {
> - /* Avoid infinite loop below. Being cautious never hurts. */
> - return -EOPNOTSUPP;
> - }
>  
>   if (flags & BLKDEV_DISCARD_SECURE) {
>   if (!blk_queue_secdiscard(q))
> @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   blk_start_plug();
>   while (nr_sects) {
>   unsigned int req_sects;
> - sector_t end_sect, tmp;
> + sector_t end_sect;
>  
>   bio = bio_alloc(gfp_mask, 1);
>   if (!bio) {
> @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>   }
>  
>   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
> -
> - /*
> -  * If splitting a request, and the next starting sector would be
> -  * misaligned, stop the discard at the previous aligned sector.
> -  */
>   end_sect = sector + req_sects;
> - tmp = end_sect;
> - if (req_sects < nr_sects &&
> - sector_div(tmp, granularity) != alignment) {
> - end_sect = end_sect - alignment;
> - sector_div(end_sect, granularity);
> - end_sect = end_sect * granularity + alignment;
> - req_sects = end_sect - sector;
> - }
>  
>   bio->bi_iter.bi_sector = sector;
> 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Hannes Reinecke
On 08/08/2015 01:40 AM, Ming Lin wrote:
 
 On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote:
 I'm for solution 3:

  - keep blk_bio_{discard,write_same}_split, but ensure we never built
a  4GB bio in blkdev_issue_{discard,write_same}.
 
 This has problem as I mentioned in solution 1.
 We need to also make sure max discard size is of proper granularity.
 See below example.
 
   4G: 8388608 sectors
 UINT_MAX: 8388607 sectors
 
 dm-thinp block size = default discard granularity = 128 sectors
 
 blkdev_issue_discard(sector=0, nr_sectors=8388608)
 
 1. Only ensure bi_size not overflow
 
 It doesn't work.
 
 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388606], then dm-thinp splits it to 2 bios
 [0, 8388479]
 [8388480, 8388606] --- this has problem in process_discard_bio(),
 because the discard size(7 sectors) covers 
 less than a block(128 sectors)
 [8388607, 8388607] --- same problem 
 
 2. Ensure bi_size not overflow and max discard size is of proper granularity
 
 It works.
 
 [start_sector, end_sector]
 [0, 8388607]
 [0, 8388479]
 [8388480, 8388607]
 
 
 So how about below patch?
 
 commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7
 Author: Ming Lin min...@ssi.samsung.com
 Date:   Fri Aug 7 15:07:07 2015 -0700
 
 block: remove split code in blkdev_issue_{discard,write_same}
 
 The split code in blkdev_issue_{discard,write_same} can go away
 now that any driver that cares does the split. We have to make
 sure bio size doesn't overflow.
 
 For discard, we ensure max_discard_sectors is of the proper
 granularity. So if discard size  4G, blkdev_issue_discard() always
 send multiple granularity requests to lower level, except that the
 last one may be not multiple granularity.
 
 Signed-off-by: Ming Lin min...@ssi.samsung.com
 ---
  block/blk-lib.c | 37 +
  1 file changed, 9 insertions(+), 28 deletions(-)
 
 diff --git a/block/blk-lib.c b/block/blk-lib.c
 index 7688ee3..e178a07 100644
 --- a/block/blk-lib.c
 +++ b/block/blk-lib.c
 @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   struct request_queue *q = bdev_get_queue(bdev);
   int type = REQ_WRITE | REQ_DISCARD;
   unsigned int max_discard_sectors, granularity;
 - int alignment;
   struct bio_batch bb;
   struct bio *bio;
   int ret = 0;
 @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
  
   /* Zero-sector (unknown) and one-sector granularities are the same.  */
   granularity = max(q-limits.discard_granularity  9, 1U);
 - alignment = (bdev_discard_alignment(bdev)  9) % granularity;
  
   /*
 -  * Ensure that max_discard_sectors is of the proper
 -  * granularity, so that requests stay aligned after a split.
 -  */
 - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX  9);
 +  * Ensure that max_discard_sectors doesn't overflow bi_size and is of
 +  * the proper granularity. So if discard size  4G, 
 blkdev_issue_discard()
 +  * always split and send multiple granularity requests to lower level,
 +  * except that the last one may be not multiple granularity.
 + */
 + max_discard_sectors = UINT_MAX  9;
   max_discard_sectors -= max_discard_sectors % granularity;
 - if (unlikely(!max_discard_sectors)) {
 - /* Avoid infinite loop below. Being cautious never hurts. */
 - return -EOPNOTSUPP;
 - }
  
   if (flags  BLKDEV_DISCARD_SECURE) {
   if (!blk_queue_secdiscard(q))
 @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   blk_start_plug(plug);
   while (nr_sects) {
   unsigned int req_sects;
 - sector_t end_sect, tmp;
 + sector_t end_sect;
  
   bio = bio_alloc(gfp_mask, 1);
   if (!bio) {
 @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
 sector_t sector,
   }
  
   req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
 -
 - /*
 -  * If splitting a request, and the next starting sector would be
 -  * misaligned, stop the discard at the previous aligned sector.
 -  */
   end_sect = sector + req_sects;
 - tmp = end_sect;
 - if (req_sects  nr_sects 
 - sector_div(tmp, granularity) != alignment) {
 - end_sect = end_sect - alignment;
 - sector_div(end_sect, granularity);
 - end_sect = end_sect * granularity + alignment;
 - req_sects = end_sect - sector;
 - }
  
   bio-bi_iter.bi_sector = sector;
   bio-bi_end_io = bio_batch_end_io;
 @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device 

Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Kent Overstreet
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote:
 Wouldn't it be easier to move both max_write_same_sectors and
 max_discard sectors to 64 bit (ie to type sector_t) and be done with the
 overflow?
 Seems to me this is far too much coding around self-imposed restrictions...

It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I
suppose wouldn't actually increase the size of struct bio (when sector_t is 64
bits), since struct bvec_iter has padding right now...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

2015-08-08 Thread Martin K. Petersen
 Mike == Mike Snitzer snit...@redhat.com writes:

Mike This will translate to all intermediate layers that might split
Mike discards needing to worry about granularity/alignment too
Mike (e.g. how dm-thinp will have to care because it must generate
Mike discard mappings with associated bios based on how blocks were
Mike mapped to thinp).

The fundamental issue here is that alignment and granularity should
never, ever have been enforced at the top of the stack. Horrendous idea
from the very beginning.

For the  handful of braindead devices that get confused when you do
partial or misaligned blocks we should have had a quirk that did any
range adjusting at the bottom in sd_setup_discard_cmnd().

There's a reason I turned discard_zeroes_data off for UNMAP!

Wrt. the range size I don't have a problem with capping at the 32-bit
bi_size limit. We probably don't want to send commands much bigger than
that anyway.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/