Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 11:02 AM, Kent Overstreet wrote: > On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: >> Wouldn't it be easier to move both max_write_same_sectors and >> max_discard sectors to 64 bit (ie to type sector_t) and be done with the >> overflow? >> Seems to me this is far too much coding around self-imposed restrictions... > > It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I > suppose wouldn't actually increase the size of struct bio (when sector_t is 64 > bits), since struct bvec_iter has padding right now... > Which I guess we should be doing anyway. Devices are getting larger and larger, so in the long run in need to be moved to 64 bits. Cheers, Hannes -- Dr. Hannes ReineckezSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 11:02 AM, Kent Overstreet wrote: On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: Wouldn't it be easier to move both max_write_same_sectors and max_discard sectors to 64 bit (ie to type sector_t) and be done with the overflow? Seems to me this is far too much coding around self-imposed restrictions... It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... Which I guess we should be doing anyway. Devices are getting larger and larger, so in the long run in need to be moved to 64 bits. Cheers, Hannes -- Dr. Hannes ReineckezSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> That is the benefit. And when coupled with the new default Mike> max_discard of 64K (pending change from Jens for 4.3) this 2GB Mike> upper limit really isn't such a big deal. Unless I'm missing Mike> something... 2GB is fine for current SATA due to the stupid range descriptors. But there are some changes in the pipeline to support descriptors with bigger ranges so that will change. However, we currently do 4GB discards on SCSI so the proposed cap will cut that in half. I'm OK with that as an interim solution. though. But I'm a bit concerned about what might be lurking in dm thinp if you trip over partial blocks like in Ming's example... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike That is the benefit. And when coupled with the new default Mike max_discard of 64K (pending change from Jens for 4.3) this 2GB Mike upper limit really isn't such a big deal. Unless I'm missing Mike something... 2GB is fine for current SATA due to the stupid range descriptors. But there are some changes in the pipeline to support descriptors with bigger ranges so that will change. However, we currently do 4GB discards on SCSI so the proposed cap will cut that in half. I'm OK with that as an interim solution. though. But I'm a bit concerned about what might be lurking in dm thinp if you trip over partial blocks like in Ming's example... -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: > On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > > Will change it to MAX_BIO_SECTORS. > > May I add your ACK? > > Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. Hope I can have your ACK soon. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: > Will change it to MAX_BIO_SECTORS. > May I add your ACK? Yes, please go ahead. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote: > On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: > > +/* > > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully > > + * it is of the proper granularity as long as the granularity is a power > > + * of two. > > + */ > > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) > > Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever > to something like Kent's multipage biovecs we'll actually need it for > regular read/write bios in addition to discard and write same. > > Except for that the patch looks reasonable to me. Will change it to MAX_BIO_SECTORS. May I add your ACK? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: > +/* > + * Ensure that max discard sectors doesn't overflow bi_size and hopefully > + * it is of the proper granularity as long as the granularity is a power > + * of two. > + */ > +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote: > > "Mike" == Mike Snitzer writes: > > Mike> This will translate to all intermediate layers that might split > Mike> discards needing to worry about granularity/alignment too > Mike> (e.g. how dm-thinp will have to care because it must generate > Mike> discard mappings with associated bios based on how blocks were > Mike> mapped to thinp). > > The fundamental issue here is that alignment and granularity should > never, ever have been enforced at the top of the stack. Horrendous idea > from the very beginning. > > For the < handful of braindead devices that get confused when you do > partial or misaligned blocks we should have had a quirk that did any > range adjusting at the bottom in sd_setup_discard_cmnd(). > > There's a reason I turned discard_zeroes_data off for UNMAP! > > Wrt. the range size I don't have a problem with capping at the 32-bit > bi_size limit. We probably don't want to send commands much bigger than > that anyway. How about below? commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a Author: Ming Lin Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we set max discard sectors to (1<<31)>>9 to ensure it doesn't overflow bi_size and hopefully it is of the proper granularity as long as the granularity is a power of two. Signed-off-by: Ming Lin --- block/blk-lib.c | 47 +++ 1 file changed, 11 insertions(+), 36 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..4859e4b 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err) bio_put(bio); } +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U << 31) >> 9) + /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; - unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, if (!blk_queue_discard(q)) return -EOPNOTSUPP; - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q->limits.discard_granularity >> 9, 1U); - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; - - /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); - max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } - if (flags & BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) return -EOPNOTSUPP; @@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, break; } - req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ + req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects < nr_sects && - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio->bi_iter.bi_sector = sector;
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote: Mike == Mike Snitzer snit...@redhat.com writes: Mike This will translate to all intermediate layers that might split Mike discards needing to worry about granularity/alignment too Mike (e.g. how dm-thinp will have to care because it must generate Mike discard mappings with associated bios based on how blocks were Mike mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. How about below? commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a Author: Ming Lin min...@ssi.samsung.com Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we set max discard sectors to (131)9 to ensure it doesn't overflow bi_size and hopefully it is of the proper granularity as long as the granularity is a power of two. Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-lib.c | 47 +++ 1 file changed, 11 insertions(+), 36 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..4859e4b 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err) bio_put(bio); } +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) + /** * blkdev_issue_discard - queue a discard * @bdev: blockdev to issue discard for @@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, DECLARE_COMPLETION_ONSTACK(wait); struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; - unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, if (!blk_queue_discard(q)) return -EOPNOTSUPP; - /* Zero-sector (unknown) and one-sector granularities are the same. */ - granularity = max(q-limits.discard_granularity 9, 1U); - alignment = (bdev_discard_alignment(bdev) 9) % granularity; - - /* -* Ensure that max_discard_sectors is of the proper -* granularity, so that requests stay aligned after a split. -*/ - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); - max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } - if (flags BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) return -EOPNOTSUPP; @@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, break; } - req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* -* If splitting a request, and the next starting sector would be -* misaligned, stop the discard at the previous aligned sector. -*/ + req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS); end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects nr_sects - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio-bi_iter.bi_sector = sector;
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, 2015-08-08 at 23:41 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 10:59:50PM -0700, Ming Lin wrote: +/* + * Ensure that max discard sectors doesn't overflow bi_size and hopefully + * it is of the proper granularity as long as the granularity is a power + * of two. + */ +#define MAX_DISCARD_SECTORS ((1U 31) 9) Thisn't isn't MAX_DISCARD_SECTORS, it's MAX_BIO_SECTORS. If we ever to something like Kent's multipage biovecs we'll actually need it for regular read/write bios in addition to discard and write same. Except for that the patch looks reasonable to me. Will change it to MAX_BIO_SECTORS. May I add your ACK? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sun, 2015-08-09 at 00:01 -0700, Christoph Hellwig wrote: On Sat, Aug 08, 2015 at 11:55:47PM -0700, Ming Lin wrote: Will change it to MAX_BIO_SECTORS. May I add your ACK? Yes, please go ahead. Thanks. I'll send a new version of the series once device-mapper guy acks. Hi Mike, I have updated my tree. Could you pull and re-test? https://git.kernel.org/cgit/linux/kernel/git/mlin/linux.git/log/?h=block-generic-req The 2 thin-provisioning tests passed. Hope I can have your ACK soon. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
> "Mike" == Mike Snitzer writes: Mike> This will translate to all intermediate layers that might split Mike> discards needing to worry about granularity/alignment too Mike> (e.g. how dm-thinp will have to care because it must generate Mike> discard mappings with associated bios based on how blocks were Mike> mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the < handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: > Wouldn't it be easier to move both max_write_same_sectors and > max_discard sectors to 64 bit (ie to type sector_t) and be done with the > overflow? > Seems to me this is far too much coding around self-imposed restrictions... It's bio->bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 01:40 AM, Ming Lin wrote: > > On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: >> I'm for solution 3: >> >> - keep blk_bio_{discard,write_same}_split, but ensure we never built >>a > 4GB bio in blkdev_issue_{discard,write_same}. > > This has problem as I mentioned in solution 1. > We need to also make sure max discard size is of proper granularity. > See below example. > > 4G: 8388608 sectors > UINT_MAX: 8388607 sectors > > dm-thinp block size = default discard granularity = 128 sectors > > blkdev_issue_discard(sector=0, nr_sectors=8388608) > > 1. Only ensure bi_size not overflow > > It doesn't work. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388606], then dm-thinp splits it to 2 bios > [0, 8388479] > [8388480, 8388606] ---> this has problem in process_discard_bio(), > because the discard size(7 sectors) covers > less than a block(128 sectors) > [8388607, 8388607] ---> same problem > > 2. Ensure bi_size not overflow and max discard size is of proper granularity > > It works. > > [start_sector, end_sector] > [0, 8388607] > [0, 8388479] > [8388480, 8388607] > > > So how about below patch? > > commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 > Author: Ming Lin > Date: Fri Aug 7 15:07:07 2015 -0700 > > block: remove split code in blkdev_issue_{discard,write_same} > > The split code in blkdev_issue_{discard,write_same} can go away > now that any driver that cares does the split. We have to make > sure bio size doesn't overflow. > > For discard, we ensure max_discard_sectors is of the proper > granularity. So if discard size > 4G, blkdev_issue_discard() always > send multiple granularity requests to lower level, except that the > last one may be not multiple granularity. > > Signed-off-by: Ming Lin > --- > block/blk-lib.c | 37 + > 1 file changed, 9 insertions(+), 28 deletions(-) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 7688ee3..e178a07 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > struct request_queue *q = bdev_get_queue(bdev); > int type = REQ_WRITE | REQ_DISCARD; > unsigned int max_discard_sectors, granularity; > - int alignment; > struct bio_batch bb; > struct bio *bio; > int ret = 0; > @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > > /* Zero-sector (unknown) and one-sector granularities are the same. */ > granularity = max(q->limits.discard_granularity >> 9, 1U); > - alignment = (bdev_discard_alignment(bdev) >> 9) % granularity; > > /* > - * Ensure that max_discard_sectors is of the proper > - * granularity, so that requests stay aligned after a split. > - */ > - max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9); > + * Ensure that max_discard_sectors doesn't overflow bi_size and is of > + * the proper granularity. So if discard size > 4G, > blkdev_issue_discard() > + * always split and send multiple granularity requests to lower level, > + * except that the last one may be not multiple granularity. > + */ > + max_discard_sectors = UINT_MAX >> 9; > max_discard_sectors -= max_discard_sectors % granularity; > - if (unlikely(!max_discard_sectors)) { > - /* Avoid infinite loop below. Being cautious never hurts. */ > - return -EOPNOTSUPP; > - } > > if (flags & BLKDEV_DISCARD_SECURE) { > if (!blk_queue_secdiscard(q)) > @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > blk_start_plug(); > while (nr_sects) { > unsigned int req_sects; > - sector_t end_sect, tmp; > + sector_t end_sect; > > bio = bio_alloc(gfp_mask, 1); > if (!bio) { > @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, > sector_t sector, > } > > req_sects = min_t(sector_t, nr_sects, max_discard_sectors); > - > - /* > - * If splitting a request, and the next starting sector would be > - * misaligned, stop the discard at the previous aligned sector. > - */ > end_sect = sector + req_sects; > - tmp = end_sect; > - if (req_sects < nr_sects && > - sector_div(tmp, granularity) != alignment) { > - end_sect = end_sect - alignment; > - sector_div(end_sect, granularity); > - end_sect = end_sect * granularity + alignment; > - req_sects = end_sect - sector; > - } > > bio->bi_iter.bi_sector = sector; >
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On 08/08/2015 01:40 AM, Ming Lin wrote: On Fri, 2015-08-07 at 09:30 +0200, Christoph Hellwig wrote: I'm for solution 3: - keep blk_bio_{discard,write_same}_split, but ensure we never built a 4GB bio in blkdev_issue_{discard,write_same}. This has problem as I mentioned in solution 1. We need to also make sure max discard size is of proper granularity. See below example. 4G: 8388608 sectors UINT_MAX: 8388607 sectors dm-thinp block size = default discard granularity = 128 sectors blkdev_issue_discard(sector=0, nr_sectors=8388608) 1. Only ensure bi_size not overflow It doesn't work. [start_sector, end_sector] [0, 8388607] [0, 8388606], then dm-thinp splits it to 2 bios [0, 8388479] [8388480, 8388606] --- this has problem in process_discard_bio(), because the discard size(7 sectors) covers less than a block(128 sectors) [8388607, 8388607] --- same problem 2. Ensure bi_size not overflow and max discard size is of proper granularity It works. [start_sector, end_sector] [0, 8388607] [0, 8388479] [8388480, 8388607] So how about below patch? commit 1ca2ad977255efb3c339f4ca16fb798ed5ec54f7 Author: Ming Lin min...@ssi.samsung.com Date: Fri Aug 7 15:07:07 2015 -0700 block: remove split code in blkdev_issue_{discard,write_same} The split code in blkdev_issue_{discard,write_same} can go away now that any driver that cares does the split. We have to make sure bio size doesn't overflow. For discard, we ensure max_discard_sectors is of the proper granularity. So if discard size 4G, blkdev_issue_discard() always send multiple granularity requests to lower level, except that the last one may be not multiple granularity. Signed-off-by: Ming Lin min...@ssi.samsung.com --- block/blk-lib.c | 37 + 1 file changed, 9 insertions(+), 28 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 7688ee3..e178a07 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -44,7 +44,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, struct request_queue *q = bdev_get_queue(bdev); int type = REQ_WRITE | REQ_DISCARD; unsigned int max_discard_sectors, granularity; - int alignment; struct bio_batch bb; struct bio *bio; int ret = 0; @@ -58,18 +57,15 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, /* Zero-sector (unknown) and one-sector granularities are the same. */ granularity = max(q-limits.discard_granularity 9, 1U); - alignment = (bdev_discard_alignment(bdev) 9) % granularity; /* - * Ensure that max_discard_sectors is of the proper - * granularity, so that requests stay aligned after a split. - */ - max_discard_sectors = min(q-limits.max_discard_sectors, UINT_MAX 9); + * Ensure that max_discard_sectors doesn't overflow bi_size and is of + * the proper granularity. So if discard size 4G, blkdev_issue_discard() + * always split and send multiple granularity requests to lower level, + * except that the last one may be not multiple granularity. + */ + max_discard_sectors = UINT_MAX 9; max_discard_sectors -= max_discard_sectors % granularity; - if (unlikely(!max_discard_sectors)) { - /* Avoid infinite loop below. Being cautious never hurts. */ - return -EOPNOTSUPP; - } if (flags BLKDEV_DISCARD_SECURE) { if (!blk_queue_secdiscard(q)) @@ -84,7 +80,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, blk_start_plug(plug); while (nr_sects) { unsigned int req_sects; - sector_t end_sect, tmp; + sector_t end_sect; bio = bio_alloc(gfp_mask, 1); if (!bio) { @@ -93,20 +89,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } req_sects = min_t(sector_t, nr_sects, max_discard_sectors); - - /* - * If splitting a request, and the next starting sector would be - * misaligned, stop the discard at the previous aligned sector. - */ end_sect = sector + req_sects; - tmp = end_sect; - if (req_sects nr_sects - sector_div(tmp, granularity) != alignment) { - end_sect = end_sect - alignment; - sector_div(end_sect, granularity); - end_sect = end_sect * granularity + alignment; - req_sects = end_sect - sector; - } bio-bi_iter.bi_sector = sector; bio-bi_end_io = bio_batch_end_io; @@ -166,10 +149,8 @@ int blkdev_issue_write_same(struct block_device
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
On Sat, Aug 08, 2015 at 10:52:24AM +0200, Hannes Reinecke wrote: Wouldn't it be easier to move both max_write_same_sectors and max_discard sectors to 64 bit (ie to type sector_t) and be done with the overflow? Seems to me this is far too much coding around self-imposed restrictions... It's bio-bi_iter.bi_size that would have to be increased to 64 bits. Which I suppose wouldn't actually increase the size of struct bio (when sector_t is 64 bits), since struct bvec_iter has padding right now... -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Mike == Mike Snitzer snit...@redhat.com writes: Mike This will translate to all intermediate layers that might split Mike discards needing to worry about granularity/alignment too Mike (e.g. how dm-thinp will have to care because it must generate Mike discard mappings with associated bios based on how blocks were Mike mapped to thinp). The fundamental issue here is that alignment and granularity should never, ever have been enforced at the top of the stack. Horrendous idea from the very beginning. For the handful of braindead devices that get confused when you do partial or misaligned blocks we should have had a quirk that did any range adjusting at the bottom in sd_setup_discard_cmnd(). There's a reason I turned discard_zeroes_data off for UNMAP! Wrt. the range size I don't have a problem with capping at the 32-bit bi_size limit. We probably don't want to send commands much bigger than that anyway. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/