[PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size may get error. Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits") Signed-off-by: Fam Zheng --- v2: Fix granularity mismatch. [Martin] --- drivers/scsi/sd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index fcfeddc..a5c7e67 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); } else rw_max = BLK_DEF_MAX_SECTORS; + rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max)); /* Combine with controller limits */ q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); -- 2.9.3
Re: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
Fam Zheng writes: Hi Fam, > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); > } else > rw_max = BLK_DEF_MAX_SECTORS; > + rw_max = min_not_zero(rw_max, dev_max); rw_max is in sectors, dev_max is in logical blocks. > > /* Combine with controller limits */ > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v3 00/10] be2iscsi: driver update 11.4.0.0
Jitendra Bhivare writes: > This patch is generated against for-next branch. > > v3 changes: > be2iscsi: Fix closing of connection > - Fixed per Tomas's review comments. > > v2 changes: > +be2iscsi: Update Copyright > > Jitendra Bhivare (10): > be2iscsi: Check tag in beiscsi_mccq_compl_wait > be2iscsi: Fix closing of connection > be2iscsi: Replace spin_unlock_bh with spin_lock > scsi_transport_iscsi: Use flush_work in iscsi_remove_session > be2iscsi: Increase HDQ default queue size > be2iscsi: Use num_cons field in Rx CQE > be2iscsi: Remove free_list for ASYNC handles > be2iscsi: Check size before copying ASYNC handle > be2iscsi: Update Copyright > be2iscsi: Update driver version Applied to 4.12/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: ufs: fix wrong/ambiguous fall through comments
kusumi.tomoh...@gmail.com writes: > From: Tomohiro Kusumi > > These aren't really falling through to anywhere meaningful. > > Signed-off-by: Tomohiro Kusumi Applied to 4.12/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: osd_uld: remove an unneeded NULL check
Dan Carpenter writes: > We don't call the remove() function unless probe() succeeds so "oud" > can't be NULL here. Plus, if it were NULL, we dereference it on the > next line so it would crash anyway. Applied to 4.12/scsi-queue (by hand). -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: ufs: remove the duplicated checking for supporting clkscaling
Jaehoon Chung writes: Jaehoon, > There are same conditions for checking whether supporting clkscaling > or not. When ufshcd is supporting clkscaling, active_reqs should be > decreased by two. Applied to 4.11/scsi-fixes. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Delock 89384 Sata Controller Causes Lockups Under Heavy Load
Hello, I'm new to this list and I signed up, because I found an instability with the following sata controller: Product name: Delock 89384 10 Port PCIe 2.0 x2 Low Profile retail Identifies as: 03:00.0 SATA controller: ASMedia Technology Inc. Device 0625 (rev 01) (PCIe to 10x sata controller card) Problem description: The controller works and recognizes all my drives. But under heavy load, e. g. a mdadm raid-6 resync or just a dd to a file, it keeps causing lockups and random device link resettings on multiple devices. I spend the last two weeks on replacing components in this server, the controller is definitely the problem. Everything works fine with a Marvell 9215 controller and I tried the controller with three different mainboards and kernel versions 3.2, 4.4 and 4.10. The controller or its kernel driver definitely cause these lock ups. I made sure, that all drives were properly connected. [Syslog attached at the bottom of this mail.] As I am an experienced linux user, but new to this, first questions: Is this the right place to seek for help? If not so: Where might I get help with this? If so: Does anybody have an idea, what might causes this problem. My abilities: I can test patches on the mainline kernel. I can't code, as I lack any kind of knowledge about the sata standard. I have the controller card and an empty spare device here, to run any kind of tests. Syslog of one of these resets: If the level of stress is high enough, they happen on all connected devices (seemingly random) from different manufacturers (WesternDigital and Seagate) and different types of models. So this is probably not a bug in the firmware of one of the drives. Log: Mar 24 09:01:43 Server1 kernel: [ 1807.338347] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Mar 24 09:01:43 Server1 kernel: [ 1807.340701] ata3.00: failed command: FLUSH CACHE EXT Mar 24 09:01:43 Server1 kernel: [ 1807.343078] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 Mar 24 09:01:43 Server1 kernel: [ 1807.343078] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Mar 24 09:01:43 Server1 kernel: [ 1807.349717] ata3.00: status: { DRDY } Mar 24 09:01:43 Server1 kernel: [ 1807.353029] ata3: hard resetting link Mar 24 09:01:43 Server1 kernel: [ 1807.665533] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Mar 24 09:01:43 Server1 kernel: [ 1807.667000] ata3.00: configured for UDMA/133 Mar 24 09:01:43 Server1 kernel: [ 1807.667007] ata3.00: retrying FLUSH 0xea Emask 0x4 Mar 24 09:01:43 Server1 kernel: [ 1807.667164] ata3.00: device reported invalid CHS sector 0 Mar 24 09:01:43 Server1 kernel: [ 1807.667183] ata3: EH complete Whenever such a lock up happens, the whole partition is not read or writeable for at least 90 seconds and sometimes several minutes. But the system never crashed. I tried to google the controller card, didn't find much about it. Any advice would be much appreciated :). Greetings, Matthias
Re: [PATCH 7/7] sd: use ZERO_PAGE for WRITE_SAME payloads
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote: > We're never touching the contents of the page, so save a memory > allocation for these cases. > > Signed-off-by: Christoph Hellwig > --- > drivers/scsi/sd.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index c18fe9ff1f8f..af632e350ab4 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -756,7 +756,7 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd > *cmd) > u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9); > u32 data_len = sdp->sector_size; > > - rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO); > + rq->special_vec.bv_page = ZERO_PAGE(0); > if (!rq->special_vec.bv_page) > return BLKPREP_DEFER; > rq->special_vec.bv_offset = 0; > @@ -785,7 +785,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd > *cmd, bool unmap) > u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9); > u32 data_len = sdp->sector_size; > > - rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO); > + rq->special_vec.bv_page = ZERO_PAGE(0); > if (!rq->special_vec.bv_page) > return BLKPREP_DEFER; > rq->special_vec.bv_offset = 0; > @@ -1256,7 +1256,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt) > { > struct request *rq = SCpnt->request; > > - if (rq->rq_flags & RQF_SPECIAL_PAYLOAD) > + if ((rq->rq_flags & RQF_SPECIAL_PAYLOAD) && > + rq->special_vec.bv_page != ZERO_PAGE(0)) > __free_page(rq->special_vec.bv_page); > > if (SCpnt->cmnd != scsi_req(rq)->cmd) { Reviewed-by: Bart Van Assche
Re: [PATCH 6/7] sd: support multi-range TRIM for ATA disks
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote: > @@ -698,13 +698,19 @@ static void sd_config_discard(struct scsi_disk *sdkp, > unsigned int mode) > break; > > case SD_LBP_ATA_TRIM: > - max_blocks = 65535 * (512 / sizeof(__le64)); > + max_ranges = 512 / sizeof(__le64); > + max_range_size = USHRT_MAX; > + max_blocks = max_ranges * max_range_size; > if (sdkp->device->ata_trim_zeroes_data) > q->limits.discard_zeroes_data = 1; > break; > } > > blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> > 9)); > + if (max_ranges) > + blk_queue_max_discard_segments(q, max_ranges); > + if (max_range_size) > + blk_queue_max_discard_segment_size(q, max_range_size); > queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q); > } Hello Christoph, Does blk_queue_max_discard_segment_size() expect a second argument that is a number of bytes? Is max_range_size a number of logical blocks that each have a size 1 << sector_shift? > @@ -826,14 +832,21 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd) > cmd->cmnd[8] = data_len; > > buf = page_address(rq->special_vec.bv_page); > - for (i = 0; i < (data_len >> 3); i++) { > - u64 n = min(nr_sectors, 0xu); > + __rq_for_each_bio(bio, rq) { > + u64 sector = bio->bi_iter.bi_sector >> (sector_shift - 9); > + u32 nr_sectors = bio->bi_iter.bi_size >> sector_shift; > > - buf[i] = cpu_to_le64(sector | (n << 48)); > - if (nr_sectors <= 0x) > - break; > - sector += 0x; > - nr_sectors -= 0x; > + do { > + u64 n = min(nr_sectors, 0xu); > + > + buf[i] = cpu_to_le64(sector | (n << 48)); > + if (nr_sectors <= 0x) > + break; > + sector += 0x; > + nr_sectors -= 0x; > + i++; > + > + } while (!WARN_ON_ONCE(i >= data_len >> 3)); > } > > cmd->allowed = SD_MAX_RETRIES; It's unfortunate that the loop-end test occurs twice (i < data_len >> 3). Please consider using put_unaligned_le64() instead of cpu_to_le64() such that the data type of buf can be changed from __le64* into void *. With that change and by introducing e.g. the following: void *end; [ ... ] end = (void *)buf + data_len; the loop variable 'i' can be eliminated. If buf[i++] = ... would be changed into *buf++ = ... then that would allow to change the two occurrences of (i < data_len >> 3) into (buf < end). Thanks, Bart.
Re: [PATCH 5/7] block: add a max_discard_segment_size queue limit
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote: > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index 5a7da607ca04..3b3bd646f580 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -333,6 +333,7 @@ struct queue_limits { > unsigned short max_segments; > unsigned short max_integrity_segments; > unsigned short max_discard_segments; > + unsigned intmax_discard_segment_size; > > unsigned char misaligned; > unsigned char discard_misaligned; > @@ -1150,6 +1151,8 @@ extern void blk_queue_max_segments(struct request_queue > *, unsigned short); > extern void blk_queue_max_discard_segments(struct request_queue *, > unsigned short); > extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); > +extern void blk_queue_max_discard_segment_size(struct request_queue *, > + unsigned int); > extern void blk_queue_max_discard_sectors(struct request_queue *q, > unsigned int max_discard_sectors); > extern void blk_queue_max_write_same_sectors(struct request_queue *q, > @@ -1415,6 +1418,11 @@ static inline unsigned int > queue_max_segment_size(struct request_queue *q) > return q->limits.max_segment_size; > } > > +static inline unsigned int queue_max_discard_segment_size(struct > request_queue *q) > +{ > + return q->limits.max_discard_segment_size; > +} > + > static inline unsigned short queue_logical_block_size(struct request_queue > *q) > { > int retval = 512; Hello Christoph, It's probably a good idea to add a comment somewhere that the unit of max_discard_segment_size is one byte. Additionally, should it be mentioned in the commit message that this patch limits the maximum discard segment size to 4 GB? Thanks, Bart.
Re: [PATCH 2/7] sd: provide a new ata trim provisioning mode
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote: > + case SD_LBP_ATA_TRIM: > + max_blocks = 65535 * (512 / sizeof(__le64)); > + if (sdkp->device->ata_trim_zeroes_data) > + q->limits.discard_zeroes_data = 1; > + break; Do we need a comment here that explains where the numbers 65535 and 512 come from? > + u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9); > + u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9); Please consider using logical_to_sectors() instead of open-coding this function. Thanks, Bart.
Re: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors
On 03/27/2017 06:14 PM, Stephen Hemminger wrote: > Are you sure the real problem is not the one fixed by this commit? > > commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f > Author: Stephen Hemminger > Date: Tue Mar 7 09:15:53 2017 -0800 > > scsi: storvsc: Workaround for virtual DVD SCSI version > > Hyper-V host emulation of SCSI for virtual DVD device reports SCSI > version 0 (UNKNOWN) but is still capable of supporting REPORTLUN. > > Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11 > successfully with virtual DVD ROM device. What happens is that the SCSI > scan process falls back to doing sequential probing by INQUIRY. But the > storvsc driver has a previous workaround that masks/blocks all errors > reports from INQUIRY (or MODE_SENSE) commands. This workaround causes > the scan to then populate a full set of bogus LUN's on the target and > then sends kernel spinning off into a death spiral doing block reads on > the non-existent LUNs. > > By setting the correct blacklist flags, the target with the DVD device > is scanned with REPORTLUN and that works correctly. > > Patch needs to go in current 4.11, it is safe but not necessary in older > kernels. > > Signed-off-by: Stephen Hemminger > Reviewed-by: K. Y. Srinivasan > Reviewed-by: Christoph Hellwig > Signed-off-by: Martin K. Petersen > > -Original Message- > From: Joseph Salisbury [mailto:joseph.salisb...@canonical.com] > Sent: Monday, March 27, 2017 1:22 PM > To: Long Li > Cc: KY Srinivasan ; Martin K. Petersen > ; Haiyang Zhang ; Stephen > Hemminger ; j...@linux.vnet.ibm.com; > de...@linuxdriverproject.org; linux-scsi ; LKML > ; sta...@vger.kernel.org; Greg KH > > Subject: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] > scsi: storvsc: properly set residual data length on errors > > Hi Long Li, > > A kernel bug report was opened against Ubuntu [0]. After a kernel > bisect, it was found that reverting the following commit resolved this bug: > > commit 40630f462824ee24bc00d692865c86c3828094e0 > Author: Long Li > Date: Wed Dec 14 18:46:03 2016 -0800 > > scsi: storvsc: properly set residual data length on errors > > > The regression was introduced in mainline as of v4.11-rc1. It was also > cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y. > > > This regression seems pretty severe since it's preventing virtual > machines from booting. It's affecting a couple of users so far. I was > hoping to get your feedback, since you are the patch author. Do you > think gathering any additional data will help diagnose this issue, or > would it be best to submit a revert request? > > > Thanks, > > Joe > > > [0] http://pad.lv/1674635 > > Thanks for the pointer, Stephen. I'll have this patch tested and respond with the results. Joe
Re: [PATCH 1/7] ѕd: split sd_setup_discard_cmnd
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote: > + u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9); > + u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9); Although I know this is an issue in the existing code and not something introduced by you: please consider using logical_to_sectors() instead of open-coding this function. Otherwise this patch looks fine to me. Thanks, Bart.
RE: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors
Are you sure the real problem is not the one fixed by this commit? commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f Author: Stephen Hemminger Date: Tue Mar 7 09:15:53 2017 -0800 scsi: storvsc: Workaround for virtual DVD SCSI version Hyper-V host emulation of SCSI for virtual DVD device reports SCSI version 0 (UNKNOWN) but is still capable of supporting REPORTLUN. Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11 successfully with virtual DVD ROM device. What happens is that the SCSI scan process falls back to doing sequential probing by INQUIRY. But the storvsc driver has a previous workaround that masks/blocks all errors reports from INQUIRY (or MODE_SENSE) commands. This workaround causes the scan to then populate a full set of bogus LUN's on the target and then sends kernel spinning off into a death spiral doing block reads on the non-existent LUNs. By setting the correct blacklist flags, the target with the DVD device is scanned with REPORTLUN and that works correctly. Patch needs to go in current 4.11, it is safe but not necessary in older kernels. Signed-off-by: Stephen Hemminger Reviewed-by: K. Y. Srinivasan Reviewed-by: Christoph Hellwig Signed-off-by: Martin K. Petersen -Original Message- From: Joseph Salisbury [mailto:joseph.salisb...@canonical.com] Sent: Monday, March 27, 2017 1:22 PM To: Long Li Cc: KY Srinivasan ; Martin K. Petersen ; Haiyang Zhang ; Stephen Hemminger ; j...@linux.vnet.ibm.com; de...@linuxdriverproject.org; linux-scsi ; LKML ; sta...@vger.kernel.org; Greg KH Subject: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors Hi Long Li, A kernel bug report was opened against Ubuntu [0]. After a kernel bisect, it was found that reverting the following commit resolved this bug: commit 40630f462824ee24bc00d692865c86c3828094e0 Author: Long Li Date: Wed Dec 14 18:46:03 2016 -0800 scsi: storvsc: properly set residual data length on errors The regression was introduced in mainline as of v4.11-rc1. It was also cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y. This regression seems pretty severe since it's preventing virtual machines from booting. It's affecting a couple of users so far. I was hoping to get your feedback, since you are the patch author. Do you think gathering any additional data will help diagnose this issue, or would it be best to submit a revert request? Thanks, Joe [0] http://pad.lv/1674635
[REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors
Hi Long Li, A kernel bug report was opened against Ubuntu [0]. After a kernel bisect, it was found that reverting the following commit resolved this bug: commit 40630f462824ee24bc00d692865c86c3828094e0 Author: Long Li Date: Wed Dec 14 18:46:03 2016 -0800 scsi: storvsc: properly set residual data length on errors The regression was introduced in mainline as of v4.11-rc1. It was also cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y. This regression seems pretty severe since it's preventing virtual machines from booting. It's affecting a couple of users so far. I was hoping to get your feedback, since you are the patch author. Do you think gathering any additional data will help diagnose this issue, or would it be best to submit a revert request? Thanks, Joe [0] http://pad.lv/1674635
Re: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
- Original Message - > From: "Fam Zheng" > To: linux-ker...@vger.kernel.org > Cc: "Martin K. Petersen" , f...@redhat.com, > linux-scsi@vger.kernel.org, "James E.J. > Bottomley" > Sent: Monday, March 27, 2017 10:18:31 AM > Subject: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable > > If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we > end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size > may get error. > > Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits") > Signed-off-by: Fam Zheng > --- > drivers/scsi/sd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index fcfeddc..e2e21ab 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) > rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); > } else > rw_max = BLK_DEF_MAX_SECTORS; > + rw_max = min_not_zero(rw_max, dev_max); > > /* Combine with controller limits */ > q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); > -- > 2.9.3 > > Looks good to me and addresses the issue we faced. FInal confirmation this change is OK to come from Martin or James. Reviewed-by: Laurence Oberman
Re: [Drbd-dev] RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
On Mon, 2017-03-27 at 10:03 -0400, Mike Snitzer wrote: > As for the blkdev_issue_zeroout() resorting to manually zeroing the > range, if the discard fails or dzd not supported, that certainly > requires DM thinp to implement manual zeroing of the head and tail of > the range if partial blocks are being zeroed. So I welcome any advances > there. It is probably something that is best left to Joe or myself to > tackle. But I'll gladly accept patches ;) Some time ago I posted a patch series for the block layer that zeroes start and tail for nonaligned discard requests if dzd has been set. See also "[PATCH v3 0/5] Make blkdev_issue_discard() submit aligned discard requests" (https://www.spinics.net/lists/linux-block/msg02360.html). Bart.
Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
On Mon, Mar 27, 2017 at 10:03:07AM -0400, Mike Snitzer wrote: > By "you" I assume you're referring to Lars? Yes. > Lars' approach for discard, > when drbd is layered on dm-thinp, feels over-engineered. Not his fault, > the way discard and zeroing got conflated certainly lends itself to > these ugly hacks. SO I do appreciate that for anything to leverage > discard_zeroes_data it needs to be reliable. Which runs counter to how > discard was implemented (discard may get silently dropped!) But that is > why dm-thinp doesn't advertise dzd. Anyway... That's exactly what this series does - remove discard_zeroes_data and use the new REQ_OP_WRITE_ZEROES for anything that wants zeroing offload. > As for the blkdev_issue_zeroout() resorting to manually zeroing the > range, if the discard fails or dzd not supported, that certainly > requires DM thinp to implement manual zeroing of the head and tail of > the range if partial blocks are being zeroed. So I welcome any advances > there. It is probably something that is best left to Joe or myself to > tackle. But I'll gladly accept patches ;) Ok, I'll happily leave this to the two of you..
[PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size may get error. Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits") Signed-off-by: Fam Zheng --- drivers/scsi/sd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index fcfeddc..e2e21ab 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk) rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks); } else rw_max = BLK_DEF_MAX_SECTORS; + rw_max = min_not_zero(rw_max, dev_max); /* Combine with controller limits */ q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q)); -- 2.9.3
Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
On Mon, Mar 27 2017 at 5:10am -0400, Christoph Hellwig wrote: > It sounds like you don't want to support traditional discard at all, > but only WRITE ZEROES. So in many ways this series is the right way > forward. It would be nice if we could do a full blown > REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks, > similar to what hardware that implements WRITE SAME of zeroes > or WRITE ZEROES would do. I'll see if I could include that in my > series. By "you" I assume you're referring to Lars? Lars' approach for discard, when drbd is layered on dm-thinp, feels over-engineered. Not his fault, the way discard and zeroing got conflated certainly lends itself to these ugly hacks. SO I do appreciate that for anything to leverage discard_zeroes_data it needs to be reliable. Which runs counter to how discard was implemented (discard may get silently dropped!) But that is why dm-thinp doesn't advertise dzd. Anyway... As for the blkdev_issue_zeroout() resorting to manually zeroing the range, if the discard fails or dzd not supported, that certainly requires DM thinp to implement manual zeroing of the head and tail of the range if partial blocks are being zeroed. So I welcome any advances there. It is probably something that is best left to Joe or myself to tackle. But I'll gladly accept patches ;)
[PATCHv5 2/2] tcmu: Fix wrongly calculating of the base_command_size
From: Xiubo Li The t_data_nents and t_bidi_data_nents are the numbers of the segments, but it couldn't be sure the block size equals to size of the segment. For the worst case, all the blocks are discontiguous and there will need the same number of iovecs, that's to say: blocks == iovs. So here just set the number of iovs to block count needed by tcmu cmd. Signed-off-by: Xiubo Li Tested-by: Ilias Tsitsimpis --- drivers/target/target_core_user.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index 65d475f..ede815c 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -408,6 +408,13 @@ static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd) return data_length; } +static inline uint32_t tcmu_cmd_get_block_cnt(struct tcmu_cmd *tcmu_cmd) +{ + size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd); + + return data_length / DATA_BLOCK_SIZE; +} + static sense_reason_t tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd) { @@ -435,8 +442,7 @@ static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd) * expensive to tell how many regions are freed in the bitmap */ base_command_size = max(offsetof(struct tcmu_cmd_entry, - req.iov[se_cmd->t_bidi_data_nents + - se_cmd->t_data_nents]), + req.iov[tcmu_cmd_get_block_cnt(tcmu_cmd)]), sizeof(struct tcmu_cmd_entry)); command_size = base_command_size + round_up(scsi_command_size(se_cmd->t_task_cdb), TCMU_OP_ALIGN_SIZE); -- 1.8.3.1
Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES
On Thu, Mar 23, 2017 at 11:10:38AM -0400, Mike Snitzer wrote: > Not sure why you've split out the dm-kcopyd patch, likely best to just > fold it into the previous dm support patch. The dm-kcopyd patch switches to using WRITE_ZEROES instead of write same for dm as user of these interfaces. The one before just adds WRITE_ZEROES support to dm.
Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
It sounds like you don't want to support traditional discard at all, but only WRITE ZEROES. So in many ways this series is the right way forward. It would be nice if we could do a full blown REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks, similar to what hardware that implements WRITE SAME of zeroes or WRITE ZEROES would do. I'll see if I could include that in my series.
[PATCHv5 0/2] tcmu: For bugs fix only
From: Xiubo Li Changed for V5: - This only includes #1 and #2. And for old #3, #4 are still reviewing. - #1, since the issue reported by Ilias is a separate new one, and will create a new patch later. - #2, address the issue pointed out by Mike, thanks. - #1 and #2 have been tested by Ilias in BIDI case, thanks. Changed for V4: - re-order the #3, #4 at the head. - merge most of the #5 to others. Xiubo Li (2): tcmu: Fix possible overwrite of t_data_sg's last iov[] tcmu: Fix wrongly calculating of the base_command_size drivers/target/target_core_user.c | 44 +++ 1 file changed, 31 insertions(+), 13 deletions(-) -- 1.8.3.1
[PATCHv5 1/2] tcmu: Fix possible overwrite of t_data_sg's last iov[]
From: Xiubo Li If there has BIDI data, its first iov[] will overwrite the last iov[] for se_cmd->t_data_sg. To fix this, we can just increase the iov pointer, but this may introuduce a new memory leakage bug: If the se_cmd->data_length and se_cmd->t_bidi_data_sg->length are all not aligned up to the DATA_BLOCK_SIZE, the actual length needed maybe larger than just sum of them. So, this could be avoided by rounding all the data lengthes up to DATA_BLOCK_SIZE. Signed-off-by: Xiubo Li Signed-off-by: Bryant G. Ly Reviewed-by: Mike Christie Tested-by: Ilias Tsitsimpis --- drivers/target/target_core_user.c | 34 +++--- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c index 57defc3..65d475f 100644 --- a/drivers/target/target_core_user.c +++ b/drivers/target/target_core_user.c @@ -394,6 +394,20 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, size_t cmd_size, size_t d return true; } +static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd) +{ + struct se_cmd *se_cmd = tcmu_cmd->se_cmd; + size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE); + + if (se_cmd->se_cmd_flags & SCF_BIDI) { + BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents)); + data_length += round_up(se_cmd->t_bidi_data_sg->length, + DATA_BLOCK_SIZE); + } + + return data_length; +} + static sense_reason_t tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd) { @@ -407,7 +421,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, size_t cmd_size, size_t d uint32_t cmd_head; uint64_t cdb_off; bool copy_to_data_area; - size_t data_length; + size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd); DECLARE_BITMAP(old_bitmap, DATA_BLOCK_BITS); if (test_bit(TCMU_DEV_BIT_BROKEN, &udev->flags)) @@ -433,11 +447,6 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, size_t cmd_size, size_t d mb = udev->mb_addr; cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */ - data_length = se_cmd->data_length; - if (se_cmd->se_cmd_flags & SCF_BIDI) { - BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents)); - data_length += se_cmd->t_bidi_data_sg->length; - } if ((command_size > (udev->cmdr_size / 2)) || data_length > udev->data_size) { pr_warn("TCMU: Request of size %zu/%zu is too big for %u/%zu " @@ -511,11 +520,14 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, size_t cmd_size, size_t d entry->req.iov_dif_cnt = 0; /* Handle BIDI commands */ - iov_cnt = 0; - alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg, - se_cmd->t_bidi_data_nents, &iov, &iov_cnt, false); - entry->req.iov_bidi_cnt = iov_cnt; - + if (se_cmd->se_cmd_flags & SCF_BIDI) { + iov_cnt = 0; + iov++; + alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg, + se_cmd->t_bidi_data_nents, &iov, &iov_cnt, + false); + entry->req.iov_bidi_cnt = iov_cnt; + } /* cmd's data_bitmap is what changed in process */ bitmap_xor(tcmu_cmd->data_bitmap, old_bitmap, udev->data_bitmap, DATA_BLOCK_BITS); -- 1.8.3.1