Re: [PATCHv3 0/5] Dynamic growing data area support
On 2017/03/21 13:24, Nicholas A. Bellinger wrote:
> Hi Xiubo !
>
> On Mon, 2017-03-20 at 17:09 +0800, lixi...@cmss.chinamobile.com wrote:
>> From: Xiubo Li
>>
>> Changed for V3:
>> - [PATCHv2 2/5] fix double usage of blocks and possible page fault
>>   call trace.
>> - [PATCHv2 5/5] fix a mistake.
>>
>> Changed for V2:
>> - [PATCHv2 1/5] just fixes some small spelling and other mistakes.
>>   And as the initial patch, here sets cmd area to 8M and data area to
>>   1G (1M fixed and 1023M growing)
>> - [PATCHv2 2/5] is a new one, adding global data block pool support.
>>   The max total size of the pool is 2G and all the targets will get
>>   growing blocks from here. Tested this using multi-targets at the
>>   same time.
>> - [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
>> - [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.
>>
>> Xiubo Li (5):
>>   tcmu: Add dynamic growing data area feature support
>>   tcmu: Add global data block pool support
>>   target/user: Fix possible overwrite of t_data_sg's last iov[]
>>   target/user: Fix wrongly calculating of the base_command_size
>>   target/user: Clean up tcmu_queue_cmd_ring
>>
>>  drivers/target/target_core_user.c | 621 +++---
>>  1 file changed, 514 insertions(+), 107 deletions(-)
>
> I've not been following the details of your TCMU efforts, so will have
> to defer to MNC + AGrover for Acked-by + Reviewed-bys here.  ;)
>
> One comment on the series ordering though..
>
> Patches #1 + #2 introduce new features, while patches #3 + #4 are
> bug-fixes to (existing..?) code.
>
> AFAICT, patches #3 + #4 are stand-alone and don't depend on the new
> features.  Is that correct..?

Yes, that's right.

> If so, I'd prefer to apply #3 + #4 to target-pending/master for
> v4.11-rcX (eg: bug-fixes only), and include the new features in #1 + #2
> and the cleanup in #5 in target-pending/for-next (eg: next merge window
> for v4.12-rc1).
>
> Usually the preferred way when submitting patches is to always put
> bug-fixes first in the series, followed by new features, further
> cleanups, etc.  That way it's easy for a maintainer to split out /
> cherry-pick bug-fixes from the series as necessary, without needing to
> worry about dependencies in the earlier patches.
>
> That said, if patches #3 + #4 are stand-alone bug-fixes, would you be so
> kind to re-order them at the head of the series, and re-submit..?

Sure. I will split out #3 and #4 separately.

Thanks,
BRs
Xiubo
Re: [PATCHv3 0/5] Dynamic growing data area support
Hi Xiubo !

On Mon, 2017-03-20 at 17:09 +0800, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li
>
> Changed for V3:
> - [PATCHv2 2/5] fix double usage of blocks and possible page fault
>   call trace.
> - [PATCHv2 5/5] fix a mistake.
>
> Changed for V2:
> - [PATCHv2 1/5] just fixes some small spelling and other mistakes.
>   And as the initial patch, here sets cmd area to 8M and data area to
>   1G (1M fixed and 1023M growing)
> - [PATCHv2 2/5] is a new one, adding global data block pool support.
>   The max total size of the pool is 2G and all the targets will get
>   growing blocks from here. Tested this using multi-targets at the
>   same time.
> - [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
> - [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.
>
> Xiubo Li (5):
>   tcmu: Add dynamic growing data area feature support
>   tcmu: Add global data block pool support
>   target/user: Fix possible overwrite of t_data_sg's last iov[]
>   target/user: Fix wrongly calculating of the base_command_size
>   target/user: Clean up tcmu_queue_cmd_ring
>
>  drivers/target/target_core_user.c | 621 +++---
>  1 file changed, 514 insertions(+), 107 deletions(-)

I've not been following the details of your TCMU efforts, so will have
to defer to MNC + AGrover for Acked-by + Reviewed-bys here.  ;)

One comment on the series ordering though..

Patches #1 + #2 introduce new features, while patches #3 + #4 are
bug-fixes to (existing..?) code.

AFAICT, patches #3 + #4 are stand-alone and don't depend on the new
features.  Is that correct..?

If so, I'd prefer to apply #3 + #4 to target-pending/master for
v4.11-rcX (eg: bug-fixes only), and include the new features in #1 + #2
and the cleanup in #5 in target-pending/for-next (eg: next merge window
for v4.12-rc1).

Usually the preferred way when submitting patches is to always put
bug-fixes first in the series, followed by new features, further
cleanups, etc.  That way it's easy for a maintainer to split out /
cherry-pick bug-fixes from the series as necessary, without needing to
worry about dependencies in the earlier patches.

That said, if patches #3 + #4 are stand-alone bug-fixes, would you be so
kind to re-order them at the head of the series, and re-submit..?
Re: [PATCH] scsi: sr: fix oob access in get_capabilities
On 2017/3/20 22:29, Martin K. Petersen wrote:
> Kefeng Wang writes:
>
> Kefeng,
>
>> The issue still exists, the patch returns zero in scsi_mode_sense(),
>> but zero means SAM_STAT_GOOD in scsi_status_is_good(), so n will
>> still be bigger than 512;
>
> OK, I checked the other users of scsi_mode_sense(). So let's keep this
> fix local to sr.c for now.
>
> How about the following?
>
>
> scsi: sr: Sanity check returned mode data
>
> Kefeng Wang discovered that old versions of the QEMU CD driver would
> return mangled mode data causing us to walk off the end of the buffer
> in an attempt to parse it. Sanity check the returned mode sense data.
>
> Cc:
> Reported-by: Kefeng Wang
> Signed-off-by: Martin K. Petersen
>
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 0b29b9329b1c..a8f630213a1a 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -836,6 +836,7 @@ static void get_capabilities(struct scsi_cd *cd)
>      unsigned char *buffer;
>      struct scsi_mode_data data;
>      struct scsi_sense_hdr sshdr;
> +    unsigned int ms_len = 128;
>      int rc, n;
>
>      static const char *loadmech[] =
> @@ -862,10 +863,11 @@ static void get_capabilities(struct scsi_cd *cd)
>      scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
>
>      /* ask for mode page 0x2a */
> -    rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128,
> +    rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len,
>                           SR_TIMEOUT, 3, &data, NULL);

move "n = data.header_length + data.block_descriptor_length;" here,

> -    if (!scsi_status_is_good(rc)) {
> +    if (!scsi_status_is_good(rc) || data.length > ms_len ||
> +        data.header_length + data.block_descriptor_length > data.length) {

n > data.length

Tested-by: Kefeng Wang

Thanks,
Kefeng

>          /* failed, drive doesn't have capabilities mode page */
>          cd->cdi.speed = 1;
>          cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R |
[PATCH 7/7] sd: use ZERO_PAGE for WRITE_SAME payloads
We're never touching the contents of the page, so save a memory
allocation for these cases.

Signed-off-by: Christoph Hellwig
---
 drivers/scsi/sd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c18fe9ff1f8f..af632e350ab4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -756,7 +756,7 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
     u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
     u32 data_len = sdp->sector_size;
 
-    rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+    rq->special_vec.bv_page = ZERO_PAGE(0);
     if (!rq->special_vec.bv_page)
         return BLKPREP_DEFER;
     rq->special_vec.bv_offset = 0;
@@ -785,7 +785,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd, bool unmap)
     u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
     u32 data_len = sdp->sector_size;
 
-    rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+    rq->special_vec.bv_page = ZERO_PAGE(0);
     if (!rq->special_vec.bv_page)
         return BLKPREP_DEFER;
     rq->special_vec.bv_offset = 0;
@@ -1256,7 +1256,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
 {
     struct request *rq = SCpnt->request;
 
-    if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
+    if ((rq->rq_flags & RQF_SPECIAL_PAYLOAD) &&
+        rq->special_vec.bv_page != ZERO_PAGE(0))
         __free_page(rq->special_vec.bv_page);
 
     if (SCpnt->cmnd != scsi_req(rq)->cmd) {
-- 
2.11.0
[PATCH 4/7] libata: simplify the trim implementation
Don't try to fake up basic SCSI logical block provisioning and WRITE
SAME support, but offer support for the Linux Vendor Specific TRIM
command instead.

This simplifies the implementation a lot, and avoids rewriting the
data out buffer in the I/O path.  Note that this new command is only
offered to the block layer and will fail for pass through commands.
While this is theoretically a regression in the functionality offered
through SG_IO, the previous support was buggy and corrupted user
memory by rewriting the data out buffer in place.

Last but not least this removes the global ata_scsi_rbuf_lock from the
trim path.

Signed-off-by: Christoph Hellwig
---
 drivers/ata/libata-scsi.c | 179 ++--------
 1 file changed, 28 insertions(+), 151 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b93d7e33789a..965b9e7dbb7d 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1322,6 +1322,16 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
     blk_queue_flush_queueable(q, false);
 
+    if (ata_id_has_trim(dev->id) &&
+        !(dev->horkage & ATA_HORKAGE_NOTRIM)) {
+        sdev->ata_trim = 1;
+        if (ata_id_has_zero_after_trim(dev->id) &&
+            (dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM)) {
+            ata_dev_info(dev, "Enabling discard_zeroes_data\n");
+            sdev->ata_trim_zeroes_data = 1;
+        }
+    }
+
     dev->sdev = sdev;
     return 0;
 }
@@ -2383,21 +2393,6 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
      */
     min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id);
     put_unaligned_be16(min_io_sectors, &rbuf[6]);
-
-    /*
-     * Optimal unmap granularity.
-     *
-     * The ATA spec doesn't even know about a granularity or alignment
-     * for the TRIM command. We can leave away most of the unmap related
-     * VPD page entries, but we have specifify a granularity to signal
-     * that we support some form of unmap - in thise case via WRITE SAME
-     * with the unmap bit set.
-     */
-    if (ata_id_has_trim(args->id)) {
-        put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM, &rbuf[36]);
-        put_unaligned_be32(1, &rbuf[28]);
-    }
-
     return 0;
 }
@@ -2746,16 +2741,6 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
         rbuf[14] = (lowest_aligned >> 8) & 0x3f;
         rbuf[15] = lowest_aligned;
 
-        if (ata_id_has_trim(args->id) &&
-            !(dev->horkage & ATA_HORKAGE_NOTRIM)) {
-            rbuf[14] |= 0x80; /* LBPME */
-
-            if (ata_id_has_zero_after_trim(args->id) &&
-                dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) {
-                ata_dev_info(dev, "Enabling discard_zeroes_data\n");
-                rbuf[14] |= 0x40; /* LBPRZ */
-            }
-        }
         if (ata_id_zoned_cap(args->id) ||
             args->dev->class == ATA_DEV_ZAC)
             rbuf[12] = (1 << 4); /* RC_BASIS */
@@ -3339,141 +3324,45 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 }
 
 /**
- * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
- * @cmd: SCSI command being translated
- * @trmax: Maximum number of entries that will fit in sector_size bytes.
- * @sector: Starting sector
- * @count: Total Range of request in logical sectors
- *
- * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
- * descriptor.
- *
- * Upto 64 entries of the format:
- *   63:48 Range Length
- *   47:0  LBA
- *
- * Range Length of 0 is ignored.
- * LBA's should be sorted order and not overlap.
- *
- * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
- *
- * Return: Number of bytes copied into sglist.
- */
-static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
-                    u64 sector, u32 count)
-{
-    struct scsi_device *sdp = cmd->device;
-    size_t len = sdp->sector_size;
-    size_t r;
-    __le64 *buf;
-    u32 i = 0;
-    unsigned long flags;
-
-    WARN_ON(len > ATA_SCSI_RBUF_SIZE);
-
-    if (len > ATA_SCSI_RBUF_SIZE)
-        len = ATA_SCSI_RBUF_SIZE;
-
-    spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-    buf = ((void *)ata_scsi_rbuf);
-    memset(buf, 0, len);
-    while (i < trmax) {
-        u64 entry = sector |
-            ((u64)(count > 0xffff ? 0xffff : count) << 48);
-        buf[i++] = __cpu_to_le64(entry);
-        if (count <= 0xffff)
-            break;
-        count -= 0xffff;
-        sector += 0xffff;
-    }
-    r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_
[PATCH 5/7] block: add a max_discard_segment_size queue limit
ATA only allows 16 bits, so we need a limit.

Signed-off-by: Christoph Hellwig
---
 block/blk-core.c       |  6 ++++++
 block/blk-merge.c      |  9 +++++++++
 block/blk-settings.c   | 14 ++++++++++++++
 include/linux/blkdev.h |  8 ++++++++
 4 files changed, 37 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index d772c221cc17..3eb3bd89b47a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1486,9 +1486,15 @@ bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
                    struct bio *bio)
 {
     unsigned short segments = blk_rq_nr_discard_segments(req);
+    unsigned max_segment_sectors = queue_max_discard_segment_size(q) >> 9;
 
     if (segments >= queue_max_discard_segments(q))
         goto no_merge;
+    if (blk_rq_sectors(req) > max_segment_sectors)
+        goto no_merge;
+    if (bio_sectors(bio) > max_segment_sectors)
+        goto no_merge;
+
     if (blk_rq_sectors(req) + bio_sectors(bio) >
         blk_rq_get_max_sectors(req, blk_rq_pos(req)))
         goto no_merge;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2afa262425d1..c62a6f0325e0 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -11,6 +11,15 @@
 
 #include "blk.h"
 
+/*
+ * Split a discard bio if it doesn't fit into the overall discard request size
+ * of the device. Note that we don't split it here if it's over the maximum
+ * discard segment size to avoid creating way too many bios in that case.
+ * We will simply take care of never merging such a larger than segment size
+ * bio into a request that has other bios, and let the low-level driver take
+ * care of splitting the request into multiple ranges in the on the wire
+ * format.
+ */
 static struct bio *blk_bio_discard_split(struct request_queue *q,
                      struct bio *bio,
                      struct bio_set *bs,
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1e7174ffc9d4..9d515ae3a405 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -93,6 +93,7 @@ void blk_set_default_limits(struct queue_limits *lim)
     lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
     lim->virt_boundary_mask = 0;
     lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
+    lim->max_discard_segment_size = UINT_MAX;
     lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
     lim->max_dev_sectors = 0;
     lim->chunk_sectors = 0;
@@ -132,6 +133,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
     lim->max_discard_segments = 1;
     lim->max_hw_sectors = UINT_MAX;
     lim->max_segment_size = UINT_MAX;
+    lim->max_discard_segment_size = UINT_MAX;
     lim->max_sectors = UINT_MAX;
     lim->max_dev_sectors = UINT_MAX;
     lim->max_write_same_sectors = UINT_MAX;
@@ -376,6 +378,18 @@ void blk_queue_max_segment_size(struct request_queue *q, unsigned int max_size)
 EXPORT_SYMBOL(blk_queue_max_segment_size);
 
 /**
+ * blk_queue_max_discard_segment_size - set max segment size for discards
+ * @q:  the request queue for the device
+ * @max_size:  max size of a discard segment in bytes
+ **/
+void blk_queue_max_discard_segment_size(struct request_queue *q,
+        unsigned int max_size)
+{
+    q->limits.max_discard_segment_size = max_size;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_discard_segment_size);
+
+/**
  * blk_queue_logical_block_size - set logical block size for the queue
  * @q:  the request queue for the device
  * @size:  the logical block size, in bytes
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a7da607ca04..3b3bd646f580 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -333,6 +333,7 @@ struct queue_limits {
     unsigned short      max_segments;
     unsigned short      max_integrity_segments;
     unsigned short      max_discard_segments;
+    unsigned int        max_discard_segment_size;
 
     unsigned char       misaligned;
     unsigned char       discard_misaligned;
@@ -1150,6 +1151,8 @@ extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_discard_segments(struct request_queue *,
         unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
+extern void blk_queue_max_discard_segment_size(struct request_queue *,
+        unsigned int);
 extern void blk_queue_max_discard_sectors(struct request_queue *q,
         unsigned int max_discard_sectors);
 extern void blk_queue_max_write_same_sectors(struct request_queue *q,
@@ -1415,6 +1418,11 @@ static inline unsigned int queue_max_segment_size(struct request_queue *q)
     return q->limits.max_segment_size;
 }
 
+static inline unsigned int queue_max_discard_segment_size(struct request_queue *q)
+{
+    return q->limits.max_discard_segment_size;
+}
+
 static inline unsigned short
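
A usage sketch, not part of the patch: a driver whose wire format packs
discard ranges with a 16-bit sector count could advertise the new limit
next to the existing discard limits roughly like this (queue pointer and
values are illustrative only; patch 6 does the equivalent in sd.c):

    /* up to 64 ranges, each at most USHRT_MAX 512-byte sectors */
    blk_queue_max_discard_sectors(q, 64 * USHRT_MAX);
    blk_queue_max_discard_segments(q, 64);
    blk_queue_max_discard_segment_size(q, USHRT_MAX * 512);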
support ranges TRIM for libata
This series implements ranges discard for ATA SSDs. Compared to the
initial NVMe support there are two things that complicate the ATA
support:

 - ATA only supports 16-bit long ranges
 - the whole mess of generating a SCSI command first and then
   translating it to an ATA one.

This series adds support for a limited range size to the block layer,
and stops translating discard commands - instead we add a new Vendor
Specific SCSI command that contains the TRIM payload when the device
asks for it.
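
For readers new to the on-the-wire format: each TRIM range in the
payload built by this series is one little-endian 64-bit word packing a
48-bit LBA and a 16-bit sector count. A minimal sketch of the encoding
follows (the helper name is made up; patches 2 and 6 open-code the same
expression):

    /* bits 47:0 = LBA, bits 63:48 = range length, max 0xffff sectors */
    static inline __le64 ata_trim_range(u64 lba, u16 nr_sectors)
    {
        return cpu_to_le64((lba & ((1ULL << 48) - 1)) |
                   ((u64)nr_sectors << 48));
    }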
[PATCH 1/7] sd: split sd_setup_discard_cmnd
Split sd_setup_discard_cmnd into one function per provisioning type.

While this creates some very slight duplication of boilerplate code it
keeps the code modular for additions of new provisioning types, and for
reusing the write same functions for the upcoming SCSI implementation
of the Write Zeroes operation.

Signed-off-by: Christoph Hellwig
---
 drivers/scsi/sd.c | 153 ++++++------
 1 file changed, 84 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc79331..b853f91fb3da 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -701,93 +701,97 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
     queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
-/**
- * sd_setup_discard_cmnd - unmap blocks on thinly provisioned device
- * @sdp: scsi device to operate on
- * @rq: Request to prepare
- *
- * Will issue either UNMAP or WRITE SAME(16) depending on preference
- * indicated by target device.
- **/
-static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
+static int sd_setup_unmap_cmnd(struct scsi_cmnd *cmd)
 {
-    struct request *rq = cmd->request;
     struct scsi_device *sdp = cmd->device;
-    struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
-    sector_t sector = blk_rq_pos(rq);
-    unsigned int nr_sectors = blk_rq_sectors(rq);
-    unsigned int len;
-    int ret;
+    struct request *rq = cmd->request;
+    u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+    u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+    unsigned int data_len = 24;
     char *buf;
-    struct page *page;
 
-    sector >>= ilog2(sdp->sector_size) - 9;
-    nr_sectors >>= ilog2(sdp->sector_size) - 9;
-
-    page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
-    if (!page)
+    rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+    if (!rq->special_vec.bv_page)
         return BLKPREP_DEFER;
+    rq->special_vec.bv_offset = 0;
+    rq->special_vec.bv_len = data_len;
+    rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
 
-    switch (sdkp->provisioning_mode) {
-    case SD_LBP_UNMAP:
-        buf = page_address(page);
-
-        cmd->cmd_len = 10;
-        cmd->cmnd[0] = UNMAP;
-        cmd->cmnd[8] = 24;
-
-        put_unaligned_be16(6 + 16, &buf[0]);
-        put_unaligned_be16(16, &buf[2]);
-        put_unaligned_be64(sector, &buf[8]);
-        put_unaligned_be32(nr_sectors, &buf[16]);
+    cmd->cmd_len = 10;
+    cmd->cmnd[0] = UNMAP;
+    cmd->cmnd[8] = 24;
 
-        len = 24;
-        break;
+    buf = page_address(rq->special_vec.bv_page);
+    put_unaligned_be16(6 + 16, &buf[0]);
+    put_unaligned_be16(16, &buf[2]);
+    put_unaligned_be64(sector, &buf[8]);
+    put_unaligned_be32(nr_sectors, &buf[16]);
 
-    case SD_LBP_WS16:
-        cmd->cmd_len = 16;
-        cmd->cmnd[0] = WRITE_SAME_16;
-        cmd->cmnd[1] = 0x8; /* UNMAP */
-        put_unaligned_be64(sector, &cmd->cmnd[2]);
-        put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
+    cmd->allowed = SD_MAX_RETRIES;
+    cmd->transfersize = data_len;
+    rq->timeout = SD_TIMEOUT;
+    scsi_req(rq)->resid_len = data_len;
 
-        len = sdkp->device->sector_size;
-        break;
+    return scsi_init_io(cmd);
+}
 
-    case SD_LBP_WS10:
-    case SD_LBP_ZERO:
-        cmd->cmd_len = 10;
-        cmd->cmnd[0] = WRITE_SAME;
-        if (sdkp->provisioning_mode == SD_LBP_WS10)
-            cmd->cmnd[1] = 0x8; /* UNMAP */
-        put_unaligned_be32(sector, &cmd->cmnd[2]);
-        put_unaligned_be16(nr_sectors, &cmd->cmnd[7]);
+static int sd_setup_write_same16_cmnd(struct scsi_cmnd *cmd)
+{
+    struct scsi_device *sdp = cmd->device;
+    struct request *rq = cmd->request;
+    u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+    u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+    u32 data_len = sdp->sector_size;
 
-        len = sdkp->device->sector_size;
-        break;
+    rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+    if (!rq->special_vec.bv_page)
+        return BLKPREP_DEFER;
+    rq->special_vec.bv_offset = 0;
+    rq->special_vec.bv_len = data_len;
+    rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
 
-    default:
-        ret = BLKPREP_INVALID;
-        goto out;
-    }
+    cmd->cmd_len = 16;
+    cmd->cmnd[0] = WRITE_SAME_16;
+    cmd->cmnd[1] = 0x8; /* UNMAP */
+    put_unaligned_be64(sector, &cmd->cmnd[2]);
+    put_unaligned_be32(nr_sectors, &cmd->cmnd[10]);
 
+    cmd->allowed = SD_MAX_RETRIES;
+    cmd->transfersize = data_len;
     rq->timeout = SD_TIMEOUT;
+    s
[PATCH 2/7] sd: provide a new ata trim provisioning mode
This uses a vendor specific command to pass the ATA TRIM payload to
libata without having to rewrite it in place. Support for it is
indicated by a new flag in struct scsi_device that libata will set in
its slave_configure routine. A second flag indicates if TRIM will
reliably zero data.

Signed-off-by: Christoph Hellwig
---
 drivers/scsi/sd.c          | 60 +++++++++++++++++++++++++++++++++++++++---
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_device.h |  2 ++
 include/scsi/scsi_proto.h  |  3 +++
 4 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b853f91fb3da..cc8684818305 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -371,6 +371,7 @@ static const char *lbp_mode[] = {
     [SD_LBP_WS16]       = "writesame_16",
     [SD_LBP_WS10]       = "writesame_10",
     [SD_LBP_ZERO]       = "writesame_zero",
+    [SD_LBP_ATA_TRIM]   = "ata_trim",
     [SD_LBP_DISABLE]    = "disabled",
 };
 
@@ -411,7 +412,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,
         sd_config_discard(sdkp, SD_LBP_ZERO);
     else if (!strncmp(buf, lbp_mode[SD_LBP_DISABLE], 20))
         sd_config_discard(sdkp, SD_LBP_DISABLE);
-    else
+    else /* we don't allow manual setting of SD_LBP_ATA_TRIM */
         return -EINVAL;
 
     return count;
@@ -653,7 +654,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
      * lead to data corruption. If LBPRZ is not set, we honor the
      * device preference.
      */
-    if (sdkp->lbprz) {
+    if (sdkp->lbprz || sdkp->device->ata_trim) {
         q->limits.discard_alignment = 0;
         q->limits.discard_granularity = logical_block_size;
     } else {
@@ -695,6 +696,12 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
                   (u32)SD_MAX_WS10_BLOCKS);
         q->limits.discard_zeroes_data = 1;
         break;
+
+    case SD_LBP_ATA_TRIM:
+        max_blocks = 65535 * (512 / sizeof(__le64));
+        if (sdkp->device->ata_trim_zeroes_data)
+            q->limits.discard_zeroes_data = 1;
+        break;
     }
 
     blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 9));
@@ -794,6 +801,49 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd *cmd, bool unmap)
     return scsi_init_io(cmd);
 }
 
+static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
+{
+    struct scsi_device *sdp = cmd->device;
+    struct request *rq = cmd->request;
+    u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
+    u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
+    u32 data_len = sdp->sector_size, i;
+    __le64 *buf;
+
+    rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
+    if (!rq->special_vec.bv_page)
+        return BLKPREP_DEFER;
+    rq->special_vec.bv_offset = 0;
+    rq->special_vec.bv_len = data_len;
+    rq->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+    /*
+     * Use the Linux Vendor Specific TRIM command to pass the TRIM payload
+     * to libata.
+     */
+    cmd->cmd_len = 10;
+    cmd->cmnd[0] = LINUX_VS_TRIM;
+    cmd->cmnd[8] = data_len;
+
+    buf = page_address(rq->special_vec.bv_page);
+    for (i = 0; i < (data_len >> 3); i++) {
+        u64 n = min(nr_sectors, 0xffffu);
+
+        buf[i] = cpu_to_le64(sector | (n << 48));
+        if (nr_sectors <= 0xffff)
+            break;
+        sector += 0xffff;
+        nr_sectors -= 0xffff;
+    }
+
+    cmd->allowed = SD_MAX_RETRIES;
+    cmd->transfersize = data_len;
+    rq->timeout = SD_TIMEOUT;
+    scsi_req(rq)->resid_len = data_len;
+
+    return scsi_init_io(cmd);
+}
+
 static void sd_config_write_same(struct scsi_disk *sdkp)
 {
     struct request_queue *q = sdkp->disk->queue;
@@ -1168,6 +1218,8 @@ static int sd_init_command(struct scsi_cmnd *cmd)
             return sd_setup_write_same10_cmnd(cmd, true);
         case SD_LBP_ZERO:
             return sd_setup_write_same10_cmnd(cmd, false);
+        case SD_LBP_ATA_TRIM:
+            return sd_setup_ata_trim_cmnd(cmd);
         default:
             return BLKPREP_INVALID;
         }
@@ -2739,7 +2791,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
     sdkp->max_xfer_blocks = get_unaligned_be32(&buffer[8]);
     sdkp->opt_xfer_blocks = get_unaligned_be32(&buffer[12]);
 
-    if (buffer[3] == 0x3c) {
+    if (sdkp->device->ata_trim) {
+        sd_config_discard(sdkp, SD_LBP_ATA_TRIM);
+    } else if (buffer[3] == 0x3c) {
         unsigned int lba_count, desc_count;
 
         sdkp->max_ws_blocks = (u32)get_una
[PATCH 6/7] sd: support multi-range TRIM for ATA disks
Almost the same scheme as the older multi-range support for NVMe.

Signed-off-by: Christoph Hellwig
---
 drivers/scsi/sd.c | 37 +++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cc8684818305..c18fe9ff1f8f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -643,7 +643,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 {
     struct request_queue *q = sdkp->disk->queue;
     unsigned int logical_block_size = sdkp->device->sector_size;
-    unsigned int max_blocks = 0;
+    unsigned int max_blocks = 0, max_ranges = 0, max_range_size = 0;
 
     q->limits.discard_zeroes_data = 0;
 
@@ -698,13 +698,19 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
         break;
 
     case SD_LBP_ATA_TRIM:
-        max_blocks = 65535 * (512 / sizeof(__le64));
+        max_ranges = 512 / sizeof(__le64);
+        max_range_size = USHRT_MAX;
+        max_blocks = max_ranges * max_range_size;
         if (sdkp->device->ata_trim_zeroes_data)
             q->limits.discard_zeroes_data = 1;
         break;
     }
 
     blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 9));
+    if (max_ranges)
+        blk_queue_max_discard_segments(q, max_ranges);
+    if (max_range_size)
+        blk_queue_max_discard_segment_size(q, max_range_size);
     queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
 }
 
@@ -805,9 +811,9 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
 {
     struct scsi_device *sdp = cmd->device;
     struct request *rq = cmd->request;
-    u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
-    u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
-    u32 data_len = sdp->sector_size, i;
+    u32 sector_shift = ilog2(sdp->sector_size);
+    u32 data_len = sdp->sector_size, i = 0;
+    struct bio *bio;
     __le64 *buf;
 
     rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
@@ -826,14 +832,21 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
     cmd->cmnd[8] = data_len;
 
     buf = page_address(rq->special_vec.bv_page);
-    for (i = 0; i < (data_len >> 3); i++) {
-        u64 n = min(nr_sectors, 0xffffu);
+    __rq_for_each_bio(bio, rq) {
+        u64 sector = bio->bi_iter.bi_sector >> (sector_shift - 9);
+        u32 nr_sectors = bio->bi_iter.bi_size >> sector_shift;
 
-        buf[i] = cpu_to_le64(sector | (n << 48));
-        if (nr_sectors <= 0xffff)
-            break;
-        sector += 0xffff;
-        nr_sectors -= 0xffff;
+        do {
+            u64 n = min(nr_sectors, 0xffffu);
+
+            buf[i] = cpu_to_le64(sector | (n << 48));
+            if (nr_sectors <= 0xffff)
+                break;
+            sector += 0xffff;
+            nr_sectors -= 0xffff;
+            i++;
+
+        } while (!WARN_ON_ONCE(i >= data_len >> 3));
     }
 
     cmd->allowed = SD_MAX_RETRIES;
-- 
2.11.0
[PATCH 3/7] libata: remove SCT WRITE SAME support
This was already disabled a while ago because it caused I/O errors,
and it's severely getting in the way of the discard / write zeroes
rework.

Signed-off-by: Christoph Hellwig
---
 drivers/ata/libata-scsi.c | 128 ++++----------
 1 file changed, 29 insertions(+), 99 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 1ac70744ae7b..b93d7e33789a 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3393,46 +3393,6 @@ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
 }
 
 /**
- * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
- * @cmd: SCSI command being translated
- * @lba: Starting sector
- * @num: Number of sectors to be zero'd.
- *
- * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
- * descriptor.
- * NOTE: Writes a pattern (0's) in the foreground.
- *
- * Return: Number of bytes copied into sglist.
- */
-static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
-{
-    struct scsi_device *sdp = cmd->device;
-    size_t len = sdp->sector_size;
-    size_t r;
-    u16 *buf;
-    unsigned long flags;
-
-    spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-    buf = ((void *)ata_scsi_rbuf);
-
-    put_unaligned_le16(0x0002, &buf[0]); /* SCT_ACT_WRITE_SAME */
-    put_unaligned_le16(0x0101, &buf[1]); /* WRITE PTRN FG */
-    put_unaligned_le64(lba, &buf[2]);
-    put_unaligned_le64(num, &buf[6]);
-    put_unaligned_le32(0u, &buf[10]); /* pattern */
-
-    WARN_ON(len > ATA_SCSI_RBUF_SIZE);
-
-    if (len > ATA_SCSI_RBUF_SIZE)
-        len = ATA_SCSI_RBUF_SIZE;
-
-    r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
-    spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
-
-    return r;
-}
-
-/**
  * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
  * @qc: Command to be translated
  *
@@ -3468,26 +3428,17 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
     }
     scsi_16_lba_len(cdb, &block, &n_block);
 
-    if (unmap) {
-        /* If trim is not enabled the cmd is invalid. */
-        if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
-            !ata_id_has_trim(dev->id)) {
-            fp = 1;
-            bp = 3;
-            goto invalid_fld;
-        }
-        /* If the request is too large the cmd is invalid */
-        if (n_block > 0xffff * trmax) {
-            fp = 2;
-            goto invalid_fld;
-        }
-    } else {
-        /* If write same is not available the cmd is invalid */
-        if (!ata_id_sct_write_same(dev->id)) {
-            fp = 1;
-            bp = 3;
-            goto invalid_fld;
-        }
+    if (!unmap ||
+        (dev->horkage & ATA_HORKAGE_NOTRIM) ||
+        !ata_id_has_trim(dev->id)) {
+        fp = 1;
+        bp = 3;
+        goto invalid_fld;
+    }
+    /* If the request is too large the cmd is invalid */
+    if (n_block > 0xffff * trmax) {
+        fp = 2;
+        goto invalid_fld;
     }
 
     /*
@@ -3502,49 +3453,28 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
      * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
      * is defined as number of 512 byte blocks to be transferred.
      */
-    if (unmap) {
-        size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
-        if (size != len)
-            goto invalid_param_len;
 
-        if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
-            /* Newer devices support queued TRIM commands */
-            tf->protocol = ATA_PROT_NCQ;
-            tf->command = ATA_CMD_FPDMA_SEND;
-            tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
-            tf->nsect = qc->tag << 3;
-            tf->hob_feature = (size / 512) >> 8;
-            tf->feature = size / 512;
+    size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
+    if (size != len)
+        goto invalid_param_len;
 
-            tf->auxiliary = 1;
-        } else {
-            tf->protocol = ATA_PROT_DMA;
-            tf->hob_feature = 0;
-            tf->feature = ATA_DSM_TRIM;
-            tf->hob_nsect = (size / 512) >> 8;
-            tf->nsect = size / 512;
-            tf->command = ATA_CMD_DSM;
-        }
-    } else {
-        size = ata_format_sct_write_same(scmd, block, n_block);
-        if (size != len)
-            goto invalid_param_len;
+    if (ata_ncq_enabled
RE: [PATCH] hpsa: fix volume offline state
> -----Original Message-----
> From: Tomas Henzl [mailto:the...@redhat.com]
> Sent: Monday, March 20, 2017 10:43 AM
> To: linux-scsi@vger.kernel.org
> Cc: Don Brace ; joseph.szczy...@hpe.com
> Subject: [PATCH] hpsa: fix volume offline state
>
> In a previous patch a hpsa_scsi_dev_t.volume_offline update line
> has been removed, so let us put it back..
>
> Fixes: 85b29008d8 (hpsa: update check for logical volume status)
>
> Signed-off-by: Tomas Henzl
> ---
>  drivers/scsi/hpsa.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 0d0be7754a..9d659aaace 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -3885,6 +3885,7 @@ static int hpsa_update_device_info(struct ctlr_info *h,
>      if (h->fw_support & MISC_FW_RAID_OFFLOAD_BASIC)
>          hpsa_get_ioaccel_status(h, scsi3addr, this_device);
>      volume_offline = hpsa_volume_offline(h, scsi3addr);
> +    this_device->volume_offline = volume_offline;
>      if (volume_offline == HPSA_LV_FAILED) {
>          rc = HPSA_LV_FAILED;
>          dev_err(&h->pdev->dev,
> --
> 2.7.4

Acked-by: Don Brace

Thanks Tomas.
[PATCH] hpsa: fix volume offline state
In a previous patch a hpsa_scsi_dev_t.volume_offline update line
has been removed, so let us put it back..

Fixes: 85b29008d8 (hpsa: update check for logical volume status)

Signed-off-by: Tomas Henzl
---
 drivers/scsi/hpsa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 0d0be7754a..9d659aaace 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -3885,6 +3885,7 @@ static int hpsa_update_device_info(struct ctlr_info *h,
     if (h->fw_support & MISC_FW_RAID_OFFLOAD_BASIC)
         hpsa_get_ioaccel_status(h, scsi3addr, this_device);
     volume_offline = hpsa_volume_offline(h, scsi3addr);
+    this_device->volume_offline = volume_offline;
     if (volume_offline == HPSA_LV_FAILED) {
         rc = HPSA_LV_FAILED;
         dev_err(&h->pdev->dev,
-- 
2.7.4
Re: [PATCH] scsi: sr: fix oob access in get_capabilities
Kefeng Wang writes:

Kefeng,

> The issue still exists, the patch returns zero in scsi_mode_sense(),
> but zero means SAM_STAT_GOOD in scsi_status_is_good(), so n will
> still be bigger than 512;

OK, I checked the other users of scsi_mode_sense(). So let's keep this
fix local to sr.c for now.

How about the following?


scsi: sr: Sanity check returned mode data

Kefeng Wang discovered that old versions of the QEMU CD driver would
return mangled mode data causing us to walk off the end of the buffer
in an attempt to parse it. Sanity check the returned mode sense data.

Cc:
Reported-by: Kefeng Wang
Signed-off-by: Martin K. Petersen

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 0b29b9329b1c..a8f630213a1a 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -836,6 +836,7 @@ static void get_capabilities(struct scsi_cd *cd)
     unsigned char *buffer;
     struct scsi_mode_data data;
     struct scsi_sense_hdr sshdr;
+    unsigned int ms_len = 128;
     int rc, n;
 
     static const char *loadmech[] =
@@ -862,10 +863,11 @@ static void get_capabilities(struct scsi_cd *cd)
     scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
 
     /* ask for mode page 0x2a */
-    rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, 128,
+    rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len,
                          SR_TIMEOUT, 3, &data, NULL);
 
-    if (!scsi_status_is_good(rc)) {
+    if (!scsi_status_is_good(rc) || data.length > ms_len ||
+        data.header_length + data.block_descriptor_length > data.length) {
         /* failed, drive doesn't have capabilities mode page */
         cd->cdi.speed = 1;
         cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R |
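
For context, a sketch of why the check matters (offsets illustrative,
not quoted from sr.c): get_capabilities() goes on to index the mode
data at offsets derived from the returned header, so both lengths have
to be bounded before they are used:

    n = data.header_length + data.block_descriptor_length;
    /* later parsing reads buffer[n + 2], buffer[n + 4], ... so a
     * mangled header must not push n past the data actually returned
     * (data.length), nor data.length past the 128-byte buffer we
     * asked for (ms_len). */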
Re: [PATCH] scsi: libsas: fix ata xfer length
John Garry writes:

> I should have added this originally to the changelog:
> Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")

Added.

-- 
Martin K. Petersen      Oracle Linux Engineering
Re: [PATCH] qedf: fix wrong le16 conversion
On Mon, 20 Mar 2017, 8:49am -, Arnd Bergmann wrote:
> gcc points out that we are converting a 16-bit integer into a 32-bit
> little-endian type, and assigning that to a 16-bit little-endian field
> will end up with a zero:
>
> drivers/scsi/qedf/drv_fcoe_fw_funcs.c: In function 'init_initiator_rw_fcoe_task':
> include/uapi/linux/byteorder/big_endian.h:32:26: error: large integer
> implicitly truncated to unsigned type [-Werror=overflow]
>    t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
>
> The correct solution appears to be to just use a 16-bit byte swap instead.
>
> Fixes: be086e7c53f1 ("qed*: Utilize Firmware 8.15.3.0")
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/scsi/qedf/drv_fcoe_fw_funcs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> index bb812db48da6..8c65e3b034dc 100644
> --- a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> +++ b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
> @@ -8,7 +8,7 @@
>  #include "drv_fcoe_fw_funcs.h"
>  #include "drv_scsi_fw_funcs.h"
>
> -#define FCOE_RX_ID ((u32)0xffff)
> +#define FCOE_RX_ID (0xffffu)
>
>  static inline void init_common_sqe(struct fcoe_task_params *task_params,
>                     enum fcoe_sqe_request_type request_type)
> @@ -59,7 +59,7 @@ int init_initiator_rw_fcoe_task(struct fcoe_task_params *task_params,
>      t_st_ctx->read_only.task_type = task_params->task_type;
>      SET_FIELD(t_st_ctx->read_write.flags,
>          FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
> -    t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
> +    t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
>
>      /* Ustorm ctx */
>      u_ag_ctx = &ctx->ustorm_ag_context;
> @@ -151,7 +151,7 @@ int init_initiator_midpath_unsolicited_fcoe_task(
>      t_st_ctx->read_only.task_type = task_params->task_type;
>      SET_FIELD(t_st_ctx->read_write.flags,
>          FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
> -    t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
> +    t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
>
>      /* Init Ustorm */
>      u_ag_ctx = &ctx->ustorm_ag_context;

Arnd, thanks for fixing this up.

Acked-by: Chad Dupuis
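
A sketch of the failure mode being acked here, assuming a big-endian
build (which is where gcc flags the truncation):

    __le32 v32 = cpu_to_le32(0xffff); /* BE value 0xffff0000, bytes ff ff 00 00 */
    __le16 v16 = (__force __le16)v32; /* keeps the low 16 bits: 0x0000 */
    __le16 ok  = cpu_to_le16(0xffff); /* bytes ff ff on any CPU */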
[PATCHv3 5/5] target/user: Clean up tcmu_queue_cmd_ring
From: Xiubo Li

Add two helpers to simplify the code, and move some code out of the
cmdr spin lock to reduce the size of the critical region.

Signed-off-by: Xiubo Li
---
 drivers/target/target_core_user.c | 68 +++----
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index e3daf15..52fa988 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -153,7 +153,7 @@ struct tcmu_cmd {
     /* Can't use se_cmd when cleaning up expired cmds, because if
        cmd has been completed then accessing se_cmd is off limits */
-    uint32_t dbi_len;
+    uint32_t dbi_cnt;
     uint32_t dbi_cur;
     uint32_t *dbi;
 
@@ -255,7 +255,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
     int i;
 
     tcmu_cmd_reset_dbi_cur(tcmu_cmd);
-    for (i = 0; i < tcmu_cmd->dbi_len; i++) {
+    for (i = 0; i < tcmu_cmd->dbi_cnt; i++) {
         if (!tcmu_get_empty_block(udev, tcmu_cmd))
             goto err;
     }
@@ -263,7 +263,7 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
 
 err:
     pr_debug("no blocks: only %u blocks available, but ask for %u\n",
-         tcmu_cmd->dbi_len, tcmu_cmd->dbi_cur);
+         tcmu_cmd->dbi_cnt, tcmu_cmd->dbi_cur);
     tcmu_cmd_free_data(tcmu_cmd, tcmu_cmd->dbi_cur);
     udev->waiting_global = true;
     return false;
@@ -286,25 +286,29 @@ static inline void tcmu_free_cmd(struct tcmu_cmd *tcmu_cmd)
     kmem_cache_free(tcmu_cmd_cache, tcmu_cmd);
 }
 
-static inline uint32_t tcmu_cmd_get_dbi_len(struct se_cmd *se_cmd)
+static inline uint32_t tcmu_cmd_get_data_length(struct se_cmd *se_cmd)
 {
     size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
-    uint32_t dbi_len;
 
     if (se_cmd->se_cmd_flags & SCF_BIDI)
         data_length += round_up(se_cmd->t_bidi_data_sg->length,
                 DATA_BLOCK_SIZE);
 
-    dbi_len = data_length / DATA_BLOCK_SIZE;
+    return data_length;
+}
+
+static inline uint32_t tcmu_cmd_get_dbi_cnt(struct se_cmd *se_cmd)
+{
+    size_t data_length = tcmu_cmd_get_data_length(se_cmd);
 
-    return dbi_len;
+    return data_length / DATA_BLOCK_SIZE;
 }
 
 static struct tcmu_cmd *tcmu_alloc_cmd(struct se_cmd *se_cmd)
 {
     struct se_device *se_dev = se_cmd->se_dev;
     struct tcmu_dev *udev = TCMU_DEV(se_dev);
-    uint32_t dbi_len = tcmu_cmd_get_dbi_len(se_cmd);
+    uint32_t dbi_cnt = tcmu_cmd_get_dbi_cnt(se_cmd);
     struct tcmu_cmd *tcmu_cmd;
     int cmd_id;
 
@@ -319,8 +323,8 @@ static struct tcmu_cmd *tcmu_alloc_cmd(struct se_cmd *se_cmd)
             msecs_to_jiffies(udev->cmd_time_out);
 
     tcmu_cmd_reset_dbi_cur(tcmu_cmd);
-    tcmu_cmd->dbi_len = dbi_len;
-    tcmu_cmd->dbi = kcalloc(dbi_len, sizeof(uint32_t), GFP_KERNEL);
+    tcmu_cmd->dbi_cnt = dbi_cnt;
+    tcmu_cmd->dbi = kcalloc(dbi_cnt, sizeof(uint32_t), GFP_KERNEL);
     if (!tcmu_cmd->dbi) {
         kmem_cache_free(tcmu_cmd_cache, tcmu_cmd);
         return NULL;
@@ -572,13 +576,27 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
     return true;
 }
 
+static inline void tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd,
+                     size_t *base_size, size_t *cmd_size)
+{
+    struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
+
+    *base_size = max(offsetof(struct tcmu_cmd_entry,
+                  req.iov[tcmu_cmd->dbi_cnt]),
+             sizeof(struct tcmu_cmd_entry));
+
+    *cmd_size = round_up(scsi_command_size(se_cmd->t_task_cdb),
+                 TCMU_OP_ALIGN_SIZE) + *base_size;
+    WARN_ON(*cmd_size & (TCMU_OP_ALIGN_SIZE-1));
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
     struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
     struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
     size_t base_command_size, command_size;
-    struct tcmu_mailbox *mb;
+    struct tcmu_mailbox *mb = udev->mb_addr;
     struct tcmu_cmd_entry *entry;
     struct iovec *iov;
     int iov_cnt, ret;
@@ -590,6 +608,8 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
     if (test_bit(TCMU_DEV_BIT_BROKEN, &udev->flags))
         return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 
+    if (se_cmd->se_cmd_flags & SCF_BIDI)
+        BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
+
     /*
      * Must be a certain minimum size for response sense info, but
      * also may be larger if the iov array is large.
@@ -597,33 +617,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
      * We prepare way too many iovs for potential uses here, because it's
[PATCHv3 3/5] target/user: Fix possible overwrite of t_data_sg's last iov[]
From: Xiubo Li

If there is BIDI data, its first iov[] will overwrite the last iov[]
for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introduce a new memory leak bug: if the se_cmd->data_length and
se_cmd->t_bidi_data_sg->length are both not aligned up to
DATA_BLOCK_SIZE, the actual length needed may be larger than just the
sum of them.

So, this could be avoided by rounding all the data lengths up to
DATA_BLOCK_SIZE.

Signed-off-by: Xiubo Li
Signed-off-by: Bryant G. Ly
---
 drivers/target/target_core_user.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 8cbe196..780c30f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -288,13 +288,14 @@ static inline void tcmu_free_cmd(struct tcmu_cmd *tcmu_cmd)
 
 static inline uint32_t tcmu_cmd_get_dbi_len(struct se_cmd *se_cmd)
 {
-    size_t data_length = se_cmd->data_length;
+    size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
     uint32_t dbi_len;
 
     if (se_cmd->se_cmd_flags & SCF_BIDI)
-        data_length += se_cmd->t_bidi_data_sg->length;
+        data_length += round_up(se_cmd->t_bidi_data_sg->length,
+                DATA_BLOCK_SIZE);
 
-    dbi_len = (data_length + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
+    dbi_len = data_length / DATA_BLOCK_SIZE;
 
     return dbi_len;
 }
@@ -609,10 +610,11 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
 
     mb = udev->mb_addr;
     cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
-    data_length = se_cmd->data_length;
+    data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
     if (se_cmd->se_cmd_flags & SCF_BIDI) {
         BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
-        data_length += se_cmd->t_bidi_data_sg->length;
+        data_length += round_up(se_cmd->t_bidi_data_sg->length,
+                DATA_BLOCK_SIZE);
     }
     if ((command_size > (udev->cmdr_size / 2)) ||
         data_length > udev->data_size) {
@@ -690,15 +692,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
     entry->req.iov_dif_cnt = 0;
 
     /* Handle BIDI commands */
-    iov_cnt = 0;
-    ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
-        se_cmd->t_bidi_data_sg, se_cmd->t_bidi_data_nents,
-        &iov, &iov_cnt, false);
-    if (ret) {
-        pr_err("tcmu: alloc and scatter bidi data failed\n");
-        return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+    if (se_cmd->se_cmd_flags & SCF_BIDI) {
+        iov_cnt = 0;
+        iov++;
+        ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
+                    se_cmd->t_bidi_data_sg,
+                    se_cmd->t_bidi_data_nents,
+                    &iov, &iov_cnt, false);
+        if (ret) {
+            pr_err("tcmu: alloc and scatter bidi data failed\n");
+            return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+        }
+        entry->req.iov_bidi_cnt = iov_cnt;
     }
-    entry->req.iov_bidi_cnt = iov_cnt;
 
     /* All offsets relative to mb_addr, not start of entry! */
     cdb_off = CMDR_OFF + cmd_head + base_command_size;
-- 
1.8.3.1
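
A worked instance of the rounding argument above, with made-up lengths
(DATA_BLOCK_SIZE is 4096 here):

    /* data_length = 3000, t_bidi_data_sg->length = 5000             */
    /* old: (3000 + 5000 + 4095) / 4096 = 2 blocks -- too few, since */
    /*      each direction starts in its own block                   */
    size_t need = round_up(3000, 4096) + round_up(5000, 4096);
    /* new: 4096 + 8192 = 12288 -> 3 blocks, which is enough         */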
[PATCHv3 1/5] tcmu: Add dynamic growing data area feature support
From: Xiubo Li

Currently for the TCMU, the ring buffer size is fixed to 64K cmd area
+ 1M data area, and this will be a bottleneck for high IOPS.

The struct tcmu_cmd_entry {} size is fixed at about 112 bytes with
iovec[N] & N <= 4, and the size of struct iovec is about 16 bytes.

If N == 0, the ratio will be sizeof(cmd entry) : sizeof(datas) ==
112Bytes : (N * 4096)Bytes = 28 : 0, and no data area is needed.

If 0 < N <= 4, the ratio will be sizeof(cmd entry) : sizeof(datas) ==
112Bytes : (N * 4096)Bytes = 28 : (N * 1024), so the max will be
28 : 1024.

If N > 4, sizeof(cmd entry) will be [(N - 4) * 16 + 112] bytes, and
its corresponding data size will be [N * 4096], so the ratio of
sizeof(cmd entry) : sizeof(datas) == [(N - 4) * 16 + 112]Bytes :
(N * 4096)Bytes == 4/1024 + 12/(N * 1024), which tends to about
4 : 1024.  When N is bigger, the ratio will be smaller.

As the initial patch, we will set the cmd area size to 2M, and the
data area size to 32M.  The TCMU will dynamically grow the data area
from 0 up to the max 32M size as needed.

The cmd area memory will be allocated through vmalloc(), and the data
area's blocks will be allocated individually later when needed.

The allocated data area block memory will be managed via a radix
tree.  For now the bitmap is still the most efficient way to search
and manage the block index; this could be updated later.

Signed-off-by: Xiubo Li
Signed-off-by: Jianfei Hu
---
 drivers/target/target_core_user.c | 286 ++++++++++++------
 1 file changed, 215 insertions(+), 71 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index c6874c3..fc9f237 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -2,6 +2,7 @@
  * Copyright (C) 2013 Shaohua Li
  * Copyright (C) 2014 Red Hat, Inc.
  * Copyright (C) 2015 Arrikto, Inc.
+ * Copyright (C) 2017 Chinamobile, Inc.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -25,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -63,15 +65,17 @@
  * this may have a 'UAM' comment.
  */
 
-
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-#define DATA_BLOCK_BITS 256
-#define DATA_BLOCK_SIZE 4096
+/* For cmd area, the size is fixed 2M */
+#define CMDR_SIZE (2 * 1024 * 1024)
 
-#define CMDR_SIZE (16 * 4096)
+/* For data area, the size is fixed 32M */
+#define DATA_BLOCK_BITS (8 * 1024)
+#define DATA_BLOCK_SIZE 4096
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
 
+/* The ring buffer size is 34M */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
 static struct device *tcmu_root_device;
@@ -103,12 +107,14 @@ struct tcmu_dev {
     size_t data_off;
     size_t data_size;
 
-    DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-
     wait_queue_head_t wait_cmdr;
     /* TODO should this be a mutex? */
     spinlock_t cmdr_lock;
 
+    uint32_t dbi_cur;
+    DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+    struct radix_tree_root data_blocks;
+
     struct idr commands;
     spinlock_t commands_lock;
 
@@ -130,7 +136,9 @@ struct tcmu_cmd {
     /* Can't use se_cmd when cleaning up expired cmds, because if
        cmd has been completed then accessing se_cmd is off limits */
-    DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+    uint32_t dbi_len;
+    uint32_t dbi_cur;
+    uint32_t *dbi;
 
     unsigned long deadline;
 
@@ -161,10 +169,80 @@ enum tcmu_multicast_groups {
     .netnsok = true,
 };
 
+static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+{
+    void *p;
+    uint32_t dbi;
+    int ret;
+
+    dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
+    if (dbi > udev->dbi_cur)
+        udev->dbi_cur = dbi;
+
+    set_bit(dbi, udev->data_bitmap);
+
+    p = radix_tree_lookup(&udev->data_blocks, dbi);
+    if (!p) {
+        p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
+        if (!p) {
+            clear_bit(dbi, udev->data_bitmap);
+            return -ENOMEM;
+        }
+
+        ret = radix_tree_insert(&udev->data_blocks, dbi, p);
+        if (ret) {
+            kfree(p);
+            clear_bit(dbi, udev->data_bitmap);
+            return ret;
+        }
+    }
+
+    *addr = p;
+    return dbi;
+}
+
+static void *tcmu_get_block_addr(struct tcmu_dev *udev, uint32_t dbi)
+{
+    return radix_tree_lookup(&udev->data_blocks, dbi);
+}
+
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd)
+{
+    struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
+    uint32_t bi;
+
+    for (bi =
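
A worked instance of the ratio arithmetic in the changelog above, using
its own numbers (illustrative only):

    /* N = 8 iovecs: entry = (8 - 4) * 16 + 112 = 176 bytes          */
    /*               data  = 8 * 4096           = 32768 bytes        */
    /* ratio ~= 176 : 32768 ~= 5.5 : 1024, shrinking toward 4 : 1024 */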
[PATCHv3 0/5] Dynamic growing data area support
From: Xiubo Li

Changed for V3:
- [PATCHv2 2/5] fix double usage of blocks and possible page fault
  call trace.
- [PATCHv2 5/5] fix a mistake.

Changed for V2:
- [PATCHv2 1/5] just fixes some small spelling and other mistakes.
  And as the initial patch, here sets cmd area to 8M and data area to
  1G (1M fixed and 1023M growing)
- [PATCHv2 2/5] is a new one, adding global data block pool support.
  The max total size of the pool is 2G and all the targets will get
  growing blocks from here. Tested this using multi-targets at the
  same time.
- [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
- [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.

Xiubo Li (5):
  tcmu: Add dynamic growing data area feature support
  tcmu: Add global data block pool support
  target/user: Fix possible overwrite of t_data_sg's last iov[]
  target/user: Fix wrongly calculating of the base_command_size
  target/user: Clean up tcmu_queue_cmd_ring

 drivers/target/target_core_user.c | 621 +++---
 1 file changed, 514 insertions(+), 107 deletions(-)

-- 
1.8.3.1
[PATCHv3 4/5] target/user: Fix wrongly calculating of the base_command_size
From: Xiubo Li

The t_data_nents and t_bidi_data_nents are the numbers of the
segments, and we cannot be sure that the size of a data area block
will equal the size of a segment.  Use the actual number of blocks
needed instead of the sum of segments.

Signed-off-by: Xiubo Li
---
 drivers/target/target_core_user.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 780c30f..e3daf15 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -598,8 +598,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, struct tcmu_cmd *cmd,
      * expensive to tell how many regions are freed in the bitmap
      */
     base_command_size = max(offsetof(struct tcmu_cmd_entry,
-                req.iov[se_cmd->t_bidi_data_nents +
-                    se_cmd->t_data_nents]),
+                req.iov[tcmu_cmd->dbi_len]),
                 sizeof(struct tcmu_cmd_entry));
     command_size = base_command_size +
         round_up(scsi_command_size(se_cmd->t_task_cdb),
             TCMU_OP_ALIGN_SIZE);
-- 
1.8.3.1
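
An illustrative case of the sizing bug fixed here (sizes made up): one
scatterlist segment can span several data blocks, so counting segments
undercounts the iov[] slots the entry may need.

    /* a single 16K segment: t_data_nents == 1, but it maps to four  */
    /* 4K data blocks and thus up to four iov[] entries -- size the  */
    /* entry by tcmu_cmd->dbi_len (blocks), not by the segment count */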
[PATCHv3 2/5] tcmu: Add global data block pool support
From: Xiubo Li

For each target there will be one ring, and when the number of targets
grows larger and larger, it could eventually run out of system memory.

In this patch, for each target ring, the cmd area size will be limited
to 8MB and the data area size will be limited to 256K * PAGE_SIZE.

For all the targets' data areas, they will get empty blocks from the
"global data block pool", which is limited to 512K * PAGE_SIZE for now.

When the "global data block pool" has been used up, any target could
trigger the unmapping thread routine to shrink the targets' rings.
And for the idle targets the unmapping routine will reserve at least
256 blocks.

When user space has touched the data blocks out of the iov[N],
tcmu_page_fault() will return one zeroed block.

Signed-off-by: Xiubo Li
Signed-off-by: Jianfei Hu
---
 drivers/target/target_core_user.c | 426 ++++++++++++++++++------
 1 file changed, 339 insertions(+), 87 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index fc9f237..8cbe196 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -31,6 +31,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -67,17 +69,24 @@
 
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-/* For cmd area, the size is fixed 2M */
-#define CMDR_SIZE (2 * 1024 * 1024)
+/* For cmd area, the size is fixed 8MB */
+#define CMDR_SIZE (8 * 1024 * 1024)
 
-/* For data area, the size is fixed 32M */
-#define DATA_BLOCK_BITS (8 * 1024)
-#define DATA_BLOCK_SIZE 4096
+/*
+ * For data area, the block size is PAGE_SIZE and
+ * the total size is 256K * PAGE_SIZE.
+ */
+#define DATA_BLOCK_SIZE PAGE_SIZE
+#define DATA_BLOCK_BITS (256 * 1024)
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
+#define DATA_BLOCK_RES_BITS 256
 
-/* The ring buffer size is 34M */
+/* The total size of the ring is 8M + 256K * PAGE_SIZE */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
+/* Default maximum of the global data blocks(512K * PAGE_SIZE) */
+#define TCMU_GLOBAL_MAX_BLOCKS (512 * 1024)
+
 static struct device *tcmu_root_device;
 
 struct tcmu_hba {
@@ -87,6 +96,8 @@ struct tcmu_hba {
 #define TCMU_CONFIG_LEN 256
 
 struct tcmu_dev {
+    struct list_head node;
+
     struct se_device se_dev;
 
     char *name;
@@ -98,6 +109,16 @@ struct tcmu_dev {
 
     struct uio_info uio_info;
 
+    struct inode *inode;
+
+    struct mutex unmap_mutex;
+    bool unmapping;
+    bool waiting_global;
+    uint32_t dbi_cur;
+    uint32_t dbi_thresh;
+    DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+    struct radix_tree_root data_blocks;
+
     struct tcmu_mailbox *mb_addr;
     size_t dev_size;
     u32 cmdr_size;
@@ -111,10 +132,6 @@ struct tcmu_dev {
     /* TODO should this be a mutex? */
     spinlock_t cmdr_lock;
 
-    uint32_t dbi_cur;
-    DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-    struct radix_tree_root data_blocks;
-
     struct idr commands;
     spinlock_t commands_lock;
 
@@ -146,6 +163,14 @@ struct tcmu_cmd {
     unsigned long flags;
 };
 
+static struct task_struct *unmap_thread;
+static wait_queue_head_t unmap_wait;
+static DEFINE_MUTEX(udev_mutex);
+static LIST_HEAD(root_udev);
+
+static spinlock_t db_count_lock;
+static unsigned long global_db_count;
+
 static struct kmem_cache *tcmu_cmd_cache;
 
 /* multicast group */
@@ -169,54 +194,90 @@ enum tcmu_multicast_groups {
     .netnsok = true,
 };
 
-static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static inline void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len)
 {
-    void *p;
-    uint32_t dbi;
-    int ret;
+    struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
+    uint32_t i;
 
-    dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
-    if (dbi > udev->dbi_cur)
-        udev->dbi_cur = dbi;
+    for (i = 0; i < len; i++)
+        clear_bit(tcmu_cmd->dbi[i], udev->data_bitmap);
+}
 
-    set_bit(dbi, udev->data_bitmap);
+static inline bool tcmu_get_empty_block(struct tcmu_dev *udev,
+                    struct tcmu_cmd *tcmu_cmd)
+{
+    struct page *page;
+    int ret, dbi;
 
-    p = radix_tree_lookup(&udev->data_blocks, dbi);
-    if (!p) {
-        p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
-        if (!p) {
-            clear_bit(dbi, udev->data_bitmap);
-            return -ENOMEM;
+    dbi = find_first_zero_bit(udev->data_bitmap, udev->dbi_thresh);
+    if (dbi == udev->dbi_thresh)
+        return false;
+
+    page = radix_tree_lookup(&udev->data_blocks, dbi);
+    if (!page) {
+
Re: [PATCH] scsi: libsas: fix ata xfer length
On 19/03/2017 17:21, Martin K. Petersen wrote:
> John Garry writes:
>
> John,
>
>> The total ata xfer length may not be calculated properly, in that we
>> do not use the proper method to get an sg element dma length.
>>
>> According to the code comment, sg_dma_len() should be used after
>> dma_map_sg() is called.
>>
>> This issue was found by turning on the SMMUv3 in front of the
>> hisi_sas controller in hip07. Multiple sg elements were being
>> combined into a single element, but the original first element
>> length was being used as the total xfer length.

I should have added this originally to the changelog:
Fixes: ff2aeb1eb64c8a4770a6 ("libata: convert to chained sg")

BTW, I am surprised this issue has not been seen in almost 10 years,
but we cannot attach a SATA disk when SMMU enabled without it.

Cheers,
John

> Applied to 4.11/scsi-fixes.
[PATCH] qedf: fix wrong le16 conversion
gcc points out that we are converting a 16-bit integer into a 32-bit
little-endian type, and assigning that to a 16-bit little-endian field
will end up with a zero:

drivers/scsi/qedf/drv_fcoe_fw_funcs.c: In function 'init_initiator_rw_fcoe_task':
include/uapi/linux/byteorder/big_endian.h:32:26: error: large integer
implicitly truncated to unsigned type [-Werror=overflow]
   t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);

The correct solution appears to be to just use a 16-bit byte swap instead.

Fixes: be086e7c53f1 ("qed*: Utilize Firmware 8.15.3.0")
Signed-off-by: Arnd Bergmann
---
 drivers/scsi/qedf/drv_fcoe_fw_funcs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
index bb812db48da6..8c65e3b034dc 100644
--- a/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
+++ b/drivers/scsi/qedf/drv_fcoe_fw_funcs.c
@@ -8,7 +8,7 @@
 #include "drv_fcoe_fw_funcs.h"
 #include "drv_scsi_fw_funcs.h"
 
-#define FCOE_RX_ID ((u32)0xffff)
+#define FCOE_RX_ID (0xffffu)
 
 static inline void init_common_sqe(struct fcoe_task_params *task_params,
                    enum fcoe_sqe_request_type request_type)
@@ -59,7 +59,7 @@ int init_initiator_rw_fcoe_task(struct fcoe_task_params *task_params,
     t_st_ctx->read_only.task_type = task_params->task_type;
     SET_FIELD(t_st_ctx->read_write.flags,
         FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
-    t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
+    t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
 
     /* Ustorm ctx */
     u_ag_ctx = &ctx->ustorm_ag_context;
@@ -151,7 +151,7 @@ int init_initiator_midpath_unsolicited_fcoe_task(
     t_st_ctx->read_only.task_type = task_params->task_type;
     SET_FIELD(t_st_ctx->read_write.flags,
         FCOE_TSTORM_FCOE_TASK_ST_CTX_READ_WRITE_EXP_FIRST_FRAME, 1);
-    t_st_ctx->read_write.rx_id = cpu_to_le32(FCOE_RX_ID);
+    t_st_ctx->read_write.rx_id = cpu_to_le16(FCOE_RX_ID);
 
     /* Init Ustorm */
     u_ag_ctx = &ctx->ustorm_ag_context;
-- 
2.9.0