Re: [PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq
On Thu, Nov 02, 2017 at 11:24:31PM +0800, Ming Lei wrote:
> Hi Jens,
>
> This patchset avoids allocating a driver tag beforehand for the flush rq
> in case of an I/O scheduler, so the flush rq is no longer treated
> specially wrt. getting/putting the driver tag. The code gets cleaned up
> a lot: for example, reorder_tags_to_front() is removed, and we needn't
> worry about request order in the dispatch list to avoid I/O deadlock.
>
> 'dbench -t 30 -s 64' has been run on different devices (shared tag,
> multi-queue, single queue, ...) and no issues were observed; even when
> very low queue depth tests are run, dbench still works well.
>
> Please consider it for v4.15, thanks!

Hi Jens,

As we discussed before, this patchset is a good cleanup of flush request
handling. Could you share your opinion on V3?

thanks,
Ming
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Sat, Nov 04, 2017 at 12:47:31AM +0800, Ming Lei wrote:
> On Fri, Nov 03, 2017 at 12:13:09PM -0400, Laurence Oberman wrote:
> > Hi
> > I had it working some time back. I am off today to take my son to the
> > doctor.
> > I will get Bart's test working again this weekend.
>
> Hello Laurence and Bart,
>
> Just found srp-test starts to work now with the v4.14-rc4 kernel, and
> the IO hang issue has been reproduced one time. BTW, the connection
> failure happened on for-next of the block tree.
>
> Will investigate the issue this weekend.

Hi Bart,

Please test the following patch to see if it fixes your issue:

	https://marc.info/?l=linux-scsi&m=150976056219162&w=2

I ran your test 01 three times and no longer see the hang issue once
this patch is applied.

Thanks,
Ming

> > On Nov 3, 2017 11:51 AM, "Bart Van Assche" wrote:
> > > On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote:
> > > > Forget to mention, there is a failure when running 'make' under
> > > > srp-test because the shellcheck package is missing in RHEL7. Can
> > > > that be the issue of the test failure? If yes, could you provide a
> > > > special version of srp-test which doesn't depend on shellcheck?
> > >
> > > The srp-test software does not depend on shellcheck. The only purpose of
> > > the "shellcheck" target in the srp-test Makefile is to verify the syntax
> > > of the shell scripts. The tests run fine even if shellcheck is missing.
> > >
> > > Bart.
>
> --
> Ming

--
Ming
[PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()
It is very expensive to atomic_inc/atomic_dec the host-wide counter
host->busy_count, and it should have been avoided via blk-mq's mechanism
of getting a driver tag, which uses the more efficient sbitmap queue.

Also we don't check atomic_read(&sdev->device_busy) in
scsi_mq_get_budget() and don't run the queue if the counter becomes
zero, so an IO hang may be caused if all requests are completed just
before the current SCSI device is added to shost->starved_list.

Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
Reported-by: Bart Van Assche
Signed-off-by: Ming Lei
---
 drivers/scsi/scsi_lib.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a098af3070a1..7f218ef61900 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1944,11 +1944,7 @@ static void scsi_mq_put_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
-	struct Scsi_Host *shost = sdev->host;

-	atomic_dec(&shost->host_busy);
-	if (scsi_target(sdev)->can_queue > 0)
-		atomic_dec(&scsi_target(sdev)->target_busy);
 	atomic_dec(&sdev->device_busy);
 	put_device(&sdev->sdev_gendev);
 }
@@ -1957,7 +1953,6 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
 	struct scsi_device *sdev = q->queuedata;
-	struct Scsi_Host *shost = sdev->host;
 	blk_status_t ret;

 	ret = prep_to_mq(scsi_prep_state_check(sdev, NULL));
@@ -1968,18 +1963,9 @@ static blk_status_t scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
 		goto out;
 	if (!scsi_dev_queue_ready(q, sdev))
 		goto out_put_device;
-	if (!scsi_target_queue_ready(shost, sdev))
-		goto out_dec_device_busy;
-	if (!scsi_host_queue_ready(q, shost, sdev))
-		goto out_dec_target_busy;

 	return BLK_STS_OK;

-out_dec_target_busy:
-	if (scsi_target(sdev)->can_queue > 0)
-		atomic_dec(&scsi_target(sdev)->target_busy);
-out_dec_device_busy:
-	atomic_dec(&sdev->device_busy);
 out_put_device:
 	put_device(&sdev->sdev_gendev);
 out:
@@ -1992,6 +1978,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct request *req = bd->rq;
 	struct request_queue *q = req->q;
 	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 	blk_status_t ret;
 	int reason;
@@ -2001,10 +1988,15 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		goto out_put_budget;

 	ret = BLK_STS_RESOURCE;
+	if (!scsi_target_queue_ready(shost, sdev))
+		goto out_put_budget;
+	if (!scsi_host_queue_ready(q, shost, sdev))
+		goto out_dec_target_busy;
+
 	if (!(req->rq_flags & RQF_DONTPREP)) {
 		ret = prep_to_mq(scsi_mq_prep_fn(req));
 		if (ret != BLK_STS_OK)
-			goto out_put_budget;
+			goto out_dec_host_busy;
 		req->rq_flags |= RQF_DONTPREP;
 	} else {
 		blk_mq_start_request(req);
 	}
@@ -2022,11 +2014,16 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (reason) {
 		scsi_set_blocked(cmd, reason);
 		ret = BLK_STS_RESOURCE;
-		goto out_put_budget;
+		goto out_dec_host_busy;
 	}

 	return BLK_STS_OK;

+out_dec_host_busy:
+	atomic_dec(&shost->host_busy);
+out_dec_target_busy:
+	if (scsi_target(sdev)->can_queue > 0)
+		atomic_dec(&scsi_target(sdev)->target_busy);
 out_put_budget:
 	scsi_mq_put_budget(hctx);
 	switch (ret) {
-- 
2.9.5
[PATCH v3 3/8] nvmet/fc: Use sgl_alloc() and sgl_free()
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Cc: Keith Busch
Cc: Christoph Hellwig
Cc: James Smart
Cc: Sagi Grimberg
---
 drivers/nvme/target/Kconfig |  1 +
 drivers/nvme/target/fc.c    | 36 ++----------------------------------
 2 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 03e4ab65fe77..4d9715630e21 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -39,6 +39,7 @@ config NVME_TARGET_FC
 	tristate "NVMe over Fabrics FC target driver"
 	depends on NVME_TARGET
 	depends on HAS_DMA
+	select SGL_ALLOC
 	help
 	  This enables the NVMe FC target support, which allows exporting NVMe
 	  devices over FC.
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 5f2570f5dfa9..1b4030d74ceb 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1695,31 +1695,12 @@ static int
 nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 {
 	struct scatterlist *sg;
-	struct page *page;
 	unsigned int nent;
-	u32 page_len, length;
-	int i = 0;

-	length = fod->total_length;
-	nent = DIV_ROUND_UP(length, PAGE_SIZE);
-	sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL);
+	sg = sgl_alloc(fod->total_length, GFP_KERNEL, &nent);
 	if (!sg)
 		goto out;

-	sg_init_table(sg, nent);
-
-	while (length) {
-		page_len = min_t(u32, length, PAGE_SIZE);
-
-		page = alloc_page(GFP_KERNEL);
-		if (!page)
-			goto out_free_pages;
-
-		sg_set_page(&sg[i], page, page_len, 0);
-		length -= page_len;
-		i++;
-	}
-
 	fod->data_sg = sg;
 	fod->data_sg_cnt = nent;
 	fod->data_sg_cnt = fc_dma_map_sg(fod->tgtport->dev, sg, nent,
@@ -1729,14 +1710,6 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)

 	return 0;

-out_free_pages:
-	while (i > 0) {
-		i--;
-		__free_page(sg_page(&sg[i]));
-	}
-	kfree(sg);
-	fod->data_sg = NULL;
-	fod->data_sg_cnt = 0;
 out:
 	return NVME_SC_INTERNAL;
 }
@@ -1744,18 +1717,13 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 static void
 nvmet_fc_free_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
 {
-	struct scatterlist *sg;
-	int count;
-
 	if (!fod->data_sg || !fod->data_sg_cnt)
 		return;

 	fc_dma_unmap_sg(fod->tgtport->dev, fod->data_sg, fod->data_sg_cnt,
 			((fod->io_dir == NVMET_FCP_WRITE) ?
 				DMA_FROM_DEVICE : DMA_TO_DEVICE));
-	for_each_sg(fod->data_sg, sg, fod->data_sg_cnt, count)
-		__free_page(sg_page(sg));
-	kfree(fod->data_sg);
+	sgl_free(fod->data_sg);
 	fod->data_sg = NULL;
 	fod->data_sg_cnt = 0;
 }
-- 
2.14.3
[PATCH v3 0/8] Introduce sgl_alloc() and sgl_free()
Hello Jens,

As you know there are multiple drivers that both allocate a
scatter/gather list and populate that list with pages. This patch series
moves the code for allocating and freeing such scatterlists from these
drivers into lib/scatterlist.c.

Please consider this patch series for kernel v4.15.

Notes:
- Only the ib_srpt and ipr changes have been tested.
- The next step is to introduce a caching mechanism for these
  scatterlists and make the nvmet/rdma and SCSI target drivers use that
  caching mechanism, since for these drivers sgl allocation occurs in
  the hot path.

Changes between v2 and v3:
- Added Reviewed-by tags that had been posted as replies on v2.
- Cc'ed the crypto maintainers for the entire patch series.

Changes between v1 and v2:
- Moved the sgl_alloc*() and sgl_free() functions from a new source file
  into lib/scatterlist.c.
- Changed the argument order for the sgl_alloc*() functions such that
  (the pointer to) the output argument comes last.

Thanks,

Bart.

Bart Van Assche (8):
  lib/scatterlist: Introduce sgl_alloc() and sgl_free()
  crypto: scompress - use sgl_alloc() and sgl_free()
  nvmet/fc: Use sgl_alloc() and sgl_free()
  nvmet/rdma: Use sgl_alloc() and sgl_free()
  target: Use sgl_alloc_order() and sgl_free()
  scsi/ipr: Use sgl_alloc_order() and sgl_free_order()
  scsi/pmcraid: Remove an unused structure member
  scsi/pmcraid: Use sgl_alloc_order() and sgl_free_order()

 crypto/Kconfig                         |   1 +
 crypto/scompress.c                     |  51 +---------
 drivers/nvme/target/Kconfig            |   2 +
 drivers/nvme/target/fc.c               |  36 +------
 drivers/nvme/target/rdma.c             |  63 ++----------
 drivers/scsi/Kconfig                   |   2 +
 drivers/scsi/ipr.c                     |  49 +++------
 drivers/scsi/ipr.h                     |   2 +-
 drivers/scsi/pmcraid.c                 |  43 ++------
 drivers/scsi/pmcraid.h                 |   3 +-
 drivers/target/Kconfig                 |   1 +
 drivers/target/target_core_transport.c |  46 ++-------
 include/linux/scatterlist.h            |  10 ++
 lib/Kconfig                            |   4 ++
 lib/scatterlist.c                      | 105 +++++++++++++++++++
 15 files changed, 151 insertions(+), 267 deletions(-)

-- 
2.14.3
[PATCH v3 1/8] lib/scatterlist: Introduce sgl_alloc() and sgl_free()
Many kernel drivers contain code that allocates and frees both a
scatterlist and the pages that populate that scatterlist. Introduce
functions in lib/scatterlist.c that perform these tasks instead of
duplicating this functionality in multiple drivers. Only include these
functions in the build if CONFIG_SGL_ALLOC=y to avoid that the kernel
size increases if this functionality is not used.

Signed-off-by: Bart Van Assche
Reviewed-by: Hannes Reinecke
Reviewed-by: Johannes Thumshirn
---
 include/linux/scatterlist.h |  10 ++++
 lib/Kconfig                 |   4 ++
 lib/scatterlist.c           | 105 ++++++++++++++++++++++++++++++++++++
 3 files changed, 119 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4b3286ac60c8..3642511918d5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -266,6 +266,16 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 	unsigned long offset, unsigned long size,
 	gfp_t gfp_mask);

+#ifdef CONFIG_SGL_ALLOC
+struct scatterlist *sgl_alloc_order(unsigned long long length,
+				    unsigned int order, bool chainable,
+				    gfp_t gfp, unsigned int *nent_p);
+struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
+			      unsigned int *nent_p);
+void sgl_free_order(struct scatterlist *sgl, int order);
+void sgl_free(struct scatterlist *sgl);
+#endif /* CONFIG_SGL_ALLOC */
+
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 		      size_t buflen, off_t skip, bool to_buffer);

diff --git a/lib/Kconfig b/lib/Kconfig
index b1445b22a6de..8396c4cfa1ab 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -413,6 +413,10 @@ config HAS_DMA
 	depends on !NO_DMA
 	default y

+config SGL_ALLOC
+	bool
+	default n
+
 config DMA_NOOP_OPS
 	bool
 	depends on HAS_DMA && (!64BIT || ARCH_DMA_ADDR_T_64BIT)
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index be7b4dd6b68d..e2b5a78ceb44 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -433,6 +433,111 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 }
 EXPORT_SYMBOL(sg_alloc_table_from_pages);

+#ifdef CONFIG_SGL_ALLOC
+
+/**
+ * sgl_alloc_order - allocate a scatterlist and its pages
+ * @length: Length in bytes of the scatterlist. Must be at least one
+ * @order: Second argument for alloc_pages()
+ * @chainable: Whether or not to allocate an extra element in the scatterlist
+ *	for scatterlist chaining purposes
+ * @gfp: Memory allocation flags
+ * @nent_p: [out] Number of entries in the scatterlist that have pages
+ *
+ * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
+ */
+struct scatterlist *sgl_alloc_order(unsigned long long length,
+				    unsigned int order, bool chainable,
+				    gfp_t gfp, unsigned int *nent_p)
+{
+	struct scatterlist *sgl, *sg;
+	struct page *page;
+	unsigned int nent, nalloc;
+	u32 elem_len;
+
+	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
+	/* Check for integer overflow */
+	if (length > (nent << (PAGE_SHIFT + order)))
+		return NULL;
+	nalloc = nent;
+	if (chainable) {
+		/* Check for integer overflow */
+		if (nalloc + 1 < nalloc)
+			return NULL;
+		nalloc++;
+	}
+	sgl = kmalloc_array(nalloc, sizeof(struct scatterlist),
+			    (gfp & ~GFP_DMA) | __GFP_ZERO);
+	if (!sgl)
+		return NULL;
+
+	sg_init_table(sgl, nent);
+	sg = sgl;
+	while (length) {
+		elem_len = min_t(u64, length, PAGE_SIZE << order);
+		page = alloc_pages(gfp, order);
+		if (!page) {
+			sgl_free(sgl);
+			return NULL;
+		}
+
+		sg_set_page(sg, page, elem_len, 0);
+		length -= elem_len;
+		sg = sg_next(sg);
+	}
+	WARN_ON_ONCE(sg);
+	if (nent_p)
+		*nent_p = nent;
+	return sgl;
+}
+EXPORT_SYMBOL(sgl_alloc_order);
+
+/**
+ * sgl_alloc - allocate a scatterlist and its pages
+ * @length: Length in bytes of the scatterlist
+ * @gfp: Memory allocation flags
+ * @nent_p: [out] Number of entries in the scatterlist
+ *
+ * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
+ */
+struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
+			      unsigned int *nent_p)
+{
+	return sgl_alloc_order(length, 0, false, gfp, nent_p);
+}
+EXPORT_SYMBOL(sgl_alloc);
+
+/**
+ * sgl_free_order - free a scatterlist and its pages
+ * @sgl: Scatterlist with one or more elements
+ * @order: Second argument for __free_pages()
+ */
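As a cross-check of the element-count arithmetic in sgl_alloc_order() above, the following is a small userspace Python model of the rounding and chainable bookkeeping (the PAGE_SHIFT value is an assumption for illustration, and the overflow recheck is only modeled in a comment, since Python integers do not wrap):

```python
PAGE_SHIFT = 12                 # assumed 4 KiB pages, as on x86-64
PAGE_SIZE = 1 << PAGE_SHIFT

def sgl_nent(length, order=0, chainable=False):
    """Model of the nent/nalloc computation in sgl_alloc_order()."""
    elem_size = PAGE_SIZE << order
    # round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order)
    nent = (length + elem_size - 1) // elem_size
    # the kernel rechecks length against nent << (PAGE_SHIFT + order)
    # to catch overflow of the 64-bit round_up; cannot trip in Python
    if length > nent * elem_size:
        return None
    # one extra slot is reserved when the caller wants to chain
    nalloc = nent + 1 if chainable else nent
    return nent, nalloc

assert sgl_nent(PAGE_SIZE) == (1, 1)
assert sgl_nent(PAGE_SIZE + 1) == (2, 2)
assert sgl_nent(3 * PAGE_SIZE, order=1) == (2, 2)   # 8 KiB elements
assert sgl_nent(PAGE_SIZE, chainable=True) == (1, 2)
```

Note that sg_init_table() is still called with nent, not nalloc, so the trailing chain slot stays outside the terminated list until the caller chains into it.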
[PATCH v3 8/8] scsi/pmcraid: Use sgl_alloc_order() and sgl_free_order()
Use the sgl_alloc_order() and sgl_free_order() functions instead of open
coding these functions.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: linux-s...@vger.kernel.org
Cc: Martin K. Petersen
Cc: Anil Ravindranath
---
 drivers/scsi/Kconfig   |  1 +
 drivers/scsi/pmcraid.c | 43 +++---------------------------------------
 drivers/scsi/pmcraid.h |  2 +-
 3 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index d918115afdca..04400bed39a1 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1577,6 +1577,7 @@ config ZFCP
 config SCSI_PMCRAID
 	tristate "PMC SIERRA Linux MaxRAID adapter support"
 	depends on PCI && SCSI && NET
+	select SGL_ALLOC
 	---help---
 	  This driver supports the PMC SIERRA MaxRAID adapters.
diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index b4d6cd8cd1ad..95fc0352f9bb 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -3232,12 +3232,7 @@ static int pmcraid_build_ioadl(
  */
 static void pmcraid_free_sglist(struct pmcraid_sglist *sglist)
 {
-	int i;
-
-	for (i = 0; i < sglist->num_sg; i++)
-		__free_pages(sg_page(&(sglist->scatterlist[i])),
-			     sglist->order);
-
+	sgl_free_order(sglist->scatterlist, sglist->order);
 	kfree(sglist);
 }

@@ -3254,50 +3249,20 @@ static void pmcraid_free_sglist(struct pmcraid_sglist *sglist)
 static struct pmcraid_sglist *pmcraid_alloc_sglist(int buflen)
 {
 	struct pmcraid_sglist *sglist;
-	struct scatterlist *scatterlist;
-	struct page *page;
-	int num_elem, i, j;
 	int sg_size;
 	int order;
-	int bsize_elem;

 	sg_size = buflen / (PMCRAID_MAX_IOADLS - 1);
 	order = (sg_size > 0) ? get_order(sg_size) : 0;
-	bsize_elem = PAGE_SIZE * (1 << order);
-
-	/* Determine the actual number of sg entries needed */
-	if (buflen % bsize_elem)
-		num_elem = (buflen / bsize_elem) + 1;
-	else
-		num_elem = buflen / bsize_elem;

 	/* Allocate a scatter/gather list for the DMA */
-	sglist = kzalloc(sizeof(struct pmcraid_sglist) +
-			 (sizeof(struct scatterlist) * (num_elem - 1)),
-			 GFP_KERNEL);
-
+	sglist = kzalloc(sizeof(struct pmcraid_sglist), GFP_KERNEL);
 	if (sglist == NULL)
 		return NULL;

-	scatterlist = sglist->scatterlist;
-	sg_init_table(scatterlist, num_elem);
 	sglist->order = order;
-	sglist->num_sg = num_elem;
-	sg_size = buflen;
-
-	for (i = 0; i < num_elem; i++) {
-		page = alloc_pages(GFP_KERNEL|GFP_DMA|__GFP_ZERO, order);
-		if (!page) {
-			for (j = i - 1; j >= 0; j--)
-				__free_pages(sg_page(&scatterlist[j]), order);
-			kfree(sglist);
-			return NULL;
-		}
-
-		sg_set_page(&scatterlist[i], page,
-			    sg_size < bsize_elem ? sg_size : bsize_elem, 0);
-		sg_size -= bsize_elem;
-	}
+	sgl_alloc_order(buflen, order, false,
+			GFP_KERNEL | GFP_DMA | __GFP_ZERO, &sglist->num_sg);

 	return sglist;
 }
diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
index 44da91712115..754ef30927e2 100644
--- a/drivers/scsi/pmcraid.h
+++ b/drivers/scsi/pmcraid.h
@@ -542,7 +542,7 @@ struct pmcraid_sglist {
 	u32 order;
 	u32 num_sg;
 	u32 num_dma_sg;
-	struct scatterlist scatterlist[1];
+	struct scatterlist *scatterlist;
 };

 /* page D0 inquiry data of focal point resource */
-- 
2.14.3
[PATCH v3 7/8] scsi/pmcraid: Remove an unused structure member
Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Cc: linux-s...@vger.kernel.org
Cc: Martin K. Petersen
Cc: Anil Ravindranath
---
 drivers/scsi/pmcraid.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
index 8bfac72a242b..44da91712115 100644
--- a/drivers/scsi/pmcraid.h
+++ b/drivers/scsi/pmcraid.h
@@ -542,7 +542,6 @@ struct pmcraid_sglist {
 	u32 order;
 	u32 num_sg;
 	u32 num_dma_sg;
-	u32 buffer_len;
 	struct scatterlist scatterlist[1];
 };
-- 
2.14.3
[PATCH v3 6/8] scsi/ipr: Use sgl_alloc_order() and sgl_free_order()
Use the sgl_alloc_order() and sgl_free_order() functions instead of open
coding these functions.

Signed-off-by: Bart Van Assche
Acked-by: Brian King
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Cc: Martin K. Petersen
Cc: linux-s...@vger.kernel.org
---
 drivers/scsi/Kconfig |  1 +
 drivers/scsi/ipr.c   | 49 ++++++++-----------------------------------
 drivers/scsi/ipr.h   |  2 +-
 3 files changed, 10 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 766955318005..d918115afdca 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1059,6 +1059,7 @@ config SCSI_IPR
 	depends on PCI && SCSI && ATA
 	select FW_LOADER
 	select IRQ_POLL
+	select SGL_ALLOC
 	---help---
 	  This driver supports the IBM Power Linux family RAID adapters.
 	  This includes IBM pSeries 5712, 5703, 5709, and 570A, as well
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index f838bd73befa..6fed5db6255e 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -3815,10 +3815,8 @@ static struct device_attribute ipr_iopoll_weight_attr = {
 **/
 static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
 {
-	int sg_size, order, bsize_elem, num_elem, i, j;
+	int sg_size, order;
 	struct ipr_sglist *sglist;
-	struct scatterlist *scatterlist;
-	struct page *page;

 	/* Get the minimum size per scatter/gather element */
 	sg_size = buf_len / (IPR_MAX_SGLIST - 1);
@@ -3826,45 +3824,18 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
 	/* Get the actual size per element */
 	order = get_order(sg_size);

-	/* Determine the actual number of bytes per element */
-	bsize_elem = PAGE_SIZE * (1 << order);
-
-	/* Determine the actual number of sg entries needed */
-	if (buf_len % bsize_elem)
-		num_elem = (buf_len / bsize_elem) + 1;
-	else
-		num_elem = buf_len / bsize_elem;
-
 	/* Allocate a scatter/gather list for the DMA */
-	sglist = kzalloc(sizeof(struct ipr_sglist) +
-			 (sizeof(struct scatterlist) * (num_elem - 1)),
-			 GFP_KERNEL);
-
+	sglist = kzalloc(sizeof(struct ipr_sglist), GFP_KERNEL);
 	if (sglist == NULL) {
 		ipr_trace;
 		return NULL;
 	}
-
-	scatterlist = sglist->scatterlist;
-	sg_init_table(scatterlist, num_elem);
-
 	sglist->order = order;
-	sglist->num_sg = num_elem;
-
-	/* Allocate a bunch of sg elements */
-	for (i = 0; i < num_elem; i++) {
-		page = alloc_pages(GFP_KERNEL, order);
-		if (!page) {
-			ipr_trace;
-
-			/* Free up what we already allocated */
-			for (j = i - 1; j >= 0; j--)
-				__free_pages(sg_page(&scatterlist[j]), order);
-			kfree(sglist);
-			return NULL;
-		}
-
-		sg_set_page(&scatterlist[i], page, 0, 0);
+	sglist->scatterlist = sgl_alloc_order(buf_len, order, false, GFP_KERNEL,
+					      &sglist->num_sg);
+	if (!sglist->scatterlist) {
+		kfree(sglist);
+		return NULL;
 	}

 	return sglist;
@@ -3882,11 +3853,7 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
 **/
 static void ipr_free_ucode_buffer(struct ipr_sglist *sglist)
 {
-	int i;
-
-	for (i = 0; i < sglist->num_sg; i++)
-		__free_pages(sg_page(&sglist->scatterlist[i]), sglist->order);
-
+	sgl_free_order(sglist->scatterlist, sglist->order);
 	kfree(sglist);
 }

diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
index c7f0e9e3cd7d..93570734cbfb 100644
--- a/drivers/scsi/ipr.h
+++ b/drivers/scsi/ipr.h
@@ -1454,7 +1454,7 @@ struct ipr_sglist {
 	u32 num_sg;
 	u32 num_dma_sg;
 	u32 buffer_len;
-	struct scatterlist scatterlist[1];
+	struct scatterlist *scatterlist;
 };

 enum ipr_sdt_state {
-- 
2.14.3
[PATCH v3 4/8] nvmet/rdma: Use sgl_alloc() and sgl_free()
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Hannes Reinecke
Cc: Keith Busch
Cc: Christoph Hellwig
Cc: Sagi Grimberg
---
 drivers/nvme/target/Kconfig |  1 +
 drivers/nvme/target/rdma.c  | 63 +++----------------------------------
 2 files changed, 5 insertions(+), 59 deletions(-)

diff --git a/drivers/nvme/target/Kconfig b/drivers/nvme/target/Kconfig
index 4d9715630e21..5f4f8b16685f 100644
--- a/drivers/nvme/target/Kconfig
+++ b/drivers/nvme/target/Kconfig
@@ -29,6 +29,7 @@ config NVME_TARGET_RDMA
 	tristate "NVMe over Fabrics RDMA target support"
 	depends on INFINIBAND
 	depends on NVME_TARGET
+	select SGL_ALLOC
 	help
 	  This enables the NVMe RDMA target support, which allows exporting NVMe
 	  devices over RDMA.
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 76d2bb793afe..363d44c10b68 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -185,59 +185,6 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
 	spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
 }

-static void nvmet_rdma_free_sgl(struct scatterlist *sgl, unsigned int nents)
-{
-	struct scatterlist *sg;
-	int count;
-
-	if (!sgl || !nents)
-		return;
-
-	for_each_sg(sgl, sg, nents, count)
-		__free_page(sg_page(sg));
-	kfree(sgl);
-}
-
-static int nvmet_rdma_alloc_sgl(struct scatterlist **sgl, unsigned int *nents,
-		u32 length)
-{
-	struct scatterlist *sg;
-	struct page *page;
-	unsigned int nent;
-	int i = 0;
-
-	nent = DIV_ROUND_UP(length, PAGE_SIZE);
-	sg = kmalloc_array(nent, sizeof(struct scatterlist), GFP_KERNEL);
-	if (!sg)
-		goto out;
-
-	sg_init_table(sg, nent);
-
-	while (length) {
-		u32 page_len = min_t(u32, length, PAGE_SIZE);
-
-		page = alloc_page(GFP_KERNEL);
-		if (!page)
-			goto out_free_pages;
-
-		sg_set_page(&sg[i], page, page_len, 0);
-		length -= page_len;
-		i++;
-	}
-	*sgl = sg;
-	*nents = nent;
-	return 0;
-
-out_free_pages:
-	while (i > 0) {
-		i--;
-		__free_page(sg_page(&sg[i]));
-	}
-	kfree(sg);
-out:
-	return NVME_SC_INTERNAL;
-}
-
 static int nvmet_rdma_alloc_cmd(struct nvmet_rdma_device *ndev,
 			struct nvmet_rdma_cmd *c, bool admin)
 {
@@ -484,7 +431,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	}

 	if (rsp->req.sg != &rsp->cmd->inline_sg)
-		nvmet_rdma_free_sgl(rsp->req.sg, rsp->req.sg_cnt);
+		sgl_free(rsp->req.sg);

 	if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
 		nvmet_rdma_process_wr_wait_list(queue);
@@ -620,16 +567,14 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	u32 len = get_unaligned_le24(sgl->length);
 	u32 key = get_unaligned_le32(sgl->key);
 	int ret;
-	u16 status;

 	/* no data command? */
 	if (!len)
 		return 0;

-	status = nvmet_rdma_alloc_sgl(&rsp->req.sg, &rsp->req.sg_cnt,
-			len);
-	if (status)
-		return status;
+	rsp->req.sg = sgl_alloc(len, GFP_KERNEL, &rsp->req.sg_cnt);
+	if (!rsp->req.sg)
+		return NVME_SC_INTERNAL;

 	ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
 			rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
-- 
2.14.3
[PATCH v3 2/8] crypto: scompress - use sgl_alloc() and sgl_free()
Use the sgl_alloc() and sgl_free() functions instead of open coding
these functions.

Signed-off-by: Bart Van Assche
Cc: Ard Biesheuvel
Cc: Herbert Xu
---
 crypto/Kconfig     |  1 +
 crypto/scompress.c | 51 ++-------------------------------------------
 2 files changed, 3 insertions(+), 49 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 0a121f9ddf8e..a0667dd284ff 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -105,6 +105,7 @@ config CRYPTO_KPP
 config CRYPTO_ACOMP2
 	tristate
 	select CRYPTO_ALGAPI2
+	select SGL_ALLOC

 config CRYPTO_ACOMP
 	tristate
diff --git a/crypto/scompress.c b/crypto/scompress.c
index 2075e2c4e7df..968bbcf65c94 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -140,53 +140,6 @@ static int crypto_scomp_init_tfm(struct crypto_tfm *tfm)
 	return ret;
 }

-static void crypto_scomp_sg_free(struct scatterlist *sgl)
-{
-	int i, n;
-	struct page *page;
-
-	if (!sgl)
-		return;
-
-	n = sg_nents(sgl);
-	for_each_sg(sgl, sgl, n, i) {
-		page = sg_page(sgl);
-		if (page)
-			__free_page(page);
-	}
-
-	kfree(sgl);
-}
-
-static struct scatterlist *crypto_scomp_sg_alloc(size_t size, gfp_t gfp)
-{
-	struct scatterlist *sgl;
-	struct page *page;
-	int i, n;
-
-	n = ((size - 1) >> PAGE_SHIFT) + 1;
-
-	sgl = kmalloc_array(n, sizeof(struct scatterlist), gfp);
-	if (!sgl)
-		return NULL;
-
-	sg_init_table(sgl, n);
-
-	for (i = 0; i < n; i++) {
-		page = alloc_page(gfp);
-		if (!page)
-			goto err;
-		sg_set_page(sgl + i, page, PAGE_SIZE, 0);
-	}
-
-	return sgl;
-
-err:
-	sg_mark_end(sgl + i);
-	crypto_scomp_sg_free(sgl);
-	return NULL;
-}
-
 static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 {
 	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
@@ -220,7 +173,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 				     scratch_dst, &req->dlen, *ctx);
 	if (!ret) {
 		if (!req->dst) {
-			req->dst = crypto_scomp_sg_alloc(req->dlen, GFP_ATOMIC);
+			req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
 			if (!req->dst)
 				goto out;
 		}
@@ -274,7 +227,7 @@ int crypto_init_scomp_ops_async(struct crypto_tfm *tfm)
 	crt->compress = scomp_acomp_compress;
 	crt->decompress = scomp_acomp_decompress;
-	crt->dst_free = crypto_scomp_sg_free;
+	crt->dst_free = sgl_free;
 	crt->reqsize = sizeof(void *);

 	return 0;
-- 
2.14.3
[PATCH v3 5/8] target: Use sgl_alloc_order() and sgl_free()
Use the sgl_alloc_order() and sgl_free() functions instead of open
coding these functions.

Signed-off-by: Bart Van Assche
Cc: Nicholas A. Bellinger
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Sagi Grimberg
---
 drivers/target/Kconfig                 |  1 +
 drivers/target/target_core_transport.c | 46 +++-----------------------
 2 files changed, 5 insertions(+), 42 deletions(-)

diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
index e2bc99980f75..4c44d7bed01a 100644
--- a/drivers/target/Kconfig
+++ b/drivers/target/Kconfig
@@ -5,6 +5,7 @@ menuconfig TARGET_CORE
 	select CONFIGFS_FS
 	select CRC_T10DIF
 	select BLK_SCSI_REQUEST # only for scsi_command_size_tbl..
+	select SGL_ALLOC
 	default n
 	help
 	  Say Y or M here to enable the TCM Storage Engine and ConfigFS enabled
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 836d552b0385..9bbd08be9d60 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2293,13 +2293,7 @@ static void target_complete_ok_work(struct work_struct *work)

 void target_free_sgl(struct scatterlist *sgl, int nents)
 {
-	struct scatterlist *sg;
-	int count;
-
-	for_each_sg(sgl, sg, nents, count)
-		__free_page(sg_page(sg));
-
-	kfree(sgl);
+	sgl_free(sgl);
 }
 EXPORT_SYMBOL(target_free_sgl);

@@ -2423,42 +2417,10 @@ int
 target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
 		 bool zero_page, bool chainable)
 {
-	struct scatterlist *sg;
-	struct page *page;
-	gfp_t zero_flag = (zero_page) ? __GFP_ZERO : 0;
-	unsigned int nalloc, nent;
-	int i = 0;
-
-	nalloc = nent = DIV_ROUND_UP(length, PAGE_SIZE);
-	if (chainable)
-		nalloc++;
-	sg = kmalloc_array(nalloc, sizeof(struct scatterlist), GFP_KERNEL);
-	if (!sg)
-		return -ENOMEM;
+	gfp_t gfp = GFP_KERNEL | (zero_page ? __GFP_ZERO : 0);

-	sg_init_table(sg, nalloc);
-
-	while (length) {
-		u32 page_len = min_t(u32, length, PAGE_SIZE);
-		page = alloc_page(GFP_KERNEL | zero_flag);
-		if (!page)
-			goto out;
-
-		sg_set_page(&sg[i], page, page_len, 0);
-		length -= page_len;
-		i++;
-	}
-	*sgl = sg;
-	*nents = nent;
-	return 0;
-
-out:
-	while (i > 0) {
-		i--;
-		__free_page(sg_page(&sg[i]));
-	}
-	kfree(sg);
-	return -ENOMEM;
+	*sgl = sgl_alloc_order(length, 0, chainable, gfp, nents);
+	return *sgl ? 0 : -ENOMEM;
 }
 EXPORT_SYMBOL(target_alloc_sgl);
-- 
2.14.3
Re: [PATCH v2] blk-mq: Make blk_mq_get_request() error path less confusing
On 11/03/2017 12:31 PM, Bart Van Assche wrote:
> On Mon, 2017-10-16 at 16:32 -0700, Bart Van Assche wrote:
>> blk_mq_get_tag() can modify data->ctx. This means that in the
>> error path of blk_mq_get_request() data->ctx should be passed to
>> blk_mq_put_ctx() instead of local_ctx. Note: since blk_mq_put_ctx()
>> ignores its argument, this patch does not change any functionality.
>
> Hello Jens,
>
> Would it be possible to share your opinion about this patch?

Yeah looks fine, I queued it up.

-- 
Jens Axboe
Re: [PATCH v2] blk-mq: Make blk_mq_get_request() error path less confusing
On Mon, 2017-10-16 at 16:32 -0700, Bart Van Assche wrote:
> blk_mq_get_tag() can modify data->ctx. This means that in the
> error path of blk_mq_get_request() data->ctx should be passed to
> blk_mq_put_ctx() instead of local_ctx. Note: since blk_mq_put_ctx()
> ignores its argument, this patch does not change any functionality.

Hello Jens,

Would it be possible to share your opinion about this patch?

Thanks,

Bart.
[PATCH RESEND] badblocks: fix wrong return value in badblocks_set if badblocks are disabled
MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
badblocks are disabled; otherwise, rdev_set_badblocks() will record
superblock changes and return success in that case, and md will fail to
report an IO error which it should.

This bug has existed since badblocks were introduced in commit
9e0e252a048b ("badblocks: Add core badblock management code").

Signed-off-by: Liu Bo
Acked-by: Guoqing Jiang
---
 block/badblocks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/badblocks.c b/block/badblocks.c
index 43c7116..91f7bcf 100644
--- a/block/badblocks.c
+++ b/block/badblocks.c
@@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int sectors,

 	if (bb->shift < 0)
 		/* badblocks are disabled */
-		return 0;
+		return 1;

 	if (bb->shift) {
 		/* round the start down, and the end up */
-- 
2.9.4
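To see why the return value matters, the following is a hypothetical Python sketch of the contract between badblocks_set() and md's rdev_set_badblocks() (heavily simplified: only the disabled-table branch is modeled, and the superblock update is reduced to a return value):

```python
def badblocks_set(shift, fixed):
    """Model of the disabled-table branch of badblocks_set().
    Return convention: 0 = bad block recorded, 1 = failure."""
    if shift < 0:                 # badblocks are disabled
        return 1 if fixed else 0  # old code wrongly returned 0 here
    return 0                      # (real recording logic elided)

def rdev_set_badblocks(shift, fixed):
    """md treats a zero return as 'bad block acknowledged': it then
    records superblock changes and completes the IO successfully."""
    return 1 if badblocks_set(shift, fixed) == 0 else 0

# Old behavior: a disabled table still "acknowledges" the bad block,
# so the IO error is swallowed.
assert rdev_set_badblocks(shift=-1, fixed=False) == 1
# Fixed behavior: md sees the failure and reports the IO error.
assert rdev_set_badblocks(shift=-1, fixed=True) == 0
```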
Re: [PATCH v4] blk-mq: fix nr_requests wrong value when modify it from sysfs
On 09/22/2017 09:36 AM, weiping zhang wrote:
> If blk-mq uses the "none" io scheduler, nr_requests gets a wrong value
> when a number > tag_set->queue_depth is written. blk_mq_tag_update_depth
> will get the smaller one, min(nr, set->queue_depth), and then
> q->nr_requests gets a wrong value.
>
> Reproduce:
>
> echo none > /sys/block/nvme0n1/queue/scheduler
> echo 100 > /sys/block/nvme0n1/queue/nr_requests
> cat /sys/block/nvme0n1/queue/nr_requests
> 100

Sorry for the delay - looks good to me, and also tested that it does fix
the issue. Thanks, applied for 4.15.

-- 
Jens Axboe
Re: [PATCH v4] blk-mq: fix nr_requests wrong value when modify it from sysfs
On Fri, Sep 22, 2017 at 11:36:28PM +0800, weiping zhang wrote:
> If blk-mq uses the "none" io scheduler, nr_requests gets a wrong value
> when a number > tag_set->queue_depth is written. blk_mq_tag_update_depth
> will get the smaller one, min(nr, set->queue_depth), and then
> q->nr_requests gets a wrong value.
>
> Reproduce:
>
> echo none > /sys/block/nvme0n1/queue/scheduler
> echo 100 > /sys/block/nvme0n1/queue/nr_requests
> cat /sys/block/nvme0n1/queue/nr_requests
> 100
>
> Signed-off-by: weiping zhang
> ---
>
> Changes since v4:
> * fix typo in commit message (queue/ioscheduler => queue/scheduler)
>
> Changes since v3:
> * remove compare nr with tags->qdepth, pass nr to blk_mq_tag_update_depth
>   directly
> * remove return EINVAL when user modifies nr_requests to less than
>   BLKDEV_MIN_RQ
>
> Changes since v2:
> * add return EINVAL when user modifies nr_requests to less than
>   BLKDEV_MIN_RQ
> * remove pr_warn, and return EINVAL, if input number is too large
>
>  block/blk-mq.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 98a1860..491e336 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2642,8 +2642,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  	 * queue depth. This is similar to what the old code would do.
>  	 */
>  	if (!hctx->sched_tags) {
> -		ret = blk_mq_tag_update_depth(hctx, &hctx->tags,
> -					      min(nr, set->queue_depth),
> +		ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>  					      false);
>  	} else {
>  		ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> --
> 2.9.4

Hello Jens,

ping
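A quick userspace model makes the mismatch visible. This is a simplification of the sysfs update path (function names mirror the kernel ones, but the depth bookkeeping is reduced to plain integers):

```python
def tag_update_depth(cur_depth, nr, can_grow=False):
    """Model of blk_mq_tag_update_depth(): without a scheduler
    ("none"), the tag set cannot grow past its current depth."""
    if nr > cur_depth and not can_grow:
        return None               # -EINVAL in the kernel
    return nr

def update_nr_requests(set_depth, nr, clamp):
    """Model of blk_mq_update_nr_requests(); returns the pair
    (reported q->nr_requests, effective tag depth)."""
    arg = min(nr, set_depth) if clamp else nr
    new_depth = tag_update_depth(set_depth, arg)
    if new_depth is None:
        return set_depth, set_depth   # error: nr_requests unchanged
    return nr, new_depth              # success: q->nr_requests = nr

# Old code: writing 100 with queue_depth 64 "succeeds", but the
# reported nr_requests (100) no longer matches the real depth (64).
assert update_nr_requests(64, 100, clamp=True) == (100, 64)
# Patched code: the oversized write fails and nr_requests stays 64.
assert update_nr_requests(64, 100, clamp=False) == (64, 64)
```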
Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled
On Fri, Nov 03, 2017 at 10:13:38AM -0600, Liu Bo wrote: > Hi Shaohua, > > Given it's related to md, can you please take this thru your tree? Yes, the patch makes sense. Can you resend the patch to me? I can't find it in my inbox Thanks, Shaohua > Thanks, > > -liubo > > On Wed, Sep 27, 2017 at 04:13:17PM -0600, Liu Bo wrote: > > MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if > > badblocks are disabled, otherwise, rdev_set_badblocks() will record > > superblock changes and return success in that case and md will fail to > > report an IO error which it should. > > > > This bug has existed since badblocks were introduced in commit > > 9e0e252a048b ("badblocks: Add core badblock management code"). > > > > Signed-off-by: Liu Bo> > --- > > block/badblocks.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/block/badblocks.c b/block/badblocks.c > > index 43c7116..91f7bcf 100644 > > --- a/block/badblocks.c > > +++ b/block/badblocks.c > > @@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int > > sectors, > > > > if (bb->shift < 0) > > /* badblocks are disabled */ > > - return 0; > > + return 1; > > > > if (bb->shift) { > > /* round the start down, and the end up */ > > -- > > 2.9.4 > >
Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled
Hi Shaohua, Given it's related to md, can you please take this thru your tree? Thanks, -liubo On Wed, Sep 27, 2017 at 04:13:17PM -0600, Liu Bo wrote: > MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if > badblocks are disabled, otherwise, rdev_set_badblocks() will record > superblock changes and return success in that case and md will fail to > report an IO error which it should. > > This bug has existed since badblocks were introduced in commit > 9e0e252a048b ("badblocks: Add core badblock management code"). > > Signed-off-by: Liu Bo> --- > block/badblocks.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/badblocks.c b/block/badblocks.c > index 43c7116..91f7bcf 100644 > --- a/block/badblocks.c > +++ b/block/badblocks.c > @@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int > sectors, > > if (bb->shift < 0) > /* badblocks are disabled */ > - return 0; > + return 1; > > if (bb->shift) { > /* round the start down, and the end up */ > -- > 2.9.4 >
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, Nov 03, 2017 at 12:13:09PM -0400, Laurence Oberman wrote: > Hi > I had it working some time back. I am off today to take my son to the > doctor. > I will get Bart's test working again this weekend. Hello Laurence and Bart, Just found srp-test starts to work now with v4.14-rc4 kernel, and the IO hang issue has been reproduced one time. BTW, the connection failure happened on for-next of block tree. Will investigate the issue this weekend. Thanks, Ming > > On Nov 3, 2017 11:51 AM, "Bart Van Assche" wrote: > > > On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote: > > > Forget to mention, there is failure when running 'make' under srp-test > > > because shellcheck package is missed in RHEL7. Can that be the issue > > > of test failure? If yes, could you provide a special version of srp-test > > > which doesn't depend on shellcheck? > > > > The srp-test software does not depend on shellcheck. The only purpose of > > the "shellcheck" target in the srp-test Makefile is to verify the syntax > > of the shell scripts. The tests run fine even if shellcheck is missing. > > > > Bart. -- Ming
Re: block layer patches for nvme multipath support
On 11/02/2017 12:29 PM, Christoph Hellwig wrote: > Hi Jens, > > these patches add the block layer helpers / tweaks for NVMe multipath > support. Can you review them for inclusion? > > There have been no functional changes to the versions posted with > previous nvme multipath patchset. I've queued this up. I really hate the direct_make_request(), but I also don't have a good suggestion on how to do this any better right now. -- Jens Axboe
Re: [GIT PULL] nvme updates for Linux 4.15
On 11/03/2017 06:17 AM, Christoph Hellwig wrote: > Hi Jens, > > below are the currently queued nvme updates for Linux 4.15. There are > a few more things that could make it for this merge window, but I'd > like to get things into linux-next, especially for the unlikely case > that Linus decides to cut -rc8. > > Highlights: > - support for SGLs in the PCIe driver (Chaitanya Kulkarni) > - disable I/O schedulers for the admin queue (Israel Rukshin) > - various Fibre Channel fixes and enhancements (James Smart) > - various refactoring for better code sharing between transports > (Sagi Grimberg and me) > > as well as lots of little bits from various contributors. Pulled, thanks. -- Jens Axboe
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, 2017-11-03 at 23:47 +0800, Ming Lei wrote: > Forget to mention, there is failure when running 'make' under srp-test > because shellcheck package is missed in RHEL7. Can that be the issue > of test failure? If yes, could you provide a special version of srp-test > which doesn't depend on shellcheck? The srp-test software does not depend on shellcheck. The only purpose of the "shellcheck" target in the srp-test Makefile is to verify the syntax of the shell scripts. The tests run fine even if shellcheck is missing. Bart.
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, 2017-11-03 at 23:18 +0800, Ming Lei wrote: > BTW, Laurence found there is kernel crash in his IB/SRP test when running > for-next branch of block tree, so we just test v4.14-rc4 w/wo my blk-mq > patches. One fix for a *sporadic* initiator crash has been queued for the v4.15 merge window. If you hit another crash, please let me know. See also https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git/commit/?h=k.o/for-next=8a0d18c62121d3c554a83eb96e2752861d84d937. Bart.
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, Nov 03, 2017 at 03:23:14PM +, Bart Van Assche wrote: > On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote: > > On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote: > > > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote: > > > > [root@ibclient srp-test]# ./run_tests > > > > modprobe: FATAL: Module target_core_mod is in use. > > > > > > LIO must be unloaded before srp-test software is started. > > > > Yeah, I can make this test kick off after running > > 'modprobe -fr ib_isert' first, but looks all tests failed without > > any hang, could you check if that is the expected result? > > > > https://pastebin.com/VXe66Jpg > > I see plenty of "... >/sys/.../add_target failed: Connection timed out" > messages. The srp-test scripts use the loopback functionality of IB ports > so one or more IB ports must be active. Have you already checked the IB > ports state, e.g. by running ibv_devinfo? [root@ibclient ~]# ibv_devinfo hca_id: mlx5_1 transport: InfiniBand (0) fw_ver: 12.20.1010 node_guid: 7cfe:9003:0072:6ed3 sys_image_guid: 7cfe:9003:0072:6ed2 vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu:4096 (5) active_mtu: 4096 (5) sm_lid: 1 port_lid: 1 port_lmc: 0x00 link_layer: InfiniBand hca_id: mlx5_0 transport: InfiniBand (0) fw_ver: 12.20.1010 node_guid: 7cfe:9003:0072:6ed2 sys_image_guid: 7cfe:9003:0072:6ed2 vendor_id: 0x02c9 vendor_part_id: 4115 hw_ver: 0x0 board_id: MT_2190110032 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu:4096 (5) active_mtu: 4096 (5) sm_lid: 4 port_lid: 4 port_lmc: 0x00 link_layer: InfiniBand Forget to mention, there is failure when running 'make' under srp-test because shellcheck package is missed in RHEL7. Can that be the issue of test failure? If yes, could you provide a special version of srp-test which doesn't depend on shellcheck? > > > Also could you provide some output from debugfs when this hang happens? 
> > such as: > > > > dispatch/busy/.. under hctxN > > This is the output that appears on my test setup if the two patches from > the patch series "block: remove unnecessary RESTART" are applied: > > # ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60 > > Unloaded the ib_srpt kernel module > Zero-initializing /dev/ram0 ... done > Zero-initializing /dev/ram1 ... done > Zero-initializing /dev/disk/by-id/scsi-04650 ... done > brw-rw 1 root disk 1, 0 Nov 3 08:10 /dev/ram0 > /dev/ram0 > brw-rw 1 root disk 1, 1 Nov 3 08:10 /dev/ram1 > /dev/ram1 > lrwxrwxrwx 1 root root 9 Nov 3 08:11 /dev/disk/by-id/scsi-04650 > -> ../../sdi > /dev/sdi > Configured SRP target driver > Running test /home/bart/software/infiniband/srp-test/tests/01 ... > id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50, > >/sys/class/infiniband_srp/ > srp-mlx4_0-2/add_target failed: Invalid argument > id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50, > >/sys/class/infiniband_srp/ > srp-mlx4_0-2/add_target failed: Invalid argument > count_luns(): 3 <> 3 > Error: unloading kernel module ib_srp failed > Test /home/bart/software/infiniband/srp-test/tests/01 failed > # cat /sys/module/scsi_mod/parameters/use_blk_mq > Y > # find /sys/kernel/debug/block -name busy | xargs cat Can you check 'dispatch' file, hctx state, tags and sched_tags? And see where is the IO not dispatched? > # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg > [ 709.321750] sysrq: SysRq : Show Blocked State > [ 709.327906] taskPC stack pid father ... > [ 709.456602] kworker/u66:6 D0 341 2 0x8000 > [ 709.463371] Workqueue: events_unbound async_run_entry_fn > [
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, 2017-11-03 at 23:18 +0800, Ming Lei wrote: > Subject: [PATCH] SCSI_MQ: fix IO hang in case of queue busy > > We have to insert the rq back before checking .device_busy, > otherwise when IO completes just after the check and before > this req is added to hctx->dispatch, this queue may never get > chance to be run, then this IO may hang forever. > > This patch introduces BLK_STS_RESOURCE_OK for handling this > issue. > > Signed-off-by: Ming Lei > --- > block/blk-mq.c| 17 + > drivers/scsi/scsi_lib.c | 8 > include/linux/blk-mq.h| 1 + > include/linux/blk_types.h | 1 + > 4 files changed, 27 insertions(+) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index e4d2490f4e7e..e1e03576edca 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -660,6 +660,16 @@ static void __blk_mq_requeue_request(struct request *rq) > } > } > > +void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request > *rq) > +{ > + __blk_mq_requeue_request(rq); > + > + spin_lock(&hctx->lock); > + list_add_tail(&rq->queuelist, &hctx->dispatch); > + spin_unlock(&hctx->lock); > +} > +EXPORT_SYMBOL(blk_mq_reinsert_request_hctx); > + > void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list) > { > __blk_mq_requeue_request(rq); > @@ -1165,6 +1175,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, > struct list_head *list, > list_add(&rq->queuelist, list); > __blk_mq_requeue_request(rq); > break; > + } else if (ret == BLK_STS_RESOURCE_OK) { > + /* > + * BLK_STS_RESOURCE_OK means driver handled this > + * STS_RESOURCE already, we just need to stop dispatch. 
> + */ > + break; > } > > fail_rq: > @@ -1656,6 +1672,7 @@ static void __blk_mq_try_issue_directly(struct > blk_mq_hw_ctx *hctx, > ret = q->mq_ops->queue_rq(hctx, &bd); > switch (ret) { > case BLK_STS_OK: > + case BLK_STS_RESOURCE_OK: > *cookie = new_cookie; > return; > case BLK_STS_RESOURCE: > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 7f218ef61900..0165c1caed82 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -2030,9 +2030,17 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx > *hctx, > case BLK_STS_OK: > break; > case BLK_STS_RESOURCE: > + /* > + * We have to insert the rq back before checking .device_busy, > + * otherwise when IO completes just after the check and before > + * this req is added to hctx->dispatch, this queue may never get > + * chance to be run, then this IO may hang forever. > + */ > + blk_mq_reinsert_request_hctx(hctx, req); > if (atomic_read(&sdev->device_busy) == 0 && > !scsi_device_blocked(sdev)) > blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > + ret = BLK_STS_RESOURCE_OK; > break; > default: > /* > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h > index e5e6becd57d3..4740f643d8c5 100644 > --- a/include/linux/blk-mq.h > +++ b/include/linux/blk-mq.h > @@ -244,6 +244,7 @@ void blk_mq_start_request(struct request *rq); > void blk_mq_end_request(struct request *rq, blk_status_t error); > void __blk_mq_end_request(struct request *rq, blk_status_t error); > > +void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request > *rq); > void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list); > void blk_mq_add_to_requeue_list(struct request *rq, bool at_head, > bool kick_requeue_list); > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > index 3385c89f402e..b630cc026a93 100644 > --- a/include/linux/blk_types.h > +++ b/include/linux/blk_types.h > @@ -32,6 +32,7 @@ typedef u8 __bitwise blk_status_t; > #define BLK_STS_PROTECTION 
((__force blk_status_t)8) > #define BLK_STS_RESOURCE ((__force blk_status_t)9) > #define BLK_STS_IOERR((__force blk_status_t)10) > +#define BLK_STS_RESOURCE_OK ((__force blk_status_t)11) > > /* hack for device mapper, don't use elsewhere: */ > #define BLK_STS_DM_REQUEUE((__force blk_status_t)11) Running test one with this patch applied on top of Jens' for-next branch yields the same result as without this patch: [ ... ] Error: unloading kernel module ib_srp failed Test /home/bart/software/infiniband/srp-test/tests/01 failed # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg [ 325.167295] sysrq: SysRq : Show Blocked State [ 325.173612] taskPC stack pid father [ 325.181903] kworker/10:1D0 165 2 0x8000 [ 325.189396]
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote: > On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote: > > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote: > > > [root@ibclient srp-test]# ./run_tests > > > modprobe: FATAL: Module target_core_mod is in use. > > > > LIO must be unloaded before srp-test software is started. > > Yeah, I can make this test kick off after running > 'modprobe -fr ib_isert' first, but looks all tests failed without > any hang, could you check if that is the expected result? > > https://pastebin.com/VXe66Jpg I see plenty of "... >/sys/.../add_target failed: Connection timed out" messages. The srp-test scripts use the loopback functionality of IB ports so one or more IB ports must be active. Have you already checked the IB ports state, e.g. by running ibv_devinfo? > Also could you provide some output from debugfs when this hang happens? > such as: > > dispatch/busy/.. under hctxN This is the output that appears on my test setup if the two patches from the patch series "block: remove unnecessary RESTART" are applied: # ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60 Unloaded the ib_srpt kernel module Zero-initializing /dev/ram0 ... done Zero-initializing /dev/ram1 ... done Zero-initializing /dev/disk/by-id/scsi-04650 ... done brw-rw 1 root disk 1, 0 Nov 3 08:10 /dev/ram0 /dev/ram0 brw-rw 1 root disk 1, 1 Nov 3 08:10 /dev/ram1 /dev/ram1 lrwxrwxrwx 1 root root 9 Nov 3 08:11 /dev/disk/by-id/scsi-04650 -> ../../sdi /dev/sdi Configured SRP target driver Running test /home/bart/software/infiniband/srp-test/tests/01 ... 
id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/ srp-mlx4_0-2/add_target failed: Invalid argument id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe82c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50, >/sys/class/infiniband_srp/ srp-mlx4_0-2/add_target failed: Invalid argument count_luns(): 3 <> 3 Error: unloading kernel module ib_srp failed Test /home/bart/software/infiniband/srp-test/tests/01 failed # cat /sys/module/scsi_mod/parameters/use_blk_mq Y # find /sys/kernel/debug/block -name busy | xargs cat # dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg [ 709.321750] sysrq: SysRq : Show Blocked State [ 709.327906] taskPC stack pid father [ 709.335910] kworker/4:1 D054 2 0x8000 [ 709.342887] Workqueue: srp_remove srp_remove_work [ib_srp] [ 709.349806] Call Trace: [ 709.353365] __schedule+0x2fa/0xbb0 [ 709.357865] schedule+0x36/0x90 [ 709.361985] async_synchronize_cookie_domain+0x88/0x130 [ 709.368422] ? finish_wait+0x90/0x90 [ 709.373033] async_synchronize_full_domain+0x18/0x20 [ 709.379184] sd_remove+0x4d/0xc0 [sd_mod] [ 709.384282] device_release_driver_internal+0x160/0x210 [ 709.390752] device_release_driver+0x12/0x20 [ 709.396130] bus_remove_device+0x100/0x180 [ 709.401332] device_del+0x1d8/0x340 [ 709.405823] __scsi_remove_device+0xfc/0x130 [ 709.411221] scsi_forget_host+0x25/0x70 [ 709.416108] scsi_remove_host+0x79/0x120 [ 709.421114] srp_remove_work+0x90/0x1d0 [ib_srp] [ 709.426867] process_one_work+0x20a/0x660 [ 709.431971] worker_thread+0x3d/0x3b0 [ 709.436655] kthread+0x13a/0x150 [ 709.440872] ? process_one_work+0x660/0x660 [ 709.446133] ? kthread_create_on_node+0x40/0x40 [ 709.451825] ret_from_fork+0x27/0x40 [ 709.456602] kworker/u66:6 D0 341 2 0x8000 [ 709.463371] Workqueue: events_unbound async_run_entry_fn [ 709.469919] Call Trace: [ 709.473325] __schedule+0x2fa/0xbb0 [ 709.477850] ? 
trace_hardirqs_on_caller+0xfb/0x190 [ 709.483847] schedule+0x36/0x90 [ 709.487982] schedule_timeout+0x22c/0x570 [ 709.493115] ? call_timer_fn+0x330/0x330 [ 709.498133] ? wait_for_completion_io_timeout+0xf7/0x180 [ 709.504732] io_schedule_timeout+0x1e/0x50 [ 709.509938] ? io_schedule_timeout+0x1e/0x50 [ 709.515413] wait_for_completion_io_timeout+0x11f/0x180 [ 709.521920] ? wake_up_q+0x80/0x80 [ 709.526376] blk_execute_rq+0x86/0xc0 [ 709.531135] scsi_execute+0xdb/0x1f0 [ 709.536336] sd_revalidate_disk+0xed/0x1c70 [sd_mod] [ 709.543281] sd_probe_async+0xc3/0x1d0 [sd_mod] [ 709.549524] async_run_entry_fn+0x38/0x160 [ 709.03] process_one_work+0x20a/0x660 [ 709.561299] worker_thread+0x3d/0x3b0 [ 709.566688] kthread+0x13a/0x150 [ 709.571613] ? process_one_work+0x660/0x660 [ 709.577561] ? kthread_create_on_node+0x40/0x40 [ 709.583910] ret_from_fork+0x27/0x40 [ 709.589174] kworker/5:2 D0 386 2 0x8000 [ 709.596616] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua] [ 709.604130] Call Trace: [ 709.608096] __schedule+0x2fa/0xbb0 [ 709.613265] ? trace_hardirqs_on_caller+0xfb/0x190 [ 709.619879] schedule+0x36/0x90 [ 709.624684]
Re: [PATCH V2 0/2] block: remove unnecessary RESTART
On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote: > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote: > > [root@ibclient srp-test]# ./run_tests > > modprobe: FATAL: Module target_core_mod is in use. > > LIO must be unloaded before srp-test software is started. Hi Bart, Even with help of Laurence, we still can't setup your srp-test in our test environment today. But we have run Laurence's usual 3 tests on IB/SRP with/without all my following patches against V4.14-rc4, looks everything is fine, and no I/O hang is observed. 0001-blk-mq-sched-dispatch-from-scheduler-IFF-progress-is.patch 0002-blk-mq-sched-move-actual-dispatching-into-one-helper.patch 0003-sbitmap-introduce-__sbitmap_for_each_set.patch 0004-block-kyber-check-if-there-are-requests-in-ctx-in-ky.patch 0005-blk-mq-introduce-.get_budget-and-.put_budget-in-blk_.patch 0006-blk-mq-sched-improve-dispatching-from-sw-queue.patch 0007-scsi-allow-passing-in-null-rq-to-scsi_prep_state_che.patch 0008-scsi-implement-.get_budget-and-.put_budget-for-blk-m.patch 0009-blk-mq-don-t-handle-TAG_SHARED-in-restart.patch 0010-blk-mq-don-t-restart-queue-when-.get_budget-returns-.patch BTW, Laurence found there is kernel crash in his IB/SRP test when running for-next branch of block tree, so we just test v4.14-rc4 w/wo my blk-mq patches. And I looked at the SCSI's queue_rq code for a while, and only found one issue which may cause IO hang, and the following patch may address this issue, but not sure if it is same with your issue. Could you apply this patch and see if your issue can be fixed? BTW, it should be helpful to check the blk-mq debugfs related files when your I/O hang happens, could you provide that info? 
-- >From edcb243d9a6f3446bd9a9f95c00bed7616dd7368 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 3 Nov 2017 12:11:59 +0800 Subject: [PATCH] SCSI_MQ: fix IO hang in case of queue busy We have to insert the rq back before checking .device_busy, otherwise when IO completes just after the check and before this req is added to hctx->dispatch, this queue may never get chance to be run, then this IO may hang forever. This patch introduces BLK_STS_RESOURCE_OK for handling this issue. Signed-off-by: Ming Lei --- block/blk-mq.c| 17 + drivers/scsi/scsi_lib.c | 8 include/linux/blk-mq.h| 1 + include/linux/blk_types.h | 1 + 4 files changed, 27 insertions(+) diff --git a/block/blk-mq.c b/block/blk-mq.c index e4d2490f4e7e..e1e03576edca 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -660,6 +660,16 @@ static void __blk_mq_requeue_request(struct request *rq) } } +void blk_mq_reinsert_request_hctx(struct blk_mq_hw_ctx *hctx, struct request *rq) +{ + __blk_mq_requeue_request(rq); + + spin_lock(&hctx->lock); + list_add_tail(&rq->queuelist, &hctx->dispatch); + spin_unlock(&hctx->lock); +} +EXPORT_SYMBOL(blk_mq_reinsert_request_hctx); + void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list) { __blk_mq_requeue_request(rq); @@ -1165,6 +1175,12 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list, list_add(&rq->queuelist, list); __blk_mq_requeue_request(rq); break; + } else if (ret == BLK_STS_RESOURCE_OK) { + /* +* BLK_STS_RESOURCE_OK means driver handled this +* STS_RESOURCE already, we just need to stop dispatch. 
+*/ + break; } fail_rq: @@ -1656,6 +1672,7 @@ static void __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx, ret = q->mq_ops->queue_rq(hctx, &bd); switch (ret) { case BLK_STS_OK: + case BLK_STS_RESOURCE_OK: *cookie = new_cookie; return; case BLK_STS_RESOURCE: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 7f218ef61900..0165c1caed82 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2030,9 +2030,17 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, case BLK_STS_OK: break; case BLK_STS_RESOURCE: + /* +* We have to insert the rq back before checking .device_busy, +* otherwise when IO completes just after the check and before +* this req is added to hctx->dispatch, this queue may never get +* chance to be run, then this IO may hang forever. +*/ + blk_mq_reinsert_request_hctx(hctx, req); if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); + ret = BLK_STS_RESOURCE_OK; break;
Re: [PATCH 3/3] nvme: fix eui_show() print format
On Fri, Nov 03, 2017 at 01:55:16PM +0100, Christoph Hellwig wrote: > On Fri, Nov 03, 2017 at 11:02:50AM +0100, Javier González wrote: > > Signed-off-by: Javier González> > --- > > drivers/nvme/host/core.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index ae8ab0a1ef0d..f05c81774abf 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct > > device_attribute *attr, > > char *buf) > > { > > struct nvme_ns *ns = nvme_get_ns_from_dev(dev); > > - return sprintf(buf, "%8phd\n", ns->eui); > > + return sprintf(buf, "%8phD\n", ns->eui); > > } > > static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL); > > This looks correct. I wonder what the old code printed - does someone > have a device with an EUI-64 at hand to quickly cross check what we > did before? It just prints the same as the 'ph' format, which would look like this: 01 02 03 04 05 06 07 08 The change will make it look like this: 01-02-03-04-05-06-07-08 I think that was the original intention. Reviewed-by: Keith Busch
Re: [PATCH 1/3] nvme: do not check for ns on rw path
On Fri, Nov 03, 2017 at 01:53:40PM +0100, Christoph Hellwig wrote: > > - if (ns && ns->ms && > > + if (ns->ms && > > (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) && > > !blk_integrity_rq(req) && !blk_rq_is_passthrough(req)) > > return BLK_STS_NOTSUPP; > > blk_rq_is_passthrough also can't be true here. > > How about: > > if (ns->ms && !blk_integrity_rq(req) && > (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple))) > return BLK_STS_NOTSUPP; > > Although I have to admit I don't really understand what this check > is even trying to do. It basically checks for a namespace that has > a format with metadata that is not T10 protection information and > then rejects all I/O to it. Why are we even creating a block device > node for such a thing? If the namespace has metadata, but the request doesn't have a metadata payload attached to it for whatever reason, we can't construct the command for that format. We also can't have the controller strip/generate the payload with PRACT bit set if it's not a T10 format, so we just fail the command.
[PATCH V13 02/10] mmc: block: Add error-handling comments
Add error-handling comments to explain what would also be done for blk-mq if it used the legacy error-handling. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 36 +++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index ea80ff4cd7f9..ad72fa19f082 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1894,7 +1894,11 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) case MMC_BLK_SUCCESS: case MMC_BLK_PARTIAL: /* -* A block was successfully transferred. +* Reset success, and accept bytes_xfered. For +* MMC_BLK_PARTIAL re-submit the remaining request. For +* MMC_BLK_SUCCESS error out the remaining request (it +* could not be re-submitted anyway if a next request +* had already begun). */ mmc_blk_reset_success(md, type); @@ -1914,6 +1918,14 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) } break; case MMC_BLK_CMD_ERR: + /* +* For SD cards, get bytes written, but do not accept +* bytes_xfered if that fails. For MMC cards accept +* bytes_xfered. Then try to reset. If reset fails then +* error out the remaining request, otherwise retry +* once (N.B mmc_blk_reset() will not succeed twice in a +* row). +*/ req_pending = mmc_blk_rw_cmd_err(md, card, brq, old_req, req_pending); if (mmc_blk_reset(md, card->host, type)) { if (req_pending) @@ -1930,11 +1942,20 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) } break; case MMC_BLK_RETRY: + /* +* Do not accept bytes_xfered, but retry up to 5 times, +* otherwise same as abort. +*/ retune_retry_done = brq->retune_retry_done; if (retry++ < 5) break; /* Fall through */ case MMC_BLK_ABORT: + /* +* Do not accept bytes_xfered, but try to reset. If +* reset succeeds, try once more, otherwise error out +* the request. 
+*/ if (!mmc_blk_reset(md, card->host, type)) break; mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq); @@ -1943,6 +1964,13 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) case MMC_BLK_DATA_ERR: { int err; + /* +* Do not accept bytes_xfered, but try to reset. If +* reset succeeds, try once more. If reset fails with +* ENODEV which means the partition is wrong, then error +* out the request. Otherwise attempt to read one sector +* at a time. +*/ err = mmc_blk_reset(md, card->host, type); if (!err) break; @@ -1954,6 +1982,10 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) /* Fall through */ } case MMC_BLK_ECC_ERR: + /* +* Do not accept bytes_xfered. If reading more than one +* sector, try reading one sector at a time. +*/ if (brq->data.blocks > 1) { /* Redo read one sector at a time */ pr_warn("%s: retrying using single block read\n", @@ -1975,10 +2007,12 @@ static void mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *new_req) } break; case MMC_BLK_NOMEDIUM: + /* Do not accept bytes_xfered. Error out the request */ mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq); mmc_blk_rw_try_restart(mq, new_req, mqrq_cur); return; default: + /* Do not accept bytes_xfered. Error out the request */
[PATCH V13 01/10] mmc: core: Add parameter use_blk_mq
Until mmc has blk-mq support fully implemented and tested, add a parameter use_blk_mq, default to false unless config option MMC_MQ_DEFAULT is selected. Signed-off-by: Adrian Hunter --- drivers/mmc/Kconfig | 11 +++ drivers/mmc/core/core.c | 7 +++ drivers/mmc/core/core.h | 2 ++ drivers/mmc/core/host.c | 2 ++ drivers/mmc/core/host.h | 4 include/linux/mmc/host.h | 1 + 6 files changed, 27 insertions(+) diff --git a/drivers/mmc/Kconfig b/drivers/mmc/Kconfig index ec21388311db..98202934bd29 100644 --- a/drivers/mmc/Kconfig +++ b/drivers/mmc/Kconfig @@ -12,6 +12,17 @@ menuconfig MMC If you want MMC/SD/SDIO support, you should say Y here and also to your specific host controller driver. +config MMC_MQ_DEFAULT + bool "MMC: use blk-mq I/O path by default" + depends on MMC && BLOCK + ---help--- + This option enables the new blk-mq based I/O path for MMC block + devices by default. With the option the mmc_core.use_blk_mq + module/boot option defaults to Y, without it to N, but it can + still be overridden either way. + + If unsure say N. 
+ if MMC source "drivers/mmc/core/Kconfig" diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 1f0f44f4dd5f..5df03cb73be7 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -66,6 +66,13 @@ bool use_spi_crc = 1; module_param(use_spi_crc, bool, 0); +#ifdef CONFIG_MMC_MQ_DEFAULT +bool mmc_use_blk_mq = true; +#else +bool mmc_use_blk_mq = false; +#endif +module_param_named(use_blk_mq, mmc_use_blk_mq, bool, S_IWUSR | S_IRUGO); + static int mmc_schedule_delayed_work(struct delayed_work *work, unsigned long delay) { diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h index 71e6c6d7ceb7..8c5dd8d31400 100644 --- a/drivers/mmc/core/core.h +++ b/drivers/mmc/core/core.h @@ -35,6 +35,8 @@ struct mmc_bus_ops { int (*reset)(struct mmc_host *); }; +extern bool mmc_use_blk_mq; + void mmc_attach_bus(struct mmc_host *host, const struct mmc_bus_ops *ops); void mmc_detach_bus(struct mmc_host *host); diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index 35a9e4fd1a9f..62ef6cb0ece4 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -404,6 +404,8 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev) host->fixed_drv_type = -EINVAL; + host->use_blk_mq = mmc_use_blk_mq; + return host; } diff --git a/drivers/mmc/core/host.h b/drivers/mmc/core/host.h index fb689a1065ed..6eaf558e62d6 100644 --- a/drivers/mmc/core/host.h +++ b/drivers/mmc/core/host.h @@ -74,6 +74,10 @@ static inline bool mmc_card_hs400es(struct mmc_card *card) return card->host->ios.enhanced_strobe; } +static inline bool mmc_host_use_blk_mq(struct mmc_host *host) +{ + return host->use_blk_mq; +} #endif diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index e7743eca1021..ce2075d6f429 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -380,6 +380,7 @@ struct mmc_host { unsigned int doing_retune:1; /* re-tuning in progress */ unsigned int retune_now:1; /* do re-tuning at next req */ unsigned int retune_paused:1; /* re-tuning is temporarily disabled */ + unsigned int use_blk_mq:1; /* use blk-mq */ int rescan_disable; /* disable card detection */ int rescan_entered; /* used with nonremovable devices */ -- 1.9.1
[PATCH V13 05/10] mmc: cqhci: support for command queue enabled host
From: Venkat Gopalakrishnan

This patch adds CMDQ support for command-queue compatible hosts. Command queue was added in the eMMC 5.1 specification. This enables the controller to process up to 32 requests at a time. Adrian Hunter contributed renaming to cqhci, recovery, suspend and resume, cqhci_off, cqhci_wait_for_idle, and external timeout handling. Signed-off-by: Asutosh Das Signed-off-by: Sujit Reddy Thumma Signed-off-by: Konstantin Dorfman Signed-off-by: Venkat Gopalakrishnan Signed-off-by: Subhash Jadavani Signed-off-by: Ritesh Harjani Signed-off-by: Adrian Hunter --- drivers/mmc/host/Kconfig | 13 + drivers/mmc/host/Makefile | 1 + drivers/mmc/host/cqhci.c | 1150 + drivers/mmc/host/cqhci.h | 240 ++ 4 files changed, 1404 insertions(+) create mode 100644 drivers/mmc/host/cqhci.c create mode 100644 drivers/mmc/host/cqhci.h diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index 567028c9219a..3092b7085cb5 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -857,6 +857,19 @@ config MMC_SUNXI This selects support for the SD/MMC Host Controller on Allwinner sunxi SoCs. +config MMC_CQHCI + tristate "Command Queue Host Controller Interface support" + depends on HAS_DMA + help + This selects the Command Queue Host Controller Interface (CQHCI) + support present in host controllers of Qualcomm Technologies, Inc + amongst others. + This controller supports eMMC devices with command queue support. + + If you have a controller with this interface, say Y or M here. + + If unsure, say N. 
+ config MMC_TOSHIBA_PCI tristate "Toshiba Type A SD/MMC Card Interface Driver" depends on PCI diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile index ab61a3e39c0b..de140e3ef402 100644 --- a/drivers/mmc/host/Makefile +++ b/drivers/mmc/host/Makefile @@ -91,6 +91,7 @@ obj-$(CONFIG_MMC_SDHCI_ST)+= sdhci-st.o obj-$(CONFIG_MMC_SDHCI_MICROCHIP_PIC32)+= sdhci-pic32.o obj-$(CONFIG_MMC_SDHCI_BRCMSTB)+= sdhci-brcmstb.o obj-$(CONFIG_MMC_SDHCI_OMAP) += sdhci-omap.o +obj-$(CONFIG_MMC_CQHCI)+= cqhci.o ifeq ($(CONFIG_CB710_DEBUG),y) CFLAGS-cb710-mmc+= -DDEBUG diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c new file mode 100644 index ..159270e947cf --- /dev/null +++ b/drivers/mmc/host/cqhci.c @@ -0,0 +1,1150 @@ +/* Copyright (c) 2015, The Linux Foundation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 and + * only version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "cqhci.h" + +#define DCMD_SLOT 31 +#define NUM_SLOTS 32 + +struct cqhci_slot { + struct mmc_request *mrq; + unsigned int flags; +#define CQHCI_EXTERNAL_TIMEOUT BIT(0) +#define CQHCI_COMPLETEDBIT(1) +#define CQHCI_HOST_CRC BIT(2) +#define CQHCI_HOST_TIMEOUT BIT(3) +#define CQHCI_HOST_OTHER BIT(4) +}; + +static inline u8 *get_desc(struct cqhci_host *cq_host, u8 tag) +{ + return cq_host->desc_base + (tag * cq_host->slot_sz); +} + +static inline u8 *get_link_desc(struct cqhci_host *cq_host, u8 tag) +{ + u8 *desc = get_desc(cq_host, tag); + + return desc + cq_host->task_desc_len; +} + +static inline dma_addr_t get_trans_desc_dma(struct cqhci_host *cq_host, u8 tag) +{ + return cq_host->trans_desc_dma_base + + (cq_host->mmc->max_segs * tag * +cq_host->trans_desc_len); +} + +static inline u8 *get_trans_desc(struct cqhci_host *cq_host, u8 tag) +{ + return cq_host->trans_desc_base + + (cq_host->trans_desc_len * cq_host->mmc->max_segs * tag); +} + +static void setup_trans_desc(struct cqhci_host *cq_host, u8 tag) +{ + u8 *link_temp; + dma_addr_t trans_temp; + + link_temp = get_link_desc(cq_host, tag); + trans_temp = get_trans_desc_dma(cq_host, tag); + + memset(link_temp, 0, cq_host->link_desc_len); + if (cq_host->link_desc_len > 8) + *(link_temp + 8) = 0; + + if (tag == DCMD_SLOT && (cq_host->mmc->caps2 & MMC_CAP2_CQE_DCMD)) { + *link_temp = CQHCI_VALID(0)
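The `get_desc()`/`get_link_desc()`/`get_trans_desc()` helpers shown in the cqhci patch above are pure offset arithmetic: each of the 32 slots owns one task/link descriptor pair of `slot_sz` bytes plus a fixed block of `max_segs` transfer descriptors. A userspace C sketch of that addressing (mock types and buffer sizes are made up, the arithmetic mirrors the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cq_mock {
	uint8_t *desc_base;	/* task + link descriptors, one slot each */
	uint8_t *trans_desc_base;	/* max_segs transfer descs per slot */
	size_t slot_sz;		/* bytes per slot in desc_base */
	size_t task_desc_len;	/* link descriptor follows the task desc */
	size_t trans_desc_len;
	unsigned int max_segs;
};

static uint8_t *mock_get_desc(const struct cq_mock *cq, uint8_t tag)
{
	return cq->desc_base + (size_t)tag * cq->slot_sz;
}

static uint8_t *mock_get_link_desc(const struct cq_mock *cq, uint8_t tag)
{
	return mock_get_desc(cq, tag) + cq->task_desc_len;
}

static uint8_t *mock_get_trans_desc(const struct cq_mock *cq, uint8_t tag)
{
	return cq->trans_desc_base +
	       cq->trans_desc_len * cq->max_segs * (size_t)tag;
}
```

Because slot 31 is reserved for DCMD (`DCMD_SLOT`), the same indexing serves both data tags 0-30 and the direct-command slot without special cases.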
[PATCH V13 07/10] mmc: block: blk-mq: Add support for direct completion
For blk-mq, add support for completing requests directly in the ->done callback. That means that error handling and urgent background operations must be handled by recovery_work in that case. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 100 +-- drivers/mmc/core/block.h | 1 + drivers/mmc/core/queue.c | 5 ++- drivers/mmc/core/queue.h | 6 +++ include/linux/mmc/host.h | 1 + 5 files changed, 101 insertions(+), 12 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index e8be17152884..cbb4b35a592d 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2086,6 +2086,22 @@ static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req) } } +static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq) +{ + mmc_blk_eval_resp_error(brq); + + return brq->sbc.error || brq->cmd.error || brq->stop.error || + brq->data.error || brq->cmd.resp[0] & CMD_ERRORS; +} + +static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq, + struct request *req) +{ + int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; + + mmc_blk_reset_success(mq->blkdata, type); +} + static void mmc_blk_mq_complete_rq(struct mmc_queue *mq, struct request *req) { struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); @@ -2167,14 +2183,43 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req) if (host->ops->post_req) host->ops->post_req(host, mrq, 0); - blk_mq_complete_request(req); + /* +* Block layer timeouts race with completions which means the normal +* completion path cannot be used during recovery. 
+*/ + if (mq->in_recovery) + mmc_blk_mq_complete_rq(mq, req); + else + blk_mq_complete_request(req); mmc_blk_mq_acct_req_done(mq, req); } +void mmc_blk_mq_recovery(struct mmc_queue *mq) +{ + struct request *req = mq->recovery_req; + struct mmc_host *host = mq->card->host; + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + + mq->recovery_req = NULL; + mq->rw_wait = false; + + if (mmc_blk_rq_error(&mqrq->brq)) { + mmc_retune_hold_now(host); + mmc_blk_rw_recovery(mq, req); + } + + mmc_blk_urgent_bkops(mq, mqrq); + + mmc_blk_mq_post_req(mq, req); +} + static void mmc_blk_mq_complete_prev_req(struct mmc_queue *mq, struct request **prev_req) { + if (mmc_queue_direct_complete(mq->card->host)) + return; + mutex_lock(&mq->complete_lock); if (!mq->complete_req) @@ -2208,19 +2253,43 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq) struct request *req = mmc_queue_req_to_req(mqrq); struct request_queue *q = req->q; struct mmc_queue *mq = q->queuedata; + struct mmc_host *host = mq->card->host; unsigned long flags; - bool waiting; - spin_lock_irqsave(q->queue_lock, flags); - mq->complete_req = req; - mq->rw_wait = false; - waiting = mq->waiting; - spin_unlock_irqrestore(q->queue_lock, flags); + if (!mmc_queue_direct_complete(host)) { + bool waiting; + + spin_lock_irqsave(q->queue_lock, flags); + mq->complete_req = req; + mq->rw_wait = false; + waiting = mq->waiting; + spin_unlock_irqrestore(q->queue_lock, flags); + + if (waiting) + wake_up(&mq->wait); + else + kblockd_schedule_work(&mq->complete_work); - if (waiting) + return; + } + + if (mmc_blk_rq_error(&mqrq->brq) || + mmc_blk_urgent_bkops_needed(mq, mqrq)) { + spin_lock_irqsave(q->queue_lock, flags); + mq->recovery_needed = true; + mq->recovery_req = req; + spin_unlock_irqrestore(q->queue_lock, flags); wake_up(&mq->wait); - else - kblockd_schedule_work(&mq->complete_work); + schedule_work(&mq->recovery_work); + return; + } + + mmc_blk_rw_reset_success(mq, req); + + mq->rw_wait = false; + wake_up(&mq->wait); + + mmc_blk_mq_post_req(mq, req); } 
static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err) @@ -2230,7 +2299,12 @@ static bool mmc_blk_rw_wait_cond(struct mmc_queue *mq, int *err) bool done; spin_lock_irqsave(q->queue_lock, flags); - done = !mq->rw_wait; + if (mq->recovery_needed) { + *err = -EBUSY; + done = true; + } else { + done = !mq->rw_wait; + } mq->waiting = !done; spin_unlock_irqrestore(q->queue_lock, flags); @@
[PATCH V13 06/10] mmc: sdhci-pci: Add CQHCI support for Intel GLK
Add CQHCI initialization and implement CQHCI operations for Intel GLK. Signed-off-by: Adrian Hunter--- drivers/mmc/host/Kconfig | 1 + drivers/mmc/host/sdhci-pci-core.c | 155 +- 2 files changed, 155 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig index 3092b7085cb5..2b02a9788bb6 100644 --- a/drivers/mmc/host/Kconfig +++ b/drivers/mmc/host/Kconfig @@ -81,6 +81,7 @@ config MMC_SDHCI_BIG_ENDIAN_32BIT_BYTE_SWAPPER config MMC_SDHCI_PCI tristate "SDHCI support on PCI bus" depends on MMC_SDHCI && PCI + select MMC_CQHCI help This selects the PCI Secure Digital Host Controller Interface. Most controllers found today are PCI devices. diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c index 3e4f04fd5175..110c634cfb43 100644 --- a/drivers/mmc/host/sdhci-pci-core.c +++ b/drivers/mmc/host/sdhci-pci-core.c @@ -30,6 +30,8 @@ #include #include +#include "cqhci.h" + #include "sdhci.h" #include "sdhci-pci.h" @@ -116,6 +118,28 @@ int sdhci_pci_resume_host(struct sdhci_pci_chip *chip) return 0; } + +static int sdhci_cqhci_suspend(struct sdhci_pci_chip *chip) +{ + int ret; + + ret = cqhci_suspend(chip->slots[0]->host->mmc); + if (ret) + return ret; + + return sdhci_pci_suspend_host(chip); +} + +static int sdhci_cqhci_resume(struct sdhci_pci_chip *chip) +{ + int ret; + + ret = sdhci_pci_resume_host(chip); + if (ret) + return ret; + + return cqhci_resume(chip->slots[0]->host->mmc); +} #endif #ifdef CONFIG_PM @@ -166,8 +190,48 @@ static int sdhci_pci_runtime_resume_host(struct sdhci_pci_chip *chip) return 0; } + +static int sdhci_cqhci_runtime_suspend(struct sdhci_pci_chip *chip) +{ + int ret; + + ret = cqhci_suspend(chip->slots[0]->host->mmc); + if (ret) + return ret; + + return sdhci_pci_runtime_suspend_host(chip); +} + +static int sdhci_cqhci_runtime_resume(struct sdhci_pci_chip *chip) +{ + int ret; + + ret = sdhci_pci_runtime_resume_host(chip); + if (ret) + return ret; + + return 
cqhci_resume(chip->slots[0]->host->mmc); +} #endif +static u32 sdhci_cqhci_irq(struct sdhci_host *host, u32 intmask) +{ + int cmd_error = 0; + int data_error = 0; + + if (!sdhci_cqe_irq(host, intmask, &cmd_error, &data_error)) + return intmask; + + cqhci_irq(host->mmc, intmask, cmd_error, data_error); + + return 0; +} + +static void sdhci_pci_dumpregs(struct mmc_host *mmc) +{ + sdhci_dumpregs(mmc_priv(mmc)); +} + /*\ * * * Hardware specific quirk handling * @@ -583,6 +647,18 @@ static void sdhci_intel_voltage_switch(struct sdhci_host *host) .voltage_switch = sdhci_intel_voltage_switch, }; +static const struct sdhci_ops sdhci_intel_glk_ops = { + .set_clock = sdhci_set_clock, + .set_power = sdhci_intel_set_power, + .enable_dma = sdhci_pci_enable_dma, + .set_bus_width = sdhci_set_bus_width, + .reset = sdhci_reset, + .set_uhs_signaling = sdhci_set_uhs_signaling, + .hw_reset = sdhci_pci_hw_reset, + .voltage_switch = sdhci_intel_voltage_switch, + .irq = sdhci_cqhci_irq, +}; + static void byt_read_dsm(struct sdhci_pci_slot *slot) { struct intel_host *intel_host = sdhci_pci_priv(slot); @@ -612,12 +688,80 @@ static int glk_emmc_probe_slot(struct sdhci_pci_slot *slot) { int ret = byt_emmc_probe_slot(slot); + slot->host->mmc->caps2 |= MMC_CAP2_CQE; + if (slot->chip->pdev->device != PCI_DEVICE_ID_INTEL_GLK_EMMC) { slot->host->mmc->caps2 |= MMC_CAP2_HS400_ES, slot->host->mmc_host_ops.hs400_enhanced_strobe = intel_hs400_enhanced_strobe; + slot->host->mmc->caps2 |= MMC_CAP2_CQE_DCMD; + } + + return ret; +} + +static void glk_cqe_enable(struct mmc_host *mmc) +{ + struct sdhci_host *host = mmc_priv(mmc); + u32 reg; + + /* +* CQE gets stuck if it sees Buffer Read Enable bit set, which can be +* the case after tuning, so ensure the buffer is drained. 
+*/ + reg = sdhci_readl(host, SDHCI_PRESENT_STATE); + while (reg & SDHCI_DATA_AVAILABLE) { + sdhci_readl(host, SDHCI_BUFFER); + reg = sdhci_readl(host, SDHCI_PRESENT_STATE); + } + + sdhci_cqe_enable(mmc); +} + +static const struct cqhci_host_ops glk_cqhci_ops = { + .enable
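The `glk_cqe_enable()` drain loop above can be modelled without hardware: keep reading the data port while the "data available" status bit is set, so CQE never sees Buffer Read Enable left over from tuning. A userspace C sketch under mock register semantics (the bit value and struct are made up, only the loop shape mirrors the patch):

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_DATA_AVAILABLE (1u << 11)	/* mock status bit, not SDHCI's */

struct mock_sdhci {
	unsigned int fifo_words;	/* words still buffered after tuning */
	unsigned int reads;		/* data-port reads performed */
};

static uint32_t mock_present_state(const struct mock_sdhci *h)
{
	return h->fifo_words ? MOCK_DATA_AVAILABLE : 0;
}

/* Each read of the buffer register consumes one buffered word */
static void mock_read_buffer(struct mock_sdhci *h)
{
	if (h->fifo_words)
		h->fifo_words--;
	h->reads++;
}

/* Mirrors the while (reg & SDHCI_DATA_AVAILABLE) loop in the patch */
static void mock_drain(struct mock_sdhci *h)
{
	uint32_t reg = mock_present_state(h);

	while (reg & MOCK_DATA_AVAILABLE) {
		mock_read_buffer(h);
		reg = mock_present_state(h);
	}
}
```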
[PATCH V13 09/10] mmc: block: blk-mq: Stop using card_busy_detect()
card_busy_detect() doesn't set a correct timeout, and it doesn't take care of error status bits. Stop using it for blk-mq. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 117 +++ 1 file changed, 109 insertions(+), 8 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 0c29b1d8d545..5c5ff3c34313 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1426,15 +1426,18 @@ static inline void mmc_apply_rel_rw(struct mmc_blk_request *brq, } } -#define CMD_ERRORS \ - (R1_OUT_OF_RANGE | /* Command argument out of range */ \ -R1_ADDRESS_ERROR | /* Misaligned address */\ +#define CMD_ERRORS_EXCL_OOR\ + (R1_ADDRESS_ERROR | /* Misaligned address */\ R1_BLOCK_LEN_ERROR | /* Transferred block length incorrect */\ R1_WP_VIOLATION | /* Tried to write to protected block */ \ R1_CARD_ECC_FAILED | /* Card ECC failed */ \ R1_CC_ERROR | /* Card controller error */ \ R1_ERROR) /* General/unknown error */ +#define CMD_ERRORS \ + (CMD_ERRORS_EXCL_OOR | \ +R1_OUT_OF_RANGE) /* Command argument out of range */ \ + static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq) { u32 val; @@ -1951,6 +1954,95 @@ static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req) mqrq->retries = MMC_NO_RETRIES; } +static inline bool mmc_blk_oor_valid(struct mmc_blk_request *brq) +{ + return !!brq->mrq.sbc; +} + +static inline u32 mmc_blk_stop_err_bits(struct mmc_blk_request *brq) +{ + return mmc_blk_oor_valid(brq) ? CMD_ERRORS : CMD_ERRORS_EXCL_OOR; +} + +static inline bool mmc_blk_in_tran_state(u32 status) +{ + /* +* Some cards mishandle the status bits, so make sure to check both the +* busy indication and the card state. +*/ + return status & R1_READY_FOR_DATA && + (R1_CURRENT_STATE(status) == R1_STATE_TRAN); +} + +static unsigned int mmc_blk_clock_khz(struct mmc_host *host) +{ + if (host->actual_clock) + return host->actual_clock / 1000; + + /* Clock may be subject to a divisor, fudge it by a factor of 2. 
*/ + if (host->ios.clock) + return host->ios.clock / 2000; + + /* How can there be no clock */ + WARN_ON_ONCE(1); + return 100; /* 100 kHz is minimum possible value */ +} + +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host, + struct mmc_data *data) +{ + unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000); + unsigned int khz; + + if (data->timeout_clks) { + khz = mmc_blk_clock_khz(host); + ms += DIV_ROUND_UP(data->timeout_clks, khz); + } + + return msecs_to_jiffies(ms); +} + +static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req, + u32 *resp_errs) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + struct mmc_data *data = &mqrq->brq.data; + unsigned long timeout; + u32 status; + int err; + + timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data); + + while (1) { + bool done = time_after(jiffies, timeout); + + err = __mmc_send_status(card, &status, 5); + if (err) { + pr_err("%s: error %d requesting status\n", + req->rq_disk->disk_name, err); + break; + } + + /* Accumulate any response error bits seen */ + if (resp_errs) + *resp_errs |= status; + + if (mmc_blk_in_tran_state(status)) + break; + + /* Timeout if the device never becomes ready */ + if (done) { + pr_err("%s: Card stuck in wrong state! %s %s\n", + mmc_hostname(card->host), + req->rq_disk->disk_name, __func__); + err = -ETIMEDOUT; + break; + } + } + + return err; +} + static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req) { int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE; @@ -2097,17 +2189,26 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq) static int mmc_blk_card_busy(struct mmc_card *card, struct request *req) { struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); - bool gen_err = false; +
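The point of the patch above is that the busy-poll deadline now comes from the card's own data timeout: a fixed nanosecond part plus a clock-cycle part converted via the (possibly divided) host clock. The arithmetic stands alone; a userspace C version with the same constants, assuming the standard ns-to-ms conversion (`DIV_ROUND_UP` open-coded, `mock_*` names hypothetical):

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static unsigned int mock_clock_khz(unsigned int actual_clock_hz,
				   unsigned int ios_clock_hz)
{
	if (actual_clock_hz)
		return actual_clock_hz / 1000;
	/* Clock may be subject to a divisor, fudge it by a factor of 2 */
	if (ios_clock_hz)
		return ios_clock_hz / 2000;
	return 100;	/* 100 kHz is the minimum possible value */
}

static unsigned int mock_data_timeout_ms(unsigned long long timeout_ns,
					 unsigned int timeout_clks,
					 unsigned int actual_clock_hz,
					 unsigned int ios_clock_hz)
{
	unsigned int ms = (unsigned int)DIV_ROUND_UP(timeout_ns, 1000000ULL);

	if (timeout_clks) {
		unsigned int khz = mock_clock_khz(actual_clock_hz,
						  ios_clock_hz);

		/* timeout_clks cycles at khz kilocycles/sec is clks/khz ms */
		ms += DIV_ROUND_UP(timeout_clks, khz);
	}
	return ms;
}
```

This is what replaces `card_busy_detect()`'s fixed `MMC_BLK_TIMEOUT_MS`: the deadline scales with what the card actually advertised.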
[PATCH V13 08/10] mmc: block: blk-mq: Separate card polling from recovery
Recovery is simpler to understand if it is only used for errors. Create a separate function for card polling. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index cbb4b35a592d..0c29b1d8d545 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2094,6 +2094,24 @@ static inline bool mmc_blk_rq_error(struct mmc_blk_request *brq) brq->data.error || brq->cmd.resp[0] & CMD_ERRORS; } +static int mmc_blk_card_busy(struct mmc_card *card, struct request *req) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + bool gen_err = false; + int err; + + if (mmc_host_is_spi(card->host) || rq_data_dir(req) == READ) + return 0; + + err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, _err); + + /* Copy the general error bit so it will be seen later on */ + if (gen_err) + mqrq->brq.stop.resp[0] |= R1_ERROR; + + return err; +} + static inline void mmc_blk_rw_reset_success(struct mmc_queue *mq, struct request *req) { @@ -2150,8 +2168,15 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq, struct request *req) { struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + struct mmc_host *host = mq->card->host; - mmc_blk_rw_recovery(mq, req); + if (mmc_blk_rq_error(>brq) || + mmc_blk_card_busy(mq->card, req)) { + mmc_blk_rw_recovery(mq, req); + } else { + mmc_blk_rw_reset_success(mq, req); + mmc_retune_release(host); + } mmc_blk_urgent_bkops(mq, mqrq); } -- 1.9.1
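The split this patch makes is easy to state as a decision function: recovery runs only when the request actually failed or the card never returned to a ready state; a clean completion just resets the error counters and releases re-tuning. A minimal userspace C sketch of that dispatch (mock types, not the kernel structs):

```c
#include <assert.h>
#include <stdbool.h>

enum mock_action { MOCK_RECOVER, MOCK_SUCCESS };

struct mock_result {
	bool rq_error;		/* any sbc/cmd/stop/data error bit set */
	bool card_busy;		/* card did not get back to tran state */
};

/* Mirrors the if/else introduced in mmc_blk_mq_poll_completion():
 * only genuine failures reach the recovery path. */
static enum mock_action mock_poll_completion(const struct mock_result *r)
{
	if (r->rq_error || r->card_busy)
		return MOCK_RECOVER;
	return MOCK_SUCCESS;	/* reset_success + retune_release */
}
```

Before this patch, `mmc_blk_rw_recovery()` ran unconditionally and had to distinguish the no-error case internally; pulling the polling out front keeps recovery purely about errors.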
[PATCH V13 10/10] mmc: block: blk-mq: Stop using legacy recovery
There are only a few things the recovery needs to do. Primarily, it just needs to: Determine the number of bytes transferred Get the card back to transfer state Determine whether to retry There are also a couple of additional features: Reset the card before the last retry Read one sector at a time The legacy code spent much effort analyzing command errors, but commands fail fast, so it is simpler just to give all command errors the same number of retries. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 261 --- 1 file changed, 135 insertions(+), 126 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 5c5ff3c34313..623fa2be7077 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1480,9 +1480,11 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq) } } -static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card, - struct mmc_queue_req *mq_mrq) +static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, +struct mmc_async_req *areq) { + struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req, + areq); struct mmc_blk_request *brq = _mrq->brq; struct request *req = mmc_queue_req_to_req(mq_mrq); int need_retune = card->host->need_retune; @@ -1588,15 +1590,6 @@ static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card, return MMC_BLK_SUCCESS; } -static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, -struct mmc_async_req *areq) -{ - struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req, - areq); - - return __mmc_blk_err_check(card, mq_mrq); -} - static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, int disable_multi, bool *do_rel_wr_p, bool *do_data_tag_p) @@ -1922,6 +1915,7 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, } #define MMC_MAX_RETRIES5 +#define MMC_DATA_RETRIES 2 #define MMC_NO_RETRIES (MMC_MAX_RETRIES + 1) /* Single sector read during recovery */ @@ -1974,6 +1968,28 @@ 
static inline bool mmc_blk_in_tran_state(u32 status) (R1_CURRENT_STATE(status) == R1_STATE_TRAN); } +/* + * Check for errors the host controller driver might not have seen such as + * response mode errors or invalid card state. + */ +static bool mmc_blk_status_error(struct request *req, u32 status) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + struct mmc_blk_request *brq = &mqrq->brq; + u32 stop_err_bits = mmc_blk_stop_err_bits(brq); + + return brq->cmd.resp[0] & CMD_ERRORS || + brq->stop.resp[0] & stop_err_bits || + status & stop_err_bits || + (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status)); +} + +static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq) +{ + return !brq->sbc.error && !brq->cmd.error && + !(brq->cmd.resp[0] & CMD_ERRORS); +} + static unsigned int mmc_blk_clock_khz(struct mmc_host *host) { if (host->actual_clock) @@ -2043,6 +2059,47 @@ static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req, return err; } +static int mmc_blk_send_stop(struct mmc_card *card) +{ + struct mmc_command cmd = { + .opcode = MMC_STOP_TRANSMISSION, + .flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC, + }; + + return mmc_wait_for_cmd(card->host, &cmd, 5); +} + +static int mmc_blk_fix_state(struct mmc_card *card, struct request *req) +{ + int err; + + mmc_retune_hold_now(card->host); + + mmc_blk_send_stop(card); + + err = mmc_blk_card_stuck(card, req, NULL); + + mmc_retune_release(card->host); + + return err; +} + +/* + * Requests are completed by mmc_blk_mq_complete_rq() which sets simple + * policy: + * 1. A request that has transferred at least some data is considered + * successful and will be requeued if there is remaining data to + * transfer. + * 2. Otherwise the number of retries is incremented and the request + * will be requeued if there are remaining retries. + * 3. Otherwise the request will be errored out. + * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and + * mqrq->retries. 
So there are only 4 possible actions here: + * 1. do not accept the bytes_xfered value i.e. set it to zero + * 2. change mqrq->retries to determine the number of retries + * 3. try to reset the card + * 4. read one sector at a time + */ static void
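The completion policy spelled out in the comment above reduces to a small decision function: partial progress wins over retry counting, and a request is only errored out once both avenues are exhausted. A userspace C sketch of that policy (hypothetical `mock_*` names; `MOCK_MAX_RETRIES` mirrors `MMC_MAX_RETRIES`):

```c
#include <assert.h>

enum mock_disposition { MOCK_DONE, MOCK_REQUEUE, MOCK_FAIL };

#define MOCK_MAX_RETRIES 5

/* Recovery only steers this outcome indirectly, by zeroing
 * bytes_xfered or bumping the retry counter. */
static enum mock_disposition mock_complete(unsigned int bytes_xfered,
					   unsigned int bytes_total,
					   unsigned int *retries)
{
	if (bytes_xfered)	/* 1. some data moved: success so far */
		return bytes_xfered < bytes_total ? MOCK_REQUEUE : MOCK_DONE;
	if ((*retries)++ < MOCK_MAX_RETRIES)	/* 2. burn a retry */
		return MOCK_REQUEUE;
	return MOCK_FAIL;	/* 3. out of retries: error out */
}
```

This is why the recovery path only ever needs the four actions listed: everything else falls out of `bytes_xfered` and `mqrq->retries` at completion time.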
[PATCH V13 04/10] mmc: block: Add CQE support
Add CQE support to the block driver, including: - optionally using DCMD for flush requests - "manually" issuing discard requests - issuing read / write requests to the CQE - supporting block-layer timeouts - handling recovery - supporting re-tuning CQE offers 25% - 50% better random multi-threaded I/O. There is a slight (e.g. 2%) drop in sequential read speed but no observable change to sequential write. CQE automatically sends the commands to complete requests. However it only supports reads / writes and so-called "direct commands" (DCMD). Furthermore DCMD is limited to one command at a time, but discards require 3 commands. That makes issuing discards through CQE very awkward, but some CQE's don't support DCMD anyway. So for discards, the existing non-CQE approach is taken, where the mmc core code issues the 3 commands one at a time i.e. mmc_erase(). Where DCMD is used, is for issuing flushes. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 150 +++- drivers/mmc/core/block.h | 2 + drivers/mmc/core/queue.c | 158 +-- drivers/mmc/core/queue.h | 18 ++ 4 files changed, 322 insertions(+), 6 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index e2838ff4738e..e8be17152884 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -112,6 +112,7 @@ struct mmc_blk_data { #define MMC_BLK_WRITE BIT(1) #define MMC_BLK_DISCARDBIT(2) #define MMC_BLK_SECDISCARD BIT(3) +#define MMC_BLK_CQE_RECOVERY BIT(4) /* * Only set in main mmc_blk_data associated @@ -1717,6 +1718,138 @@ static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, *do_data_tag_p = do_data_tag; } +#define MMC_CQE_RETRIES 2 + +static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + struct mmc_request *mrq = >brq.mrq; + struct request_queue *q = req->q; + struct mmc_host *host = mq->card->host; + unsigned long flags; + bool put_card; + int err; + + 
mmc_cqe_post_req(host, mrq); + + if (mrq->cmd && mrq->cmd->error) + err = mrq->cmd->error; + else if (mrq->data && mrq->data->error) + err = mrq->data->error; + else + err = 0; + + if (err) { + if (mqrq->retries++ < MMC_CQE_RETRIES) + blk_mq_requeue_request(req, true); + else + blk_mq_end_request(req, BLK_STS_IOERR); + } else if (mrq->data) { + if (blk_update_request(req, BLK_STS_OK, mrq->data->bytes_xfered)) + blk_mq_requeue_request(req, true); + else + __blk_mq_end_request(req, BLK_STS_OK); + } else { + blk_mq_end_request(req, BLK_STS_OK); + } + + spin_lock_irqsave(q->queue_lock, flags); + + mq->in_flight[mmc_issue_type(mq, req)] -= 1; + + put_card = mmc_tot_in_flight(mq) == 0; + + mmc_cqe_check_busy(mq); + + spin_unlock_irqrestore(q->queue_lock, flags); + + if (!mq->cqe_busy) + blk_mq_run_hw_queues(q, true); + + if (put_card) + mmc_put_card(mq->card, &mq->ctx); +} + +void mmc_blk_cqe_recovery(struct mmc_queue *mq) +{ + struct mmc_card *card = mq->card; + struct mmc_host *host = card->host; + int err; + + pr_debug("%s: CQE recovery start\n", mmc_hostname(host)); + + err = mmc_cqe_recovery(host); + if (err) + mmc_blk_reset(mq->blkdata, host, MMC_BLK_CQE_RECOVERY); + else + mmc_blk_reset_success(mq->blkdata, MMC_BLK_CQE_RECOVERY); + + pr_debug("%s: CQE recovery done\n", mmc_hostname(host)); +} + +static void mmc_blk_cqe_req_done(struct mmc_request *mrq) +{ + struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req, + brq.mrq); + struct request *req = mmc_queue_req_to_req(mqrq); + struct request_queue *q = req->q; + struct mmc_queue *mq = q->queuedata; + + /* +* Block layer timeouts race with completions which means the normal +* completion path cannot be used during recovery. 
+*/ + if (mq->in_recovery) + mmc_blk_cqe_complete_rq(mq, req); + else + blk_mq_complete_request(req); +} + +static int mmc_blk_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq) +{ + mrq->done = mmc_blk_cqe_req_done; + mrq->recovery_notifier = mmc_cqe_recovery_notifier; + + return mmc_cqe_start_req(host, mrq); +} + +static struct mmc_request *mmc_blk_cqe_prep_dcmd(struct mmc_queue_req *mqrq, +
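The commit text above describes how request types are routed once CQE is in play: reads and writes go to the CQE proper, a flush becomes a DCMD when the controller supports one, and discards always fall back to the synchronous non-CQE path because they need three commands while DCMD holds only one. A userspace C sketch of that routing (mock enums, not kernel types; the fallback behaviour for flush-without-DCMD is an assumption based on the cover text):

```c
#include <assert.h>

enum mock_req { MOCK_READ, MOCK_WRITE, MOCK_FLUSH, MOCK_DISCARD };
enum mock_path { MOCK_CQE_DATA, MOCK_CQE_DCMD, MOCK_SYNC_MMC_OPS };

static enum mock_path mock_route(enum mock_req op, int host_has_dcmd)
{
	switch (op) {
	case MOCK_READ:
	case MOCK_WRITE:
		return MOCK_CQE_DATA;	/* queued in one of the data slots */
	case MOCK_FLUSH:
		/* MMC_CAP2_CQE_DCMD gates whether flush can use slot 31 */
		return host_has_dcmd ? MOCK_CQE_DCMD : MOCK_SYNC_MMC_OPS;
	default:
		/* discard needs 3 commands: issued via mmc_erase() */
		return MOCK_SYNC_MMC_OPS;
	}
}
```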
[PATCH V13 03/10] mmc: block: Add blk-mq support
Define and use a blk-mq queue. Discards and flushes are processed synchronously, but reads and writes asynchronously. In order to support slow DMA unmapping, DMA unmapping is not done until after the next request is started. That means the request is not completed until then. If there is no next request then the completion is done by queued work. Signed-off-by: Adrian Hunter--- drivers/mmc/core/block.c | 457 ++- drivers/mmc/core/block.h | 9 + drivers/mmc/core/queue.c | 273 +--- drivers/mmc/core/queue.h | 32 4 files changed, 740 insertions(+), 31 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index ad72fa19f082..e2838ff4738e 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -1264,7 +1264,10 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req) break; } mq_rq->drv_op_result = ret; - blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + if (req->mq_ctx) + blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + else + blk_end_request_all(req, ret ? 
BLK_STS_IOERR : BLK_STS_OK); } static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req) @@ -1307,7 +1310,10 @@ static void mmc_blk_issue_discard_rq(struct mmc_queue *mq, struct request *req) else mmc_blk_reset_success(md, type); fail: - blk_end_request(req, status, blk_rq_bytes(req)); + if (req->mq_ctx) + blk_mq_end_request(req, status); + else + blk_end_request(req, status, blk_rq_bytes(req)); } static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq, @@ -1377,7 +1383,10 @@ static void mmc_blk_issue_secdiscard_rq(struct mmc_queue *mq, if (!err) mmc_blk_reset_success(md, type); out: - blk_end_request(req, status, blk_rq_bytes(req)); + if (req->mq_ctx) + blk_mq_end_request(req, status); + else + blk_end_request(req, status, blk_rq_bytes(req)); } static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req) @@ -1387,7 +1396,10 @@ static void mmc_blk_issue_flush(struct mmc_queue *mq, struct request *req) int ret = 0; ret = mmc_flush_cache(card); - blk_end_request_all(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + if (req->mq_ctx) + blk_mq_end_request(req, ret ? BLK_STS_IOERR : BLK_STS_OK); + else + blk_end_request_all(req, ret ? 
BLK_STS_IOERR : BLK_STS_OK); } /* @@ -1464,11 +1476,9 @@ static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq) } } -static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, -struct mmc_async_req *areq) +static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card, + struct mmc_queue_req *mq_mrq) { - struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req, - areq); struct mmc_blk_request *brq = _mrq->brq; struct request *req = mmc_queue_req_to_req(mq_mrq); int need_retune = card->host->need_retune; @@ -1574,6 +1584,15 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, return MMC_BLK_SUCCESS; } +static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, +struct mmc_async_req *areq) +{ + struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req, + areq); + + return __mmc_blk_err_check(card, mq_mrq); +} + static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq, int disable_multi, bool *do_rel_wr_p, bool *do_data_tag_p) @@ -1766,6 +1785,428 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq, mqrq->areq.err_check = mmc_blk_err_check; } +#define MMC_MAX_RETRIES5 +#define MMC_NO_RETRIES (MMC_MAX_RETRIES + 1) + +/* Single sector read during recovery */ +static void mmc_blk_ss_read(struct mmc_queue *mq, struct request *req) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + blk_status_t status; + + while (1) { + mmc_blk_rw_rq_prep(mqrq, mq->card, 1, mq); + + mmc_wait_for_req(mq->card->host, >brq.mrq); + + /* +* Not expecting command errors, so just give up in that case. +* If there are retries remaining, the request will get +* requeued. +*/ + if (mqrq->brq.cmd.error) + return; + + if (blk_rq_bytes(req) <= 512) +
[PATCH V13 00/10] mmc: Add Command Queue support
Hi Here is V13 of the hardware command queue patches without the software command queue patches, now using blk-mq and now with blk-mq support for non-CQE I/O. HW CMDQ offers 25% - 50% better random multi-threaded I/O. I see a slight 2% drop in sequential read speed but no change to sequential write. Non-CQE blk-mq showed a 3% decrease in sequential read performance. This seemed to be coming from the inferior latency of running work items compared with a dedicated thread. Hacking blk-mq workqueue to be unbound reduced the performance degradation from 3% to 1%. While we should look at changing blk-mq to give better workqueue performance, a bigger gain is likely to be made by adding a new host API to enable the next already-prepared request to be issued directly from within ->done() callback of the current request. Changes since V12: mmc: block: Add error-handling comments New patch. mmc: block: Add blk-mq support Use legacy error handling mmc: block: Add CQE support Re-base mmc: block: blk-mq: Add support for direct completion New patch. mmc: block: blk-mq: Separate card polling from recovery New patch. mmc: block: blk-mq: Stop using card_busy_detect() New patch. mmc: block: blk-mq: Stop using legacy recovery New patch. 
Changes since V11: Split "mmc: block: Add CQE and blk-mq support" into 2 patches Changes since V10: mmc: core: Remove unnecessary host claim mmc: core: Introduce host claiming by context mmc: core: Add support for handling CQE requests mmc: mmc: Enable Command Queuing mmc: mmc: Enable CQE's mmc: block: Use local variables in mmc_blk_data_prep() mmc: block: Prepare CQE data mmc: block: Factor out mmc_setup_queue() mmc: core: Add parameter use_blk_mq mmc: core: Export mmc_start_bkops() mmc: core: Export mmc_start_request() mmc: core: Export mmc_retune_hold_now() and mmc_retune_release() Dropped because they have been applied mmc: block: Add CQE and blk-mq support Extend blk-mq support for asynchronous read / writes to all host controllers including those that require polling. The direct completion path is still available but depends on a new capability flag. Drop blk-mq support for synchronous read / writes. Venkat Gopalakrishnan (1): mmc: cqhci: support for command queue enabled host Changes since V9: mmc: block: Add CQE and blk-mq support - reinstate mq support for REQ_OP_DRV_IN/OUT that was removed because it was incorrectly assumed to be handled by the rpmb character device - don't check for rpmb block device anymore mmc: cqhci: support for command queue enabled host Fix cqhci_set_irqs() as per Haibo Chen Changes since V8: Re-based mmc: core: Introduce host claiming by context Slightly simplified as per Ulf mmc: core: Export mmc_retune_hold_now() and mmc_retune_release() New patch. mmc: block: Add CQE and blk-mq support Fix missing ->post_req() on the error path Changes since V7: Re-based mmc: core: Introduce host claiming by context Slightly simplified mmc: core: Add parameter use_blk_mq New patch. mmc: core: Remove unnecessary host claim New patch. mmc: core: Export mmc_start_bkops() New patch. mmc: core: Export mmc_start_request() New patch. 
	mmc: block: Add CQE and blk-mq support
	    Add blk-mq support for non-CQE requests

Changes since V6:
	mmc: core: Introduce host claiming by context
	    New patch.
	mmc: core: Move mmc_start_areq() declaration
	    Dropped because it has been applied
	mmc: block: Fix block status codes
	    Dropped because it has been applied
	mmc: host: Add CQE interface
	    Dropped because it has been applied
	mmc: core: Turn off CQE before sending commands
	    Dropped because it has been applied
	mmc: block: Factor out mmc_setup_queue()
	    New patch.
	mmc: block: Add CQE support
	    Drop legacy support and add blk-mq support

Changes since V5:
	Re-based
	mmc: core: Add mmc_retune_hold_now()
	    Dropped because it has been applied
	mmc: core: Add members to mmc_request and mmc_data for CQE's
	    Dropped because it has been applied
	mmc: core: Move mmc_start_areq() declaration
	    New patch at Ulf's request
	mmc: block: Fix block status codes
	    Another un-related patch
	mmc: host: Add CQE interface
	    Move recovery_notifier() callback to struct mmc_request
	mmc: core: Add support for handling CQE requests
	    Roll __mmc_cqe_request_done() into mmc_cqe_request_done()
	    Move function declarations requested by Ulf
	mmc: core: Remove unused MMC_CAP2_PACKED_CMD
	    Dropped because it has been applied
Re: [PATCH 1/3] nvme: do not check for ns on rw path
> On 3 Nov 2017, at 13.53, Christoph Hellwig wrote:
>
>> -	if (ns && ns->ms &&
>> +	if (ns->ms &&
>> 	    (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) &&
>> 	    !blk_integrity_rq(req) && !blk_rq_is_passthrough(req))
>> 		return BLK_STS_NOTSUPP;
>
> blk_rq_is_passthrough also can't be true here.
>
> How about:
>
> 	if (ns->ms && !blk_integrity_rq(req) &&
> 	    (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)))
> 		return BLK_STS_NOTSUPP;
>

Sure.

> Although I have to admit I don't really understand what this check
> is even trying to do. It basically checks for a namespace that has
> a format with metadata that is not T10 protection information and
> then rejects all I/O to it. Why are we even creating a block device
> node for such a thing?

Looking at the history, (i) the check has changed location and (ii) some
checks have been added over time. So it looks like leftovers from here
and there.

If we end up not needing these checks at all here, you can just fix it
all in the same commit. I just wanted to get rid of sparse/smatch
complaints...
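For readers following along, the semantics of the check under discussion can be sketched in isolation. This is a hypothetical, self-contained userspace model, not the kernel code: the struct and function names (ns_fmt, rw_check_metadata) are illustrative, and struct t10_pi_tuple is reproduced here only for its 8-byte size.

```c
#include <assert.h>

/* 8-byte T10 Protection Information tuple (guard/app/ref tags). */
struct t10_pi_tuple {
	unsigned short guard_tag;
	unsigned short app_tag;
	unsigned int ref_tag;
};

/* Minimal model of the namespace format fields the check looks at. */
struct ns_fmt {
	unsigned int ms;	/* metadata bytes per block; 0 = none */
	int pi_type;		/* 0 = no T10 protection information */
};

/* Returns nonzero when a read/write must be rejected (BLK_STS_NOTSUPP):
 * the namespace carries metadata that is not an 8-byte T10 PI tuple the
 * controller can strip/generate, and the request has no integrity
 * payload of its own. */
static int rw_check_metadata(const struct ns_fmt *ns, int has_integrity)
{
	return ns->ms && !has_integrity &&
	       (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple));
}
```

With this model, a namespace with no metadata or with 8-byte T10 PI passes, while one formatted with other metadata is rejected unless the request supplies integrity data.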
Re: [PATCH 2/3] nvme: compare NQN string with right size
> On 3 Nov 2017, at 13.54, Christoph Hellwig wrote:
>
> On Fri, Nov 03, 2017 at 11:02:49AM +0100, Javier González wrote:
>> Compare subnqns using NVMF_NQN_SIZE as it is < 256
>>
>> Signed-off-by: Javier González
>> ---
>> drivers/nvme/host/core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index bd1d5ff911c9..ae8ab0a1ef0d 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>>
>> 	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
>> 	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
>> -		strcpy(ctrl->subnqn, id->subnqn);
>> +		strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
>> 		return;
>> 	}
>
> This isn't a compare, but a copy. Except for that it looks ok to me.

True. Can you change the message when picking it up?
Re: [PATCH 3/3] nvme: fix eui_show() print format
On Fri, Nov 03, 2017 at 11:02:50AM +0100, Javier González wrote:
> Signed-off-by: Javier González
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index ae8ab0a1ef0d..f05c81774abf 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
> 				char *buf)
> {
> 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> -	return sprintf(buf, "%8phd\n", ns->eui);
> +	return sprintf(buf, "%8phD\n", ns->eui);
> }
> static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);

This looks correct. I wonder what the old code printed - does someone
have a device with an EUI-64 at hand to quickly cross check what we did
before?
Re: [PATCH 2/3] nvme: compare NQN string with right size
On Fri, Nov 03, 2017 at 11:02:49AM +0100, Javier González wrote:
> Compare subnqns using NVMF_NQN_SIZE as it is < 256
>
> Signed-off-by: Javier González
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index bd1d5ff911c9..ae8ab0a1ef0d 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>
> 	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
> 	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
> -		strcpy(ctrl->subnqn, id->subnqn);
> +		strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
> 		return;
> 	}

This isn't a compare, but a copy. Except for that it looks ok to me.
[GIT PULL] nvme updates for Linux 4.15
Hi Jens,

below are the currently queued nvme updates for Linux 4.15. There are a
few more things that could make it for this merge window, but I'd like
to get things into linux-next, especially for the unlikely case that
Linus decides to cut an -rc8.

Highlights:
 - support for SGLs in the PCIe driver (Chaitanya Kulkarni)
 - disable I/O schedulers for the admin queue (Israel Rukshin)
 - various Fibre Channel fixes and enhancements (James Smart)
 - various refactoring for better code sharing between transports
   (Sagi Grimberg and me)

as well as lots of little bits from various contributors.

The following changes since commit 9c9883744dda1cc38339a448dd8435140537027e:

  block: move __elv_next_request to blk-core.c (2017-10-03 08:43:04 -0600)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.15

for you to fetch changes up to a806c6c81e6c0d07c8a8b2643bad4a37a196d681:

  nvme: comment typo fixed in clearing AER (2017-11-03 10:02:20 +0300)

Chaitanya Kulkarni (1):
      nvme-pci: add SGL support

Christoph Hellwig (11):
      nvme: simplify compat_ioctl handling
      nvme: use ida_simple_{get,remove} for the controller instance
      nvme: use kref_get_unless_zero in nvme_find_get_ns
      nvme: simplify nvme_open
      nvme: switch controller refcounting to use struct device
      nvme: get rid of nvme_ctrl_list
      nvme: check for a live controller in nvme_dev_open
      nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl
      nvme: move controller deletion to common code
      nvme-rdma: remove nvme_rdma_remove_ctrl
      nvme: consolidate common code from ->reset_work

Israel Rukshin (3):
      nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set
      nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set
      nvme-loop: Add BLK_MQ_F_NO_SCHED flag to admin tag set

James Smart (18):
      nvmet: bump NVMET_NR_QUEUES to 128
      nvme-fc: add uevent for auto-connect
      nvme-fc: create fc class and transport device
      nvme-fc: move remote port get/put/free location
      nvme-fc: correct io termination handling
      nvme-fc: correct io timeout behavior
      nvme: add duplicate_connect option
      nvme: add helper to compare options to controller
      nvme-rdma: add support for duplicate_connect option
      nvme-fc: add support for duplicate_connect option
      nvme-fc: remove NVME_FC_MAX_SEGMENTS
      nvme-fc: avoid workqueue flush stalls
      nvme-fc: change ctlr state assignments during reset/reconnect
      nvme-fc: add a dev_loss_tmo field to the remoteport
      nvme-fc: check connectivity before initiating reconnects
      nvme: allow controller RESETTING to RECONNECTING transition
      nvme-fc: add dev_loss_tmo timeout and remoteport resume support
      nvmet: fix fatal_err_work deadlock

Keith Busch (1):
      nvme: Remove unused headers

Marc Olson (1):
      nvme: update timeout module parameter type

Max Gurtovoy (1):
      nvme-rdma: align nvme_rdma_device structure

Minwoo Im (2):
      nvme-pci: fix typos in comments
      nvme: comment typo fixed in clearing AER

Nitzan Carmi (1):
      nvme-rdma: Add debug message when reaches timeout

Randy Dunlap (1):
      nvme: use menu Kconfig interface

Roy Shterman (1):
      nvmet: Change max_nsid in subsystem due to ns_disable if needed

Sagi Grimberg (14):
      nvme-fabrics: request transport module
      block: introduce blk_mq_tagset_iter
      nvme: introduce nvme_reinit_tagset
      block: remove blk_mq_reinit_tagset
      nvme-rdma: pass tagset to directly nvme_rdma_free_tagset
      nvme-rdma: fix wrong logging message
      nvme-rdma: move assignment to declaration
      nvme-rdma: Check that reinit_request got a proper mr
      nvme-rdma: teardown admin/io queues once on error recovery
      nvme-rdma: Don't local invalidate if the queue is not live
      nvme-rdma: change queue flag semantics DELETING -> ALLOCATED
      nvme-rdma: stop controller reset if the controller is deleting
      nvme-rdma: reuse nvme_delete_ctrl when reconnect attempts expire
      nvme: flush reset_work before safely continuing with delete operation

 block/blk-mq-tag.c             |  11 +-
 drivers/nvme/Kconfig           |   4 +
 drivers/nvme/host/core.c       | 260 +++-
 drivers/nvme/host/fabrics.c    |  12 +-
 drivers/nvme/host/fabrics.h    |  14 +
 drivers/nvme/host/fc.c         | 656 ++---
 drivers/nvme/host/nvme.h       |  26 +-
 drivers/nvme/host/pci.c        | 224 +++---
 drivers/nvme/host/rdma.c       | 225 --
 drivers/nvme/target/core.c     |  13 +
 drivers/nvme/target/fc.c       |  16 +-
 drivers/nvme/target/loop.c     |  47 +--
 drivers/nvme/target/nvmet.h    |   2 +-
 include/linux/blk-mq.h         |   4 +-
 include/linux/nvme-fc-driver.h |  15 +-
 15 files changed, 1026 insertions(+), 503 deletions(-)
[PATCH 1/3] nvme: do not check for ns on rw path
On the rw path, the ns is assumed to be set. However, a check is still
done, inherited from the time the code resided at nvme_queue_rq().

Eliminate this check, which also eliminates a smatch complaint about not
doing proper NULL checks on ns.

Signed-off-by: Javier González
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5a14cc7f28ee..bd1d5ff911c9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -472,7 +472,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	 * unless this namespace is formated such that the metadata can be
 	 * stripped/generated by the controller with PRACT=1.
 	 */
-	if (ns && ns->ms &&
+	if (ns->ms &&
 	    (!ns->pi_type || ns->ms != sizeof(struct t10_pi_tuple)) &&
 	    !blk_integrity_rq(req) && !blk_rq_is_passthrough(req))
 		return BLK_STS_NOTSUPP;
-- 
2.7.4
[PATCH 2/3] nvme: compare NQN string with right size
Compare subnqns using NVMF_NQN_SIZE as it is < 256

Signed-off-by: Javier González
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bd1d5ff911c9..ae8ab0a1ef0d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1743,7 +1743,7 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 
 	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
 	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
-		strcpy(ctrl->subnqn, id->subnqn);
+		strncpy(ctrl->subnqn, id->subnqn, NVMF_NQN_SIZE);
 		return;
 	}
-- 
2.7.4
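The point of the change can be illustrated outside the kernel: strncpy() bounds the copy, so a source string that is not NUL-terminated within the bound cannot overrun the destination. This is a hedged userspace sketch; NQN_SIZE and copy_subnqn() are stand-ins, not the kernel's NVMF_NQN_SIZE path, and note that strncpy() itself does not NUL-terminate when the source fills the whole bound (the kernel code only takes this path after strnlen() has verified the string is shorter).

```c
#include <string.h>

/* Illustrative bound; the real NVMF_NQN_SIZE is defined in the
 * NVMe-over-Fabrics headers. */
#define NQN_SIZE 223

/* Bounded copy of a subsystem NQN. The caller is expected to have
 * checked strnlen(src, NQN_SIZE) < NQN_SIZE, so strncpy() will have
 * written a NUL; the explicit terminator below is defensive, for the
 * sketch only. */
static void copy_subnqn(char dst[NQN_SIZE + 1], const char *src)
{
	strncpy(dst, src, NQN_SIZE);
	dst[NQN_SIZE] = '\0';
}
```

With strcpy() the same call would read past the bound if the controller returned an unterminated 256-byte field, which is exactly what the patch guards against.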
[PATCH 3/3] nvme: fix eui_show() print format
Signed-off-by: Javier González
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ae8ab0a1ef0d..f05c81774abf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2108,7 +2108,7 @@ static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
 				char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%8phd\n", ns->eui);
+	return sprintf(buf, "%8phD\n", ns->eui);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
-- 
2.7.4
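For context, the kernel's "%8phD" printk extension prints 8 raw bytes as hex separated by dashes (lowercase "%8phd" is not a valid variant, hence the fix). The following is a hypothetical userspace approximation of what the corrected format produces for an EUI-64; format_eui64() is an illustrative name, not a kernel function.

```c
#include <stdio.h>

/* Format an 8-byte EUI-64 the way the kernel's "%8phD" specifier does:
 * two lowercase hex digits per byte, dash-separated. The output buffer
 * needs 8*2 hex chars + 7 dashes + NUL = 24 bytes. */
static void format_eui64(char out[24], const unsigned char eui[8])
{
	snprintf(out, 24, "%02x-%02x-%02x-%02x-%02x-%02x-%02x-%02x",
		 eui[0], eui[1], eui[2], eui[3],
		 eui[4], eui[5], eui[6], eui[7]);
}
```

So after the patch, sysfs shows something like "00-25-38-aa-bb-cc-dd-ee" for the namespace EUI-64.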
[PATCH 0/3] nvme: small fixes reported by smatch
Fix a number of small things reported by smatch on the nvme driver.

Javier González (3):
  nvme: do not check for ns on rw path
  nvme: compare NQN string with right size
  nvme: fix eui_show() print format

 drivers/nvme/host/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4