[PATCH v2] blkmq: Fix NULL pointer deref when all reserved tags in use
When allocating from the reserved tags pool, bt_get() is called with a NULL hctx. If all tags are in use, the hw queue is kicked to push out any pending IO, potentially freeing tags, and tag allocation is retried. The problem is that blk_mq_run_hw_queue() doesn't check for a NULL hctx. So we avoid it with a simple NULL hctx test. This issue was introduced by: b32232073e80: blk-mq: fix hang in bt_get() Tested by hammering mtip32xx with concurrent smartctl/hdparm. Signed-off-by: Sam Bradshaw Signed-off-by: Selvan Mani --- diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index d53a764..9d7dd64 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -280,7 +280,8 @@ static int bt_get(struct blk_mq_alloc_data *data, * pending IO submits before going to sleep waiting for * some to complete. */ - blk_mq_run_hw_queue(hctx, false); + if (hctx) + blk_mq_run_hw_queue(hctx, false); /* * Retry tag allocation after running the hardware queue, -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] blkmq: Fix NULL pointer deref when all reserved tags in use
Good catch! But why not put the hctx == NULL check in as a conditional in bt_get() before running the queue? I can't imagine other cases where calling blk_mq_run_hw_queue() with hctx == NULL would be a valid scenario. The change was meant to be broad in scope. A runtime NULL deref is a rather unfortunate way to determine that there are other invalid scenarios. But given that both approaches fix the immediate problem, I'd be happy to change the patch as you suggest. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] blkmq: Fix NULL pointer deref when all reserved tags in use
When allocating from the reserved tags pool, bt_get() is called with a NULL hctx. If all tags are in use, the hw queue is kicked to push out any pending IO, potentially freeing tags, and tag allocation is retried. The problem is that blk_mq_run_hw_queue() doesn't check for a NULL hctx. This patch fixes that bug. An alternative implementation might skip kicking the queue for reserved tags and go right to io_schedule() but we chose to keep it simple. Tested by hammering mtip32xx with concurrent smartctl/hdparm. Signed-off-by: Sam Bradshaw Signed-off-by: Selvan Mani --- block/blk-mq.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 59fa239..0471af6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -887,7 +887,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async) { - if (unlikely(test_bit(BLK_MQ_S_STOPPED, >state) || + if (unlikely(!hctx || test_bit(BLK_MQ_S_STOPPED, >state) || !blk_mq_hw_queue_mapped(hctx))) return; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] blkmq: Fix NULL pointer deref when all reserved tags in use
When allocating from the reserved tags pool, bt_get() is called with a NULL hctx. If all tags are in use, the hw queue is kicked to push out any pending IO, potentially freeing tags, and tag allocation is retried. The problem is that blk_mq_run_hw_queue() doesn't check for a NULL hctx. This patch fixes that bug. An alternative implementation might skip kicking the queue for reserved tags and go right to io_schedule() but we chose to keep it simple. Tested by hammering mtip32xx with concurrent smartctl/hdparm. Signed-off-by: Sam Bradshaw sbrads...@micron.com Signed-off-by: Selvan Mani sm...@micron.com --- block/blk-mq.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 59fa239..0471af6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -887,7 +887,7 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async) { - if (unlikely(test_bit(BLK_MQ_S_STOPPED, hctx-state) || + if (unlikely(!hctx || test_bit(BLK_MQ_S_STOPPED, hctx-state) || !blk_mq_hw_queue_mapped(hctx))) return; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] blkmq: Fix NULL pointer deref when all reserved tags in use
When allocating from the reserved tags pool, bt_get() is called with a NULL hctx. If all tags are in use, the hw queue is kicked to push out any pending IO, potentially freeing tags, and tag allocation is retried. The problem is that blk_mq_run_hw_queue() doesn't check for a NULL hctx. So we avoid it with a simple NULL hctx test. This issue was introduced by: b32232073e80: blk-mq: fix hang in bt_get() Tested by hammering mtip32xx with concurrent smartctl/hdparm. Signed-off-by: Sam Bradshaw sbrads...@micron.com Signed-off-by: Selvan Mani sm...@micron.com --- diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index d53a764..9d7dd64 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -280,7 +280,8 @@ static int bt_get(struct blk_mq_alloc_data *data, * pending IO submits before going to sleep waiting for * some to complete. */ - blk_mq_run_hw_queue(hctx, false); + if (hctx) + blk_mq_run_hw_queue(hctx, false); /* * Retry tag allocation after running the hardware queue, -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] blkmq: Fix NULL pointer deref when all reserved tags in use
Good catch! But why not put the hctx == NULL check in as a conditional in bt_get() before running the queue? I can't imagine other cases where calling blk_mq_run_hw_queue() with hctx == NULL would be a valid scenario. The change was meant to be broad in scope. A runtime NULL deref is a rather unfortunate way to determine that there are other invalid scenarios. But given that both approaches fix the immediate problem, I'd be happy to change the patch as you suggest. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: pass correct prot_buf pointer to integrity metadata processing function
The prot_buf pointer passed to the generate/verify functions is incorrect for the second and subsequent range, making it impossible to verify the guard tag. The patch correctly increments the prot_buf pointer by the tuple size for each pass. Signed-off-by: Sam Bradshaw --- diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 5cbd5d9..c7cbf60 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -226,13 +226,13 @@ static int bio_integrity_process(struct bio *bio, iter.disk_name = bio->bi_bdev->bd_disk->disk_name; iter.interval = bi->interval; iter.seed = bip_get_seed(bip); - iter.prot_buf = prot_buf; bio_for_each_segment(bv, bio, bviter) { void *kaddr = kmap_atomic(bv.bv_page); iter.data_buf = kaddr + bv.bv_offset; iter.data_size = bv.bv_len; + iter.prot_buf = prot_buf; ret = proc_fn(); if (ret) { @@ -241,6 +241,7 @@ static int bio_integrity_process(struct bio *bio, } kunmap_atomic(kaddr); + prot_buf += bi->tuple_size; } return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: pass correct prot_buf pointer to integrity metadata processing function
The prot_buf pointer passed to the generate/verify functions is incorrect for the second and subsequent range, making it impossible to verify the guard tag. The patch correctly increments the prot_buf pointer by the tuple size for each pass. Signed-off-by: Sam Bradshaw sbrads...@micron.com --- diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 5cbd5d9..c7cbf60 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -226,13 +226,13 @@ static int bio_integrity_process(struct bio *bio, iter.disk_name = bio-bi_bdev-bd_disk-disk_name; iter.interval = bi-interval; iter.seed = bip_get_seed(bip); - iter.prot_buf = prot_buf; bio_for_each_segment(bv, bio, bviter) { void *kaddr = kmap_atomic(bv.bv_page); iter.data_buf = kaddr + bv.bv_offset; iter.data_size = bv.bv_len; + iter.prot_buf = prot_buf; ret = proc_fn(iter); if (ret) { @@ -241,6 +241,7 @@ static int bio_integrity_process(struct bio *bio, } kunmap_atomic(kaddr); + prot_buf += bi-tuple_size; } return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
> -Original Message- > From: Martin K. Petersen [mailto:martin.peter...@oracle.com] > Sent: Wednesday, January 07, 2015 4:19 PM > To: Sam Bradshaw (sbradshaw) > Cc: Martin K. Petersen; ax...@kernel.dk; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] block: pass correct seed to integrity metadata > generation function > > >>>>> "Sam" == Sam Bradshaw (sbradshaw) writes: > > Sam> Yes. > > The seed is just a seed. We happen to set it to the (block layer) > sector number if nothing else is provided but it's essentially just an > incrementing counter starting at an arbitrary value chosen by the > caller. > > Since an I/O may get remapped many times as block devices are stacked > (DM, MD, stripe splits, partition offset shifts, etc.) the seed is not > expected to match the LBA on the storage device. That is almost never > the case. > > In SCSI we do a remapping pass before submitting a WRITE or upon > completion of a READ. You will have to do the same for NVMe. > > It was done this way to avoid remapping the ref tag several times. We > adjust the seed as the I/O gets sliced and diced and only map the PI > pages once to do a single traversal at the bottom of the stack. > > For next gen devices we simply pass the seed to the hardware and let it > handle the remapping. This saves us having to map the PI pages and pull > them through the cache. Got it, thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
> -Original Message- > From: Martin K. Petersen [mailto:martin.peter...@oracle.com] > Sent: Wednesday, January 07, 2015 3:43 PM > To: Sam Bradshaw (sbradshaw) > Cc: Martin K. Petersen; ax...@kernel.dk; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] block: pass correct seed to integrity metadata > generation function > > >>>>> "Sam" == Sam Bradshaw (sbradshaw) writes: > > Sam> Yes, you were CC: on the original. Could you take a peek at the > Sam> patch and give me feedback? https://lkml.org/lkml/2014/12/18/521 > > Is the target an NVMe device? Yes. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
> Sam> Any particular objection to this patch? It would be nice if the > Sam> kernel supported PI on block sizes other than just 512 bytes. > > I never saw the patch. Did you CC: me on it? Yes, you were CC: on the original. Could you take a peek at the patch and give me feedback? https://lkml.org/lkml/2014/12/18/521 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: pass correct seed to integrity metadata generation function
On 12/18/2014 04:11 PM, Sam Bradshaw wrote: The seed value passed to the blk integrity metadata generation function was wrong depending on the device block size (interval) and how many blocks comprised the bvec. This patch converts the seed to 'interval' units and increments it correctly for each iteration. Tested against a 4k+8 (w/ type1 PI) sector size. Any particular objection to this patch? It would be nice if the kernel supported PI on block sizes other than just 512 bytes. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] block: pass correct seed to integrity metadata generation function
On 12/18/2014 04:11 PM, Sam Bradshaw wrote: The seed value passed to the blk integrity metadata generation function was wrong depending on the device block size (interval) and how many blocks comprised the bvec. This patch converts the seed to 'interval' units and increments it correctly for each iteration. Tested against a 4k+8 (w/ type1 PI) sector size. Any particular objection to this patch? It would be nice if the kernel supported PI on block sizes other than just 512 bytes. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
Sam Any particular objection to this patch? It would be nice if the Sam kernel supported PI on block sizes other than just 512 bytes. I never saw the patch. Did you CC: me on it? Yes, you were CC: on the original. Could you take a peek at the patch and give me feedback? https://lkml.org/lkml/2014/12/18/521 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
-Original Message- From: Martin K. Petersen [mailto:martin.peter...@oracle.com] Sent: Wednesday, January 07, 2015 3:43 PM To: Sam Bradshaw (sbradshaw) Cc: Martin K. Petersen; ax...@kernel.dk; linux-kernel@vger.kernel.org Subject: Re: [PATCH] block: pass correct seed to integrity metadata generation function Sam == Sam Bradshaw (sbradshaw) sbrads...@micron.com writes: Sam Yes, you were CC: on the original. Could you take a peek at the Sam patch and give me feedback? https://lkml.org/lkml/2014/12/18/521 Is the target an NVMe device? Yes. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block: pass correct seed to integrity metadata generation function
-Original Message- From: Martin K. Petersen [mailto:martin.peter...@oracle.com] Sent: Wednesday, January 07, 2015 4:19 PM To: Sam Bradshaw (sbradshaw) Cc: Martin K. Petersen; ax...@kernel.dk; linux-kernel@vger.kernel.org Subject: Re: [PATCH] block: pass correct seed to integrity metadata generation function Sam == Sam Bradshaw (sbradshaw) sbrads...@micron.com writes: Sam Yes. The seed is just a seed. We happen to set it to the (block layer) sector number if nothing else is provided but it's essentially just an incrementing counter starting at an arbitrary value chosen by the caller. Since an I/O may get remapped many times as block devices are stacked (DM, MD, stripe splits, partition offset shifts, etc.) the seed is not expected to match the LBA on the storage device. That is almost never the case. In SCSI we do a remapping pass before submitting a WRITE or upon completion of a READ. You will have to do the same for NVMe. It was done this way to avoid remapping the ref tag several times. We adjust the seed as the I/O gets sliced and diced and only map the PI pages once to do a single traversal at the bottom of the stack. For next gen devices we simply pass the seed to the hardware and let it handle the remapping. This saves us having to map the PI pages and pull them through the cache. Got it, thanks! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block:mtip32xx: Add call to set bit for failing drive rebuild in mtip32xx.c for the function,mtip_hw_get_identify
> -Original Message- > From: Nicholas Krause [mailto:xerofo...@gmail.com] > Sent: Monday, January 05, 2015 10:45 PM > To: ax...@fb.com > Cc: Asai Thambi Samymuthu Pattrayasamy (asamymuthupa); Sam Bradshaw > (sbradshaw); h...@lst.de; Selvan Mani (smani) [CONT - Type 2]; > agord...@redhat.com; linux-kernel@vger.kernel.org > Subject: [PATCH] block:mtip32xx: Add call to set bit for failing drive > rebuild in mtip32xx.c for the function,mtip_hw_get_identify > > This adds a call to the set_bit function for setting the bit for driver > rebuild failures if the drive reset build fails based on checking the > last element in the buf array for holding the value of the hardware > registers.In additon we set this bit on the dd_flag element of the > pointer dd of the structure type,driver_data passed to this function. Thanks, Nicholas. Good stuff. Aside, if you're seeing rebuild failures, please contact Micron support to get your drive(s) replaced under warranty. -Sam -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] block:mtip32xx: Add call to set bit for failing drive rebuild in mtip32xx.c for the function,mtip_hw_get_identify
-Original Message- From: Nicholas Krause [mailto:xerofo...@gmail.com] Sent: Monday, January 05, 2015 10:45 PM To: ax...@fb.com Cc: Asai Thambi Samymuthu Pattrayasamy (asamymuthupa); Sam Bradshaw (sbradshaw); h...@lst.de; Selvan Mani (smani) [CONT - Type 2]; agord...@redhat.com; linux-kernel@vger.kernel.org Subject: [PATCH] block:mtip32xx: Add call to set bit for failing drive rebuild in mtip32xx.c for the function,mtip_hw_get_identify This adds a call to the set_bit function for setting the bit for driver rebuild failures if the drive reset build fails based on checking the last element in the buf array for holding the value of the hardware registers.In additon we set this bit on the dd_flag element of the pointer dd of the structure type,driver_data passed to this function. Thanks, Nicholas. Good stuff. Aside, if you're seeing rebuild failures, please contact Micron support to get your drive(s) replaced under warranty. -Sam -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: pass correct seed to integrity metadata generation function
The seed value passed to the blk integrity metadata generation function was wrong depending on the device block size (interval) and how many blocks comprised the bvec. This patch converts the seed to 'interval' units and increments it correctly for each iteration. Tested against a 4k+8 (w/ type1 PI) sector size. Signed-off-by: Sam Bradshaw Signed-off-by: Selvan Mani --- diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 5cbd5d9..7114040 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -219,20 +219,22 @@ static int bio_integrity_process(struct bio *bio, struct bvec_iter bviter; struct bio_vec bv; struct bio_integrity_payload *bip = bio_integrity(bio); - unsigned int ret = 0; + unsigned int len = 0, ret = 0; void *prot_buf = page_address(bip->bip_vec->bv_page) + bip->bip_vec->bv_offset; + sector_t seed, start = bio_integrity_intervals(bi, bip_get_seed(bip)); iter.disk_name = bio->bi_bdev->bd_disk->disk_name; iter.interval = bi->interval; - iter.seed = bip_get_seed(bip); - iter.prot_buf = prot_buf; + seed = start; bio_for_each_segment(bv, bio, bviter) { void *kaddr = kmap_atomic(bv.bv_page); iter.data_buf = kaddr + bv.bv_offset; iter.data_size = bv.bv_len; + iter.seed = seed; + iter.prot_buf = prot_buf; ret = proc_fn(); if (ret) { @@ -241,6 +243,9 @@ static int bio_integrity_process(struct bio *bio, } kunmap_atomic(kaddr); + len += bv.bv_len; + seed = start + (len / bi->interval); + prot_buf += bi->tuple_size; } return ret; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] block: pass correct seed to integrity metadata generation function
The seed value passed to the blk integrity metadata generation function was wrong depending on the device block size (interval) and how many blocks comprised the bvec. This patch converts the seed to 'interval' units and increments it correctly for each iteration. Tested against a 4k+8 (w/ type1 PI) sector size. Signed-off-by: Sam Bradshaw sbrads...@micron.com Signed-off-by: Selvan Mani sm...@micron.com --- diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 5cbd5d9..7114040 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -219,20 +219,22 @@ static int bio_integrity_process(struct bio *bio, struct bvec_iter bviter; struct bio_vec bv; struct bio_integrity_payload *bip = bio_integrity(bio); - unsigned int ret = 0; + unsigned int len = 0, ret = 0; void *prot_buf = page_address(bip-bip_vec-bv_page) + bip-bip_vec-bv_offset; + sector_t seed, start = bio_integrity_intervals(bi, bip_get_seed(bip)); iter.disk_name = bio-bi_bdev-bd_disk-disk_name; iter.interval = bi-interval; - iter.seed = bip_get_seed(bip); - iter.prot_buf = prot_buf; + seed = start; bio_for_each_segment(bv, bio, bviter) { void *kaddr = kmap_atomic(bv.bv_page); iter.data_buf = kaddr + bv.bv_offset; iter.data_size = bv.bv_len; + iter.seed = seed; + iter.prot_buf = prot_buf; ret = proc_fn(iter); if (ret) { @@ -241,6 +243,9 @@ static int bio_integrity_process(struct bio *bio, } kunmap_atomic(kaddr); + len += bv.bv_len; + seed = start + (len / bi-interval); + prot_buf += bi-tuple_size; } return ret; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: minor performance enhancements
This patch adds the following: 1) Compiler hinting in the fast path. 2) A prefetch of port->flags to eliminate moderate cpu stalling later in mtip_hw_submit_io(). 3) Eliminate a redundant rq_data_dir(). 4) Reorder members of driver_data to eliminate false cacheline sharing between irq_workers_active and unal_qdepth. With some workload and topology configurations, I'm seeing ~1.5% throughput improvement in small block random read benchmarks as well as improved latency std. dev. Signed-off-by: Sam Bradshaw --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 74abd49..fcbac54 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -2380,6 +2380,8 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, /* Map the scatter list for DMA access */ nents = dma_map_sg(>pdev->dev, command->sg, nents, dma_dir); + prefetch(>flags); + command->scatter_ents = nents; /* @@ -2392,7 +2394,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, fis = command->command; fis->type= 0x27; fis->opts= 1 << 7; - if (rq_data_dir(rq) == READ) + if (dma_dir == DMA_FROM_DEVICE) fis->command = ATA_CMD_FPDMA_READ; else fis->command = ATA_CMD_FPDMA_WRITE; @@ -2412,7 +2414,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, fis->res3= 0; fill_command_sg(dd, command, nents); - if (command->unaligned) + if (unlikely(command->unaligned)) fis->device |= 1 << 7; /* Populate the command header */ @@ -2433,7 +2435,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, * To prevent this command from being issued * if an internal command is in progress or error handling is active. */ - if (port->flags & MTIP_PF_PAUSE_IO) { + if (unlikely(port->flags & MTIP_PF_PAUSE_IO)) { set_bit(rq->tag, port->cmds_to_issue); set_bit(MTIP_PF_ISSUE_CMDS_BIT, >flags); return; @@ -3754,7 +3756,7 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx, struct driver_data *dd = hctx->queue->queuedata; struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq); - if (!dd->unal_qdepth || rq_data_dir(rq) == READ) + if (rq_data_dir(rq) == READ || !dd->unal_qdepth) return false; /* @@ -3776,11 +3778,11 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) { int ret; - if (mtip_check_unal_depth(hctx, rq)) + if (unlikely(mtip_check_unal_depth(hctx, rq))) return BLK_MQ_RQ_QUEUE_BUSY; ret = mtip_submit_request(hctx, rq); - if (!ret) + if (likely(!ret)) return BLK_MQ_RQ_QUEUE_OK; rq->errors = ret; diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h index 4b9b554..ba1b31e 100644 --- a/drivers/block/mtip32xx/mtip32xx.h +++ b/drivers/block/mtip32xx/mtip32xx.h @@ -493,19 +493,19 @@ struct driver_data { struct workqueue_struct *isr_workq; - struct mtip_work work[MTIP_MAX_SLOT_GROUPS]; - atomic_t irq_workers_active; + struct mtip_work work[MTIP_MAX_SLOT_GROUPS]; + int isr_binding; struct block_device *bdev; - int unal_qdepth; /* qdepth of unaligned IO queue */ - struct list_head online_list; /* linkage for online list */ struct list_head remove_list; /* linkage for removing list */ + + int unal_qdepth; /* qdepth of unaligned IO queue */ }; #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: minor performance enhancements
This patch adds the following: 1) Compiler hinting in the fast path. 2) A prefetch of port-flags to eliminate moderate cpu stalling later in mtip_hw_submit_io(). 3) Eliminate a redundant rq_data_dir(). 4) Reorder members of driver_data to eliminate false cacheline sharing between irq_workers_active and unal_qdepth. With some workload and topology configurations, I'm seeing ~1.5% throughput improvement in small block random read benchmarks as well as improved latency std. dev. Signed-off-by: Sam Bradshaw sbrads...@micron.com --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 74abd49..fcbac54 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -2380,6 +2380,8 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, /* Map the scatter list for DMA access */ nents = dma_map_sg(dd-pdev-dev, command-sg, nents, dma_dir); + prefetch(port-flags); + command-scatter_ents = nents; /* @@ -2392,7 +2394,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, fis = command-command; fis-type= 0x27; fis-opts= 1 7; - if (rq_data_dir(rq) == READ) + if (dma_dir == DMA_FROM_DEVICE) fis-command = ATA_CMD_FPDMA_READ; else fis-command = ATA_CMD_FPDMA_WRITE; @@ -2412,7 +2414,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, fis-res3= 0; fill_command_sg(dd, command, nents); - if (command-unaligned) + if (unlikely(command-unaligned)) fis-device |= 1 7; /* Populate the command header */ @@ -2433,7 +2435,7 @@ static void mtip_hw_submit_io(struct driver_data *dd, struct request *rq, * To prevent this command from being issued * if an internal command is in progress or error handling is active. */ - if (port-flags MTIP_PF_PAUSE_IO) { + if (unlikely(port-flags MTIP_PF_PAUSE_IO)) { set_bit(rq-tag, port-cmds_to_issue); set_bit(MTIP_PF_ISSUE_CMDS_BIT, port-flags); return; @@ -3754,7 +3756,7 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx, struct driver_data *dd = hctx-queue-queuedata; struct mtip_cmd *cmd = blk_mq_rq_to_pdu(rq); - if (!dd-unal_qdepth || rq_data_dir(rq) == READ) + if (rq_data_dir(rq) == READ || !dd-unal_qdepth) return false; /* @@ -3776,11 +3778,11 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) { int ret; - if (mtip_check_unal_depth(hctx, rq)) + if (unlikely(mtip_check_unal_depth(hctx, rq))) return BLK_MQ_RQ_QUEUE_BUSY; ret = mtip_submit_request(hctx, rq); - if (!ret) + if (likely(!ret)) return BLK_MQ_RQ_QUEUE_OK; rq-errors = ret; diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h index 4b9b554..ba1b31e 100644 --- a/drivers/block/mtip32xx/mtip32xx.h +++ b/drivers/block/mtip32xx/mtip32xx.h @@ -493,19 +493,19 @@ struct driver_data { struct workqueue_struct *isr_workq; - struct mtip_work work[MTIP_MAX_SLOT_GROUPS]; - atomic_t irq_workers_active; + struct mtip_work work[MTIP_MAX_SLOT_GROUPS]; + int isr_binding; struct block_device *bdev; - int unal_qdepth; /* qdepth of unaligned IO queue */ - struct list_head online_list; /* linkage for online list */ struct list_head remove_list; /* linkage for removing list */ + + int unal_qdepth; /* qdepth of unaligned IO queue */ }; #endif -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: switch to vmalloc() to mitigate high order allocation failures
The mtip_port kmalloc() allocation is relatively high order, in particular since the recent doubling of the size of the scatterlist container. The allocation has been shown to fail during SRSI under fragmented or low memory conditions. Switching to vmalloc() mitigates the problem. Converted both mtip_port and driver_data allocations for consistency. Signed-off-by: Sam Bradshaw diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 624e9d9..f55e3e5 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3116,7 +3116,7 @@ static int mtip_free_orphan(struct driver_data *dd) dd->queue = NULL; } } - kfree(dd); + vfree(dd); return 0; } @@ -3495,8 +3495,7 @@ static int mtip_hw_init(struct driver_data *dd) hba_setup(dd); - dd->port = kzalloc_node(sizeof(struct mtip_port), GFP_KERNEL, - dd->numa_node); + dd->port = vzalloc_node(sizeof(struct mtip_port), dd->numa_node); if (!dd->port) { dev_err(>pdev->dev, "Memory allocation: port structure\n"); @@ -3680,7 +3679,7 @@ out2: out1: /* Free the memory allocated for the for structure. */ - kfree(dd->port); + vfree(dd->port); return rv; } @@ -3724,7 +3723,7 @@ static int mtip_hw_exit(struct driver_data *dd) mtip_dma_free(dd); /* Free the memory allocated for the for structure. */ - kfree(dd->port); + vfree(dd->port); dd->port = NULL; return 0; @@ -4512,7 +4511,7 @@ static int mtip_pci_probe(struct pci_dev *pdev, my_node, pcibus_to_node(pdev->bus), dev_to_node(>dev), cpu_to_node(smp_processor_id()), smp_processor_id()); - dd = kzalloc_node(sizeof(struct driver_data), GFP_KERNEL, my_node); + dd = vzalloc_node(sizeof(struct driver_data), my_node); if (dd == NULL) { dev_err(>dev, "Unable to allocate memory for driver data\n"); @@ -4670,7 +4669,7 @@ setmask_err: pcim_iounmap_regions(pdev, 1 << MTIP_ABAR); iomap_err: - kfree(dd); + vfree(dd); pci_set_drvdata(pdev, NULL); return rv; done: @@ -4731,7 +4730,7 @@ static void mtip_pci_remove(struct pci_dev *pdev) spin_unlock_irqrestore(_lock, flags); if (!dd->sr) - kfree(dd); + vfree(dd); else set_bit(MTIP_DDF_REMOVE_DONE_BIT, >dd_flag); -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] mtip32xx: mtip_async_complete() bug fixes
On 03/13/2014 02:02 PM, Jens Axboe wrote: > On 03/13/2014 03:09 PM, Sam Bradshaw wrote: >> This patch fixes 2 issues in the fast completion path: >> 1) Possible double completions / double dma_unmap_sg() calls due to lack >> of atomicity in the check and subsequent dereference of the upper layer >> callback function. Fixed with cmpxchg before unmap and callback. >> 2) Regression in unaligned IO constraining workaround for p420m devices. >> Fixed by checking if IO is unaligned and using proper semaphore if so. > > Sam, what is this patch against? It doesn't apply to -rc6 and it doesn't > apply to for-3.15/drivers (the latter would be preferred/required). It > also seems to have line break issues. > It's against for-3.15/drivers. Sorry about the breaks; if I remove them does it apply cleanly? diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index b2012b7..624e9d9 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = >commands[tag]; + cmd = >commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(>dd->pdev->dev, "Command tag %d failed due to TFE\n", tag); } - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>pdev->dev, - command->sg, - command->scatter_ents, - command->direction); + /* Clear the active flag */ + atomic_set(>commands[tag].active, 0); /* Upper layer callback */ - if (likely(command->async_callback)) - command->async_callback(command->async_data, cb_status); + func = cmd->async_callback; + if (likely(func && cmpxchg(>async_callback, func, 0) == func)) { - command->async_callback = NULL; - command->comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(>pdev->dev, + cmd->sg, + cmd->scatter_ents, + cmd->direction); - /* Clear the allocated and active bits for the command */ - atomic_set(>commands[tag].active, 0); - release_slot(port, tag); + func(cmd->async_data, cb_status); + unaligned = cmd->unaligned; - up(>cmd_slot); + /* Clear the allocated bit for the command */ + release_slot(port, tag); + + if (unlikely(unaligned)) + up(>cmd_slot_unal); + else + up(>cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag >> 5; bit = tag & 0x1F; - command = >commands[tag]; - fis = (struct host_to_dev_fis *) command->command; + cmd = >commands[tag]; + fis = (struct host_to_dev_fis *) cmd->command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 << bit, port->completed[group]); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>dd->pdev->dev, - command->sg, - command->scatter_ents, - command->direction); + /* Clear the active flag for the command */ + atomic_set(>commands[tag].active, 0); - /* Call the async completion c
[PATCH] mtip32xx: mtip_async_complete() bug fixes
This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Bumped version to indicate presence of these fixes. Signed-off-by: Sam Bradshaw diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index b2012b7..624e9d9 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = >commands[tag]; + cmd = >commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(>dd->pdev->dev, "Command tag %d failed due to TFE\n", tag); } - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>pdev->dev, - command->sg, - command->scatter_ents, - command->direction); + /* Clear the active flag */ + atomic_set(>commands[tag].active, 0); /* Upper layer callback */ - if (likely(command->async_callback)) - command->async_callback(command->async_data, cb_status); + func = cmd->async_callback; + if (likely(func && cmpxchg(>async_callback, func, 0) == func)) { - command->async_callback = NULL; - command->comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(>pdev->dev, + cmd->sg, + cmd->scatter_ents, + cmd->direction); - /* Clear the allocated and active bits for the command */ - atomic_set(>commands[tag].active, 0); - release_slot(port, tag); + func(cmd->async_data, cb_status); + unaligned = cmd->unaligned; - up(>cmd_slot); + /* Clear the allocated bit for the command */ + release_slot(port, tag); + + if (unlikely(unaligned)) + up(>cmd_slot_unal); + else + up(>cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag >> 5; bit = tag & 0x1F; - command = >commands[tag]; - fis = (struct host_to_dev_fis *) command->command; + cmd = >commands[tag]; + fis = (struct host_to_dev_fis *) cmd->command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 << bit, port->completed[group]); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>dd->pdev->dev, - command->sg, - command->scatter_ents, - command->direction); + /* Clear the active flag for the command */ + atomic_set(>commands[tag].active, 0); - /* Call the async completion callback. */ - if (likely(command->async_callback)) - command->async_callback(command->async_data, --EIO); - command->async_callback = NULL; - command->comp_func = NULL; + func = cmd->async_call
[PATCH] mtip32xx: mtip_async_complete() bug fixes
This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Bumped version to indicate presence of these fixes. Signed-off-by: Sam Bradshaw sbrads...@micron.com diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index b2012b7..624e9d9 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = port-commands[tag]; + cmd = port-commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(port-dd-pdev-dev, Command tag %d failed due to TFE\n, tag); } - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + /* Clear the active flag */ + atomic_set(port-commands[tag].active, 0); /* Upper layer callback */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, cb_status); + func = cmd-async_callback; + if (likely(func cmpxchg(cmd-async_callback, func, 0) == func)) { - command-async_callback = NULL; - command-comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(dd-pdev-dev, + cmd-sg, + cmd-scatter_ents, + cmd-direction); - /* Clear the allocated and active bits for the command */ - atomic_set(port-commands[tag].active, 0); - release_slot(port, tag); + func(cmd-async_data, cb_status); + unaligned = cmd-unaligned; - up(port-cmd_slot); + /* Clear the allocated bit for the command */ + release_slot(port, tag); + + if (unlikely(unaligned)) + up(port-cmd_slot_unal); + else + up(port-cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag 5; bit = tag 0x1F; - command = port-commands[tag]; - fis = (struct host_to_dev_fis *) command-command; + cmd = port-commands[tag]; + fis = (struct host_to_dev_fis *) cmd-command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 bit, port-completed[group]); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(port-dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + /* Clear the active flag for the command */ + atomic_set(port-commands[tag].active, 0); - /* Call the async completion callback. */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, --EIO); - command-async_callback = NULL; - command-comp_func = NULL; + func = cmd-async_callback; + if (func + cmpxchg(cmd-async_callback, func, 0) == func
Re: [PATCH] mtip32xx: mtip_async_complete() bug fixes
On 03/13/2014 02:02 PM, Jens Axboe wrote: On 03/13/2014 03:09 PM, Sam Bradshaw wrote: This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Sam, what is this patch against? It doesn't apply to -rc6 and it doesn't apply to for-3.15/drivers (the latter would be preferred/required). It also seems to have line break issues. It's against for-3.15/drivers. Sorry about the breaks; if I remove them does it apply cleanly? diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index b2012b7..624e9d9 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = port-commands[tag]; + cmd = port-commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(port-dd-pdev-dev, Command tag %d failed due to TFE\n, tag); } - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + /* Clear the active flag */ + atomic_set(port-commands[tag].active, 0); /* Upper layer callback */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, cb_status); + func = cmd-async_callback; + if (likely(func cmpxchg(cmd-async_callback, func, 0) == func)) { - command-async_callback = NULL; - command-comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(dd-pdev-dev, + cmd-sg, + cmd-scatter_ents, + cmd-direction); - /* Clear the allocated and active bits for the command */ - atomic_set(port-commands[tag].active, 0); - release_slot(port, tag); + func(cmd-async_data, cb_status); + unaligned = cmd-unaligned; - up(port-cmd_slot); + /* Clear the allocated bit for the command */ + release_slot(port, tag); + + if (unlikely(unaligned)) + up(port-cmd_slot_unal); + else + up(port-cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag 5; bit = tag 0x1F; - command = port-commands[tag]; - fis = (struct host_to_dev_fis *) command-command; + cmd = port-commands[tag]; + fis = (struct host_to_dev_fis *) cmd-command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 bit, port-completed[group]); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(port-dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + /* Clear the active flag for the command */ + atomic_set(port-commands[tag].active, 0); - /* Call the async completion callback. */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, --EIO
[PATCH] mtip32xx: switch to vmalloc() to mitigate high order allocation failures
The mtip_port kmalloc() allocation is relatively high order, in particular since the recent doubling of the size of the scatterlist container. The allocation has been shown to fail during SRSI under fragmented or low memory conditions. Switching to vmalloc() mitigates the problem. Converted both mtip_port and driver_data allocations for consistency. Signed-off-by: Sam Bradshaw sbrads...@micron.com diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 624e9d9..f55e3e5 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3116,7 +3116,7 @@ static int mtip_free_orphan(struct driver_data *dd) dd-queue = NULL; } } - kfree(dd); + vfree(dd); return 0; } @@ -3495,8 +3495,7 @@ static int mtip_hw_init(struct driver_data *dd) hba_setup(dd); - dd-port = kzalloc_node(sizeof(struct mtip_port), GFP_KERNEL, - dd-numa_node); + dd-port = vzalloc_node(sizeof(struct mtip_port), dd-numa_node); if (!dd-port) { dev_err(dd-pdev-dev, Memory allocation: port structure\n); @@ -3680,7 +3679,7 @@ out2: out1: /* Free the memory allocated for the for structure. */ - kfree(dd-port); + vfree(dd-port); return rv; } @@ -3724,7 +3723,7 @@ static int mtip_hw_exit(struct driver_data *dd) mtip_dma_free(dd); /* Free the memory allocated for the for structure. */ - kfree(dd-port); + vfree(dd-port); dd-port = NULL; return 0; @@ -4512,7 +4511,7 @@ static int mtip_pci_probe(struct pci_dev *pdev, my_node, pcibus_to_node(pdev-bus), dev_to_node(pdev-dev), cpu_to_node(smp_processor_id()), smp_processor_id()); - dd = kzalloc_node(sizeof(struct driver_data), GFP_KERNEL, my_node); + dd = vzalloc_node(sizeof(struct driver_data), my_node); if (dd == NULL) { dev_err(pdev-dev, Unable to allocate memory for driver data\n); @@ -4670,7 +4669,7 @@ setmask_err: pcim_iounmap_regions(pdev, 1 MTIP_ABAR); iomap_err: - kfree(dd); + vfree(dd); pci_set_drvdata(pdev, NULL); return rv; done: @@ -4731,7 +4730,7 @@ static void mtip_pci_remove(struct pci_dev *pdev) spin_unlock_irqrestore(dev_lock, flags); if (!dd-sr) - kfree(dd); + vfree(dd); else set_bit(MTIP_DDF_REMOVE_DONE_BIT, dd-dd_flag); -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] mtip32xx: Unmap the DMA segments before completing the IO request
On 03/12/2014 09:05 AM, Felipe Franciosi wrote: If the buffers are unmapped after completing a request, then stale data might be in the request. Good find, Felipe, thank you. I would prefer something along the lines of this patch to make sure to avoid double completions / dma_unmap_sg() calls during surprise removal and/or timeout conditions. Jens: note that this patch also fixes a regression in the unaligned workaround implementation that was introduced by the SRSI patch. Signed-off-by: Sam Bradshaw diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 5160269..390ac6f 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = >commands[tag]; + cmd = >commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(>dd->pdev->dev, "Command tag %d failed due to TFE\n", tag); } + /* Clear the active flag */ + atomic_set(>commands[tag].active, 0); + /* Upper layer callback */ - if (likely(command->async_callback)) - command->async_callback(command->async_data, cb_status); + func = cmd->async_callback; + if (likely(func && cmpxchg(>async_callback, func, 0) == func)) { - command->async_callback = NULL; - command->comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(>pdev->dev, + cmd->sg, + cmd->scatter_ents, + cmd->direction); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>pdev->dev, - command->sg, - command->scatter_ents, - command->direction); + func(cmd->async_data, cb_status); + unaligned = cmd->unaligned; - /* Clear the allocated and active bits for the command */ - atomic_set(>commands[tag].active, 0); - release_slot(port, tag); + /* Clear the allocated bit for the command */ + release_slot(port, tag); - up(>cmd_slot); + if (unlikely(unaligned)) + up(>cmd_slot_unal); + else + up(>cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag >> 5; bit = tag & 0x1F; - command = >commands[tag]; - fis = (struct host_to_dev_fis *) command->command; + cmd = >commands[tag]; + fis = (struct host_to_dev_fis *) cmd->command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 << bit, port->completed[group]); - /* Call the async completion callback. */ - if (likely(command->async_callback)) - command->async_callback(command->async_data, --EIO); - command->async_callback = NULL; - command->comp_func = NULL; + /* Clear the active flag for the command */ + atomic_set(>commands[tag].active, 0); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(>dd->pdev->dev, - command->sg, - command->scatter_ents, - command-&
Re: [PATCH 2/2] mtip32xx: Unmap the DMA segments before completing the IO request
On 03/12/2014 09:05 AM, Felipe Franciosi wrote: If the buffers are unmapped after completing a request, then stale data might be in the request. Good find, Felipe, thank you. I would prefer something along the lines of this patch to make sure to avoid double completions / dma_unmap_sg() calls during surprise removal and/or timeout conditions. Jens: note that this patch also fixes a regression in the unaligned workaround implementation that was introduced by the SRSI patch. Signed-off-by: Sam Bradshaw sbrads...@micron.com diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 5160269..390ac6f 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -252,38 +252,45 @@ static void mtip_async_complete(struct mtip_port *port, void *data, int status) { - struct mtip_cmd *command; + struct mtip_cmd *cmd; struct driver_data *dd = data; - int cb_status = status ? -EIO : 0; + int unaligned, cb_status = status ? -EIO : 0; + void (*func)(void *, int); if (unlikely(!dd) || unlikely(!port)) return; - command = port-commands[tag]; + cmd = port-commands[tag]; if (unlikely(status == PORT_IRQ_TF_ERR)) { dev_warn(port-dd-pdev-dev, Command tag %d failed due to TFE\n, tag); } + /* Clear the active flag */ + atomic_set(port-commands[tag].active, 0); + /* Upper layer callback */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, cb_status); + func = cmd-async_callback; + if (likely(func cmpxchg(cmd-async_callback, func, 0) == func)) { - command-async_callback = NULL; - command-comp_func = NULL; + /* Unmap the DMA scatter list entries */ + dma_unmap_sg(dd-pdev-dev, + cmd-sg, + cmd-scatter_ents, + cmd-direction); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + func(cmd-async_data, cb_status); + unaligned = cmd-unaligned; - /* Clear the allocated and active bits for the command */ - atomic_set(port-commands[tag].active, 0); - release_slot(port, tag); + /* Clear the allocated bit for the command */ + release_slot(port, tag); - up(port-cmd_slot); + if (unlikely(unaligned)) + up(port-cmd_slot_unal); + else + up(port-cmd_slot); + } } /* @@ -660,11 +667,12 @@ static void mtip_timeout_function(unsigned long int data) { struct mtip_port *port = (struct mtip_port *) data; struct host_to_dev_fis *fis; - struct mtip_cmd *command; - int tag, cmdto_cnt = 0; + struct mtip_cmd *cmd; + int unaligned, tag, cmdto_cnt = 0; unsigned int bit, group; unsigned int num_command_slots; unsigned long to, tagaccum[SLOTBITS_IN_LONGS]; + void (*func)(void *, int); if (unlikely(!port)) return; @@ -694,8 +702,8 @@ static void mtip_timeout_function(unsigned long int data) group = tag 5; bit = tag 0x1F; - command = port-commands[tag]; - fis = (struct host_to_dev_fis *) command-command; + cmd = port-commands[tag]; + fis = (struct host_to_dev_fis *) cmd-command; set_bit(tag, tagaccum); cmdto_cnt++; @@ -709,27 +717,30 @@ static void mtip_timeout_function(unsigned long int data) */ writel(1 bit, port-completed[group]); - /* Call the async completion callback. */ - if (likely(command-async_callback)) - command-async_callback(command-async_data, --EIO); - command-async_callback = NULL; - command-comp_func = NULL; + /* Clear the active flag for the command */ + atomic_set(port-commands[tag].active, 0); - /* Unmap the DMA scatter list entries */ - dma_unmap_sg(port-dd-pdev-dev, - command-sg, - command-scatter_ents, - command-direction); + func = cmd-async_callback; + if (func + cmpxchg(cmd-async_callback, func, 0) == func
RE: mtip32xx blk-mq status?
> On 2014-02-26 11:29, Christoph Hellwig wrote: > > Hi all, > > > > with blk-mq stabilizing in mainline and Jens using mtip32xx as tje > major > > example drivers in the past is there any progress on getting the > > conversion finished and merged? > > I'll pick up the pieces as soon as I get back and can test again. The > basic conversion is easy enough, I'll just dust that off. On top of > that > I started splitting the issue groups up, since it actually gets you a > good step further than just the basic conversion. If you have a public branch for the latter, send it our way. We can help with the SRSI testing. > The only problem area I ran into was the special case of the unaligned > writes. Yup, it's a painful limitation. If you need any p420m hardware, let me know. -Sam -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: mtip32xx blk-mq status?
On 2014-02-26 11:29, Christoph Hellwig wrote: Hi all, with blk-mq stabilizing in mainline and Jens using mtip32xx as tje major example drivers in the past is there any progress on getting the conversion finished and merged? I'll pick up the pieces as soon as I get back and can test again. The basic conversion is easy enough, I'll just dust that off. On top of that I started splitting the issue groups up, since it actually gets you a good step further than just the basic conversion. If you have a public branch for the latter, send it our way. We can help with the SRSI testing. The only problem area I ran into was the special case of the unaligned writes. Yup, it's a painful limitation. If you need any p420m hardware, let me know. -Sam -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: Make SGL container per-command to eliminate high order dma allocation
The mtip32xx driver makes a high order dma memory allocation to store a command index table, some dedicated buffers, and a command header & SGL blob. This allocation can fail with a surprise insert under low & fragmented memory conditions. This patch breaks these regions up into separate low order allocations and increases the maximum number of segments a single command SGL can have. We wanted to allow at least 256 segments for 1 MB direct IO. Since the command header occupies the first 0x80 bytes of the SGL blob, that meant we needed two 4k pages to contain the header and SGL. The two pages allow up to 504 SGL segments. Signed-off-by: Sam Bradshaw Signed-off-by: Asai Thambi S P --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 050c712..ae46bfd 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -41,10 +41,31 @@ #include "mtip32xx.h" #define HW_CMD_SLOT_SZ (MTIP_MAX_COMMAND_SLOTS * 32) -#define HW_CMD_TBL_SZ (AHCI_CMD_TBL_HDR_SZ + (MTIP_MAX_SG * 16)) -#define HW_CMD_TBL_AR_SZ (HW_CMD_TBL_SZ * MTIP_MAX_COMMAND_SLOTS) -#define HW_PORT_PRIV_DMA_SZ \ - (HW_CMD_SLOT_SZ + HW_CMD_TBL_AR_SZ + AHCI_RX_FIS_SZ) + +/* DMA region containing RX Fis, Identify, RLE10, and SMART buffers */ +#define AHCI_RX_FIS_SZ 0x100 +#define AHCI_RX_FIS_OFFSET 0x0 +#define AHCI_IDFY_SZATA_SECT_SIZE +#define AHCI_IDFY_OFFSET0x400 +#define AHCI_SECTBUF_SZ ATA_SECT_SIZE +#define AHCI_SECTBUF_OFFSET 0x800 +#define AHCI_SMARTBUF_SZATA_SECT_SIZE +#define AHCI_SMARTBUF_OFFSET0xC00 +/* 0x100 + 0x200 + 0x200 + 0x200 is smaller than 4k but we pad it out */ +#define BLOCK_DMA_ALLOC_SZ 4096 + +/* DMA region containing command table (should be 8192 bytes) */ +#define AHCI_CMD_SLOT_SZsizeof(struct mtip_cmd_hdr) +#define AHCI_CMD_TBL_SZ (MTIP_MAX_COMMAND_SLOTS * AHCI_CMD_SLOT_SZ) +#define AHCI_CMD_TBL_OFFSET 0x0 + +/* DMA region per command (contains header and SGL) */ +#define AHCI_CMD_TBL_HDR_SZ 0x80 +#define AHCI_CMD_TBL_HDR_OFFSET 0x0 +#define AHCI_CMD_TBL_SGL_SZ (MTIP_MAX_SG * sizeof(struct mtip_cmd_sg)) +#define AHCI_CMD_TBL_SGL_OFFSET AHCI_CMD_TBL_HDR_SZ +#define CMD_DMA_ALLOC_SZ(AHCI_CMD_TBL_SGL_SZ + AHCI_CMD_TBL_HDR_SZ) + #define HOST_CAP_NZDMA (1 << 19) #define HOST_HSORG 0xFC @@ -3313,6 +3334,118 @@ st_out: } /* + * DMA region teardown + * + * @dd Pointer to driver_data structure + * + * return value + * None + */ +static void mtip_dma_free(struct driver_data *dd) +{ + int i; + struct mtip_port *port = dd->port; + + if (port->block1) + dmam_free_coherent(>pdev->dev, BLOCK_DMA_ALLOC_SZ, + port->block1, port->block1_dma); + + if (port->command_list) { + dmam_free_coherent(>pdev->dev, AHCI_CMD_TBL_SZ, + port->command_list, port->command_list_dma); + } + + for (i = 0; i < MTIP_MAX_COMMAND_SLOTS; i++) { + if (port->commands[i].command) + dmam_free_coherent(>pdev->dev, CMD_DMA_ALLOC_SZ, + port->commands[i].command, + port->commands[i].command_dma); + } +} + +/* + * DMA region setup + * + * @dd Pointer to driver_data structure + * + * return value + * -ENOMEM Not enough free DMA region space to initialize driver + */ +static int mtip_dma_alloc(struct driver_data *dd) +{ + struct mtip_port *port = dd->port; + int i, rv = 0; + u32 host_cap_64 = readl(dd->mmio + HOST_CAP) & HOST_CAP_64; + + /* Allocate dma memory for RX Fis, Identify, and Sector Bufffer */ + port->block1 = + dmam_alloc_coherent(>pdev->dev, BLOCK_DMA_ALLOC_SZ, + >block1_dma, GFP_KERNEL); + if (!port->block1) + return -ENOMEM; + memset(port->block1, 0, BLOCK_DMA_ALLOC_SZ); + + /* Allocate dma memory for command list */ + port->command_list = + dmam_alloc_coherent(>pdev->dev, AHCI_CMD_TBL_SZ, + >command_list_dma, GFP_KERNEL); + if (!port->command_list) { + dmam_free_coherent(>pdev->dev, BLOCK_DMA_ALLOC_SZ, + port->block1, port->block1_dma); + port->block1 = NULL; + port->block1_dma = 0; + return -ENOMEM; + } + memset(port->command_list, 0, AHCI_CMD_TBL_SZ); + + /* Setup all pointers into first DMA region */ + port->rxfis = port->block1 + AHCI_RX_FIS_OFFSET; + port->rxfis_dma = port->block1_dma + AHCI_RX_FIS_OFFSET; + port-
[PATCH] mtip32xx: Make SGL container per-command to eliminate high order dma allocation
The mtip32xx driver makes a high order dma memory allocation to store a command index table, some dedicated buffers, and a command header SGL blob. This allocation can fail with a surprise insert under low fragmented memory conditions. This patch breaks these regions up into separate low order allocations and increases the maximum number of segments a single command SGL can have. We wanted to allow at least 256 segments for 1 MB direct IO. Since the command header occupies the first 0x80 bytes of the SGL blob, that meant we needed two 4k pages to contain the header and SGL. The two pages allow up to 504 SGL segments. Signed-off-by: Sam Bradshaw sbrads...@micron.com Signed-off-by: Asai Thambi S P asamymuth...@micron.com --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 050c712..ae46bfd 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -41,10 +41,31 @@ #include mtip32xx.h #define HW_CMD_SLOT_SZ (MTIP_MAX_COMMAND_SLOTS * 32) -#define HW_CMD_TBL_SZ (AHCI_CMD_TBL_HDR_SZ + (MTIP_MAX_SG * 16)) -#define HW_CMD_TBL_AR_SZ (HW_CMD_TBL_SZ * MTIP_MAX_COMMAND_SLOTS) -#define HW_PORT_PRIV_DMA_SZ \ - (HW_CMD_SLOT_SZ + HW_CMD_TBL_AR_SZ + AHCI_RX_FIS_SZ) + +/* DMA region containing RX Fis, Identify, RLE10, and SMART buffers */ +#define AHCI_RX_FIS_SZ 0x100 +#define AHCI_RX_FIS_OFFSET 0x0 +#define AHCI_IDFY_SZATA_SECT_SIZE +#define AHCI_IDFY_OFFSET0x400 +#define AHCI_SECTBUF_SZ ATA_SECT_SIZE +#define AHCI_SECTBUF_OFFSET 0x800 +#define AHCI_SMARTBUF_SZATA_SECT_SIZE +#define AHCI_SMARTBUF_OFFSET0xC00 +/* 0x100 + 0x200 + 0x200 + 0x200 is smaller than 4k but we pad it out */ +#define BLOCK_DMA_ALLOC_SZ 4096 + +/* DMA region containing command table (should be 8192 bytes) */ +#define AHCI_CMD_SLOT_SZsizeof(struct mtip_cmd_hdr) +#define AHCI_CMD_TBL_SZ (MTIP_MAX_COMMAND_SLOTS * AHCI_CMD_SLOT_SZ) +#define AHCI_CMD_TBL_OFFSET 0x0 + +/* DMA region per command (contains header and SGL) */ +#define AHCI_CMD_TBL_HDR_SZ 0x80 +#define AHCI_CMD_TBL_HDR_OFFSET 0x0 +#define AHCI_CMD_TBL_SGL_SZ (MTIP_MAX_SG * sizeof(struct mtip_cmd_sg)) +#define AHCI_CMD_TBL_SGL_OFFSET AHCI_CMD_TBL_HDR_SZ +#define CMD_DMA_ALLOC_SZ(AHCI_CMD_TBL_SGL_SZ + AHCI_CMD_TBL_HDR_SZ) + #define HOST_CAP_NZDMA (1 19) #define HOST_HSORG 0xFC @@ -3313,6 +3334,118 @@ st_out: } /* + * DMA region teardown + * + * @dd Pointer to driver_data structure + * + * return value + * None + */ +static void mtip_dma_free(struct driver_data *dd) +{ + int i; + struct mtip_port *port = dd-port; + + if (port-block1) + dmam_free_coherent(dd-pdev-dev, BLOCK_DMA_ALLOC_SZ, + port-block1, port-block1_dma); + + if (port-command_list) { + dmam_free_coherent(dd-pdev-dev, AHCI_CMD_TBL_SZ, + port-command_list, port-command_list_dma); + } + + for (i = 0; i MTIP_MAX_COMMAND_SLOTS; i++) { + if (port-commands[i].command) + dmam_free_coherent(dd-pdev-dev, CMD_DMA_ALLOC_SZ, + port-commands[i].command, + port-commands[i].command_dma); + } +} + +/* + * DMA region setup + * + * @dd Pointer to driver_data structure + * + * return value + * -ENOMEM Not enough free DMA region space to initialize driver + */ +static int mtip_dma_alloc(struct driver_data *dd) +{ + struct mtip_port *port = dd-port; + int i, rv = 0; + u32 host_cap_64 = readl(dd-mmio + HOST_CAP) HOST_CAP_64; + + /* Allocate dma memory for RX Fis, Identify, and Sector Bufffer */ + port-block1 = + dmam_alloc_coherent(dd-pdev-dev, BLOCK_DMA_ALLOC_SZ, + port-block1_dma, GFP_KERNEL); + if (!port-block1) + return -ENOMEM; + memset(port-block1, 0, BLOCK_DMA_ALLOC_SZ); + + /* Allocate dma memory for command list */ + port-command_list = + dmam_alloc_coherent(dd-pdev-dev, AHCI_CMD_TBL_SZ, + port-command_list_dma, GFP_KERNEL); + if (!port-command_list) { + dmam_free_coherent(dd-pdev-dev, BLOCK_DMA_ALLOC_SZ, + port-block1, port-block1_dma); + port-block1 = NULL; + port-block1_dma = 0; + return -ENOMEM; + } + memset(port-command_list, 0, AHCI_CMD_TBL_SZ); + + /* Setup all pointers into first DMA region */ + port-rxfis = port-block1 + AHCI_RX_FIS_OFFSET; + port-rxfis_dma = port-block1_dma + AHCI_RX_FIS_OFFSET; + port-identify = port-block1 + AHCI_IDFY_OFFSET; + port-identify_dma = port-block1_dma + AHCI_IDFY_OFFSET; + port
[PATCH] mtip32xx: Correctly handle security locked condition
If power is removed during a secure erase, the drive will end up in a security locked condition. This patch causes the driver to identify, log, and flag the security lock state. IOs are prevented from submission to the drive until the locked state is addressed with a secure erase. Bumped version number to reflect this capability. Signed-off-by: Sam Bradshaw Signed-off-by: Asai Thambi S P --- drivers/block/mtip32xx/mtip32xx.c | 16 ++-- drivers/block/mtip32xx/mtip32xx.h |2 +- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index aab28a4..a710f51 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -899,8 +899,9 @@ static void mtip_handle_tfe(struct driver_data *dd) fail_reason = "thermal shutdown"; } if (buf[288] == 0xBF) { + set_bit(MTIP_DDF_SEC_LOCK_BIT, >dd_flag); dev_info(>pdev->dev, - "Drive indicates rebuild has failed.\n"); + "Drive indicates rebuild has failed. Secure erase required.\n"); fail_all_ncq_cmds = 1; fail_reason = "rebuild failed"; } @@ -1566,6 +1567,12 @@ static int mtip_get_identify(struct mtip_port *port, void __user *user_buffer) } #endif + /* Check security locked state */ + if (port->identify[128] & 0x4) + set_bit(MTIP_DDF_SEC_LOCK_BIT, >dd->dd_flag); + else + clear_bit(MTIP_DDF_SEC_LOCK_BIT, >dd->dd_flag); + #ifdef MTIP_TRIM /* Disabling TRIM support temporarily */ /* Demux ID.DRAT & ID.RZAT to determine trim support */ if (port->identify[69] & (1 << 14) && port->identify[69] & (1 << 5)) @@ -1887,6 +1894,10 @@ static void mtip_dump_identify(struct mtip_port *port) strlcpy(cbuf, (char *)(port->identify+27), 41); dev_info(>dd->pdev->dev, "Model: %s\n", cbuf); + dev_info(>dd->pdev->dev, "Security: %04x %s\n", + port->identify[128], + port->identify[128] & 0x4 ? "(LOCKED)" : ""); + if (mtip_hw_get_capacity(port->dd, )) dev_info(>dd->pdev->dev, "Capacity: %llu sectors (%llu MB)\n", @@ -3594,7 +3605,8 @@ static int mtip_hw_exit(struct driver_data *dd) * saves its state. */ if (!dd->sr) { - if (!test_bit(MTIP_DDF_REBUILD_FAILED_BIT, >dd_flag)) + if (!test_bit(MTIP_PF_REBUILD_BIT, >port->flags) && + !test_bit(MTIP_DDF_SEC_LOCK_BIT, >dd_flag)) if (mtip_standby_immediate(dd->port)) dev_warn(>pdev->dev, "STANDBY IMMEDIATE failed\n"); diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h index 9be7a15..f73444a 100644 --- a/drivers/block/mtip32xx/mtip32xx.h +++ b/drivers/block/mtip32xx/mtip32xx.h @@ -92,7 +92,7 @@ /* Driver name and version strings */ #define MTIP_DRV_NAME "mtip32xx" -#define MTIP_DRV_VERSION "1.2.6os3" +#define MTIP_DRV_VERSION "1.3.0" /* Maximum number of minor device numbers per device. */ #define MTIP_MAX_MINORS16 -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: Correctly handle security locked condition
If power is removed during a secure erase, the drive will end up in a security locked condition. This patch causes the driver to identify, log, and flag the security lock state. IOs are prevented from submission to the drive until the locked state is addressed with a secure erase. Bumped version number to reflect this capability. Signed-off-by: Sam Bradshaw sbrads...@micron.com Signed-off-by: Asai Thambi S P asamymuth...@micron.com --- drivers/block/mtip32xx/mtip32xx.c | 16 ++-- drivers/block/mtip32xx/mtip32xx.h |2 +- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index aab28a4..a710f51 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -899,8 +899,9 @@ static void mtip_handle_tfe(struct driver_data *dd) fail_reason = thermal shutdown; } if (buf[288] == 0xBF) { + set_bit(MTIP_DDF_SEC_LOCK_BIT, dd-dd_flag); dev_info(dd-pdev-dev, - Drive indicates rebuild has failed.\n); + Drive indicates rebuild has failed. Secure erase required.\n); fail_all_ncq_cmds = 1; fail_reason = rebuild failed; } @@ -1566,6 +1567,12 @@ static int mtip_get_identify(struct mtip_port *port, void __user *user_buffer) } #endif + /* Check security locked state */ + if (port-identify[128] 0x4) + set_bit(MTIP_DDF_SEC_LOCK_BIT, port-dd-dd_flag); + else + clear_bit(MTIP_DDF_SEC_LOCK_BIT, port-dd-dd_flag); + #ifdef MTIP_TRIM /* Disabling TRIM support temporarily */ /* Demux ID.DRAT ID.RZAT to determine trim support */ if (port-identify[69] (1 14) port-identify[69] (1 5)) @@ -1887,6 +1894,10 @@ static void mtip_dump_identify(struct mtip_port *port) strlcpy(cbuf, (char *)(port-identify+27), 41); dev_info(port-dd-pdev-dev, Model: %s\n, cbuf); + dev_info(port-dd-pdev-dev, Security: %04x %s\n, + port-identify[128], + port-identify[128] 0x4 ? (LOCKED) : ); + if (mtip_hw_get_capacity(port-dd, sectors)) dev_info(port-dd-pdev-dev, Capacity: %llu sectors (%llu MB)\n, @@ -3594,7 +3605,8 @@ static int mtip_hw_exit(struct driver_data *dd) * saves its state. */ if (!dd-sr) { - if (!test_bit(MTIP_DDF_REBUILD_FAILED_BIT, dd-dd_flag)) + if (!test_bit(MTIP_PF_REBUILD_BIT, dd-port-flags) + !test_bit(MTIP_DDF_SEC_LOCK_BIT, dd-dd_flag)) if (mtip_standby_immediate(dd-port)) dev_warn(dd-pdev-dev, STANDBY IMMEDIATE failed\n); diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h index 9be7a15..f73444a 100644 --- a/drivers/block/mtip32xx/mtip32xx.h +++ b/drivers/block/mtip32xx/mtip32xx.h @@ -92,7 +92,7 @@ /* Driver name and version strings */ #define MTIP_DRV_NAME mtip32xx -#define MTIP_DRV_VERSION 1.2.6os3 +#define MTIP_DRV_VERSION 1.3.0 /* Maximum number of minor device numbers per device. */ #define MTIP_MAX_MINORS16 -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: Correctly handle bio->bi_idx != 0 conditions
Stacking drivers may append bvecs to existing bio's, resulting in non-zero bi_idx conditions. This patch counts the loops of bio_for_each_segment() rather than inheriting the bi_idx value to pass as a segment count to the hardware submission routine. Signed-off-by: Sam Bradshaw --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..7c77ae1 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3863,7 +3863,7 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio) struct driver_data *dd = queue->queuedata; struct scatterlist *sg; struct bio_vec *bvec; - int nents = 0; + int i, nents = 0; int tag = 0, unaligned = 0; if (unlikely(dd->dd_flag & MTIP_DDF_STOP_IO)) { @@ -3921,11 +3921,12 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio) } /* Create the scatter list for this bio. */ - bio_for_each_segment(bvec, bio, nents) { + bio_for_each_segment(bvec, bio, i) { sg_set_page([nents], bvec->bv_page, bvec->bv_len, bvec->bv_offset); + nents++; } /* Issue the read/write. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][TRIVIAL] mtip32xx: Fix NULL pointer dereference during module unload
An open file-handle to one or more of the driver exported debugfs nodes causes raciness in recursive removal during module unload; sometimes a stale parent dentry is dereferenced when more than 1 pci device is present. Signed-off-by: Sam Bradshaw --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..e366c74 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3002,7 +3002,8 @@ static int mtip_hw_debugfs_init(struct driver_data *dd) static void mtip_hw_debugfs_exit(struct driver_data *dd) { - debugfs_remove_recursive(dd->dfs_node); + if (dd->dfs_node) + debugfs_remove_recursive(dd->dfs_node); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][TRIVIAL] mtip32xx: Fix NULL pointer dereference during module unload
An open file-handle to one or more of the driver exported debugfs nodes causes raciness in recursive removal during module unload; sometimes a stale parent dentry is dereferenced when more than 1 pci device is present. Signed-off-by: Sam Bradshaw --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..e366c74 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3002,7 +3002,8 @@ static int mtip_hw_debugfs_init(struct driver_data *dd) static void mtip_hw_debugfs_exit(struct driver_data *dd) { - debugfs_remove_recursive(dd->dfs_node); + if (dd->dfs_node) + debugfs_remove_recursive(dd->dfs_node); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][TRIVIAL] mtip32xx: Fix NULL pointer dereference during module unload
An open file-handle to one or more of the driver exported debugfs nodes causes raciness in recursive removal during module unload; sometimes a stale parent dentry is dereferenced when more than 1 pci device is present. Signed-off-by: Sam Bradshaw sbrads...@micron.com --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..e366c74 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3002,7 +3002,8 @@ static int mtip_hw_debugfs_init(struct driver_data *dd) static void mtip_hw_debugfs_exit(struct driver_data *dd) { - debugfs_remove_recursive(dd-dfs_node); + if (dd-dfs_node) + debugfs_remove_recursive(dd-dfs_node); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH][TRIVIAL] mtip32xx: Fix NULL pointer dereference during module unload
An open file-handle to one or more of the driver exported debugfs nodes causes raciness in recursive removal during module unload; sometimes a stale parent dentry is dereferenced when more than 1 pci device is present. Signed-off-by: Sam Bradshaw sbrads...@micron.com --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..e366c74 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3002,7 +3002,8 @@ static int mtip_hw_debugfs_init(struct driver_data *dd) static void mtip_hw_debugfs_exit(struct driver_data *dd) { - debugfs_remove_recursive(dd-dfs_node); + if (dd-dfs_node) + debugfs_remove_recursive(dd-dfs_node); } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mtip32xx: Correctly handle bio-bi_idx != 0 conditions
Stacking drivers may append bvecs to existing bio's, resulting in non-zero bi_idx conditions. This patch counts the loops of bio_for_each_segment() rather than inheriting the bi_idx value to pass as a segment count to the hardware submission routine. Signed-off-by: Sam Bradshaw sbrads...@micron.com --- diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 847107e..7c77ae1 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -3863,7 +3863,7 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio) struct driver_data *dd = queue-queuedata; struct scatterlist *sg; struct bio_vec *bvec; - int nents = 0; + int i, nents = 0; int tag = 0, unaligned = 0; if (unlikely(dd-dd_flag MTIP_DDF_STOP_IO)) { @@ -3921,11 +3921,12 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio) } /* Create the scatter list for this bio. */ - bio_for_each_segment(bvec, bio, nents) { + bio_for_each_segment(bvec, bio, i) { sg_set_page(sg[nents], bvec-bv_page, bvec-bv_len, bvec-bv_offset); + nents++; } /* Issue the read/write. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] mtip32xx: mtip32xx: Disable TRIM support
> On Fri, Apr 12 2013, Asai Thambi S P wrote: > > > > Temporarily disabling TRIM support until TRIM related issues > > are addressed in the firmware. > > How serious is this? We do have released kernels out there with the > driver, you might want to consider a stable backport too. It's a performance issue, not data integrity. Since the ATA trim command is non-queueable, IO traffic must halt while the trim is outstanding. A filesystem that is actively trimming sectors is effectively duty cycling the IO path on this part. We are working to shrink the trim command latency while also investigating whether some sort of offline trim is more appropriate. Given the incremental endurance gain for trim on 34nm SLC vs the severe throughput degradation under some workloads, it seemed prudent to disable the feature. If you think that warrants a stable backport, just let me or Asai know. -Sam -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH 2/2] mtip32xx: mtip32xx: Disable TRIM support
On Fri, Apr 12 2013, Asai Thambi S P wrote: Temporarily disabling TRIM support until TRIM related issues are addressed in the firmware. How serious is this? We do have released kernels out there with the driver, you might want to consider a stable backport too. It's a performance issue, not data integrity. Since the ATA trim command is non-queueable, IO traffic must halt while the trim is outstanding. A filesystem that is actively trimming sectors is effectively duty cycling the IO path on this part. We are working to shrink the trim command latency while also investigating whether some sort of offline trim is more appropriate. Given the incremental endurance gain for trim on 34nm SLC vs the severe throughput degradation under some workloads, it seemed prudent to disable the feature. If you think that warrants a stable backport, just let me or Asai know. -Sam -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: block: Add driver for Micron RealSSD pcie flash cards
> -Original Message- > From: Dan Carpenter [mailto:dan.carpen...@oracle.com] > Sent: Monday, October 08, 2012 12:13 AM > To: Sam Bradshaw (sbradshaw) > Cc: Jens Axboe; linux-kernel@vger.kernel.org > Subject: re: block: Add driver for Micron RealSSD pcie flash cards > > Hello Sam Bradshaw, > > The patch 88523a61558a: "block: Add driver for Micron RealSSD pcie > flash cards" from Aug 30, 2011, creates an endian bug: > drivers/block/mtip32xx/mtip32xx.c:2468 mtip_hw_submit_io() > > drivers/block/mtip32xx/mtip32xx.c > 2465 fis->opts= 1 << 7; > 2466 fis->command = > 2467 (dir == READ ? ATA_CMD_FPDMA_READ : > ATA_CMD_FPDMA_WRITE); > 2468 *((unsigned int *) >lba_low) = (start & 0xFF); > > 2469 *((unsigned int *) >lba_low_ex) = ((start >> 24) & > 0xFF); > ^^^ > > These addresses point to a series of four unions inside the > host_to_dev_fis struct. The unions will be set to different values > depending on if the CPU is big or little endian. > > If you knew which endianness this works on, then you could probably add > a cpu_to_be32() or cpu_to_le32(). Thanks for the notification; we'll get a patch out asap. -Sam -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: block: Add driver for Micron RealSSD pcie flash cards
-Original Message- From: Dan Carpenter [mailto:dan.carpen...@oracle.com] Sent: Monday, October 08, 2012 12:13 AM To: Sam Bradshaw (sbradshaw) Cc: Jens Axboe; linux-kernel@vger.kernel.org Subject: re: block: Add driver for Micron RealSSD pcie flash cards Hello Sam Bradshaw, The patch 88523a61558a: block: Add driver for Micron RealSSD pcie flash cards from Aug 30, 2011, creates an endian bug: drivers/block/mtip32xx/mtip32xx.c:2468 mtip_hw_submit_io() drivers/block/mtip32xx/mtip32xx.c 2465 fis-opts= 1 7; 2466 fis-command = 2467 (dir == READ ? ATA_CMD_FPDMA_READ : ATA_CMD_FPDMA_WRITE); 2468 *((unsigned int *) fis-lba_low) = (start 0xFF); 2469 *((unsigned int *) fis-lba_low_ex) = ((start 24) 0xFF); ^^^ These addresses point to a series of four unions inside the host_to_dev_fis struct. The unions will be set to different values depending on if the CPU is big or little endian. If you knew which endianness this works on, then you could probably add a cpu_to_be32() or cpu_to_le32(). Thanks for the notification; we'll get a patch out asap. -Sam -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/