Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same
Shaun, On 8/23/16 09:22, Shaun Tancheff wrote: > On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal> wrote: >> >> Shaun, >> >> On 8/22/16 13:31, Shaun Tancheff wrote: >> [...] >>> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq, >>> - sector_t sector, unsigned int num_sectors) >>> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd) >>> { >>> - struct blk_zone *zone; >>> + struct request *rq = cmd->request; >>> + struct scsi_device *sdp = cmd->device; >>> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); >>> + sector_t sector = blk_rq_pos(rq); >>> + unsigned int nr_sectors = blk_rq_sectors(rq); >>> int ret = BLKPREP_OK; >>> + struct blk_zone *zone; >>> unsigned long flags; >>> + u32 wp_offset; >>> + bool use_write_same = false; >>> >>> zone = blk_lookup_zone(rq->q, sector); >>> - if (!zone) >>> + if (!zone) { >>> + /* Test for a runt zone before giving up */ >>> + if (sdp->type != TYPE_ZBC) { >>> + struct request_queue *q = rq->q; >>> + struct rb_node *node; >>> + >>> + node = rb_last(>zones); >>> + if (node) >>> + zone = rb_entry(node, struct blk_zone, node); >>> + if (zone) { >>> + spin_lock_irqsave(>lock, flags); >>> + if ((zone->start + zone->len) <= sector) >>> + goto out; >>> + spin_unlock_irqrestore(>lock, flags); >>> + zone = NULL; >>> + } >>> + } >>> return BLKPREP_KILL; >>> + } >> >> I do not understand the point of this code here to test for the runt >> zone. As long as sector is within the device maximum capacity (in 512B >> unit), blk_lookup_zone will return the pointer to the zone structure >> containing that sector (the RB-tree does not have any constraint >> regarding zone size). The only case where NULL would be returned is if >> discard is issued super early right after the disk is probed and before >> the zone refresh work has completed. We can certainly protect against >> that by delaying the discard. > > As you can see I am not including Host Managed in the > runt check. Indeed, but having a runt zone could also happen with a host-managed disk. > Also you may note that in my patch to get Host Aware working > with the zone cache I do not include the runt zone in the cache. Why not ? The RB-tree will handle it just fine (the insert and lookup code as Hannes had them was not relying on a constant zone size). > So as it sits I need this fallback otherwise doing blkdiscard over > the whole device ends in a error, as well as mkfs.f2fs et. al. Got it, but I do not see a problem with including it. I have not checked the code, but the split of a big discard call into "chunks" should be already handling the last chunk and make sure that the operation does not exceed the device capacity (in any case, that's easy to fix in the sd_zbc_setup_discard code). >>> zone->start, zone->state); >>> ret = BLKPREP_DEFER; >>> goto out; >>> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, >>> struct request *rq, >>> if (zone->state == BLK_ZONE_OFFLINE) { >>> /* let the drive fail the command */ >>> sd_zbc_debug_ratelimit(sdkp, >>> -"Discarding offline zone %zu\n", >>> +"Discarding offline zone %zx\n", >>> zone->start); >>> goto out; >>> } >>> - >>> - if (!blk_zone_is_smr(zone)) { >>> + if (blk_zone_is_cmr(zone)) { >>> + use_write_same = true; >>> sd_zbc_debug_ratelimit(sdkp, >>> -"Discarding %s zone %zu\n", >>> -blk_zone_is_cmr(zone) ? "CMR" : >>> "unknown", >>> +"Discarding CMR zone %zx\n", >>> zone->start); >>> - ret = BLKPREP_DONE; >>> goto out; >>> } >> >> Some 10TB host managed disks out there have 1% conventional zone space, >> that is 100GB of capacity. When issuing a "reset all", doing a write >> same in these zones will take forever... If the user really wants zeroes >> in those zones, let it issue a zeroout. >> >> I think that it would a better choice to simply not report >> discard_zeroes_data as true and do nothing for conventional zones reset. > > I think that would be unfortunate for Host Managed but I think it's > the right choice for Host Aware at this time. So either we base > it on disk type or we have some other config flag added to sysfs. I do not see any
Re: [PATCH v2 15/29] be2iscsi: Fix to make boot discovery non-blocking
Please check the return on line 1517. It looks suspicious that it does not release the lock, unlike the return on line 1508. julia --- Hi Jitendra, [auto build test WARNING on scsi/for-next] [also build test WARNING on v4.8-rc3 next-20160822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on] [Check https://git-scm.com/docs/git-format-patch for more information] url: https://github.com/0day-ci/linux/commits/Jitendra-Bhivare/be2iscsi-driver-update-11-2-0-0/20160819-175550 base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next :: branch date: 4 days ago :: commit date: 4 days ago >> drivers/scsi/be2iscsi/be_mgmt.c:1517:2-8: preceding lock on line 1504 git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout b9f267a2cf5f152fb5b5b907b715e0854f337ccd vim +1517 drivers/scsi/be2iscsi/be_mgmt.c b9f267a2 Jitendra Bhivare 2016-08-19 1498 struct be_cmd_get_session_req *req; b9f267a2 Jitendra Bhivare 2016-08-19 1499 struct be_dma_mem *nonemb_cmd; b9f267a2 Jitendra Bhivare 2016-08-19 1500 struct be_mcc_wrb *wrb; b9f267a2 Jitendra Bhivare 2016-08-19 1501 struct be_sge *sge; b9f267a2 Jitendra Bhivare 2016-08-19 1502 unsigned int tag; b9f267a2 Jitendra Bhivare 2016-08-19 1503 b9f267a2 Jitendra Bhivare 2016-08-19 @1504 mutex_lock(>mbox_lock); b9f267a2 Jitendra Bhivare 2016-08-19 1505 wrb = alloc_mcc_wrb(phba, ); b9f267a2 Jitendra Bhivare 2016-08-19 1506 if (!wrb) { b9f267a2 Jitendra Bhivare 2016-08-19 1507 mutex_unlock(>mbox_lock); b9f267a2 Jitendra Bhivare 2016-08-19 1508 return 0; b9f267a2 Jitendra Bhivare 2016-08-19 1509 } b9f267a2 Jitendra Bhivare 2016-08-19 1510 b9f267a2 Jitendra Bhivare 2016-08-19 1511 nonemb_cmd = >boot_struct.nonemb_cmd; b9f267a2 Jitendra Bhivare 2016-08-19 1512 nonemb_cmd->size = sizeof(*resp); b9f267a2 Jitendra Bhivare 2016-08-19 1513 nonemb_cmd->va = pci_alloc_consistent(phba->ctrl.pdev, b9f267a2 Jitendra Bhivare 2016-08-19 1514 sizeof(nonemb_cmd->size), b9f267a2 Jitendra Bhivare 2016-08-19 1515 _cmd->dma); b9f267a2 Jitendra Bhivare 2016-08-19 1516 if (!nonemb_cmd->va) b9f267a2 Jitendra Bhivare 2016-08-19 @1517 return 0; b9f267a2 Jitendra Bhivare 2016-08-19 1518 b9f267a2 Jitendra Bhivare 2016-08-19 1519 req = nonemb_cmd->va; b9f267a2 Jitendra Bhivare 2016-08-19 1520 memset(req, 0, sizeof(*req)); --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same
On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moalwrote: > > Shaun, > > On 8/22/16 13:31, Shaun Tancheff wrote: > [...] >> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq, >> - sector_t sector, unsigned int num_sectors) >> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd) >> { >> - struct blk_zone *zone; >> + struct request *rq = cmd->request; >> + struct scsi_device *sdp = cmd->device; >> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); >> + sector_t sector = blk_rq_pos(rq); >> + unsigned int nr_sectors = blk_rq_sectors(rq); >> int ret = BLKPREP_OK; >> + struct blk_zone *zone; >> unsigned long flags; >> + u32 wp_offset; >> + bool use_write_same = false; >> >> zone = blk_lookup_zone(rq->q, sector); >> - if (!zone) >> + if (!zone) { >> + /* Test for a runt zone before giving up */ >> + if (sdp->type != TYPE_ZBC) { >> + struct request_queue *q = rq->q; >> + struct rb_node *node; >> + >> + node = rb_last(>zones); >> + if (node) >> + zone = rb_entry(node, struct blk_zone, node); >> + if (zone) { >> + spin_lock_irqsave(>lock, flags); >> + if ((zone->start + zone->len) <= sector) >> + goto out; >> + spin_unlock_irqrestore(>lock, flags); >> + zone = NULL; >> + } >> + } >> return BLKPREP_KILL; >> + } > > I do not understand the point of this code here to test for the runt > zone. As long as sector is within the device maximum capacity (in 512B > unit), blk_lookup_zone will return the pointer to the zone structure > containing that sector (the RB-tree does not have any constraint > regarding zone size). The only case where NULL would be returned is if > discard is issued super early right after the disk is probed and before > the zone refresh work has completed. We can certainly protect against > that by delaying the discard. As you can see I am not including Host Managed in the runt check. Also you may note that in my patch to get Host Aware working with the zone cache I do not include the runt zone in the cache. So as it sits I need this fallback otherwise doing blkdiscard over the whole device ends in a error, as well as mkfs.f2fs et. al. >> spin_lock_irqsave(>lock, flags); >> - >> if (zone->state == BLK_ZONE_UNKNOWN || >> zone->state == BLK_ZONE_BUSY) { >> sd_zbc_debug_ratelimit(sdkp, >> -"Discarding zone %zu state %x, >> deferring\n", >> +"Discarding zone %zx state %x, >> deferring\n", > > Sector values are usually displayed in decimal. Why use Hex here ? At > least "0x" would be needed to avoid confusion I think. Yeah, my brain is lazy about converting very large numbers to powers of 2. So it's much easier to spot zone alignment here. >> zone->start, zone->state); >> ret = BLKPREP_DEFER; >> goto out; >> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, >> struct request *rq, >> if (zone->state == BLK_ZONE_OFFLINE) { >> /* let the drive fail the command */ >> sd_zbc_debug_ratelimit(sdkp, >> -"Discarding offline zone %zu\n", >> +"Discarding offline zone %zx\n", >> zone->start); >> goto out; >> } >> - >> - if (!blk_zone_is_smr(zone)) { >> + if (blk_zone_is_cmr(zone)) { >> + use_write_same = true; >> sd_zbc_debug_ratelimit(sdkp, >> -"Discarding %s zone %zu\n", >> -blk_zone_is_cmr(zone) ? "CMR" : >> "unknown", >> +"Discarding CMR zone %zx\n", >> zone->start); >> - ret = BLKPREP_DONE; >> goto out; >> } > > Some 10TB host managed disks out there have 1% conventional zone space, > that is 100GB of capacity. When issuing a "reset all", doing a write > same in these zones will take forever... If the user really wants zeroes > in those zones, let it issue a zeroout. > > I think that it would a better choice to simply not report > discard_zeroes_data as true and do nothing for conventional zones reset. I think that would be unfortunate for Host Managed but I think it's the right choice for Host Aware at this time. So either we base it on disk type or we have some other config flag added to sysfs. >> - if (blk_zone_is_empty(zone)) { >> - sd_zbc_debug_ratelimit(sdkp,
Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same
Shaun, On 8/22/16 13:31, Shaun Tancheff wrote: [...] > -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq, > - sector_t sector, unsigned int num_sectors) > +int sd_zbc_setup_discard(struct scsi_cmnd *cmd) > { > - struct blk_zone *zone; > + struct request *rq = cmd->request; > + struct scsi_device *sdp = cmd->device; > + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); > + sector_t sector = blk_rq_pos(rq); > + unsigned int nr_sectors = blk_rq_sectors(rq); > int ret = BLKPREP_OK; > + struct blk_zone *zone; > unsigned long flags; > + u32 wp_offset; > + bool use_write_same = false; > > zone = blk_lookup_zone(rq->q, sector); > - if (!zone) > + if (!zone) { > + /* Test for a runt zone before giving up */ > + if (sdp->type != TYPE_ZBC) { > + struct request_queue *q = rq->q; > + struct rb_node *node; > + > + node = rb_last(>zones); > + if (node) > + zone = rb_entry(node, struct blk_zone, node); > + if (zone) { > + spin_lock_irqsave(>lock, flags); > + if ((zone->start + zone->len) <= sector) > + goto out; > + spin_unlock_irqrestore(>lock, flags); > + zone = NULL; > + } > + } > return BLKPREP_KILL; > + } I do not understand the point of this code here to test for the runt zone. As long as sector is within the device maximum capacity (in 512B unit), blk_lookup_zone will return the pointer to the zone structure containing that sector (the RB-tree does not have any constraint regarding zone size). The only case where NULL would be returned is if discard is issued super early right after the disk is probed and before the zone refresh work has completed. We can certainly protect against that by delaying the discard. > > spin_lock_irqsave(>lock, flags); > - > if (zone->state == BLK_ZONE_UNKNOWN || > zone->state == BLK_ZONE_BUSY) { > sd_zbc_debug_ratelimit(sdkp, > -"Discarding zone %zu state %x, > deferring\n", > +"Discarding zone %zx state %x, > deferring\n", Sector values are usually displayed in decimal. Why use Hex here ? At least "0x" would be needed to avoid confusion I think. > zone->start, zone->state); > ret = BLKPREP_DEFER; > goto out; > @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct > request *rq, > if (zone->state == BLK_ZONE_OFFLINE) { > /* let the drive fail the command */ > sd_zbc_debug_ratelimit(sdkp, > -"Discarding offline zone %zu\n", > +"Discarding offline zone %zx\n", > zone->start); > goto out; > } > - > - if (!blk_zone_is_smr(zone)) { > + if (blk_zone_is_cmr(zone)) { > + use_write_same = true; > sd_zbc_debug_ratelimit(sdkp, > -"Discarding %s zone %zu\n", > -blk_zone_is_cmr(zone) ? "CMR" : > "unknown", > +"Discarding CMR zone %zx\n", > zone->start); > - ret = BLKPREP_DONE; > goto out; > } Some 10TB host managed disks out there have 1% conventional zone space, that is 100GB of capacity. When issuing a "reset all", doing a write same in these zones will take forever... If the user really wants zeroes in those zones, let it issue a zeroout. I think that it would a better choice to simply not report discard_zeroes_data as true and do nothing for conventional zones reset. > - if (blk_zone_is_empty(zone)) { > - sd_zbc_debug_ratelimit(sdkp, > -"Discarding empty zone %zu\n", > -zone->start); > - ret = BLKPREP_DONE; > + if (zone->start != sector || zone->len < nr_sectors) { > + sd_printk(KERN_ERR, sdkp, > + "Misaligned RESET WP %zx/%x on zone %zx/%zx\n", > + sector, nr_sectors, zone->start, zone->len); > + ret = BLKPREP_KILL; > goto out; > } > - > - if (zone->start != sector || > - zone->len < num_sectors) { > + /* Protect against Reset WP when more data had been written to the > + * zone than is being discarded. > + */ > + wp_offset = zone->wp - zone->start; > + if (wp_offset > nr_sectors) { > sd_printk(KERN_ERR, sdkp, > -
[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway
https://bugzilla.kernel.org/show_bug.cgi?id=151661 --- Comment #11 from Piotr Szymaniak--- Created attachment 229761 --> https://bugzilla.kernel.org/attachment.cgi?id=229761=edit config-4.7.0 -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway
https://bugzilla.kernel.org/show_bug.cgi?id=151661 --- Comment #10 from Piotr Szymaniak--- Created attachment 229751 --> https://bugzilla.kernel.org/attachment.cgi?id=229751=edit config-4.3.6 -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway
https://bugzilla.kernel.org/show_bug.cgi?id=151661 --- Comment #9 from Piotr Szymaniak--- Created attachment 229741 --> https://bugzilla.kernel.org/attachment.cgi?id=229741=edit dmesg vanilla 4.3.6 (with Alex patch) I tried previous vanilla kernels 4.x.latest-released and, as 4.6.x prints messages over and over I skipped 4.5 series and tried: - 4.4.19 - prints dead messages - 4.3.6 - works without messages (with Alex patch [1], intel_iommu=igfx_off) Not sure if this helps, but I hope it will. Any suggestions how to proceed from here? [1] https://www.redhat.com/archives/vfio-users/2016-July/msg00063.html -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
On 22 August 2016 at 20:32, Shaun Tancheffwrote: > On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan wrote: >> I don't see how that's possible. count / n_block will always be >> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention >> that isn't even a "buffer limit" anyway. By SG_IO do you mean like >> SCSI Write Same commands that issued with sg_write_same or so? If >> that's the case, that's what exactly commit 5c79097a28c2 >> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds >> limit") is for. > > Ah, I see. You are guarding the only user of ata_set_lba_range_entries(). Yup. It is the only right thing to do anyway, that we leave the function "open" and guard per context when we use it. Say if ata_set_lba_range_entries() is gonna be a function that is shared by others, it would only make this commit more important. As I said, we did not guard it with a certain buffer limit, but merely redundantly guard it with a ("humanized") limit that applies to TRIM only. > Still if you are going to do that you have to alert any new user that they > must have an appropriately sized buffer to be overwriting. > > Better to move it out of ata.h then the limit the scope of accidental > misuse? I am not sure if it is really necessary but that's fine to me. I see that you have been doing it in your SCT Write Same patch series. Probably I can/should just leave the move to you? > > Regards, > Shaun -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
On Mon, Aug 22, 2016 at 3:07 PM, Tom Yanwrote: > I don't see how that's possible. count / n_block will always be > smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention > that isn't even a "buffer limit" anyway. By SG_IO do you mean like > SCSI Write Same commands that issued with sg_write_same or so? If > that's the case, that's what exactly commit 5c79097a28c2 > ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds > limit") is for. Ah, I see. You are guarding the only user of ata_set_lba_range_entries(). Still if you are going to do that you have to alert any new user that they must have an appropriately sized buffer to be overwriting. Better to move it out of ata.h then the limit the scope of accidental misuse? Regards, Shaun -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] aacraid: Use memdup_user() rather than duplicating its implementation
>> @@ -526,15 +526,9 @@ static int aac_send_raw_srb(struct aac_dev* dev, void >> __user * arg) >> goto cleanup; >> } >> >> - user_srbcmd = kmalloc(fibsize, GFP_KERNEL); >> - if (!user_srbcmd) { >> - dprintk((KERN_DEBUG"aacraid: Could not make a copy of the >> srb\n")); >> - rcode = -ENOMEM; >> - goto cleanup; >> - } >> - if(copy_from_user(user_srbcmd, user_srb,fibsize)){ >> - dprintk((KERN_DEBUG"aacraid: Could not copy srb from >> user\n")); >> - rcode = -EFAULT; >> + user_srbcmd = memdup_user(user_srb, fibsize); >> + if (IS_ERR(user_srbcmd)) { >> + rcode = PTR_ERR(user_srbcmd); >> goto cleanup; >> } >> >> -- > > Hi Markus, > > Patch 2/7 should precede Patch 1/7, as falling into kfree() would not look > pretty. Do you eventually prefer that this source code adjustment should be combined with the update suggestion "[2/7] aacraid: One function call less in aac_send_raw_srb() after error detection" in a single update step? Regards, Markus -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
I don't see how that's possible. count / n_block will always be smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention that isn't even a "buffer limit" anyway. By SG_IO do you mean like SCSI Write Same commands that issued with sg_write_same or so? If that's the case, that's what exactly commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds limit") is for. On 23 August 2016 at 03:58, Shaun Tancheffwrote: > On Mon, Aug 22, 2016 at 1:55 PM, wrote: >> From: Tom Yan >> >> In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with >> n_block that exceeds limit"), it is made sure that >> ata_set_lba_range_entries() will never be called with a request >> size (n_block) that is larger than the number of blocks that a >> 512-byte block TRIM payload can describe (65535 * 64 = 4194240), >> in addition to acknowlegding the SCSI/block layer with the same >> limit by advertising it as the Maximum Write Same Length. >> >> Therefore, it is unnecessary to hard code the same limit in >> ata_set_lba_range_entries() itself, which would only cost extra >> maintenance effort. Such effort can be noticed in, for example, >> commit 2983860c7668 ("libata-scsi: avoid repeated calculation of >> number of TRIM ranges"). >> >> Signed-off-by: Tom Yan >> >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >> index be9c76c..9b74ecb 100644 >> --- a/drivers/ata/libata-scsi.c >> +++ b/drivers/ata/libata-scsi.c >> @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct >> ata_queued_cmd *qc) >> buf = page_address(sg_page(scsi_sglist(scmd))); >> >> if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) { >> - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, >> block, n_block); >> + size = ata_set_lba_range_entries(buf, block, n_block); >> } else { >> fp = 2; >> goto invalid_fld; >> diff --git a/include/linux/ata.h b/include/linux/ata.h >> index adbc812..5e2e9ad 100644 >> --- a/include/linux/ata.h >> +++ b/include/linux/ata.h >> @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id) >> * TO NV CACHE PINNED SET. >> */ >> static inline unsigned ata_set_lba_range_entries(void *_buffer, >> - unsigned num, u64 sector, unsigned long count) >> + u64 sector, unsigned long count) >> { >> __le64 *buffer = _buffer; >> unsigned i = 0, used_bytes; >> >> - while (i < num) { >> - u64 entry = sector | >> - ((u64)(count > 0x ? 0x : count) << 48); >> + while (count > 0) { >> + u64 range, entry; >> + >> + range = count > 0x ? 0x : count; >> + entry = sector | (range << 48); >> buffer[i++] = __cpu_to_le64(entry); >> - if (count <= 0x) >> - break; >> - count -= 0x; >> - sector += 0x; >> + count -= range; >> + sector += range; >> } > > I think the problem here is that I can now inject a buffer overflow > via SG_IO. > >> used_bytes = ALIGN(i * 8, 512); >> -- >> 2.9.3 >> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
On Mon, Aug 22, 2016 at 1:55 PM,wrote: > From: Tom Yan > > In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with > n_block that exceeds limit"), it is made sure that > ata_set_lba_range_entries() will never be called with a request > size (n_block) that is larger than the number of blocks that a > 512-byte block TRIM payload can describe (65535 * 64 = 4194240), > in addition to acknowlegding the SCSI/block layer with the same > limit by advertising it as the Maximum Write Same Length. > > Therefore, it is unnecessary to hard code the same limit in > ata_set_lba_range_entries() itself, which would only cost extra > maintenance effort. Such effort can be noticed in, for example, > commit 2983860c7668 ("libata-scsi: avoid repeated calculation of > number of TRIM ranges"). > > Signed-off-by: Tom Yan > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index be9c76c..9b74ecb 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct > ata_queued_cmd *qc) > buf = page_address(sg_page(scsi_sglist(scmd))); > > if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) { > - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, > block, n_block); > + size = ata_set_lba_range_entries(buf, block, n_block); > } else { > fp = 2; > goto invalid_fld; > diff --git a/include/linux/ata.h b/include/linux/ata.h > index adbc812..5e2e9ad 100644 > --- a/include/linux/ata.h > +++ b/include/linux/ata.h > @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id) > * TO NV CACHE PINNED SET. > */ > static inline unsigned ata_set_lba_range_entries(void *_buffer, > - unsigned num, u64 sector, unsigned long count) > + u64 sector, unsigned long count) > { > __le64 *buffer = _buffer; > unsigned i = 0, used_bytes; > > - while (i < num) { > - u64 entry = sector | > - ((u64)(count > 0x ? 0x : count) << 48); > + while (count > 0) { > + u64 range, entry; > + > + range = count > 0x ? 0x : count; > + entry = sector | (range << 48); > buffer[i++] = __cpu_to_le64(entry); > - if (count <= 0x) > - break; > - count -= 0x; > - sector += 0x; > + count -= range; > + sector += range; > } I think the problem here is that I can now inject a buffer overflow via SG_IO. > used_bytes = ALIGN(i * 8, 512); > -- > 2.9.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] ata: do not hard code limit in ata_set_lba_range_entries()
From: Tom YanIn commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds limit"), it is made sure that ata_set_lba_range_entries() will never be called with a request size (n_block) that is larger than the number of blocks that a 512-byte block TRIM payload can describe (65535 * 64 = 4194240), in addition to acknowlegding the SCSI/block layer with the same limit by advertising it as the Maximum Write Same Length. Therefore, it is unnecessary to hard code the same limit in ata_set_lba_range_entries() itself, which would only cost extra maintenance effort. Such effort can be noticed in, for example, commit 2983860c7668 ("libata-scsi: avoid repeated calculation of number of TRIM ranges"). Signed-off-by: Tom Yan diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index be9c76c..9b74ecb 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) buf = page_address(sg_page(scsi_sglist(scmd))); if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) { - size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block); + size = ata_set_lba_range_entries(buf, block, n_block); } else { fp = 2; goto invalid_fld; diff --git a/include/linux/ata.h b/include/linux/ata.h index adbc812..5e2e9ad 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id) * TO NV CACHE PINNED SET. */ static inline unsigned ata_set_lba_range_entries(void *_buffer, - unsigned num, u64 sector, unsigned long count) + u64 sector, unsigned long count) { __le64 *buffer = _buffer; unsigned i = 0, used_bytes; - while (i < num) { - u64 entry = sector | - ((u64)(count > 0x ? 0x : count) << 48); + while (count > 0) { + u64 range, entry; + + range = count > 0x ? 0x : count; + entry = sector | (range << 48); buffer[i++] = __cpu_to_le64(entry); - if (count <= 0x) - break; - count -= 0x; - sector += 0x; + count -= range; + sector += range; } used_bytes = ALIGN(i * 8, 512); -- 2.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/7] aacraid: Use memdup_user() rather than duplicating its implementation
> From: Markus Elfring> Date: Sat, 20 Aug 2016 20:05:24 +0200 > > Reuse existing functionality from memdup_user() instead of keeping duplicate > source code. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/scsi/aacraid/commctrl.c | 12 +++- > 1 file changed, 3 insertions(+), 9 deletions(-) > > diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c > index 5648b71..1af3084 100644 > --- a/drivers/scsi/aacraid/commctrl.c > +++ b/drivers/scsi/aacraid/commctrl.c > @@ -526,15 +526,9 @@ static int aac_send_raw_srb(struct aac_dev* dev, void > __user * arg) > goto cleanup; > } > > - user_srbcmd = kmalloc(fibsize, GFP_KERNEL); > - if (!user_srbcmd) { > - dprintk((KERN_DEBUG"aacraid: Could not make a copy of the > srb\n")); > - rcode = -ENOMEM; > - goto cleanup; > - } > - if(copy_from_user(user_srbcmd, user_srb,fibsize)){ > - dprintk((KERN_DEBUG"aacraid: Could not copy srb from > user\n")); > - rcode = -EFAULT; > + user_srbcmd = memdup_user(user_srb, fibsize); > + if (IS_ERR(user_srbcmd)) { > + rcode = PTR_ERR(user_srbcmd); > goto cleanup; > } > > -- Hi Markus, Patch 2/7 should precede Patch 1/7, as falling into kfree() would not look pretty. Thanks, -Dave
Re: [PATCH 0/3] nvme-fabrics: Add NVME FC Transport support to lpfc
On 8/11/2016 2:24 PM, Christoph Hellwig wrote: On Fri, Jul 22, 2016 at 05:24:00PM -0700, James Smart wrote: This patchset adds NVME support to the lpfc FC driver. It reworks the core driver for both NVME and SCSI protocol support, then adds the nvme-over-fabrics host and target interfaces which connect to the NVME FC transport lower-level api. Patches are cut against the linux-nvme for-4.8/drivers branch Before reviewing the details: what's the plan for merging this as we'll have dependencies on both the SCSI and the block tree? Good question. With the first round of lpfc patches, I tried to ensure we had all the cross-protocol infrastructure in place such that the driver can have 2 halves that effectively run independently. I'm hoping it means that I can submit scsi stuff independent from nvme stuff, but we'll have to see how it goes over time. I'm open to suggestions on how best to manage the 2 trees - e.g. scsi stuff goes in scsi tree, and if a dependency then has to be pulled into block ? nvme stuff only to block. The initial lpfc patch set should have all the scsi infrastructure pulled into the 4.8 merge, so hopefully that dependency has been resolved already. -- james Any recommendations ? -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] nvme-fabrics: Add FC LLDD loopback driver to test FC host and target transport
On 8/11/2016 2:23 PM, Christoph Hellwig wrote: On Fri, Jul 22, 2016 at 05:23:59PM -0700, James Smart wrote: Add FC LLDD loopback driver to test FC host and target transport within nvme-fabrics To aid in the development and testing of the lower-level api of the FC transport, this loopback driver has been created to act as if it were a FC hba driver supporting both the host interfaces as well as the target interfaces with the nvme FC transport. A proper FC loop device should support SCSI and NVMe. Any chance to morph it into one that does both? Nope. Not really appropriate for this module, whose goals were to test the fc-nvme interfaces. SCSI is a whole different beast. -- james -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] nvme-fabrics: Add target FC transport support
On 8/11/2016 2:22 PM, Christoph Hellwig wrote: + queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0, + assoc->tgtport->fc_target_port.port_num, + assoc->a_id, qid); Do we really need a workqueue per queue? I thought we did - so there was always an execution context for the nvmet layer when a command is received. My understanding was the nvmet layer processes the cmd synchronously. I avoided per cpu - as I assumed the synchronous flows would end up blocking other connections to other subsystems/controllers. If that's not the case - can you describe the nvmet layer better so I understand what we need ? No reference counting? yes - will make a pass at it. -- james -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] nvme-fabrics: Add host FC transport support
Are you going to send a FC support patch for nvme-cli as well? I will, but concentrating on core support for now. I expect a fair number of things need to be done in the cli. + assoc_rqst->assoc_cmd.ersp_ratio = cpu_to_be16(ersp_ratio); + assoc_rqst->assoc_cmd.sqsize = cpu_to_be16(qsize); + /* TODO: +* assoc_rqst->assoc_cmd.cntlid = cpu_to_be16(?); +* strncpy(assoc_rqst->assoc_cmd.hostid, ?, +* min(FCNVME_ASSOC_HOSTID_LEN, NVMF_NQN_SIZE)); +* strncpy(assoc_rqst->assoc_cmd.hostnqn, ?, +* min(FCNVME_ASSOC_HOSTNQN_LEN, NVMF_NQN_SIZE]; +*/ What is the problem with filling all these out based on the information in the nvme_host structure? Well, I want to get them from the same place that the fabrics routines do when they build a capsule. So they should come from the opts struct. It's missing the controller id, so I'll need to address that. This probably needs to be moved, similar to the patch Sagi just did for RDMA. In general it might be a good idea to look at the various recent RDMA changes and check if they need to be brought over. yeah - I typically do a resync every time I repost. + /* +* TODO: blk_integrity_rq(rq) for DIX +*/ We'll hopefully be able to just move the DIX handling to common code. Any help appreciated.. Yes and noand likely a separate discussion. If DIF is passed, you should involve the host port to validate on egress as well has what the target controller will do. Although it brings up some headaches if the host detects an error (e.g. protocol requires target to be part of it). If DIF is inserted, the question will be whether you want to insert DIF at the egress point of the host port or do you pass it over the fabric w/o dif and have the tgt controller insert it. I assume we'll need to talk more. This seems to miss reference counting on the lport and rport structure. yes - an area of weakness right now. It would be good to sort this out before merging. I'll make a pass at it. -- james -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] nvme-fabrics: Add FC LLDD loopback driver to test FC host and target transport
On 8/2/2016 5:15 PM, J Freyensee wrote: +int +fcloop_fcp_op(struct nvmet_fc_target_port *tgtport, + struct nvmefc_tgt_fcp_req *tgt_fcpreq) +{ + ... + + return 0; if this function returns an 'int', why would it always return 0 and not the fcp_err values (if there is an error)? I'll look at it. The reason it's carried in the tgt_fcpreq is that a real LLDD would likely return from the fcp_op, and report an error later on. I'll see if things should be reorg'd. +struct nvme_fc_port_template fctemplate = { + .create_queue = fcloop_create_queue, + .delete_queue = fcloop_delete_queue, + .ls_req = fcloop_ls_req, + .fcp_io = fcloop_fcp_req, + .ls_abort = fcloop_ls_abort, + .fcp_abort = fcloop_fcp_abort, + + .max_hw_queues = 1, + .max_sgl_segments = 256, + .max_dif_sgl_segments = 256, + .dma_boundary = 0x, Between here and "struct nvmet_fc_target_template tgttemplate" they are assigning the same magic values to the same variable names, so why not have these values as #defines for a tad easier maintainability? sure. +static ssize_t +fcloop_create_local_port(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct nvme_fc_port_info pinfo; + struct fcloop_ctrl_options *opts; + struct nvme_fc_local_port *localport; + struct fcloop_lport *lport; + int ret; + + opts = kzalloc(sizeof(*opts), GFP_KERNEL); + if (!opts) + return -ENOMEM; + + ret = fcloop_parse_options(opts, buf); + if (ret) + goto out_free_opts; + + /* everything there ? */ + if ((opts->mask & LPORT_OPTS) != LPORT_OPTS) { + ret = -EINVAL; + goto out_free_opts; + } + + pinfo.fabric_name = opts->fabric; + pinfo.node_name = opts->wwnn; + pinfo.port_name = opts->wwpn; + pinfo.port_role = opts->roles; + pinfo.port_id = opts->fcaddr; + + ret = nvme_fc_register_localport(, , NULL, ); + if (!ret) { + /* success */ + lport = localport->private; + lport->localport = localport; + INIT_LIST_HEAD(>list); + INIT_LIST_HEAD(>rport_list); + list_add_tail(>list, _lports); + + /* mark all of the input buffer consumed */ + ret = count; + } + +out_free_opts: + kfree(opts); + return ret; +} + +static int __delete_local_port(struct fcloop_lport *lport) +{ + int ret; + + if (!list_empty(>rport_list)) + return -EBUSY; + + list_del(>list); + Is a mutex or locking mechanism not needed here for this list? Yeah - could probably use. As it was mainly a test tool with a controlled sequence, I think I ignored it. -- james -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] nvme-fabrics: Add target FC transport support
On 8/1/2016 9:37 PM, J Freyensee wrote: +static void +nvmet_fc_free_target_queue(struct nvmet_fc_tgt_queue *queue) +{ + struct nvmet_fc_tgtport *tgtport = queue->assoc->tgtport; + unsigned long flags; + + /* +* beware: nvmet layer hangs waiting for a completion if +* connect command failed +*/ + flush_workqueue(queue->work_q); + if (queue->connected) + nvmet_sq_destroy(>nvme_sq); I was wondering if there is any way for this fc target layer to fake send an NVMe completion to the nvmet layer to prevent a nvmet layer hang (because I'm assuming the nvmet layer hangs because it never receives a connect completion upon failure here), then send a signal to tear down the sq. Or alternatively call nvmet_ctrl_fatal_error() if connect fails as a trial/alternative to having the nvmet layer hang? I haven't tried to fix the bugs in the other layers - enough to do in FC already. Agree to look into this. + +/** + * nvme_fc_register_targetport - transport entry point called by an + * LLDD to register the existence of a local + * NVME subystem FC port. + * @pinfo: pointer to information about the port to be registered + * @template: LLDD entrypoints and operational parameters for the port + * @dev: physical hardware device node port corresponds to. Will be + * used for DMA mappings + * @tgtport_p: pointer to a local port pointer. Upon success, the looks like the variable tgtport_p does not exist (or it's now called portptr)? Yeah - comment not in sync with the code's name. +/* + * Actual processing routine for received FC-NVME LS Requests from the LLD + */ +void +nvmet_fc_handle_ls_rqst(struct nvmet_fc_tgtport *tgtport, + struct nvmet_fc_ls_iod *iod) +{ + struct fcnvme_ls_rqst_w0 *w0 = + (struct fcnvme_ls_rqst_w0 *)iod->rqstbuf; + + iod->lsreq->nvmet_fc_private = iod; + iod->lsreq->rspbuf = iod->rspbuf; + iod->lsreq->rspdma = iod->rspdma; + iod->lsreq->done = nvmet_fc_xmt_ls_rsp_done; + /* Be preventative. handlers will later set to valid length */ + iod->lsreq->rsplen = 0; + + iod->assoc = NULL; + + /* +* handlers: +* parse request input, set up nvmet req (cmd, rsp, execute) +* and format the LS response +* if non-zero returned, then no futher action taken on the LS +* if zero: +* valid to call nvmet layer if execute routine set +* iod->rspbuf contains ls response +*/ + switch (w0->ls_cmd) { + case FCNVME_LS_CREATE_ASSOCIATION: + /* Creates Association and initial Admin Queue/Connection */ + nvmet_fc_ls_create_association(tgtport, iod); + break; + case FCNVME_LS_CREATE_CONNECTION: + /* Creates an IO Queue/Connection */ + nvmet_fc_ls_create_connection(tgtport, iod); + break; + case FCNVME_LS_DISCONNECT: + /* Terminate a Queue/Connection or the Association */ + nvmet_fc_ls_disconnect(tgtport, iod); + break; + default: + iod->lsreq->rsplen = nvmet_fc_format_rjt(iod ->rspbuf, + NVME_FC_MAX_LS_BUFFER_SIZE, w0 ->ls_cmd, + LSRJT_REASON_INVALID_ELS_CODE, + LSRJT_EXPL_NO_EXPLANATION, 0); + } + + nvmet_fc_xmt_ls_rsp(tgtport, iod); +} All the functions in the case() (nvmet_fc_ls_disconnect(), nvmet_fc_ls_create_association(), etc) have internal return values (errors and certain values), I'm curious why you wouldn't want to bubble up those values through the function calling chain? Especially since there is a comment in the function above that says "if non-zero returned, then no futher action taken on the LS". just style. they all generate a response payload that is then always sent, so returning a code didn't really mean much. The comment is old. They used to return the value, but as you can see, they don't now. Will clean it up. . + +static int __init nvmet_fc_init_module(void) +{ + /* ensure NVMET_NR_QUEUES is a power of 2 - required for our masks */ + if (!is_power_of_2((unsigned long)NVMET_NR_QUEUES)) { + pr_err("%s: NVMET_NR_QUEUES required to be power of 2\n", + __func__); If this is so important that NVMET_NR_QUEUES always be a power of two for this FC driver, I'd have the function print out the value in the error message for easier diagnosis. no problem. And why mask the sign of NVMET_NR_QUEUES? Yes, it would have to be a really careless programmer error that would be caught very quick if the #define became negative, but masking out the sign of a #define value that seems to be pretty important for FC transport functionality makes me a tad
Re: [PATCH 3/5] nvme-fabrics: Add host FC transport support
On 7/29/2016 3:10 PM, J Freyensee wrote: + /* TODO: +* assoc_rqst->assoc_cmd.cntlid = cpu_to_be16(?); +* strncpy(assoc_rqst->assoc_cmd.hostid, ?, +* min(FCNVME_ASSOC_HOSTID_LEN, NVMF_NQN_SIZE)); +* strncpy(assoc_rqst->assoc_cmd.hostnqn, ?, +* min(FCNVME_ASSOC_HOSTNQN_LEN, NVMF_NQN_SIZE]; +*/ What is the TODO here? main todo is that the ctrlid is not known yet - not set in the opts struct yet. It's set within the fabrics connect routines that build the capsule, which are called after this transport action. So the todo is to modify the core to add it to the opts struct as well. Then these lines can be added in. more snip... + +static int +nvme_fc_init_queue(struct nvme_fc_ctrl *ctrl, int idx, size_t queue_size) +{ + ... + + return 0; Slightly confused. Looks like in nvme_fc_configure_admin_queue() and nvme_fc_init_io_queues() check for this function returning an error, but nvme_fc_init_queue() never returns anything but 0. Should it return an error? Does the comments above imply that this function could change in the future such that it would return something other than 0? more more snip... + +static int +nvme_fc_init_io_queues(struct nvme_fc_ctrl *ctrl) +{ + ... + + return 0; Right now as-is nvme_fc_init_queue() will always return 0, but this function is hard-coded to return 0. Independent of what nvme_fc_init_queue() returns, this function should be returning 'ret' as "nvme_fc_create_io_queues()" has code to check if this function fails: +static int +nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl) +{ . . . + dev_info(ctrl->ctrl.device, "creating %d I/O queues.\n", + opts->nr_io_queues); + + ret = nvme_fc_init_io_queues(ctrl); + if (ret) + return ret; + Well, at the time it was written, it wasn't clear it would always return 0. As it does, I'll see about cleaning it up. +static int +nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue *queue, + struct nvme_fc_fcp_op *op, u32 data_len, + enum nvmefc_fcp_datadir io_dir) +{ + ... + + if (op->rq) + blk_mq_start_request(op->rq); + + ret = ctrl->lport->ops->fcp_io(>lport->localport, + >rport->remoteport, + queue->lldd_handle, ->fcp_req); + + if (ret) { + dev_err(ctrl->dev, + "Send nvme command failed - lldd returned %d.\n", ret); see a bit lower for a comment on this if(ret) evaluating to TRUE. + + if (op->rq) {/* normal request */ + nvme_fc_unmap_data(ctrl, op->rq, op); + nvme_cleanup_cmd(op->rq); + if (ret != -EBUSY) { + /* complete the io w/ error status */ + blk_mq_complete_request(op->rq, + NVME_SC_FC_TRANSPORT _ERROR); + } else { + blk_mq_stop_hw_queues(op->rq->q); + nvme_requeue_req(op->rq); + blk_mq_delay_queue(queue->hctx, + NVMEFC_QUEUE_DELAY); + } + } else {/* aen */ + struct nvme_completion *cqe = ->rsp_iu.cqe; + + cqe->status = (NVME_SC_FC_TRANSPORT_ERROR << 1); + nvme_complete_async_event(>ctrl ->ctrl, cqe); + } + } + + return BLK_MQ_RQ_QUEUE_OK; Is this right? We want to return 'BLK_MQ_RQ_QUEUE_OK' and not something to signify the 'ret' value was not 0? yeah - this could probably use fixing. I should be returning BLK_MQ_RQ_QUEUE_BUSY for some of the cases and BLK_MQ_RQ_QUEUE_ERROR on some others. +static struct nvme_ctrl * +__nvme_fc_create_ctrl(struct device *dev, struct nvmf_ctrl_options *opts, + struct nvme_fc_lport *lport, struct nvme_fc_rport *rport) +{ + struct nvme_fc_ctrl *ctrl; + int ret; + bool changed; + + ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL); + if (!ctrl) + return ERR_PTR(-ENOMEM); + ctrl->ctrl.opts = opts; + INIT_LIST_HEAD(>ctrl_list); + ctrl->lport = lport; + ctrl->l_id = lport->localport.port_num; + ctrl->rport = rport; + ctrl->r_id = rport->remoteport.port_num; + ctrl->dev = lport->dev; + + ret = nvme_init_ctrl(>ctrl, dev, _fc_ctrl_ops, 0); + if (ret) + goto out_free_ctrl; + + INIT_WORK(>delete_work, nvme_fc_del_ctrl_work); + spin_lock_init(>lock); + + /* io queue count */ + ctrl->queue_count = min_t(unsigned int, + opts->nr_io_queues, +
Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors
On Mon, Aug 22, 2016 at 2:11 AM, Hannes Reineckewrote: > On 08/22/2016 06:34 AM, Shaun Tancheff wrote: >> Currently the RB-Tree zone cache is fast and flexible. It does >> use a rather largish amount of ram. This model reduces the ram >> required from 120 bytes per zone to 16 bytes per zone with a >> moderate transformation of the blk_zone_lookup() api. >> >> This model is predicated on the belief that most variations >> on zoned media will follow a pattern of using collections of same >> sized zones on a single device. Similar to the pattern of erase >> blocks on flash devices being progressivly larger 16K, 64K, ... >> >> The goal is to be able to build a descriptor which is both memory >> efficient, performant, and flexible. >> >> Signed-off-by: Shaun Tancheff >> --- >> block/blk-core.c |2 +- >> block/blk-sysfs.c | 31 +- >> block/blk-zoned.c | 103 +++-- >> drivers/scsi/sd.c |5 +- >> drivers/scsi/sd.h |4 +- >> drivers/scsi/sd_zbc.c | 1025 >> +++- >> include/linux/blkdev.h | 82 +++- >> 7 files changed, 716 insertions(+), 536 deletions(-) > Have you measure the performance impact here? As far as actual hardware (HostAware) I am seeing the same I/O performance. I suspect its just that below 100k iops the zone cache just isn't a bottleneck. > The main idea behind using an RB-tree is that each single element will > fit in the CPU cache; using an array will prevent that. > So we will increase the number of cache flushes, and most likely a > performance penalty, too. > Hence I'd rather like to see a performance measurement here before going > down that road. I think it will have to be a simulated benchmark, if that's okay. Of course I'm open to suggestions if there is something you have in mind. -- Regards, Shaun Tancheff -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi:Prevent deletion of SCSI block device in use
Hi Vasundhara, [auto build test WARNING on scsi/for-next] [also build test WARNING on v4.8-rc3 next-20160822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on] [Check https://git-scm.com/docs/git-format-patch for more information] url: https://github.com/0day-ci/linux/commits/Vasundhara-Gurunath/scsi-Prevent-deletion-of-SCSI-block-device-in-use/20160819-184126 base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next config: x86_64-randconfig-s2-08191801 (attached as .config) compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers/scsi/scsi_sysfs.c: In function 'sdev_store_delete': >> drivers/scsi/scsi_sysfs.c:744: warning: format '%d' expects type 'int', but >> argument 5 has type 'long int' vim +744 drivers/scsi/scsi_sysfs.c 728 sdev->delete_msg_buf = 729 kmalloc(128, GFP_KERNEL); 730 memset(sdev->delete_msg_buf, 0, 128); 731 } 732 do_gettimeofday(); 733 time_to_tm(tv.tv_sec, 0, ); 734 sprintf(sdev->delete_msg_buf, 735 "Last delete attempt: %d:%d:%d %02d:%02d\n" 736 "Open Count : %d\n" 737 "IO Active Count : %d\n" 738 "IO Done Count : %d\n", 739 tms.tm_mday, tms.tm_mon + 1, 740 tms.tm_year + 1900, 741 tms.tm_hour, tms.tm_min, 742 sdev->usage_count, 743 sdev->iorequest_cnt.counter, > 744 sdev->iodone_cnt.counter); 745 746 return count; 747 } 748 } 749 750 if (device_remove_file_self(dev, attr)) 751 scsi_remove_device(to_scsi_device(dev)); 752 return count; --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
RE: [PATCH] megaraid_sas: Use memdup_user() rather than duplicating its implementation
>-Original Message- >From: SF Markus Elfring [mailto:elfr...@users.sourceforge.net] >Sent: Sunday, August 21, 2016 2:19 PM >To: linux-scsi@vger.kernel.org; megaraidlinux@avagotech.com; James E. >J. >Bottomley; Kashyap Desai; Martin K. Petersen; Sumit Saxena; Uday Lingala >Cc: LKML; kernel-janit...@vger.kernel.org; Julia Lawall >Subject: [PATCH] megaraid_sas: Use memdup_user() rather than duplicating >its >implementation > >From: Markus Elfring>Date: Sun, 21 Aug 2016 10:39:04 +0200 > >Reuse existing functionality from memdup_user() instead of keeping >duplicate >source code. > >This issue was detected by using the Coccinelle software. > >Signed-off-by: Markus Elfring >--- > drivers/scsi/megaraid/megaraid_sas_base.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >b/drivers/scsi/megaraid/megaraid_sas_base.c >index c1ed25a..9a2fe4e 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_base.c >+++ b/drivers/scsi/megaraid/megaraid_sas_base.c >@@ -6711,14 +6711,9 @@ static int megasas_mgmt_ioctl_fw(struct file *file, >unsigned long arg) > unsigned long flags; > u32 wait_time = MEGASAS_RESET_WAIT_TIME; > >- ioc = kmalloc(sizeof(*ioc), GFP_KERNEL); >- if (!ioc) >- return -ENOMEM; >- >- if (copy_from_user(ioc, user_ioc, sizeof(*ioc))) { >- error = -EFAULT; >- goto out_kfree_ioc; >- } >+ ioc = memdup_user(user_ioc, sizeof(*ioc)); >+ if (IS_ERR(ioc)) >+ return PTR_ERR(ioc); > > instance = megasas_lookup_instance(ioc->host_no); > if (!instance) { Acked by: Sumit Saxena >-- >2.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 2/3] ses: use scsi_is_sas_rphy instead of is_sas_attached
Hi Johannes, [auto build test ERROR on scsi/for-next] [also build test ERROR on v4.8-rc3] [cannot apply to next-20160822] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on] [Check https://git-scm.com/docs/git-format-patch for more information] url: https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Fix-panic-when-a-SES-device-is-attached-to-a-hpsa-logical-volume/20160815-231901 base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next config: x86_64-randconfig-n0-08182202 (attached as .config) compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers/built-in.o: In function `ses_match_to_enclosure': >> ses.c:(.text+0x548dae): undefined reference to `scsi_is_sas_rphy' --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
[PATCH 1/6] target: fix hang in target_wait_for_sess_cmds()
When shutting down a session I'm seeing this hung_task message: INFO: task kworker/u128:0:6 blocked for more than 480 seconds. Tainted: GE N 4.4.17-default+ #241 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u128:0 D 880237615c00 0 6 2 0x Workqueue: fc_rport_eq fc_rport_work [libfc] 88017edd7c48 81c11500 88017edd0180 88017edd8000 88035b6860a8 88017edd0180 88036461eac0 88017edd7c60 815db4d5 7fff 88017edd7d10 Call Trace: [] schedule+0x35/0x80 [] schedule_timeout+0x237/0x2d0 [] wait_for_completion+0xa3/0x110 [] target_wait_for_sess_cmds+0x45/0x190 [target_core_mod] [] ft_close_sess+0x24/0x30 [tcm_fc] [] ft_prlo+0x8a/0x95 [tcm_fc] [] fc_rport_work+0x3b8/0x7c0 [libfc] [] process_one_work+0x14e/0x410 [] worker_thread+0x116/0x490 Looking at the code it looks as if there is some confusion about when to call 'wait_for_completion(>cmd_wait_comp). The problem is that there are _two_ sections where the code waits for 'cmd_wait_comp' completion, but the completion is only called with 'complete', not 'complete_all'. So we cannot have both waiting for it, doing so will lead to a stuck wait_for_completion. Signed-off-by: Hannes Reinecke--- drivers/target/target_core_transport.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 6094a6b..2e1a6d8 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2522,7 +2522,7 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks) * the remaining calls to target_put_sess_cmd(), and not the * callers of this function. */ - if (aborted) { + if (aborted && !cmd->cmd_wait_set) { pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag); wait_for_completion(>cmd_wait_comp); cmd->se_tfo->release_cmd(cmd); @@ -2642,9 +2642,11 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess) list_for_each_entry(se_cmd, _sess->sess_wait_list, se_cmd_list) { rc = kref_get_unless_zero(_cmd->cmd_kref); if (rc) { - se_cmd->cmd_wait_set = 1; spin_lock(_cmd->t_state_lock); - se_cmd->transport_state |= CMD_T_FABRIC_STOP; + if (!(se_cmd->transport_state & CMD_T_FABRIC_STOP)) { + se_cmd->cmd_wait_set = 1; + se_cmd->transport_state |= CMD_T_FABRIC_STOP; + } spin_unlock(_cmd->t_state_lock); } } @@ -2664,24 +2666,26 @@ void target_wait_for_sess_cmds(struct se_session *se_sess) list_for_each_entry_safe(se_cmd, tmp_cmd, _sess->sess_wait_list, se_cmd_list) { + int cmd_wait; pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:" " %d\n", se_cmd, se_cmd->t_state, se_cmd->se_tfo->get_cmd_state(se_cmd)); spin_lock_irqsave(_cmd->t_state_lock, flags); tas = (se_cmd->transport_state & CMD_T_TAS); + cmd_wait = se_cmd->cmd_wait_set; spin_unlock_irqrestore(_cmd->t_state_lock, flags); - if (!target_put_sess_cmd(se_cmd)) { if (tas) target_put_sess_cmd(se_cmd); } - + if (!cmd_wait) + continue; wait_for_completion(_cmd->cmd_wait_comp); pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d" " fabric state: %d\n", se_cmd, se_cmd->t_state, se_cmd->se_tfo->get_cmd_state(se_cmd)); - + se_cmd->cmd_wait_set = 0; se_cmd->se_tfo->release_cmd(se_cmd); } -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting()
target_sess_cmd_list_waiting() might hit on a condition where the kref for the command is already 0, but the destructor has not been called yet (or is stuck in waiting for a spin lock). Rather than leaving the command on the list we should explicitly remove it to avoid race issues later on. Signed-off-by: Hannes Reinecke--- drivers/target/target_core_transport.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 2e1a6d8..ce136f0 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2547,8 +2547,8 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref) * fabric acknowledgement that requires two target_put_sess_cmd() * invocations before se_cmd descriptor release. */ - if (ack_kref) - kref_get(_cmd->cmd_kref); + if (ack_kref && !kref_get_unless_zero(_cmd->cmd_kref)) + return -EINVAL; spin_lock_irqsave(_sess->sess_cmd_lock, flags); if (se_sess->sess_tearing_down) { @@ -2627,7 +2627,7 @@ EXPORT_SYMBOL(target_put_sess_cmd); */ void target_sess_cmd_list_set_waiting(struct se_session *se_sess) { - struct se_cmd *se_cmd; + struct se_cmd *se_cmd, *tmp_cmd; unsigned long flags; int rc; @@ -2639,7 +2639,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess) se_sess->sess_tearing_down = 1; list_splice_init(_sess->sess_cmd_list, _sess->sess_wait_list); - list_for_each_entry(se_cmd, _sess->sess_wait_list, se_cmd_list) { + list_for_each_entry_safe(se_cmd, tmp_cmd, +_sess->sess_wait_list, se_cmd_list) { rc = kref_get_unless_zero(_cmd->cmd_kref); if (rc) { spin_lock(_cmd->t_state_lock); @@ -2648,7 +2649,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess) se_cmd->transport_state |= CMD_T_FABRIC_STOP; } spin_unlock(_cmd->t_state_lock); - } + } else + list_del_init(_cmd->se_cmd_list); } spin_unlock_irqrestore(_sess->sess_cmd_lock, flags); -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage
Update the debug statements to match those from libfc. Signed-off-by: Hannes Reinecke--- drivers/target/tcm_fc/tfc_sess.c | 37 ++--- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c index cc8c261..fd5c3de 100644 --- a/drivers/target/tcm_fc/tfc_sess.c +++ b/drivers/target/tcm_fc/tfc_sess.c @@ -39,6 +39,11 @@ #include "tcm_fc.h" +#define TFC_SESS_DBG(lport, fmt, args...) \ + pr_debug("host%u: rport %6.6x: " fmt, \ +(lport)->host->host_no, \ +(lport)->port_id, ##args ) + static void ft_sess_delete_all(struct ft_tport *); /* @@ -167,24 +172,29 @@ static struct ft_sess *ft_sess_get(struct fc_lport *lport, u32 port_id) struct ft_tport *tport; struct hlist_head *head; struct ft_sess *sess; + char *reason = "no session created"; rcu_read_lock(); tport = rcu_dereference(lport->prov[FC_TYPE_FCP]); - if (!tport) + if (!tport) { + reason = "not an FCP port"; goto out; + } head = >hash[ft_sess_hash(port_id)]; hlist_for_each_entry_rcu(sess, head, hash) { if (sess->port_id == port_id) { kref_get(>kref); rcu_read_unlock(); - pr_debug("port_id %x found %p\n", port_id, sess); + TFC_SESS_DBG(lport, "port_id %x found %p\n", +port_id, sess); return sess; } } out: rcu_read_unlock(); - pr_debug("port_id %x not found\n", port_id); + TFC_SESS_DBG(lport, "port_id %x not found, %s\n", +port_id, reason); return NULL; } @@ -195,7 +205,7 @@ static int ft_sess_alloc_cb(struct se_portal_group *se_tpg, struct ft_tport *tport = sess->tport; struct hlist_head *head = >hash[ft_sess_hash(sess->port_id)]; - pr_debug("port_id %x sess %p\n", sess->port_id, sess); + TFC_SESS_DBG(tport->lport, "port_id %x sess %p\n", sess->port_id, sess); hlist_add_head_rcu(>hash, head); tport->sess_count++; @@ -320,7 +330,7 @@ void ft_sess_close(struct se_session *se_sess) mutex_unlock(_lport_lock); return; } - pr_debug("port_id %x\n", port_id); + TFC_SESS_DBG(sess->tport->lport, "port_id %x close session\n", port_id); ft_sess_unhash(sess); mutex_unlock(_lport_lock); ft_close_sess(sess); @@ -380,8 +390,13 @@ static int ft_prli_locked(struct fc_rport_priv *rdata, u32 spp_len, if (!(fcp_parm & FCP_SPPF_INIT_FCN)) return FC_SPP_RESP_CONF; sess = ft_sess_create(tport, rdata->ids.port_id, rdata); - if (!sess) - return FC_SPP_RESP_RES; + if (IS_ERR(sess)) { + if (PTR_ERR(sess) == -EACCES) { + spp->spp_flags &= ~FC_SPP_EST_IMG_PAIR; + return FC_SPP_RESP_CONF; + } else + return FC_SPP_RESP_RES; + } if (!sess->params) rdata->prli_count++; sess->params = fcp_parm; @@ -424,8 +439,8 @@ static int ft_prli(struct fc_rport_priv *rdata, u32 spp_len, mutex_lock(_lport_lock); ret = ft_prli_locked(rdata, spp_len, rspp, spp); mutex_unlock(_lport_lock); - pr_debug("port_id %x flags %x ret %x\n", - rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret); + TFC_SESS_DBG(rdata->local_port, "port_id %x flags %x ret %x\n", +rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret); return ret; } @@ -478,11 +493,11 @@ static void ft_recv(struct fc_lport *lport, struct fc_frame *fp) struct ft_sess *sess; u32 sid = fc_frame_sid(fp); - pr_debug("sid %x\n", sid); + TFC_SESS_DBG(lport, "recv sid %x\n", sid); sess = ft_sess_get(lport, sid); if (!sess) { - pr_debug("sid %x sess lookup failed\n", sid); + TFC_SESS_DBG(lport, "sid %x sess lookup failed\n", sid); /* TBD XXX - if FCP_CMND, send PRLO */ fc_frame_free(fp); return; -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] target/tcm_fc: print command pointer in debug message
When allocating a new command we should add the pointer to the debug statements; that allows us to match this with other debug statements for handling data. Signed-off-by: Hannes Reinecke--- drivers/target/tcm_fc/tfc_cmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c index 216e18c..36f0864 100644 --- a/drivers/target/tcm_fc/tfc_cmd.c +++ b/drivers/target/tcm_fc/tfc_cmd.c @@ -575,7 +575,7 @@ static void ft_send_work(struct work_struct *work) TARGET_SCF_ACK_KREF)) goto err; - pr_debug("r_ctl %x alloc target_submit_cmd\n", fh->fh_r_ctl); + pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd); return; err: -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] target/tcm_fc: use CPU affinity for responses
The libfc stack assigns exchange IDs based on the CPU the request was received on, so we need to send the responses via the same CPU. Otherwise the send logic gets confuses and responses will be delayed, causing exchange timeouts on the initiator side. Signed-off-by: Hannes Reinecke--- drivers/target/tcm_fc/tfc_cmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c index 36f0864..ff5de9a 100644 --- a/drivers/target/tcm_fc/tfc_cmd.c +++ b/drivers/target/tcm_fc/tfc_cmd.c @@ -572,7 +572,7 @@ static void ft_send_work(struct work_struct *work) if (target_submit_cmd(>se_cmd, cmd->sess->se_sess, fcp->fc_cdb, >ft_sense_buffer[0], scsilun_to_int(>fc_lun), ntohl(fcp->fc_dl), task_attr, data_dir, - TARGET_SCF_ACK_KREF)) + TARGET_SCF_ACK_KREF | TARGET_SCF_USE_CPUID)) goto err; pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd); -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create()
Not every failure is due to out-of-memory; the ACLs might not be set, too. So return a detailed error code in ft_sess_create() instead of just a NULL pointer. Signed-off-by: Hannes Reinecke--- drivers/target/tcm_fc/tfc_sess.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c index 6ffbb60..cc8c261 100644 --- a/drivers/target/tcm_fc/tfc_sess.c +++ b/drivers/target/tcm_fc/tfc_sess.c @@ -223,7 +223,7 @@ static struct ft_sess *ft_sess_create(struct ft_tport *tport, u32 port_id, sess = kzalloc(sizeof(*sess), GFP_KERNEL); if (!sess) - return NULL; + return ERR_PTR(-ENOMEM); kref_init(>kref); /* ref for table entry */ sess->tport = tport; @@ -234,8 +234,9 @@ static struct ft_sess *ft_sess_create(struct ft_tport *tport, u32 port_id, TARGET_PROT_NORMAL, [0], sess, ft_sess_alloc_cb); if (IS_ERR(sess->se_sess)) { + int rc = PTR_ERR(sess->se_sess); kfree(sess); - return NULL; + sess = ERR_PTR(rc); } return sess; } -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] tcm_fc fixes
Hi Nic, here's a set of patches and fixes I've found during debugging FCoE over virtio. The first two fix a race condition I've encountered during starting and stopping initiators; they are similar to your prior patches, but I still think they should be applied to avoid any race issues. The next three are just minor cleanups, but the last one is _absolutely_ crucial. With the current code tcm_fc will behave _very_ odd if you're running in a virtualized environment, as the xid allocator in libfc will be thouroughly thrashed by sending frames on _other_ cpus than those the originator frame got received on. This causes frames to be delayed (by up to several minutes) leading to I/O errors on the initiator side. As usual, comments and reviews are welcome. Hannes Reinecke (6): target: fix hang in target_wait_for_sess_cmds() target: fix potential race window in target_sess_cmd_list_waiting() target/tcm_fc: print command pointer in debug message target/tcm_fc: return detailed error in ft_sess_create() target/tcm_fc: Update debugging statements to match libfc usage target/tcm_fc: use CPU affinity for responses drivers/target/target_core_transport.c | 28 ++- drivers/target/tcm_fc/tfc_cmd.c| 4 ++-- drivers/target/tcm_fc/tfc_sess.c | 42 +++--- 3 files changed, 48 insertions(+), 26 deletions(-) -- 1.8.5.6 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors
On 08/22/2016 06:34 AM, Shaun Tancheff wrote: > Currently the RB-Tree zone cache is fast and flexible. It does > use a rather largish amount of ram. This model reduces the ram > required from 120 bytes per zone to 16 bytes per zone with a > moderate transformation of the blk_zone_lookup() api. > > This model is predicated on the belief that most variations > on zoned media will follow a pattern of using collections of same > sized zones on a single device. Similar to the pattern of erase > blocks on flash devices being progressivly larger 16K, 64K, ... > > The goal is to be able to build a descriptor which is both memory > efficient, performant, and flexible. > > Signed-off-by: Shaun Tancheff> --- > block/blk-core.c |2 +- > block/blk-sysfs.c | 31 +- > block/blk-zoned.c | 103 +++-- > drivers/scsi/sd.c |5 +- > drivers/scsi/sd.h |4 +- > drivers/scsi/sd_zbc.c | 1025 > +++- > include/linux/blkdev.h | 82 +++- > 7 files changed, 716 insertions(+), 536 deletions(-) > Have you measure the performance impact here? The main idea behind using an RB-tree is that each single element will fit in the CPU cache; using an array will prevent that. So we will increase the number of cache flushes, and most likely a performance penalty, too. Hence I'd rather like to see a performance measurement here before going down that road. Cheers, Hannes -- Dr. Hannes ReineckeTeamlead Storage & Networking h...@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html