Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same

2016-08-22 Thread Damien Le Moal

Shaun,

On 8/23/16 09:22, Shaun Tancheff wrote:
> On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal  
> wrote:
>>
>> Shaun,
>>
>> On 8/22/16 13:31, Shaun Tancheff wrote:
>> [...]
>>> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
>>> -  sector_t sector, unsigned int num_sectors)
>>> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd)
>>>  {
>>> - struct blk_zone *zone;
>>> + struct request *rq = cmd->request;
>>> + struct scsi_device *sdp = cmd->device;
>>> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>>> + sector_t sector = blk_rq_pos(rq);
>>> + unsigned int nr_sectors = blk_rq_sectors(rq);
>>>   int ret = BLKPREP_OK;
>>> + struct blk_zone *zone;
>>>   unsigned long flags;
>>> + u32 wp_offset;
>>> + bool use_write_same = false;
>>>
>>>   zone = blk_lookup_zone(rq->q, sector);
>>> - if (!zone)
>>> + if (!zone) {
>>> + /* Test for a runt zone before giving up */
>>> + if (sdp->type != TYPE_ZBC) {
>>> + struct request_queue *q = rq->q;
>>> + struct rb_node *node;
>>> +
>>> + node = rb_last(>zones);
>>> + if (node)
>>> + zone = rb_entry(node, struct blk_zone, node);
>>> + if (zone) {
>>> + spin_lock_irqsave(>lock, flags);
>>> + if ((zone->start + zone->len) <= sector)
>>> + goto out;
>>> + spin_unlock_irqrestore(>lock, flags);
>>> + zone = NULL;
>>> + }
>>> + }
>>>   return BLKPREP_KILL;
>>> + }
>>
>> I do not understand the point of this code here to test for the runt
>> zone. As long as sector is within the device maximum capacity (in 512B
>> unit), blk_lookup_zone will return the pointer to the zone structure
>> containing that sector (the RB-tree does not have any constraint
>> regarding zone size). The only case where NULL would be returned is if
>> discard is issued super early right after the disk is probed and before
>> the zone refresh work has completed. We can certainly protect against
>> that by delaying the discard.
> 
> As you can see I am not including Host Managed in the
> runt check.

Indeed, but having a runt zone could also happen with a host-managed disk.

> Also you may note that in my patch to get Host Aware working
> with the zone cache I do not include the runt zone in the cache.

Why not ? The RB-tree will handle it just fine (the insert and lookup
code as Hannes had them was not relying on a constant zone size).

> So as it sits I need this fallback otherwise doing blkdiscard over
> the whole device ends in a error, as well as mkfs.f2fs et. al.

Got it, but I do not see a problem with including it. I have not checked
the code, but the split of a big discard call into "chunks" should be
already handling the last chunk and make sure that the operation does
not exceed the device capacity (in any case, that's easy to fix in the
sd_zbc_setup_discard code).

>>>  zone->start, zone->state);
>>>   ret = BLKPREP_DEFER;
>>>   goto out;
>>> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, 
>>> struct request *rq,
>>>   if (zone->state == BLK_ZONE_OFFLINE) {
>>>   /* let the drive fail the command */
>>>   sd_zbc_debug_ratelimit(sdkp,
>>> -"Discarding offline zone %zu\n",
>>> +"Discarding offline zone %zx\n",
>>>  zone->start);
>>>   goto out;
>>>   }
>>> -
>>> - if (!blk_zone_is_smr(zone)) {
>>> + if (blk_zone_is_cmr(zone)) {
>>> + use_write_same = true;
>>>   sd_zbc_debug_ratelimit(sdkp,
>>> -"Discarding %s zone %zu\n",
>>> -blk_zone_is_cmr(zone) ? "CMR" : 
>>> "unknown",
>>> +"Discarding CMR zone %zx\n",
>>>  zone->start);
>>> - ret = BLKPREP_DONE;
>>>   goto out;
>>>   }
>>
>> Some 10TB host managed disks out there have 1% conventional zone space,
>> that is 100GB of capacity. When issuing a "reset all", doing a write
>> same in these zones will take forever... If the user really wants zeroes
>> in those zones, let it issue a zeroout.
>>
>> I think that it would a better choice to simply not report
>> discard_zeroes_data as true and do nothing for conventional zones reset.
> 
> I think that would be unfortunate for Host Managed but I think it's
> the right choice for Host Aware at this time. So either we base
> it on disk type or we have some other config flag added to sysfs.

I do not see any 

Re: [PATCH v2 15/29] be2iscsi: Fix to make boot discovery non-blocking

2016-08-22 Thread Julia Lawall
Please check the return on line 1517.  It looks suspicious that it does
not release the lock, unlike the return on line 1508.

julia

---

Hi Jitendra,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.8-rc3 next-20160822]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Jitendra-Bhivare/be2iscsi-driver-update-11-2-0-0/20160819-175550
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
:: branch date: 4 days ago
:: commit date: 4 days ago

>> drivers/scsi/be2iscsi/be_mgmt.c:1517:2-8: preceding lock on line 1504

git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout b9f267a2cf5f152fb5b5b907b715e0854f337ccd
vim +1517 drivers/scsi/be2iscsi/be_mgmt.c

b9f267a2 Jitendra Bhivare 2016-08-19  1498  struct be_cmd_get_session_req 
*req;
b9f267a2 Jitendra Bhivare 2016-08-19  1499  struct be_dma_mem *nonemb_cmd;
b9f267a2 Jitendra Bhivare 2016-08-19  1500  struct be_mcc_wrb *wrb;
b9f267a2 Jitendra Bhivare 2016-08-19  1501  struct be_sge *sge;
b9f267a2 Jitendra Bhivare 2016-08-19  1502  unsigned int tag;
b9f267a2 Jitendra Bhivare 2016-08-19  1503
b9f267a2 Jitendra Bhivare 2016-08-19 @1504  mutex_lock(>mbox_lock);
b9f267a2 Jitendra Bhivare 2016-08-19  1505  wrb = alloc_mcc_wrb(phba, );
b9f267a2 Jitendra Bhivare 2016-08-19  1506  if (!wrb) {
b9f267a2 Jitendra Bhivare 2016-08-19  1507  
mutex_unlock(>mbox_lock);
b9f267a2 Jitendra Bhivare 2016-08-19  1508  return 0;
b9f267a2 Jitendra Bhivare 2016-08-19  1509  }
b9f267a2 Jitendra Bhivare 2016-08-19  1510
b9f267a2 Jitendra Bhivare 2016-08-19  1511  nonemb_cmd = 
>boot_struct.nonemb_cmd;
b9f267a2 Jitendra Bhivare 2016-08-19  1512  nonemb_cmd->size = 
sizeof(*resp);
b9f267a2 Jitendra Bhivare 2016-08-19  1513  nonemb_cmd->va = 
pci_alloc_consistent(phba->ctrl.pdev,
b9f267a2 Jitendra Bhivare 2016-08-19  1514  
  sizeof(nonemb_cmd->size),
b9f267a2 Jitendra Bhivare 2016-08-19  1515  
  _cmd->dma);
b9f267a2 Jitendra Bhivare 2016-08-19  1516  if (!nonemb_cmd->va)
b9f267a2 Jitendra Bhivare 2016-08-19 @1517  return 0;
b9f267a2 Jitendra Bhivare 2016-08-19  1518
b9f267a2 Jitendra Bhivare 2016-08-19  1519  req = nonemb_cmd->va;
b9f267a2 Jitendra Bhivare 2016-08-19  1520  memset(req, 0, sizeof(*req));

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal  wrote:
>
> Shaun,
>
> On 8/22/16 13:31, Shaun Tancheff wrote:
> [...]
>> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
>> -  sector_t sector, unsigned int num_sectors)
>> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd)
>>  {
>> - struct blk_zone *zone;
>> + struct request *rq = cmd->request;
>> + struct scsi_device *sdp = cmd->device;
>> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>> + sector_t sector = blk_rq_pos(rq);
>> + unsigned int nr_sectors = blk_rq_sectors(rq);
>>   int ret = BLKPREP_OK;
>> + struct blk_zone *zone;
>>   unsigned long flags;
>> + u32 wp_offset;
>> + bool use_write_same = false;
>>
>>   zone = blk_lookup_zone(rq->q, sector);
>> - if (!zone)
>> + if (!zone) {
>> + /* Test for a runt zone before giving up */
>> + if (sdp->type != TYPE_ZBC) {
>> + struct request_queue *q = rq->q;
>> + struct rb_node *node;
>> +
>> + node = rb_last(>zones);
>> + if (node)
>> + zone = rb_entry(node, struct blk_zone, node);
>> + if (zone) {
>> + spin_lock_irqsave(>lock, flags);
>> + if ((zone->start + zone->len) <= sector)
>> + goto out;
>> + spin_unlock_irqrestore(>lock, flags);
>> + zone = NULL;
>> + }
>> + }
>>   return BLKPREP_KILL;
>> + }
>
> I do not understand the point of this code here to test for the runt
> zone. As long as sector is within the device maximum capacity (in 512B
> unit), blk_lookup_zone will return the pointer to the zone structure
> containing that sector (the RB-tree does not have any constraint
> regarding zone size). The only case where NULL would be returned is if
> discard is issued super early right after the disk is probed and before
> the zone refresh work has completed. We can certainly protect against
> that by delaying the discard.

As you can see I am not including Host Managed in the
runt check.

Also you may note that in my patch to get Host Aware working
with the zone cache I do not include the runt zone in the cache.
So as it sits I need this fallback otherwise doing blkdiscard over
the whole device ends in a error, as well as mkfs.f2fs et. al.

>>   spin_lock_irqsave(>lock, flags);
>> -
>>   if (zone->state == BLK_ZONE_UNKNOWN ||
>>   zone->state == BLK_ZONE_BUSY) {
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding zone %zu state %x, 
>> deferring\n",
>> +"Discarding zone %zx state %x, 
>> deferring\n",
>
> Sector values are usually displayed in decimal. Why use Hex here ? At
> least "0x" would be needed to avoid confusion I think.

Yeah, my brain is lazy about converting very large
numbers to powers of 2. So it's much easier to spot
zone alignment here.



>>  zone->start, zone->state);
>>   ret = BLKPREP_DEFER;
>>   goto out;
>> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, 
>> struct request *rq,
>>   if (zone->state == BLK_ZONE_OFFLINE) {
>>   /* let the drive fail the command */
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding offline zone %zu\n",
>> +"Discarding offline zone %zx\n",
>>  zone->start);
>>   goto out;
>>   }
>> -
>> - if (!blk_zone_is_smr(zone)) {
>> + if (blk_zone_is_cmr(zone)) {
>> + use_write_same = true;
>>   sd_zbc_debug_ratelimit(sdkp,
>> -"Discarding %s zone %zu\n",
>> -blk_zone_is_cmr(zone) ? "CMR" : 
>> "unknown",
>> +"Discarding CMR zone %zx\n",
>>  zone->start);
>> - ret = BLKPREP_DONE;
>>   goto out;
>>   }
>
> Some 10TB host managed disks out there have 1% conventional zone space,
> that is 100GB of capacity. When issuing a "reset all", doing a write
> same in these zones will take forever... If the user really wants zeroes
> in those zones, let it issue a zeroout.
>
> I think that it would a better choice to simply not report
> discard_zeroes_data as true and do nothing for conventional zones reset.

I think that would be unfortunate for Host Managed but I think it's
the right choice for Host Aware at this time. So either we base
it on disk type or we have some other config flag added to sysfs.

>> - if (blk_zone_is_empty(zone)) {
>> - sd_zbc_debug_ratelimit(sdkp,

Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same

2016-08-22 Thread Damien Le Moal

Shaun,

On 8/22/16 13:31, Shaun Tancheff wrote:
[...]
> -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq,
> -  sector_t sector, unsigned int num_sectors)
> +int sd_zbc_setup_discard(struct scsi_cmnd *cmd)
>  {
> - struct blk_zone *zone;
> + struct request *rq = cmd->request;
> + struct scsi_device *sdp = cmd->device;
> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> + sector_t sector = blk_rq_pos(rq);
> + unsigned int nr_sectors = blk_rq_sectors(rq);
>   int ret = BLKPREP_OK;
> + struct blk_zone *zone;
>   unsigned long flags;
> + u32 wp_offset;
> + bool use_write_same = false;
>  
>   zone = blk_lookup_zone(rq->q, sector);
> - if (!zone)
> + if (!zone) {
> + /* Test for a runt zone before giving up */
> + if (sdp->type != TYPE_ZBC) {
> + struct request_queue *q = rq->q;
> + struct rb_node *node;
> +
> + node = rb_last(>zones);
> + if (node)
> + zone = rb_entry(node, struct blk_zone, node);
> + if (zone) {
> + spin_lock_irqsave(>lock, flags);
> + if ((zone->start + zone->len) <= sector)
> + goto out;
> + spin_unlock_irqrestore(>lock, flags);
> + zone = NULL;
> + }
> + }
>   return BLKPREP_KILL;
> + }

I do not understand the point of this code here to test for the runt
zone. As long as sector is within the device maximum capacity (in 512B
unit), blk_lookup_zone will return the pointer to the zone structure
containing that sector (the RB-tree does not have any constraint
regarding zone size). The only case where NULL would be returned is if
discard is issued super early right after the disk is probed and before
the zone refresh work has completed. We can certainly protect against
that by delaying the discard.


>  
>   spin_lock_irqsave(>lock, flags);
> -
>   if (zone->state == BLK_ZONE_UNKNOWN ||
>   zone->state == BLK_ZONE_BUSY) {
>   sd_zbc_debug_ratelimit(sdkp,
> -"Discarding zone %zu state %x, 
> deferring\n",
> +"Discarding zone %zx state %x, 
> deferring\n",

Sector values are usually displayed in decimal. Why use Hex here ? At
least "0x" would be needed to avoid confusion I think.

>  zone->start, zone->state);
>   ret = BLKPREP_DEFER;
>   goto out;
> @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct 
> request *rq,
>   if (zone->state == BLK_ZONE_OFFLINE) {
>   /* let the drive fail the command */
>   sd_zbc_debug_ratelimit(sdkp,
> -"Discarding offline zone %zu\n",
> +"Discarding offline zone %zx\n",
>  zone->start);
>   goto out;
>   }
> -
> - if (!blk_zone_is_smr(zone)) {
> + if (blk_zone_is_cmr(zone)) {
> + use_write_same = true;
>   sd_zbc_debug_ratelimit(sdkp,
> -"Discarding %s zone %zu\n",
> -blk_zone_is_cmr(zone) ? "CMR" : 
> "unknown",
> +"Discarding CMR zone %zx\n",
>  zone->start);
> - ret = BLKPREP_DONE;
>   goto out;
>   }

Some 10TB host managed disks out there have 1% conventional zone space,
that is 100GB of capacity. When issuing a "reset all", doing a write
same in these zones will take forever... If the user really wants zeroes
in those zones, let it issue a zeroout.

I think that it would a better choice to simply not report
discard_zeroes_data as true and do nothing for conventional zones reset.

> - if (blk_zone_is_empty(zone)) {
> - sd_zbc_debug_ratelimit(sdkp,
> -"Discarding empty zone %zu\n",
> -zone->start);
> - ret = BLKPREP_DONE;
> + if (zone->start != sector || zone->len < nr_sectors) {
> + sd_printk(KERN_ERR, sdkp,
> +   "Misaligned RESET WP %zx/%x on zone %zx/%zx\n",
> +   sector, nr_sectors, zone->start, zone->len);
> + ret = BLKPREP_KILL;
>   goto out;
>   }
> -
> - if (zone->start != sector ||
> - zone->len < num_sectors) {
> + /* Protect against Reset WP when more data had been written to the
> +  * zone than is being discarded.
> +  */
> + wp_offset = zone->wp - zone->start;
> + if (wp_offset > nr_sectors) {
>   sd_printk(KERN_ERR, sdkp,
> -   

[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway

2016-08-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=151661

--- Comment #11 from Piotr Szymaniak  ---
Created attachment 229761
  --> https://bugzilla.kernel.org/attachment.cgi?id=229761=edit
config-4.7.0

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway

2016-08-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=151661

--- Comment #10 from Piotr Szymaniak  ---
Created attachment 229751
  --> https://bugzilla.kernel.org/attachment.cgi?id=229751=edit
config-4.3.6

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway

2016-08-22 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=151661

--- Comment #9 from Piotr Szymaniak  ---
Created attachment 229741
  --> https://bugzilla.kernel.org/attachment.cgi?id=229741=edit
dmesg vanilla 4.3.6 (with Alex patch)

I tried previous vanilla kernels 4.x.latest-released and, as 4.6.x prints
messages over and over I skipped 4.5 series and tried:
- 4.4.19 - prints dead messages
- 4.3.6 - works without messages (with Alex patch [1], intel_iommu=igfx_off)

Not sure if this helps, but I hope it will. Any suggestions how to proceed from
here?

[1] https://www.redhat.com/archives/vfio-users/2016-July/msg00063.html

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Tom Yan
On 22 August 2016 at 20:32, Shaun Tancheff  wrote:
> On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan  wrote:
>> I don't see how that's possible. count / n_block will always be
>> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
>> that isn't even a "buffer limit" anyway. By SG_IO do you mean like
>> SCSI Write Same commands that issued with sg_write_same or so? If
>> that's the case, that's what exactly commit 5c79097a28c2
>> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
>> limit") is for.
>
> Ah, I see. You are guarding the only user of ata_set_lba_range_entries().

Yup. It is the only right thing to do anyway, that we leave the
function "open" and guard per context when we use it. Say if
ata_set_lba_range_entries() is gonna be a function that is shared by
others, it would only make this commit more important. As I said, we
did not guard it with a certain buffer limit, but merely redundantly
guard it with a ("humanized") limit that applies to TRIM only.

> Still if you are going to do that you have to alert any new user that they
> must have an appropriately sized buffer to be overwriting.
>
> Better to move it out of ata.h then the limit the scope of accidental
> misuse?

I am not sure if it is really necessary but that's fine to me. I see
that you have been doing it in your SCT Write Same patch series.
Probably I can/should just leave the move to you?

>
> Regards,
> Shaun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 3:07 PM, Tom Yan  wrote:
> I don't see how that's possible. count / n_block will always be
> smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
> that isn't even a "buffer limit" anyway. By SG_IO do you mean like
> SCSI Write Same commands that issued with sg_write_same or so? If
> that's the case, that's what exactly commit 5c79097a28c2
> ("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
> limit") is for.

Ah, I see. You are guarding the only user of ata_set_lba_range_entries().
Still if you are going to do that you have to alert any new user that they
must have an appropriately sized buffer to be overwriting.

Better to move it out of ata.h then the limit the scope of accidental
misuse?

Regards,
Shaun
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] aacraid: Use memdup_user() rather than duplicating its implementation

2016-08-22 Thread SF Markus Elfring
>> @@ -526,15 +526,9 @@ static int aac_send_raw_srb(struct aac_dev* dev, void
>> __user * arg)
>> goto cleanup;
>> }
>>
>> -   user_srbcmd = kmalloc(fibsize, GFP_KERNEL);
>> -   if (!user_srbcmd) {
>> -   dprintk((KERN_DEBUG"aacraid: Could not make a copy of the 
>> srb\n"));
>> -   rcode = -ENOMEM;
>> -   goto cleanup;
>> -   }
>> -   if(copy_from_user(user_srbcmd, user_srb,fibsize)){
>> -   dprintk((KERN_DEBUG"aacraid: Could not copy srb from 
>> user\n"));
>> -   rcode = -EFAULT;
>> +   user_srbcmd = memdup_user(user_srb, fibsize);
>> +   if (IS_ERR(user_srbcmd)) {
>> +   rcode = PTR_ERR(user_srbcmd);
>> goto cleanup;
>> }
>>
>> --
> 
> Hi Markus,
> 
> Patch 2/7 should precede Patch 1/7, as falling into kfree() would not look 
> pretty.

Do you eventually prefer that this source code adjustment should be combined 
with
the update suggestion "[2/7] aacraid: One function call less in 
aac_send_raw_srb()
after error detection" in a single update step?

Regards,
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Tom Yan
I don't see how that's possible. count / n_block will always be
smaller than 65535 * ATA_MAX_TRIM_RNUM(64) = 4194240. Not to mention
that isn't even a "buffer limit" anyway. By SG_IO do you mean like
SCSI Write Same commands that issued with sg_write_same or so? If
that's the case, that's what exactly commit 5c79097a28c2
("libata-scsi: reject WRITE SAME (16) with n_block that exceeds
limit") is for.

On 23 August 2016 at 03:58, Shaun Tancheff  wrote:
> On Mon, Aug 22, 2016 at 1:55 PM,   wrote:
>> From: Tom Yan 
>>
>> In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with
>> n_block that exceeds limit"), it is made sure that
>> ata_set_lba_range_entries() will never be called with a request
>> size (n_block) that is larger than the number of blocks that a
>> 512-byte block TRIM payload can describe (65535 * 64 = 4194240),
>> in addition to acknowlegding the SCSI/block layer with the same
>> limit by advertising it as the Maximum Write Same Length.
>>
>> Therefore, it is unnecessary to hard code the same limit in
>> ata_set_lba_range_entries() itself, which would only cost extra
>> maintenance effort. Such effort can be noticed in, for example,
>> commit 2983860c7668 ("libata-scsi: avoid repeated calculation of
>> number of TRIM ranges").
>>
>> Signed-off-by: Tom Yan 
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index be9c76c..9b74ecb 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct 
>> ata_queued_cmd *qc)
>> buf = page_address(sg_page(scsi_sglist(scmd)));
>>
>> if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>> -   size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, 
>> block, n_block);
>> +   size = ata_set_lba_range_entries(buf, block, n_block);
>> } else {
>> fp = 2;
>> goto invalid_fld;
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index adbc812..5e2e9ad 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>>   * TO NV CACHE PINNED SET.
>>   */
>>  static inline unsigned ata_set_lba_range_entries(void *_buffer,
>> -   unsigned num, u64 sector, unsigned long count)
>> +   u64 sector, unsigned long count)
>>  {
>> __le64 *buffer = _buffer;
>> unsigned i = 0, used_bytes;
>>
>> -   while (i < num) {
>> -   u64 entry = sector |
>> -   ((u64)(count > 0x ? 0x : count) << 48);
>> +   while (count > 0) {
>> +   u64 range, entry;
>> +
>> +   range = count > 0x ? 0x : count;
>> +   entry = sector | (range << 48);
>> buffer[i++] = __cpu_to_le64(entry);
>> -   if (count <= 0x)
>> -   break;
>> -   count -= 0x;
>> -   sector += 0x;
>> +   count -= range;
>> +   sector += range;
>> }
>
> I think the problem here is that I can now inject a buffer overflow
> via SG_IO.
>
>> used_bytes = ALIGN(i * 8, 512);
>> --
>> 2.9.3
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 1:55 PM,   wrote:
> From: Tom Yan 
>
> In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with
> n_block that exceeds limit"), it is made sure that
> ata_set_lba_range_entries() will never be called with a request
> size (n_block) that is larger than the number of blocks that a
> 512-byte block TRIM payload can describe (65535 * 64 = 4194240),
> in addition to acknowlegding the SCSI/block layer with the same
> limit by advertising it as the Maximum Write Same Length.
>
> Therefore, it is unnecessary to hard code the same limit in
> ata_set_lba_range_entries() itself, which would only cost extra
> maintenance effort. Such effort can be noticed in, for example,
> commit 2983860c7668 ("libata-scsi: avoid repeated calculation of
> number of TRIM ranges").
>
> Signed-off-by: Tom Yan 
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index be9c76c..9b74ecb 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct 
> ata_queued_cmd *qc)
> buf = page_address(sg_page(scsi_sglist(scmd)));
>
> if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
> -   size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, 
> block, n_block);
> +   size = ata_set_lba_range_entries(buf, block, n_block);
> } else {
> fp = 2;
> goto invalid_fld;
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index adbc812..5e2e9ad 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>   * TO NV CACHE PINNED SET.
>   */
>  static inline unsigned ata_set_lba_range_entries(void *_buffer,
> -   unsigned num, u64 sector, unsigned long count)
> +   u64 sector, unsigned long count)
>  {
> __le64 *buffer = _buffer;
> unsigned i = 0, used_bytes;
>
> -   while (i < num) {
> -   u64 entry = sector |
> -   ((u64)(count > 0x ? 0x : count) << 48);
> +   while (count > 0) {
> +   u64 range, entry;
> +
> +   range = count > 0x ? 0x : count;
> +   entry = sector | (range << 48);
> buffer[i++] = __cpu_to_le64(entry);
> -   if (count <= 0x)
> -   break;
> -   count -= 0x;
> -   sector += 0x;
> +   count -= range;
> +   sector += range;
> }

I think the problem here is that I can now inject a buffer overflow
via SG_IO.

> used_bytes = ALIGN(i * 8, 512);
> --
> 2.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ata: do not hard code limit in ata_set_lba_range_entries()

2016-08-22 Thread tom . ty89
From: Tom Yan 

In commit 5c79097a28c2 ("libata-scsi: reject WRITE SAME (16) with
n_block that exceeds limit"), it is made sure that
ata_set_lba_range_entries() will never be called with a request
size (n_block) that is larger than the number of blocks that a
512-byte block TRIM payload can describe (65535 * 64 = 4194240),
in addition to acknowlegding the SCSI/block layer with the same
limit by advertising it as the Maximum Write Same Length.

Therefore, it is unnecessary to hard code the same limit in
ata_set_lba_range_entries() itself, which would only cost extra
maintenance effort. Such effort can be noticed in, for example,
commit 2983860c7668 ("libata-scsi: avoid repeated calculation of
number of TRIM ranges").

Signed-off-by: Tom Yan 

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index be9c76c..9b74ecb 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3322,7 +3322,7 @@ static unsigned int ata_scsi_write_same_xlat(struct 
ata_queued_cmd *qc)
buf = page_address(sg_page(scsi_sglist(scmd)));
 
if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
-   size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, 
n_block);
+   size = ata_set_lba_range_entries(buf, block, n_block);
} else {
fp = 2;
goto invalid_fld;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index adbc812..5e2e9ad 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -1077,19 +1077,19 @@ static inline void ata_id_to_hd_driveid(u16 *id)
  * TO NV CACHE PINNED SET.
  */
 static inline unsigned ata_set_lba_range_entries(void *_buffer,
-   unsigned num, u64 sector, unsigned long count)
+   u64 sector, unsigned long count)
 {
__le64 *buffer = _buffer;
unsigned i = 0, used_bytes;
 
-   while (i < num) {
-   u64 entry = sector |
-   ((u64)(count > 0x ? 0x : count) << 48);
+   while (count > 0) {
+   u64 range, entry;
+
+   range = count > 0x ? 0x : count;
+   entry = sector | (range << 48);
buffer[i++] = __cpu_to_le64(entry);
-   if (count <= 0x)
-   break;
-   count -= 0x;
-   sector += 0x;
+   count -= range;
+   sector += range;
}
 
used_bytes = ALIGN(i * 8, 512);
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/7] aacraid: Use memdup_user() rather than duplicating its implementation

2016-08-22 Thread David Carroll
> From: Markus Elfring 
> Date: Sat, 20 Aug 2016 20:05:24 +0200
> 
> Reuse existing functionality from memdup_user() instead of keeping duplicate
> source code.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/scsi/aacraid/commctrl.c | 12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
> index 5648b71..1af3084 100644
> --- a/drivers/scsi/aacraid/commctrl.c
> +++ b/drivers/scsi/aacraid/commctrl.c
> @@ -526,15 +526,9 @@ static int aac_send_raw_srb(struct aac_dev* dev, void
> __user * arg)
> goto cleanup;
> }
> 
> -   user_srbcmd = kmalloc(fibsize, GFP_KERNEL);
> -   if (!user_srbcmd) {
> -   dprintk((KERN_DEBUG"aacraid: Could not make a copy of the 
> srb\n"));
> -   rcode = -ENOMEM;
> -   goto cleanup;
> -   }
> -   if(copy_from_user(user_srbcmd, user_srb,fibsize)){
> -   dprintk((KERN_DEBUG"aacraid: Could not copy srb from 
> user\n"));
> -   rcode = -EFAULT;
> +   user_srbcmd = memdup_user(user_srb, fibsize);
> +   if (IS_ERR(user_srbcmd)) {
> +   rcode = PTR_ERR(user_srbcmd);
> goto cleanup;
> }
> 
> --

Hi Markus,

Patch 2/7 should precede Patch 1/7, as falling into kfree() would not look 
pretty.

Thanks, -Dave


Re: [PATCH 0/3] nvme-fabrics: Add NVME FC Transport support to lpfc

2016-08-22 Thread James Smart



On 8/11/2016 2:24 PM, Christoph Hellwig wrote:

On Fri, Jul 22, 2016 at 05:24:00PM -0700, James Smart wrote:

This patchset adds NVME support to the lpfc FC driver.  It reworks
the core driver for both NVME and SCSI protocol support, then adds the
nvme-over-fabrics host and target interfaces which connect to the
NVME FC transport lower-level api.

Patches are cut against the linux-nvme for-4.8/drivers branch

Before reviewing the details:  what's the plan for merging this as we'll
have dependencies on both the SCSI and the block tree?


Good question.   With the first round of lpfc patches, I tried to ensure 
we had all the cross-protocol infrastructure in place such that the 
driver can have 2 halves that effectively run independently.  I'm hoping 
it means that I can submit scsi stuff independent from nvme stuff, but 
we'll have to see how it goes over time.


I'm open to suggestions on how best to manage the 2 trees - e.g. scsi 
stuff goes in scsi tree, and if a dependency then has to be pulled into 
block ?  nvme stuff only to block. The initial lpfc patch set should 
have all the scsi infrastructure pulled into the 4.8 merge, so hopefully 
that dependency has been resolved already.


-- james


Any recommendations ?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] nvme-fabrics: Add FC LLDD loopback driver to test FC host and target transport

2016-08-22 Thread James Smart



On 8/11/2016 2:23 PM, Christoph Hellwig wrote:

On Fri, Jul 22, 2016 at 05:23:59PM -0700, James Smart wrote:

Add FC LLDD loopback driver to test FC host and target transport within
nvme-fabrics

To aid in the development and testing of the lower-level api of the FC
transport, this loopback driver has been created to act as if it were a
FC hba driver supporting both the host interfaces as well as the target
interfaces with the nvme FC transport.

A proper FC loop device should support SCSI and NVMe.  Any chance to
morph it into one that does both?


Nope. Not really appropriate for this module, whose goals were to test 
the fc-nvme interfaces. SCSI is a whole different beast.


-- james

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] nvme-fabrics: Add target FC transport support

2016-08-22 Thread James Smart



On 8/11/2016 2:22 PM, Christoph Hellwig wrote:



+   queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
+   assoc->tgtport->fc_target_port.port_num,
+   assoc->a_id, qid);

Do we really need a workqueue per queue?


I thought we did - so there was always an execution context for the 
nvmet layer when a command is received. My understanding was the nvmet 
layer processes the cmd synchronously. I avoided per cpu - as I assumed 
the synchronous flows would end up blocking other connections to other 
subsystems/controllers.


If that's not the case - can you describe the nvmet layer better so I 
understand what we need ?





No reference counting?



yes - will make a pass at it.

-- james

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] nvme-fabrics: Add host FC transport support

2016-08-22 Thread James Smart



Are you going to send a FC support patch for nvme-cli as well?


I will, but concentrating on core support for now. I expect a fair 
number of things need to be done in the cli.






+   assoc_rqst->assoc_cmd.ersp_ratio = cpu_to_be16(ersp_ratio);
+   assoc_rqst->assoc_cmd.sqsize = cpu_to_be16(qsize);
+   /* TODO:
+* assoc_rqst->assoc_cmd.cntlid = cpu_to_be16(?);
+* strncpy(assoc_rqst->assoc_cmd.hostid, ?,
+*  min(FCNVME_ASSOC_HOSTID_LEN, NVMF_NQN_SIZE));
+* strncpy(assoc_rqst->assoc_cmd.hostnqn, ?,
+*  min(FCNVME_ASSOC_HOSTNQN_LEN, NVMF_NQN_SIZE];
+*/

What is the problem with filling all these out based on the information
in the nvme_host structure?


Well, I want to get them from the same place that the fabrics routines 
do when they build a capsule. So they should come from the opts struct. 
It's missing the controller id, so I'll need to address that.





This probably needs to be moved, similar to the patch Sagi just
did for RDMA.  In general it might be a good idea to look at the
various recent RDMA changes and check if they need to be brought over.


yeah - I typically do a resync every time I repost.





+   /*
+* TODO: blk_integrity_rq(rq)  for DIX
+*/

We'll hopefully be able to just move the DIX handling to common code.
Any help appreciated..


Yes and noand likely a separate discussion.  If DIF is passed, 
you should involve the host port to validate on egress as well has what 
the target controller will do. Although it brings up some headaches if 
the host detects an error (e.g. protocol requires target to be part of 
it).  If DIF is inserted, the question will be whether you want to 
insert DIF at the egress point of the host port or do you pass it over 
the fabric w/o dif and have the tgt controller insert it.  I assume 
we'll need to talk more.




This seems to miss reference counting on the lport and rport structure.

yes - an area of weakness right now.

It would be good to sort this out before merging. 


I'll make a pass at it.

-- james


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] nvme-fabrics: Add FC LLDD loopback driver to test FC host and target transport

2016-08-22 Thread James Smart



On 8/2/2016 5:15 PM, J Freyensee wrote:



  +int
+fcloop_fcp_op(struct nvmet_fc_target_port *tgtport,
+   struct nvmefc_tgt_fcp_req *tgt_fcpreq)
+{
+   ...
+
+   return 0;

if this function returns an 'int', why would it always return 0 and not
the fcp_err values (if there is an error)?


I'll look at it. The reason it's carried in the tgt_fcpreq is that a 
real LLDD would likely return from the fcp_op, and report an error later 
on. I'll see if things should be reorg'd.






+struct nvme_fc_port_template fctemplate = {
+   .create_queue   = fcloop_create_queue,
+   .delete_queue   = fcloop_delete_queue,
+   .ls_req = fcloop_ls_req,
+   .fcp_io = fcloop_fcp_req,
+   .ls_abort   = fcloop_ls_abort,
+   .fcp_abort  = fcloop_fcp_abort,
+
+   .max_hw_queues  = 1,
+   .max_sgl_segments = 256,
+   .max_dif_sgl_segments = 256,
+   .dma_boundary = 0x,

Between here and "struct nvmet_fc_target_template tgttemplate" they are
assigning the same magic values to the same variable names, so why not
have these values as #defines for a tad easier maintainability?


sure.





+static ssize_t
+fcloop_create_local_port(struct device *dev, struct device_attribute
*attr,
+   const char *buf, size_t count)
+{
+   struct nvme_fc_port_info pinfo;
+   struct fcloop_ctrl_options *opts;
+   struct nvme_fc_local_port *localport;
+   struct fcloop_lport *lport;
+   int ret;
+
+   opts = kzalloc(sizeof(*opts), GFP_KERNEL);
+   if (!opts)
+   return -ENOMEM;
+
+   ret = fcloop_parse_options(opts, buf);
+   if (ret)
+   goto out_free_opts;
+
+   /* everything there ? */
+   if ((opts->mask & LPORT_OPTS) != LPORT_OPTS) {
+   ret = -EINVAL;
+   goto out_free_opts;
+   }
+
+   pinfo.fabric_name = opts->fabric;
+   pinfo.node_name = opts->wwnn;
+   pinfo.port_name = opts->wwpn;
+   pinfo.port_role = opts->roles;
+   pinfo.port_id = opts->fcaddr;
+
+   ret = nvme_fc_register_localport(, , NULL,
);
+   if (!ret) {
+   /* success */
+   lport = localport->private;
+   lport->localport = localport;
+   INIT_LIST_HEAD(>list);
+   INIT_LIST_HEAD(>rport_list);
+   list_add_tail(>list, _lports);
+
+   /* mark all of the input buffer consumed */
+   ret = count;
+   }
+
+out_free_opts:
+   kfree(opts);
+   return ret;
+}
+
+static int __delete_local_port(struct fcloop_lport *lport)
+{
+   int ret;
+
+   if (!list_empty(>rport_list))
+   return -EBUSY;
+
+   list_del(>list);
+

Is a mutex or locking mechanism not needed here for this list?


Yeah - could probably use. As it was mainly a test tool with a 
controlled sequence, I think I ignored it.


-- james


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] nvme-fabrics: Add target FC transport support

2016-08-22 Thread James Smart



On 8/1/2016 9:37 PM, J Freyensee wrote:



+static void
+nvmet_fc_free_target_queue(struct nvmet_fc_tgt_queue *queue)
+{
+   struct nvmet_fc_tgtport *tgtport = queue->assoc->tgtport;
+   unsigned long flags;
+
+   /*
+* beware: nvmet layer hangs waiting for a completion if
+* connect command failed
+*/
+   flush_workqueue(queue->work_q);
+   if (queue->connected)
+   nvmet_sq_destroy(>nvme_sq);

I was wondering if there is any way for this fc target layer to fake
send an NVMe completion to the nvmet layer to prevent a nvmet layer
hang (because I'm assuming the nvmet layer hangs because it never
receives a connect completion upon failure here), then send a signal to
tear down the sq.

Or alternatively call nvmet_ctrl_fatal_error() if connect fails as a
trial/alternative to having the nvmet layer hang?


I haven't tried to fix the bugs in the other layers - enough to do in FC 
already.  Agree to look into this.





+
+/**
+ * nvme_fc_register_targetport - transport entry point called by an
+ *  LLDD to register the existence of a
local
+ *  NVME subystem FC port.
+ * @pinfo: pointer to information about the port to be
registered
+ * @template:  LLDD entrypoints and operational parameters for the
port
+ * @dev:   physical hardware device node port corresponds to.
Will be
+ * used for DMA mappings
+ * @tgtport_p:   pointer to a local port pointer. Upon success, the


looks like the variable tgtport_p does not exist (or it's now called
portptr)?


Yeah - comment not in sync with the code's name.





+/*
+ * Actual processing routine for received FC-NVME LS Requests from
the LLD
+ */
+void
+nvmet_fc_handle_ls_rqst(struct nvmet_fc_tgtport *tgtport,
+   struct nvmet_fc_ls_iod *iod)
+{
+   struct fcnvme_ls_rqst_w0 *w0 =
+   (struct fcnvme_ls_rqst_w0 *)iod->rqstbuf;
+
+   iod->lsreq->nvmet_fc_private = iod;
+   iod->lsreq->rspbuf = iod->rspbuf;
+   iod->lsreq->rspdma = iod->rspdma;
+   iod->lsreq->done = nvmet_fc_xmt_ls_rsp_done;
+   /* Be preventative. handlers will later set to valid length
*/
+   iod->lsreq->rsplen = 0;
+
+   iod->assoc = NULL;
+
+   /*
+* handlers:
+*   parse request input, set up nvmet req (cmd, rsp,
  execute)
+*   and format the LS response
+* if non-zero returned, then no futher action taken on the
LS
+* if zero:
+*   valid to call nvmet layer if execute routine set
+*   iod->rspbuf contains ls response
+*/
+   switch (w0->ls_cmd) {
+   case FCNVME_LS_CREATE_ASSOCIATION:
+   /* Creates Association and initial Admin
Queue/Connection */
+   nvmet_fc_ls_create_association(tgtport, iod);
+   break;
+   case FCNVME_LS_CREATE_CONNECTION:
+   /* Creates an IO Queue/Connection */
+   nvmet_fc_ls_create_connection(tgtport, iod);
+   break;
+   case FCNVME_LS_DISCONNECT:
+   /* Terminate a Queue/Connection or the Association
*/
+   nvmet_fc_ls_disconnect(tgtport, iod);
+   break;
+   default:
+   iod->lsreq->rsplen = nvmet_fc_format_rjt(iod
->rspbuf,
+   NVME_FC_MAX_LS_BUFFER_SIZE, w0
->ls_cmd,
+   LSRJT_REASON_INVALID_ELS_CODE,
+   LSRJT_EXPL_NO_EXPLANATION, 0);
+   }
+
+   nvmet_fc_xmt_ls_rsp(tgtport, iod);
+}

All the functions in the case() (nvmet_fc_ls_disconnect(),
nvmet_fc_ls_create_association(), etc)  have internal return values
(errors and certain values), I'm curious why you wouldn't want to
bubble up those values through the function calling chain?  Especially
since there is a comment in the function above that says "if non-zero
returned, then no futher action taken on the LS".


just style. they all generate a response payload that is then always 
sent, so returning a code didn't really mean much.  The comment is old. 
They used to return the value, but as you can see, they don't now. Will 
clean it up.





.

+
+static int __init nvmet_fc_init_module(void)
+{
+   /* ensure NVMET_NR_QUEUES is a power of 2 - required for our
masks */
+   if (!is_power_of_2((unsigned long)NVMET_NR_QUEUES)) {
+   pr_err("%s: NVMET_NR_QUEUES required to be power of
2\n",
+   __func__);

If this is so important that NVMET_NR_QUEUES always be a power of two
for this FC driver, I'd have the function print out the value in the
error message for easier diagnosis.


no problem.




And why mask the sign of NVMET_NR_QUEUES?  Yes, it would have to be a
really careless programmer error that would be caught very quick if the
#define became negative, but masking out the sign of a #define value
that seems to be pretty important for FC transport functionality makes
me a tad 

Re: [PATCH 3/5] nvme-fabrics: Add host FC transport support

2016-08-22 Thread James Smart



On 7/29/2016 3:10 PM, J Freyensee wrote:

+   /* TODO:
+* assoc_rqst->assoc_cmd.cntlid = cpu_to_be16(?);
+* strncpy(assoc_rqst->assoc_cmd.hostid, ?,
+*  min(FCNVME_ASSOC_HOSTID_LEN, NVMF_NQN_SIZE));
+* strncpy(assoc_rqst->assoc_cmd.hostnqn, ?,
+*  min(FCNVME_ASSOC_HOSTNQN_LEN, NVMF_NQN_SIZE];
+*/

What is the TODO here?


main todo is that the ctrlid is not known yet - not set in the opts 
struct yet. It's set within the fabrics connect
routines that build the capsule, which are called after this transport 
action.  So the todo is to modify
the core to add it to the opts struct as well. Then these lines can be 
added in.






more snip...



+
+static int
+nvme_fc_init_queue(struct nvme_fc_ctrl *ctrl, int idx, size_t
queue_size)
+{
+   ...
+
+   return 0;

Slightly confused.  Looks like in nvme_fc_configure_admin_queue() and
nvme_fc_init_io_queues() check for this function returning an error,
but nvme_fc_init_queue() never returns anything but 0.  Should it
return an error?  Does the comments above imply that this function
could change in the future such that it would return something other
than 0?

more more snip...


+
+static int
+nvme_fc_init_io_queues(struct nvme_fc_ctrl *ctrl)
+{
+   ...
+
+   return 0;

Right now as-is nvme_fc_init_queue() will always return 0, but this
function is hard-coded to return 0.  Independent of what
nvme_fc_init_queue() returns, this function should be returning 'ret'
as "nvme_fc_create_io_queues()" has code to check if this function
fails:


+static int
+nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
+{

.
.
.

+   dev_info(ctrl->ctrl.device, "creating %d I/O queues.\n",
+   opts->nr_io_queues);
+
+   ret = nvme_fc_init_io_queues(ctrl);
+   if (ret)
+   return ret;
+


Well, at the time it was written, it wasn't clear it would always return 
0. As it does, I'll see about cleaning it up.






+static int
+nvme_fc_start_fcp_op(struct nvme_fc_ctrl *ctrl, struct nvme_fc_queue
*queue,
+   struct nvme_fc_fcp_op *op, u32 data_len,
+   enum nvmefc_fcp_datadir io_dir)
+{
+   ...
+
+   if (op->rq)
+   blk_mq_start_request(op->rq);
+
+   ret = ctrl->lport->ops->fcp_io(>lport->localport,
+   >rport->remoteport,
+   queue->lldd_handle, 
->fcp_req);
+
+   if (ret) {
+   dev_err(ctrl->dev,
+   "Send nvme command failed - lldd returned
%d.\n", ret);

see a bit lower for a comment on this if(ret) evaluating to TRUE.


+
+   if (op->rq) {/* normal
request */
+   nvme_fc_unmap_data(ctrl, op->rq, op);
+   nvme_cleanup_cmd(op->rq);
+   if (ret != -EBUSY) {
+   /* complete the io w/ error status
*/
+   blk_mq_complete_request(op->rq,
+   NVME_SC_FC_TRANSPORT
_ERROR);
+   } else {
+   blk_mq_stop_hw_queues(op->rq->q);
+   nvme_requeue_req(op->rq);
+   blk_mq_delay_queue(queue->hctx,
+   NVMEFC_QUEUE_DELAY);
+   }
+   } else {/* aen */
+   struct nvme_completion *cqe = 
->rsp_iu.cqe;
+
+   cqe->status = (NVME_SC_FC_TRANSPORT_ERROR <<
1);
+   nvme_complete_async_event(>ctrl
->ctrl, cqe);
+   }
+   }
+
+   return BLK_MQ_RQ_QUEUE_OK;

Is this right?  We want to return 'BLK_MQ_RQ_QUEUE_OK' and not
something to signify the 'ret' value was not 0?


yeah - this could probably use fixing. I should be returning 
BLK_MQ_RQ_QUEUE_BUSY for some of the cases and BLK_MQ_RQ_QUEUE_ERROR on 
some others.






+static struct nvme_ctrl *
+__nvme_fc_create_ctrl(struct device *dev, struct nvmf_ctrl_options
*opts,
+   struct nvme_fc_lport *lport, struct nvme_fc_rport *rport)
+{
+   struct nvme_fc_ctrl *ctrl;
+   int ret;
+   bool changed;
+
+   ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
+   if (!ctrl)
+   return ERR_PTR(-ENOMEM);
+   ctrl->ctrl.opts = opts;
+   INIT_LIST_HEAD(>ctrl_list);
+   ctrl->lport = lport;
+   ctrl->l_id = lport->localport.port_num;
+   ctrl->rport = rport;
+   ctrl->r_id = rport->remoteport.port_num;
+   ctrl->dev = lport->dev;
+
+   ret = nvme_init_ctrl(>ctrl, dev, _fc_ctrl_ops,
0);
+   if (ret)
+   goto out_free_ctrl;
+
+   INIT_WORK(>delete_work, nvme_fc_del_ctrl_work);
+   spin_lock_init(>lock);
+
+   /* io queue count */
+   ctrl->queue_count = min_t(unsigned int,
+   opts->nr_io_queues,
+   

Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors

2016-08-22 Thread Shaun Tancheff
On Mon, Aug 22, 2016 at 2:11 AM, Hannes Reinecke  wrote:
> On 08/22/2016 06:34 AM, Shaun Tancheff wrote:
>> Currently the RB-Tree zone cache is fast and flexible. It does
>> use a rather largish amount of ram. This model reduces the ram
>> required from 120 bytes per zone to 16 bytes per zone with a
>> moderate transformation of the blk_zone_lookup() api.
>>
>> This model is predicated on the belief that most variations
>> on zoned media will follow a pattern of using collections of same
>> sized zones on a single device. Similar to the pattern of erase
>> blocks on flash devices being progressivly larger 16K, 64K, ...
>>
>> The goal is to be able to build a descriptor which is both memory
>> efficient, performant, and flexible.
>>
>> Signed-off-by: Shaun Tancheff 
>> ---
>>  block/blk-core.c   |2 +-
>>  block/blk-sysfs.c  |   31 +-
>>  block/blk-zoned.c  |  103 +++--
>>  drivers/scsi/sd.c  |5 +-
>>  drivers/scsi/sd.h  |4 +-
>>  drivers/scsi/sd_zbc.c  | 1025 
>> +++-
>>  include/linux/blkdev.h |   82 +++-
>>  7 files changed, 716 insertions(+), 536 deletions(-)

> Have you measure the performance impact here?

As far as actual hardware (HostAware) I am seeing the same
I/O performance. I suspect its just that below 100k iops the
zone cache just isn't a bottleneck.

> The main idea behind using an RB-tree is that each single element will
> fit in the CPU cache; using an array will prevent that.
> So we will increase the number of cache flushes, and most likely a
> performance penalty, too.
> Hence I'd rather like to see a performance measurement here before going
> down that road.

I think it will have to be a simulated benchmark, if that's okay.

Of course I'm open to suggestions if there is something you have in mind.
-- 
Regards,
Shaun Tancheff
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi:Prevent deletion of SCSI block device in use

2016-08-22 Thread kbuild test robot
Hi Vasundhara,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.8-rc3 next-20160822]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Vasundhara-Gurunath/scsi-Prevent-deletion-of-SCSI-block-device-in-use/20160819-184126
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: x86_64-randconfig-s2-08191801 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/scsi/scsi_sysfs.c: In function 'sdev_store_delete':
>> drivers/scsi/scsi_sysfs.c:744: warning: format '%d' expects type 'int', but 
>> argument 5 has type 'long int'

vim +744 drivers/scsi/scsi_sysfs.c

   728  sdev->delete_msg_buf =
   729  kmalloc(128, GFP_KERNEL);
   730  memset(sdev->delete_msg_buf, 0, 128);
   731  }
   732  do_gettimeofday();
   733  time_to_tm(tv.tv_sec, 0, );
   734  sprintf(sdev->delete_msg_buf,
   735  "Last delete attempt: %d:%d:%d 
%02d:%02d\n"
   736  "Open Count : %d\n"
   737  "IO Active Count : %d\n"
   738  "IO Done Count : %d\n",
   739  tms.tm_mday, tms.tm_mon + 1,
   740  tms.tm_year + 1900,
   741  tms.tm_hour, tms.tm_min,
   742  sdev->usage_count,
   743  sdev->iorequest_cnt.counter,
 > 744  sdev->iodone_cnt.counter);
   745  
   746  return count;
   747  }
   748  }
   749  
   750  if (device_remove_file_self(dev, attr))
   751  scsi_remove_device(to_scsi_device(dev));
   752  return count;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


RE: [PATCH] megaraid_sas: Use memdup_user() rather than duplicating its implementation

2016-08-22 Thread Sumit Saxena
>-Original Message-
>From: SF Markus Elfring [mailto:elfr...@users.sourceforge.net]
>Sent: Sunday, August 21, 2016 2:19 PM
>To: linux-scsi@vger.kernel.org; megaraidlinux@avagotech.com; James E.
>J.
>Bottomley; Kashyap Desai; Martin K. Petersen; Sumit Saxena; Uday Lingala
>Cc: LKML; kernel-janit...@vger.kernel.org; Julia Lawall
>Subject: [PATCH] megaraid_sas: Use memdup_user() rather than duplicating
>its
>implementation
>
>From: Markus Elfring 
>Date: Sun, 21 Aug 2016 10:39:04 +0200
>
>Reuse existing functionality from memdup_user() instead of keeping
>duplicate
>source code.
>
>This issue was detected by using the Coccinelle software.
>
>Signed-off-by: Markus Elfring 
>---
> drivers/scsi/megaraid/megaraid_sas_base.c | 11 +++
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>b/drivers/scsi/megaraid/megaraid_sas_base.c
>index c1ed25a..9a2fe4e 100644
>--- a/drivers/scsi/megaraid/megaraid_sas_base.c
>+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>@@ -6711,14 +6711,9 @@ static int megasas_mgmt_ioctl_fw(struct file *file,
>unsigned long arg)
>   unsigned long flags;
>   u32 wait_time = MEGASAS_RESET_WAIT_TIME;
>
>-  ioc = kmalloc(sizeof(*ioc), GFP_KERNEL);
>-  if (!ioc)
>-  return -ENOMEM;
>-
>-  if (copy_from_user(ioc, user_ioc, sizeof(*ioc))) {
>-  error = -EFAULT;
>-  goto out_kfree_ioc;
>-  }
>+  ioc = memdup_user(user_ioc, sizeof(*ioc));
>+  if (IS_ERR(ioc))
>+  return PTR_ERR(ioc);
>
>   instance = megasas_lookup_instance(ioc->host_no);
>   if (!instance) {

Acked by: Sumit Saxena 

>--
>2.9.3
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 2/3] ses: use scsi_is_sas_rphy instead of is_sas_attached

2016-08-22 Thread kbuild test robot
Hi Johannes,

[auto build test ERROR on scsi/for-next]
[also build test ERROR on v4.8-rc3]
[cannot apply to next-20160822]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Johannes-Thumshirn/Fix-panic-when-a-SES-device-is-attached-to-a-hpsa-logical-volume/20160815-231901
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: x86_64-randconfig-n0-08182202 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `ses_match_to_enclosure':
>> ses.c:(.text+0x548dae): undefined reference to `scsi_is_sas_rphy'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


[PATCH 1/6] target: fix hang in target_wait_for_sess_cmds()

2016-08-22 Thread Hannes Reinecke
When shutting down a session I'm seeing this hung_task message:

INFO: task kworker/u128:0:6 blocked for more than 480 seconds.
  Tainted: GE   N  4.4.17-default+ #241
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u128:0  D 880237615c00 0 6  2 0x
 Workqueue: fc_rport_eq fc_rport_work [libfc]
 88017edd7c48 81c11500 88017edd0180 88017edd8000
 88035b6860a8 88017edd0180 88036461eac0 
 88017edd7c60 815db4d5 7fff 88017edd7d10
Call Trace:
 [] schedule+0x35/0x80
 [] schedule_timeout+0x237/0x2d0
 [] wait_for_completion+0xa3/0x110
 [] target_wait_for_sess_cmds+0x45/0x190 [target_core_mod]
 [] ft_close_sess+0x24/0x30 [tcm_fc]
 [] ft_prlo+0x8a/0x95 [tcm_fc]
 [] fc_rport_work+0x3b8/0x7c0 [libfc]
 [] process_one_work+0x14e/0x410
 [] worker_thread+0x116/0x490

Looking at the code it looks as if there is some confusion about
when to call 'wait_for_completion(>cmd_wait_comp).
The problem is that there are _two_ sections where the code
waits for 'cmd_wait_comp' completion, but the completion
is only called with 'complete', not 'complete_all'.
So we cannot have both waiting for it, doing so will lead
to a stuck wait_for_completion.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/target_core_transport.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 6094a6b..2e1a6d8 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2522,7 +2522,7 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int 
wait_for_tasks)
 * the remaining calls to target_put_sess_cmd(), and not the
 * callers of this function.
 */
-   if (aborted) {
+   if (aborted && !cmd->cmd_wait_set) {
pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
wait_for_completion(>cmd_wait_comp);
cmd->se_tfo->release_cmd(cmd);
@@ -2642,9 +2642,11 @@ void target_sess_cmd_list_set_waiting(struct se_session 
*se_sess)
list_for_each_entry(se_cmd, _sess->sess_wait_list, se_cmd_list) {
rc = kref_get_unless_zero(_cmd->cmd_kref);
if (rc) {
-   se_cmd->cmd_wait_set = 1;
spin_lock(_cmd->t_state_lock);
-   se_cmd->transport_state |= CMD_T_FABRIC_STOP;
+   if (!(se_cmd->transport_state & CMD_T_FABRIC_STOP)) {
+   se_cmd->cmd_wait_set = 1;
+   se_cmd->transport_state |= CMD_T_FABRIC_STOP;
+   }
spin_unlock(_cmd->t_state_lock);
}
}
@@ -2664,24 +2666,26 @@ void target_wait_for_sess_cmds(struct se_session 
*se_sess)
 
list_for_each_entry_safe(se_cmd, tmp_cmd,
_sess->sess_wait_list, se_cmd_list) {
+   int cmd_wait;
pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
" %d\n", se_cmd, se_cmd->t_state,
se_cmd->se_tfo->get_cmd_state(se_cmd));
 
spin_lock_irqsave(_cmd->t_state_lock, flags);
tas = (se_cmd->transport_state & CMD_T_TAS);
+   cmd_wait = se_cmd->cmd_wait_set;
spin_unlock_irqrestore(_cmd->t_state_lock, flags);
-
if (!target_put_sess_cmd(se_cmd)) {
if (tas)
target_put_sess_cmd(se_cmd);
}
-
+   if (!cmd_wait)
+   continue;
wait_for_completion(_cmd->cmd_wait_comp);
pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
" fabric state: %d\n", se_cmd, se_cmd->t_state,
se_cmd->se_tfo->get_cmd_state(se_cmd));
-
+   se_cmd->cmd_wait_set = 0;
se_cmd->se_tfo->release_cmd(se_cmd);
}
 
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting()

2016-08-22 Thread Hannes Reinecke
target_sess_cmd_list_waiting() might hit on a condition where
the kref for the command is already 0, but the destructor has
not been called yet (or is stuck in waiting for a spin lock).
Rather than leaving the command on the list we should explicitly
remove it to avoid race issues later on.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/target_core_transport.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 2e1a6d8..ce136f0 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2547,8 +2547,8 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool 
ack_kref)
 * fabric acknowledgement that requires two target_put_sess_cmd()
 * invocations before se_cmd descriptor release.
 */
-   if (ack_kref)
-   kref_get(_cmd->cmd_kref);
+   if (ack_kref && !kref_get_unless_zero(_cmd->cmd_kref))
+   return -EINVAL;
 
spin_lock_irqsave(_sess->sess_cmd_lock, flags);
if (se_sess->sess_tearing_down) {
@@ -2627,7 +2627,7 @@ EXPORT_SYMBOL(target_put_sess_cmd);
  */
 void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
 {
-   struct se_cmd *se_cmd;
+   struct se_cmd *se_cmd, *tmp_cmd;
unsigned long flags;
int rc;
 
@@ -2639,7 +2639,8 @@ void target_sess_cmd_list_set_waiting(struct se_session 
*se_sess)
se_sess->sess_tearing_down = 1;
list_splice_init(_sess->sess_cmd_list, _sess->sess_wait_list);
 
-   list_for_each_entry(se_cmd, _sess->sess_wait_list, se_cmd_list) {
+   list_for_each_entry_safe(se_cmd, tmp_cmd,
+_sess->sess_wait_list, se_cmd_list) {
rc = kref_get_unless_zero(_cmd->cmd_kref);
if (rc) {
spin_lock(_cmd->t_state_lock);
@@ -2648,7 +2649,8 @@ void target_sess_cmd_list_set_waiting(struct se_session 
*se_sess)
se_cmd->transport_state |= CMD_T_FABRIC_STOP;
}
spin_unlock(_cmd->t_state_lock);
-   }
+   } else
+   list_del_init(_cmd->se_cmd_list);
}
 
spin_unlock_irqrestore(_sess->sess_cmd_lock, flags);
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage

2016-08-22 Thread Hannes Reinecke
Update the debug statements to match those from libfc.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/tcm_fc/tfc_sess.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c
index cc8c261..fd5c3de 100644
--- a/drivers/target/tcm_fc/tfc_sess.c
+++ b/drivers/target/tcm_fc/tfc_sess.c
@@ -39,6 +39,11 @@
 
 #include "tcm_fc.h"
 
+#define TFC_SESS_DBG(lport, fmt, args...) \
+   pr_debug("host%u: rport %6.6x: " fmt,  \
+(lport)->host->host_no,   \
+(lport)->port_id, ##args )
+
 static void ft_sess_delete_all(struct ft_tport *);
 
 /*
@@ -167,24 +172,29 @@ static struct ft_sess *ft_sess_get(struct fc_lport 
*lport, u32 port_id)
struct ft_tport *tport;
struct hlist_head *head;
struct ft_sess *sess;
+   char *reason = "no session created";
 
rcu_read_lock();
tport = rcu_dereference(lport->prov[FC_TYPE_FCP]);
-   if (!tport)
+   if (!tport) {
+   reason = "not an FCP port";
goto out;
+   }
 
head = >hash[ft_sess_hash(port_id)];
hlist_for_each_entry_rcu(sess, head, hash) {
if (sess->port_id == port_id) {
kref_get(>kref);
rcu_read_unlock();
-   pr_debug("port_id %x found %p\n", port_id, sess);
+   TFC_SESS_DBG(lport, "port_id %x found %p\n",
+port_id, sess);
return sess;
}
}
 out:
rcu_read_unlock();
-   pr_debug("port_id %x not found\n", port_id);
+   TFC_SESS_DBG(lport, "port_id %x not found, %s\n",
+port_id, reason);
return NULL;
 }
 
@@ -195,7 +205,7 @@ static int ft_sess_alloc_cb(struct se_portal_group *se_tpg,
struct ft_tport *tport = sess->tport;
struct hlist_head *head = >hash[ft_sess_hash(sess->port_id)];
 
-   pr_debug("port_id %x sess %p\n", sess->port_id, sess);
+   TFC_SESS_DBG(tport->lport, "port_id %x sess %p\n", sess->port_id, sess);
hlist_add_head_rcu(>hash, head);
tport->sess_count++;
 
@@ -320,7 +330,7 @@ void ft_sess_close(struct se_session *se_sess)
mutex_unlock(_lport_lock);
return;
}
-   pr_debug("port_id %x\n", port_id);
+   TFC_SESS_DBG(sess->tport->lport, "port_id %x close session\n", port_id);
ft_sess_unhash(sess);
mutex_unlock(_lport_lock);
ft_close_sess(sess);
@@ -380,8 +390,13 @@ static int ft_prli_locked(struct fc_rport_priv *rdata, u32 
spp_len,
if (!(fcp_parm & FCP_SPPF_INIT_FCN))
return FC_SPP_RESP_CONF;
sess = ft_sess_create(tport, rdata->ids.port_id, rdata);
-   if (!sess)
-   return FC_SPP_RESP_RES;
+   if (IS_ERR(sess)) {
+   if (PTR_ERR(sess) == -EACCES) {
+   spp->spp_flags &= ~FC_SPP_EST_IMG_PAIR;
+   return FC_SPP_RESP_CONF;
+   } else
+   return FC_SPP_RESP_RES;
+   }
if (!sess->params)
rdata->prli_count++;
sess->params = fcp_parm;
@@ -424,8 +439,8 @@ static int ft_prli(struct fc_rport_priv *rdata, u32 spp_len,
mutex_lock(_lport_lock);
ret = ft_prli_locked(rdata, spp_len, rspp, spp);
mutex_unlock(_lport_lock);
-   pr_debug("port_id %x flags %x ret %x\n",
-  rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret);
+   TFC_SESS_DBG(rdata->local_port, "port_id %x flags %x ret %x\n",
+rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret);
return ret;
 }
 
@@ -478,11 +493,11 @@ static void ft_recv(struct fc_lport *lport, struct 
fc_frame *fp)
struct ft_sess *sess;
u32 sid = fc_frame_sid(fp);
 
-   pr_debug("sid %x\n", sid);
+   TFC_SESS_DBG(lport, "recv sid %x\n", sid);
 
sess = ft_sess_get(lport, sid);
if (!sess) {
-   pr_debug("sid %x sess lookup failed\n", sid);
+   TFC_SESS_DBG(lport, "sid %x sess lookup failed\n", sid);
/* TBD XXX - if FCP_CMND, send PRLO */
fc_frame_free(fp);
return;
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] target/tcm_fc: print command pointer in debug message

2016-08-22 Thread Hannes Reinecke
When allocating a new command we should add the pointer to the
debug statements; that allows us to match this with other debug
statements for handling data.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/tcm_fc/tfc_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 216e18c..36f0864 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -575,7 +575,7 @@ static void ft_send_work(struct work_struct *work)
  TARGET_SCF_ACK_KREF))
goto err;
 
-   pr_debug("r_ctl %x alloc target_submit_cmd\n", fh->fh_r_ctl);
+   pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd);
return;
 
 err:
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] target/tcm_fc: use CPU affinity for responses

2016-08-22 Thread Hannes Reinecke
The libfc stack assigns exchange IDs based on the CPU the request
was received on, so we need to send the responses via the same CPU.
Otherwise the send logic gets confuses and responses will be delayed,
causing exchange timeouts on the initiator side.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/tcm_fc/tfc_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 36f0864..ff5de9a 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -572,7 +572,7 @@ static void ft_send_work(struct work_struct *work)
if (target_submit_cmd(>se_cmd, cmd->sess->se_sess, fcp->fc_cdb,
  >ft_sense_buffer[0], 
scsilun_to_int(>fc_lun),
  ntohl(fcp->fc_dl), task_attr, data_dir,
- TARGET_SCF_ACK_KREF))
+ TARGET_SCF_ACK_KREF | TARGET_SCF_USE_CPUID))
goto err;
 
pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd);
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create()

2016-08-22 Thread Hannes Reinecke
Not every failure is due to out-of-memory; the ACLs might not be
set, too. So return a detailed error code in ft_sess_create()
instead of just a NULL pointer.

Signed-off-by: Hannes Reinecke 
---
 drivers/target/tcm_fc/tfc_sess.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c
index 6ffbb60..cc8c261 100644
--- a/drivers/target/tcm_fc/tfc_sess.c
+++ b/drivers/target/tcm_fc/tfc_sess.c
@@ -223,7 +223,7 @@ static struct ft_sess *ft_sess_create(struct ft_tport 
*tport, u32 port_id,
 
sess = kzalloc(sizeof(*sess), GFP_KERNEL);
if (!sess)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
 
kref_init(>kref); /* ref for table entry */
sess->tport = tport;
@@ -234,8 +234,9 @@ static struct ft_sess *ft_sess_create(struct ft_tport 
*tport, u32 port_id,
 TARGET_PROT_NORMAL, 
[0],
 sess, ft_sess_alloc_cb);
if (IS_ERR(sess->se_sess)) {
+   int rc = PTR_ERR(sess->se_sess);
kfree(sess);
-   return NULL;
+   sess = ERR_PTR(rc);
}
return sess;
 }
-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] tcm_fc fixes

2016-08-22 Thread Hannes Reinecke
Hi Nic,

here's a set of patches and fixes I've found during debugging
FCoE over virtio.
The first two fix a race condition I've encountered during starting
and stopping initiators; they are similar to your prior patches,
but I still think they should be applied to avoid any race issues.
The next three are just minor cleanups, but the last one is
_absolutely_ crucial.
With the current code tcm_fc will behave _very_ odd if you're
running in a virtualized environment, as the xid allocator in libfc
will be thouroughly thrashed by sending frames on _other_ cpus than
those the originator frame got received on.
This causes frames to be delayed (by up to several minutes)
leading to I/O errors on the initiator side.

As usual, comments and reviews are welcome.

Hannes Reinecke (6):
  target: fix hang in target_wait_for_sess_cmds()
  target: fix potential race window in target_sess_cmd_list_waiting()
  target/tcm_fc: print command pointer in debug message
  target/tcm_fc: return detailed error in ft_sess_create()
  target/tcm_fc: Update debugging statements to match libfc usage
  target/tcm_fc: use CPU affinity for responses

 drivers/target/target_core_transport.c | 28 ++-
 drivers/target/tcm_fc/tfc_cmd.c|  4 ++--
 drivers/target/tcm_fc/tfc_sess.c   | 42 +++---
 3 files changed, 48 insertions(+), 26 deletions(-)

-- 
1.8.5.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Migrate zone cache from RB-Tree to arrays of descriptors

2016-08-22 Thread Hannes Reinecke
On 08/22/2016 06:34 AM, Shaun Tancheff wrote:
> Currently the RB-Tree zone cache is fast and flexible. It does
> use a rather largish amount of ram. This model reduces the ram
> required from 120 bytes per zone to 16 bytes per zone with a
> moderate transformation of the blk_zone_lookup() api.
> 
> This model is predicated on the belief that most variations
> on zoned media will follow a pattern of using collections of same
> sized zones on a single device. Similar to the pattern of erase
> blocks on flash devices being progressivly larger 16K, 64K, ...
> 
> The goal is to be able to build a descriptor which is both memory
> efficient, performant, and flexible.
> 
> Signed-off-by: Shaun Tancheff 
> ---
>  block/blk-core.c   |2 +-
>  block/blk-sysfs.c  |   31 +-
>  block/blk-zoned.c  |  103 +++--
>  drivers/scsi/sd.c  |5 +-
>  drivers/scsi/sd.h  |4 +-
>  drivers/scsi/sd_zbc.c  | 1025 
> +++-
>  include/linux/blkdev.h |   82 +++-
>  7 files changed, 716 insertions(+), 536 deletions(-)
> 
Have you measure the performance impact here?
The main idea behind using an RB-tree is that each single element will
fit in the CPU cache; using an array will prevent that.
So we will increase the number of cache flushes, and most likely a
performance penalty, too.
Hence I'd rather like to see a performance measurement here before going
down that road.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html