Re: [PATCH] scsi: lpfc: fix linking against modular NVMe support

2017-03-21 Thread James Smart
Note: the patch I referenced 
(http://www.spinics.net/lists/linux-scsi/msg106102.html) replaced the 
one I think you referenced below 
(http://www.spinics.net/lists/linux-scsi/msg106024.html)


-- james


On 3/21/2017 7:23 PM, James Smart wrote:

Arnd,

All of the build issues, including building as modules, should have 
been resolved by the following patch:

http://www.spinics.net/lists/linux-scsi/msg106102.html

Am I missing something?

-- james


On 3/21/2017 6:09 AM, Arnd Bergmann wrote:

When LPFC is built-in but NVMe is a loadable module, we fail to
link the kernel:

drivers/scsi/built-in.o: In function `lpfc_nvme_create_localport':
(.text+0x156a82): undefined reference to `nvme_fc_register_localport'
drivers/scsi/built-in.o: In function `lpfc_nvme_destroy_localport':
(.text+0x156eaa): undefined reference to `nvme_fc_unregister_remoteport'

We can avoid this either by forcing lpfc to be a module, or by disabling
NVMe support in this case. This implements the former.

Fixes: 7d7080335f8d ("scsi: lpfc: Finalize Kconfig options for nvme")
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/Kconfig | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 3c52867dfe28..d145e0d90227 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1241,6 +1241,8 @@ config SCSI_LPFC
  tristate "Emulex LightPulse Fibre Channel Support"
  depends on PCI && SCSI
  depends on SCSI_FC_ATTRS
+depends on NVME_TARGET_FC || NVME_TARGET_FC=n
+depends on NVME_FC || NVME_FC=n
  select CRC_T10DIF
  ---help---
This lpfc driver supports the Emulex LightPulse






Re: [PATCH] scsi: lpfc: fix linking against modular NVMe support

2017-03-21 Thread James Smart

Arnd,

All of the build issues, including building as modules, should have been 
resolved by the following patch:

http://www.spinics.net/lists/linux-scsi/msg106102.html

Am I missing something?

-- james


On 3/21/2017 6:09 AM, Arnd Bergmann wrote:

When LPFC is built-in but NVMe is a loadable module, we fail to
link the kernel:

drivers/scsi/built-in.o: In function `lpfc_nvme_create_localport':
(.text+0x156a82): undefined reference to `nvme_fc_register_localport'
drivers/scsi/built-in.o: In function `lpfc_nvme_destroy_localport':
(.text+0x156eaa): undefined reference to `nvme_fc_unregister_remoteport'

We can avoid this either by forcing lpfc to be a module, or by disabling
NVMe support in this case. This implements the former.

Fixes: 7d7080335f8d ("scsi: lpfc: Finalize Kconfig options for nvme")
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/Kconfig | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 3c52867dfe28..d145e0d90227 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1241,6 +1241,8 @@ config SCSI_LPFC
tristate "Emulex LightPulse Fibre Channel Support"
depends on PCI && SCSI
depends on SCSI_FC_ATTRS
+   depends on NVME_TARGET_FC || NVME_TARGET_FC=n
+   depends on NVME_FC || NVME_FC=n
select CRC_T10DIF
---help---
This lpfc driver supports the Emulex LightPulse




Re: [PATCHv4 2/4] tcmu: Fix wrongly calculating of the base_command_size

2017-03-21 Thread Mike Christie
On 03/21/2017 04:36 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> The t_data_nents and t_bidi_data_nents are the numbers of
> segments, but there is no guarantee that the block size equals
> the segment size.
> 
> In the worst case all the blocks are discontiguous and each one
> needs its own iovec, that is to say: blocks == iovs. So just set
> the number of iovs to the block count needed by the tcmu cmd.
> 
> Signed-off-by: Xiubo Li 
> ---
>  drivers/target/target_core_user.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 65d475f..ae5b74f 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -408,6 +408,13 @@ static inline size_t tcmu_cmd_get_data_length(struct 
> tcmu_cmd *tcmu_cmd)
>   return data_length;
>  }
>  
> +static inline uint32_t tcmu_cmd_get_block_cnt(struct tcmu_cmd *tcmu_cmd)
> +{
> + size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
> +
> + return (data_length + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
> +}

Can you just do data_length / DATA_BLOCK_SIZE, since
tcmu_cmd_get_data_length() will already return data_length as a
multiple of DATA_BLOCK_SIZE?
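
A minimal sketch of that simplification (assuming, as patch 1/4 does,
that tcmu_cmd_get_data_length() already returns a value rounded up to
DATA_BLOCK_SIZE); illustrative only, not the patch as applied:

/* data_length is already a multiple of DATA_BLOCK_SIZE here, so a
 * plain division gives the same block count as the round-up divide. */
static inline uint32_t tcmu_cmd_get_block_cnt(struct tcmu_cmd *tcmu_cmd)
{
	return tcmu_cmd_get_data_length(tcmu_cmd) / DATA_BLOCK_SIZE;
}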


Re: [PATCHv4 1/4] tcmu: Fix possible overwrite of t_data_sg's last iov[]

2017-03-21 Thread Mike Christie
On 03/21/2017 04:36 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> If there is BIDI data, its first iov[] will overwrite the last
> iov[] for se_cmd->t_data_sg.
> 
> To fix this, we can just increase the iov pointer, but this may
> introduce a new memory leak: if se_cmd->data_length and
> se_cmd->t_bidi_data_sg->length are not both aligned up to
> DATA_BLOCK_SIZE, the actual length needed may be larger than just
> the sum of them.
> 
> So, this can be avoided by rounding all the data lengths up
> to DATA_BLOCK_SIZE.
> 
> Signed-off-by: Xiubo Li 
> Signed-off-by: Bryant G. Ly 
> ---
>  drivers/target/target_core_user.c | 34 +++---
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 57defc3..65d475f 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -394,6 +394,20 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> size_t cmd_size, size_t d
>   return true;
>  }
>  
> +static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd)
> +{
> + struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
> + size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
> +
> + if (se_cmd->se_cmd_flags & SCF_BIDI) {
> + BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
> + data_length += round_up(se_cmd->t_bidi_data_sg->length,
> + DATA_BLOCK_SIZE);
> + }
> +
> + return data_length;
> +}
> +
>  static sense_reason_t
>  tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
>  {
> @@ -407,7 +421,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> size_t cmd_size, size_t d
>   uint32_t cmd_head;
>   uint64_t cdb_off;
>   bool copy_to_data_area;
> - size_t data_length;
> + size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
>   DECLARE_BITMAP(old_bitmap, DATA_BLOCK_BITS);
>  
>   if (test_bit(TCMU_DEV_BIT_BROKEN, &udev->flags))
> @@ -433,11 +447,6 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> size_t cmd_size, size_t d
>  
>   mb = udev->mb_addr;
>   cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
> - data_length = se_cmd->data_length;
> - if (se_cmd->se_cmd_flags & SCF_BIDI) {
> - BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
> - data_length += se_cmd->t_bidi_data_sg->length;
> - }
>   if ((command_size > (udev->cmdr_size / 2)) ||
>   data_length > udev->data_size) {
>   pr_warn("TCMU: Request of size %zu/%zu is too big for %u/%zu "
> @@ -511,11 +520,14 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> size_t cmd_size, size_t d
>   entry->req.iov_dif_cnt = 0;
>  
>   /* Handle BIDI commands */
> - iov_cnt = 0;
> - alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
> - se_cmd->t_bidi_data_nents, &iov, &iov_cnt, false);
> - entry->req.iov_bidi_cnt = iov_cnt;
> -
> + if (se_cmd->se_cmd_flags & SCF_BIDI) {
> + iov_cnt = 0;
> + iov++;
> + alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
> + se_cmd->t_bidi_data_nents, &iov, &iov_cnt,
> + false);
> + entry->req.iov_bidi_cnt = iov_cnt;
> + }
>   /* cmd's data_bitmap is what changed in process */
>   bitmap_xor(tcmu_cmd->data_bitmap, old_bitmap, udev->data_bitmap,
>   DATA_BLOCK_BITS);
> 

Patch looks ok to me.

Reviewed-by: Mike Christie 


Re: [PATCHv4 0/4] tcmu: bug fix and dynamic growing data area support

2017-03-21 Thread Bryant G. Ly

On 3/21/17 3:36 AM, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

Changed for V4:
- re-order the #3, #4 at the head.
- merge most of the #5 to others.


Changed for V3:
- [PATCHv2 2/5] fix double usage of blocks and possible page fault
   call trace.
- [PATCHv2 5/5] fix a mistake.

Changed for V2:
- [PATCHv2 1/5] just fixes some small spelling and other mistakes.
   And as the initial patch, here sets cmd area to 8M and data area to
   1G(1M fixed and 1023M growing)
- [PATCHv2 2/5] is a new one, adding global data block pool support.
   The max total size of the pool is 2G and all the targets will get
   growing blocks from here.
   Test this using multi-targets at the same time.
- [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
- [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.



Xiubo Li (4):
   tcmu: Fix possible overwrite of t_data_sg's last iov[]
   tcmu: Fix wrongly calculating of the base_command_size
   tcmu: Add dynamic growing data area feature support
   tcmu: Add global data block pool support

  drivers/target/target_core_user.c | 605 +++---
  1 file changed, 504 insertions(+), 101 deletions(-)


I have built this patch into our kernel and will get back to you.

We will be doing larger scale testing to make sure everything still works.

-Bryant




Re: [PATCHv3 3/5] target/user: Fix possible overwrite of t_data_sg's last iov[]

2017-03-21 Thread Mike Christie
On 03/20/2017 05:09 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> If there is BIDI data, its first iov[] will overwrite the last
> iov[] for se_cmd->t_data_sg.
> 
> To fix this, we can just increase the iov pointer, but this may
> introduce a new memory leak: if se_cmd->data_length and
> se_cmd->t_bidi_data_sg->length are not both aligned up to
> DATA_BLOCK_SIZE, the actual length needed may be larger than just
> the sum of them.
> 
> So, this can be avoided by rounding all the data lengths up
> to DATA_BLOCK_SIZE.
> 
> Signed-off-by: Xiubo Li 
> Signed-off-by: Bryant G. Ly 
> ---
>  drivers/target/target_core_user.c | 32 +++-
>  1 file changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 8cbe196..780c30f 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -288,13 +288,14 @@ static inline void tcmu_free_cmd(struct tcmu_cmd 
> *tcmu_cmd)
>  
>  static inline uint32_t tcmu_cmd_get_dbi_len(struct se_cmd *se_cmd)
>  {
> - size_t data_length = se_cmd->data_length;
> + size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
>   uint32_t dbi_len;
>  
>   if (se_cmd->se_cmd_flags & SCF_BIDI)
> - data_length += se_cmd->t_bidi_data_sg->length;
> + data_length += round_up(se_cmd->t_bidi_data_sg->length,
> + DATA_BLOCK_SIZE);
>  
> - dbi_len = (data_length + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
> + dbi_len = data_length / DATA_BLOCK_SIZE;
>  
>   return dbi_len;
>  }
> @@ -609,10 +610,11 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> struct tcmu_cmd *cmd,
>  
>   mb = udev->mb_addr;
>   cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
> - data_length = se_cmd->data_length;
> + data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
>   if (se_cmd->se_cmd_flags & SCF_BIDI) {
>   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
> - data_length += se_cmd->t_bidi_data_sg->length;
> + data_length += round_up(se_cmd->t_bidi_data_sg->length,
> + DATA_BLOCK_SIZE);
>   }
>   if ((command_size > (udev->cmdr_size / 2)) ||
>   data_length > udev->data_size) {
> @@ -690,15 +692,19 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
> struct tcmu_cmd *cmd,
>   entry->req.iov_dif_cnt = 0;
>  
>   /* Handle BIDI commands */
> - iov_cnt = 0;
> - ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
> - se_cmd->t_bidi_data_sg, se_cmd->t_bidi_data_nents,
> - &iov, &iov_cnt, false);
> - if (ret) {
> - pr_err("tcmu: alloc and scatter bidi data failed\n");
> - return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
> + if (se_cmd->se_cmd_flags & SCF_BIDI) {
> + iov_cnt = 0;
> + iov++;
> + ret = alloc_and_scatter_data_area(udev, tcmu_cmd,
> + se_cmd->t_bidi_data_sg,
> + se_cmd->t_bidi_data_nents,
> + &iov, &iov_cnt, false);
> + if (ret) {
> + pr_err("tcmu: alloc and scatter bidi data failed\n");
> + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
> + }
> + entry->req.iov_bidi_cnt = iov_cnt;
>   }
> - entry->req.iov_bidi_cnt = iov_cnt;
>  
>   /* All offsets relative to mb_addr, not start of entry! */
>   cdb_off = CMDR_OFF + cmd_head + base_command_size;
> 

Looks ok to me. Thanks.

Reviewed-by: Mike Christie 


Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-21 Thread Benjamin Block
On Thu, Mar 16, 2017 at 12:53:45PM +0100, Hannes Reinecke wrote:
>On 03/16/2017 12:01 PM, Benjamin Block wrote:
>> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
 Hello Hannes,

 On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is no point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.
>
> Signed-off-by: Hannes Reinecke 
> Reviewed-by: Johannes Thumshirn 
> Reviewed-by: Bart Van Assche 
> ---
>  drivers/scsi/scsi_error.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index e1ca3b8..4613aa1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
> scsi_host_template *hostt,
>   return hostt->eh_abort_handler(scmd);
>  }
>
> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> -{
> - if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> - if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> - if (scsi_try_target_reset(scmd) != SUCCESS)
> - if (scsi_try_bus_reset(scmd) != SUCCESS)
> - scsi_try_host_reset(scmd);
> -}
> -
>  /**
>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error 
> recovery
>   * @scmd:   SCSI command structure to hijack
> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd 
> *scmd, unsigned char *cmnd,
>   break;
>   }
>   } else if (rtn != FAILED) {
> - scsi_abort_eh_cmnd(scmd);
> + scsi_try_to_abort_cmd(shost->hostt, scmd);
>   rtn = FAILED;
>   }

 The idea is sound, but this implementation would cause "use-after-free"s.

 I only know our own LLD well enough to judge, but with zFCP there will
 always be a chance that an abort fails - be it memory pressure,
 hardware/firmware behavior or internal EH in zFCP.

 Calling queuecommand() will mean for us in the LLD, that we allocate a
 unique internal request struct for the scsi_cmnd (struct
 zfcp_fsf_request) and add that to our internal hash-table with
 outstanding commands. We assume this scsi_cmnd-pointer is ours till we
 complete it via scsi_done or yield it via successful EH-actions.

 In case the abort fails, you fail to take back the ownership over the
 scsi command. Which in turn means possible "use-after-free"s when we
 still think the scsi command is ours, but EH has already overwritten
 the scsi-command with the original one. When we still get an answer or
 otherwise use the scsi_cmnd-pointer we would access an invalid one.

>>> That is actually not true.
>>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>>> assumed to reside in the SCSI midlayer;
>>
>> That can not be true. First of all, look at the function itself (v4.10):
>>
>>  static int scsi_try_to_abort_cmd...
>>  {
>>  if (!hostt->eh_abort_handler)
>>  return FAILED;
>>
>>  return hostt->eh_abort_handler(scmd);
>>  }
>>
>> If what you say is true, then this whole API of LLDs providing or
>> choosing not to provide implementations for these function would be
>> fundamentally broken.
>> The function itself returns FAILED when there is no such function.. how
>> is an LLD that does not implement it ever to know that you took ownership
>> by calling scsi_try_to_abort_cmd()?
>>
> Well. Ok.
>
> _Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky.
>
> There are two ways of entering the error handler:
> - SCSI command timeout
> - Failing to evaluate the SCSI command status
>
> For the latter case ownership already _is_ with the SCSI midlayer, as
> the LLDD called 'scsi_done' and with that moved ownership to the midlayer.
>
> The interesting part is command timeout.
> Once a command timeout triggers the block layer is calling
> 'blk_mark_rq_complete' to avoid any concurrent completions.
> IE any calls to scsi_done() will be short-circuited with that,
> effectively transferring ownership to SCSI midlayer.
>
> Now the SCSI midlayer has to inform the LLDD that it has taken
> ownership; for that it calls the various eh_XXX callbacks into the LLDD.
> While it's quite clear that SUCCESS signals a transfer of ownership to
> SCSI ML, it's less clear what 

Re: support ranges TRIM for libata

2017-03-21 Thread Tejun Heo
Hello,

On Mon, Mar 20, 2017 at 04:43:12PM -0400, Christoph Hellwig wrote:
> This series implements ranges discard for ATA SSDs.  Compared to the
> initial NVMe support there are two things that complicate the ATA
> support:
> 
>  - ATA only supports 16-bit long ranges
>  - the whole mess of generating a SCSI command first and then
>translating it to an ATA one.
> 
> This series adds support for limited range size to the block layer,
> and stops translating discard commands - instead we add a new
> Vendor Specific SCSI command that contains the TRIM payload when
> the device asks for it.

I do like the fact that this is a lot simpler than the previous
implementation but am not quite sure we want to deviate significantly
from what we do for other commands (command translation).  Is it
because fixing the existing implementation would involve invasive
changes including memory allocations?

Thanks.

-- 
tejun


RE: [PATCH v2] scsi: storvsc: Add support for FC rport.

2017-03-21 Thread KY Srinivasan


> -Original Message-
> From: Cathy Avery [mailto:cav...@redhat.com]
> Sent: Friday, March 17, 2017 8:18 AM
> To: KY Srinivasan ; h...@infradead.org; Haiyang Zhang
> ; j...@linux.vnet.ibm.com;
> martin.peter...@oracle.com
> Cc: step...@networkplumber.org; dan.carpen...@oracle.com;
> de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; linux-
> s...@vger.kernel.org
> Subject: [PATCH v2] scsi: storvsc: Add support for FC rport.
> 
> Included in the current storvsc driver for Hyper-V is the ability
> to access luns on an FC fabric via a virtualized fiber channel
> adapter exposed by the Hyper-V host. The driver also attaches to
> the FC transport to allow host and port names to be published under
> /sys/class/fc_host/hostX. Current customer tools running on the VM
> require that these names be available in the well known standard
> location under fc_host/hostX.
> 
> A problem arose when attaching to the FC transport. The scsi_scan
> code attempts to call fc_user_scan which has basically become a no-op
> due to the fact that the virtualized FC device does not expose rports.
> At this point you cannot refresh the scsi bus after mapping or unmapping
> luns on the SAN without a reboot of the VM.
> 
> This patch stubs in an rport per fc_host in storvsc so that the
> requirement of a defined rport is now met within the fc_transport and
> echo "- - -" > /sys/class/scsi_host/hostX/scan now works.
> 
> Signed-off-by: Cathy Avery 

Acked-by: K. Y. Srinivasan 
> ---
> Changes since v1:
> - Fix fc_rport_identifiers init [Stephen Hemminger]
> - Better error checking
> ---
>  drivers/scsi/storvsc_drv.c | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 638e5f4..37646d1 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -478,6 +478,9 @@ struct storvsc_device {
>*/
>   u64 node_name;
>   u64 port_name;
> +#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
> + struct fc_rport *rport;
> +#endif
>  };
> 
>  struct hv_host_device {
> @@ -1816,19 +1819,27 @@ static int storvsc_probe(struct hv_device
> *device,
>   target = (device->dev_instance.b[5] << 8 |
>device->dev_instance.b[4]);
>   ret = scsi_add_device(host, 0, target, 0);
> - if (ret) {
> - scsi_remove_host(host);
> - goto err_out2;
> - }
> + if (ret)
> + goto err_out3;
>   }
>  #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
>   if (host->transportt == fc_transport_template) {
> + struct fc_rport_identifiers ids = {
> + .roles = FC_PORT_ROLE_FCP_TARGET,
> + };
> +
>   fc_host_node_name(host) = stor_device->node_name;
>   fc_host_port_name(host) = stor_device->port_name;
> + stor_device->rport = fc_remote_port_add(host, 0, &ids);
> + if (!stor_device->rport)
> + goto err_out3;
>   }
>  #endif
>   return 0;
> 
> +err_out3:
> + scsi_remove_host(host);
> +
>  err_out2:
>   /*
>* Once we have connected with the host, we would need to
> @@ -1854,8 +1865,10 @@ static int storvsc_remove(struct hv_device *dev)
>   struct Scsi_Host *host = stor_device->host;
> 
>  #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
> - if (host->transportt == fc_transport_template)
> + if (host->transportt == fc_transport_template) {
> + fc_remote_port_delete(stor_device->rport);
>   fc_remove_host(host);
> + }
>  #endif
>   scsi_remove_host(host);
>   storvsc_dev_remove(dev);
> --
> 2.5.0



Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 04:33 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 16:25 +0100, Hannes Reinecke wrote:
>> On 03/21/2017 02:05 PM, James Bottomley wrote:
>>> On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
 With the current design we're waiting for all async probes to
 finish when removing any sd device.
 This might lead to a livelock where the 'remove' call is blocking
 for any probe calls to finish, and the probe calls are waiting
 for
 a response, which will never be processed as the thread handling
 the responses is waiting for the remove call to finish.
 Which is completely pointless as we only _really_ care for the
 probe on _this_ device to be completed; any other probing can
 happily continue for all we care.
 So save the async probing cookie in the structure and only wait
 if this specific probe is still active.
>>>
>>> How does this preserve ordering?  It looks like you have one cookie 
>>> per sdkp ... is there some sort of ordering guarantee I'm not
>>> seeing?
>>>
>> Do we need one?
>> The only thing we care here is that probing for _this_ device has 
>> finished.
> 
> OK, so currently we guarantee the linear ordering luns for individual
> hbas.  We also guarantee no interleaving of sdX letters for individual
> hbas.  We don't guarantee the scan order of the hbas themselves. 
>  Preserve those guarantees and I'm happy with the patch.  If you can't
> preserve them I think we need further discussion.
> 
Which is actually not true.
If just some devices are removed from the hba (eg if they belong to the
same remote port) and we're rescanning devices once the port comes back
there is no guarantee that the devices will be getting the same device
letters. Nor that the device letters will be consecutive; just starting
'scsi_debug' with just one device before rescanning will mess up the
ordering. Even now.

So I don't see how we can be worse off than we are today.

Plus we (what with me now speaking for SUSE) never promised our
customers anything regarding sdX stability :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [GIT PULL] SCSI fixes for 4.11-rc3

2017-03-21 Thread Johannes Thumshirn
On Tue, Mar 21, 2017 at 11:08:51AM -0400, James Bottomley wrote:
> Nine small fixes: the biggest is probably finally sorting out Kconfig
> issues with lpfc nvme.

I'm sorry to disappoint you but there's another fix from Arnd from today
here: https://lkml.org/lkml/2017/3/21/308

Byte,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 02:05 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
>> With the current design we're waiting for all async probes to
>> finish when removing any sd device.
>> This might lead to a livelock where the 'remove' call is blocking
>> for any probe calls to finish, and the probe calls are waiting for
>> a response, which will never be processed as the thread handling
>> the responses is waiting for the remove call to finish.
>> Which is completely pointless as we only _really_ care for the
>> probe on _this_ device to be completed; any other probing can
>> happily continue for all we care.
>> So save the async probing cookie in the structure and only wait
>> if this specific probe is still active.
> 
> How does this preserve ordering?  It looks like you have one cookie per
> sdkp ... is there some sort of ordering guarantee I'm not seeing?
> 
Do we need one?
The only thing we care here is that probing for _this_ device has finished.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 16:25 +0100, Hannes Reinecke wrote:
> On 03/21/2017 02:05 PM, James Bottomley wrote:
> > On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> > > With the current design we're waiting for all async probes to
> > > finish when removing any sd device.
> > > This might lead to a livelock where the 'remove' call is blocking
> > > for any probe calls to finish, and the probe calls are waiting
> > > for
> > > a response, which will never be processed as the thread handling
> > > the responses is waiting for the remove call to finish.
> > > Which is completely pointless as we only _really_ care for the
> > > probe on _this_ device to be completed; any other probing can
> > > happily continue for all we care.
> > > So save the async probing cookie in the structure and only wait
> > > if this specific probe is still active.
> > 
> > How does this preserve ordering?  It looks like you have one cookie 
> > per sdkp ... is there some sort of ordering guarantee I'm not
> > seeing?
> > 
> Do we need one?
> The only thing we care here is that probing for _this_ device has 
> finished.

OK, so currently we guarantee the linear ordering of luns for individual
hbas.  We also guarantee no interleaving of sdX letters for individual
hbas.  We don't guarantee the scan order of the hbas themselves. 
 Preserve those guarantees and I'm happy with the patch.  If you can't
preserve them I think we need further discussion.

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
On 03/21/2017 02:33 PM, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:30 +, Bart Van Assche wrote:
>> On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
>>> How does this preserve ordering?  It looks like you have one cookie 
>>> per sdkp ... is there some sort of ordering guarantee I'm not
>>> seeing?
>>
>> Hello James,
>>
>> Since the probe order depends on the order in which __async_probe() 
>> adds entries to the "pending" list, and since the order of the
>> __async_probe() calls is not changed by this patch, shouldn't the 
>> probe order be preserved by this patch?
> 
> I don't know: that's what I'm asking.  I believe they complete in order
> for a single domain.  I thought ordering isn't preserved between
> domains?  So moving to multiple domains loses us ordering of disk
> appearance.
> 
Ah.
But we don't move to multiple domains, now do we?
We're just terminating the wait until _our_ probe is completed.
It's not that we're having a individual probe domain per device...

Unless I'm misunderstanding something...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


[GIT PULL] SCSI fixes for 4.11-rc3

2017-03-21 Thread James Bottomley
Nine small fixes: the biggest is probably finally sorting out Kconfig
issues with lpfc nvme.  There are some performance fixes for megaraid
and hpsa and a static checker fix.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Don Brace (3):
  scsi: hpsa: do not timeout reset operations
  scsi: hpsa: limit outstanding rescans
  scsi: hpsa: update check for logical volume status

James Smart (1):
  scsi: lpfc: Finalize Kconfig options for nvme

Shivasharan S (4):
  scsi: megaraid_sas: Driver version upgrade
  scsi: megaraid_sas: raid6 also require cpuSel check same as raid5
  scsi: megaraid_sas: add correct return type check for ldio hint logic for 
raid1
  scsi: megaraid_sas: enable intx only if msix request fails

Tomas Winkler (1):
  scsi: ufs: don't check unsigned type for a negative value

And the diffstat:

 drivers/scsi/Kconfig| 14 
 drivers/scsi/hpsa.c | 53 +
 drivers/scsi/hpsa.h |  1 +
 drivers/scsi/hpsa_cmd.h |  2 ++
 drivers/scsi/lpfc/lpfc_attr.c   |  4 +--
 drivers/scsi/lpfc/lpfc_init.c   |  7 
 drivers/scsi/lpfc/lpfc_nvme.c   |  8 ++---
 drivers/scsi/lpfc/lpfc_nvmet.c  |  8 ++---
 drivers/scsi/megaraid/megaraid_sas.h|  4 +--
 drivers/scsi/megaraid/megaraid_sas_base.c   | 17 ++---
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +--
 drivers/scsi/ufs/ufshcd.c   |  2 +-
 12 files changed, 69 insertions(+), 55 deletions(-)

James

---

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 4bf55b5..3c52867 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1253,20 +1253,6 @@ config SCSI_LPFC_DEBUG_FS
  This makes debugging information from the lpfc driver
  available via the debugfs filesystem.
 
-config LPFC_NVME_INITIATOR
-   bool "Emulex LightPulse Fibre Channel NVME Initiator Support"
-   depends on SCSI_LPFC && NVME_FC
-   ---help---
- This enables NVME Initiator support in the Emulex lpfc driver.
-
-config LPFC_NVME_TARGET
-   bool "Emulex LightPulse Fibre Channel NVME Initiator Support"
-   depends on SCSI_LPFC && NVME_TARGET_FC
-   ---help---
- This enables NVME Target support in the Emulex lpfc driver.
- Target enablement must still be enabled on a per adapter
- basis by module parameters.
-
 config SCSI_SIM710
tristate "Simple 53c710 SCSI support (Compaq, NCR machines)"
depends on (EISA || MCA) && SCSI
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 524a0c7..0d0be77 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -2956,7 +2956,7 @@ static int hpsa_send_reset(struct ctlr_info *h, unsigned 
char *scsi3addr,
/* fill_cmd can't fail here, no data buffer to map. */
(void) fill_cmd(c, reset_type, h, NULL, 0, 0,
scsi3addr, TYPE_MSG);
-   rc = hpsa_scsi_do_simple_cmd(h, c, reply_queue, DEFAULT_TIMEOUT);
+   rc = hpsa_scsi_do_simple_cmd(h, c, reply_queue, NO_TIMEOUT);
if (rc) {
dev_warn(&h->pdev->dev, "Failed to send reset command\n");
goto out;
@@ -3714,7 +3714,7 @@ static int hpsa_get_volume_status(struct ctlr_info *h,
  *  # (integer code indicating one of several NOT READY states
  * describing why a volume is to be kept offline)
  */
-static int hpsa_volume_offline(struct ctlr_info *h,
+static unsigned char hpsa_volume_offline(struct ctlr_info *h,
unsigned char scsi3addr[])
 {
struct CommandList *c;
@@ -3735,7 +3735,7 @@ static int hpsa_volume_offline(struct ctlr_info *h,
DEFAULT_TIMEOUT);
if (rc) {
cmd_free(h, c);
-   return 0;
+   return HPSA_VPD_LV_STATUS_UNSUPPORTED;
}
sense = c->err_info->SenseInfo;
if (c->err_info->SenseLen > sizeof(c->err_info->SenseInfo))
@@ -3746,19 +3746,13 @@ static int hpsa_volume_offline(struct ctlr_info *h,
cmd_status = c->err_info->CommandStatus;
scsi_status = c->err_info->ScsiStatus;
cmd_free(h, c);
-   /* Is the volume 'not ready'? */
-   if (cmd_status != CMD_TARGET_STATUS ||
-   scsi_status != SAM_STAT_CHECK_CONDITION ||
-   sense_key != NOT_READY ||
-   asc != ASC_LUN_NOT_READY)  {
-   return 0;
-   }
 
/* Determine the reason for not ready state */
ldstat = hpsa_get_volume_status(h, scsi3addr);
 
/* Keep volume offline in certain cases: */
switch (ldstat) {
+   case HPSA_LV_FAILED:
case HPSA_LV_UNDERGOING_ERASE:
case HPSA_LV_NOT_AVAILABLE:
case HPSA_LV_UNDERGOING_RPI:
@@ -3780,7 +3774,7 @@ static int 

Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 09:33 -0400, James Bottomley wrote:
> On Tue, 2017-03-21 at 13:30 +, Bart Van Assche wrote:
> > On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> > > How does this preserve ordering?  It looks like you have one cookie 
> > > per sdkp ... is there some sort of ordering guarantee I'm not
> > > seeing?
> > 
> > Hello James,
> > 
> > Since the probe order depends on the order in which __async_probe() 
> > adds entries to the "pending" list, and since the order of the
> > __async_probe() calls is not changed by this patch, shouldn't the 
> > probe order be preserved by this patch?
> 
> I don't know: that's what I'm asking.  I believe they complete in order
> for a single domain.  I thought ordering isn't preserved between
> domains?  So moving to multiple domains loses us ordering of disk
> appearance.

Right, since sd_remove() doesn't wait any longer for completion of probes
from other domains the multi-domain probing behavior may change due to this
patch. However, the multi-domain probing order was already dependent on the
duration of individual probes so I don't think that it is guaranteed today
that multi-domain probing happens in the same order during every boot. I
hope that the change introduced by this patch will be considered acceptable.

Bart.

Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 13:30 +, Bart Van Assche wrote:
> On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> > How does this preserve ordering?  It looks like you have one cookie 
> > per sdkp ... is there some sort of ordering guarantee I'm not
> > seeing?
> 
> Hello James,
> 
> Since the probe order depends on the order in which __async_probe() 
> adds entries to the "pending" list, and since the order of the
> __async_probe() calls is not changed by this patch, shouldn't the 
> probe order be preserved by this patch?

I don't know: that's what I'm asking.  I believe they complete in order
for a single domain.  I thought ordering isn't preserved between
domains?  So moving to multiple domains loses us ordering of disk
appearance.

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 09:05 -0400, James Bottomley wrote:
> How does this preserve ordering?  It looks like you have one cookie per
> sdkp ... is there some sort of ordering guarantee I'm not seeing?

Hello James,

Since the probe order depends on the order in which __async_probe() adds
entries to the "pending" list, and since the order of the __async_probe()
calls is not changed by this patch, shouldn't the probe order be preserved
by this patch?

Thanks,

Bart.

[PATCH] scsi: lpfc: fix linking against modular NVMe support

2017-03-21 Thread Arnd Bergmann
When LPFC is built-in but NVMe is a loadable module, we fail to
link the kernel:

drivers/scsi/built-in.o: In function `lpfc_nvme_create_localport':
(.text+0x156a82): undefined reference to `nvme_fc_register_localport'
drivers/scsi/built-in.o: In function `lpfc_nvme_destroy_localport':
(.text+0x156eaa): undefined reference to `nvme_fc_unregister_remoteport'

We can avoid this either by forcing lpfc to be a module, or by disabling
NVMe support in this case. This implements the former.

Fixes: 7d7080335f8d ("scsi: lpfc: Finalize Kconfig options for nvme")
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 3c52867dfe28..d145e0d90227 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1241,6 +1241,8 @@ config SCSI_LPFC
tristate "Emulex LightPulse Fibre Channel Support"
depends on PCI && SCSI
depends on SCSI_FC_ATTRS
+   depends on NVME_TARGET_FC || NVME_TARGET_FC=n
+   depends on NVME_FC || NVME_FC=n
select CRC_T10DIF
---help---
   This lpfc driver supports the Emulex LightPulse
-- 
2.9.0
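
As background (not part of the patch itself): "depends on FOO || FOO=n"
is the stock Kconfig idiom for "this symbol may not be built-in while
FOO is modular". A minimal sketch with a hypothetical FOO_HBA symbol:

config FOO_HBA
	tristate "Hypothetical FC HBA driver"
	depends on PCI && SCSI
	# If NVME_FC=m this expression evaluates to m, capping FOO_HBA
	# at m, so vmlinux never references symbols exported only by
	# the nvme-fc module. If NVME_FC=y or =n, the dependency is
	# satisfied and FOO_HBA may still be built-in.
	depends on NVME_FC || NVME_FC=n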



Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread James Bottomley
On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processed as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

How does this preserve ordering?  It looks like you have one cookie per
sdkp ... is there some sort of ordering guarantee I'm not seeing?

James




Re: [PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Bart Van Assche
On Tue, 2017-03-21 at 13:14 +0100, Hannes Reinecke wrote:
> With the current design we're waiting for all async probes to
> finish when removing any sd device.
> This might lead to a livelock where the 'remove' call is blocking
> for any probe calls to finish, and the probe calls are waiting for
> a response, which will never be processed as the thread handling
> the responses is waiting for the remove call to finish.
> Which is completely pointless as we only _really_ care for the
> probe on _this_ device to be completed; any other probing can
> happily continue for all we care.
> So save the async probing cookie in the structure and only wait
> if this specific probe is still active.

Nice work! This may even help to reduce system boot time.

Reviewed-by: Bart Van Assche 

[PATCH] scsi: ufs: remove the duplicated checking for supporting clkscaling

2017-03-21 Thread Jaehoon Chung
There are two identical checks for whether clkscaling is supported.
As a result, when clkscaling is supported, active_reqs is decremented
twice for a single completed request; drop the duplicated check.

Signed-off-by: Jaehoon Chung 
---
 drivers/scsi/ufs/ufshcd.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index dc6efbd..f2cbc71 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4598,8 +4598,6 @@ static void __ufshcd_transfer_req_compl(struct ufs_hba 
*hba,
}
if (ufshcd_is_clkscaling_supported(hba))
hba->clk_scaling.active_reqs--;
-   if (ufshcd_is_clkscaling_supported(hba))
-   hba->clk_scaling.active_reqs--;
}
 
/* clear corresponding bits of completed commands */
-- 
2.10.2



[PATCH] sd: use async_probe cookie to avoid deadlocks

2017-03-21 Thread Hannes Reinecke
With the current design we're waiting for all async probes to
finish when removing any sd device.
This might lead to a livelock where the 'remove' call is blocking
for any probe calls to finish, and the probe calls are waiting for
a response, which will never be processed as the thread handling
the responses is waiting for the remove call to finish.
Which is completely pointless as we only _really_ care for the
probe on _this_ device to be completed; any other probing can
happily continue for all we care.
So save the async probing cookie in the structure and only wait
if this specific probe is still active.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/sd.c | 7 ---
 drivers/scsi/sd.h | 3 +++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fb9b4d2..9f932e4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -48,7 +48,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -3217,7 +3216,8 @@ static int sd_probe(struct device *dev)
dev_set_drvdata(dev, sdkp);
 
get_device(&sdkp->dev); /* prevent release before async_schedule */
-   async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
+   sdkp->async_probe = async_schedule_domain(sd_probe_async, sdkp,
+ &scsi_sd_probe_domain);
 
return 0;
 
@@ -3256,7 +3256,8 @@ static int sd_remove(struct device *dev)
scsi_autopm_get_device(sdkp->device);
 
async_synchronize_full_domain(&scsi_sd_pm_domain);
-   async_synchronize_full_domain(&scsi_sd_probe_domain);
+   async_synchronize_cookie_domain(sdkp->async_probe,
+   &scsi_sd_probe_domain);
device_del(&sdkp->dev);
del_gendisk(sdkp->disk);
sd_shutdown(dev);
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..d4b5826 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -1,6 +1,8 @@
 #ifndef _SCSI_DISK_H
 #define _SCSI_DISK_H
 
+#include 
+
 /*
  * More than enough for everybody ;)  The huge number of majors
  * is a leftover from 16bit dev_t days, we don't really need that
@@ -73,6 +75,7 @@ struct scsi_disk {
unsigned intzones_optimal_nonseq;
unsigned intzones_max_open;
 #endif
+   async_cookie_t  async_probe;
atomic_topeners;
sector_tcapacity;   /* size in logical blocks */
u32 max_xfer_blocks;
-- 
1.8.5.6



[PATCH] BusLogic: Prevent write reordering when writing outgoing mailboxes

2017-03-21 Thread Alexander Eichner
From: Alexander Eichner 

Make sure the compiler doesn't reorder the writes to the outgoing
mailbox because the correct order does matter. Otherwise the
controller might see a stale CCB handle under rare circumstances
if the action code is written first and the controller happens to
read the mailbox between the action code and CCB handle write.
This causes trouble later on as the completion handler will get
confused because the request was not active.

This is very difficult to trigger but happened in some virtualized
environments with a high I/O load.

Signed-off-by: Alexander Eichner 
---
 drivers/scsi/BusLogic.c | 1 +
 drivers/scsi/BusLogic.h | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index c7be7bb37209..20cc34fa5754 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -2981,6 +2981,7 @@ static bool blogic_write_outbox(struct blogic_adapter 
*adapter,
   by the Host Adapter.
 */
next_outbox->ccb = ccb->dma_handle;
+   wmb();
next_outbox->action = action;
blogic_execmbox(adapter);
if (++next_outbox > adapter->last_outbox)
diff --git a/drivers/scsi/BusLogic.h b/drivers/scsi/BusLogic.h
index b53ec2f1e8cd..16f372f3e9c2 100644
--- a/drivers/scsi/BusLogic.h
+++ b/drivers/scsi/BusLogic.h
@@ -869,9 +869,9 @@ struct blogic_ccb {
 */
 
 struct blogic_outbox {
-   u32 ccb;/* Bytes 0-3 */
-   u32:24; /* Bytes 4-6 */
-   enum blogic_action action;  /* Byte 7 */
+   volatile u32 ccb;   /* Bytes 0-3 */
+   u32:24; /* 
Bytes 4-6 */
+   volatile enum blogic_action action; /* Byte 7 */
 };
 
 /*
-- 
2.12.0
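
For context, a kernel-style sketch (hypothetical mailbox layout, not
BusLogic's actual structures; wmb() is the real kernel barrier) of the
"publish the payload, then the flag" ordering the fix enforces:

struct mbox_slot {
	u32 handle;	/* payload the controller dereferences */
	u8  action;	/* 0 = free, non-zero = slot is valid */
};

static void post_request(struct mbox_slot *slot, u32 handle, u8 action)
{
	slot->handle = handle;	/* 1. write the payload */
	wmb();			/* 2. order it before the flag */
	slot->action = action;	/* 3. hand the slot to the consumer */
}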



Re: [PATCH] enclosure: fix sysfs symlinks creation when using multipath

2017-03-21 Thread Maurizio Lombardi


On 16. 3. 2017 at 19:49, James Bottomley wrote:
> diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
> index 65fed71..ae89082 100644
> --- a/drivers/misc/enclosure.c
> +++ b/drivers/misc/enclosure.c
> @@ -375,6 +375,7 @@ int enclosure_add_device(struct enclosure_device *edev, 
> int component,
>struct device *dev)
>  {
>   struct enclosure_component *cdev;
> + int err;
>  
>   if (!edev || component >= edev->components)
>   return -EINVAL;
> @@ -384,12 +385,15 @@ int enclosure_add_device(struct enclosure_device *edev, 
> int component,
>   if (cdev->dev == dev)
>   return -EEXIST;
>  
> - if (cdev->dev)
> + if (cdev->dev) {
>   enclosure_remove_links(cdev);
> -
> - put_device(cdev->dev);
> - cdev->dev = get_device(dev);
> - return enclosure_add_links(cdev);
> + put_device(cdev->dev);
> + cdev->dev = NULL;
> + }
> + err = enclosure_add_links(cdev);
> + if (!err)
> + cdev->dev = get_device(dev);
> + return err;
>  }
>  EXPORT_SYMBOL_GPL(enclosure_add_device);
>  
>


I will ask our customer to test your patch;
there is only one small problem: you can't set cdev->dev = NULL
and then call enclosure_add_links(cdev) because you will end up dereferencing
a NULL pointer.
I suggest a slightly different patch:

@@ -375,6 +379,7 @@ int enclosure_add_device(struct enclosure_device *edev, int 
component,
 struct device *dev)
 {
struct enclosure_component *cdev;
+   int err;
 
if (!edev || component >= edev->components)
return -EINVAL;
@@ -384,12 +389,18 @@ int enclosure_add_device(struct enclosure_device *edev, 
int component,
if (cdev->dev == dev)
return -EEXIST;
 
-   if (cdev->dev)
+   if (cdev->dev) {
enclosure_remove_links(cdev);
-
-   put_device(cdev->dev);
+   put_device(cdev->dev);
+   }
cdev->dev = get_device(dev);
-   return enclosure_add_links(cdev);
+
+   err = enclosure_add_links(cdev);
+   if (err) {
+   put_device(cdev->dev);
+   cdev->dev = NULL;
+   }
+   return err;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);
 
Thanks,
Maurizio.
 


[PATCHv4 1/4] tcmu: Fix possible overwrite of t_data_sg's last iov[]

2017-03-21 Thread lixiubo
From: Xiubo Li 

If there is BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introduce a new memory leak: if se_cmd->data_length and
se_cmd->t_bidi_data_sg->length are not both aligned up to
DATA_BLOCK_SIZE, the actual length needed may be larger than just
the sum of them.

So, this can be avoided by rounding all the data lengths up
to DATA_BLOCK_SIZE.

Signed-off-by: Xiubo Li 
Signed-off-by: Bryant G. Ly 
---
 drivers/target/target_core_user.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 57defc3..65d475f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -394,6 +394,20 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
return true;
 }
 
+static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd)
+{
+   struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
+   size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
+
+   if (se_cmd->se_cmd_flags & SCF_BIDI) {
+   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
+   data_length += round_up(se_cmd->t_bidi_data_sg->length,
+   DATA_BLOCK_SIZE);
+   }
+
+   return data_length;
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
@@ -407,7 +421,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
uint32_t cmd_head;
uint64_t cdb_off;
bool copy_to_data_area;
-   size_t data_length;
+   size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
DECLARE_BITMAP(old_bitmap, DATA_BLOCK_BITS);
 
if (test_bit(TCMU_DEV_BIT_BROKEN, &udev->flags))
@@ -433,11 +447,6 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
 
mb = udev->mb_addr;
cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
-   data_length = se_cmd->data_length;
-   if (se_cmd->se_cmd_flags & SCF_BIDI) {
-   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
-   data_length += se_cmd->t_bidi_data_sg->length;
-   }
if ((command_size > (udev->cmdr_size / 2)) ||
data_length > udev->data_size) {
pr_warn("TCMU: Request of size %zu/%zu is too big for %u/%zu "
@@ -511,11 +520,14 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
entry->req.iov_dif_cnt = 0;
 
/* Handle BIDI commands */
-   iov_cnt = 0;
-   alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
-   se_cmd->t_bidi_data_nents, &iov, &iov_cnt, false);
-   entry->req.iov_bidi_cnt = iov_cnt;
-
+   if (se_cmd->se_cmd_flags & SCF_BIDI) {
+   iov_cnt = 0;
+   iov++;
+   alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
+   se_cmd->t_bidi_data_nents, &iov, &iov_cnt,
+   false);
+   entry->req.iov_bidi_cnt = iov_cnt;
+   }
/* cmd's data_bitmap is what changed in process */
bitmap_xor(tcmu_cmd->data_bitmap, old_bitmap, udev->data_bitmap,
DATA_BLOCK_BITS);
-- 
1.8.3.1





[PATCHv4 4/4] tcmu: Add global data block pool support

2017-03-21 Thread lixiubo
From: Xiubo Li 

There is one ring per target, so as the number of targets grows
larger and larger, the system could eventually run out of memory.

With this patch, for each target ring the cmd area size is limited
to 8MB and the data area size is limited to 256K * PAGE_SIZE.

All the targets' data areas get empty blocks from the "global data
block pool", which is limited to 512K * PAGE_SIZE for now.

When the "global data block pool" has been used up, any target can
trigger the unmapping thread routine to shrink the targets' rings.
For idle targets the unmapping routine will reserve at least 256
blocks.

When user space touches data blocks beyond iov[N],
tcmu_page_fault() will return a zeroed block.

Signed-off-by: Xiubo Li 
Signed-off-by: Jianfei Hu 
---
 drivers/target/target_core_user.c | 422 ++
 1 file changed, 339 insertions(+), 83 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 4731686..373bf91 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -67,17 +69,24 @@
 
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-/* For cmd area, the size is fixed 2M */
-#define CMDR_SIZE (2 * 1024 * 1024)
+/* For cmd area, the size is fixed 8MB */
+#define CMDR_SIZE (8 * 1024 * 1024)
 
-/* For data area, the size is fixed 32M */
-#define DATA_BLOCK_BITS (8 * 1024)
-#define DATA_BLOCK_SIZE 4096
+/*
+ * For data area, the block size is PAGE_SIZE and
+ * the total size is 256K * PAGE_SIZE.
+ */
+#define DATA_BLOCK_SIZE PAGE_SIZE
+#define DATA_BLOCK_BITS (256 * 1024)
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
+#define DATA_BLOCK_RES_BITS 256
 
-/* The ring buffer size is 34M */
+/* The total size of the ring is 8M + 256K * PAGE_SIZE */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
+/* Default maximum of the global data blocks(512K * PAGE_SIZE) */
+#define TCMU_GLOBAL_MAX_BLOCKS (512 * 1024)
+
 static struct device *tcmu_root_device;
 
 struct tcmu_hba {
@@ -87,6 +96,8 @@ struct tcmu_hba {
 #define TCMU_CONFIG_LEN 256
 
 struct tcmu_dev {
+   struct list_head node;
+
struct se_device se_dev;
 
char *name;
@@ -98,6 +109,16 @@ struct tcmu_dev {
 
struct uio_info uio_info;
 
+   struct inode *inode;
+
+   struct mutex unmap_mutex;
+   bool unmapping;
+   bool waiting_global;
+   uint32_t dbi_max;
+   uint32_t dbi_thresh;
+   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   struct radix_tree_root data_blocks;
+
struct tcmu_mailbox *mb_addr;
size_t dev_size;
u32 cmdr_size;
@@ -111,10 +132,6 @@ struct tcmu_dev {
/* TODO should this be a mutex? */
spinlock_t cmdr_lock;
 
-   uint32_t dbi_max;
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-   struct radix_tree_root data_blocks;
-
struct idr commands;
spinlock_t commands_lock;
 
@@ -146,6 +163,14 @@ struct tcmu_cmd {
unsigned long flags;
 };
 
+static struct task_struct *unmap_thread;
+static wait_queue_head_t unmap_wait;
+static DEFINE_MUTEX(udev_mutex);
+static LIST_HEAD(root_udev);
+
+static spinlock_t db_count_lock;
+static unsigned long global_db_count;
+
 static struct kmem_cache *tcmu_cmd_cache;
 
 /* multicast group */
@@ -169,54 +194,91 @@ enum tcmu_multicast_groups {
.netnsok = true,
 };
 
-static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static inline void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len)
 {
-   void *p;
-   uint32_t dbi;
-   int ret;
+   struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
+   uint32_t i;
 
-   dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
-   if (dbi > udev->dbi_max)
-   udev->dbi_max = dbi;
+   for (i = 0; i < len; i++)
+   clear_bit(tcmu_cmd->dbi[i], udev->data_bitmap);
+}
 
-   set_bit(dbi, udev->data_bitmap);
+static inline bool tcmu_get_empty_block(struct tcmu_dev *udev,
+   struct tcmu_cmd *tcmu_cmd)
+{
+   struct page *page;
+   int ret, dbi;
 
-   p = radix_tree_lookup(&udev->data_blocks, dbi);
-   if (!p) {
-   p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
-   if (!p) {
-   clear_bit(dbi, udev->data_bitmap);
-   return -ENOMEM;
+   dbi = find_first_zero_bit(udev->data_bitmap, udev->dbi_thresh);
+   if (dbi == udev->dbi_thresh)
+   return 

[PATCHv4 0/4] tcmu: bug fix and dynamic growing data area support

2017-03-21 Thread lixiubo
From: Xiubo Li 

Changed for V4:
- re-order the #3, #4 at the head.
- merge most of the #5 to others.


Changed for V3:
- [PATCHv2 2/5] fix double usage of blocks and possible page fault
  call trace.
- [PATCHv2 5/5] fix a mistake.

Changed for V2:
- [PATCHv2 1/5] just fixes some small spelling and other mistakes.
  And as the initial patch, here sets cmd area to 8M and data area to
  1G(1M fixed and 1023M growing)
- [PATCHv2 2/5] is a new one, adding global data block pool support.
  The max total size of the pool is 2G and all the targets will get
  growing blocks from here.
  Test this using multi-targets at the same time.
- [PATCHv2 3/5] changed nothing, respin it to avoid the conflict.
- [PATCHv2 4/5] and [PATCHv2 5/5] are new ones.



Xiubo Li (4):
  tcmu: Fix possible overwrite of t_data_sg's last iov[]
  tcmu: Fix wrongly calculating of the base_command_size
  tcmu: Add dynamic growing data area feature support
  tcmu: Add global data block pool support

 drivers/target/target_core_user.c | 605 +++---
 1 file changed, 504 insertions(+), 101 deletions(-)

-- 
1.8.3.1





[PATCHv4 2/4] tcmu: Fix wrongly calculating of the base_command_size

2017-03-21 Thread lixiubo
From: Xiubo Li 

The t_data_nents and t_bidi_data_nents are the numbers of
segments, but there is no guarantee that the block size equals
the segment size.

In the worst case all the blocks are discontiguous and each one
needs its own iovec, that is to say: blocks == iovs. So just set
the number of iovs to the block count needed by the tcmu cmd.

Signed-off-by: Xiubo Li 
---
 drivers/target/target_core_user.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 65d475f..ae5b74f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -408,6 +408,13 @@ static inline size_t tcmu_cmd_get_data_length(struct 
tcmu_cmd *tcmu_cmd)
return data_length;
 }
 
+static inline uint32_t tcmu_cmd_get_block_cnt(struct tcmu_cmd *tcmu_cmd)
+{
+   size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
+
+   return (data_length + DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE;
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
@@ -435,8 +442,7 @@ static inline size_t tcmu_cmd_get_data_length(struct 
tcmu_cmd *tcmu_cmd)
 * expensive to tell how many regions are freed in the bitmap
*/
base_command_size = max(offsetof(struct tcmu_cmd_entry,
-   req.iov[se_cmd->t_bidi_data_nents +
-   se_cmd->t_data_nents]),
+   req.iov[tcmu_cmd_get_block_cnt(tcmu_cmd)]),
sizeof(struct tcmu_cmd_entry));
command_size = base_command_size
+ round_up(scsi_command_size(se_cmd->t_task_cdb), 
TCMU_OP_ALIGN_SIZE);
-- 
1.8.3.1





[PATCHv4 3/4] tcmu: Add dynamic growing data area feature support

2017-03-21 Thread lixiubo
From: Xiubo Li 

Currently for TCMU the ring buffer size is fixed at a 64K cmd
area + 1M data area, and this becomes a bottleneck for high IOPS.

The struct tcmu_cmd_entry {} size is fixed at about 112 bytes for
iovec[N] with N <= 4, and the size of struct iovec is about 16 bytes.

If N == 0, the ratio sizeof(cmd entry) : sizeof(data) is
112 Bytes : 0 Bytes, i.e. no data area is needed.

If 0 < N <= 4, the ratio sizeof(cmd entry) : sizeof(data) is
112 Bytes : (N * 4096) Bytes = 28 : (N * 1024), so the max is
28 : 1024.

If N > 4, sizeof(cmd entry) is [(N - 4) * 16 + 112] bytes and the
corresponding data size is [N * 4096] bytes, so the ratio
sizeof(cmd entry) : sizeof(data) is [(N - 4) * 16 + 112] Bytes :
(N * 4096) Bytes = 4/1024 + 12/(N * 1024), which approaches
4 : 1024 as N grows (e.g. for N = 8 it is 176 Bytes : 32768 Bytes,
about 5.5 : 1024).

The bigger N is, the smaller the ratio becomes.

As the initial patch, we set the cmd area size to 2M and the data
area size to 32M. TCMU will dynamically grow the data area from 0
up to the 32M maximum as needed.

The cmd area memory will be allocated through vmalloc(), and the
data area's blocks will be allocated individually later when needed.

The allocated data area block memory will be managed via a radix
tree. For now the bitmap is still the most efficient way to search
and manage the block indexes; this could be updated later.

Signed-off-by: Xiubo Li 
Signed-off-by: Jianfei Hu 
---
 drivers/target/target_core_user.c | 337 ++
 1 file changed, 233 insertions(+), 104 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index ae5b74f..4731686 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -2,6 +2,7 @@
  * Copyright (C) 2013 Shaohua Li 
  * Copyright (C) 2014 Red Hat, Inc.
  * Copyright (C) 2015 Arrikto, Inc.
+ * Copyright (C) 2017 Chinamobile, Inc.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -25,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,15 +65,17 @@
  * this may have a 'UAM' comment.
  */
 
-
 #define TCMU_TIME_OUT (30 * MSEC_PER_SEC)
 
-#define DATA_BLOCK_BITS 256
-#define DATA_BLOCK_SIZE 4096
+/* For cmd area, the size is fixed 2M */
+#define CMDR_SIZE (2 * 1024 * 1024)
 
-#define CMDR_SIZE (16 * 4096)
+/* For data area, the size is fixed 32M */
+#define DATA_BLOCK_BITS (8 * 1024)
+#define DATA_BLOCK_SIZE 4096
 #define DATA_SIZE (DATA_BLOCK_BITS * DATA_BLOCK_SIZE)
 
+/* The ring buffer size is 34M */
 #define TCMU_RING_SIZE (CMDR_SIZE + DATA_SIZE)
 
 static struct device *tcmu_root_device;
@@ -103,12 +107,14 @@ struct tcmu_dev {
size_t data_off;
size_t data_size;
 
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
-
wait_queue_head_t wait_cmdr;
/* TODO should this be a mutex? */
spinlock_t cmdr_lock;
 
+   uint32_t dbi_max;
+   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   struct radix_tree_root data_blocks;
+
struct idr commands;
spinlock_t commands_lock;
 
@@ -130,7 +136,9 @@ struct tcmu_cmd {
 
/* Can't use se_cmd when cleaning up expired cmds, because if
   cmd has been completed then accessing se_cmd is off limits */
-   DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
+   uint32_t dbi_cnt;
+   uint32_t dbi_cur;
+   uint32_t *dbi;
 
unsigned long deadline;
 
@@ -161,10 +169,87 @@ enum tcmu_multicast_groups {
.netnsok = true,
 };
 
+static int tcmu_get_empty_block(struct tcmu_dev *udev, void **addr)
+{
+   void *p;
+   uint32_t dbi;
+   int ret;
+
+   dbi = find_first_zero_bit(udev->data_bitmap, DATA_BLOCK_BITS);
+   if (dbi > udev->dbi_max)
+   udev->dbi_max = dbi;
+
+   set_bit(dbi, udev->data_bitmap);
+
+   p = radix_tree_lookup(&udev->data_blocks, dbi);
+   if (!p) {
+   p = kzalloc(DATA_BLOCK_SIZE, GFP_ATOMIC);
+   if (!p) {
+   clear_bit(dbi, udev->data_bitmap);
+   return -ENOMEM;
+   }
+
+   ret = radix_tree_insert(&udev->data_blocks, dbi, p);
+   if (ret) {
+   kfree(p);
+   clear_bit(dbi, udev->data_bitmap);
+   return ret;
+   }
+   }
+
+   *addr = p;
+   return dbi;
+}
+
+static void *tcmu_get_block_addr(struct tcmu_dev *udev, uint32_t dbi)
+{
+   return radix_tree_lookup(&udev->data_blocks, dbi);
+}
+
+#define tcmu_cmd_reset_dbi_cur(cmd) ((cmd)->dbi_cur = 0)
+#define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
+#define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
+
+static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd)
+{

Re: [PATCH 24/29] drivers: convert iblock_req.pending from atomic_t to refcount_t

2017-03-21 Thread Nicholas A. Bellinger
Hi Elena,

On Mon, 2017-03-06 at 16:21 +0200, Elena Reshetova wrote:
> refcount_t type and corresponding API should be
> used instead of atomic_t when the variable is used as
> a reference counter. This helps avoid accidental
> refcounter overflows that might lead to use-after-free
> situations.
> 
> Signed-off-by: Elena Reshetova 
> Signed-off-by: Hans Liljestrand 
> Signed-off-by: Kees Cook 
> Signed-off-by: David Windsor 
> ---
>  drivers/target/target_core_iblock.c | 12 ++--
>  drivers/target/target_core_iblock.h |  3 ++-
>  2 files changed, 8 insertions(+), 7 deletions(-)

After reading up on this thread, it looks like various subsystem
maintainers are now picking up these atomic_t -> refcount_t conversions.

That said, applied to target-pending/for-next and will plan to include
for v4.12-rc1 merge window.

Thanks!
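
The iblock diff itself is not quoted in this digest; purely as an
illustration of the general conversion pattern (hypothetical struct and
function names, real refcount_t API from <linux/refcount.h>):

#include <linux/types.h>
#include <linux/refcount.h>

struct my_req {
	refcount_t pending;	/* was: atomic_t pending */
};

static void my_req_init(struct my_req *req, unsigned int nr_bios)
{
	refcount_set(&req->pending, nr_bios);	/* was: atomic_set() */
}

static bool my_req_put(struct my_req *req)
{
	/* was: atomic_dec_and_test(); refcount_dec_and_test() also
	 * returns true for the final put, but saturates instead of
	 * wrapping on over/underflow. */
	return refcount_dec_and_test(&req->pending);
}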