Re: [PATCH v6 0/6] g_NCR5380: PDMA fixes and cleanup

2017-07-01 Thread Finn Thain
On Sat, 1 Jul 2017, Ondrej Zary wrote:

> 
> The write corruption is still present - "start" must be rolled back in 
> both IRQ and timeout cases.

Your original algorithm aborts the transfer for a timeout. Same with mine.

The bug must be a elsewhere. 

> And 128 B is not enough , 256 is OK (why did it work last time?).

When I get contradictory results it usually means I booted the wrong build 
or built the wrong branch.

Actually, I think that adding 128 to the residual is correct in some 
sitations, and 256 is correct in other situations.

> We just wrote a buffer to the chip but the chip is writing the previous 
> one to the drive - so if a problem arises, both buffers are lost.
> 

I see. I guess we have to take buffer swaps into account.

> This fixes the corruption (although the "start > 0" check seems wrong now):
> --- a/drivers/scsi/g_NCR5380.c
> +++ b/drivers/scsi/g_NCR5380.c
> @@ -598,23 +598,17 @@ static inline int generic_NCR5380_psend(struct 
> NCR5380_hostdata *hostdata,
>  CSR_HOST_BUF_NOT_RDY, 0,
>  hostdata->c400_ctl_status,
>  CSR_GATED_53C80_IRQ,
> -CSR_GATED_53C80_IRQ, HZ / 64) < 0)
> -   break;
> -
> - if (NCR5380_read(hostdata->c400_ctl_status) &
> - CSR_HOST_BUF_NOT_RDY) {
> +CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
> + (NCR5380_read(hostdata->c400_ctl_status) &
> +  (CSR_HOST_BUF_NOT_RDY | CSR_GATED_53C80_IRQ))) {

You could add a printk to the timeout branch. If it executes, something is 
seriously wrong. E.g.

-   break;
+   { pr_err("send timeout %02x, %d/%d\n", 
NCR5380_read(hostdata->c400_ctl_status), start, len); break; }

>   /* The chip has done a 128 B buffer swap but the first
>* buffer still has not reached the SCSI bus.
>*/
>   if (start > 0)
> - start -= 128;
> + start -= 256;
>   break;
>   }

BTW, that change carries the risk of 'start' going negative and the 
residual exceeding the length of the original transfer.

But I agree with you that there's a problem with the residual.

If I understand correctly, the 53c400 can't do a buffer swap until the 
disk acknowledges each of the 128 bytes from the buffer. But I guess the 
first buffer is special because the disk will not see the first byte of 
the transfer until after the first buffer swap.

And it appears that the last buffer is also special: we have to wait for 
CSR_HOST_BUF_NOT_RDY even after start == len otherwise we may not detect a 
failure and fix the residual. So I think the datasheet is right; we have 
to iterate until the block counter goes to zero.

I think it is safe to say that when CSR_HOST_BUF_NOT_RDY, 'start' is 
between 128 and 256 B ahead of the disk. Otherwise, the host buffer is 
empty and 'start' is no more than 128 B ahead of the disk.

>  
> - if (NCR5380_read(hostdata->c400_ctl_status) &
> - CSR_GATED_53C80_IRQ)
> - break;
> -
>   if (hostdata->io_port && hostdata->io_width == 2)
>   outsw(hostdata->io_port + hostdata->c400_host_buf,
> src + start, 64);
> 
> 
> DTC seems to work too.
> 

OK. Thanks for testing. Please try the patch below on top of v6.

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 49312bf98068..74bbfaf0f397 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -525,8 +525,8 @@ static inline int generic_NCR5380_precv(struct 
NCR5380_hostdata *hostdata,
NCR5380_write(hostdata->c400_blk_cnt, len / 128);
 
do {
-   if (hostdata->board == BOARD_DTC3181E && start == len - 128) {
-   /* Ignore early CSR_GATED_53C80_IRQ */
+   if (start == len - 128) {
+   /* Ignore End of DMA interrupt for the last buffer */
if (NCR5380_poll_politely(hostdata, 
hostdata->c400_ctl_status,
  CSR_HOST_BUF_NOT_RDY, 0, HZ / 
64) < 0)
break;
@@ -603,17 +603,27 @@ static inline int generic_NCR5380_psend(struct 
NCR5380_hostdata *hostdata,
 
if (NCR5380_read(hostdata->c400_ctl_status) &
CSR_HOST_BUF_NOT_RDY) {
-   /* The chip has done a 128 B buffer swap but the first
-* buffer still has not reached the SCSI bus.
-*/
-   if (start > 0)
+   /* Both 128 B buffers are in use */
+   if (start >= 128)
+   start -= 128;
+   i

Re: [PATCH] tcmu: Fix flushing cmd entry dcache page

2017-07-01 Thread Mike Christie
On 06/30/2017 03:14 AM, lixi...@cmss.chinamobile.com wrote:
> From: Xiubo Li 
> 
> When feeding the tcmu's cmd ring, we need to flush the dcache page
> for the cmd entry to make sure these kernel stores are visible to
> user space mappings of that page.
> 
> For the none PAD cmd entry, this will be flushed at the end of the
> tcmu_queue_cmd_ring().
> 
> Signed-off-by: Xiubo Li 
> ---
>  drivers/target/target_core_user.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c 
> b/drivers/target/target_core_user.c
> index 203bff1..930800c 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -699,21 +699,21 @@ static inline size_t tcmu_cmd_get_cmd_size(struct 
> tcmu_cmd *tcmu_cmd,
>   size_t pad_size = head_to_end(cmd_head, udev->cmdr_size);
>  
>   entry = (void *) mb + CMDR_OFF + cmd_head;
> - tcmu_flush_dcache_range(entry, sizeof(*entry));
>   tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_PAD);
>   tcmu_hdr_set_len(&entry->hdr.len_op, pad_size);
>   entry->hdr.cmd_id = 0; /* not used for PAD */
>   entry->hdr.kflags = 0;
>   entry->hdr.uflags = 0;
> + tcmu_flush_dcache_range(entry, sizeof(*entry));
>  
>   UPDATE_HEAD(mb->cmd_head, pad_size, udev->cmdr_size);
> + tcmu_flush_dcache_range(mb, sizeof(*mb));
>  
>   cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
>   WARN_ON(cmd_head != 0);
>   }
>  
>   entry = (void *) mb + CMDR_OFF + cmd_head;
> - tcmu_flush_dcache_range(entry, sizeof(*entry));
>   tcmu_hdr_set_op(&entry->hdr.len_op, TCMU_OP_CMD);
>   entry->hdr.cmd_id = tcmu_cmd->cmd_id;
>   entry->hdr.kflags = 0;
> 

Looks ok to me.

Reviewed-by: Mike Christie 


Re: [PATCH v6 0/6] g_NCR5380: PDMA fixes and cleanup

2017-07-01 Thread Ondrej Zary
On Saturday 01 July 2017 04:40:53 Finn Thain wrote:
> Ondrej, would you please test this new series?
>
> Changed since v1:
> - PDMA transfer residual is calculated earlier.
> - End of DMA flag check is now polled (if there is any residual).
>
> Changed since v2:
> - Bail out of transfer loops when Gated IRQ gets asserted.
> - Make udelay conditional on board type.
> - Drop sg_tablesize patch due to performance regression.
>
> Changed since v3:
> - Add Ondrej's workaround for corrupt WRITE commands on DTC boards.
> - Reset the 53c400 logic after any short PDMA transfer.
> - Don't fail the transfer if the 53c400 logic got a reset.
>
> Changed since v4:
> - Bail out of transfer loops when Gated IRQ gets asserted. (Again.)
> - Always call wait_for_53c80_registers() at end of transfer.
> - Drain chip buffers after PDMA receive is interrupted.
> - Rework residual calculation.
> - Add new patch to correct DMA terminology.
>
> Changed since v5:
> - Rework residual calculation to account for on-chip buffer swap.
> - Attempt to retain the disconnect/IRQ detection in the DTC436 workaround.
> - Move all DTC436 workarounds to final patch.
>
>
> Finn Thain (2):
>   g_NCR5380: Cleanup comments and whitespace
>   g_NCR5380: Use unambiguous terminology for PDMA send and receive
>
> Ondrej Zary (4):
>   g_NCR5380: Fix PDMA transfer size
>   g_NCR5380: End PDMA transfer correctly on target disconnection
>   g_NCR5380: Re-work PDMA loops
>   g_NCR5380: Various DTC436 workarounds
>
>  drivers/scsi/g_NCR5380.c | 273
> ++- 1 file changed, 151
> insertions(+), 122 deletions(-)

The write corruption is still present - "start" must be rolled back in both
IRQ and timeout cases. And 128 B is not enough , 256 is OK (why did it work
last time?). We just wrote a buffer to the chip but the chip is writing the
previous one to the drive - so if a problem arises, both buffers are lost.

This fixes the corruption (although the "start > 0" check seems wrong now):
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -598,23 +598,17 @@ static inline int generic_NCR5380_psend(struct 
NCR5380_hostdata *hostdata,
   CSR_HOST_BUF_NOT_RDY, 0,
   hostdata->c400_ctl_status,
   CSR_GATED_53C80_IRQ,
-  CSR_GATED_53C80_IRQ, HZ / 64) < 0)
-   break;
-
-   if (NCR5380_read(hostdata->c400_ctl_status) &
-   CSR_HOST_BUF_NOT_RDY) {
+  CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
+   (NCR5380_read(hostdata->c400_ctl_status) &
+(CSR_HOST_BUF_NOT_RDY | CSR_GATED_53C80_IRQ))) {
/* The chip has done a 128 B buffer swap but the first
 * buffer still has not reached the SCSI bus.
 */
if (start > 0)
-   start -= 128;
+   start -= 256;
break;
}
 
-   if (NCR5380_read(hostdata->c400_ctl_status) &
-   CSR_GATED_53C80_IRQ)
-   break;
-
if (hostdata->io_port && hostdata->io_width == 2)
outsw(hostdata->io_port + hostdata->c400_host_buf,
  src + start, 64);


DTC seems to work too.

-- 
Ondrej Zary


Re: [PATCH v2] qla2xxx: Fix NVMe entry_type for iocb packet on BE system

2017-07-01 Thread Martin K. Petersen

Himanshu,

> This patch fixes incorrect assignment for entry_type field for Continuation
> Type iocb packet on BE system. This was caught by -Woverflow warning on BE
> system compilation.
>
> For Continuation Type iocb driver needs to write complete 32 bit value to
> initialize other field members in stucture to 0.
>
> Following warning is seen on BE system compile: 
>
> drivers/scsi/qla2xxx/qla_nvme.c: In function 'qla2x00_start_nvme_mq':
> include/uapi/linux/byteorder/big_endian.h:32:26: warning: large integer
> implicitly truncated to unsigned type [-Woverflow]
>  #define __cpu_to_le32(x) ((__force __le32)__swab32((x)))

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 2/2] scsi: qla2xxx: avoid unused-function warning

2017-07-01 Thread Martin K. Petersen

Arnd,

> When NVMe support is disabled, we get a couple of harmless warnings:
>
> drivers/scsi/qla2xxx/qla_nvme.c:667:13: error: 
> 'qla_nvme_unregister_remote_port' defined but not used 
> [-Werror=unused-function]
> drivers/scsi/qla2xxx/qla_nvme.c:634:13: error: 'qla_nvme_abort_all' defined 
> but not used [-Werror=unused-function]
> drivers/scsi/qla2xxx/qla_nvme.c:604:12: error: 'qla_nvme_wait_on_rport_del' 
> defined but not used [-Werror=unused-function]
>
> This replaces the preprocessor checks in the code with equivalent
> compiler conditionals, which lets gcc drop the unused functions
> without warning, and is nicer to read.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH][scsi-next] scsi: qla2xxx: fix a bunch of typos and spelling mistakes

2017-07-01 Thread Martin K. Petersen

Colin,

> Fix the following typos/spelling mistakes:
>
> "attribure" -> "attribute"
> "suppored" -> "supported"
> "Symobilic" -> "Symbolic"
> "iteself" -> "itself"
> "reqeust" -> "request"
> "nvme_wait_on_comand" -> "nvme_wait_on_command"
> "bount" -> "bound"
> "captrue_mask" -> "capture_mask"
> "tempelate" -> "template"
>
> ..and also unwrap a line to fix a checkpatch warning.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 1/2] scsi: lpfc: spin_lock_irq() is not nestable

2017-07-01 Thread Martin K. Petersen

Dan,

> We're calling spin_lock_irq() multiple times, the problem is that on
> the first spin_unlock_irq() then we will re-enable IRQs and we don't
> want that.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 2/2] scsi: lpfc: don't double count abort errors

2017-07-01 Thread Martin K. Petersen

Dan,

> If lpfc_nvmet_unsol_fcp_issue_abort() fails then we accidentally
> increment "tgtp->xmt_abort_rsp_error" and then two lines later we
> increment it a second time.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] scsi: hisi_sas: optimise DMA slot memory

2017-07-01 Thread Martin K. Petersen

John,

> Currently we allocate 3 sets of DMA memories from separate
> pools for each slot. This is inefficient in terms of memory usage
> (buffers are less than 1 page in size, so we lose due to alignment),
> and also time spent in doing 3 allocations + de-allocations per slot,
> instead of 1.
>
> To optimise, combine the 3 DMA buffers into a single buffer from a
> single pool.

Applied to 4.13/scsi-queue. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ibmvfc: constify dev_pm_ops structures.

2017-07-01 Thread Martin K. Petersen

Arvind,

> dev_pm_ops are not supposed to change at runtime. All functions
> working with dev_pm_ops provided by  work with const
> dev_pm_ops. So mark the non-const structs as const.

Applied to 4.13/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ibmvscsi: constify dev_pm_ops structures.

2017-07-01 Thread Martin K. Petersen

Arvind,

> dev_pm_ops are not supposed to change at runtime. All functions
> working with dev_pm_ops provided by  work with const
> dev_pm_ops. So mark the non-const structs as const.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/3] cxlflash: Minor fix and EH refactoring

2017-07-01 Thread Martin K. Petersen

Matthew,

> This small series fixes a recently injected double free and also
> includes two patches to refactor the host and reset handlers to ease
> the burden of making this driver compatible with an updated SCSI EH
> framework.

Applied to 4.13/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH RESEND] scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state

2017-07-01 Thread Martin K. Petersen

Ewan,

> The addition of the STARGET_REMOVE state had the side effect of
> introducing a race condition that can cause a crash.
>
> scsi_target_reap_ref_release() checks the starget->state to
> see if it still in STARGET_CREATED, and if so, skips calling
> transport_remove_device() and device_del(), because the starget->state
> is only set to STARGET_RUNNING after scsi_target_add() has called
> device_add() and transport_add_device().

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] ses: do not add a device to an enclosure if enclosure_add_links() fails.

2017-07-01 Thread Martin K. Petersen

Maurizio,

> The enclosure_add_device() function should fail if it can't
> create the relevant sysfs links.

Applied to 4.13/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ufs: flush eh_work when eh_work scheduled.

2017-07-01 Thread Martin K. Petersen

Zang,

> Forget a condition: eh_work scheduled but do not start to work.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] qla2xxx: Protect access to qpair members with qpair->qp_lock

2017-07-01 Thread Martin K. Petersen

Johannes,

> In qla2xx_start_scsi_mq() and qla2xx_dif_start_scsi_mq() we grab the
> qpair->qp_lock but do access members of the qpair before having the lock.
> Re-order the locking sequence to have all read and write access to qpair
> members under the qpair->qp_lock.

Applied to 4.13/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering