Re: [PATCH v2] virtio_scsi: Reject commands when virtqueue is broken

2017-01-15 Thread Fam Zheng
On Fri, 01/13 12:48, Eric Farman wrote:
> In the case of a graceful set of detaches, where the virtio-scsi-ccw
> disk is removed from the guest prior to the controller, the guest
> behaves quite normally.  Specifically, the detach gets us into
> sd_sync_cache to issue a Synchronize Cache(10) command, which
> immediately fails (and is retried a couple of times) because the
> device has been removed.  Later, the removal of the controller
> sees two CRWs presented, but there's no further indication of the
> removal from the guest viewpoint.
> 
>  [   17.217458] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>  [   17.219257] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: 
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
>  [   21.449400] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, 
> erc=4, rsid=2
>  [   21.449406] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, 
> erc=4, rsid=0
> 
> However, on s390, the SCSI disks can be removed "by surprise" when
> an entire controller (host) is removed and all associated disks
> are removed via the loop in scsi_forget_host.  The same call to
> sd_sync_cache is made, but because the controller has already
> been removed, the Synchronize Cache(10) command is neither issued
> (and then failed) nor rejected.
> 
> That the I/O isn't returned means the guest cannot have other devices
> added nor removed, and other tasks (such as shutdown or reboot) issued
> by the guest will not complete either.  The virtio ring has already
> been marked as broken (via virtio_break_device in virtio_ccw_remove),
> but we still attempt to queue the command only to have it remain there.
> The calling sequence provides a bit of distinction for us:
> 
>   virtscsi_queuecommand()
>-> virtscsi_kick_cmd()
> -> virtscsi_add_cmd()
>  -> virtqueue_add_sgs()
>   -> virtqueue_add()
>  if success
>return 0
>  elseif vq->broken or vring_mapping_error()
>return -EIO
>  else
>return -ENOSPC
> 
> A return of ENOSPC is generally a temporary condition, so returning
> "host busy" from virtscsi_queuecommand makes sense here, to have it
> redriven in a moment or two.  But the EIO return code is more of a
> permanent error and so it would be wise to return the I/O itself and
> allow the calling thread to finish gracefully.  The result is these
> four kernel messages in the guest (the fourth one does not occur
> prior to this patch):
> 
>  [   22.921562] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, 
> erc=4, rsid=2
>  [   22.921580] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, 
> erc=4, rsid=0
>  [   22.921978] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>  [   22.921993] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: 
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> 
> I opted to fill in the same response data that is returned from the
> more graceful device detach, where the disk device is removed prior
> to the controller device.
> 
> Signed-off-by: Eric Farman 
> ---
>  drivers/scsi/virtio_scsi.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index ec91bd0..c680d76 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -534,7 +534,9 @@ static int virtscsi_queuecommand(struct virtio_scsi 
> *vscsi,
>  {
>   struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
>   struct virtio_scsi_cmd *cmd = scsi_cmd_priv(sc);
> + unsigned long flags;
>   int req_size;
> + int ret;
>  
>   BUG_ON(scsi_sg_count(sc) > shost->sg_tablesize);
>  
> @@ -562,8 +564,15 @@ static int virtscsi_queuecommand(struct virtio_scsi 
> *vscsi,
>   req_size = sizeof(cmd->req.cmd);
>   }
>  
> - if (virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd)) != 
> 0)
> + ret = virtscsi_kick_cmd(req_vq, cmd, req_size, sizeof(cmd->resp.cmd));
> + if (ret == -EIO) {
> + cmd->resp.cmd.response = VIRTIO_SCSI_S_BAD_TARGET;
> + spin_lock_irqsave(&req_vq->vq_lock, flags);
> + virtscsi_complete_cmd(vscsi, cmd);
> + spin_unlock_irqrestore(&req_vq->vq_lock, flags);
> + } else if (ret != 0) {
>   return SCSI_MLQUEUE_HOST_BUSY;
> + }
>   return 0;
>  }
>  
> -- 
> 1.9.1
> 

Reviewed-by: Fam Zheng 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] ncr5380: Clean up dead code and redundant macro usage

2017-01-15 Thread Finn Thain
Remove dead code inside #if 0 conditionals.

Remove the #ifdef __KERNEL__ test, since NCR5380.h has no definitions
that relate to userspace code.

Remove two redundant macro definitions which were overlooked in
commit e9db3198e08b ("sun3_scsi: Adopt NCR5380.c core driver").

Signed-off-by: Finn Thain 
---
 drivers/scsi/NCR5380.h|  7 ---
 drivers/scsi/atari_scsi.c | 31 ---
 drivers/scsi/sun3_scsi.c  |  3 ---
 3 files changed, 41 deletions(-)

diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h
index e61d9f9..d78f0957 100644
--- a/drivers/scsi/NCR5380.h
+++ b/drivers/scsi/NCR5380.h
@@ -166,11 +166,7 @@
 #define CSR_SCSI_BUF_RDY   0x02/* ro  SCSI buffer read */
 #define CSR_GATED_53C80_IRQ0x01/* ro  Last block xferred */
 
-#if 0
-#define CSR_BASE CSR_SCSI_BUFF_INTR | CSR_53C80_INTR
-#else
 #define CSR_BASE CSR_53C80_INTR
-#endif
 
 /* Note : PHASE_* macros are based on the values of the STATUS register */
 #define PHASE_MASK (SR_MSG | SR_CD | SR_IO)
@@ -229,8 +225,6 @@ struct NCR5380_hostdata {
char info[168]; /* Host banner message */
 };
 
-#ifdef __KERNEL__
-
 struct NCR5380_cmd {
struct list_head list;
 };
@@ -323,5 +317,4 @@ static inline int NCR5380_dma_residual_none(struct 
NCR5380_hostdata *hostdata)
return 0;
 }
 
-#endif /* __KERNEL__ */
 #endif /* NCR5380_H */
diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index 105b353..2b6eb7c 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -178,37 +178,6 @@ static int scsi_dma_is_ignored_buserr(unsigned char 
dma_stat)
 }
 
 
-#if 0
-/* Dead code... wasn't called anyway :-) and causes some trouble, because at
- * end-of-DMA, both SCSI ints are triggered simultaneously, so the NCR int has
- * to clear the DMA int pending bit before it allows other level 6 interrupts.
- */
-static void scsi_dma_buserr(int irq, void *dummy)
-{
-   unsigned char dma_stat = tt_scsi_dma.dma_ctrl;
-
-   /* Don't do anything if a NCR interrupt is pending. Probably it's just
-* masked... */
-   if (atari_irq_pending(IRQ_TT_MFP_SCSI))
-   return;
-
-   printk("Bad SCSI DMA interrupt! dma_addr=0x%08lx dma_stat=%02x 
dma_cnt=%08lx\n",
-  SCSI_DMA_READ_P(dma_addr), dma_stat, SCSI_DMA_READ_P(dma_cnt));
-   if (dma_stat & 0x80) {
-   if (!scsi_dma_is_ignored_buserr(dma_stat))
-   printk("SCSI DMA bus error -- bad DMA programming!\n");
-   } else {
-   /* Under normal circumstances we never should get to this point,
-* since both interrupts are triggered simultaneously and the 
5380
-* int has higher priority. When this irq is handled, that DMA
-* interrupt is cleared. So a warning message is printed here.
-*/
-   printk("SCSI DMA intr ?? -- this shouldn't happen!\n");
-   }
-}
-#endif
-
-
 static irqreturn_t scsi_tt_intr(int irq, void *dev)
 {
struct Scsi_Host *instance = dev;
diff --git a/drivers/scsi/sun3_scsi.c b/drivers/scsi/sun3_scsi.c
index 88db699..166f466 100644
--- a/drivers/scsi/sun3_scsi.c
+++ b/drivers/scsi/sun3_scsi.c
@@ -56,9 +56,6 @@
 #define NCR5380_dma_send_setup  sun3scsi_dma_count
 #define NCR5380_dma_residualsun3scsi_dma_residual
 
-#define NCR5380_acquire_dma_irq(instance)(1)
-#define NCR5380_release_dma_irq(instance)
-
 #include "NCR5380.h"
 
 
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] ncr5380: Reduce #include files

2017-01-15 Thread Finn Thain
The NCR5380 wrapper drivers don't export symbols or declarations and
don't actually need separate header files. Most of these header files
were removed already; only sun3_scsi.h and g_NCR5380.h remain.

Move the remaining definitions to the corresponding .c files to improve
readability and proximity. The #defines which influence the #included
core driver are no longer mixed up with unrelated #defines and #includes.

Signed-off-by: Finn Thain 
---
 drivers/scsi/g_NCR5380.c |  45 -
 drivers/scsi/g_NCR5380.h |  56 --
 drivers/scsi/sun3_scsi.c |  80 -
 drivers/scsi/sun3_scsi.h | 102 ---
 4 files changed, 122 insertions(+), 161 deletions(-)
 delete mode 100644 drivers/scsi/g_NCR5380.h
 delete mode 100644 drivers/scsi/sun3_scsi.h

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 6f9665d..67c8dac 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -26,14 +26,55 @@
 #include 
 #include 
 #include 
-#include "g_NCR5380.h"
-#include "NCR5380.h"
 #include 
 #include 
 #include 
 #include 
 #include 
 
+/* Definitions for the core NCR5380 driver. */
+
+#define NCR5380_read(reg) \
+   ioread8(hostdata->io + hostdata->offset + (reg))
+#define NCR5380_write(reg, value) \
+   iowrite8(value, hostdata->io + hostdata->offset + (reg))
+
+#define NCR5380_implementation_fields \
+   int offset; \
+   int c400_ctl_status; \
+   int c400_blk_cnt; \
+   int c400_host_buf; \
+   int io_width
+
+#define NCR5380_dma_xfer_lengeneric_NCR5380_dma_xfer_len
+#define NCR5380_dma_recv_setup  generic_NCR5380_pread
+#define NCR5380_dma_send_setup  generic_NCR5380_pwrite
+#define NCR5380_dma_residualNCR5380_dma_residual_none
+
+#define NCR5380_intrgeneric_NCR5380_intr
+#define NCR5380_queue_command   generic_NCR5380_queue_command
+#define NCR5380_abort   generic_NCR5380_abort
+#define NCR5380_bus_reset   generic_NCR5380_bus_reset
+#define NCR5380_infogeneric_NCR5380_info
+
+#define NCR5380_io_delay(x) udelay(x)
+
+#include "NCR5380.h"
+
+#define DRV_MODULE_NAME "g_NCR5380"
+
+#define NCR53C400_mem_base 0x3880
+#define NCR53C400_host_buffer 0x3900
+#define NCR53C400_region_size 0x3a00
+
+#define BOARD_NCR5380 0
+#define BOARD_NCR53C400 1
+#define BOARD_NCR53C400A 2
+#define BOARD_DTC3181E 3
+#define BOARD_HP_C2502 4
+
+#define IRQ_AUTO 254
+
 #define MAX_CARDS 8
 
 /* old-style parameters for compatibility */
diff --git a/drivers/scsi/g_NCR5380.h b/drivers/scsi/g_NCR5380.h
deleted file mode 100644
index 81b22d9..000
--- a/drivers/scsi/g_NCR5380.h
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * Generic Generic NCR5380 driver defines
- *
- * Copyright 1993, Drew Eckhardt
- * Visionary Computing
- * (Unix and Linux consulting and custom programming)
- * d...@colorado.edu
- *  +1 (303) 440-4894
- *
- * NCR53C400 extensions (c) 1994,1995,1996, Kevin Lentin
- *k.len...@cs.monash.edu.au
- */
-
-#ifndef GENERIC_NCR5380_H
-#define GENERIC_NCR5380_H
-
-#define DRV_MODULE_NAME "g_NCR5380"
-
-#define NCR5380_read(reg) \
-   ioread8(hostdata->io + hostdata->offset + (reg))
-#define NCR5380_write(reg, value) \
-   iowrite8(value, hostdata->io + hostdata->offset + (reg))
-
-#define NCR5380_implementation_fields \
-   int offset; \
-   int c400_ctl_status; \
-   int c400_blk_cnt; \
-   int c400_host_buf; \
-   int io_width;
-
-#define NCR53C400_mem_base 0x3880
-#define NCR53C400_host_buffer 0x3900
-#define NCR53C400_region_size 0x3a00
-
-#define NCR5380_dma_xfer_len   generic_NCR5380_dma_xfer_len
-#define NCR5380_dma_recv_setup generic_NCR5380_pread
-#define NCR5380_dma_send_setup generic_NCR5380_pwrite
-#define NCR5380_dma_residual   NCR5380_dma_residual_none
-
-#define NCR5380_intr generic_NCR5380_intr
-#define NCR5380_queue_command generic_NCR5380_queue_command
-#define NCR5380_abort generic_NCR5380_abort
-#define NCR5380_bus_reset generic_NCR5380_bus_reset
-#define NCR5380_info generic_NCR5380_info
-
-#define NCR5380_io_delay(x)udelay(x)
-
-#define BOARD_NCR5380  0
-#define BOARD_NCR53C4001
-#define BOARD_NCR53C400A 2
-#define BOARD_DTC3181E 3
-#define BOARD_HP_C2502 4
-
-#define IRQ_AUTO   254
-
-#endif /* GENERIC_NCR5380_H */
diff --git a/drivers/scsi/sun3_scsi.c b/drivers/scsi/sun3_scsi.c
index 166f466..14cebaf 100644
--- a/drivers/scsi/sun3_scsi.c
+++ b/drivers/scsi/sun3_scsi.c
@@ -34,7 +34,6 @@
 #include 
 
 #include 
-#include "sun3_scsi.h"
 
 /* minimum number of bytes to do dma on */
 #define DMA_MIN_SIZE129
@@ -58,6 +57,85 @@
 
 #include "NCR5380.h"
 
+/* dma regs start at regbase + 8, directly after the NCR regs */
+struct sun3_dma_regs {
+   unsigned short dma_addr_hi; /* vme only */
+   unsigned short d

[PATCH 1/6] ncr5380: Shorten host info string by removing unused option macros

2017-01-15 Thread Finn Thain
The DIFFERENTIAL and PARITY option macros are unused: no supported
hardware uses differential signalling and the core driver never
implemented parity checking. These options just waste space in the host
info string.

While we are here, fix a typo in the NCR5380_info() kernel-doc comment.

Signed-off-by: Finn Thain 
---
 drivers/scsi/NCR5380.c | 49 +
 drivers/scsi/NCR5380.h | 10 +-
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index 4f5ca79..f29b407 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -96,17 +96,6 @@
  * of chips.  To use it, you write an architecture specific functions
  * and macros and include this file in your driver.
  *
- * These macros control options :
- * AUTOSENSE - if defined, REQUEST SENSE will be performed automatically
- * for commands that return with a CHECK CONDITION status.
- *
- * DIFFERENTIAL - if defined, NCR53c81 chips will use external differential
- * transceivers.
- *
- * PSEUDO_DMA - if defined, PSEUDO DMA is used during the data transfer phases.
- *
- * REAL_DMA - if defined, REAL DMA is used during the data transfer phases.
- *
  * These macros MUST be defined :
  *
  * NCR5380_read(register)  - read from the specified register
@@ -347,7 +336,7 @@ static void NCR5380_print_phase(struct Scsi_Host *instance)
 #endif
 
 /**
- * NCR58380_info - report driver and host information
+ * NCR5380_info - report driver and host information
  * @instance: relevant scsi host instance
  *
  * For use as the host template info() handler.
@@ -360,33 +349,6 @@ static const char *NCR5380_info(struct Scsi_Host *instance)
return hostdata->info;
 }
 
-static void prepare_info(struct Scsi_Host *instance)
-{
-   struct NCR5380_hostdata *hostdata = shost_priv(instance);
-
-   snprintf(hostdata->info, sizeof(hostdata->info),
-"%s, irq %d, "
-"io_port 0x%lx, base 0x%lx, "
-"can_queue %d, cmd_per_lun %d, "
-"sg_tablesize %d, this_id %d, "
-"flags { %s%s%s}, "
-"options { %s} ",
-instance->hostt->name, instance->irq,
-hostdata->io_port, hostdata->base,
-instance->can_queue, instance->cmd_per_lun,
-instance->sg_tablesize, instance->this_id,
-hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "",
-hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "",
-hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY "  : "",
-#ifdef DIFFERENTIAL
-"DIFFERENTIAL "
-#endif
-#ifdef PARITY
-"PARITY "
-#endif
-"");
-}
-
 /**
  * NCR5380_init - initialise an NCR5380
  * @instance: adapter to configure
@@ -436,7 +398,14 @@ static int NCR5380_init(struct Scsi_Host *instance, int 
flags)
if (!hostdata->work_q)
return -ENOMEM;
 
-   prepare_info(instance);
+   snprintf(hostdata->info, sizeof(hostdata->info),
+   "%s, irq %d, io_port 0x%lx, base 0x%lx, can_queue %d, 
cmd_per_lun %d, sg_tablesize %d, this_id %d, flags { %s%s%s}",
+   instance->hostt->name, instance->irq, hostdata->io_port,
+   hostdata->base, instance->can_queue, instance->cmd_per_lun,
+   instance->sg_tablesize, instance->this_id,
+   hostdata->flags & FLAG_DMA_FIXUP ? "DMA_FIXUP " : "",
+   hostdata->flags & FLAG_NO_PSEUDO_DMA ? "NO_PSEUDO_DMA " : "",
+   hostdata->flags & FLAG_TOSHIBA_DELAY ? "TOSHIBA_DELAY " : "");
 
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
NCR5380_write(MODE_REG, MR_BASE);
diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h
index 51a3567..e61d9f9 100644
--- a/drivers/scsi/NCR5380.h
+++ b/drivers/scsi/NCR5380.h
@@ -81,11 +81,7 @@
 #define ICR_ASSERT_ATN 0x02/* rw Set to assert ATN */
 #define ICR_ASSERT_DATA0x01/* rw SCSI_DATA_REG is asserted 
*/
 
-#ifdef DIFFERENTIAL
-#define ICR_BASE   ICR_DIFF_ENABLE
-#else
 #define ICR_BASE   0
-#endif
 
 #define MODE_REG   2
 /*
@@ -102,11 +98,7 @@
 #define MR_DMA_MODE0x02/* rw DMA / pseudo DMA mode */
 #define MR_ARBITRATE   0x01/* rw start arbitration */
 
-#ifdef PARITY
-#define MR_BASEMR_ENABLE_PAR_CHECK
-#else
 #define MR_BASE0
-#endif
 
 #define TARGET_COMMAND_REG 3
 #define TCR_LAST_BYTE_SENT 0x80/* ro DMA done */
@@ -234,7 +226,7 @@ struct NCR5380_hostdata {
unsigned char id_higher_mask;   /* All bits above id_mask */
unsigned char last_message; /* Last Message Out */
unsigned long region_size;  /* Size of address/port range */
-   char info[256];
+   char info[168]; /* Hos

[PATCH 4/6] ncr5380: Resolve various static checker warnings

2017-01-15 Thread Finn Thain
Avoid various warnings from "make C=1" by annotating a couple of
unlock-then-lock sequences, replacing a zero with NULL and
correcting some type casts.

Also avoid a warning from "make W=1" by adding braces.

Signed-off-by: Finn Thain 
---
These are the warning messages referred to above:

drivers/scsi/NCR5380.c:
warning: context imbalance in 'NCR5380_select' - unexpected unlock
warning: context imbalance in 'NCR5380_information_transfer' - unexpected unlock

drivers/scsi/atari_scsi.c:816:39: warning: Using plain integer as NULL pointer

drivers/scsi/mac_scsi.c:157:46: warning: incorrect type in initializer 
(different address spaces)
drivers/scsi/mac_scsi.c:157:46: expected unsigned char *s
drivers/scsi/mac_scsi.c:157:46:got unsigned char [noderef] [usertype] 
*

drivers/scsi/mac_scsi.c:260:46: warning: incorrect type in initializer 
(different address spaces)
drivers/scsi/mac_scsi.c:260:46: expected unsigned char *d
drivers/scsi/mac_scsi.c:260:46:got unsigned char [noderef] [usertype] 
*

drivers/scsi/mac_scsi.c:384:22: warning: incorrect type in assignment 
(different address spaces)
drivers/scsi/mac_scsi.c:384:22: expected unsigned char [noderef] [usertype] 
*io
drivers/scsi/mac_scsi.c:384:22:got void *

drivers/scsi/mac_scsi.c:387:35: warning: incorrect type in assignment 
(different address spaces)
drivers/scsi/mac_scsi.c:387:35: expected unsigned char [noderef] [usertype] 
*pdma_io
drivers/scsi/mac_scsi.c:387:35:got void *

drivers/scsi/NCR5380.c: In function 'maybe_release_dma_irq':
drivers/scsi/NCR5380.c:626:36: warning: suggest braces around empty body in an 
'if' statement [-Wempty-body]
   NCR5380_release_dma_irq(instance);

Avoiding these messages doesn't fix any bugs but may improve the
signal/noise ratio of static checker output.
---
 drivers/scsi/NCR5380.c| 5 -
 drivers/scsi/atari_scsi.c | 2 +-
 drivers/scsi/mac_scsi.c   | 8 
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index f29b407..518d101 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -591,8 +591,9 @@ static inline void maybe_release_dma_irq(struct Scsi_Host 
*instance)
list_empty(&hostdata->unissued) &&
list_empty(&hostdata->autosense) &&
!hostdata->connected &&
-   !hostdata->selecting)
+   !hostdata->selecting) {
NCR5380_release_dma_irq(instance);
+   }
 }
 
 /**
@@ -931,6 +932,7 @@ static irqreturn_t __maybe_unused NCR5380_intr(int irq, 
void *dev_id)
 
 static struct scsi_cmnd *NCR5380_select(struct Scsi_Host *instance,
 struct scsi_cmnd *cmd)
+   __releases(&hostdata->lock) __acquires(&hostdata->lock)
 {
struct NCR5380_hostdata *hostdata = shost_priv(instance);
unsigned char tmp[3], phase;
@@ -1623,6 +1625,7 @@ static int NCR5380_transfer_dma(struct Scsi_Host 
*instance,
  */
 
 static void NCR5380_information_transfer(struct Scsi_Host *instance)
+   __releases(&hostdata->lock) __acquires(&hostdata->lock)
 {
struct NCR5380_hostdata *hostdata = shost_priv(instance);
unsigned char msgout = NOP;
diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index 2b6eb7c..b2ffab65 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -782,7 +782,7 @@ static int __init atari_scsi_probe(struct platform_device 
*pdev)
return -ENOMEM;
}
atari_dma_phys_buffer = atari_stram_to_phys(atari_dma_buffer);
-   atari_dma_orig_addr = 0;
+   atari_dma_orig_addr = NULL;
}
 
instance = scsi_host_alloc(&atari_scsi_template,
diff --git a/drivers/scsi/mac_scsi.c b/drivers/scsi/mac_scsi.c
index ccb68d1..196acc7 100644
--- a/drivers/scsi/mac_scsi.c
+++ b/drivers/scsi/mac_scsi.c
@@ -154,7 +154,7 @@ __asm__ __volatile__
\
 static inline int macscsi_pread(struct NCR5380_hostdata *hostdata,
 unsigned char *dst, int len)
 {
-   unsigned char *s = hostdata->pdma_io + (INPUT_DATA_REG << 4);
+   u8 __iomem *s = hostdata->pdma_io + (INPUT_DATA_REG << 4);
unsigned char *d = dst;
int n = len;
int transferred;
@@ -257,7 +257,7 @@ static inline int macscsi_pwrite(struct NCR5380_hostdata 
*hostdata,
  unsigned char *src, int len)
 {
unsigned char *s = src;
-   unsigned char *d = hostdata->pdma_io + (OUTPUT_DATA_REG << 4);
+   u8 __iomem *d = hostdata->pdma_io + (OUTPUT_DATA_REG << 4);
int n = len;
int transferred;
 
@@ -381,10 +381,10 @@ static int __init mac_scsi_probe(struct platform_device 
*pdev)
 
hostdata = shost_priv(instance);
hostdata->base = pio_mem->start;
-   hostdata->io = (void *)pio_mem->start;
+   hostdata->io = (u8 __iomem *)pio_mem->start;
 
if (pdma_mem && setup_use_pdm

[PATCH 5/6] ncr5380: Improve target selection robustness

2017-01-15 Thread Finn Thain
Handle timeout or bus phase change errors that could occur when
sending the IDENTIFY message.

Signed-off-by: Finn Thain 
---
 drivers/scsi/NCR5380.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index 518d101..acc3344 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -1165,8 +1165,16 @@ static struct scsi_cmnd *NCR5380_select(struct Scsi_Host 
*instance,
data = tmp;
phase = PHASE_MSGOUT;
NCR5380_transfer_pio(instance, &phase, &len, &data);
+   if (len) {
+   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+   cmd->result = DID_ERROR << 16;
+   complete_cmd(instance, cmd);
+   dsprintk(NDEBUG_SELECTION, instance, "IDENTIFY message transfer 
failed\n");
+   cmd = NULL;
+   goto out;
+   }
+
dsprintk(NDEBUG_SELECTION, instance, "nexus established.\n");
-   /* XXX need to handle errors here */
 
hostdata->connected = cmd;
hostdata->busy[cmd->device->id] |= 1 << cmd->device->lun;
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/6] atari_scsi: Reset DMA during bus reset only under ST-DMA lock

2017-01-15 Thread Finn Thain
The atari_scsi driver should not access Falcon DMA chip registers
unless it has acquired exclusive access to that chip. If the driver
doesn't have exclusive access then there's no need for a DMA reset
as there are no scsi commands in progress.

Signed-off-by: Finn Thain 
---
 drivers/scsi/atari_scsi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index b2ffab65..f792420 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -682,7 +682,8 @@ static int atari_scsi_bus_reset(struct scsi_cmnd *cmd)
if (IS_A_TT()) {
tt_scsi_dma.dma_ctrl = 0;
} else {
-   st_dma.dma_mode_status = 0x90;
+   if (stdma_is_locked_by(scsi_falcon_intr))
+   st_dma.dma_mode_status = 0x90;
atari_dma_active = 0;
atari_dma_orig_addr = NULL;
}
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] ncr5380: Miscellaneous minor patches

2017-01-15 Thread Finn Thain
This series removes some unused code and related comments,
addresses the warnings generated by 'make W=1' and 'make C=1'
and fixes a theoretical bug in the bus reset method in atari_scsi.

There's also a patch to add a missing error check during target
selection. The only target I tested was a QUANTUM DAYTONA514S disk
as that's all I have access to right now. Some testing with other
targets would be prudent.

Michael, Ondrej, can I get you to review/test please?


Finn Thain (6):
  ncr5380: Shorten host info string by removing unused option macros
  ncr5380: Clean up dead code and redundant macro usage
  ncr5380: Reduce #include files
  ncr5380: Resolve various static checker warnings
  ncr5380: Improve target selection robustness
  atari_scsi: Reset DMA during bus reset only under ST-DMA lock

 drivers/scsi/NCR5380.c|  64 ++---
 drivers/scsi/NCR5380.h|  17 +---
 drivers/scsi/atari_scsi.c |  36 ++--
 drivers/scsi/g_NCR5380.c  |  45 +++-
 drivers/scsi/g_NCR5380.h  |  56 -
 drivers/scsi/mac_scsi.c   |   8 ++--
 drivers/scsi/sun3_scsi.c  |  83 +++--
 drivers/scsi/sun3_scsi.h  | 102 --
 8 files changed, 152 insertions(+), 259 deletions(-)
 delete mode 100644 drivers/scsi/g_NCR5380.h
 delete mode 100644 drivers/scsi/sun3_scsi.h

-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread Bart Van Assche
On Sun, 2017-01-15 at 11:49 -0800, James Bottomley wrote:
> On Sun, 2017-01-15 at 11:41 -0800, Linus Torvalds wrote:
> > On Sun, Jan 15, 2017 at 11:13 AM, James Bottomley
> >  wrote:
> > > 
> > > Can we compromise on "try not to revert a fix ...".
> > 
> > No.
> > 
> > It's about timing, and about how serious the regression is.
> > 
> > For example, if this happened in rc7, I would have reverted
> > immediately. No questions asked.
> > 
> > In this case, the "fix" was was also much less important then the
> > problem it caused. Some specialized pass-through command not
> > working
> > right, vs a machine not even booting? There's just no question
> > what-so-ever.
> > 
> > So the "fix" you claim just wasn't nearly important enough. It was
> > also pretty recent and clearly things had worked for _years_
> > without
> > it.
> > 
> > In fact, I'm still somewhat inclined to revert it, just to have a
> > working rc4 release later today. But I'm hoping maybe Ingo has
> > time 
> > to test things (although I suspect he's already asleep).
> 
> OK, so the patch to revert would actually be
> 
> commit 669f044170d8933c3d66d231b69ea97cb8447338
> Author: Bart Van Assche 
> Date:   Tue Nov 22 16:17:13 2016 -0800
> 
> scsi: srp_transport: Move queuecommand() wait code to SCSI core
> 
> Because that change in the wait code broke the "fix" in mpt3sas. 
> Before that was applied, it actually worked even though I think it's
> a wrong fix.

Hello James,

I disagree. Even if my patch would be reverted that still wouldn't fix
the severe race condition that was introduced in the mpt3sas driver by
the patch that triggers the lockup during boot. As I explained two
weeks ago (see also https://www.spinics.net/lists/kernel/msg2411413.htm
l), commit 18f6084a989b ("scsi: mpt3sas: Fix secure erase premature
termination") is the one that should be reverted instead of my patch. I
agree with Linus that the offending mpt3sas patch already should have
been reverted.

Bart.

Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread James Bottomley
On Sun, 2017-01-15 at 11:41 -0800, Linus Torvalds wrote:
> On Sun, Jan 15, 2017 at 11:13 AM, James Bottomley
>  wrote:
> > 
> > Can we compromise on "try not to revert a fix ...".
> 
> No.
> 
> It's about timing, and about how serious the regression is.
> 
> For example, if this happened in rc7, I would have reverted
> immediately. No questions asked.
> 
> In this case, the "fix" was was also much less important then the
> problem it caused. Some specialized pass-through command not working
> right, vs a machine not even booting? There's just no question
> what-so-ever.
> 
> So the "fix" you claim just wasn't nearly important enough. It was
> also pretty recent and clearly things had worked for _years_ without
> it.
> 
> In fact, I'm still somewhat inclined to revert it, just to have a
> working rc4 release later today. But I'm hoping maybe Ingo has time 
> to test things (although I suspect he's already asleep).

OK, so the patch to revert would actually be

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche 
Date:   Tue Nov 22 16:17:13 2016 -0800

scsi: srp_transport: Move queuecommand() wait code to SCSI core

Because that change in the wait code broke the "fix" in mpt3sas. 
 Before that was applied, it actually worked even though I think it's a
wrong fix.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread Linus Torvalds
On Sun, Jan 15, 2017 at 11:13 AM, James Bottomley
 wrote:
>
> Can we compromise on "try not to revert a fix ...".

No.

It's about timing, and about how serious the regression is.

For example, if this happened in rc7, I would have reverted
immediately. No questions asked.

In this case, the "fix" was was also much less important then the
problem it caused. Some specialized pass-through command not working
right, vs a machine not even booting? There's just no question
what-so-ever.

So the "fix" you claim just wasn't nearly important enough. It was
also pretty recent and clearly things had worked for _years_ without
it.

In fact, I'm still somewhat inclined to revert it, just to have a
working rc4 release later today. But I'm hoping maybe Ingo has time to
test things (although I suspect he's already asleep).

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread James Bottomley
On Sun, 2017-01-15 at 10:54 -0800, Linus Torvalds wrote:
> On Sun, Jan 15, 2017 at 8:11 AM, James Bottomley
>  wrote:
> > 
> > We're not reverting a fix that would cause regressions for others.
> 
> Oh HELL YES we are.
> 
> The rule is that we never break old stuff. Some new fix that fixes
> something that never used to work, but breaks something else, gets
> reverted very aggressively.
> 
> So if a new bugfix or workaround causes problems for existing users,
> it gets reverted. The fact that it fixed something else is COMPLETELY
> IRRELEVANT.
> 
> We do not do the "one step forward, two steps back" dance. If you
> can't fix a bug without breaking old systems, the "fix" gets
> reverted.
> 
> Apparently there is already a possible real fix in flight, so I won't
> actually do the revert, but I very much want to object to your
> statement.
> 
> Reverts happen.

Can we compromise on "try not to revert a fix ...".  The problem with
bugs in regression fixes is that we now have two constituencies: the
people who get the regression back if we revert the fix and the people
who are bitten by the bug in the original regression fix.  In this
particular case, I think we should always try to fix the fix because
reversion also violates "never break old stuff".  There are corner
cases, of course, like if the latter constituency is much bigger and
the fix is hard to fix, then we might revert and try again.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread Linus Torvalds
On Sun, Jan 15, 2017 at 8:11 AM, James Bottomley
 wrote:
>
> We're not reverting a fix that would cause regressions for others.

Oh HELL YES we are.

The rule is that we never break old stuff. Some new fix that fixes
something that never used to work, but breaks something else, gets
reverted very aggressively.

So if a new bugfix or workaround causes problems for existing users,
it gets reverted. The fact that it fixed something else is COMPLETELY
IRRELEVANT.

We do not do the "one step forward, two steps back" dance. If you
can't fix a bug without breaking old systems, the "fix" gets reverted.

Apparently there is already a possible real fix in flight, so I won't
actually do the revert, but I very much want to object to your
statement.

Reverts happen.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iscsi_trx going into D state

2017-01-15 Thread Laurence Oberman


- Original Message -
> From: "Robert LeBlanc" 
> To: "Laurence Oberman" 
> Cc: "Doug Ledford" , "Nicholas A. Bellinger" 
> , "Zhu Lingshan"
> , "linux-rdma" , 
> linux-scsi@vger.kernel.org, "Sagi Grimberg"
> , "Christoph Hellwig" 
> Sent: Friday, January 13, 2017 6:38:33 PM
> Subject: Re: iscsi_trx going into D state
> 
> Laurance,
> 
> I'm really starting to think that the stars aligned with the phase of
> the moon or something when I reproduced this in my lab before because
> I've been unable to reproduce it on Infiniband the last two days. The
> problem with this issue is that it is so hard to trigger, but causes a
> lot of problems when it does happen. I really hate wasting people's
> time when I can't reproduce it myself reliably. Please don't waste too
> much time if you can't get it reproduced on Infiniband, I'll have to
> wait until someone with the ConnectX-4-LX cards can replicate it.
> 
> Hmmm you do have ConnectX-4 cards which may have the same bug it
> Ethernet mode. I don't see the RoCE bug on my ConnectX-3 cards, but
> your ConnectX-4 cards may work. Try putting the cards into Ethernet
> mode, set the speed and advertised speed to something lower than the
> max speed and verify that the link speed is that (ethtool). On the
> ConnectX-4-LX cards, I just had to set both interfaces down and then
> back up at the same time, on the ConnectX-3 I had to pull the cable
> (shutting down the client might have worked). Then set up target and
> client with iSER, format and run the test and it should trigger
> automatically.
> 
> Looking at release notes on the ConnectX-4-LX cards, the latest
> firmware may fix the bug that so easily exposes the problem with that
> card. My cards are SuperMicro branded cards and don't have the new
> firmware available yet.
> 
> Good luck.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Fri, Jan 13, 2017 at 8:10 AM, Laurence Oberman 
> wrote:
> >
> >
> > - Original Message -
> >> From: "Robert LeBlanc" 
> >> To: "Laurence Oberman" 
> >> Cc: "Doug Ledford" , "Nicholas A. Bellinger"
> >> , "Zhu Lingshan"
> >> , "linux-rdma" ,
> >> linux-scsi@vger.kernel.org, "Sagi Grimberg"
> >> , "Christoph Hellwig" 
> >> Sent: Thursday, January 12, 2017 4:26:05 PM
> >> Subject: Re: iscsi_trx going into D state
> >>
> >> Sorry sent prematurely...
> >>
> >> On Thu, Jan 12, 2017 at 2:22 PM, Robert LeBlanc 
> >> wrote:
> >> > I'm having trouble replicating the D state issue on Infiniband (I was
> >> > able to trigger it reliably a couple weeks back, I don't know if OFED
> >> > to verify the same results happen there as well.
> >>
> >> I'm having trouble replicating the D state issue on Infiniband (I was
> >> able to trigger it reliably a couple weeks back, I don't know if OFED
> >> being installed is altering things but it only installed for 3.10. The
> >> ConnectX-4-LX exposes the issue easily if you have those cards.) to
> >> verify the same results happen there as well.
> >>
> >> 
> >> Robert LeBlanc
> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >
> > I am only back in the office next Wednesday.
> > I have this all setup using ConnectX-4 with IB/ISER but have no way of
> > remotely creating the disconnect as I currently have it back-to-back.
> > Have run multiple tests with IB and ISER hard resting the client to break
> > the IB connection but have not been able to reproduce as yet.
> > So it will have to wait until I can pull cables next week as that seemed to
> > be the way you have been reproducing this.
> >
> > This is in a code area I also don't have a lot of knowledge of the flow but
> > have started trying to understand it better.
> >
> > Thanks
> > Laurence
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Hello Robert

I will try this sometime tomorrow by running in ethernet mode.
Its been days of resets with no reproduction so I agree, very hard ro trproduce 
with Infiniband.

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands

2017-01-15 Thread James Bottomley
On Tue, 2017-01-03 at 15:46 -0500, Jason Baron wrote:
> On 01/01/2017 12:39 PM, James Bottomley wrote:
> > +   /*
> > +* Bug work around for firmware SATL handling
> > +*/
> > +   if (sas_device_priv_data->ata_command_pending) {
> > +   scmd->result = SAM_STAT_BUSY;
> > +   scmd->scsi_done(scmd);
> > +   return 0;
> > +   }
> > +   set_satl_pending(scmd, true);
> > +
> > sas_target_priv_data = sas_device_priv_data->sas_target;
> > 
> > /* invalid device handle */
> 
> 
> I was also wondering if 2 threads could both fall through and do:
> 'set_satl_pending(scmd, true)'; ?
> 
> Afaict there is nothing preventing that race?

Yes, it does look like queuecommand is lockless in the mpt3sas.  How
about this version using bitops for atomicity?

James

---

>From b47c28434e9cee9cbb95a794c97ec53657408111 Mon Sep 17 00:00:00 2001
From: James Bottomley 
Date: Sun, 1 Jan 2017 09:39:24 -0800
Subject: [PATCH] scsi: mpt3sas: fix hang on ata passthru commands

mp3sas has a firmware failure where it can only handle one pass
through ATA command at a time.  If another comes in, contrary to the
SAT standard, it will hang until the first one completes (causing long
commands like secure erase to timeout).  The original fix was to block
the device when an ATA command came in, but this caused a regression
with

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche 
Date:   Tue Nov 22 16:17:13 2016 -0800

scsi: srp_transport: Move queuecommand() wait code to SCSI core

So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends.

Fixes: 18f6084a989ba1b38702f9af37a2e4049a924be6
Signed-off-by: James Bottomley 

---

v2 - use bitops for lockless atomicity

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 394fe13..dcb33f4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -393,6 +393,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
struct MPT3SAS_TARGET *sas_target;
@@ -404,6 +405,17 @@ struct MPT3SAS_DEVICE {
u8  ignore_delay_remove;
/* Iopriority Command Handling */
u8  ncq_prio_enable;
+   /*
+* Bug workaround for SATL handling: the mpt2/3sas firmware
+* doesn't return BUSY or TASK_SET_FULL for subsequent
+* commands while a SATL pass through is in operation as the
+* spec requires, it simply does nothing with them until the
+* pass through completes, causing them possibly to timeout if
+* the passthrough is a long executing command (like format or
+* secure erase).  This variable allows us to do the right
+* thing while a SATL command is pending.
+*/
+   unsigned long ata_command_pending;
 
 };
 
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b5c966e..6f9b4c0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3899,9 +3899,18 @@ _scsih_temp_threshold_events(struct MPT3SAS_ADAPTER *ioc,
}
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+   struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+   if  (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+   return 0;
+
+   if (pending)
+   return test_and_set_bit(0, &priv->ata_command_pending);
+
+   clear_bit(0, &priv->ata_command_pending);
+   return 0;
 }
 
 /**
@@ -3925,9 +3934,7 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
if (!scmd)
continue;
count++;
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_unblock(scmd->device,
-   SDEV_RUNNING);
+   set_satl_pending(scmd, false);
mpt3sas_base_free_smid(ioc, smid);
scsi_dma_unmap(scmd);
if (ioc->pci_error_recovery)
@@ -4063,13 +4070,6 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
if (ioc->logging_level & MPT_DEBUG_SCSI)
scsi_print_command(scmd);
 
-   /*
-* Lock the device for any subsequent command until command is
-* done.
-*/
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_block(scmd->device);
-
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4083,6 +4083,17 @@ scsih_qcmd(struct Scsi

Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread James Bottomley
On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote:
> So there's a new mpt3sas SCSI driver boot regression, introduced in 
> this merge window, which made one of my servers unbootable.

We're not reverting a fix that would cause regressions for others. 
 However, The fix was manifestly wrong, so does this fix of the fix
work for you:

http://marc.info/?l=linux-scsi&m=148329237807604

It's been languishing a bit because no-one seemed to care enough to
test or review it.  IOf you can add a tested by, that will give the two
we need to push it.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


aacraid: kernel: AAC: Host adapter dead -1 (bisected)

2017-01-15 Thread Arkadiusz Miskiewicz

Hi.

There is a bug with handling of adaptec raid cards (in my case it is Adaptec 
3405) where kernel logs hundreds of "AAC: Host adapter dead -1" messages.

Bug was reported previously on lkml but there was no progres in solving it.

There is also bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=151661

I've bisected that to commit bellow and indeed, reverting it from kernel 4.9.3 
makes messages go away.

Could anyone at microsemi look at this regression?

Thanks

commit 78cbccd3bd683c295a44af8050797dc4a41376ff
Author: Raghava Aditya Renukunta 
Date:   Mon Apr 25 23:32:37 2016 -0700

aacraid: Fix for KDUMP driver hang

When KDUMP is triggered the driver first talks to the firmware in INTX
mode, but the adapter firmware is still in MSIX mode. Therefore the first
driver command hangs since the driver is waiting for an INTX response and
firmware gives a MSIX response. If when the OS is installed on a RAID
drive created by the adapter KDUMP will hang since the driver does not
receive a response in sync mode.

Fixed by: Change the firmware to INTX mode if it is in MSIX mode before
sending the first sync command.

Cc: sta...@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta 

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Martin K. Petersen 

my hardware:
02:0e.0 RAID bus controller [0104]: Adaptec AAC-RAID [9005:0285]
Subsystem: Adaptec 3405 [9005:02bb]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping+ SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- 
SERR- http://vger.kernel.org/majordomo-info.html


[Bug 151661] Adaptec 3405 3805 prints "AAC: Host adapter dead -1" every 10 seconds but works fine anyway

2017-01-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=151661

--- Comment #18 from Arkadiusz Miskiewicz  ---
The problem was introduced with commit below. Reverting this commit from kernel
4.9.3 makes the problem go away.

https://lkml.org/lkml/2017/1/15/47

commit 78cbccd3bd683c295a44af8050797dc4a41376ff
Author: Raghava Aditya Renukunta 
Date:   Mon Apr 25 23:32:37 2016 -0700

aacraid: Fix for KDUMP driver hang

When KDUMP is triggered the driver first talks to the firmware in INTX
mode, but the adapter firmware is still in MSIX mode. Therefore the first
driver command hangs since the driver is waiting for an INTX response and
firmware gives a MSIX response. If when the OS is installed on a RAID
drive created by the adapter KDUMP will hang since the driver does not
receive a response in sync mode.

Fixed by: Change the firmware to INTX mode if it is in MSIX mode before
sending the first sync command.

Cc: sta...@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Martin K. Petersen 

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

2017-01-15 Thread Ingo Molnar

So there's a new mpt3sas SCSI driver boot regression, introduced in this merge 
window, which made one of my servers unbootable.

The kernel, starting at upstream commit a829a8445f09, would hang thusly:

[6.230363] Linux agpgart interface v0.103
[6.245029] brd: module loaded
[6.253233] loop: module loaded
[6.256695] mpt3sas version 14.101.00.00 loaded
[6.261890] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem 
(65950628 kB)
[6.326222] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 32, 
max_msix_vectors: -1
[6.334953] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 24
[6.340237] mpt2sas_cm0: iomem(0xdff3c000), 
mapped(0xc90007414000), size(16384)
[6.349002] mpt2sas_cm0: ioport(0xe000), size(256)
[6.410830] mpt2sas_cm0: sending message unit reset !!
[6.417739] mpt2sas_cm0: message unit reset: SUCCESS
[6.463486] mpt2sas_cm0: Allocated physical memory: size(8199 kB)
[6.469820] mpt2sas_cm0: Current Controller Queue Depth(3640),Max Controller 
Queue Depth(3712)
[6.478840] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[6.530653] mpt2sas_cm0: LSISAS2008: FWVersion(12.00.00.00), 
ChipRevision(0x03), BiosVersion(07.23.01.00)
[6.540621] mpt2sas_cm0: Protocol=(
[6.540622] Initiator
[6.544346] ,Target
[6.546844] ), 
[6.549168] Capabilities=(
[6.551165] TLR
[6.554098] ,EEDP
[6.556095] ,Snapshot Buffer
[6.558249] ,Diag Trace Buffer
[6.561359] ,Task Set Full
[6.564666] ,NCQ
[6.567594] )
[6.571517] scsi host0: Fusion MPT SAS Host
[6.576539] mpt2sas_cm0: sending port enable !!
[6.576699] ahci :00:11.0: version 3.0
[6.577285] ahci :00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf 
impl SATA mode
[6.577290] ahci :00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio 
slum part ccc 
[6.579218] scsi host1: ahci
[6.579685] scsi host2: ahci
[6.5800[   39.972084] sd 0:0:0:0: attempting task abort! 
scmd(881014cb9500)
[   39.978809] sd 0:0:0:0: [sda] tag#0 CDB: ATA command pass through(12)/Blank 
a1 08 2e 00 01 00 00 00 00 ec 00 00
[   39.989346] scsi target0:0:0: handle(0x0009), 
sas_address(0x44332211), phy(0)
[   39.997584] scsi target0:0:0: enclosure_logical_id(0x5003048003e10c00), 
slot(31)
[   40.005425] sd 0:0:0:0: task abort: SUCCESS scmd(881014cb9500)
udevd[472]: timeout 'ata_id --export /dev/sda'
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]
udevd[472]: timeout: killing 'ata_id --export /dev/sda' [503]

[ this would continue ad infinitum. ]

The correct bootup sequence would be:

[6.252918] loop: module loaded
[6.256390] mpt3sas version 14.101.00.00 loaded
[6.261554] mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem 
(65950628 kB)
[6.325894] mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 32, 
max_msix_vectors: -1
[6.334640] mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 24
[6.339925] mpt2sas_cm0: iomem(0xdff3c000), 
mapped(0xc900073f4000), size(16384)
[6.348672] mpt2sas_cm0: ioport(0xe000), size(256)
[6.410508] mpt2sas_cm0: sending message unit reset !!
[6.417437] mpt2sas_cm0: message unit reset: SUCCESS
[6.463275] mpt2sas_cm0: Allocated physical memory: size(8199 kB)
[6.469627] mpt2sas_cm0: Current Controller Queue Depth(3640),Max Controller 
Queue Depth(3712)
[6.478635] mpt2sas_cm0: Scatter Gather Elements per IO(128)
[6.530433] mpt2sas_cm0: LSISAS2008: FWVersion(12.00.00.00), 
ChipRevision(0x03), BiosVersion(07.23.01.00)
[6.540424] mpt2sas_cm0: Protocol=(
[6.540425] Initiator
[6.544150] ,Target
[6.546644] ), 
[6.548968] Capabilities=(
[6.550943] TLR
[6.553901] ,EEDP
[6.555898] ,Snapshot Buffer
[6.558050] ,Diag Trace Buffer
[6.561159] ,Task Set Full
[6.564462] ,NCQ
[6.567395] )
[6.571316] scsi host0: Fusion MPT SAS Host
[6.576344] mpt2sas_cm0: sending port enable !!
[6.576495] ahci :00:11.0: version 3.0
[6.577100] ahci :00:11.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf 
impl SATA mode
[6.577105] ahci :00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio 
slum part ccc 
[6.579016] scsi host1: ahci
[6.579387] scsi host2: ahci
[6.[
[32m  OK  
[0m] Started Journal Service.
...

(BTW., note the various broken printk lines - which is an unrelated bug.)

I bisected the regression back to this upstream merge commit done by Linus:

  commit a829a8445f09036404060f4d6489cb13433f4304
  Merge: 84b607913442 f5b893c94715
  Author: Linus Torvalds 
  Date:   Wed De