Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v4)

2016-11-12 Thread Andrey Grodzovsky
On Sat, Nov 12, 2016 at 10:29 AM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Andrey" == Andrey Grodzovsky <andrey2...@gmail.com> writes:
>
> Andrey,
>
> Andrey> Problem: This is a work around for a bug with LSI Fusion MPT
> Andrey> SAS2 when pefroming secure erase. Due to the very long time the
> Andrey> operation takes commands issued during the erase will time out
> Andrey> and will trigger execution of abort hook. Even though the abort
> Andrey> hook is called for the specific command which timed out this
> Andrey> leads to entire device halt (scsi_state terminated) and
> Andrey> premature termination of the secured erase.
>
> This patch didn't apply to the SCSI tree. I merged it into
> 4.9/scsi-fixes by hand.

Sorry about that and thanks. Next time i will work of off latest tree.
Regarding older code where there is still a separate mpt2sas driver, should
a separate patch to be done or this fix will be ported there ?

Thanks,
Andrey
>
> Also, please check Documentation/SubmittingPatches for future
> submissions. Patch version goes inside the [PATCH foo/bar] brackets and
> patch changelog entries below "---" separator.
>
> Thanks!
> Martin
>
> --
> Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v4)

2016-11-10 Thread Andrey Grodzovsky
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specific command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

v2: Update according to example patch by Hannes Reinecke to apply
the blocking logic to any ATA 12/16 command.

v3: Use SCSI commands opcodes definitions instead of value and
correct identation.

v4: Fix checkpath errors and warning.

Signed-off-by: Andrey Grodzovsky <andrey2...@gmail.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prak...@broadcom.com>
Cc: Chaitra P B <chaitra.basa...@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subram...@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: <sta...@vger.kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..c032319 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3500,6 +3500,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
 
 /**
  * _scsih_qcmd - main scsi request entry point
@@ -3528,6 +3532,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scsi_print_command(scmd);
 #endif
 
+   /**
+   * Lock the device for any subsequent command until
+   * command is done.
+   */
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_block(scmd->device);
+
+
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4062,6 +4074,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v3)

2016-11-10 Thread Andrey Grodzovsky
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specifc command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

v2: Update according to example patch by Hannes Reinecke to apply
the blocking logic to any ATA 12/16 command.

v3: Use SCSI commands opcodes definitions instead of value and
correct identation.

Signed-off-by: Andrey Grodzovsky <andrey2...@gmail.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prak...@broadcom.com>
Cc: Chaitra P B <chaitra.basa...@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subram...@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: <sta...@vger.kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..320f16c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3500,6 +3500,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
 
 /**
  * _scsih_qcmd - main scsi request entry point
@@ -3528,6 +3532,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scsi_print_command(scmd);
 #endif
 
+   /**
+   * Lock the device for any subsequent command until
+   * command is done.
+   */
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_block(scmd->device);
+
+
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4062,6 +4074,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v2)

2016-11-05 Thread Andrey Grodzovsky
On Fri, Nov 4, 2016 at 10:51 AM, Hannes Reinecke  wrote:
> On 11/04/2016 01:45 PM, Sreekanth Reddy wrote:
>>
>> Hi All,
>>
>> From last two days, I was working with my firmware team to get the
>> required info over this issue. Here is my firmware team response
>>
>> "For ATA PASSTHROUGH commands, the IOC SATL will not check for the
>> opcode and will direct it to the drive. So even though ATA PASSTHOUGH
>> has ATA erase to the drive, IOC SATL FW will not know that and as a
>> general logic for all ATA PASSTHOGH commands, IOC FW will pend the
>> upcoming IOs untill the previous ATA PASSTHORUGH completes. This is as
>> per the SAT specification for SAS controllers and we can't compare it
>> with the SATA controllers in the on board that have full fledge SATA
>> implementation".
>>
>> So this is an expected behavior from our HBA firmware. i.e. it will
>> pend the subsequent commands if any ATA PASSTHROUGH command is going
>> on. So their is no issue with the FW.
>>
> But is there a way to figure out if the firmware / SATL layer is busy
> processing requests?
>
> With 'real' ATA HBAs these issue doesn't occur, as the ATA erase command is
> a non-queued command, and hence the next command automatically has to wait
> for the erase command to complete.
> But this wait happens as the ATA HBA returns 'BUSY', and the linux I/O stack
> will then reset the timeout for all consecutive commands.
>
> With mpt3sas _all_ commands are queued, so if there is a long-running I/O
> command all other commands already in the queue will time out.
>
> Which is at least a very awkward behaviour.
>
> Checking with SAT-3 (section 6.2.4: Commands the SATL queues internally) the
> implemented behaviour is standards conformant, although the standard also
> allows for returning 'TASK SET FULL' or 'BUSY' in these cases.
> Doing so would nicely solve this issue.
>
>> Today I have tried the same test case on my local setup. i.e. I have
>> issued a secure erase command using hdparm utility and observed the
>> same issue on 4.2.3-300.fc23.x86_64 kernel.
>>
>> Then after browsing over this issue, I found that some people are
>> recommending to enable 'CONFIG_IDE_TASK_IOCTL' Kconfig flag. I had a
>> compiled 4.4.0 kernel, so I have enabled this CONFIG_IDE_TASK_IOCTL
>> and recompiled this 4.4.0 kernel and booted in to this kernel. Then I
>> tried same test case and I haven't observed this issue and secure
>> erase operation was completed successfully.
>>
>> So, can you please try once with CONFIG_IDE_TASK_IOCTL enabled.
>>
> Errm.
> CONFIG_IDE_TASK_IOCTL is for the old IDE subsystem, which isn't in use here.
> So this option does not make a difference when using mpt3sas, as this is a
> 'real' SCSI driver which never calls out into any of these subsystems.
>
> I would be _VERY_ much surprised if that would make a difference.
>
> The reason why this behaviour did go unnoticed with older kernels was that a
> command timeout would trigger SCSI EH to engage, and that in turn required
> all outstanding commands to complete.
> So by the time SCSI EH started the ERASE command was complete, and a retry
> of the timed-out commands would work.

Indeed, when retesting with CONFIG_IDE_TASK_IOCTL=y and. reverting the
fix the bug is back.

Thanks,
Andrey
>
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke   zSeries & Storage
> h...@suse.de  +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v2)

2016-11-01 Thread Andrey Grodzovsky
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specifc command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

v2: Update according to example patch by Hannes Reinecke to apply
the blocking logic to any ATA 12/16 command.

Signed-off-by: Andrey Grodzovsky <andrey2...@gmail.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prak...@broadcom.com>
Cc: Chaitra P B <chaitra.basa...@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subram...@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: <sta...@vger.kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..43ab0cc 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3500,6 +3500,20 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+/**
+ * This is a work around for a bug with LSI Fusion MPT SAS2 when
+ * pefroming secure erase. Due to the verly long time the operation
+ * takes commands issued during the erase will time out and will trigger
+ * execution of abort hook. This leads to device reset and premature
+ * termination of the secured erase.
+ *
+ */
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+   return (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85);
+}
+
+
 
 /**
  * _scsih_qcmd - main scsi request entry point
@@ -3528,6 +3542,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scsi_print_command(scmd);
 #endif
 
+   /**
+   * Lock the device for any subsequent command until
+   * command is done.
+   */
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_block(scmd->device);
+
+
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4062,6 +4084,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination.

2016-10-30 Thread Andrey Grodzovsky
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specifc command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

Signed-off-by: Andrey Grodzovsky <andrey2...@gmail.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prak...@broadcom.com>
Cc: Chaitra P B <chaitra.basa...@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subram...@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: Hannes Reinecke <h...@suse.de>
Cc: <sta...@vger.kernel.org>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..5542dd02 100644
--- [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3500,6 +3500,23 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+/**
+ * This is a work around for a bug with LSI Fusion MPT SAS2 when
+ * pefroming secure erase. Due to the verly long time the operation
+ * takes commands issued during the erase will time out and will trigger
+ * execution of abort hook. This leads to device reset and premature
+ * termination of the secured erase.
+ */
+static inline bool disk_erase_command(struct scsi_cmnd *scmd)
+{
+   /**
+   * Identify secure erase command according to
+   * ATA/ATAPI Command Set (ATA8-ACS) p.202
+   */
+   return scmd->cmd_len == 16 && scmd->cmnd[14] == 0xf4;
+}
+
+
 
 /**
  * _scsih_qcmd - main scsi request entry point
@@ -3528,6 +3545,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scsi_print_command(scmd);
 #endif
 
+   /**
+   * Lock the device for any subsequent command until secured erase
+   * command is done.
+   */
+   if (disk_erase_command(scmd))
+   scsi_internal_device_block(scmd->device);
+
+
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4062,6 +4087,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   /* Secured erase is done. Unlock the device */
+   if (disk_erase_command(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html