[PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure

2015-10-21 Thread Matthew R. Ochs
The process_sense() routine can perform a read capacity which
can take some time to complete. If an EEH occurs while waiting
on the read capacity, the EEH handler will wait to obtain the
context's mutex in order to put the context in an error state.
The EEH handler will sit and wait until the context is free,
but this wait can potentially last forever (deadlock) if the
scsi_execute() that performs the read capacity experiences a
timeout and calls into the reset callback. When that occurs,
the reset callback sees that the device is already being reset
and waits for the reset to complete. This leaves two threads
waiting on each other.

To address this issue, make the context unavailable to new,
non-system-owned threads and release the context mutex while calling
into process_sense(). After returning from process_sense() the
context mutex is reacquired and the context is made available
again. The context can be safely moved to the error state if
needed during the unavailable window as no other threads will
hold its reference.

Signed-off-by: Matthew R. Ochs 
Signed-off-by: Manoj N. Kumar 
Reviewed-by: Brian King 
Reviewed-by: Daniel Axtens 
---
 drivers/scsi/cxlflash/superpipe.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index a6316f5..7283e83 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -1787,12 +1787,21 @@ static int cxlflash_disk_verify(struct scsi_device *sdev,
 	 * inquiry (i.e. the Unit attention is due to the WWN changing).
 	 */
 	if (verify->hint & DK_CXLFLASH_VERIFY_HINT_SENSE) {
+		/* Can't hold mutex across process_sense/read_cap16,
+		 * since we could have an intervening EEH event.
+		 */
+		ctxi->unavail = true;
+		mutex_unlock(&ctxi->mutex);
 		rc = process_sense(sdev, verify);
 		if (unlikely(rc)) {
 			dev_err(dev, "%s: Failed to validate sense data (%d)\n",
 				__func__, rc);
+			mutex_lock(&ctxi->mutex);
+			ctxi->unavail = false;
 			goto out;
 		}
+		mutex_lock(&ctxi->mutex);
+		ctxi->unavail = false;
 	}
 
 	switch (gli->mode) {
-- 
2.1.0
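
For readers who want to try the locking pattern outside the driver,
here is a minimal userspace sketch using pthreads. It is not the
cxlflash code: struct demo_ctx, demo_recovery(), demo_slow_op(),
demo_verify() and demo_run() are hypothetical names made up for this
illustration. demo_recovery() plays the role of the EEH handler and
demo_slow_op() plays the role of process_sense(). The slow operation
only returns once recovery has taken the mutex, so holding the mutex
across it would hang, which is the deadlock described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's context; not the real type. */
struct demo_ctx {
	pthread_mutex_t mutex;
	bool unavail;	/* set while the owner has dropped the mutex */
	bool err_state;	/* set by the recovery path */
};

/* Plays the EEH handler: needs the mutex to mark the context failed. */
static void *demo_recovery(void *arg)
{
	struct demo_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->mutex);
	ctx->err_state = true;
	pthread_mutex_unlock(&ctx->mutex);
	return NULL;
}

/* Plays process_sense(): does not return until recovery has finished. */
static int demo_slow_op(struct demo_ctx *ctx)
{
	pthread_t t;

	if (pthread_create(&t, NULL, demo_recovery, ctx))
		return -1;
	/* This join would never return if the caller still held the mutex. */
	pthread_join(t, NULL);
	return 0;
}

/* Returns true if recovery ran while the context was marked unavailable. */
static bool demo_verify(struct demo_ctx *ctx)
{
	bool recovered;
	int rc;

	pthread_mutex_lock(&ctx->mutex);

	ctx->unavail = true;			/* keep new users away */
	pthread_mutex_unlock(&ctx->mutex);	/* let recovery get in */

	rc = demo_slow_op(ctx);

	pthread_mutex_lock(&ctx->mutex);	/* on success and on failure */
	ctx->unavail = false;
	recovered = (rc == 0) && ctx->err_state;
	pthread_mutex_unlock(&ctx->mutex);

	return recovered;
}

/* Returns 0 if the recovery path completed without deadlocking. */
int demo_run(void)
{
	struct demo_ctx ctx = { .unavail = false, .err_state = false };
	bool ok;

	pthread_mutex_init(&ctx.mutex, NULL);
	ok = demo_verify(&ctx);
	assert(!ctx.unavail);	/* the flag is always cleared on the way out */
	pthread_mutex_destroy(&ctx.mutex);

	return ok ? 0 : 1;
}
```

Calling demo_run() from a small main() and building with -pthread
should be enough to exercise it. The flag is set before the mutex is
dropped and cleared only after it is retaken, mirroring the ordering
in the hunk above.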

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure

2015-10-23 Thread Tomas Henzl
On 21.10.2015 22:14, Matthew R. Ochs wrote:
> The process_sense() routine can perform a read capacity which
> can take some time to complete. If an EEH occurs while waiting
> on the read capacity, the EEH handler will wait to obtain the
> context's mutex in order to put the context in an error state.
> The EEH handler will sit and wait until the context is free,
> but this wait can potentially last forever (deadlock) if the
> scsi_execute() that performs the read capacity experiences a
> timeout and calls into the reset callback. When that occurs,
> the reset callback sees that the device is already being reset
> and waits for the reset to complete. This leaves two threads
> waiting on each other.
>
> To address this issue, make the context unavailable to new,
> non-system owned threads and release the context while calling
> into process_sense(). After returning from process_sense() the
> context mutex is reacquired and the context is made available
> again. The context can be safely moved to the error state if
> needed during the unavailable window as no other threads will
> hold its reference.
>
> Signed-off-by: Matthew R. Ochs 
> Signed-off-by: Manoj N. Kumar 
> Reviewed-by: Brian King 
> Reviewed-by: Daniel Axtens 

Reviewed-by: Tomas Henzl 

Tomas
