subject:"\[PATCH v4 08\/32\] cxlflash\: Fix to avoid CXL services during EEH"

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

2015-09-29 Thread Matthew R. Ochs

> On Sep 28, 2015, at 6:05 PM, Daniel Axtens  wrote:
> 
> You have two versions of check_state() below, which is a bit
> confusing. It looks like you've moved the function and also added the
> up/down of the read semaphore. I assume that's all that changed?

Correct.

It was originally moved to meet a dependency due to it being defined statically.

> 
>> 
>> /**
>> + * check_state() - checks and responds to the current adapter state
>> + * @cfg:Internal structure associated with the host.
>> + *
>> + * This routine can block and should only be used on process context.
>> + * It assumes that the caller is an ioctl thread and holding the ioctl
>> + * read semaphore. This is temporarily let up across the wait to allow
>> + * for draining actively running ioctls. Also note that when waking up
>> + * from waiting in reset, the state is unknown and must be checked again
>> + * before proceeding.
>> + *
>> + * Return: 0 on success, -errno on failure
>> + */
>> +static int check_state(struct cxlflash_cfg *cfg)
>> +{
>> +struct device *dev = &cfg->dev->dev;
>> +int rc = 0;
>> +
>> +retry:
>> +switch (cfg->state) {
>> +case STATE_LIMBO:
>> +dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
>> +up_read(&cfg->ioctl_rwsem);
>> +rc = wait_event_interruptible(cfg->limbo_waitq,
>> +  cfg->state != STATE_LIMBO);
>> +down_read(&cfg->ioctl_rwsem);
>> +if (unlikely(rc))
>> +break;
>> +goto retry;
>> +case STATE_FAILTERM:
>> +dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
>> +rc = -ENODEV;
>> +break;
>> +default:
>> +break;
>> +}
>> +
>> +return rc;
>> +}
>> +
>> +/**
>>  * cxlflash_disk_attach() - attach a LUN to a context
>>  * @sdev:SCSI device associated with LUN.
>>  * @attach:  Attach ioctl data structure.
>> @@ -1523,41 +1563,6 @@ err1:
>> }
>> 
>> /**
>> - * check_state() - checks and responds to the current adapter state
>> - * @cfg:Internal structure associated with the host.
>> - *
>> - * This routine can block and should only be used on process context.
>> - * Note that when waking up from waiting in limbo, the state is unknown
>> - * and must be checked again before proceeding.
>> - *
>> - * Return: 0 on success, -errno on failure
>> - */
>> -static int check_state(struct cxlflash_cfg *cfg)
>> -{
>> -struct device *dev = &cfg->dev->dev;
>> -int rc = 0;
>> -
>> -retry:
>> -switch (cfg->state) {
>> -case STATE_LIMBO:
>> -dev_dbg(dev, "%s: Limbo, going to wait...\n", __func__);
>> -rc = wait_event_interruptible(cfg->limbo_waitq,
>> -  cfg->state != STATE_LIMBO);
>> -if (unlikely(rc))
>> -break;
>> -goto retry;
>> -case STATE_FAILTERM:
>> -dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
>> -rc = -ENODEV;
>> -break;
>> -default:
>> -break;
>> -}
>> -
>> -return rc;
>> -}
>> -
>> -/**
>>  * cxlflash_afu_recover() - initiates AFU recovery
>>  * @sdev:SCSI device associated with LUN.
>>  * @recover: Recover ioctl data structure.
>> @@ -1646,9 +1651,14 @@ retry_recover:
>>  /* Test if in error state */
>>  reg = readq_be(&afu->ctrl_map->mbox_r);
>>  if (reg == -1) {
>> -dev_dbg(dev, "%s: MMIO read fail! Wait for recovery...\n",
>> -__func__);
>> -mutex_unlock(&ctxi->mutex);
>> +dev_dbg(dev, "%s: MMIO fail, wait for recovery.\n", __func__);
>> +
>> +/*
>> + * Before checking the state, put back the context obtained with
>> + * get_context() as it is no longer needed and sleep for a short
>> + * period of time (see prolog notes).
>> + */
>> +put_context(ctxi);
> 
> Is this needed for the drain to work? It looks like it fixes a
> refcounting bug in the function, but I'm not sure I understand how it
> interacts with the rest of this patch.

This was simply some "while I'm here" refactoring as the commit originally
included a change here. The main point of this change was to replace the
mutex_unlock() with put_context(), which is a wrapper around the unlocking
of the context's mutex.
> 
> Anyway, the patch overall looks good to me, and makes your driver
> interact with CXL's EEH support in the way I intended when I wrote it.

Thanks for reviewing.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

2015-09-28 Thread Daniel Axtens


You have two versions of check_state() below, which is a bit
confusing. It looks like you've moved the function and also added the
up/down of the read semaphore. I assume that's all that changed?

>  
>  /**
> + * check_state() - checks and responds to the current adapter state
> + * @cfg: Internal structure associated with the host.
> + *
> + * This routine can block and should only be used on process context.
> + * It assumes that the caller is an ioctl thread and holding the ioctl
> + * read semaphore. This is temporarily let up across the wait to allow
> + * for draining actively running ioctls. Also note that when waking up
> + * from waiting in reset, the state is unknown and must be checked again
> + * before proceeding.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +static int check_state(struct cxlflash_cfg *cfg)
> +{
> + struct device *dev = &cfg->dev->dev;
> + int rc = 0;
> +
> +retry:
> + switch (cfg->state) {
> + case STATE_LIMBO:
> + dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
> + up_read(&cfg->ioctl_rwsem);
> + rc = wait_event_interruptible(cfg->limbo_waitq,
> +   cfg->state != STATE_LIMBO);
> + down_read(&cfg->ioctl_rwsem);
> + if (unlikely(rc))
> + break;
> + goto retry;
> + case STATE_FAILTERM:
> + dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
> + rc = -ENODEV;
> + break;
> + default:
> + break;
> + }
> +
> + return rc;
> +}
> +
> +/**
>   * cxlflash_disk_attach() - attach a LUN to a context
>   * @sdev:SCSI device associated with LUN.
>   * @attach:  Attach ioctl data structure.
> @@ -1523,41 +1563,6 @@ err1:
>  }
>  
>  /**
> - * check_state() - checks and responds to the current adapter state
> - * @cfg: Internal structure associated with the host.
> - *
> - * This routine can block and should only be used on process context.
> - * Note that when waking up from waiting in limbo, the state is unknown
> - * and must be checked again before proceeding.
> - *
> - * Return: 0 on success, -errno on failure
> - */
> -static int check_state(struct cxlflash_cfg *cfg)
> -{
> - struct device *dev = &cfg->dev->dev;
> - int rc = 0;
> -
> -retry:
> - switch (cfg->state) {
> - case STATE_LIMBO:
> - dev_dbg(dev, "%s: Limbo, going to wait...\n", __func__);
> - rc = wait_event_interruptible(cfg->limbo_waitq,
> -   cfg->state != STATE_LIMBO);
> - if (unlikely(rc))
> - break;
> - goto retry;
> - case STATE_FAILTERM:
> - dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
> - rc = -ENODEV;
> - break;
> - default:
> - break;
> - }
> -
> - return rc;
> -}
> -
> -/**
>   * cxlflash_afu_recover() - initiates AFU recovery
>   * @sdev:SCSI device associated with LUN.
>   * @recover: Recover ioctl data structure.
> @@ -1646,9 +1651,14 @@ retry_recover:
>   /* Test if in error state */
>   reg = readq_be(&afu->ctrl_map->mbox_r);
>   if (reg == -1) {
> - dev_dbg(dev, "%s: MMIO read fail! Wait for recovery...\n",
> - __func__);
> - mutex_unlock(&ctxi->mutex);
> + dev_dbg(dev, "%s: MMIO fail, wait for recovery.\n", __func__);
> +
> + /*
> +  * Before checking the state, put back the context obtained with
> +  * get_context() as it is no longer needed and sleep for a short
> +  * period of time (see prolog notes).
> +  */
> + put_context(ctxi);

Is this needed for the drain to work? It looks like it fixes a
refcounting bug in the function, but I'm not sure I understand how it
interacts with the rest of this patch.

Anyway, the patch overall looks good to me, and makes your driver
interact with CXL's EEH support in the way I intended when I wrote it.

Reviewed-by: Daniel Axtens 

Regards,
Daniel
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

2015-09-28 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

2015-09-25 Thread Matthew R. Ochs

During an EEH freeze event, certain CXL services should not be
called until after the hardware reset has taken place. Doing so
can result in unnecessary failures and possibly cause other ill
effects by triggering hardware accesses. This translates to a
requirement to quiesce all threads that may potentially use CXL
runtime service during this window. In particular, multiple ioctls
make use of the CXL services when acting on contexts on behalf of
the user. Thus, it is essential to 'drain' running ioctls _before_
proceeding with handling the EEH freeze event.

Create the ability to drain ioctls by wrapping the ioctl handler
call in a read semaphore and then implementing a small routine that
obtains the write semaphore, effectively creating a wait point for
all currently executing ioctls.

Signed-off-by: Matthew R. Ochs 
Signed-off-by: Manoj N. Kumar 
---
 drivers/scsi/cxlflash/common.h|  2 +
 drivers/scsi/cxlflash/main.c  | 18 +--
 drivers/scsi/cxlflash/superpipe.c | 98 ---
 3 files changed, 77 insertions(+), 41 deletions(-)

diff --git a/drivers/scsi/cxlflash/common.h b/drivers/scsi/cxlflash/common.h
index 1c56037..1abe4e0 100644
--- a/drivers/scsi/cxlflash/common.h
+++ b/drivers/scsi/cxlflash/common.h
@@ -16,6 +16,7 @@
 #define _CXLFLASH_COMMON_H
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -110,6 +111,7 @@ struct cxlflash_cfg {
atomic_t recovery_threads;
struct mutex ctx_recovery_mutex;
struct mutex ctx_tbl_list_mutex;
+   struct rw_semaphore ioctl_rwsem;
struct ctx_info *ctx_tbl[MAX_CONTEXT];
struct list_head ctx_err_recovery; /* contexts w/ recovery pending */
struct file_operations cxl_fops;
diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 3e3ccf1..6e85c77 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -2311,6 +2311,7 @@ static int cxlflash_probe(struct pci_dev *pdev,
cfg->lr_port = -1;
mutex_init(&cfg->ctx_tbl_list_mutex);
mutex_init(&cfg->ctx_recovery_mutex);
+   init_rwsem(&cfg->ioctl_rwsem);
INIT_LIST_HEAD(&cfg->ctx_err_recovery);
INIT_LIST_HEAD(&cfg->lluns);
 
@@ -2365,6 +2366,19 @@ out_remove:
 }
 
 /**
+ * drain_ioctls() - wait until all currently executing ioctls have completed
+ * @cfg:   Internal structure associated with the host.
+ *
+ * Obtain write access to read/write semaphore that wraps ioctl
+ * handling to 'drain' ioctls currently executing.
+ */
+static void drain_ioctls(struct cxlflash_cfg *cfg)
+{
+   down_write(&cfg->ioctl_rwsem);
+   up_write(&cfg->ioctl_rwsem);
+}
+
+/**
  * cxlflash_pci_error_detected() - called when a PCI error is detected
  * @pdev:  PCI device struct.
  * @state: PCI channel state.
@@ -2383,16 +2397,14 @@ static pci_ers_result_t 
cxlflash_pci_error_detected(struct pci_dev *pdev,
switch (state) {
case pci_channel_io_frozen:
cfg->state = STATE_LIMBO;
-
-   /* Turn off legacy I/O */
scsi_block_requests(cfg->host);
+   drain_ioctls(cfg);
rc = cxlflash_mark_contexts_error(cfg);
if (unlikely(rc))
dev_err(dev, "%s: Failed to mark user contexts!(%d)\n",
__func__, rc);
term_mc(cfg, UNDO_START);
stop_afu(cfg);
-
return PCI_ERS_RESULT_NEED_RESET;
case pci_channel_io_perm_failure:
cfg->state = STATE_FAILTERM;
diff --git a/drivers/scsi/cxlflash/superpipe.c 
b/drivers/scsi/cxlflash/superpipe.c
index 28aa9d9..655cbf1 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -1214,6 +1214,46 @@ static const struct file_operations null_fops = {
 };
 
 /**
+ * check_state() - checks and responds to the current adapter state
+ * @cfg:   Internal structure associated with the host.
+ *
+ * This routine can block and should only be used on process context.
+ * It assumes that the caller is an ioctl thread and holding the ioctl
+ * read semaphore. This is temporarily let up across the wait to allow
+ * for draining actively running ioctls. Also note that when waking up
+ * from waiting in reset, the state is unknown and must be checked again
+ * before proceeding.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static int check_state(struct cxlflash_cfg *cfg)
+{
+   struct device *dev = &cfg->dev->dev;
+   int rc = 0;
+
+retry:
+   switch (cfg->state) {
+   case STATE_LIMBO:
+   dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
+   up_read(&cfg->ioctl_rwsem);
+   rc = wait_event_interruptible(cfg->limbo_waitq,
+ cfg->state != STATE_LIMBO);
+   down_read(&cfg->ioctl_rwsem);
+   if (unlikely(rc))
+   break;
+   g

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

Re: [PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

[PATCH v4 08/32] cxlflash: Fix to avoid CXL services during EEH

4 matches

Site Navigation

Mail list logo

Footer information