[+cc Sinan]

On Wed, Sep 30, 2020 at 03:05:36AM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> 
>    pcie_do_recovery()
>    ->pci_walk_bus()
>      ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.
>    If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
>    The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
> 
> Signed-off-by: Ethan Zhao <haifeng.z...@intel.com>
> Tested-by: Wen Jin <wen....@intel.com>
> Tested-by: Shanshan Zhang <shanshanx.zh...@intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke...@gmail.com>
> Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
> ---
> Changnes:
>  v2: revise description and code according to suggestion from Andy.
>  v3: change code to simpler.
>  v4: no change.
>  v5: no change.
>  v6: no change.
> 
>  drivers/pci/pci.h | 37 +++++--------------------------------
>  1 file changed, 5 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 455b32187abd..f2beeaeda321 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -359,39 +359,12 @@ struct pci_sriov {
>  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
>                                       pci_channel_state_t new)
>  {
> -     bool changed = false;
> -
>       device_lock_assert(&dev->dev);
> -     switch (new) {
> -     case pci_channel_io_perm_failure:
> -             switch (dev->error_state) {
> -             case pci_channel_io_frozen:
> -             case pci_channel_io_normal:
> -             case pci_channel_io_perm_failure:
> -                     changed = true;
> -                     break;
> -             }
> -             break;
> -     case pci_channel_io_frozen:
> -             switch (dev->error_state) {
> -             case pci_channel_io_frozen:
> -             case pci_channel_io_normal:
> -                     changed = true;
> -                     break;
> -             }
> -             break;
> -     case pci_channel_io_normal:
> -             switch (dev->error_state) {
> -             case pci_channel_io_frozen:
> -             case pci_channel_io_normal:
> -                     changed = true;
> -                     break;
> -             }
> -             break;
> -     }
> -     if (changed)
> -             dev->error_state = new;
> -     return changed;
> +     if (dev->error_state == new)
> +             return false;
> +
> +     dev->error_state = new;
> +     return true;
>  }

IIUC this changes the behavior of the function, but it's difficult to
analyze because it does a lot of simplification at the same time.

Please consider the following, which is intended to simplify the
function while preserving the behavior (but please verify; it's been a
long time since I looked at this).  Then maybe see how your patch
could be done on top of this?

Alternatively, come up with your own simplification patch + the
functionality change.


commit 983d9b1f8177 ("PCI/ERR: Simplify pci_dev_set_io_state()")
Author: Bjorn Helgaas <bhelg...@google.com>
Date:   Tue May 19 12:28:57 2020 -0500

    PCI/ERR: Simplify pci_dev_set_io_state()
    
    Truth table:
    
                                  requested new state
      current          ------------------------------------------
      state            normal         frozen         perm_failure
      ------------  +  -------------  -------------  ------------
      normal        |  normal         frozen         perm_failure
      frozen        |  normal         frozen         perm_failure
      perm_failure  |  perm_failure*  perm_failure*  perm_failure
    
      * "not changed", returns false
    
    No functional change intended.
    
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f75867106..81408552f7c9 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -358,39 +358,21 @@ struct pci_sriov {
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
                                        pci_channel_state_t new)
 {
-       bool changed = false;
-
        device_lock_assert(&dev->dev);
-       switch (new) {
-       case pci_channel_io_perm_failure:
-               switch (dev->error_state) {
-               case pci_channel_io_frozen:
-               case pci_channel_io_normal:
-               case pci_channel_io_perm_failure:
-                       changed = true;
-                       break;
-               }
-               break;
-       case pci_channel_io_frozen:
-               switch (dev->error_state) {
-               case pci_channel_io_frozen:
-               case pci_channel_io_normal:
-                       changed = true;
-                       break;
-               }
-               break;
-       case pci_channel_io_normal:
-               switch (dev->error_state) {
-               case pci_channel_io_frozen:
-               case pci_channel_io_normal:
-                       changed = true;
-                       break;
-               }
-               break;
+
+       /* Can always put a device in perm_failure state */
+       if (new == pci_channel_io_perm_failure) {
+               dev->error_state == pci_channel_io_perm_failure;
+               return true;
        }
-       if (changed)
-               dev->error_state = new;
-       return changed;
+
+       /* If already in perm_failure, can't set to normal or frozen */
+       if (dev->error_state == pci_channel_io_perm_failure)
+               return false;
+
+       /* Can always change normal to frozen or vice versa */
+       dev->error_state = new;
+       return true;
 }
 
 static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)

Reply via email to