RE: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-28 Thread Pandarathil, Vijaymohan R
Looks correct.

Vijay

> -Original Message-
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Wednesday, November 28, 2012 12:15 PM
> To: Pandarathil, Vijaymohan R
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org;
> linasveps...@gmail.com; Myron Stowe; Ortiz, Lance E; Huang Ying; Hidetoshi
> Seto; Patterson, Andrew D (LeftHand Networks); Zhang Yanmin
> Subject: Re: [PATCH v3] PCI-AER: Do not report successful error recovery
> for devices with AER-unaware drivers
> 
> On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
>  wrote:
> > When an error is detected on a PCIe device which does not have an
> > AER-aware driver, prevent AER infrastructure from reporting
> > successful error recovery.
> >
> > This is because the report_error_detected() function that gets
> > called in the first phase of recovery process allows forward
> > progress even when the driver for the device does not have AER
> > capabilities. It seems that all callbacks (in pci_error_handlers
> > structure) registered by drivers that gets called during error
> > recovery are not mandatory. So the intention of the infrastructure
> > design seems to be to allow forward progress even when a specific
> > callback has not been registered by a driver. However, if error
> > handler structure itself has not been registered, it doesn't make
> > sense to allow forward progress.
> >
> > As a result of the current design, in the case of a single device
> > having an AER-unaware driver or in the case of any function in a
> > multi-function card having an AER-unaware driver, a successful
> > recovery is reported.
> >
> > Typical scenario this happens is when a PCI device is detached
> > from a KVM host and the pci-stub driver on the host claims the
> > device. The pci-stub driver does not have error handling capabilities
> > but the AER infrastructure still reports that the device recovered
> > successfully.
> >
> > The changes proposed here leaves the device(s)in an unrecovered state
> > if the driver for the device or for any device in the subtree
> > does not have error handler structure registered. This reflects
> > the true state of the device and prevents any partial recovery (or no
> > recovery at all) reported as successful.
> >
> >
> > v3:
> >   - Fixed a checkpatch/build issue
> >
> > v2:
> >   - Made changes so that all devices in the subtree have the error
> > state set correctly.
> >
> >
> > Reviewed-by: Linas Vepstas  gmail.com>
> > Reviewed-by: Myron Stowe  redhat.com>
> > Reviewed-by: Bjorn Helgaas  google.com>
> > Signed-off-by: Vijay Mohan Pandarathil 
> 
> I added this to my -next branch, which will be merged for v3.8.
> 
> Please double-check that I resolved the merge conflict correctly.  It
> should show up here in the next few minutes:
> 
> http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs
> /heads/next
> 
> >  drivers/pci/pcie/aer/aerdrv.h  |  5 -
> >  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++---
> >  include/linux/pci.h|  3 +++
> >  3 files changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv.h
> b/drivers/pci/pcie/aer/aerdrv.h
> > index 94a7598..22f840f 100644
> > --- a/drivers/pci/pcie/aer/aerdrv.h
> > +++ b/drivers/pci/pcie/aer/aerdrv.h
> > @@ -87,6 +87,9 @@ struct aer_broadcast_data {
> >  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
> > enum pci_ers_result new)
> >  {
> > +   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> > +   return PCI_ERS_RESULT_NO_AER_DRIVER;
> > +
> > if (new == PCI_ERS_RESULT_NONE)
> > return orig;
> >
> > @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum
> pci_ers_result orig,
> > break;
> > case PCI_ERS_RESULT_DISCONNECT:
> > if (new == PCI_ERS_RESULT_NEED_RESET)
> > -   orig = new;
> > +   orig = PCI_ERS_RESULT_NEED_RESET;
> > break;
> > default:
> > break;
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c
> b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 9f80cb0..b67f91a 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev
> *dev, void *data)
> > 

Re: [PATCH v3] PCI-AER: Do not report successful error recovery for devices with AER-unaware drivers

2012-11-28 Thread Bjorn Helgaas
On Sat, Nov 17, 2012 at 4:47 AM, Pandarathil, Vijaymohan R
 wrote:
> When an error is detected on a PCIe device which does not have an
> AER-aware driver, prevent AER infrastructure from reporting
> successful error recovery.
>
> This is because the report_error_detected() function that gets
> called in the first phase of recovery process allows forward
> progress even when the driver for the device does not have AER
> capabilities. It seems that all callbacks (in pci_error_handlers
> structure) registered by drivers that gets called during error
> recovery are not mandatory. So the intention of the infrastructure
> design seems to be to allow forward progress even when a specific
> callback has not been registered by a driver. However, if error
> handler structure itself has not been registered, it doesn't make
> sense to allow forward progress.
>
> As a result of the current design, in the case of a single device
> having an AER-unaware driver or in the case of any function in a
> multi-function card having an AER-unaware driver, a successful
> recovery is reported.
>
> Typical scenario this happens is when a PCI device is detached
> from a KVM host and the pci-stub driver on the host claims the
> device. The pci-stub driver does not have error handling capabilities
> but the AER infrastructure still reports that the device recovered
> successfully.
>
> The changes proposed here leaves the device(s)in an unrecovered state
> if the driver for the device or for any device in the subtree
> does not have error handler structure registered. This reflects
> the true state of the device and prevents any partial recovery (or no
> recovery at all) reported as successful.
>
>
> v3:
>   - Fixed a checkpatch/build issue
>
> v2:
>   - Made changes so that all devices in the subtree have the error
> state set correctly.
>
>
> Reviewed-by: Linas Vepstas  gmail.com>
> Reviewed-by: Myron Stowe  redhat.com>
> Reviewed-by: Bjorn Helgaas  google.com>
> Signed-off-by: Vijay Mohan Pandarathil 

I added this to my -next branch, which will be merged for v3.8.

Please double-check that I resolved the merge conflict correctly.  It
should show up here in the next few minutes:

http://git.kernel.org/?p=linux/kernel/git/helgaas/pci.git;a=shortlog;h=refs/heads/next

>  drivers/pci/pcie/aer/aerdrv.h  |  5 -
>  drivers/pci/pcie/aer/aerdrv_core.c | 21 ++---
>  include/linux/pci.h|  3 +++
>  3 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
> index 94a7598..22f840f 100644
> --- a/drivers/pci/pcie/aer/aerdrv.h
> +++ b/drivers/pci/pcie/aer/aerdrv.h
> @@ -87,6 +87,9 @@ struct aer_broadcast_data {
>  static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
> enum pci_ers_result new)
>  {
> +   if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> +   return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
> if (new == PCI_ERS_RESULT_NONE)
> return orig;
>
> @@ -97,7 +100,7 @@ static inline pci_ers_result_t merge_result(enum 
> pci_ers_result orig,
> break;
> case PCI_ERS_RESULT_DISCONNECT:
> if (new == PCI_ERS_RESULT_NEED_RESET)
> -   orig = new;
> +   orig = PCI_ERS_RESULT_NEED_RESET;
> break;
> default:
> break;
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c 
> b/drivers/pci/pcie/aer/aerdrv_core.c
> index 9f80cb0..b67f91a 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -232,11 +232,26 @@ static int report_error_detected(struct pci_dev *dev, 
> void *data)
>dev->driver ?
>"no AER-aware driver" : "no driver");
> }
> -   return 0;
> +
> +   /*
> +* If there's any device in the subtree that does not
> +* have an error_detected callback, returning
> +* PCI_ERS_RESULT_NO_AER_DRIVER prevents calling of
> +* the subsequent mmio_enabled/slot_reset/resume
> +* callbacks of "any" device in the subtree. All the
> +* devices in the subtree are left in the error state
> +* without recovery.
> +*/
> +
> +   if (!(dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
> +   vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +   else
> +   vote = PCI_ERS_RESULT_NONE;
> +   } else {
> +   err_handler = dev->driver->err_handler;
> +   vote = err_handler->error_detected(dev, result_data->state);
> }
>
> -   err_handler = dev->driver->err_handler;
> -   vote = err_handler->error_detected(dev, result_data->state);
> result_data->result = merge_result(result_data->result, vote);
> return 0;
>  }
>