Re: [PATCH] powerpc/eeh: crash caused by null eeh_dev

2012-04-17 Thread Anton Blanchard
Hi Gavin,

 The problem was reported by Anton Blanchard. When an EEH error
 hit a PCI device that had no corresponding device driver, the
 kernel crashed. I reproduced the problem on a Firebird-L machine
 with the errinjct utility. First, the device driver for the
 Emulex ethernet MAC was disabled in .config. Then a data parity
 error was forced on the Emulex ethernet MAC with errinjct. The
 kernel crashed after I issued a couple of lspci -v commands.
 
 The root cause is that the PCI device, including its reference
 to the corresponding eeh device, is removed from the system
 while EEH does recovery. Afterwards, the PCI device is probed
 again and added back into the system. So it is not safe to
 retrieve the eeh device from the PCI device after the PCI device
 has been removed and before it has been added again.
 
 The patch fixes the issue by retrieving the eeh device from the
 OF node instead of from the PCI device after the PCI device has
 been removed.

Thanks, this does fix the oops I see.

Tested-by: Anton Blanchard an...@samba.org

Anton

 Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
 ---
  arch/powerpc/platforms/pseries/eeh.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
 index 309d38e..a75e37d 100644
 --- a/arch/powerpc/platforms/pseries/eeh.c
 +++ b/arch/powerpc/platforms/pseries/eeh.c
 @@ -1076,7 +1076,7 @@ static void eeh_add_device_late(struct pci_dev *dev)
  	pr_debug("EEH: Adding device %s\n", pci_name(dev));
  
  	dn = pci_device_to_OF_node(dev);
 -	edev = pci_dev_to_eeh_dev(dev);
 +	edev = of_node_to_eeh_dev(dn);
  	if (edev->pdev == dev) {
  		pr_debug("EEH: Already referenced !\n");
  		return;

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] powerpc/eeh: crash caused by null eeh_dev

2012-04-16 Thread Gavin Shan
The problem was reported by Anton Blanchard. When an EEH error
hit a PCI device that had no corresponding device driver, the
kernel crashed. I reproduced the problem on a Firebird-L machine
with the errinjct utility. First, the device driver for the
Emulex ethernet MAC was disabled in .config. Then a data parity
error was forced on the Emulex ethernet MAC with errinjct. The
kernel crashed after I issued a couple of lspci -v commands.

The root cause is that the PCI device, including its reference
to the corresponding eeh device, is removed from the system
while EEH does recovery. Afterwards, the PCI device is probed
again and added back into the system. So it is not safe to
retrieve the eeh device from the PCI device after the PCI device
has been removed and before it has been added again.

The patch fixes the issue by retrieving the eeh device from the
OF node instead of from the PCI device after the PCI device has
been removed.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/eeh.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 309d38e..a75e37d 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -1076,7 +1076,7 @@ static void eeh_add_device_late(struct pci_dev *dev)
 	pr_debug("EEH: Adding device %s\n", pci_name(dev));
 
 	dn = pci_device_to_OF_node(dev);
-	edev = pci_dev_to_eeh_dev(dev);
+	edev = of_node_to_eeh_dev(dn);
 	if (edev->pdev == dev) {
 		pr_debug("EEH: Already referenced !\n");
 		return;
-- 
1.7.5.4
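
For readers following the root-cause description, here is a minimal user-space sketch of why the lookup path matters. It is not kernel code: the structs are simplified stand-ins, every model_* name is hypothetical, and the NULL check in model_add_device_late() is a defensive addition for the sketch that is not part of the patch. It assumes only what the commit message states, namely that a re-probed PCI device does not yet carry an eeh device reference while the OF node still does.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct pci_dev;

struct eeh_dev {
	struct pci_dev *pdev;	/* back-pointer, set once the device is added */
};

struct device_node {
	struct eeh_dev *edev;	/* assumed to survive PCI device removal */
};

struct pci_dev {
	struct device_node *dn;
	struct eeh_dev *edev;	/* NULL on a freshly re-probed device */
};

/* Model of the old lookup: goes through the PCI device. */
static struct eeh_dev *model_pci_dev_to_eeh_dev(struct pci_dev *pdev)
{
	return pdev ? pdev->edev : NULL;
}

/* Model of the new lookup: goes through the OF node. */
static struct eeh_dev *model_of_node_to_eeh_dev(struct device_node *dn)
{
	return dn ? dn->edev : NULL;
}

/*
 * Model of the patched flow. Returns 1 if the device was linked,
 * 0 if it was already referenced, -1 if no eeh device was found.
 */
static int model_add_device_late(struct pci_dev *dev)
{
	struct eeh_dev *edev = model_of_node_to_eeh_dev(dev->dn);

	if (!edev)		/* defensive check, sketch only */
		return -1;
	if (edev->pdev == dev)	/* already referenced */
		return 0;
	edev->pdev = dev;
	dev->edev = edev;
	return 1;
}

/* Returns 0 if the model behaves as the commit message describes. */
static int model_demo(void)
{
	struct eeh_dev edev = { .pdev = NULL };
	struct device_node dn = { .edev = &edev };
	struct pci_dev dev = { .dn = &dn, .edev = NULL }; /* re-probed */

	if (model_pci_dev_to_eeh_dev(&dev) != NULL)
		return 1;	/* old path would hand back NULL here */
	if (model_of_node_to_eeh_dev(dev.dn) != &edev)
		return 2;	/* new path still finds the eeh device */
	if (model_add_device_late(&dev) != 1)
		return 3;
	if (model_pci_dev_to_eeh_dev(&dev) != &edev)
		return 4;	/* link restored after the add */
	if (model_add_device_late(&dev) != 0)
		return 5;	/* second add sees "already referenced" */
	return 0;
}
```

Calling model_demo() from a main() and checking for a zero return exercises both lookup paths. In the model, the old path returns NULL for the re-probed device, so the unpatched edev->pdev comparison would dereference a NULL pointer.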
