[PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug

2013-06-15 Thread Gavin Shan
During EEH recovery, the PCI devices of the problematic PE are
removed and then added back to the system. During this so-called
hotplug event, the PCI devices of the problematic PE are probed
in an early and a late phase. On the PowerNV platform, delay the
EEH probe to the late phase, since the PCI device isn't available
in the early phase.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/eeh.c |   16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index cda0b62..7d169d3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -758,6 +758,14 @@ static void eeh_add_device_early(struct device_node *dn)
 {
 	struct pci_controller *phb;
 
+	/*
+	 * If we're doing EEH probe based on PCI device, we
+	 * would delay the probe until late stage because
+	 * the PCI device isn't available this moment.
+	 */
+	if (!eeh_probe_mode_devtree())
+		return;
+
 	if (!of_node_to_eeh_dev(dn))
 		return;
 	phb = of_node_to_eeh_dev(dn)->phb;
@@ -766,7 +774,6 @@ static void eeh_add_device_early(struct device_node *dn)
 	if (NULL == phb || 0 == phb->buid)
 		return;
 
-	/* FIXME: hotplug support on POWERNV */
 	eeh_ops->of_probe(dn, NULL);
 }
 
@@ -817,6 +824,13 @@ static void eeh_add_device_late(struct pci_dev *dev)
 	edev->pdev = dev;
 	dev->dev.archdata.edev = edev;
 
+	/*
+	 * We have to do the EEH probe here because the PCI device
+	 * hasn't been created yet in the early stage.
+	 */
+	if (eeh_probe_mode_dev())
+		eeh_ops->dev_probe(dev, NULL);
+
 	eeh_addr_cache_insert_dev(dev);
 }
 
 
-- 
1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
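
The early/late split that patch 09 describes can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions: every demo_* and PROBE_MODE_* name is hypothetical and stands in for the kernel's probe-mode helpers, and the "probe" is reduced to recording which phase did the work.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two EEH probe modes. */
enum probe_mode { PROBE_MODE_DEVTREE, PROBE_MODE_DEV };

struct demo_dev {
    bool has_pci_dev;   /* set once the PCI device has been created */
    int probed_in;      /* 0 = not probed, 1 = early phase, 2 = late phase */
};

/* Early phase: only the device-tree node exists at this point. */
static void demo_add_device_early(enum probe_mode mode, struct demo_dev *d)
{
    if (mode != PROBE_MODE_DEVTREE)
        return;         /* PCI-device based probe waits for the late phase */
    d->probed_in = 1;
}

/* Late phase: the PCI device is available now. */
static void demo_add_device_late(enum probe_mode mode, struct demo_dev *d)
{
    d->has_pci_dev = true;
    if (mode == PROBE_MODE_DEV)
        d->probed_in = 2;
}

/* Run both hotplug phases in order and report which one probed. */
int demo_hotplug(enum probe_mode mode)
{
    struct demo_dev d = { false, 0 };

    demo_add_device_early(mode, &d);
    demo_add_device_late(mode, &d);
    return d.probed_in;
}
```

With these assumptions, demo_hotplug(PROBE_MODE_DEVTREE) probes in the early phase and demo_hotplug(PROBE_MODE_DEV) probes in the late phase, which mirrors the pSeries versus PowerNV behaviour described in the commit message.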


[PATCH 07/27] powerpc/eeh: EEH post initialization operation

2013-06-15 Thread Gavin Shan
The patch adds a new EEH operation, post_init. It is used to notify
the platform that the EEH core has completed the EEH probe. From that
point on, the PowerNV platform starts to use the services supplied by
the EEH functionality.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |1 +
 arch/powerpc/kernel/eeh.c  |   11 +++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index beb3cbc..beec788 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -131,6 +131,7 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 struct eeh_ops {
 	char *name;
 	int (*init)(void);
+	int (*post_init)(void);
 	void* (*of_probe)(struct device_node *dn, void *flag);
 	int (*dev_probe)(struct pci_dev *dev, void *flag);
 	int (*set_option)(struct eeh_pe *pe, int option);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c865c5f..a29cf47 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -720,6 +720,17 @@ int __init eeh_init(void)
 		return -EINVAL;
 	}
 
+	/*
+	 * Call platform post-initialization. Actually, It's good chance
+	 * to inform platform that EEH is ready to supply service if the
+	 * I/O cache stuff has been built up.
+	 */
+	if (eeh_ops->post_init) {
+		ret = eeh_ops->post_init();
+		if (ret)
+			return ret;
+	}
+
 	if (eeh_subsystem_enabled)
 		pr_info("EEH: PCI Enhanced I/O Error Handling Enabled\n");
 	else
-- 
1.7.5.4
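
The optional-hook pattern used for post_init in patch 07 can be sketched as a standalone program fragment. It is a rough illustration, not the kernel code: struct demo_ops, demo_eeh_init() and the two sample callbacks are hypothetical names, and the error values are ordinary errno constants chosen for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down operations table. */
struct demo_ops {
    int (*init)(void);
    int (*post_init)(void);     /* optional; may be NULL */
};

/* Sample callbacks used only to exercise the sketch. */
static int demo_init_ok(void)        { return 0; }
static int demo_post_init_fail(void) { return -EIO; }

/* Run the mandatory init hook, then the optional post_init hook. */
int demo_eeh_init(const struct demo_ops *ops)
{
    int ret;

    if (!ops || !ops->init)
        return -EINVAL;

    ret = ops->init();
    if (ret)
        return ret;

    /* Platforms that don't need the notification leave this NULL. */
    if (ops->post_init) {
        ret = ops->post_init();
        if (ret)
            return ret;
    }

    return 0;
}
```

Checking the pointer before the call keeps existing platforms working without having to supply an empty post_init.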



[PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public

2013-06-15 Thread Gavin Shan
While processing an EEH event interrupt from P7IOC, we need a function
to retrieve the PE that corresponds to the indicated EEH device. The
patch makes eeh_pe_get() public so that other source files can call
it for that purpose. The patch also fixes a reference to the wrong BDF
(Bus/Device/Function) address while searching for the PE in
__eeh_pe_get().

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |1 +
 arch/powerpc/kernel/eeh_pe.c   |4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 4ac6f70..acdfcaa 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -185,6 +185,7 @@ static inline void eeh_unlock(void)
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
 struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 71c4544..3d2dcf5 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -228,7 +228,7 @@ static void *__eeh_pe_get(void *data, void *flag)
 		return pe;
 
 	/* Try BDF address */
-	if (edev->pe_config_addr &&
+	if (edev->config_addr &&
 	   (edev->config_addr == pe->config_addr))
 		return pe;
 
@@ -246,7 +246,7 @@ static void *__eeh_pe_get(void *data, void *flag)
  * which is composed of PCI bus/device/function number, or unified
  * PE address.
  */
-static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
+struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
 {
 	struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
 	struct eeh_pe *pe;
-- 
1.7.5.4
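
The BDF fix in patch 04 is easier to see in isolation. The sketch below is a simplified model, not the kernel implementation: it assumes a flat array of PEs instead of the real PE tree, and struct demo_pe, struct demo_dev and demo_pe_get() are hypothetical names. The point it shows is that the guard on the BDF comparison tests the same field that is compared.

```c
#include <stddef.h>

/* Hypothetical, flattened stand-ins for the PE and EEH device. */
struct demo_pe  { int addr; int config_addr; };
struct demo_dev { int pe_config_addr; int config_addr; };

/* Return the first PE matching the device, or NULL if none matches. */
const struct demo_pe *demo_pe_get(const struct demo_pe *pes, size_t n,
                                  const struct demo_dev *edev)
{
    for (size_t i = 0; i < n; i++) {
        /* Prefer the unified PE address when the device has one. */
        if (edev->pe_config_addr &&
            edev->pe_config_addr == pes[i].addr)
            return &pes[i];

        /* Try the BDF address; guard on config_addr, not pe_config_addr. */
        if (edev->config_addr &&
            edev->config_addr == pes[i].config_addr)
            return &pes[i];
    }

    return NULL;
}
```

A device that has only a BDF address (pe_config_addr == 0) would never match if the second test were guarded by pe_config_addr, which is the case the patch corrects.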



[PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public

2013-06-15 Thread Gavin Shan
One of the possible cases indicated by a P7IOC interrupt is a fenced
PHB. In that case, we need to fetch the PE corresponding to the PHB,
disable the PHB and all subordinate PCI buses/devices, recover
from the fenced state and eventually re-enable the whole PHB. We need
a function to fetch the PHB PE from outside eeh_pe.c, so the patch
makes eeh_phb_pe_get() public for that purpose.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |1 +
 arch/powerpc/kernel/eeh_pe.c   |2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e32c3c5..4ac6f70 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -184,6 +184,7 @@ static inline void eeh_unlock(void)
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int eeh_phb_pe_create(struct pci_controller *phb);
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb);
 int eeh_add_to_parent_pe(struct eeh_dev *edev);
 int eeh_rmv_from_parent_pe(struct eeh_dev *edev, int purge_pe);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 9d4a9e8..71c4544 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -95,7 +95,7 @@ int eeh_phb_pe_create(struct pci_controller *phb)
  * hierarchy tree is composed of PHB PEs. The function is used
  * to retrieve the corresponding PHB PE according to the given PHB.
  */
-static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
+struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
 {
 	struct eeh_pe *pe;
 
-- 
1.7.5.4



[PATCH 12/27] powerpc/eeh: EEH backend for P7IOC

2013-06-15 Thread Gavin Shan
For EEH on the PowerNV platform, the overall architecture is different
from that on the pSeries platform. In order to support multiple I/O chips
in future, we split EEH into 3 layers for the PowerNV platform: EEH core,
platform layer and I/O layer. This gives the EEH implementation on the
PowerNV platform much more flexibility in future.

The patch adds the EEH backend for P7IOC.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/Makefile   |1 +
 arch/powerpc/platforms/powernv/eeh-ioda.c |   44 +
 arch/powerpc/platforms/powernv/pci.h  |   22 ++
 3 files changed, 67 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-ioda.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index bcc3cb4..09bd0cb 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,3 +3,4 @@ obj-y   += opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
+obj-$(CONFIG_EEH)  += eeh-ioda.o
diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
new file mode 100644
index 000..b9564d5
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -0,0 +1,44 @@
+/*
+ * The file intends to implement the functions needed by EEH, which is
+ * built on IODA compliant chip. Actually, lots of functions related
+ * to EEH would be built based on the OPAL APIs.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+#include <asm/tce.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+struct pnv_eeh_ops ioda_eeh_ops = {
+	.post_init		= NULL,
+	.set_option		= NULL,
+	.get_state		= NULL,
+	.reset			= NULL,
+	.get_log		= NULL,
+	.configure_bridge	= NULL
+};
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 25d76c4..6f69b87 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -66,15 +66,34 @@ struct pnv_ioda_pe {
 	struct list_head	list;
 };
 
+/* IOC dependent EEH operations */
+#ifdef CONFIG_EEH
+struct pnv_eeh_ops {
+	int (*post_init)(struct pci_controller *hose);
+	int (*set_option)(struct eeh_pe *pe, int option);
+	int (*get_state)(struct eeh_pe *pe);
+	int (*reset)(struct eeh_pe *pe, int option);
+	int (*get_log)(struct eeh_pe *pe, int severity,
+		       char *drv_log, unsigned long len);
+	int (*configure_bridge)(struct eeh_pe *pe);
+};
+#endif /* CONFIG_EEH */
+
 struct pnv_phb {
 	struct pci_controller	*hose;
 	enum pnv_phb_type	type;
 	enum pnv_phb_model	model;
+	u64			hub_id;
 	u64			opal_id;
 	void __iomem		*regs;
 	int			initialized;
 	spinlock_t		lock;
 
+#ifdef CONFIG_EEH
+	struct pnv_eeh_ops	*eeh_ops;
+	int			eeh_enabled;
+#endif
+
 #ifdef CONFIG_PCI_MSI
 	unsigned int		msi_base;
 	unsigned int		msi32_support;
@@ -150,6 +169,9 @@ struct pnv_phb {
 };
 
 extern struct pci_ops pnv_pci_ops;
+#ifdef CONFIG_EEH
+extern struct pnv_eeh_ops ioda_eeh_ops;
+#endif
 
 extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
 				      void *tce_mem, u64 tce_size,
-- 
1.7.5.4
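
The layering that patch 12 sets up (platform layer forwarding to a per-chip operations table) can be sketched outside the kernel as follows. All names here are hypothetical (demo_chip_ops, demo_phb, DEMO_STATE_NOT_SUPPORT and so on), and the fallback value for a missing hook is an assumption made for the example rather than something taken from the patch.

```c
#include <stddef.h>

#define DEMO_STATE_NOT_SUPPORT (-1)     /* hypothetical "no backend" result */

struct demo_pe { int addr; };

/* I/O chip layer: each chip fills in only the hooks it implements. */
struct demo_chip_ops {
    int (*post_init)(void);
    int (*get_state)(struct demo_pe *pe);
};

/* Per-PHB data holding a pointer to the chip backend. */
struct demo_phb {
    const struct demo_chip_ops *eeh_ops;
};

/* Sample chip hook: pretend every PE is in state 3. */
static int demo_ioda_get_state(struct demo_pe *pe)
{
    (void)pe;
    return 3;
}

/* Backend with all hooks still NULL, like the table this patch adds. */
static const struct demo_chip_ops demo_empty_ops = { NULL, NULL };

/* Backend once a later patch has filled in get_state. */
static const struct demo_chip_ops demo_ioda_ops = { NULL, demo_ioda_get_state };

/* Platform layer: forward to the chip backend when the hook exists. */
int demo_platform_get_state(const struct demo_phb *phb, struct demo_pe *pe)
{
    if (phb->eeh_ops && phb->eeh_ops->get_state)
        return phb->eeh_ops->get_state(pe);

    return DEMO_STATE_NOT_SUPPORT;
}
```

Starting from an all-NULL table lets the following patches in the series fill in one hook at a time without touching the platform layer.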



[PATCH v4 00/27] EEH Support for PowerNV platform

2013-06-15 Thread Gavin Shan
The series was initially built on 3.10.RC1. The patchset does not intend to
enable EEH functionality for PHB3 for now; PHB3 EEH support on the PowerNV
platform is left for future work.

The series intends to support EEH on the PowerNV platform. The EEH core
already supports multiple probe methods: device tree nodes and PCI devices.
For EEH on PowerNV, we use PCI devices to do the EEH probe, which is
different from the probe type used on the pSeries platform. Another point
worth mentioning is that the overall EEH is split into 3 layers: EEH core,
platform layer and I/O chip layer. This gives EEH on the PowerNV platform
more flexibility and allows it to support more I/O chips in future.
Besides, an EEH event can be produced by detecting 0xFF's when reading
PCI config or I/O registers, or by interrupts dedicated to EEH error
reporting, so we have to handle the EEH error interrupts. On the other hand,
the EEH events are processed by the EEH core as on the pSeries platform.

The series exports debugfs entries (/sys/kernel/debug/powerpc/PCI/err_injct),
which allow you to control the 0xD10 register in order to force errors like
frozen PE and fenced PHB for testing purposes. The following example is what
I usually use to control that register. The patchset has been verified on a
Firebird-L machine where I have 2 Emulex ethernet cards on PHB#0. I keep
pinging one of the ethernet cards (eth0) from outside and then use the
following commands to produce frozen PE or fenced PHB errors. Eventually,
the errors are recovered and the ethernet card is reachable again after a
temporary loss of connection.

Trigger frozen PE:

echo 0x0200 > /sys/kernel/debug/powerpc/PCI/err_injct
sleep 1
echo 0x0 > /sys/kernel/debug/powerpc/PCI/err_injct

Trigger fenced PHB:

echo 0x8000 > /sys/kernel/debug/powerpc/PCI/err_injct

Change log
==========

v3 -> v4:
* Rebase to 3.10.RC5 with the original first 2 patches from v3 applied;
  those 2 patches are not resent.
* Add 2 patches (the first two) to move the EEH core from the pSeries
  platform to arch/powerpc/kernel and apply the necessary cleanup.
* The PowerNV platform layer initializes the delay for the temporarily
  unavailable PE state to 0 and sets it to the default value (1 second)
  if necessary.
* Change variable names according to Ben's comments.
* Account for the maximal allowed waiting time in
  eeh-powernv.c::powernv_eeh_wait_state()
* Introduce eeh_serialize_lock/unlock so that pci-err.c can inject EEH
  events with a consistent PE state (isolated/dead state). As a result,
  pci-err.c::pci_err_seq_sem has been removed completely.
* Introduce PE state (EEH_PE_PHB_DEAD) and the logic to remove the
  corresponding PCI domain upon detecting a dead IOC or PHB, instead of
  panicking the system.
* Remove the unnecessary contiguous check on one specific PHB in
  pci-err.c::pci_err_handler().
* Refactor the functions in pci-err.c that print PHB diag-data. The
  diag-data header (including version/ioType) is parsed and the
  appropriate function is called to output the diag-data.
* Changelog adjustment on OPAL notifier according to Ben's comments.
* Split original opal_notifier_enable() into opal_notifier_enable/disable.
* Allow multiple clients to listen for the same OPAL event change in the
  OPAL notifier.
* The OPAL notifier traces event changes, instead of events.
v2 -> v3:
* Rebase to 3.10.RC4
* Replace eeh_pci_dev_traverse() with pci_walk_bus()
* Changelog adjustment to make it clearer
* Call msleep() if possible after opal_pci_poll()
* Make sure we have OPALv3
* Add an OPAL notifier so that we can register callbacks for the monitored
  events. The OPAL notifier is disabled while restarting or powering off
  the system.
* Make the debugfs entries something like (PCI/err_injct)
* Split the patch so that it can be backported to the stable kernel
* Allow detecting a fenced PHB proactively (without interrupt)
* Start to use opal_pci_get_phb_diag_data2()
* Stack dump upon fenced PHB
v1 -> v2:
* Rebase to 3.10.RC3
* Don't fetch the PE state in the case of a fenced PHB. It usually takes a
  long time and possibly incurs a softlock warning. It requires the
  corresponding changes in the underlying firmware
* Add debugfs entries so that we can inject errors like frozen PE and
  fenced PHB for testing purposes

---

arch/powerpc/include/asm/eeh.h |   24 +-
arch/powerpc/include/asm/opal.h|  139 +++-
arch/powerpc/kernel/Makefile   |4 +-
arch/powerpc/kernel/eeh.c  | 1044 
arch/powerpc/kernel/eeh_cache.c|  319 

[PATCH 02/27] powerpc/eeh: Cleanup for EEH core

2013-06-15 Thread Gavin Shan
While moving the EEH core from the pSeries platform directory to
arch/powerpc/kernel (in the previous patch), git show reported lots
of coding style complaints. The patch fixes them.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/eeh.c|   22 +++---
 arch/powerpc/kernel/eeh_driver.c |   14 +++---
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 6b73d6c..8a83451 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -368,7 +368,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
}
 
eeh_stats.slot_resets++;
- 
+
/* Avoid repeated reports of this failure, including problems
 * with other functions on this device, and functions under
 * bridges.
@@ -525,7 +525,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
 * or a fundamental reset (3).
 * A fundamental reset required by any device under
 * Partitionable Endpoint trumps hot-reset.
-*/
+*/
eeh_pe_dev_traverse(pe, eeh_set_dev_freset, freset);
 
if (freset)
@@ -538,8 +538,8 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
 */
 #define PCI_BUS_RST_HOLD_TIME_MSEC 250
msleep(PCI_BUS_RST_HOLD_TIME_MSEC);
-   
-   /* We might get hit with another EEH freeze as soon as the 
+
+   /* We might get hit with another EEH freeze as soon as the
 * pci slot reset line is dropped. Make sure we don't miss
 * these, and clear the flag now.
 */
@@ -604,7 +604,7 @@ void eeh_save_bars(struct eeh_dev *edev)
if (!edev)
return;
dn = eeh_dev_to_of_node(edev);
-   
+
for (i = 0; i < 16; i++)
eeh_ops->read_config(dn, i * 4, 4, &edev->config_space[i]);
 }
@@ -803,12 +803,12 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
-   eeh_add_device_late(dev);
-   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
-   struct pci_bus *subbus = dev->subordinate;
-   if (subbus)
-   eeh_add_device_tree_late(subbus);
-   }
+   eeh_add_device_late(dev);
+   if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   struct pci_bus *subbus = dev->subordinate;
+   if (subbus)
+   eeh_add_device_tree_late(subbus);
+   }
}
 }
 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index a3fefb6..0acc5a2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -154,9 +154,9 @@ static void eeh_enable_irq(struct pci_dev *dev)
  * eeh_report_error - Report pci error to each device driver
  * @data: eeh device
  * @userdata: return value
- * 
- * Report an EEH error to each device driver, collect up and 
- * merge the device driver responses. Cumulative response 
+ *
+ * Report an EEH error to each device driver, collect up and
+ * merge the device driver responses. Cumulative response
  * passed back in userdata.
  */
 static void *eeh_report_error(void *data, void *userdata)
@@ -376,9 +376,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
eeh_pe_restore_bars(pe);
 
/* Give the system 5 seconds to finish running the user-space
-* hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes, 
-* this is a hack, but if we don't do this, and try to bring 
-* the device up before the scripts have taken it down, 
+* hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
+* this is a hack, but if we don't do this, and try to bring
+* the device up before the scripts have taken it down,
 * potentially weird things happen.
 */
if (bus) {
@@ -520,7 +520,7 @@ void eeh_handle_event(struct eeh_pe *pe)
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
return;
-   
+
 excess_failures:
/*
 * About 90% of all real-life EEH failures in the field
-- 
1.7.5.4



[PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware

2013-06-15 Thread Gavin Shan
The patch synchronizes the OPAL APIs between kernel and firmware. Also,
we start to replace opal_pci_get_phb_diag_data() with the similar
opal_pci_get_phb_diag_data2(); the former OPAL API returns
OPAL_UNSUPPORTED from now on.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/opal.h|  135 
 arch/powerpc/platforms/powernv/opal-wrappers.S |3 +
 arch/powerpc/platforms/powernv/pci.c   |3 +-
 3 files changed, 119 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index cbb9305..2880797 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -117,7 +117,13 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_SET_SLOT_LED_STATUS		55
 #define OPAL_GET_EPOW_STATUS			56
 #define OPAL_SET_SYSTEM_ATTENTION_LED		57
+#define OPAL_RESERVED1				58
+#define OPAL_RESERVED2				59
+#define OPAL_PCI_NEXT_ERROR			60
+#define OPAL_PCI_EEH_FREEZE_STATUS2		61
+#define OPAL_PCI_POLL				62
 #define OPAL_PCI_MSI_EOI			63
+#define OPAL_PCI_GET_PHB_DIAG_DATA2		64
 
 #ifndef __ASSEMBLY__
 
@@ -125,6 +131,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
 enum OpalVendorApiTokens {
OPAL_START_VENDOR_API_RANGE = 1000, OPAL_END_VENDOR_API_RANGE = 1999
 };
+
 enum OpalFreezeState {
OPAL_EEH_STOPPED_NOT_FROZEN = 0,
OPAL_EEH_STOPPED_MMIO_FREEZE = 1,
@@ -134,55 +141,69 @@ enum OpalFreezeState {
OPAL_EEH_STOPPED_TEMP_UNAVAIL = 5,
OPAL_EEH_STOPPED_PERM_UNAVAIL = 6
 };
+
 enum OpalEehFreezeActionToken {
OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO = 1,
OPAL_EEH_ACTION_CLEAR_FREEZE_DMA = 2,
OPAL_EEH_ACTION_CLEAR_FREEZE_ALL = 3
 };
+
 enum OpalPciStatusToken {
-   OPAL_EEH_PHB_NO_ERROR = 0,
-   OPAL_EEH_PHB_FATAL = 1,
-   OPAL_EEH_PHB_RECOVERABLE = 2,
-   OPAL_EEH_PHB_BUS_ERROR = 3,
-   OPAL_EEH_PCI_NO_DEVSEL = 4,
-   OPAL_EEH_PCI_TA = 5,
-   OPAL_EEH_PCIEX_UR = 6,
-   OPAL_EEH_PCIEX_CA = 7,
-   OPAL_EEH_PCI_MMIO_ERROR = 8,
-   OPAL_EEH_PCI_DMA_ERROR = 9
+   OPAL_EEH_NO_ERROR   = 0,
+   OPAL_EEH_IOC_ERROR  = 1,
+   OPAL_EEH_PHB_ERROR  = 2,
+   OPAL_EEH_PE_ERROR   = 3,
+   OPAL_EEH_PE_MMIO_ERROR  = 4,
+   OPAL_EEH_PE_DMA_ERROR   = 5
 };
+
+enum OpalPciErrorSeverity {
+   OPAL_EEH_SEV_NO_ERROR   = 0,
+   OPAL_EEH_SEV_IOC_DEAD   = 1,
+   OPAL_EEH_SEV_PHB_DEAD   = 2,
+   OPAL_EEH_SEV_PHB_FENCED = 3,
+   OPAL_EEH_SEV_PE_ER  = 4,
+   OPAL_EEH_SEV_INF= 5
+};
+
 enum OpalShpcAction {
OPAL_SHPC_GET_LINK_STATE = 0,
OPAL_SHPC_GET_SLOT_STATE = 1
 };
+
 enum OpalShpcLinkState {
OPAL_SHPC_LINK_DOWN = 0,
OPAL_SHPC_LINK_UP = 1
 };
+
 enum OpalMmioWindowType {
OPAL_M32_WINDOW_TYPE = 1,
OPAL_M64_WINDOW_TYPE = 2,
OPAL_IO_WINDOW_TYPE = 3
 };
+
 enum OpalShpcSlotState {
OPAL_SHPC_DEV_NOT_PRESENT = 0,
OPAL_SHPC_DEV_PRESENT = 1
 };
+
 enum OpalExceptionHandler {
OPAL_MACHINE_CHECK_HANDLER = 1,
OPAL_HYPERVISOR_MAINTENANCE_HANDLER = 2,
OPAL_SOFTPATCH_HANDLER = 3
 };
+
 enum OpalPendingState {
-   OPAL_EVENT_OPAL_INTERNAL = 0x1,
-   OPAL_EVENT_NVRAM = 0x2,
-   OPAL_EVENT_RTC = 0x4,
-   OPAL_EVENT_CONSOLE_OUTPUT = 0x8,
-   OPAL_EVENT_CONSOLE_INPUT = 0x10,
-   OPAL_EVENT_ERROR_LOG_AVAIL = 0x20,
-   OPAL_EVENT_ERROR_LOG = 0x40,
-   OPAL_EVENT_EPOW = 0x80,
-   OPAL_EVENT_LED_STATUS = 0x100
+   OPAL_EVENT_OPAL_INTERNAL= 0x1,
+   OPAL_EVENT_NVRAM= 0x2,
+   OPAL_EVENT_RTC  = 0x4,
+   OPAL_EVENT_CONSOLE_OUTPUT   = 0x8,
+   OPAL_EVENT_CONSOLE_INPUT= 0x10,
+   OPAL_EVENT_ERROR_LOG_AVAIL  = 0x20,
+   OPAL_EVENT_ERROR_LOG= 0x40,
+   OPAL_EVENT_EPOW = 0x80,
+   OPAL_EVENT_LED_STATUS   = 0x100,
+   OPAL_EVENT_PCI_ERROR= 0x200
 };
 
 /* Machine check related definitions */
@@ -364,15 +385,80 @@ struct opal_machine_check_event {
} u;
 };
 
+enum {
+   OPAL_P7IOC_DIAG_TYPE_NONE   = 0,
+   OPAL_P7IOC_DIAG_TYPE_RGC= 1,
+   OPAL_P7IOC_DIAG_TYPE_BI = 2,
+   OPAL_P7IOC_DIAG_TYPE_CI = 3,
+   OPAL_P7IOC_DIAG_TYPE_MISC   = 4,
+   OPAL_P7IOC_DIAG_TYPE_I2C= 5,
+   OPAL_P7IOC_DIAG_TYPE_LAST   = 6
+};
+
+struct OpalIoP7IOCErrorData {
+   uint16_t type;
+
+   /* GEM */
+   uint64_t gemXfir;
+   uint64_t gemRfir;
+   uint64_t gemRirqfir;
+   uint64_t gemMask;
+   uint64_t gemRwof;
+
+   /* LEM */
+   uint64_t lemFir;
+   uint64_t lemErrMask;
+   uint64_t lemAction0;
+   uint64_t 

[PATCH 06/27] powerpc/eeh: Make eeh_init() public

2013-06-15 Thread Gavin Shan
For EEH on the PowerNV platform, the EEH probe is based on the real
PCI devices, which only become available after the PCI probe. So we
have to call eeh_init() explicitly on the PowerNV platform after the
PCI probe. The patch also makes eeh_init() do the EEH probe for the
PowerNV platform.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |8 +++-
 arch/powerpc/kernel/eeh.c  |   22 --
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f3b49d6..beb3cbc 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -132,7 +132,7 @@ struct eeh_ops {
char *name;
int (*init)(void);
void* (*of_probe)(struct device_node *dn, void *flag);
-   void* (*dev_probe)(struct pci_dev *dev, void *flag);
+   int (*dev_probe)(struct pci_dev *dev, void *flag);
int (*set_option)(struct eeh_pe *pe, int option);
int (*get_pe_addr)(struct eeh_pe *pe);
int (*get_state)(struct eeh_pe *pe, int *state);
@@ -196,6 +196,7 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void *eeh_dev_init(struct device_node *dn, void *data);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
+int __init eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
@@ -224,6 +225,11 @@ void eeh_remove_bus_device(struct pci_dev *, int);
 
 #else /* !CONFIG_EEH */
 
+static inline int eeh_init(void)
+{
+   return 0;
+}
+
 static inline void *eeh_dev_init(struct device_node *dn, void *data)
 {
return NULL;
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 8a83451..c865c5f 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -674,11 +674,21 @@ int __exit eeh_ops_unregister(const char *name)
  * Even if force-off is set, the EEH hardware is still enabled, so that
  * newer systems can boot.
  */
-static int __init eeh_init(void)
+int __init eeh_init(void)
 {
 	struct pci_controller *hose, *tmp;
 	struct device_node *phb;
-	int ret;
+	static int cnt = 0;
+	int ret = 0;
+
+	/*
+	 * We have to delay the initialization on PowerNV after
+	 * the PCI hierarchy tree has been built because the PEs
+	 * are figured out based on PCI devices instead of device
+	 * tree nodes
+	 */
+	if (machine_is(powernv) && cnt++ <= 0)
+		return ret;
 
 	/* call platform initialization function */
 	if (!eeh_ops) {
@@ -700,6 +710,14 @@ static int __init eeh_init(void)
 			phb = hose->dn;
 			traverse_pci_devices(phb, eeh_ops->of_probe, NULL);
 		}
+	} else if (eeh_probe_mode_dev()) {
+		list_for_each_entry_safe(hose, tmp,
+			&hose_list, list_node)
+			pci_walk_bus(hose->bus, eeh_ops->dev_probe, NULL);
+	} else {
+		pr_warning("%s: Invalid probe mode %d\n",
+			   __func__, eeh_probe_mode);
+		return -EINVAL;
 	}
 
 	if (eeh_subsystem_enabled)
-- 
1.7.5.4
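
The "skip the first call" guard that patch 06 adds to eeh_init() can be tried out in user space. The sketch below only models that guard, under the assumption that the function is reached twice on PowerNV (once early, once explicitly after the PCI probe); demo_init() and its boolean parameter are hypothetical.

```c
#include <stdbool.h>

/*
 * Returns true when the real initialization would run. On the
 * (hypothetical) PowerNV path the first call is skipped and only
 * later calls proceed; other platforms proceed immediately.
 */
bool demo_init(bool is_powernv)
{
    static int cnt = 0;     /* counts PowerNV calls seen so far */

    /* Short-circuit: cnt is only incremented on the PowerNV path. */
    if (is_powernv && cnt++ <= 0)
        return false;

    return true;
}
```

Because the counter is a function-local static, the guard needs no extra global state, at the cost of making the function's behaviour depend on how often it has been called.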



[PATCH 14/27] powerpc/eeh: I/O chip EEH enable option

2013-06-15 Thread Gavin Shan
The patch adds the backend to enable or disable EEH functionality
for the specified PE. The backend is also used to enable the MMIO or
DMA path for the problematic PE. Note that all PEs on the PowerNV
platform support EEH functionality by default, and disabling EEH
for a specific PE is not allowed.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   65 -
 1 files changed, 64 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 60ac8fe..b77e90e 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -53,9 +53,72 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
 	return 0;
 }
 
+/**
+ * ioda_eeh_set_option - Set EEH operation or I/O setting
+ * @pe: EEH PE
+ * @option: options
+ *
+ * Enable or disable EEH option for the indicated PE. The
+ * function also can be used to enable I/O or DMA for the
+ * PE.
+ */
+static int ioda_eeh_set_option(struct eeh_pe *pe, int option)
+{
+	s64 ret;
+	u32 pe_no;
+	struct pci_controller *hose = pe->phb;
+	struct pnv_phb *phb = hose->private_data;
+
+	/* Check on PE number */
+	if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+		pr_err("%s: PE address %x out of range [0, %x] "
+		       "on PHB#%x\n",
+			__func__, pe->addr, phb->ioda.total_pe,
+			hose->global_number);
+		return -EINVAL;
+	}
+
+	pe_no = pe->addr;
+	switch (option) {
+	case EEH_OPT_DISABLE:
+		ret = -EEXIST;
+		break;
+	case EEH_OPT_ENABLE:
+		ret = 0;
+		break;
+	case EEH_OPT_THAW_MMIO:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_MMIO);
+		if (ret) {
+			pr_warning("%s: Failed to enable MMIO for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	case EEH_OPT_THAW_DMA:
+		ret = opal_pci_eeh_freeze_clear(phb->opal_id, pe_no,
+				OPAL_EEH_ACTION_CLEAR_FREEZE_DMA);
+		if (ret) {
+			pr_warning("%s: Failed to enable DMA for "
+				   "PHB#%x-PE#%x, err=%lld\n",
+				__func__, hose->global_number, pe_no, ret);
+			return -EIO;
+		}
+
+		break;
+	default:
+		pr_warning("%s: Invalid option %d\n", __func__, option);
+		return -EINVAL;
+	}
+
+	return ret;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
 	.post_init		= ioda_eeh_post_init,
-	.set_option		= NULL,
+	.set_option		= ioda_eeh_set_option,
 	.get_state		= NULL,
 	.reset			= NULL,
 	.get_log		= NULL,
-- 
1.7.5.4
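
The control flow of the set_option backend in patch 14 (range check on the PE number, then a switch on the option) can be modelled without OPAL. In this sketch every name is hypothetical: the DEMO_OPT_* values stand in for the EEH_OPT_* options, and demo_fw_freeze_clear() is a made-up firmware stub that fails for PE 7 purely so the error path can be exercised.

```c
#include <errno.h>

/* Hypothetical stand-ins for the EEH option values. */
enum demo_opt {
    DEMO_OPT_DISABLE,
    DEMO_OPT_ENABLE,
    DEMO_OPT_THAW_MMIO,
    DEMO_OPT_THAW_DMA
};

/* Made-up firmware stub: returns 0 on success, non-zero for PE 7. */
static int demo_fw_freeze_clear(int pe_no, int what)
{
    (void)what;
    return pe_no == 7 ? -1 : 0;
}

int demo_set_option(int pe_addr, int total_pe, int option)
{
    /* Reject PE numbers outside [0, total_pe). */
    if (pe_addr < 0 || pe_addr >= total_pe)
        return -EINVAL;

    switch (option) {
    case DEMO_OPT_DISABLE:
        return -EEXIST;     /* EEH cannot be switched off per PE */
    case DEMO_OPT_ENABLE:
        return 0;           /* already enabled by default */
    case DEMO_OPT_THAW_MMIO:
    case DEMO_OPT_THAW_DMA:
        /* Any firmware failure is reported as an I/O error. */
        return demo_fw_freeze_clear(pe_addr, option) ? -EIO : 0;
    default:
        return -EINVAL;
    }
}
```

Returning an error for the disable request, rather than silently succeeding, makes the "EEH is always on" policy visible to the caller.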



[PATCH 13/27] powerpc/eeh: I/O chip post initialization

2013-06-15 Thread Gavin Shan
The post initialization (struct eeh_ops::post_init) is called after
the EEH probe is done. On the PowerNV platform, the EEH core post
initialization is designed to call the platform layer, which in turn
calls the I/O chip backend.

The patch adds the I/O chip backend that notifies the platform
that the specific PHB is ready to supply EEH service.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   21 -
 1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
index b9564d5..60ac8fe 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -34,8 +34,27 @@
 #include "powernv.h"
 #include "pci.h"
 
+/**
+ * ioda_eeh_post_init - Chip dependent post initialization
+ * @hose: PCI controller
+ *
+ * The function will be called after eeh PEs and devices
+ * have been built. That means the EEH is ready to supply
+ * service with I/O cache.
+ */
+static int ioda_eeh_post_init(struct pci_controller *hose)
+{
+	struct pnv_phb *phb = hose->private_data;
+
+	/* FIXME: Enable it for PHB3 later */
+	if (phb->type == PNV_PHB_IODA1)
+		phb->eeh_enabled = 1;
+
+	return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
-	.post_init		= NULL,
+	.post_init		= ioda_eeh_post_init,
 	.set_option		= NULL,
 	.get_state		= NULL,
 	.reset			= NULL,
-- 
1.7.5.4
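
The per-PHB enabling done by the post_init backend in patch 13 can be sketched as below. This is an assumption-laden model: the demo_* names are hypothetical, the PHB types are reduced to "IODA1" and "other", and the loop over all PHBs is a guess at what the platform layer does before calling the chip hook, not something shown in this patch.

```c
#include <stddef.h>

/* Hypothetical PHB types; only IODA1 gets EEH enabled for now. */
enum demo_phb_type { DEMO_PHB_OTHER, DEMO_PHB_IODA1 };

struct demo_phb {
    enum demo_phb_type type;
    int eeh_enabled;
};

/* Chip hook: mark the PHB as EEH-capable if its type is supported. */
int demo_post_init(struct demo_phb *phb)
{
    if (phb->type == DEMO_PHB_IODA1)
        phb->eeh_enabled = 1;

    return 0;
}

/* Assumed platform step: run the hook on every PHB, stop on first error. */
int demo_post_init_all(struct demo_phb *phbs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = demo_post_init(&phbs[i]);

        if (ret)
            return ret;
    }

    return 0;
}
```

Keeping the flag per PHB means unsupported PHB types simply stay disabled instead of failing the whole initialization.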



[PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval

2013-06-15 Thread Gavin Shan
The patch adds the I/O chip backend to retrieve the state of the
indicated PE. While the PE state is temporarily unavailable,
the upper layer (powernv platform) should return the default delay
(1 second).

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   99 -
 1 files changed, 98 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index b77e90e..7105a4e 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -116,10 +116,107 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int 
option)
return ret;
 }
 
+/**
+ * ioda_eeh_get_state - Retrieve the state of PE
+ * @pe: EEH PE
+ *
+ * The PE's state should be retrieved from the PEEV and PEST
+ * IODA tables. Since OPAL has exported a function to do
+ * that, it's better to use it.
+ */
+static int ioda_eeh_get_state(struct eeh_pe *pe)
+{
+   s64 ret = 0;
+   u8 fstate;
+   u16 pcierr;
+   u32 pe_no;
+   int result;
+   struct pci_controller *hose = pe->phb;
+   struct pnv_phb *phb = hose->private_data;
+
+   /*
+* Sanity check on PE address. The PHB PE address should
+* be zero.
+*/
+   if (pe->addr < 0 || pe->addr >= phb->ioda.total_pe) {
+   pr_err("%s: PE address %x out of range [0, %x] "
+  "on PHB#%x\n",
+   __func__, pe->addr, phb->ioda.total_pe,
+   hose->global_number);
+   return EEH_STATE_NOT_SUPPORT;
+   }
+
+   /* Retrieve PE status through OPAL */
+   pe_no = pe->addr;
+   ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+   &fstate, &pcierr, NULL);
+   if (ret) {
+   pr_err("%s: Failed to get EEH status on "
+  "PHB#%x-PE#%x\n, err=%lld\n",
+   __func__, hose->global_number, pe_no, ret);
+   return EEH_STATE_NOT_SUPPORT;
+   }
+
+   /* Check PHB status */
+   if (pe->type & EEH_PE_PHB) {
+   result = 0;
+   result &= ~EEH_STATE_RESET_ACTIVE;
+
+   if (pcierr != OPAL_EEH_PHB_ERROR) {
+   result |= EEH_STATE_MMIO_ACTIVE;
+   result |= EEH_STATE_DMA_ACTIVE;
+   result |= EEH_STATE_MMIO_ENABLED;
+   result |= EEH_STATE_DMA_ENABLED;
+   }
+
+   return result;
+   }
+
+   /* Parse result out */
+   result = 0;
+   switch (fstate) {
+   case OPAL_EEH_STOPPED_NOT_FROZEN:
+   result &= ~EEH_STATE_RESET_ACTIVE;
+   result |= EEH_STATE_MMIO_ACTIVE;
+   result |= EEH_STATE_DMA_ACTIVE;
+   result |= EEH_STATE_MMIO_ENABLED;
+   result |= EEH_STATE_DMA_ENABLED;
+   break;
+   case OPAL_EEH_STOPPED_MMIO_FREEZE:
+   result &= ~EEH_STATE_RESET_ACTIVE;
+   result |= EEH_STATE_DMA_ACTIVE;
+   result |= EEH_STATE_DMA_ENABLED;
+   break;
+   case OPAL_EEH_STOPPED_DMA_FREEZE:
+   result &= ~EEH_STATE_RESET_ACTIVE;
+   result |= EEH_STATE_MMIO_ACTIVE;
+   result |= EEH_STATE_MMIO_ENABLED;
+   break;
+   case OPAL_EEH_STOPPED_MMIO_DMA_FREEZE:
+   result &= ~EEH_STATE_RESET_ACTIVE;
+   break;
+   case OPAL_EEH_STOPPED_RESET:
+   result |= EEH_STATE_RESET_ACTIVE;
+   break;
+   case OPAL_EEH_STOPPED_TEMP_UNAVAIL:
+   result |= EEH_STATE_UNAVAILABLE;
+   break;
+   case OPAL_EEH_STOPPED_PERM_UNAVAIL:
+   result |= EEH_STATE_NOT_SUPPORT;
+   break;
+   default:
+   pr_warning("%s: Unexpected EEH status 0x%x "
+  "on PHB#%x-PE#%x\n",
+   __func__, fstate, hose->global_number, pe_no);
+   }
+
+   return result;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
.post_init  = ioda_eeh_post_init,
.set_option = ioda_eeh_set_option,
-   .get_state  = NULL,
+   .get_state  = ioda_eeh_get_state,
.reset  = NULL,
.get_log= NULL,
.configure_bridge   = NULL
-- 
1.7.5.4



[PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup

2013-06-15 Thread Gavin Shan
The patch adds backends to retrieve the error log and to configure
the PCI-to-PCI (p2p) bridges for the indicated PE.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   57 -
 1 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index f552e23..95f7d96 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -444,11 +444,64 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
return ret;
 }
 
+/**
+ * ioda_eeh_get_log - Retrieve error log
+ * @pe: EEH PE
+ * @severity: Severity level of the log
+ * @drv_log: buffer to store the log
+ * @len: space of the log buffer
+ *
+ * The function is used to retrieve error log from P7IOC.
+ */
+static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
+   char *drv_log, unsigned long len)
+{
+   s64 ret;
+   unsigned long flags;
+   struct pci_controller *hose = pe->phb;
+   struct pnv_phb *phb = hose->private_data;
+
+   spin_lock_irqsave(&phb->lock, flags);
+
+   ret = opal_pci_get_phb_diag_data2(phb->opal_id,
+   phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
+   if (ret) {
+   spin_unlock_irqrestore(&phb->lock, flags);
+   pr_warning("%s: Failed to retrieve log for PHB#%x-PE#%x\n",
+   __func__, hose->global_number, pe->addr);
+   return -EIO;
+   }
+
+   /*
+* FIXME: We probably need log the error in somewhere.
+* Lets make it up in future.
+*/
+   /* pr_info("%s", phb->diag.blob); */
+
+   spin_unlock_irqrestore(&phb->lock, flags);
+
+   return 0;
+}
+
+/**
+ * ioda_eeh_configure_bridge - Configure the PCI bridges for the indicated PE
+ * @pe: EEH PE
+ *
+ * For particular PE, it might have included PCI bridges. In order
+ * to make the PE work properly, those PCI bridges should be configured
+ * correctly. However, we need do nothing on P7IOC since the reset
+ * function will do everything that should be covered by the function.
+ */
+static int ioda_eeh_configure_bridge(struct eeh_pe *pe)
+{
+   return 0;
+}
+
 struct pnv_eeh_ops ioda_eeh_ops = {
.post_init  = ioda_eeh_post_init,
.set_option = ioda_eeh_set_option,
.get_state  = ioda_eeh_get_state,
.reset  = ioda_eeh_reset,
-   .get_log= NULL,
-   .configure_bridge   = NULL
+   .get_log= ioda_eeh_get_log,
+   .configure_bridge   = ioda_eeh_configure_bridge
 };
-- 
1.7.5.4



[PATCH 16/27] powerpc/eeh: I/O chip PE reset

2013-06-15 Thread Gavin Shan
The patch adds the I/O chip backend to do PE reset. For now, we
focus on PCI bus dependent PEs. If the PHB PE has been put into an
error state, the PHB takes a complete reset. Otherwise, if the
indicated PE sits at the top of the PCI hierarchy, the root bridge
takes a fundamental or hot reset as requested; for any other PE, the
upstream p2p bridge takes a hot reset.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
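Note, not part of the patch: the choice of reset target described in the
commit message can be sketched as a small decision function. This is only
a sketch under stated assumptions: the sk_* names and enums are
hypothetical, and the real backend issues OPAL calls rather than
returning an enum.

```c
#include <stdbool.h>

/* What the caller asked for (hypothetical names). */
enum sk_reset_request { SK_REQ_HOT, SK_REQ_FUNDAMENTAL };

/* What actually gets reset (hypothetical names). */
enum sk_reset_target {
	SK_PHB_COMPLETE,     /* complete reset of the PHB */
	SK_ROOT_FUNDAMENTAL, /* fundamental reset at the root bridge */
	SK_ROOT_HOT,         /* hot reset at the root bridge */
	SK_BRIDGE_HOT        /* hot reset through the upstream p2p bridge */
};

/* Pick the reset target following the three cases in the commit message. */
enum sk_reset_target sk_pick_reset(bool is_phb_pe, bool at_top_of_tree,
				   enum sk_reset_request req)
{
	if (is_phb_pe)
		return SK_PHB_COMPLETE;
	if (at_top_of_tree)
		return req == SK_REQ_FUNDAMENTAL ?
			SK_ROOT_FUNDAMENTAL : SK_ROOT_HOT;
	return SK_BRIDGE_HOT;
}
```

Note that a PE below the top of the hierarchy gets a hot reset even when
a fundamental reset was requested, which matches the wording above.
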
 arch/powerpc/platforms/powernv/eeh-ioda.c |  233 -
 1 files changed, 232 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 7105a4e..f552e23 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -213,11 +213,242 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
return result;
 }
 
+static int ioda_eeh_pe_clear(struct eeh_pe *pe)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   u32 pe_no;
+   u8 fstate;
+   u16 pcierr;
+   s64 ret;
+
+   pe_no = pe->addr;
+   hose = pe->phb;
+   phb = pe->phb->private_data;
+
+   /* Clear the EEH error on the PE */
+   ret = opal_pci_eeh_freeze_clear(phb->opal_id,
+   pe_no, OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+   if (ret) {
+   pr_err("%s: Failed to clear EEH error for "
+  "PHB#%x-PE#%x, err=%lld\n",
+   __func__, hose->global_number, pe_no, ret);
+   return -EIO;
+   }
+
+   /*
+* Read the PE state back and verify that the frozen
+* state has been removed.
+*/
+   ret = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
+   &fstate, &pcierr, NULL);
+   if (ret) {
+   pr_err("%s: Failed to get EEH status on "
+  "PHB#%x-PE#%x\n, err=%lld\n",
+   __func__, hose->global_number, pe_no, ret);
+   return -EIO;
+   }
+   if (fstate != OPAL_EEH_STOPPED_NOT_FROZEN) {
+   pr_err("%s: Frozen state not cleared on "
+  "PHB#%x-PE#%x, sts=%x\n",
+   __func__, hose->global_number, pe_no, fstate);
+   return -EIO;
+   }
+
+   return 0;
+}
+
+static s64 ioda_eeh_phb_poll(struct pnv_phb *phb)
+{
+   s64 rc = OPAL_HARDWARE;
+
+   while (1) {
+   rc = opal_pci_poll(phb->opal_id);
+   if (rc <= 0)
+   break;
+
+   msleep(rc);
+   }
+
+   return rc;
+}
+
+static int ioda_eeh_phb_reset(struct pci_controller *hose, int option)
+{
+   struct pnv_phb *phb = hose->private_data;
+   s64 rc = OPAL_HARDWARE;
+
+   pr_debug("%s: Reset PHB#%x, option=%d\n",
+   __func__, hose->global_number, option);
+
+   /* Issue PHB complete reset request */
+   if (option == EEH_RESET_FUNDAMENTAL ||
+   option == EEH_RESET_HOT)
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_PHB_COMPLETE,
+   OPAL_ASSERT_RESET);
+   else if (option == EEH_RESET_DEACTIVATE)
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_PHB_COMPLETE,
+   OPAL_DEASSERT_RESET);
+   if (rc < 0)
+   goto out;
+
+   /*
+* Poll state of the PHB until the request is done
+* successfully.
+*/
+   rc = ioda_eeh_phb_poll(phb);
+out:
+   if (rc != OPAL_SUCCESS)
+   return -EIO;
+
+   return 0;
+}
+
+static int ioda_eeh_root_reset(struct pci_controller *hose, int option)
+{
+   struct pnv_phb *phb = hose->private_data;
+   s64 rc = OPAL_SUCCESS;
+
+   pr_debug("%s: Reset PHB#%x, option=%d\n",
+   __func__, hose->global_number, option);
+
+   /*
+* During the reset deassert time, we needn't care
+* the reset scope because the firmware does nothing
+* for fundamental or hot reset during deassert phase.
+*/
+   if (option == EEH_RESET_FUNDAMENTAL)
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_PCI_FUNDAMENTAL_RESET,
+   OPAL_ASSERT_RESET);
+   else if (option == EEH_RESET_HOT)
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_PCI_HOT_RESET,
+   OPAL_ASSERT_RESET);
+   else if (option == EEH_RESET_DEACTIVATE)
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_PCI_HOT_RESET,
+   OPAL_DEASSERT_RESET);
+   if (rc < 0)
+   goto out;
+
+   /* Poll state of the PHB until the request is done */
+   rc = ioda_eeh_phb_poll(phb);
+out:
+   if (rc != OPAL_SUCCESS)
+   return -EIO;
+
+   return 0;
+}
+
+static int ioda_eeh_bridge_reset(struct pci_controller *hose,
+   

[PATCH 20/27] powerpc/eeh: Enable EEH check for config access

2013-06-15 Thread Gavin Shan
The patch enables the EEH check and lets the EEH core process EEH
errors on the PowerNV platform while accessing config space. The
original implementation already had a mechanism to check for EEH
errors and tried to recover from them. However, it never let the EEH
core handle those errors.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
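Note, not part of the patch: the read path only consults the EEH core
when the value read back equals EEH_IO_ERROR_VALUE(size), that is, when
every bit of the access is set. The sketch below assumes that the macro
yields an all-ones value of the access width; sk_io_error_value() and
sk_looks_like_eeh_error() are hypothetical names, and the kernel's own
definition should be checked before relying on this.

```c
#include <stdint.h>

/*
 * All-ones value for an access of `size` bytes.
 * `size` must be 1, 2 or 4; other values are not handled in this sketch.
 */
uint32_t sk_io_error_value(unsigned int size)
{
	return ~0U >> ((4 - size) * 8);
}

/* A config read that returns all ones is a hint, not proof, of a frozen PE. */
int sk_looks_like_eeh_error(uint32_t val, unsigned int size)
{
	return val == sk_io_error_value(size);
}
```

An all-ones read can also be legitimate data, which is why the patch
follows up with eeh_dev_check_failure() instead of failing immediately.
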
 arch/powerpc/platforms/powernv/pci.c |   40 +-
 1 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 20af220..6d9a506 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -32,6 +32,8 @@
 #include <asm/iommu.h>
 #include <asm/tce.h>
 #include <asm/firmware.h>
+#include <asm/eeh_event.h>
+#include <asm/eeh.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -259,6 +261,10 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 {
struct pci_controller *hose = pci_bus_to_host(bus);
struct pnv_phb *phb = hose->private_data;
+#ifdef CONFIG_EEH
+   struct device_node *busdn, *dn;
+   struct eeh_pe *phb_pe = NULL;
+#endif
u32 bdfn = (((uint64_t)bus->number) << 8) | devfn;
s64 rc;
 
@@ -291,8 +297,34 @@ static int pnv_pci_read_config(struct pci_bus *bus,
cfg_dbg("pnv_pci_read_config bus: %x devfn: %x +%x/%x -> %08x\n",
bus->number, devfn, where, size, *val);
 
-   /* Check if the PHB got frozen due to an error (no response) */
+   /*
+* Check if the specified PE has been put into frozen
+* state. On the other hand, we needn't do that while
+* the PHB has been put into frozen state because of
+* PHB-fatal errors.
+*/
+#ifdef CONFIG_EEH
+   phb_pe = eeh_phb_pe_get(hose);
+   if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
+   return PCIBIOS_SUCCESSFUL;
+
+   if (phb->eeh_enabled) {
+   if (*val == EEH_IO_ERROR_VALUE(size)) {
+   busdn = pci_bus_to_OF_node(bus);
+   for (dn = busdn->child; dn; dn = dn->sibling) {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (pdn && pdn->devfn == devfn &&
+   eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
+   return PCIBIOS_DEVICE_NOT_FOUND;
+   }
+   }
+   } else {
+   pnv_pci_config_check_eeh(phb, bus, bdfn);
+   }
+#else
pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
return PCIBIOS_SUCCESSFUL;
 }
@@ -323,8 +355,14 @@ static int pnv_pci_write_config(struct pci_bus *bus,
default:
return PCIBIOS_FUNC_NOT_SUPPORTED;
}
+
/* Check if the PHB got frozen due to an error (no response) */
+#ifdef CONFIG_EEH
+   if (!phb->eeh_enabled)
+   pnv_pci_config_check_eeh(phb, bus, bdfn);
+#else
pnv_pci_config_check_eeh(phb, bus, bdfn);
+#endif
 
return PCIBIOS_SUCCESSFUL;
 }
-- 
1.7.5.4



[PATCH 19/27] powerpc/eeh: Initialization for PowerNV

2013-06-15 Thread Gavin Shan
The patch initializes EEH for the PowerNV platform. Because the OPAL
APIs require the hub ID, we need to track it through struct pnv_phb.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/pci-ioda.c   |   16 +---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |6 --
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9c9d15e..48b0940 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -973,6 +973,11 @@ static void pnv_pci_ioda_fixup(void)
pnv_pci_ioda_setup_PEs();
pnv_pci_ioda_setup_seg();
pnv_pci_ioda_setup_DMA();
+
+#ifdef CONFIG_EEH
+   eeh_addr_cache_build();
+   eeh_init();
+#endif
 }
 
 /*
@@ -1049,7 +1054,8 @@ static void pnv_pci_ioda_shutdown(struct pnv_phb *phb)
   OPAL_ASSERT_RESET);
 }
 
-void __init pnv_pci_init_ioda_phb(struct device_node *np, int ioda_type)
+void __init pnv_pci_init_ioda_phb(struct device_node *np,
+ u64 hub_id, int ioda_type)
 {
struct pci_controller *hose;
static int primary = 1;
@@ -1087,6 +1093,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, 
int ioda_type)
hose->first_busno = 0;
hose->last_busno = 0xff;
hose->private_data = phb;
+   phb->hub_id = hub_id;
phb->opal_id = phb_id;
phb->type = ioda_type;
 
@@ -1172,6 +1179,9 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, 
int ioda_type)
phb->ioda.io_size, phb->ioda.io_segsize);
 
phb->hose->ops = &pnv_pci_ops;
+#ifdef CONFIG_EEH
+   phb->eeh_ops = &ioda_eeh_ops;
+#endif
 
/* Setup RID -> PE mapping function */
phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
@@ -1212,7 +1222,7 @@ void __init pnv_pci_init_ioda_phb(struct device_node *np, 
int ioda_type)
 
 void pnv_pci_init_ioda2_phb(struct device_node *np)
 {
-   pnv_pci_init_ioda_phb(np, PNV_PHB_IODA2);
+   pnv_pci_init_ioda_phb(np, 0, PNV_PHB_IODA2);
 }
 
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
@@ -1235,6 +1245,6 @@ void __init pnv_pci_init_ioda_hub(struct device_node *np)
for_each_child_of_node(np, phbn) {
/* Look for IODA1 PHBs */
if (of_device_is_compatible(phbn, "ibm,ioda-phb"))
-   pnv_pci_init_ioda_phb(phbn, PNV_PHB_IODA1);
+   pnv_pci_init_ioda_phb(phbn, hub_id, PNV_PHB_IODA1);
}
 }
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c 
b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index 92b37a0..ae72616 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,7 +92,7 @@ static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
 }
 
-static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np,
+static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
   void *tce_mem, u64 tce_size)
 {
struct pnv_phb *phb;
@@ -133,6 +133,7 @@ static void __init pnv_pci_init_p5ioc2_phb(struct 
device_node *np,
phb->hose->first_busno = 0;
phb->hose->last_busno = 0xff;
phb->hose->private_data = phb;
+   phb->hub_id = hub_id;
phb->opal_id = phb_id;
phb->type = PNV_PHB_P5IOC2;
phb->model = PNV_PHB_MODEL_P5IOC2;
@@ -226,7 +227,8 @@ void __init pnv_pci_init_p5ioc2_hub(struct device_node *np)
for_each_child_of_node(np, phbn) {
if (of_device_is_compatible(phbn, "ibm,p5ioc2-pcix") ||
of_device_is_compatible(phbn, "ibm,p5ioc2-pciex")) {
-   pnv_pci_init_p5ioc2_phb(phbn, tce_mem, tce_per_phb);
+   pnv_pci_init_p5ioc2_phb(phbn, hub_id,
+   tce_mem, tce_per_phb);
tce_mem += tce_per_phb;
}
}
-- 
1.7.5.4



[PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH

2013-06-15 Thread Gavin Shan
On the PowerNV platform, an EEH event is produced either by detecting
an error while accessing config or I/O registers, or by interrupts
dedicated to EEH reporting. The patch adds support for processing those
dedicated interrupts.

First, the kernel thread is woken up to process the incoming
interrupt. The PHBs are scanned one by one to process all
existing EEH errors. Multiple kinds of EEH errors can be reported
through interrupts, and each is handled differently:

- If the IOC is dead, all PCI buses under all PHBs will be removed
  from the system.
- If the PHB is dead, all PCI buses under the PHB will be removed
  from the system.
- If the PHB is fenced, an EEH event will be sent to the EEH core and
  the fenced PHB is expected to be reset completely.
- If a specific PE has been put into the frozen state, an EEH event
  will be sent to the EEH core so that the PE will be reset.
- If the error is informational, we just output the related
  registers for debugging purposes and take no further action.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
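Note, not part of the patch: the five cases listed in the commit message
amount to a dispatch from error class to action. The sketch below is a
minimal illustration; the enum names are hypothetical and do not come
from pci-err.c, whose real handlers do considerably more work.

```c
/* Error classes as described in the commit message (hypothetical names). */
enum sk_err_class {
	SK_ERR_IOC_DEAD,
	SK_ERR_PHB_DEAD,
	SK_ERR_PHB_FENCED,
	SK_ERR_PE_FROZEN,
	SK_ERR_INFORMATIONAL
};

/* Actions taken in response (hypothetical names). */
enum sk_err_action {
	SK_ACT_REMOVE_ALL_BUSES, /* every PCI bus under every PHB */
	SK_ACT_REMOVE_PHB_BUSES, /* every PCI bus under the affected PHB */
	SK_ACT_RESET_PHB,        /* EEH event, complete PHB reset */
	SK_ACT_RESET_PE,         /* EEH event, PE reset */
	SK_ACT_LOG_ONLY          /* dump registers, nothing else */
};

/* Map an error class to the action the commit message assigns to it. */
enum sk_err_action sk_action_for(enum sk_err_class cls)
{
	switch (cls) {
	case SK_ERR_IOC_DEAD:
		return SK_ACT_REMOVE_ALL_BUSES;
	case SK_ERR_PHB_DEAD:
		return SK_ACT_REMOVE_PHB_BUSES;
	case SK_ERR_PHB_FENCED:
		return SK_ACT_RESET_PHB;
	case SK_ERR_PE_FROZEN:
		return SK_ACT_RESET_PE;
	case SK_ERR_INFORMATIONAL:
		return SK_ACT_LOG_ONLY;
	}
	return SK_ACT_LOG_ONLY; /* unknown class: take no destructive action */
}
```
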
 arch/powerpc/include/asm/eeh.h   |1 +
 arch/powerpc/kernel/eeh_driver.c |   10 +
 arch/powerpc/platforms/powernv/Makefile  |2 +-
 arch/powerpc/platforms/powernv/pci-err.c |  519 ++
 arch/powerpc/platforms/powernv/pci.h |1 +
 5 files changed, 532 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/pci-err.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7ebf522..b52d8d7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -52,6 +52,7 @@ struct device_node;
 
 #define EEH_PE_ISOLATED    (1 << 0)    /* Isolated PE      */
 #define EEH_PE_RECOVERING  (1 << 1)    /* Recovering PE    */
+#define EEH_PE_PHB_DEAD    (1 << 2)    /* Dead PHB         */
 
 struct eeh_pe {
int type;   /* PE type: PHB/Bus/Device  */
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 0acc5a2..c7e13b0 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -439,6 +439,15 @@ void eeh_handle_event(struct eeh_pe *pe)
 */
eeh_pe_dev_traverse(pe, eeh_report_error, &result);
 
+   /*
+* On PowerNV platform, the PHB might have been dead. We need
+* remove all subordinate PCI buses under the dead PHB.
+*/
+   if (eeh_probe_mode_dev() &&
+   (pe->type & EEH_PE_PHB) &&
+   (pe->state & EEH_PE_PHB_DEAD))
+   goto remove_bus;
+
/* Get the current PCI slot state. This can take a long time,
 * sometimes over 3 seconds for certain systems.
 */
@@ -542,6 +551,7 @@ hard_fail:
 perm_error:
eeh_slot_error_detail(pe, EEH_LOG_PERM);
 
+remove_bus:
/* Notify all devices that they're about to go down. */
eeh_pe_dev_traverse(pe, eeh_report_failure, NULL);
 
diff --git a/arch/powerpc/platforms/powernv/Makefile 
b/arch/powerpc/platforms/powernv/Makefile
index 7fe5951..912fa7c 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y   += opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)  += eeh-ioda.o eeh-powernv.o
+obj-$(CONFIG_EEH)  += pci-err.o eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/pci-err.c 
b/arch/powerpc/platforms/powernv/pci-err.c
new file mode 100644
index 000..e54135b
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -0,0 +1,519 @@
+/*
+ * The file intends to handle those interrupts dedicated for error
+ * detection from IOC chips. Currently, we only support P7IOC and
+ * need support more IOC chips in the future. The interrupts have
+ * been exported to hypervisor through opal-interrupts of ibm,opal
+ * OF node. When one of them comes in, the hypervisor simply turns
+ * to the firmware and expects the appropriate events returned. In
+ * turn, we will format one message and queue that in order to process
+ * it at later point.
+ *
+ * On the other hand, we need maintain information about the states
+ * of IO HUBs and their associated PHBs. The information would be
+ * shared by hypervisor and guests in future. While hypervisor or guests
+ * accessing IO HUBs, PHBs and PEs, the state should be checked and
+ * return appropriate results. That would benefit EEH RTAS emulation in
+ * hypervisor as well.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 

[PATCH 23/27] powernv/opal: Notifier for OPAL events

2013-06-15 Thread Gavin Shan
This patch implements a notifier to receive a notification on OPAL
event mask changes. The notifier is only called as a result of an OPAL
interrupt, which will happen upon reception of FSP messages or PCI errors.
Any event mask change detected as a result of opal_poll_events() will not
result in a notifier call.

[benh: changelog]
Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
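Note, not part of the patch: the dispatch rule used by the notifier
(call a callback only when one of the bits it registered for has changed
since the previous dispatch) can be tried out in user space. This is a
minimal sketch under stated assumptions: it uses a fixed array instead
of the kernel's linked list, has no locking, and every sk_* name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_CALLBACKS 8

struct sk_callback {
	uint64_t mask;        /* event bits this callback cares about */
	void (*cb)(uint64_t); /* invoked with the full event word */
};

static struct sk_callback sk_callbacks[SK_MAX_CALLBACKS];
static size_t sk_count;
static uint64_t sk_last_mask; /* event word seen by the previous dispatch */

/* Register a callback; returns 0 on success, -1 on bad input or no room. */
int sk_notifier_register(uint64_t mask, void (*cb)(uint64_t))
{
	if (!mask || !cb || sk_count == SK_MAX_CALLBACKS)
		return -1;
	sk_callbacks[sk_count].mask = mask;
	sk_callbacks[sk_count].cb = cb;
	sk_count++;
	return 0;
}

/* Call every callback whose mask overlaps the bits that changed. */
void sk_notifier_dispatch(uint64_t events)
{
	uint64_t changed = sk_last_mask ^ events;
	size_t i;

	sk_last_mask = events;
	for (i = 0; i < sk_count; i++)
		if (changed & sk_callbacks[i].mask)
			sk_callbacks[i].cb(events);
}

/* Demo callback that only counts how often it was invoked. */
int sk_demo_calls;
void sk_demo_cb(uint64_t events)
{
	(void)events;
	sk_demo_calls++;
}
```

Because the comparison uses XOR, a callback fires both when one of its
bits becomes set and when it is cleared again, but not when the event
word is unchanged between two dispatches.
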
 arch/powerpc/include/asm/opal.h   |4 ++
 arch/powerpc/platforms/powernv/opal.c |   74 -
 2 files changed, 77 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 2880797..c5803c0 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -644,6 +644,10 @@ extern void hvc_opal_init_early(void);
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
 
+extern int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t));
+extern void opal_notifier_disable(void);
+extern void opal_notifier_enable(void);
+
 extern int opal_get_chars(uint32_t vtermno, char *buf, int count);
 extern int opal_put_chars(uint32_t vtermno, const char *buf, int total_len);
 
diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 628c564..9e4c9e9 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -26,11 +26,21 @@ struct opal {
u64 entry;
 } opal;
 
+struct opal_cb {
+   struct list_head list;
+   uint64_t mask;
+   void (*cb)(uint64_t);
+};
+
 static struct device_node *opal_node;
 static DEFINE_SPINLOCK(opal_write_lock);
 extern u64 opal_mc_secondary_handler[];
 static unsigned int *opal_irqs;
 static unsigned int opal_irq_count;
+static LIST_HEAD(opal_notifier);
+static DEFINE_SPINLOCK(opal_notifier_lock);
+static uint64_t last_notified_mask = 0x0ul;
+static atomic_t opal_notifier_hold = ATOMIC_INIT(0);
 
 int __init early_init_dt_scan_opal(unsigned long node,
   const char *uname, int depth, void *data)
@@ -95,6 +105,68 @@ static int __init opal_register_exception_handlers(void)
 
 early_initcall(opal_register_exception_handlers);
 
+int opal_notifier_register(uint64_t mask, void (*cb)(uint64_t))
+{
+   unsigned long flags;
+   struct opal_cb *p;
+
+   if (!mask || !cb) {
+   pr_warning("%s: Invalid argument (%llx, %p)!\n",
+   __func__, mask, cb);
+   return -EINVAL;
+   }
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p) {
+   pr_warning("%s: Out of memory (%llx, %p)!\n",
+   __func__, mask, cb);
+   return -ENOMEM;
+   }
+   p->mask = mask;
+   p->cb   = cb;
+
+   spin_lock_irqsave(&opal_notifier_lock, flags);
+   list_add_tail(&p->list, &opal_notifier);
+   spin_unlock_irqrestore(&opal_notifier_lock, flags);
+
+   return 0;
+}
+
+static void opal_do_notifier(uint64_t events)
+{
+   struct opal_cb *p;
+   uint64_t changed_mask;
+
+   if (atomic_read(&opal_notifier_hold))
+   return;
+
+   changed_mask = last_notified_mask ^ events;
+   last_notified_mask = events;
+
+   list_for_each_entry(p, &opal_notifier, list) {
+   if (changed_mask & p->mask)
+   p->cb(events);
+   }
+}
+
+void opal_notifier_disable(void)
+{
+   atomic_set(&opal_notifier_hold, 1);
+}
+
+void opal_notifier_enable(void)
+{
+   int64_t rc;
+   uint64_t evt = 0;
+
+   atomic_set(&opal_notifier_hold, 0);
+
+   /* Process pending events */
+   rc = opal_poll_events(&evt);
+   if (rc == OPAL_SUCCESS && evt)
+   opal_do_notifier(evt);
+}
+
 int opal_get_chars(uint32_t vtermno, char *buf, int count)
 {
s64 len, rc;
@@ -297,7 +369,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
 
opal_handle_interrupt(virq_to_hw(irq), &events);
 
-   /* XXX TODO: Do something with the events */
+   opal_do_notifier(events);
 
return IRQ_HANDLED;
 }
-- 
1.7.5.4



[PATCH 26/27] powerpc/powernv: Debugfs directory for PHB

2013-06-15 Thread Gavin Shan
The patch creates one debugfs directory (powerpc/PCI) for
each PHB so that we can hook the EEH error injection debugfs entry
there in a subsequent patch.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   22 ++
 arch/powerpc/platforms/powernv/pci.h  |4 
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 48b0940..0d9d302 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -13,6 +13,7 @@
 
 #include <linux/kernel.h>
 #include <linux/pci.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/string.h>
 #include <linux/init.h>
@@ -968,12 +969,33 @@ static void pnv_pci_ioda_setup_DMA(void)
}
 }
 
+static void pnv_pci_ioda_create_dbgfs(void)
+{
+#ifdef CONFIG_DEBUG_FS
+   struct pci_controller *hose, *tmp;
+   struct pnv_phb *phb;
+   char name[16];
+
+   list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
+   phb = hose->private_data;
+
+   sprintf(name, "PCI%04x", hose->global_number);
+   phb->dbgfs = debugfs_create_dir(name, powerpc_debugfs_root);
+   if (!phb->dbgfs)
+   pr_warning("%s: Error on creating debugfs on PHB#%x\n",
+   __func__, hose->global_number);
+   }
+#endif /* CONFIG_DEBUG_FS */
+}
+
 static void pnv_pci_ioda_fixup(void)
 {
pnv_pci_ioda_setup_PEs();
pnv_pci_ioda_setup_seg();
pnv_pci_ioda_setup_DMA();
 
+   pnv_pci_ioda_create_dbgfs();
+
 #ifdef CONFIG_EEH
eeh_addr_cache_build();
eeh_init();
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 08d53b0..d3d67e1 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -95,6 +95,10 @@ struct pnv_phb {
int removed;
 #endif
 
+#ifdef CONFIG_DEBUG_FS
+   struct dentry   *dbgfs;
+#endif
+
 #ifdef CONFIG_PCI_MSI
unsigned intmsi_base;
unsigned intmsi32_support;
-- 
1.7.5.4



[PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff

2013-06-15 Thread Gavin Shan
While restarting or powering off the system, we no longer need
the OPAL notifier, so just disable it.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/setup.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index d4459bf..84438af 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -93,6 +93,8 @@ static void  __noreturn pnv_restart(char *cmd)
 {
long rc = OPAL_BUSY;
 
+   opal_notifier_disable();
+
while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
rc = opal_cec_reboot();
if (rc == OPAL_BUSY_EVENT)
@@ -108,6 +110,8 @@ static void __noreturn pnv_power_off(void)
 {
long rc = OPAL_BUSY;
 
+   opal_notifier_disable();
+
while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
rc = opal_cec_power_down(0);
if (rc == OPAL_BUSY_EVENT)
-- 
1.7.5.4



[PATCH 27/27] powerpc/eeh: Debugfs for error injection

2013-06-15 Thread Gavin Shan
The patch creates debugfs entries (powerpc/PCI/err_injct) for
injecting EEH errors for testing purposes.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   33 -
 1 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 95f7d96..ff7a504 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/bootmem.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/io.h>
@@ -34,6 +35,29 @@
 #include "powernv.h"
 #include "pci.h"
 
+#ifdef CONFIG_DEBUG_FS
+static int ioda_eeh_dbgfs_set(void *data, u64 val)
+{
+   struct pci_controller *hose = data;
+   struct pnv_phb *phb = hose->private_data;
+
+   out_be64(phb->regs + 0xD10, val);
+   return 0;
+}
+
+static int ioda_eeh_dbgfs_get(void *data, u64 *val)
+{
+   struct pci_controller *hose = data;
+   struct pnv_phb *phb = hose->private_data;
+
+   *val = in_be64(phb->regs + 0xD10);
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_dbgfs_ops, ioda_eeh_dbgfs_get,
+   ioda_eeh_dbgfs_set, "0x%llx\n");
+#endif /* CONFIG_DEBUG_FS */
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -47,8 +71,15 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
struct pnv_phb *phb = hose->private_data;
 
/* FIXME: Enable it for PHB3 later */
-   if (phb->type == PNV_PHB_IODA1)
+   if (phb->type == PNV_PHB_IODA1) {
+#ifdef CONFIG_DEBUG_FS
+   if (phb->dbgfs)
+   debugfs_create_file("err_injct", 0600,
+   phb->dbgfs, hose, &ioda_eeh_dbgfs_ops);
+#endif
+
phb->eeh_enabled = 1;
+   }
 
return 0;
 }
-- 
1.7.5.4



[PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once()

2013-06-15 Thread Gavin Shan
We shouldn't check that the returned PE status is exactly equal to
(EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) but instead only check
that they are both set.

[benh: changelog]
Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
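Note, not part of the patch: the difference between the old exact
comparison and the new masked comparison is easy to see with a state
word that carries extra bits. This is a minimal sketch; the SK_STATE_*
bit values and function names are assumptions made for illustration and
are not the kernel's EEH_STATE_* definitions.

```c
/* Illustrative bit values only. */
enum {
	SK_STATE_MMIO_ACTIVE  = 1 << 0,
	SK_STATE_DMA_ACTIVE   = 1 << 1,
	SK_STATE_MMIO_ENABLED = 1 << 2,
	SK_STATE_DMA_ENABLED  = 1 << 3
};

/* Old check: succeeds only if exactly the two ACTIVE bits are set. */
int sk_reset_ok_exact(int rc)
{
	return rc == (SK_STATE_MMIO_ACTIVE | SK_STATE_DMA_ACTIVE);
}

/* New check: succeeds if both ACTIVE bits are set, whatever else is set. */
int sk_reset_ok_masked(int rc)
{
	int flags = SK_STATE_MMIO_ACTIVE | SK_STATE_DMA_ACTIVE;

	return (rc & flags) == flags;
}
```

A backend that also reports the ENABLED bits would fail the exact check
after a perfectly good reset, while the masked check accepts it.
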
 arch/powerpc/kernel/eeh.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index a29cf47..cda0b62 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -565,6 +565,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
  */
 int eeh_reset_pe(struct eeh_pe *pe)
 {
+   int flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
int i, rc;
 
/* Take three shots at resetting the bus */
@@ -572,7 +573,7 @@ int eeh_reset_pe(struct eeh_pe *pe)
eeh_reset_pe_once(pe);
 
rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-   if (rc == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
+   if ((rc & flags) == flags)
return 0;
 
if (rc < 0) {
-- 
1.7.5.4



[PATCH 05/27] powerpc/eeh: Trace PCI bus from PE

2013-06-15 Thread Gavin Shan
Several types of PE are supported for now: PHB, bus dependent, and
device dependent PEs. For a PCI bus dependent PE, tracing the
corresponding PCI bus from the PE (struct eeh_pe) makes the code
more efficient. The patch also enables retrieval of the PCI bus based
on the PCI bus dependent PE.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
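Note, not part of the patch: the lookup order that eeh_pe_bus_get() ends
up with (prefer the bus pointer cached in the PE, fall back to the first
device only when nothing is cached) can be sketched with toy types. All
sk_* structures and names below are hypothetical simplifications; the
real code walks a list of EEH devices under a lock.

```c
#include <stddef.h>

struct sk_bus { int number; };
struct sk_dev { struct sk_bus *bus; };

struct sk_pe {
	struct sk_bus *bus;       /* cached top bus, may be NULL */
	struct sk_dev *first_dev; /* first device in the PE, may be NULL */
};

/* Return the PE's bus, preferring the cached pointer. */
struct sk_bus *sk_pe_bus_get(const struct sk_pe *pe)
{
	if (pe->bus)
		return pe->bus;
	if (pe->first_dev)
		return pe->first_dev->bus;
	return NULL;
}
```

The cached pointer matters during recovery, when the devices of the PE
may already have been removed and the fallback path has nothing to
follow.
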
 arch/powerpc/include/asm/eeh.h |1 +
 arch/powerpc/kernel/eeh_pe.c   |   22 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index acdfcaa..f3b49d6 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -59,6 +59,7 @@ struct eeh_pe {
int config_addr;/* Traditional PCI address  */
int addr;   /* PE configuration address */
struct pci_controller *phb; /* Associated PHB   */
+   struct pci_bus *bus;/* Top PCI bus for bus PE   */
int check_count;/* Times of ignored error   */
int freeze_count;   /* Times of froze up*/
int false_positives;/* Times of reported #ff's  */
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 3d2dcf5..5bd1637 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -304,6 +304,7 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 int eeh_add_to_parent_pe(struct eeh_dev *edev)
 {
struct eeh_pe *pe, *parent;
+   struct eeh_dev *first_edev;
 
eeh_lock();
 
@@ -326,6 +327,21 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
pe->type = EEH_PE_BUS;
edev->pe = pe;
 
+   /*
+* For a PCI bus sensitive PE, we can reset the parent
+* bridge to do a hot reset. However, the PCI devices,
+* including the associated EEH devices, might be
+* removed while the EEH core is doing recovery, so it
+* isn't safe to retrieve the bridge through a downstream
+* EEH device. We have to trace the parent PCI bus, then
+* the parent bridge, explicitly.
+*/
+   if (eeh_probe_mode_dev() && !pe->bus) {
+   first_edev = list_first_entry(&pe->edevs,
+ struct eeh_dev, list);
+   pe->bus = eeh_dev_to_pci_dev(first_edev)->bus;
+   }
+
/* Put the edev to PE */
list_add_tail(&edev->list, &pe->edevs);
eeh_unlock();
@@ -641,12 +657,18 @@ struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe)
bus = pe->phb->bus;
} else if (pe->type & EEH_PE_BUS ||
   pe->type & EEH_PE_DEVICE) {
+   if (pe->bus) {
+   bus = pe->bus;
+   goto out;
+   }
+
edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
pdev = eeh_dev_to_pci_dev(edev);
if (pdev)
bus = pdev->bus;
}
 
+out:
eeh_unlock();
 
return bus;
-- 
1.7.5.4



[PATCH 10/27] powerpc/eeh: Export confirm_error_lock

2013-06-15 Thread Gavin Shan
An EEH event is created and queued to the event queue for each
incoming EEH error. When there are multiple EEH errors, we need to
serialize the processing to keep the PE state (flags) consistent. The
spinlock confirm_error_lock was introduced for that purpose. We'll
inject EEH events upon error reporting interrupts on the PowerNV
platform, so we export the spinlock for that code to use.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |   11 +++
 arch/powerpc/kernel/eeh.c  |   10 --
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index beec788..7ebf522 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -148,6 +148,7 @@ struct eeh_ops {
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
 extern struct mutex eeh_mutex;
+extern raw_spinlock_t confirm_error_lock;
 extern int eeh_probe_mode;
 
 #define EEH_PROBE_MODE_DEV (1<<0)  /* From PCI device  */
@@ -178,6 +179,16 @@ static inline void eeh_unlock(void)
mutex_unlock(&eeh_mutex);
 }
 
+static inline void eeh_serialize_lock(unsigned long *flags)
+{
+   raw_spin_lock_irqsave(&confirm_error_lock, *flags);
+}
+
+static inline void eeh_serialize_unlock(unsigned long flags)
+{
+   raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+}
+
 /*
  * Max number of EEH freezes allowed before we consider the device
  * to be permanently disabled.
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 7d169d3..f7cbeae 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -107,7 +107,7 @@ int eeh_probe_mode;
 DEFINE_MUTEX(eeh_mutex);
 
 /* Lock to avoid races due to multiple reports of an error */
-static DEFINE_RAW_SPINLOCK(confirm_error_lock);
+DEFINE_RAW_SPINLOCK(confirm_error_lock);
 
 /* Buffer for reporting pci register dumps. Its here in BSS, and
  * not dynamically alloced, so that it ends up in RMO where RTAS
@@ -325,7 +325,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * in one slot might report errors simultaneously, and we
 * only want one error recovery routine running.
 */
-   raw_spin_lock_irqsave(&confirm_error_lock, flags);
+   eeh_serialize_lock(&flags);
rc = 1;
if (pe->state & EEH_PE_ISOLATED) {
pe->check_count++;
@@ -374,7 +374,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * bridges.
 */
eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
-   raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+   eeh_serialize_unlock(flags);
 
eeh_send_failure_event(pe);
 
@@ -386,7 +386,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
return 1;
 
 dn_unlock:
-   raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
+   eeh_serialize_unlock(flags);
return rc;
 }
 
@@ -702,8 +702,6 @@ int __init eeh_init(void)
return ret;
}
 
-   raw_spin_lock_init(&confirm_error_lock);
-
/* Enable EEH for all adapters */
if (eeh_probe_mode_devtree()) {
list_for_each_entry_safe(hose, tmp,
-- 
1.7.5.4



[PATCH 18/27] powerpc/eeh: PowerNV EEH backends

2013-06-15 Thread Gavin Shan
The patch adds EEH backends for the PowerNV platform. Note that some
of those EEH backends call into the I/O chip dependent backends.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/Makefile  |2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  396 ++
 2 files changed, 397 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/eeh-powernv.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 09bd0cb..7fe5951 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -3,4 +3,4 @@ obj-y   += opal-rtc.o opal-nvram.o
 
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PCI)  += pci.o pci-p5ioc2.o pci-ioda.o
-obj-$(CONFIG_EEH)  += eeh-ioda.o
+obj-$(CONFIG_EEH)  += eeh-ioda.o eeh-powernv.o
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
new file mode 100644
index 000..decb317
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -0,0 +1,396 @@
+/*
+ * The file intends to implement the platform dependent EEH operations on
+ * powernv platform. Actually, the powernv was created in order to fully
+ * hypervisor support.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2013.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/atomic.h>
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/msi.h>
+#include <linux/of.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/spinlock.h>
+
+#include <asm/eeh.h>
+#include <asm/eeh_event.h>
+#include <asm/firmware.h>
+#include <asm/io.h>
+#include <asm/iommu.h>
+#include <asm/machdep.h>
+#include <asm/msi_bitmap.h>
+#include <asm/opal.h>
+#include <asm/ppc-pci.h>
+
+#include "powernv.h"
+#include "pci.h"
+
+/**
+ * powernv_eeh_init - EEH platform dependent initialization
+ *
+ * EEH platform dependent initialization on powernv
+ */
+static int powernv_eeh_init(void)
+{
+   /* We require OPALv3 */
+   if (!firmware_has_feature(FW_FEATURE_OPALv3)) {
+   pr_warning("%s: OPALv3 is required !\n", __func__);
+   return -EINVAL;
+   }
+
+   /* Set EEH probe mode */
+   eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
+
+   return 0;
+}
+
+/**
+ * powernv_eeh_post_init - EEH platform dependent post initialization
+ *
+ * EEH platform dependent post initialization on powernv. When
+ * the function is called, the EEH PEs and devices should have
+ * been built. If the I/O cache stuff has been built, EEH is
+ * ready to supply service.
+ */
+static int powernv_eeh_post_init(void)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   int ret = 0;
+
+   list_for_each_entry(hose, &hose_list, list_node) {
+   phb = hose->private_data;
+
+   if (phb->eeh_ops && phb->eeh_ops->post_init) {
+   ret = phb->eeh_ops->post_init(hose);
+   if (ret)
+   break;
+   }
+   }
+
+   return ret;
+}
+
+/**
+ * powernv_eeh_dev_probe - Do probe on PCI device
+ * @dev: PCI device
+ * @flag: unused
+ *
+ * When EEH module is installed during system boot, all PCI devices
+ * are checked one by one to see if it supports EEH. The function
+ * is introduced for the purpose. By default, EEH has been enabled
+ * on all PCI devices. That's to say, we only need do necessary
+ * initialization on the corresponding eeh device and create PE
+ * accordingly.
+ *
+ * Note that it's unsafe to retrieve the EEH device through
+ * the corresponding PCI device. During the PCI device hotplug, which
+ * was possibly triggered by EEH core, the binding between EEH device
+ * and the PCI device isn't built yet.
+ */
+static int powernv_eeh_dev_probe(struct pci_dev *dev, void *flag)
+{
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   struct device_node *dn = pci_device_to_OF_node(dev);
+   struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+
+   /*
+* When probing the root bridge, which doesn't have any
+* subordinate PCI devices. We don't have OF node for
+* the root bridge. So it's not reasonable to continue
+* the probing.
+*/
+   if (!dn || !edev)
+   return 0;
+
+   /* Skip for PCI-ISA bridge */
+   if ((dev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+   return 0;
+
+   /* Initialize eeh device */
+   edev->class_code= dev->class;
+   edev->mode   

[PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error

2013-06-15 Thread Gavin Shan
The patch intends to register OPAL event notifier and process the
PCI errors from firmware. If we have pending PCI errors, the kthread
will be invoked to handle that in turn.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/pci-err.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-err.c b/arch/powerpc/platforms/powernv/pci-err.c
index e54135b..9b5c4ae 100644
--- a/arch/powerpc/platforms/powernv/pci-err.c
+++ b/arch/powerpc/platforms/powernv/pci-err.c
@@ -425,6 +425,13 @@ static void pci_err_process(struct pci_controller *hose,
}
 }
 
+static void pci_err_event(u64 event)
+{
+   /* Notify kthread to process error */
+   if (event & OPAL_EVENT_PCI_ERROR)
+   up(&pci_err_int_sem);
+}
+
 static int pci_err_handler(void *dummy)
 {
struct pnv_phb *phb;
@@ -513,6 +520,16 @@ static int __init pci_err_init(void)
return ret;
}
 
+   /* Register OPAL event notifier */
+   ret = opal_notifier_register(OPAL_EVENT_PCI_ERROR, pci_err_event);
+   if (ret) {
+   kthread_stop(pci_err_thread);
+   free_page((unsigned long)pci_err_diag);
+   pr_err("%s: Failed to register OPAL notifier, rc=%d\n",
+   __func__, ret);
+   return ret;
+   }
+
return 0;
 }
 
-- 
1.7.5.4



[PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively

2013-06-15 Thread Gavin Shan
It's meaningless to handle a frozen PE if the PHB is already fenced.
The patch checks the PHB state before checking the PE. If the PHB has
been put into the fenced state, we need to take care of that first.
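
The ordering can be sketched in userspace as below. This is only an
illustration under simplifying assumptions: struct pe, PE_ISOLATED and
the hw_healthy field are hypothetical stand-ins for the kernel's PE
state and the platform state query, and locking and event delivery are
left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define PE_ISOLATED (1 << 0)   /* hypothetical flag value */

struct pe {
	int state;          /* software flags, e.g. PE_ISOLATED */
	bool hw_healthy;    /* stand-in for querying the hardware state */
	struct pe *phb_pe;  /* PE of the owning PHB */
};

/* Returns 1 if a new PHB failure was detected, 0 otherwise. */
static int phb_check_failure(struct pe *pe)
{
	struct pe *phb_pe = pe->phb_pe;

	if (phb_pe->state & PE_ISOLATED)  /* already being handled */
		return 0;
	if (phb_pe->hw_healthy)
		return 0;

	phb_pe->state |= PE_ISOLATED;
	return 1;
}

/* Returns 1 if a failure is known or detected; the PHB is checked first. */
static int dev_check_failure(struct pe *pe)
{
	if (phb_check_failure(pe) > 0)
		return 1;
	if (pe->state & PE_ISOLATED)      /* already known to be bad */
		return 1;
	if (pe->hw_healthy)
		return 0;

	pe->state |= PE_ISOLATED;
	return 1;
}
```

When the PHB is fenced, only the PHB PE gets isolated and the frozen
PE below it is left alone, since recovering the PHB covers it.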

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/eeh.c |   60 +
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index f7cbeae..bfd1c20 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -269,6 +269,58 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
return pa | (token & (PAGE_SIZE-1));
 }
 
+/*
+ * On PowerNV platform, we might already have fenced PHB there.
+ * For that case, it's meaningless to recover frozen PE. Instead,
+ * we have to handle fenced PHB first.
+ */
+static int eeh_phb_check_failure(struct eeh_pe *pe)
+{
+   struct eeh_pe *phb_pe;
+   unsigned long flags;
+   int ret;
+
+   if (!eeh_probe_mode_dev())
+   return -EPERM;
+
+   /* Find the PHB PE */
+   phb_pe = eeh_phb_pe_get(pe->phb);
+   if (!phb_pe) {
+   pr_warning("%s Can't find PE for PHB#%d\n",
+  __func__, pe->phb->global_number);
+   return -EEXIST;
+   }
+
+   /* If the PHB has been in problematic state */
+   eeh_serialize_lock(&flags);
+   if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) {
+   ret = 0;
+   goto out;
+   }
+
+   /* Check PHB state */
+   ret = eeh_ops->get_state(phb_pe, NULL);
+   if ((ret < 0) ||
+   (ret == EEH_STATE_NOT_SUPPORT) ||
+   (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
+   (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
+   ret = 0;
+   goto out;
+   }
+
+   /* Isolate the PHB and send event */
+   eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
+   eeh_serialize_unlock(flags);
+   eeh_send_failure_event(phb_pe);
+
+   WARN(1, "EEH: PHB failure detected\n");
+
+   return 1;
+out:
+   eeh_serialize_unlock(flags);
+   return ret;
+}
+
 /**
  * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
  * @edev: eeh device
@@ -319,6 +371,14 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
return 0;
}
 
+   /*
+* On PowerNV platform, we might already have fenced PHB
+* there and we need take care of that firstly.
+*/
+   ret = eeh_phb_check_failure(pe);
+   if (ret > 0)
+   return ret;
+
/* If we already have a pending isolation event for this
 * slot, we know it's bad already, we don't need to check.
 * Do this checking under a lock; as multiple PCI devices
-- 
1.7.5.4



Re: [RFC 10/10] irqchip: Make versatile fpga irq driver a generic chip

2013-06-15 Thread Linus Walleij
On Mon, Jun 10, 2013 at 12:50 PM, Grant Likely grant.lik...@linaro.org wrote:
> On Mon, Jun 10, 2013 at 8:40 AM, Linus Walleij linus.wall...@linaro.org wrote:
>> On Mon, Jun 10, 2013 at 2:49 AM, Grant Likely grant.lik...@linaro.org wrote:

>>> This is an RFC patch to convert the versatile FPGA irq controller driver
>>> to use generic irq chip. It builds on the series that extends the
>>> generic chip code to allow a linear irq domain to contain one or more
>>> generic irq chips so that each interrupt controller doesn't need to hand
>>> code the generic chip setup.

>>> I've written this as a proof of concept to see if the new generic irq
>>> code does what it needs to. I had to extend it slightly to properly
>>> handle the valid mask used by the versatile FPGA driver.

>>> Tested on QEMU, but not on real hardware.

>> Is this the same as the one I tested previously?

>> If it need re-testing please push a branch and I'll take it
>> for a spin.

> Yes, it's the same, but if you can test the branch I would appreciate
> it. I've done some heavy rework on the irqdomain code that makes
> everything simpler, but also makes it likely that I've broken
> something.

> git://git.secretlab.ca/git/linux.git irqdomain/test

It still works like a charm on the Integrator/AP.
Tested-by: Linus Walleij linus.wall...@linaro.org

Yours,
Linus Walleij


Re: [RFC 10/10] irqchip: Make versatile fpga irq driver a generic chip

2013-06-15 Thread Linus Walleij
On Sat, Jun 15, 2013 at 11:19 PM, Linus Walleij
linus.wall...@linaro.org wrote:
> On Mon, Jun 10, 2013 at 12:50 PM, Grant Likely grant.lik...@linaro.org wrote:
>> On Mon, Jun 10, 2013 at 8:40 AM, Linus Walleij linus.wall...@linaro.org wrote:
>>> On Mon, Jun 10, 2013 at 2:49 AM, Grant Likely grant.lik...@linaro.org wrote:

>>>> This is an RFC patch to convert the versatile FPGA irq controller driver
>>>> to use generic irq chip. It builds on the series that extends the
>>>> generic chip code to allow a linear irq domain to contain one or more
>>>> generic irq chips so that each interrupt controller doesn't need to hand
>>>> code the generic chip setup.

>>>> I've written this as a proof of concept to see if the new generic irq
>>>> code does what it needs to. I had to extend it slightly to properly
>>>> handle the valid mask used by the versatile FPGA driver.

>>>> Tested on QEMU, but not on real hardware.

>>> Is this the same as the one I tested previously?

>>> If it need re-testing please push a branch and I'll take it
>>> for a spin.

>> Yes, it's the same, but if you can test the branch I would appreciate
>> it. I've done some heavy rework on the irqdomain code that makes
>> everything simpler, but also makes it likely that I've broken
>> something.

>> git://git.secretlab.ca/git/linux.git irqdomain/test

> It still works like a charm on the Integrator/AP.
> Tested-by: Linus Walleij linus.wall...@linaro.org

BTW here is the new hwirq output in /proc/interrupts and it's really nice:

root@integrator:/proc cat interrupts
   CPU0
 17:   1845   pic   1  uart-pl010
 22:  13716   pic   6  timer
 24:  0   pic   8  rtc-pl030
Err:  0

Yours,
Linus Walleij


Re: [RFC 10/10] irqchip: Make versatile fpga irq driver a generic chip

2013-06-15 Thread Grant Likely
On Sat, Jun 15, 2013 at 10:22 PM, Linus Walleij
linus.wall...@linaro.org wrote:
> On Sat, Jun 15, 2013 at 11:19 PM, Linus Walleij
> linus.wall...@linaro.org wrote:
>> It still works like a charm on the Integrator/AP.
>> Tested-by: Linus Walleij linus.wall...@linaro.org

> BTW here is the new hwirq output in /proc/interrupts and it's really nice:

> root@integrator:/proc cat interrupts
>    CPU0
>  17:   1845   pic   1  uart-pl010
>  22:  13716   pic   6  timer
>  24:  0   pic   8  rtc-pl030
> Err:  0

Glad you like it. :-)

g.


Re: [PATCH] MDIO: FSL_PQ_MDIO: Fix bug on incorrect offset of tbipa register

2013-06-15 Thread Timur Tabi
On Wed, Jun 12, 2013 at 1:31 PM, Scott Wood scottw...@freescale.com wrote:

> I'm not sure it's stable material if this is something that has never
> worked...

> The device tree binding will also need to be fixed to note the difference in
> reg between fsl,gianfar-mdio and fsl-gianfar-tbi -- and should give an
> example of the latter.

I don't remember how much I tested it, but I'm pretty sure that at
least some of the suspect devices did work for me.  My goal was to maintain
compatibility with existing device trees, and just refactor the code
so that it's easier to read.


Re: [PATCH -V10 00/15] THP support for PPC64

2013-06-15 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 20:58 +0530, Aneesh Kumar K.V wrote:
> This is the second patchset needed to support THP on ppc64. Some of the
> changes included in this series are tricky in that it changes the powerpc
> linux page table walk subtly. We also overload few of the pte flags for
> ptes at PMD level (huge page PTEs).
>
> The related mm/ changes are already merged to Andrew's -mm tree.

[Andrea, question for you near the end ]

So I'm trying to understand how you handle races between hash_page
and collapse.

The generic collapse code does:

_pmd = pmdp_clear_flush(vma, address, pmd);

Which expects the architecture to essentially have stopped any
concurrent walk by the time it returns.

Your implementation of the above does this:

pmd = *pmdp;
pmd_clear(pmdp);
/*
 * Now invalidate the hpte entries in the range
 * covered by pmd. This make sure we take a
 * fault and will find the pmd as none, which will
 * result in a major fault which takes mmap_sem and
 * hence wait for collapse to complete. Without this
 * the __collapse_huge_page_copy can result in copying
 * the old content.
 */
flush_tlb_pmd_range(vma->vm_mm, &pmd, address);

So we clear the pmd after making a copy of it. This will eventually
prevent a tablewalk but only eventually, once that store becomes visible
to other processors, which may take a while. Then you proceed to flush
the hash table for all the underlying PTEs.

So at this point, hash_page might *still* see the old pmd. Unless I
missed something, you did nothing that will prevent that (the only way
to lock against hash_page is really an IPI & wait or to take the PTE's
busy and make them !present or something). So as far as I can tell,
a concurrent hash_page can still sneak into the hash some small
entries after you have supposedly flushed them.

In addition, my reading of __collapse_huge_page_isolate() is that it
expects page_young() to be stable at that point, while because of the
above, a concurrent hash_page() might still be setting _PAGE_ACCESSED
(and _PAGE_DIRTY).

So it might be that you have a sneaky way to perform the synchronization
that I have missed :-) But so far I haven't seen it

Also, a more general question for Andrea. Looking at the code, I was
originally thinking that there is a similar race with dirty. But then
I noticed that the collapse code doesn't look at dirty at all on the sub
pages, it just ignores it. That struck me as broken until I also noticed
that you seem to always make the THPs dirty

Is there a reason for that rather than harvesting dirty in the sub pages
and making the THP's dirty the logical OR of the small pages one ?

I understand that anonymous memory is often either zero or dirty, but
I suppose it can be clear with content as well as a result of swap out
and back in, no ? Or is there other reasons why THPs must be dirty
always ?

Cheers,
Ben.




Re: [PATCH -V10 00/15] THP support for PPC64

2013-06-15 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 12:00 +1000, Benjamin Herrenschmidt wrote:
> So at this point, hash_page might *still* see the old pmd. Unless I
> missed something, you did nothing that will prevent that (the only way
> to lock against hash_page is really an IPI & wait or to take the PTE's
> busy and make them !present or something). So as far as I can tell,
> a concurrent hash_page can still sneak into the hash some small
> entries after you have supposedly flushed them.

Note that the _PAGE_PRESENT bit is removed eventually ... but much
later, in __collapse_huge_page_copy() which will also flush the hash, so
at least we will remove a stale hash entry that would have been added by
the race above I suppose...  but:

 - _PAGE_ACCESSED can still potentially be set after it was supposed to
be stable

 - The clearing happens *after* copy_user_highpage(), ie, unless I
missed something here, we potentially still have something writing to
the 4k page while it's being copied, which is BAD.

Now, let me know if I did miss something here :-)

Cheers,
Ben.





Re: [PATCH -V10 00/15] THP support for PPC64

2013-06-15 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 13:37 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2013-06-16 at 12:00 +1000, Benjamin Herrenschmidt wrote:
> > So at this point, hash_page might *still* see the old pmd. Unless I
> > missed something, you did nothing that will prevent that (the only way
> > to lock against hash_page is really an IPI & wait or to take the PTE's
> > busy and make them !present or something). So as far as I can tell,
> > a concurrent hash_page can still sneak into the hash some small
> > entries after you have supposedly flushed them.
>
> Note that the _PAGE_PRESENT bit is removed eventually ... but much
> later, in __collapse_huge_page_copy() which will also flush the hash, so
> at least we will remove a stale hash entry that would have been added by
> the race above I suppose...  but:
>
>  - _PAGE_ACCESSED can still potentially be set after it was supposed to
> be stable
>
>  - The clearing happens *after* copy_user_highpage(), ie, unless I
> missed something here, we potentially still have something writing to
> the 4k page while it's being copied, which is BAD.
>
> Now, let me know if I did miss something here :-)

An additional issue is that this all collides a bit with Alexey's work
to support TCEs in real mode in KVM, which is necessary to have usable
PCI pass-through.

Look at patches http://patchwork.ozlabs.org/patch/248920/ and followup,
he basically walks the page tables here in a slightly different way than
Paul does in H_ENTER. It's more like gup_fast. It will need to handle
concurrent split/collapse etc... as well which it doesn't right now.

I'm considering merging Alexey's stuff first (provided I don't find major
problems with it), then you can provide a new THP series on top of it.

While at it, also fix:

 - Some of your patches are bug fixes (like the one about subpage
protection). They need to be either merged in the main patch or put
before the patch that enables THP.

 - I haven't completely yet considered the impact of the demotion of
segments, but neither do you :-) IE. Under some circumstances, we can
demote entire segments from 64K HW pages to 4K HW pages in the SLB. For
example if a driver (such as HCA) sets the 4K_PFN bit in a PTE, this
will happen at hashing time. I don't think your code deals with that at
all, am I correct ? It *might* be that the right approach with those is:

  * If you find a THP in hash_page and the segment size is 4k, fault

  * In do_page_fault, re-check for that condition (or maybe we can make
hash_page return a specific bit that gets ORed into the error_code into
do_page_fault ?) and split huge pages there.

But that's just an idea off the top of my mind, there might be a better
way. Of course this needs to be tested.

BTW. For the subpage protection, similarly, you need to make sure you
properly map the entire segment as no THP, not just the range
passed-in by the user.

Cheers,
Ben.
 




Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls

2013-06-15 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
 This adds real mode handlers for the H_PUT_TCE_INDIRECT and
 H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
 devices or emulated PCI.  These calls allow adding multiple entries
 (up to 512) into the TCE table in one call which saves time on
 transition to/from real mode.
 
 This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
 (copied from user and verified) before writing the whole list into
 the TCE table. This cache will be utilized more in the upcoming
 VFIO/IOMMU support to continue TCE list processing in the virtual
 mode in the case if the real mode handler failed for some reason.
 
 This adds a guest physical to host real address converter
 and calls the existing H_PUT_TCE handler. The converting function
 is going to be fully utilized by upcoming VFIO supporting patches.
 
 This also implements the KVM_CAP_PPC_MULTITCE capability,
 so in order to support the functionality of this patch, QEMU
 needs to query for this capability and set the hcall-multi-tce
 hypertas property only if the capability is present, otherwise
 there will be serious performance degradation.
 
 Cc: David Gibson da...@gibson.dropbear.id.au
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 ---
 Changelog:
 2013/06/05:
 * fixed mistype about IBMVIO in the commit message
 * updated doc and moved it to another section
 * changed capability number
 
 2013/05/21:
 * added kvm_vcpu_arch::tce_tmp
 * removed cleanup if put_indirect failed, instead we do not even start
 writing to TCE table if we cannot get TCEs from the user and they are
 invalid
 * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
 and kvmppc_emulated_validate_tce (for the previous item)
 * fixed bug with failthrough for H_IPI
 * removed all get_user() from real mode handlers
 * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
 ---
  Documentation/virtual/kvm/api.txt   |   17 ++
  arch/powerpc/include/asm/kvm_host.h |2 +
  arch/powerpc/include/asm/kvm_ppc.h  |   16 +-
  arch/powerpc/kvm/book3s_64_vio.c|  118 ++
  arch/powerpc/kvm/book3s_64_vio_hv.c |  266 
 +++
  arch/powerpc/kvm/book3s_hv.c|   39 +
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |6 +
  arch/powerpc/kvm/book3s_pr_papr.c   |   37 -
  arch/powerpc/kvm/powerpc.c  |3 +
  include/uapi/linux/kvm.h|1 +
  10 files changed, 473 insertions(+), 32 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 5f91eda..6c082ff 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to 
 userspace to be
  handled.
  
 
 +4.83 KVM_CAP_PPC_MULTITCE
 +
 +Capability: KVM_CAP_PPC_MULTITCE
 +Architectures: ppc
 +Type: vm
 +
 +This capability tells the guest that multiple TCE entry add/remove hypercalls
 +handling is supported by the kernel. This significanly accelerates DMA
 +operations for PPC KVM guests.
 +
 +Unlike other capabilities in this section, this one does not have an ioctl.
 +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
 +H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed to
 +the guest. Othwerwise it might be better for the guest to continue using 
 H_PUT_TCE
 +hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
 +
 +
  5. The kvm_run structure
  
  
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index af326cd..85d8f26 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
   spinlock_t tbacct_lock;
   u64 busy_stolen;
   u64 busy_preempt;
 +
 + unsigned long *tce_tmp;/* TCE cache for TCE_PUT_INDIRECT hall */
  #endif
  };
  
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index a5287fe..e852921b 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu 
 *vcpu);
  
  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
   struct kvm_create_spapr_tce *args);
 -extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 -  unsigned long ioba, unsigned long tce);
 +extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
 + struct kvm_vcpu *vcpu, unsigned long liobn);
 +extern long kvmppc_emulated_validate_tce(unsigned long tce);
 +extern void kvmppc_emulated_put_tce(struct kvmppc_spapr_tce_table *tt,
 + unsigned long ioba, unsigned long tce);
 +extern long 

Re: [PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-06-15 Thread Benjamin Herrenschmidt
> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
> +int realmode_get_page(struct page *page)
> +{
> + if (PageCompound(page))
> + return -EAGAIN;
> +
> + get_page(page);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(realmode_get_page);
> +
> +int realmode_put_page(struct page *page)
> +{
> + if (PageCompound(page))
> + return -EAGAIN;
> +
> + if (!atomic_add_unless(&page->_count, -1, 1))
> + return -EAGAIN;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(realmode_put_page);
> +#endif

Several worries here, mostly that if the generic code ever changes
(something gets added to get_page() that makes it no longer safe for use
in real mode, for example, or some other condition gets added to
put_page()), we go out of sync and potentially end up with very hard and
very subtle bugs.

It might be worth making sure that:

 - This is reviewed by some generic VM people (and make sure they
understand why we need to do that)

 - A comment is added to get_page() and put_page() to make sure that if
they are changed in any way, people double-check the impact on our
realmode_get_page() (or ping us to make sure things are still OK).

Cheers,
Ben.




Re: [PATCH 2/4] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-06-15 Thread Benjamin Herrenschmidt
On Sun, 2013-06-16 at 14:26 +1000, Benjamin Herrenschmidt wrote:
  +int realmode_get_page(struct page *page)
  +{
  + if (PageCompound(page))
  + return -EAGAIN;
  +
  + get_page(page);
  +
  + return 0;
  +}

Shouldn't it be get_page_unless_zero ?
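
For reference, a minimal user-space sketch of the two refcount rules
under discussion, written with C11 atomics instead of the kernel's
atomic_t. The demo_* names and struct demo_page are made up for
illustration; the sketch only models the semantics of
atomic_add_unless() and get_page_unless_zero(), and it is not the
kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct demo_page {
	atomic_int count;
};

/*
 * Add 'a' to *v unless *v currently equals 'u'.
 * Returns true if the add was performed.
 */
static bool demo_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure the CAS reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* Take a reference only if the page is still live (count != 0). */
static bool demo_get_page_unless_zero(struct demo_page *p)
{
	return demo_add_unless(&p->count, 1, 0);
}

/* Drop a reference unless it is the last one (the realmode_put_page() rule). */
static bool demo_put_page_unless_last(struct demo_page *p)
{
	return demo_add_unless(&p->count, -1, 1);
}

static void demo_refcount_usage(void)
{
	struct demo_page pg;

	atomic_init(&pg.count, 0);
	assert(!demo_get_page_unless_zero(&pg));  /* freed page: refused */

	atomic_store(&pg.count, 1);
	assert(demo_get_page_unless_zero(&pg));   /* 1 -> 2 */
	assert(demo_put_page_unless_last(&pg));   /* 2 -> 1 */
	assert(!demo_put_page_unless_last(&pg));  /* last reference: refused */
}
```

The point of the "unless zero" form is that a page whose count has
already dropped to zero is never resurrected, which a plain increment
cannot guarantee.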

Cheers,
Ben.



Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling

2013-06-15 Thread Benjamin Herrenschmidt
  static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
 - unsigned long *pte_sizep)
 + unsigned long *pte_sizep, bool do_get_page)
  {
   pte_t *ptep;
   unsigned int shift = 0;
 @@ -135,6 +136,14 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
   if (!pte_present(*ptep))
   return __pte(0);
  
 + /*
 +  * Put huge pages handling to the virtual mode.
 +  * The only exception is for TCE list pages which we
 +  * do need to call get_page() for.
 +  */
 + if ((*pte_sizep > PAGE_SIZE) && do_get_page)
 + return __pte(0);
 +
   /* wait until _PAGE_BUSY is clear then set it atomically */
   __asm__ __volatile__ (
   "1: ldarx   %0,0,%3\n"
 @@ -148,6 +157,18 @@ static pte_t kvmppc_lookup_pte(pgd_t *pgdir, unsigned long hva, bool writing,
   : "cc");
  
   ret = pte;
 + if (do_get_page && pte_present(pte) && (!writing || pte_write(pte))) {
 + struct page *pg = NULL;
 + pg = realmode_pfn_to_page(pte_pfn(pte));
 + if (realmode_get_page(pg)) {
 + ret = __pte(0);
 + } else {
 + pte = pte_mkyoung(pte);
 + if (writing)
 + pte = pte_mkdirty(pte);
 + }
 + }
 + *ptep = pte; /* clears _PAGE_BUSY */
  
   return ret;
  }

So now you are adding the clearing of _PAGE_BUSY that was missing from
your first patch, except that this is not enough: in the emulated case
(i.e. !do_get_page) you will in essence return and then use a PTE that
is not locked, without any synchronization to ensure that the
underlying page doesn't go away... then you'll dereference that page.

So either make everything use speculative get_page, or make the emulated
case use the MMU notifier to drop the operation in case of collision.

The former looks easier.

Also, any specific reason why you do:

  - Lock the PTE
  - get_page()
  - Unlock the PTE

Instead of

  - Read the PTE
  - get_page_unless_zero
  - re-check PTE

Like get_user_pages_fast() does ?

The former will be two atomic ops, the latter only one (faster), but
maybe you have a good reason why that can't work...
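
To make the second sequence concrete, here is a rough user-space model
of "read the PTE, take a speculative reference, re-check the PTE",
written with C11 atomics. Everything named demo_* is hypothetical, and
a "PTE" here is just a word holding a 1-based page index (0 meaning
not present); the real get_user_pages_fast() is considerably more
involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the refcount is modelled. */
struct demo_page {
	atomic_int count;
};

/* Hypothetical PTE slot: 0 = not present, n = page index n - 1. */
typedef _Atomic unsigned long demo_pte_t;

/* Take a reference only if the count is not already zero. */
static bool demo_get_unless_zero(struct demo_page *p)
{
	int old = atomic_load(&p->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Lockless pin: returns the referenced page, or NULL when the caller
 * has to fall back to a slower path.
 */
static struct demo_page *demo_pin_page(demo_pte_t *ptep,
				       struct demo_page *pages, size_t npages)
{
	unsigned long pte = atomic_load(ptep);	/* 1. read the PTE */
	struct demo_page *pg;

	if (pte == 0 || pte > npages)
		return NULL;
	pg = &pages[pte - 1];

	if (!demo_get_unless_zero(pg))		/* 2. speculative get */
		return NULL;

	if (atomic_load(ptep) != pte) {		/* 3. re-check the PTE */
		atomic_fetch_sub(&pg->count, 1);	/* raced: undo and bail */
		return NULL;
	}
	return pg;
}

static void demo_pin_usage(void)
{
	struct demo_page pages[1];
	demo_pte_t pte;

	atomic_init(&pages[0].count, 1);
	atomic_init(&pte, 1UL);
	assert(demo_pin_page(&pte, pages, 1) == &pages[0]);
}
```

On the success path the only read-modify-write is the compare-and-swap
on the refcount; the PTE itself is only read, never locked.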

Cheers,
Ben.




[PATCH 2/9] PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.

2013-06-15 Thread Rusty Russell
Sweep of the simple cases.
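
As a reminder of what the helper computes, below is a small user-space
model of the error-pointer convention. The demo_* names are
illustrative only; the real definitions are the ones in
include/linux/err.h, and this sketch merely mirrors their behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are encoded in the top 4095 values of the address space. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the reserved error range. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Decode the errno value carried by an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The errno if ptr is an error pointer, otherwise 0. */
static inline int demo_ptr_err_or_zero(const void *ptr)
{
	return demo_is_err(ptr) ? (int)demo_ptr_err(ptr) : 0;
}

static void demo_err_usage(void)
{
	int object = 0;

	assert(demo_ptr_err_or_zero(&object) == 0);
	assert(demo_ptr_err_or_zero(demo_err_ptr(-ENODEV)) == -ENODEV);
}
```

Note that a NULL pointer is not an error pointer under this
convention, so the helper yields 0 for it.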

Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: Julia Lawall julia.law...@lip6.fr
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 arch/arm/mach-omap2/i2c.c |  2 +-
 arch/m68k/amiga/platform.c|  2 +-
 arch/m68k/kernel/time.c   |  2 +-
 arch/m68k/q40/config.c|  2 +-
 arch/powerpc/kernel/iommu.c   |  2 +-
 arch/powerpc/kernel/time.c|  2 +-
 arch/powerpc/platforms/ps3/time.c |  2 +-
 arch/powerpc/sysdev/rtc_cmos_setup.c  |  2 +-
 arch/s390/hypfs/hypfs_dbfs.c  |  2 +-
 drivers/char/tile-srom.c  |  2 +-
 drivers/infiniband/core/cma.c |  2 +-
 drivers/net/appletalk/cops.c  |  2 +-
 drivers/net/appletalk/ltpc.c  |  2 +-
 drivers/net/ethernet/amd/atarilance.c |  2 +-
 drivers/net/ethernet/amd/mvme147.c|  2 +-
 drivers/net/ethernet/amd/ni65.c   |  2 +-
 drivers/net/ethernet/amd/sun3lance.c  |  2 +-
 drivers/net/wireless/brcm80211/brcmfmac/dhd_dbg.c |  2 +-
 drivers/net/wireless/brcm80211/brcmsmac/debug.c   |  2 +-
 drivers/platform/x86/samsung-q10.c|  2 +-
 drivers/regulator/fan53555.c  |  2 +-
 drivers/spi/spi-fsl-spi.c |  2 +-
 drivers/spi/spidev.c  |  2 +-
 drivers/video/omap2/dss/core.c|  2 +-
 fs/btrfs/dev-replace.c|  2 +-
 fs/btrfs/inode.c  |  2 +-
 net/bluetooth/hci_sysfs.c |  2 +-
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/bridge/netfilter/ebtable_filter.c |  2 +-
 net/bridge/netfilter/ebtable_nat.c|  2 +-
 net/ipv4/netfilter/arptable_filter.c  |  2 +-
 net/ipv4/netfilter/iptable_filter.c   |  2 +-
 net/ipv4/netfilter/iptable_mangle.c   |  2 +-
 net/ipv4/netfilter/iptable_nat.c  |  2 +-
 net/ipv4/netfilter/iptable_raw.c  |  2 +-
 net/ipv4/netfilter/iptable_security.c |  2 +-
 net/ipv6/netfilter/ip6table_filter.c  |  2 +-
 net/ipv6/netfilter/ip6table_mangle.c  |  2 +-
 net/ipv6/netfilter/ip6table_nat.c |  2 +-
 net/ipv6/netfilter/ip6table_raw.c |  2 +-
 net/ipv6/netfilter/ip6table_security.c|  2 +-
 scripts/coccinelle/api/ptr_ret.cocci  | 10 +-
 sound/soc/soc-io.c|  2 +-
 43 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/arch/arm/mach-omap2/i2c.c b/arch/arm/mach-omap2/i2c.c
index d940e53..b456b44 100644
--- a/arch/arm/mach-omap2/i2c.c
+++ b/arch/arm/mach-omap2/i2c.c
@@ -181,7 +181,7 @@ int __init omap_i2c_add_bus(struct omap_i2c_bus_platform_data *i2c_pdata,
 sizeof(struct omap_i2c_bus_platform_data));
WARN(IS_ERR(pdev), "Could not build omap_device for %s\n", name);
 
-   return PTR_RET(pdev);
+   return PTR_ERR_OR_ZERO(pdev);
 }
 
 static  int __init omap_i2c_cmdline(void)
diff --git a/arch/m68k/amiga/platform.c b/arch/m68k/amiga/platform.c
index 6083088..dacd9f9 100644
--- a/arch/m68k/amiga/platform.c
+++ b/arch/m68k/amiga/platform.c
@@ -56,7 +56,7 @@ static int __init amiga_init_bus(void)
n = AMIGAHW_PRESENT(ZORRO3) ? 4 : 2;
pdev = platform_device_register_simple("amiga-zorro", -1,
   zorro_resources, n);
-   return PTR_RET(pdev);
+   return PTR_ERR_OR_ZERO(pdev);
 }
 
 subsys_initcall(amiga_init_bus);
diff --git a/arch/m68k/kernel/time.c b/arch/m68k/kernel/time.c
index bea6bcf..7eb9792 100644
--- a/arch/m68k/kernel/time.c
+++ b/arch/m68k/kernel/time.c
@@ -90,7 +90,7 @@ static int __init rtc_init(void)
return -ENODEV;
 
pdev = platform_device_register_simple("rtc-generic", -1, NULL, 0);
-   return PTR_RET(pdev);
+   return PTR_ERR_OR_ZERO(pdev);
 }
 
 module_init(rtc_init);
diff --git a/arch/m68k/q40/config.c b/arch/m68k/q40/config.c
index 658542b..078bb74 100644
--- a/arch/m68k/q40/config.c
+++ b/arch/m68k/q40/config.c
@@ -338,6 +338,6 @@ static __init int q40_add_kbd_device(void)
return -ENODEV;
 
pdev = platform_device_register_simple("q40kbd", -1, NULL, 0);
-   return PTR_RET(pdev);
+   return PTR_ERR_OR_ZERO(pdev);
 }
 arch_initcall(q40_add_kbd_device);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c0d0dbd..3a149eb 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -102,7 +102,7 @@ static int __init fail_iommu_debugfs(void)
struct dentry *dir = fault_create_debugfs_attr("fail_iommu",
  

Re: [PATCH 4/4] KVM: PPC: Add hugepage support for IOMMU in-kernel handling

2013-06-15 Thread Benjamin Herrenschmidt
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:

 @@ -185,7 +186,31 @@ static unsigned long kvmppc_realmode_gpa_to_hpa(struct kvm_vcpu *vcpu,
   unsigned long hva, hpa, pg_size = 0, offset;
   unsigned long gfn = gpa >> PAGE_SHIFT;
   bool writing = gpa & TCE_PCI_WRITE;
 + struct kvmppc_iommu_hugepage *hp;
  
 + /*
 +  * Try to find an already used hugepage.
 +  * If it is not there, the kvmppc_lookup_pte() will return zero
 +  * as it won't do get_page() on a huge page in real mode
 +  * and therefore the request will be passed to the virtual mode.
 +  */
 + if (tt) {
 + spin_lock(&tt->hugepages_lock);
 + list_for_each_entry(hp, &tt->hugepages, list) {
 + if ((gpa < hp->gpa) || (gpa >= hp->gpa + hp->size))
 + continue;
 +
 + /* Calculate host phys address keeping flags and offset in the page */
 + offset = gpa & (hp->size - 1);
 +
 + /* pte_pfn(pte) should return an address aligned to pg_size */
 + hpa = (pte_pfn(hp->pte) << PAGE_SHIFT) + offset;
 + spin_unlock(&tt->hugepages_lock);
 +
 + return hpa;
 + }
 + spin_unlock(&tt->hugepages_lock);
 + }

Wow... this is run in real mode, right?

spin_lock() and spin_unlock() are a big no-no in real mode. If lockdep
and/or spinlock debugging are enabled and something goes pear-shaped
they are going to bring your whole system down in a blink in quite
horrible ways.

If you are going to do that, you need some kind of custom low-level
lock.

Also, I see that you are basically using a non-ordered list and doing a
linear search in it every time. That's going to COST !

You should really consider a more efficient data structure. You should
also be able to do something that doesn't require locks for readers.
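
One possible shape for such a lookup, sketched in user-space C: keep
the ranges in a sorted, immutable array, publish it through a single
atomic pointer, and binary-search it on the read side so that readers
take no lock at all. All demo_* names are hypothetical. The sketch
assumes non-overlapping ranges whose size is a power of two and whose
start is size-aligned, and it does not address when an old table can
safely be freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_ERROR_ADDR (~0UL)

/* One cached huge page: guest range [gpa, gpa + size) -> host base hpa. */
struct demo_hugepage {
	unsigned long gpa;
	unsigned long size;
	unsigned long hpa;
};

/* Immutable snapshot, sorted by gpa in ascending order. */
struct demo_hp_table {
	size_t n;
	const struct demo_hugepage *ents;
};

/* Writers build a new snapshot and publish it with one release store. */
static _Atomic(const struct demo_hp_table *) demo_hp_current;

static void demo_hp_publish(const struct demo_hp_table *t)
{
	atomic_store_explicit(&demo_hp_current, t, memory_order_release);
}

/* Reader side: no lock, O(log n) binary search over the snapshot. */
static unsigned long demo_gpa_to_hpa(unsigned long gpa)
{
	const struct demo_hp_table *t =
		atomic_load_explicit(&demo_hp_current, memory_order_acquire);
	size_t lo = 0, hi;

	if (!t)
		return DEMO_ERROR_ADDR;

	hi = t->n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct demo_hugepage *hp = &t->ents[mid];

		if (gpa < hp->gpa)
			hi = mid;
		else if (gpa >= hp->gpa + hp->size)
			lo = mid + 1;
		else	/* hit: keep the offset within the huge page */
			return hp->hpa + (gpa & (hp->size - 1));
	}
	return DEMO_ERROR_ADDR;	/* miss: defer to the slow path */
}

static void demo_hp_usage(void)
{
	static const struct demo_hugepage ents[] = {
		{ 0x1000000UL, 0x1000000UL, 0x40000000UL },
	};
	static const struct demo_hp_table tbl = { 1, ents };

	demo_hp_publish(&tbl);
	assert(demo_gpa_to_hpa(0x1000010UL) == 0x40000010UL);
}
```

A miss simply returns an error value, which matches the existing
behaviour of handing the request over to virtual mode.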

   /* Find a KVM memslot */
   memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
   if (!memslot)
 @@ -237,6 +262,10 @@ static long kvmppc_clear_tce_real_mode(struct kvm_vcpu *vcpu,
   if (oldtce & TCE_PCI_WRITE)
   SetPageDirty(page);
  
 + /* Do not put a huge page and continue without error */
 + if (PageCompound(page))
 + continue;
 +
   if (realmode_put_page(page)) {
   ret = H_TOO_HARD;
   break;
 @@ -282,7 +311,7 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
   if (iommu_tce_put_param_check(tbl, ioba, tce))
   return H_PARAMETER;
  
 - hpa = kvmppc_realmode_gpa_to_hpa(vcpu, tce, true);
 + hpa = kvmppc_realmode_gpa_to_hpa(vcpu, tt, tce, true);
   if (hpa == ERROR_ADDR) {
   vcpu->arch.tce_reason = H_TOO_HARD;
   return H_TOO_HARD;
 @@ -295,6 +324,11 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
   if (unlikely(ret)) {
   struct page *pg = realmode_pfn_to_page(hpa);
   BUG_ON(!pg);
 +
 + /* Do not put a huge page and return an error */
 + if (!PageCompound(pg))
 + return H_HARDWARE;
 +
   if (realmode_put_page(pg)) {
   vcpu->arch.tce_reason = H_HARDWARE;
   return H_TOO_HARD;
 @@ -351,7 +385,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
   vcpu->arch.tce_tmp_num = 0;
   vcpu->arch.tce_reason = 0;
  
 - tces = (unsigned long *) kvmppc_realmode_gpa_to_hpa(vcpu,
 + tces = (unsigned long *) kvmppc_realmode_gpa_to_hpa(vcpu, NULL,
   tce_list, false);
   if ((unsigned long)tces == ERROR_ADDR)
   return H_TOO_HARD;
 @@ -374,7 +408,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
  
   /* Translate TCEs and go get_page */
   for (i = 0; i < npages; ++i) {
 - unsigned long hpa = kvmppc_realmode_gpa_to_hpa(vcpu,
 + unsigned long hpa = kvmppc_realmode_gpa_to_hpa(vcpu, tt,
   vcpu->arch.tce_tmp[i], true);
   if (hpa == ERROR_ADDR) {
   vcpu->arch.tce_tmp_num = i;

Cheers,
Ben.




Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH

2013-06-15 Thread Benjamin Herrenschmidt
On Sat, 2013-06-15 at 17:03 +0800, Gavin Shan wrote:
 On PowerNV platform, the EEH event is produced either by detect
 on accessing config or I/O registers, or by interrupts dedicated
 for EEH report. The patch adds support to process the interrupts
 dedicated for EEH report.
 
 Firstly, the kernel thread will be woken up to process incoming
 interrupt. The PHBs will be scanned one by one to process all
 existing EEH errors. Besides, there are multiple EEH errors that can
 be reported from interrupts and we have differentiated actions
 against them:
 
 - If the IOC is dead, all PCI buses under all PHBs will be removed
   from the system.
 - If the PHB is dead, all PCI buses under the PHB will be removed
   from the system.
 - If the PHB is fenced, EEH event will be sent to EEH core and
   the fenced PHB is expected to be reset completely.
 - If specific PE has been put into frozen state, EEH event will
   be sent to EEH core so that the PE will be reset.
 - If the error is informational one, we just output the related
   registers for debugging purpose and no more action will be
   taken.

Getting better but:

 - I still don't like having a kthread for that. Why not use schedule_work() ?

 - We already have an EEH thread, why not just use it ? I.e. send it a special
type of message that makes it query the backend for error info instead ?

 - I'm not a fan of exposing that EEH private lock. I don't entirely understand
why you need to do that either.

Generally speaking, I'm thinking this file should contain less stuff, most of
it should move into the ioda backend, the interrupt just turning into some
request down to the existing EEH thread.

Cheers,
Ben.

