[git pull] Please pull powerpc.git merge branch

2012-09-05 Thread Benjamin Herrenschmidt
Hi Linus !

Here are a few fixes for 3.6 that were piling up while I was away or
busy (I was mostly MIA a week or two before San Diego). Some fixes from
Anton fixing up issues with our relatively new DSCR control feature,
and a few other fixes that are either regressions or bugs nasty enough
to warrant not waiting.

Cheers,
Ben.

The following changes since commit 5b716ac728bcc01b1f2a7ed6e437196602237c27:

  Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 (2012-09-02 11:30:10 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git 

for you to fetch changes up to 636802ef96eebe279b22ad9f9dacfe29291e45c7:

  powerpc: Don't use __put_user() in patch_instruction (2012-09-05 16:05:23 +1000)


Anton Blanchard (4):
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Restore correct DSCR in context switch

Benjamin Herrenschmidt (1):
  powerpc: Don't use __put_user() in patch_instruction

Jesse Larrew (1):
  powerpc/vphn: Fix arch_update_cpu_topology() return value

Paul Mackerras (3):
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Make sure IPI handlers see data written by IPI senders

 arch/powerpc/include/asm/processor.h |1 +
 arch/powerpc/kernel/asm-offsets.c|1 +
 arch/powerpc/kernel/dbell.c  |2 ++
 arch/powerpc/kernel/entry_64.S   |   23 +--
 arch/powerpc/kernel/exceptions-64s.S |3 ++-
 arch/powerpc/kernel/idle_power7.S|2 ++
 arch/powerpc/kernel/process.c|   12 ++--
 arch/powerpc/kernel/smp.c|   11 +--
 arch/powerpc/kernel/sysfs.c  |   10 ++
 arch/powerpc/kernel/time.c   |9 +
 arch/powerpc/kernel/traps.c  |3 ++-
 arch/powerpc/lib/code-patching.c |2 +-
 arch/powerpc/mm/numa.c   |7 ---
 arch/powerpc/platforms/powernv/smp.c |   10 +-
 arch/powerpc/sysdev/xics/icp-hv.c|6 +-
 15 files changed, 68 insertions(+), 34 deletions(-)


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 00/21 V3] powerpc/eeh: PE support

2012-09-05 Thread Gavin Shan
This patch series adds explicit PE support as well as probe type
support. For explicit PE support, struct eeh_pe has been introduced.
The following factors were taken into account while designing the
struct.

   * A PE may be composed of a single PCI device, or of multiple PCI
     devices together with their child PCI devices (e.g. those behind
     PCIe bridges). The PE struct includes a linked list that refers to
     the included PCI devices. The list reflects the top-to-bottom order
     of the PCI subtree; that is, the first device in the list should be
     the topmost element of the PCI subtree managed by the PE.
   * PEs are related to each other, so the existing PEs have to form a
     hierarchy. Several fields in the PE struct (e.g. parent/child/sibling)
     have been introduced for that purpose.
   * A PE is only meaningful within its PHB domain.

In addition, the mechanisms used to restore memory BARs and to report errors
have been reworked on top of PEs. The EEH cache has been reworked slightly,
based on Ben's suggestion, to trace EEH devices.

To support explicit probe types (either OF node or PCI device), a global
variable and some inline functions are introduced. The pSeries platform uses
the OF node probe and figures out PEs from the corresponding OF nodes. In
contrast, the powernv platform has to use the PCI device probe type, since
its PEs are constructed at PHB fixup time.

The series has been verified on a Firebird-L machine using the errinjct
utility, with the following command:

errinjct eeh -v -f 0 -p U78AE.001.WZS00M9-P1-C18-L1-T2 -a 0x0 -m 0x0

V2 -> V3:
* Rebase to 3.6-rc4.
V1 -> V2:
* Rebase to 3.5-rc4.
* Use linked lists to trace the relationships among PEs, and between
  PEs and EEH devices, according to Ram's suggestion.
* Simplify the PE traverse function according to Ram's example.
* Move EEH initialization around according to Ben's suggestion so
  that we can do memory allocation through slab.
* Use kzalloc() to allocate memory chunks for PEs and EEH devices.
* More boot messages for EEH initialization functions.
* Introduce a global EEH mutex to protect the PEs and EEH devices.
* Add functions to support PE removal.
* Comment cleanup.
* Change the comparison of PE and BDF (Bus/Device/Function)
  addresses so that the code is more readable.

-

arch/powerpc/include/asm/eeh.h   |  132 +--
arch/powerpc/include/asm/eeh_event.h |6 +-
arch/powerpc/include/asm/pci-bridge.h|2 +
arch/powerpc/include/asm/ppc-pci.h   |   15 +-
arch/powerpc/kernel/rtas_pci.c   |5 +-
arch/powerpc/platforms/pseries/Makefile  |5 +-
arch/powerpc/platforms/pseries/eeh.c |  527 +--
arch/powerpc/platforms/pseries/eeh_cache.c   |   19 +-
arch/powerpc/platforms/pseries/eeh_dev.c |   14 +-
arch/powerpc/platforms/pseries/eeh_driver.c  |  235 +--
arch/powerpc/platforms/pseries/eeh_event.c   |   54 +--
arch/powerpc/platforms/pseries/eeh_pe.c  |  583 ++
arch/powerpc/platforms/pseries/eeh_pseries.c |  246 
arch/powerpc/platforms/pseries/eeh_sysfs.c   |9 -
arch/powerpc/platforms/pseries/msi.c |6 +-
arch/powerpc/platforms/pseries/setup.c   |2 -
16 files changed, 1119 insertions(+), 741 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/eeh_pe.c

Thanks,
Gavin



[PATCH 01/21] ppc/eeh: move EEH initialization around

2012-09-05 Thread Gavin Shan
Currently, we have 3 phases for EEH initialization on the pSeries platform,
each using builtin functions: platform initialization, EEH device creation,
and EEH subsystem enablement. All of them are done no later than
ppc_md.setup_arch. That means slab/slub isn't ready yet, so we have to
allocate memory in units of PAGE_SIZE for the dynamically created EEH
devices. That's pretty expensive.

In order to use slab/slub for memory allocation, we have to move
the EEH initialization functions around so that all of them are
called after slab/slub is ready.
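
The ordering this patch relies on can be illustrated with a small userspace
sketch. It assumes only that early_initcall runs before core_initcall, which
in turn runs before core_initcall_sync. The table, the enum and the
initcall_position() helper are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative levels only: mirrors the early < core < core_sync order. */
enum init_level { LEVEL_EARLY = 0, LEVEL_CORE = 1, LEVEL_CORE_SYNC = 2 };

struct init_entry {
	enum init_level level;
	const char *name;
};

/* Registration order is deliberately scrambled. */
static struct init_entry init_table[] = {
	{ LEVEL_CORE_SYNC, "eeh_init" },
	{ LEVEL_EARLY,     "eeh_pseries_init" },
	{ LEVEL_CORE,      "eeh_dev_phb_init" },
};

static int cmp_level(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	return (int)x->level - (int)y->level;
}

/* Sort the table by level and return the call position of name, or -1. */
static int initcall_position(const char *name)
{
	size_t n = sizeof(init_table) / sizeof(init_table[0]);
	size_t i;

	assert(name != NULL);
	qsort(init_table, n, sizeof(init_table[0]), cmp_level);
	for (i = 0; i < n; i++)
		if (strcmp(init_table[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

In this model the platform registration comes first, EEH device creation
second, and eeh_init() last, whatever the link order happens to be.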

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |   16 
 arch/powerpc/kernel/rtas_pci.c   |3 ---
 arch/powerpc/platforms/pseries/eeh.c |   10 +++---
 arch/powerpc/platforms/pseries/eeh_dev.c |6 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |4 +++-
 arch/powerpc/platforms/pseries/setup.c   |2 --
 6 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d60f998..06dedff 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -117,11 +117,6 @@ extern int eeh_subsystem_enabled;
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
-void __init eeh_dev_phb_init(void);
-void __init eeh_init(void);
-#ifdef CONFIG_PPC_PSERIES
-int __init eeh_pseries_init(void);
-#endif
 int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
@@ -156,17 +151,6 @@ static inline void *eeh_dev_init(struct device_node *dn, void *data)
 
 static inline void eeh_dev_phb_init_dynamic(struct pci_controller *phb) { }
 
-static inline void eeh_dev_phb_init(void) { }
-
-static inline void eeh_init(void) { }
-
-#ifdef CONFIG_PPC_PSERIES
-static inline int eeh_pseries_init(void)
-{
-   return 0;
-}
-#endif /* CONFIG_PPC_PSERIES */
-
 static inline unsigned long eeh_check_failure(const volatile void __iomem *token, unsigned long val)
 {
return val;
diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 179af90..140735c 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -275,9 +275,6 @@ void __init find_and_init_phbs(void)
of_node_put(root);
pci_devs_phb_init();
 
-   /* Create EEH devices for all PHBs */
-   eeh_dev_phb_init();
-
/*
 * PCI_PROBE_ONLY and PCI_REASSIGN_ALL_BUS can be set via properties
 * in chosen.
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index ecd394c..e819448 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -982,7 +982,7 @@ int __exit eeh_ops_unregister(const char *name)
  * Even if force-off is set, the EEH hardware is still enabled, so that
  * newer systems can boot.
  */
-void __init eeh_init(void)
+static int __init eeh_init(void)
 {
struct pci_controller *hose, *tmp;
struct device_node *phb;
@@ -992,11 +992,11 @@ void __init eeh_init(void)
        if (!eeh_ops) {
                pr_warning("%s: Platform EEH operation not found\n",
                        __func__);
-               return;
+               return -EEXIST;
        } else if ((ret = eeh_ops->init())) {
                pr_warning("%s: Failed to call platform init function (%d)\n",
                        __func__, ret);
-               return;
+               return ret;
        }
 
        raw_spin_lock_init(&confirm_error_lock);
@@ -1011,8 +1011,12 @@ void __init eeh_init(void)
                printk(KERN_INFO "EEH: PCI Enhanced I/O Error Handling Enabled\n");
        else
                printk(KERN_WARNING "EEH: No capable adapters found\n");
+
+   return ret;
 }
 
+core_initcall_sync(eeh_init);
+
 /**
  * eeh_add_device_early - Enable EEH for the indicated device_node
  * @dn: device node for which to set up EEH
diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
index c4507d0..ab68c59 100644
--- a/arch/powerpc/platforms/pseries/eeh_dev.c
+++ b/arch/powerpc/platforms/pseries/eeh_dev.c
@@ -93,10 +93,14 @@ void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb)
  * Scan all the existing PHBs and create EEH devices for their OF
  * nodes and their children OF nodes
  */
-void __init eeh_dev_phb_init(void)
+static int __init eeh_dev_phb_init(void)
 {
        struct pci_controller *phb, *tmp;
 
        list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
                eeh_dev_phb_init_dynamic(phb);
+
+   return 0;
 }
+
+core_initcall(eeh_dev_phb_init);
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index c33360ec..5e2805a 

[PATCH 03/21] ppc/eeh: more logs for EEH initialization

2012-09-05 Thread Gavin Shan
The patch adds more log messages to the EEH initialization functions
for debugging purposes. Also, the machine type (pSeries) is checked
during platform initialization to ensure the code runs only on the
platform it was written for.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/eeh_dev.c |2 ++
 arch/powerpc/platforms/pseries/eeh_pseries.c |   13 -
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
index 8e3443b..a0cee3a 100644
--- a/arch/powerpc/platforms/pseries/eeh_dev.c
+++ b/arch/powerpc/platforms/pseries/eeh_dev.c
@@ -100,6 +100,8 @@ static int __init eeh_dev_phb_init(void)
        list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
                eeh_dev_phb_init_dynamic(phb);
 
+       pr_info("EEH: devices created\n");
+
        return 0;
 }
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 5e2805a..46616c8 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -561,7 +561,18 @@ static struct eeh_ops pseries_eeh_ops = {
  */
 static int __init eeh_pseries_init(void)
 {
-       return eeh_ops_register(&pseries_eeh_ops);
+       int ret = -EINVAL;
+
+       if (!machine_is(pseries))
+               return ret;
+
+       ret = eeh_ops_register(&pseries_eeh_ops);
+       if (!ret)
+               pr_info("EEH: pSeries platform initialized\n");
+       else
+               pr_info("EEH: pSeries platform initialization failure\n");
+
+       return ret;
 }
 
 early_initcall(eeh_pseries_init);
-- 
1.7.5.4



[PATCH 05/21] ppc/eeh: introduce global mutex

2012-09-05 Thread Gavin Shan
The patch introduces a global mutex for EEH so that the core data
structures can be protected by it. Two inline functions, eeh_lock()
and eeh_unlock(), are provided for taking and releasing the mutex.
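
As a rough userspace illustration of the pattern, the sketch below wraps a
single global lock in two small helpers and uses them around an update of
shared state. It uses a pthread mutex in place of the kernel mutex, and
pe_count / pe_count_inc() are hypothetical stand-ins for the PE data that
the real lock is meant to protect.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for the global EEH mutex. */
static pthread_mutex_t eeh_mutex = PTHREAD_MUTEX_INITIALIZER;

static inline void eeh_lock(void)
{
	int rc = pthread_mutex_lock(&eeh_mutex);

	assert(rc == 0);
	(void)rc;
}

static inline void eeh_unlock(void)
{
	int rc = pthread_mutex_unlock(&eeh_mutex);

	assert(rc == 0);
	(void)rc;
}

/* Hypothetical shared state standing in for the PE tree. */
static int pe_count;

/* Every reader or writer of the shared state brackets its access. */
static int pe_count_inc(void)
{
	int now;

	eeh_lock();
	now = ++pe_count;
	eeh_unlock();

	return now;
}
```

Keeping the lock behind two helpers means callers never name the mutex
directly, which is also what lets the !CONFIG_EEH build supply empty stubs.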

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |   15 +++
 arch/powerpc/platforms/pseries/eeh.c |3 +++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index f77b6d7..248b3d9 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -146,6 +146,17 @@ struct eeh_ops {
 
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
+extern struct mutex eeh_mutex;
+
+static inline void eeh_lock(void)
+{
+       mutex_lock(&eeh_mutex);
+}
+
+static inline void eeh_unlock(void)
+{
+       mutex_unlock(&eeh_mutex);
+}
 
 /*
  * Max number of EEH freezes allowed before we consider the device
@@ -206,6 +217,10 @@ static inline void eeh_add_device_tree_early(struct device_node *dn) { }
 static inline void eeh_add_device_tree_late(struct pci_bus *bus) { }
 
 static inline void eeh_remove_bus_device(struct pci_dev *dev) { }
+
+static inline void eeh_lock(void) { }
+static inline void eeh_unlock(void) { }
+
 #define EEH_POSSIBLE_ERROR(val, type) (0)
 #define EEH_IO_ERROR_VALUE(size) (-1UL)
 #endif /* CONFIG_EEH */
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index e819448..0ba7e3b 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -92,6 +92,9 @@ struct eeh_ops *eeh_ops = NULL;
 int eeh_subsystem_enabled;
 EXPORT_SYMBOL(eeh_subsystem_enabled);
 
+/* Global EEH mutex */
+DEFINE_MUTEX(eeh_mutex);
+
 /* Lock to avoid races due to multiple reports of an error */
 static DEFINE_RAW_SPINLOCK(confirm_error_lock);
 
-- 
1.7.5.4



[PATCH 04/21] ppc/eeh: Introduce eeh_pe struct

2012-09-05 Thread Gavin Shan
As defined in PAPR 2.4, a Partitionable Endpoint (PE) is an I/O subtree
that can be treated as a unit for the purposes of partitioning and error
recovery. Therefore, the EEH core should be aware of PEs. With the eeh_pe
struct, we can support PEs explicitly. Furthermore, it centralizes all of
the related data. Another important reason is to let the EEH core support
multiple platforms. Some of them, like pSeries, figure out PEs through
OF nodes, while others, like powernv, have to do that through the PCI
bus/device tree. With explicit PE support, the EEH core is implemented on
top of the centralized data, and the platform-dependent implementations
populate it in whatever way is feasible for them.

The following factors were taken into account when designing the struct:
  * Reflecting the relationships between PEs. A PE might have a parent
    as well as children.
  * Reflecting the association between a PE and its (EEH) devices.
  * PEs have a PHB boundary.
  * A PE should have a unique address assigned within the corresponding
    PHB domain.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |   38 ++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 06dedff..f77b6d7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -32,6 +32,42 @@ struct device_node;
 #ifdef CONFIG_EEH
 
 /*
+ * The struct is used to trace PE related EEH functionality.
+ * In theory, there will have one instance of the struct to
+ * be created against particular PE. In nature, PEs corelate
+ * to each other. the struct has to reflect that hierarchy in
+ * order to easily pick up those affected PEs when one particular
+ * PE has EEH errors.
+ *
+ * Also, one particular PE might be composed of PCI device, PCI
+ * bus and its subordinate components. The struct also need ship
+ * the information. Further more, one particular PE is only meaingful
+ * in the corresponding PHB. Therefore, the root PEs should be created
+ * against existing PHBs in on-to-one fashion.
+ */
+#define EEH_PE_PHB     1       /* PHB PE    */
+#define EEH_PE_DEVICE  2       /* Device PE */
+#define EEH_PE_BUS     3       /* Bus PE    */
+
+#define EEH_PE_ISOLATED        (1 << 0)        /* Isolated PE   */
+#define EEH_PE_RECOVERING      (1 << 1)        /* Recovering PE */
+
+struct eeh_pe {
+   int type;   /* PE type: PHB/Bus/Device  */
+   int state;  /* PE EEH dependent mode*/
+   int config_addr;/* Traditional PCI address  */
+   int addr;   /* PE configuration address */
+   struct pci_controller *phb; /* Associated PHB   */
+   int check_count;/* Times of ignored error   */
+   int freeze_count;   /* Times of froze up*/
+   int false_positives;/* Times of reported #ff's  */
+   struct eeh_pe *parent;  /* Parent PE*/
+   struct list_head child_list;/* Link PE to the child list*/
+   struct list_head edevs; /* Link list of EEH devices */
+   struct list_head child; /* Child PEs*/
+};
+
+/*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
  * represent PE as well so that the EEH device to form
@@ -53,6 +89,8 @@ struct eeh_dev {
int freeze_count;   /* Times of froze up*/
int false_positives;/* Times of reported #ff's  */
u32 config_space[16];   /* Saved PCI config space   */
+   struct eeh_pe *pe;  /* Associated PE*/
+   struct list_head list;  /* Form link list in the PE */
struct pci_controller *phb; /* Associated PHB   */
struct device_node *dn; /* Associated device node   */
struct pci_dev *pdev;   /* Associated PCI device*/
-- 
1.7.5.4



[PATCH 06/21] ppc/eeh: Create PEs for PHBs

2012-09-05 Thread Gavin Shan
A PE is only meaningful within the domain of its ancestor PHB.
Therefore, each PHB should have its own PE hierarchy tree
to trace the PEs created against that PHB.

The patch creates PEs for the PHBs and puts them into the
global linked list headed by eeh_phb_pe. That list forms the
first level of the overall PE hierarchy tree across the
system.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |2 +
 arch/powerpc/platforms/pseries/Makefile  |5 +-
 arch/powerpc/platforms/pseries/eeh_dev.c |4 +
 arch/powerpc/platforms/pseries/eeh_pe.c  |  103 ++
 4 files changed, 112 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/eeh_pe.c

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 248b3d9..7b9c7d6 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -164,6 +164,8 @@ static inline void eeh_unlock(void)
  */
 #define EEH_MAX_ALLOWED_FREEZES 5
 
+int __devinit eeh_phb_pe_create(struct pci_controller *phb);
+
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int __init eeh_ops_register(struct eeh_ops *ops);
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index c222189..890622b 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,8 +6,9 @@ obj-y   := lpar.o hvCall.o nvram.o reconfig.o \
   firmware.o power.o dlpar.o mobility.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_SCANLOG)  += scanlog.o
-obj-$(CONFIG_EEH)  += eeh.o eeh_dev.o eeh_cache.o eeh_driver.o \
-  eeh_event.o eeh_sysfs.o eeh_pseries.o
+obj-$(CONFIG_EEH)  += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
+  eeh_driver.o eeh_event.o eeh_sysfs.o \
+  eeh_pseries.o
 obj-$(CONFIG_KEXEC)+= kexec.o
 obj-$(CONFIG_PCI)  += pci.o pci_dlpar.o
 obj-$(CONFIG_PSERIES_MSI)  += msi.o
diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
index a0cee3a..6644234 100644
--- a/arch/powerpc/platforms/pseries/eeh_dev.c
+++ b/arch/powerpc/platforms/pseries/eeh_dev.c
@@ -65,6 +65,7 @@ void * __devinit eeh_dev_init(struct device_node *dn, void *data)
        PCI_DN(dn)->edev = edev;
        edev->dn  = dn;
        edev->phb = phb;
+       INIT_LIST_HEAD(&edev->list);
 
        return NULL;
 }
@@ -80,6 +81,9 @@ void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
        struct device_node *dn = phb->dn;
 
+   /* EEH PE for PHB */
+   eeh_phb_pe_create(phb);
+
/* EEH device for PHB */
eeh_dev_init(dn, phb);
 
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
new file mode 100644
index 0000000..20d65dc
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -0,0 +1,103 @@
+/*
+ * The file intends to implement PE based on the information from
+ * platforms. Basically, there have 3 types of PEs: PHB/Bus/Device.
+ * All the PEs should be organized as hierarchy tree. The first level
+ * of the tree will be associated to existing PHBs since the particular
+ * PE is only meaningful in one PHB domain.
+ *
+ * Copyright Benjamin Herrenschmidt & Gavin Shan, IBM Corporation 2012.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ */
+
+#include <linux/export.h>
+#include <linux/gfp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/string.h>
+
+#include <asm/pci-bridge.h>
+#include <asm/ppc-pci.h>
+
+static LIST_HEAD(eeh_phb_pe);
+
+/**
+ * eeh_phb_pe_create - Create PHB PE 
+ * @phb: PCI controller
+ *
+ * The function should be called while the PHB is detected during
+ * system boot or PCI hotplug in order to create PHB PE.
+ */
+int __devinit eeh_phb_pe_create(struct pci_controller *phb)
+{
+   struct eeh_pe *pe;
+
+   /* Allocate PHB PE */
+       pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
+       if (!pe) {
+               pr_err("%s: out of memory!\n", __func__);
+               return -ENOMEM;
+       }
+
+   /* Initialize PHB PE */
+  

[PATCH 07/21] ppc/eeh: Search PE based on requirement

2012-09-05 Thread Gavin Shan
The patch implements PE searches for the following requirements:

 * Search for a PE by address. The address is either the traditional
   PE address, composed of the PCI bus/device/function number, or
   the unified PE address assigned by firmware or the platform.
 * Search for the parent PE of a given EEH device. This is useful
   when creating a new PE and putting it into the right position.
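
The address search can be pictured with the userspace sketch below. It is
only a model: it uses plain first-child/next-sibling pointers instead of
list_head, and struct pe, pe_add_child(), pe_next() and pe_find() are
hypothetical names rather than the functions added by this patch. The walk
is an iterative pre-order traversal that never leaves the subtree of the
root (PHB) PE and never matches the root itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct eeh_pe; field names are illustrative. */
struct pe {
	int addr;                 /* unified PE address, 0 if unknown */
	int config_addr;          /* BDF-style address, 0 if unknown  */
	struct pe *parent;
	struct pe *first_child;
	struct pe *next_sibling;
};

/* Append child at the tail of parent's child list. */
static void pe_add_child(struct pe *parent, struct pe *child)
{
	struct pe **link;

	assert(parent && child);
	link = &parent->first_child;
	while (*link)
		link = &(*link)->next_sibling;
	*link = child;
	child->parent = parent;
}

/* Pre-order successor of pe within the subtree rooted at root. */
static struct pe *pe_next(struct pe *pe, struct pe *root)
{
	if (pe->first_child)
		return pe->first_child;
	while (pe != root) {
		if (pe->next_sibling)
			return pe->next_sibling;
		pe = pe->parent;
	}
	return NULL;
}

/* For each PE below root, try the unified address, then the BDF address. */
static struct pe *pe_find(struct pe *root, int pe_addr, int config_addr)
{
	struct pe *pe;

	for (pe = pe_next(root, root); pe; pe = pe_next(pe, root)) {
		if (pe_addr && pe_addr == pe->addr)
			return pe;
		if (config_addr && config_addr == pe->config_addr)
			return pe;
	}
	return NULL;
}
```

An address of 0 is treated as "not set" in both comparisons, so a device
without a unified PE address can still be matched by its BDF address.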

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |1 +
 arch/powerpc/platforms/pseries/eeh_pe.c |  146 +++
 2 files changed, 147 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 7b9c7d6..1cc1388 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -164,6 +164,7 @@ static inline void eeh_unlock(void)
  */
 #define EEH_MAX_ALLOWED_FREEZES 5
 
+typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int __devinit eeh_phb_pe_create(struct pci_controller *phb);
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 20d65dc..f019953 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -101,3 +101,149 @@ static struct eeh_pe *eeh_phb_pe_get(struct pci_controller *phb)
return NULL;
 }
 
+/**
+ * eeh_pe_next - Retrieve the next PE in the tree
+ * @pe: current PE
+ * @root: root PE
+ *
+ * The function is used to retrieve the next PE in the
+ * hierarchy PE tree.
+ */
+static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe,
+ struct eeh_pe *root)
+{
+       struct list_head *next = pe->child_list.next;
+
+       if (next == &pe->child_list) {
+               while (1) {
+                       if (pe == root)
+                               return NULL;
+                       next = pe->child.next;
+                       if (next != &pe->parent->child_list)
+                               break;
+                       pe = pe->parent;
+               }
+       }
+
+       return list_entry(next, struct eeh_pe, child);
+}
+
+/**
+ * eeh_pe_traverse - Traverse PEs in the specified PHB
+ * @root: root PE
+ * @fn: callback
+ * @flag: extra parameter to callback
+ *
+ * The function is used to traverse the specified PE and its
+ * child PEs. The traversing is to be terminated once the
+ * callback returns something other than NULL, or no more PEs
+ * to be traversed.
+ */
+static void *eeh_pe_traverse(struct eeh_pe *root,
+   eeh_traverse_func fn, void *flag)
+{
+   struct eeh_pe *pe;
+   void *ret;
+
+   for (pe = root; pe; pe = eeh_pe_next(pe, root)) {
+   ret = fn(pe, flag);
+   if (ret) return ret;
+   }
+
+   return NULL;
+}
+
+/**
+ * __eeh_pe_get - Check the PE address
+ * @data: EEH PE
+ * @flag: EEH device
+ *
+ * For one particular PE, it can be identified by PE address
+ * or tranditional BDF address. BDF address is composed of
+ * Bus/Device/Function number. The extra data referred by flag
+ * indicates which type of address should be used.
+ */
+static void *__eeh_pe_get(void *data, void *flag)
+{
+       struct eeh_pe *pe = (struct eeh_pe *)data;
+       struct eeh_dev *edev = (struct eeh_dev *)flag;
+
+       /* Unexpected PHB PE */
+       if (pe->type == EEH_PE_PHB)
+               return NULL;
+
+       /* We prefer PE address */
+       if (edev->pe_config_addr &&
+          (edev->pe_config_addr == pe->addr))
+               return pe;
+
+       /* Try BDF address */
+       if (edev->config_addr &&
+          (edev->config_addr == pe->config_addr))
+               return pe;
+
+   return NULL;
+}
+
+/**
+ * eeh_pe_get - Search PE based on the given address
+ * @edev: EEH device
+ *
+ * Search the corresponding PE based on the specified address which
+ * is included in the eeh device. The function is used to check if
+ * the associated PE has been created against the PE address. It's
+ * notable that the PE address has 2 format: traditional PE address
+ * which is composed of PCI bus/device/function number, or unified
+ * PE address.
+ */
+static struct eeh_pe *eeh_pe_get(struct eeh_dev *edev)
+{
+       struct eeh_pe *root = eeh_phb_pe_get(edev->phb);
+   struct eeh_pe *pe;
+
+   eeh_lock();
+   pe = eeh_pe_traverse(root, __eeh_pe_get, edev);
+   eeh_unlock();
+
+   return pe;
+}
+
+/**
+ * eeh_pe_get_parent - Retrieve the parent PE
+ * @edev: EEH device
+ *
+ * The whole PEs existing in the system are organized as hierarchy
+ * tree. The function is used to retrieve the parent PE according
+ * to the parent EEH device.
+ */
+static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
+{
+   struct device_node *dn;
+   struct eeh_dev *parent;
+
+   /*
+* It might have the case for the indirect parent
+* EEH device already having associated PE, 

[PATCH 08/21] ppc/eeh: create PEs during EEH initialization

2012-09-05 Thread Gavin Shan
The patch creates PEs and associates each newly created PE with
its parent and siblings as well as with its EEH devices. This makes
it more straightforward to trace EEH errors and recover from them.

Once EEH functionality has been enabled on a PCI IOA, we try
to create a PE for it. If there is an existing PE to which
the current PCI IOA should be attached, that PE is converted
from device type to bus type accordingly.
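
The create-or-attach decision can be sketched in userspace as follows. This
is a simplified model that keeps PEs in a flat singly linked list keyed by
one address, not the tree used by the patch. struct pe, pe_lookup() and
pe_attach_device() are hypothetical names, and only the two type values are
taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Same numeric values as EEH_PE_DEVICE / EEH_PE_BUS in the patch. */
enum pe_type { PE_DEVICE = 2, PE_BUS = 3 };

struct pe {
	enum pe_type type;
	int addr;        /* address the PE was created for  */
	int ndevs;       /* number of devices attached here */
	struct pe *next;
};

/* Flat registry; the real code walks the per-PHB PE tree instead. */
static struct pe *pe_list;

static struct pe *pe_lookup(int addr)
{
	struct pe *pe;

	for (pe = pe_list; pe; pe = pe->next)
		if (pe->addr == addr)
			return pe;
	return NULL;
}

/* Returns the PE the device ended up in, or NULL if allocation failed. */
static struct pe *pe_attach_device(int addr)
{
	struct pe *pe = pe_lookup(addr);

	if (pe) {
		/* A second device shares the PE: it now covers a bus. */
		pe->type = PE_BUS;
		pe->ndevs++;
		return pe;
	}

	pe = calloc(1, sizeof(*pe));
	if (!pe)
		return NULL;
	pe->type = PE_DEVICE;
	pe->addr = addr;
	pe->ndevs = 1;
	pe->next = pe_list;
	pe_list = pe;

	return pe;
}
```

The first device seen at an address gets a device-type PE; any later device
at the same address reuses that PE and flips it to bus type.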

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |1 +
 arch/powerpc/platforms/pseries/eeh.c|6 ++
 arch/powerpc/platforms/pseries/eeh_pe.c |   89 +++
 3 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 1cc1388..e41107d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -166,6 +166,7 @@ static inline void eeh_unlock(void)
 
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int __devinit eeh_phb_pe_create(struct pci_controller *phb);
+int eeh_pe_create(struct eeh_dev *edev);
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 0ba7e3b..99937da 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -895,6 +895,8 @@ static void *eeh_early_enable(struct device_node *dn, void *data)
                        eeh_subsystem_enabled = 1;
                        edev->mode |= EEH_MODE_SUPPORTED;
 
+                       eeh_pe_create(edev);
+
                        pr_debug("EEH: %s: eeh enabled, config=%x pe_config=%x\n",
                                 dn->full_name, edev->config_addr,
                                 edev->pe_config_addr);
@@ -908,6 +910,10 @@ static void *eeh_early_enable(struct device_node *dn, void *data)
                        /* Parent supports EEH. */
                        edev->mode |= EEH_MODE_SUPPORTED;
                        edev->config_addr = of_node_to_eeh_dev(dn->parent)->config_addr;
+                       edev->pe_config_addr = of_node_to_eeh_dev(dn->parent)->pe_config_addr;
+
+                       eeh_pe_create(edev);
+
                        return NULL;
                }
        }
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index f019953..56aab91 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -247,3 +247,92 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
return NULL;
 }
 
+/**
+ * eeh_pe_create - Create EEH PE according to EEH device
+ * @edev: EEH device
+ *
+ * Create EEH PE according to the specified EEH device and
+ * put the EEH PE into appropriate place in the PE hierarchy
+ * tree.
+ */
+int eeh_pe_create(struct eeh_dev *edev)
+{
+   struct eeh_pe *pe, *parent;
+
+   /*
+* Search the PE has been existing or not according
+* to the PE address. If that has been existing, the
+* PE should be composed of PCI bus and its subordinate
+* components.
+*/
+       pe = eeh_pe_get(edev);
+       if (pe) {
+               if (!edev->pe_config_addr) {
+                       pr_err("%s: PE with addr 0x%x already exists\n",
+                               __func__, edev->config_addr);
+                       return -EEXIST;
+               }
+
+               /* Mark the PE as type of PCI bus */
+               pe->type = EEH_PE_BUS;
+               edev->pe = pe;
+
+               /* Put the edev to PE */
+               list_add_tail(&edev->list, &pe->edevs);
+
+               pr_info("EEH: Add %s to Bus PE#%x\n",
+                       edev->dn->full_name, pe->addr);
+
+               return 0;
+       }
+
+       /* Create a new EEH PE */
+       pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
+       if (!pe) {
+               pr_err("%s: out of memory!\n", __func__);
+               return -ENOMEM;
+       }
+       pe->addr        = edev->pe_config_addr;
+       pe->config_addr = edev->config_addr;
+       pe->type        = EEH_PE_DEVICE;
+       pe->phb         = edev->phb;
+       INIT_LIST_HEAD(&pe->child_list);
+       INIT_LIST_HEAD(&pe->child);
+       INIT_LIST_HEAD(&pe->edevs);
+
+   /*
+* Put the new EEH PE into hierarchy tree. If the parent
+* can't be found, the newly created PE will be attached
+* to PHB directly. Otherwise, we have to associate the
+* PE with its parent.
+*/
+       parent = eeh_pe_get_parent(edev);
+       if (!parent) {
+               parent = eeh_phb_pe_get(edev->phb);
+               if (!parent) {
+                       pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
+   __func__, 

[PATCH 09/21] ppc/eeh: remove PE at appropriate time

2012-09-05 Thread Gavin Shan
During PCI hotplug and EEH recovery, the PE hierarchy might change
because the PCI topology changes. Later, when the PCI device is
added back, its PE is created dynamically again.

The patch introduces a new function that removes an EEH device from
its associated PE. Doing so can also cause the parent PE to be removed
from the PE tree, if that parent no longer contains any valid EEH
devices or child PEs.
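
A minimal userspace sketch of the pruning idea follows. It assumes that the
walk stops at the PHB PE, which this description does not state, and it
marks PEs as removed instead of freeing them so the result is easy to
inspect. struct pe and pe_remove_device() are hypothetical names, and
counters stand in for the real device and child lists.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified PE: counters replace the real device and child lists. */
struct pe {
	struct pe *parent;
	int ndevs;      /* EEH devices attached directly to this PE */
	int nchildren;  /* child PEs                                */
	int is_phb;     /* PHB PEs are never pruned (assumption)    */
	int removed;    /* set instead of freeing the PE            */
};

/* Detach one device from pe, then prune PEs that became empty. */
static void pe_remove_device(struct pe *pe)
{
	assert(pe != NULL && pe->ndevs > 0);
	pe->ndevs--;

	/* Walk towards the root while the current PE holds nothing. */
	while (pe && !pe->is_phb && pe->ndevs == 0 && pe->nchildren == 0) {
		struct pe *parent = pe->parent;

		pe->removed = 1;
		if (parent)
			parent->nchildren--;
		pe = parent;
	}
}
```

Removing the last device below a bus PE therefore unwinds every empty
ancestor, while a parent that still has devices or other children stays.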

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |1 +
 arch/powerpc/platforms/pseries/eeh.c|1 +
 arch/powerpc/platforms/pseries/eeh_pe.c |   46 +++
 3 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e41107d..fd69584 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -167,6 +167,7 @@ static inline void eeh_unlock(void)
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int __devinit eeh_phb_pe_create(struct pci_controller *phb);
 int eeh_pe_create(struct eeh_dev *edev);
+int eeh_pe_remove(struct eeh_dev *edev);
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 99937da..82a5fdc 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -1156,6 +1156,7 @@ static void eeh_remove_device(struct pci_dev *dev)
dev->dev.archdata.edev = NULL;
pci_dev_put(dev);
 
+   eeh_pe_remove(edev);
pci_addr_cache_remove_device(dev);
eeh_sysfs_remove_device(dev);
 }
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 56aab91..ed675b8 100644
--- a/arch/powerpc/platforms/pseries/eeh_pe.c
+++ b/arch/powerpc/platforms/pseries/eeh_pe.c
@@ -336,3 +336,49 @@ int eeh_pe_create(struct eeh_dev *edev)
 return 0;
 }
 
+/**
+ * eeh_pe_remove - Remove one EEH device from the associated PE
+ * @edev: EEH device
+ *
+ * The PE hierarchy tree might be changed when doing PCI hotplug.
+ * Also, the PCI devices or buses could be removed from the system
+ * during EEH recovery.
+ */
+int eeh_pe_remove(struct eeh_dev *edev)
+{
+   struct eeh_pe *pe, *parent;
+
+   if (!edev->pe) {
+   pr_err("%s: No PE found for EEH device %s\n",
+   __func__, edev->dn->full_name);
+   return -EEXIST;
+   }
+
+   /* Remove the EEH device */
+   pe = edev->pe;
+   edev->pe = NULL;
+   list_del(&edev->list);
+
+   /*
+* Check if the parent PE includes any EEH devices.
+* If not, we should delete that. Also, we should
+* delete the parent PE if it doesn't have associated
+* child PEs and EEH devices.
+*/
+   while (1) {
+   parent = pe->parent;
+   if (pe->type & EEH_PE_PHB)
+   break;
+
+   if (list_empty(&pe->edevs) &&
+   list_empty(&pe->child_list)) {
+   list_del(&pe->child);
+   kfree(pe);
+   }
+
+   pe = parent;
+   }
+
+   return 0;
+}
+
-- 
1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
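
The pruning walk in eeh_pe_remove() can be sketched in user space as
follows. This is only an illustration under stated assumptions:
struct pe, pe_new() and pe_remove_dev() are hypothetical stand-ins,
plain counters replace the kernel's list_head members, and, unlike the
loop in the patch, the walk stops at the first PE that is still in use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct eeh_pe (not the kernel type). */
enum pe_type { PE_PHB, PE_BUS, PE_DEVICE };

struct pe {
    enum pe_type type;
    struct pe *parent;
    int ndevs;      /* devices attached directly to this PE */
    int nchildren;  /* child PEs hanging off this PE */
};

/* Allocate a zeroed PE and hook it under its parent, if any. */
static struct pe *pe_new(enum pe_type type, struct pe *parent)
{
    struct pe *pe = calloc(1, sizeof(*pe));

    if (!pe)
        return NULL;
    pe->type = type;
    pe->parent = parent;
    if (parent)
        parent->nchildren++;
    return pe;
}

/*
 * Detach one device from @pe, then free every PE on the way up that
 * has neither devices nor children left. The PHB PE is never freed.
 * Returns the number of PEs that were freed.
 */
static int pe_remove_dev(struct pe *pe)
{
    int freed = 0;

    assert(pe && pe->ndevs > 0);
    pe->ndevs--;

    while (pe && pe->type != PE_PHB) {
        struct pe *parent = pe->parent; /* read before pe may be freed */

        if (pe->ndevs > 0 || pe->nchildren > 0)
            break;                      /* still in use, so are its ancestors */

        if (parent)
            parent->nchildren--;
        free(pe);
        freed++;
        pe = parent;
    }
    return freed;
}
```

Reading the parent pointer before the free mirrors what the patch does
with "parent = pe->parent" at the top of its loop.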


[PATCH 10/21] ppc/eeh: build EEH event based on PE

2012-09-05 Thread Gavin Shan
The original implementation builds EEH events based on the EEH device.
Since we already have a dedicated struct to describe a PE, it's
reasonable to build EEH events based on the PE instead.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh_event.h   |4 +-
 arch/powerpc/platforms/pseries/eeh_event.c |   29 +++
 2 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h
index c68b012..dc722b5 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -28,10 +28,10 @@
  */
 struct eeh_event {
struct list_head list;  /* to form event queue  */
-   struct eeh_dev  *edev;  /* EEH device   */
+   struct eeh_pe   *pe;    /* EEH PE   */
 };
 
-int eeh_send_failure_event(struct eeh_dev *edev);
+int eeh_send_failure_event(struct eeh_pe *pe);
 struct eeh_dev *handle_eeh_events(struct eeh_event *);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 6132772..7f89f1e 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -119,36 +119,23 @@ static void eeh_thread_launcher(struct work_struct *dummy)
 
 /**
  * eeh_send_failure_event - Generate a PCI error event
- * @edev: EEH device
+ * @pe: EEH PE
  *
  * This routine can be called within an interrupt context;
  * the actual event will be delivered in a normal context
  * (from a workqueue).
  */
-int eeh_send_failure_event(struct eeh_dev *edev)
+int eeh_send_failure_event(struct eeh_pe *pe)
 {
unsigned long flags;
struct eeh_event *event;
-   struct device_node *dn = eeh_dev_to_of_node(edev);
-   const char *location;
-
-   if (!mem_init_done) {
-   printk(KERN_ERR "EEH: event during early boot not handled\n");
-   location = of_get_property(dn, "ibm,loc-code", NULL);
-   printk(KERN_ERR "EEH: device node = %s\n", dn->full_name);
-   printk(KERN_ERR "EEH: PCI location = %s\n", location);
-   return 1;
-   }
-   event = kzalloc(sizeof(*event), GFP_ATOMIC);
-   if (event == NULL) {
-   printk(KERN_ERR "EEH: out of memory, event not handled\n");
-   return 1;
-   }
-
-   if (edev->pdev)
-   pci_dev_get(edev->pdev);
 
-   event->edev = edev;
+   event = kzalloc(sizeof(*event), GFP_ATOMIC);
+   if (!event) {
+   pr_err("EEH: out of memory, event not handled\n");
+   return -ENOMEM;
+   }
+   event->pe = pe;
 
/* We may or may not be called in an interrupt context */
spin_lock_irqsave(&eeh_eventlist_lock, flags);
-- 
1.7.5.4



[PATCH 11/21] ppc/eeh: trace EEH state based on PE

2012-09-05 Thread Gavin Shan
Since we've introduced a dedicated struct to trace individual PEs,
it's reasonable to trace PE state through that struct instead of
through eeh_dev.

The patch implements state tracing based on PE. Note that a PE
state is applied to the specified PE as well as to its child PEs.
That follows the rule that a problematic parent PE prevents its
child PEs from working properly.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |3 +
 arch/powerpc/include/asm/ppc-pci.h  |4 +-
 arch/powerpc/platforms/pseries/eeh.c|  102 ---
 arch/powerpc/platforms/pseries/eeh_pe.c |   79 
 4 files changed, 84 insertions(+), 104 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fd69584..493dc7c 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -67,6 +67,9 @@ struct eeh_pe {
struct list_head child; /* Child PEs*/
 };
 
+#define eeh_pe_for_each_dev(pe, edev) \
+   list_for_each_entry(edev, &pe->edevs, list)
+
 /*
  * The struct is used to trace EEH state for the associated
  * PCI device node or PCI device. In future, it might
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 80fa704..c7e5bd6 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -57,8 +57,8 @@ int eeh_reset_pe(struct eeh_dev *);
 void eeh_restore_bars(struct eeh_dev *);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
-void eeh_mark_slot(struct device_node *dn, int mode_flag);
-void eeh_clear_slot(struct device_node *dn, int mode_flag);
+void eeh_pe_state_mark(struct eeh_pe *pe, int state);
+void eeh_pe_state_clear(struct eeh_pe *pe, int state);
 struct device_node *eeh_find_device_pe(struct device_node *dn);
 
 void eeh_sysfs_add_device(struct pci_dev *pdev);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 82a5fdc..c527c46 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -279,108 +279,6 @@ struct device_node *eeh_find_device_pe(struct device_node *dn)
 }
 
 /**
- * __eeh_mark_slot - Mark all child devices as failed
- * @parent: parent device
- * @mode_flag: failure flag
- *
- * Mark all devices that are children of this device as failed.
- * Mark the device driver too, so that it can see the failure
- * immediately; this is critical, since some drivers poll
- * status registers in interrupts ... If a driver is polling,
- * and the slot is frozen, then the driver can deadlock in
- * an interrupt context, which is bad.
- */
-static void __eeh_mark_slot(struct device_node *parent, int mode_flag)
-{
-   struct device_node *dn;
-
-   for_each_child_of_node(parent, dn) {
-   if (of_node_to_eeh_dev(dn)) {
-   /* Mark the pci device driver too */
-   struct pci_dev *dev = of_node_to_eeh_dev(dn)->pdev;
-
-   of_node_to_eeh_dev(dn)->mode |= mode_flag;
-
-   if (dev && dev->driver)
-   dev->error_state = pci_channel_io_frozen;
-
-   __eeh_mark_slot(dn, mode_flag);
-   }
-   }
-}
-
-/**
- * eeh_mark_slot - Mark the indicated device and its children as failed
- * @dn: parent device
- * @mode_flag: failure flag
- *
- * Mark the indicated device and its child devices as failed.
- * The device drivers are marked as failed as well.
- */
-void eeh_mark_slot(struct device_node *dn, int mode_flag)
-{
-   struct pci_dev *dev;
-   dn = eeh_find_device_pe(dn);
-
-   /* Back up one, since config addrs might be shared */
-   if (!pcibios_find_pci_bus(dn) && of_node_to_eeh_dev(dn->parent))
-   dn = dn->parent;
-
-   of_node_to_eeh_dev(dn)->mode |= mode_flag;
-
-   /* Mark the pci device too */
-   dev = of_node_to_eeh_dev(dn)->pdev;
-   if (dev)
-   dev->error_state = pci_channel_io_frozen;
-
-   __eeh_mark_slot(dn, mode_flag);
-}
-
-/**
- * __eeh_clear_slot - Clear failure flag for the child devices
- * @parent: parent device
- * @mode_flag: flag to be cleared
- *
- * Clear failure flag for the child devices.
- */
-static void __eeh_clear_slot(struct device_node *parent, int mode_flag)
-{
-   struct device_node *dn;
-
-   for_each_child_of_node(parent, dn) {
-   if (of_node_to_eeh_dev(dn)) {
-   of_node_to_eeh_dev(dn)->mode &= ~mode_flag;
-   of_node_to_eeh_dev(dn)->check_count = 0;
-   __eeh_clear_slot(dn, mode_flag);
-   }
-   }
-}
-
-/**
- * eeh_clear_slot - Clear failure flag for the indicated device and its children
- * @dn: parent device
- 
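
The rule from the commit message, that a state applies to the given PE
and to all of its child PEs, can be sketched as a recursive walk. This
is a minimal user-space sketch, not the eeh_pe_state_mark() body (which
is cut off above): struct pe, the PE_STATE_* values and the
first_child/next_sibling links are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state bits; the real flag values are not shown in the patch. */
#define PE_STATE_ISOLATED   (1 << 0)
#define PE_STATE_RECOVERING (1 << 1)

/* Hypothetical PE node using first-child/next-sibling links. */
struct pe {
    int state;
    struct pe *first_child;
    struct pe *next_sibling;
};

/* Set @state on @pe and on every PE below it. */
static void pe_state_mark(struct pe *pe, int state)
{
    struct pe *child;

    assert(pe != NULL);
    pe->state |= state;
    for (child = pe->first_child; child; child = child->next_sibling)
        pe_state_mark(child, state);
}

/* Clear @state on @pe and on every PE below it. */
static void pe_state_clear(struct pe *pe, int state)
{
    struct pe *child;

    assert(pe != NULL);
    pe->state &= ~state;
    for (child = pe->first_child; child; child = child->next_sibling)
        pe_state_clear(child, state);
}
```

Marking a PE in the middle of the tree leaves its parent and siblings
untouched, which matches the idea that only the subtree below a
problematic PE is affected.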

[PATCH 02/21] ppc/eeh: use slab to allocate eeh devices

2012-09-05 Thread Gavin Shan
The EEH initialization functions have been postponed until slab/slub
is ready, so we use slab/slub to allocate the memory chunks for
newly created EEH devices. That would save lots of memory.

The patch also replaces kmalloc with kzalloc as a cleanup so that
we needn't clear the allocated memory chunk explicitly.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/eeh_cache.c |2 +-
 arch/powerpc/platforms/pseries/eeh_dev.c   |2 +-
 arch/powerpc/platforms/pseries/eeh_event.c |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_cache.c b/arch/powerpc/platforms/pseries/eeh_cache.c
index e5ae1c6..f50b717 100644
--- a/arch/powerpc/platforms/pseries/eeh_cache.c
+++ b/arch/powerpc/platforms/pseries/eeh_cache.c
@@ -151,7 +151,7 @@ pci_addr_cache_insert(struct pci_dev *dev, unsigned long alo,
return piar;
}
}
-   piar = kmalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
+   piar = kzalloc(sizeof(struct pci_io_addr_range), GFP_ATOMIC);
if (!piar)
return NULL;
 
diff --git a/arch/powerpc/platforms/pseries/eeh_dev.c b/arch/powerpc/platforms/pseries/eeh_dev.c
index ab68c59..8e3443b 100644
--- a/arch/powerpc/platforms/pseries/eeh_dev.c
+++ b/arch/powerpc/platforms/pseries/eeh_dev.c
@@ -55,7 +55,7 @@ void * __devinit eeh_dev_init(struct device_node *dn, void *data)
struct eeh_dev *edev;
 
/* Allocate EEH device */
-   edev = zalloc_maybe_bootmem(sizeof(*edev), GFP_KERNEL);
+   edev = kzalloc(sizeof(*edev), GFP_KERNEL);
if (!edev) {
pr_warning("%s: out of memory\n", __func__);
return NULL;
diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index fb50631..6132772 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -139,7 +139,7 @@ int eeh_send_failure_event(struct eeh_dev *edev)
printk(KERN_ERR "EEH: PCI location = %s\n", location);
return 1;
}
-   event = kmalloc(sizeof(*event), GFP_ATOMIC);
+   event = kzalloc(sizeof(*event), GFP_ATOMIC);
if (event == NULL) {
printk(KERN_ERR "EEH: out of memory, event not handled\n");
return 1;
-- 
1.7.5.4
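
The kmalloc-to-kzalloc cleanup has a direct user-space analogue, shown
here as a rough sketch: malloc() followed by memset() against a single
calloc(). struct io_range and the three helper functions are
hypothetical names for illustration only; they are not the kernel's
pci_io_addr_range code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; stands in for any struct that must start zeroed. */
struct io_range {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

/* kmalloc()-style: the caller has to remember to clear the memory. */
static struct io_range *range_alloc_manual(void)
{
    struct io_range *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    memset(r, 0, sizeof(*r));
    return r;
}

/* kzalloc()-style: allocation and clearing happen in one call. */
static struct io_range *range_alloc_zeroed(void)
{
    struct io_range *r = calloc(1, sizeof(*r));

    assert(!r || r->flags == 0);    /* calloc() hands back zeroed memory */
    return r;
}

/* Returns 1 when both allocators produce byte-identical, all-zero objects. */
static int range_allocators_agree(void)
{
    struct io_range *a = range_alloc_manual();
    struct io_range *b = range_alloc_zeroed();
    int same = a && b && memcmp(a, b, sizeof(*a)) == 0;

    free(a);
    free(b);
    return same;
}
```

The zeroing variant removes one place where a forgotten memset() could
leave stale data in a freshly allocated object.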



[PATCH 13/21] ppc/eeh: eeh options based on PE

2012-09-05 Thread Gavin Shan
Originally, all the EEH operations were implemented based on the OF
node. That breaks the rule that the target of an operation is a PE
rather than a device. Therefore, the patch makes all the operations
work on PE instead of device.

The backend for config space has to be kept as it is, because it
doesn't actually depend on PE.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |   14 ++--
 arch/powerpc/platforms/pseries/eeh.c |   13 ++-
 arch/powerpc/platforms/pseries/eeh_pseries.c |  133 +++---
 3 files changed, 74 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 493dc7c..96451b7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -136,13 +136,13 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct eeh_dev *edev)
 struct eeh_ops {
char *name;
int (*init)(void);
-   int (*set_option)(struct device_node *dn, int option);
-   int (*get_pe_addr)(struct device_node *dn);
-   int (*get_state)(struct device_node *dn, int *state);
-   int (*reset)(struct device_node *dn, int option);
-   int (*wait_state)(struct device_node *dn, int max_wait);
-   int (*get_log)(struct device_node *dn, int severity, char *drv_log, unsigned long len);
-   int (*configure_bridge)(struct device_node *dn);
+   int (*set_option)(struct eeh_pe *pe, int option);
+   int (*get_pe_addr)(struct eeh_pe *pe);
+   int (*get_state)(struct eeh_pe *pe, int *state);
+   int (*reset)(struct eeh_pe *pe, int option);
+   int (*wait_state)(struct eeh_pe *pe, int max_wait);
+   int (*get_log)(struct eeh_pe *pe, int severity, char *drv_log, unsigned long len);
+   int (*configure_bridge)(struct eeh_pe *pe);
int (*read_config)(struct device_node *dn, int where, int size, u32 *val);
int (*write_config)(struct device_node *dn, int where, int size, u32 val);
 };
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 341ba1a..636413f 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -729,6 +729,7 @@ static void *eeh_early_enable(struct device_node *dn, void *data)
const u32 *regs;
int enable;
struct eeh_dev *edev = of_node_to_eeh_dev(dn);
+   struct eeh_pe pe;
 
edev->class_code = 0;
edev->mode = 0;
@@ -755,9 +756,14 @@ static void *eeh_early_enable(struct device_node *dn, void *data)
 */
regs = of_get_property(dn, "reg", NULL);
if (regs) {
+   /* Initialize the fake PE */
+   memset(&pe, 0, sizeof(struct eeh_pe));
+   pe.phb = edev->phb;
+   pe.config_addr = regs[0];
+
/* First register entry is addr (00BBSS00)  */
/* Try to enable eeh */
-   ret = eeh_ops->set_option(dn, EEH_OPT_ENABLE);
+   ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
 
enable = 0;
if (ret == 0) {
@@ -766,14 +772,15 @@ static void *eeh_early_enable(struct device_node *dn, void *data)
/* If the newer, better, ibm,get-config-addr-info is supported,
 * then use that instead.
 */
-   edev->pe_config_addr = eeh_ops->get_pe_addr(dn);
+   edev->pe_config_addr = eeh_ops->get_pe_addr(&pe);
+   pe.addr = edev->pe_config_addr;
 
/* Some older systems (Power4) allow the
 * ibm,set-eeh-option call to succeed even on nodes
 * where EEH is not supported. Verify support
 * explicitly.
 */
-   ret = eeh_ops->get_state(dn, NULL);
+   ret = eeh_ops->get_state(&pe, NULL);
if (ret > 0 && ret != EEH_STATE_NOT_SUPPORT)
enable = 1;
}
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 46616c8..90a4f20 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -134,22 +134,18 @@ static int pseries_eeh_init(void)
 
 /**
  * pseries_eeh_set_option - Initialize EEH or MMIO/DMA reenable
- * @dn: device node
+ * @pe: EEH PE
  * @option: operation to be issued
  *
  * The function is used to control the EEH functionality globally.
  * Currently, following options are support according to PAPR:
  * Enable EEH, Disable EEH, Enable MMIO and Enable DMA
  */
-static int pseries_eeh_set_option(struct device_node *dn, int option)
+static int pseries_eeh_set_option(struct eeh_pe *pe, int option)
 {
int ret = 0;
-   struct eeh_dev *edev;
-   const u32 *reg;
int 
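
The shape of this change, an operations table whose callbacks take a PE
and an early probe that fills in a temporary PE on the stack, can be
sketched in user space as below. Everything here is an assumption for
illustration: struct pe, struct pe_ops, OPT_ENABLE and the demo_*
backends are hypothetical and do not reproduce the pseries RTAS calls.

```c
#include <assert.h>
#include <string.h>

#define OPT_ENABLE 1    /* hypothetical option code */

/* Hypothetical, trimmed-down PE: just the fields the probe needs. */
struct pe {
    int phb_id;
    int config_addr;
    int addr;
};

/* Operations table keyed on the PE rather than on a device node. */
struct pe_ops {
    const char *name;
    int (*set_option)(struct pe *pe, int option);
    int (*get_pe_addr)(struct pe *pe);
};

/* Demo backend: enabling succeeds only for a PE with a config address. */
static int demo_set_option(struct pe *pe, int option)
{
    return (option == OPT_ENABLE && pe->config_addr) ? 0 : -1;
}

/* Demo backend: derive a PE address with an arbitrary mapping. */
static int demo_get_pe_addr(struct pe *pe)
{
    return pe->config_addr >> 8;
}

static const struct pe_ops demo_ops = {
    .name        = "demo",
    .set_option  = demo_set_option,
    .get_pe_addr = demo_get_pe_addr,
};

/*
 * Probe one device before any real PE exists: build a temporary PE on
 * the stack, try to enable it, then ask the backend for the PE address.
 * Returns the PE address, or -1 if enabling failed.
 */
static int early_enable(const struct pe_ops *ops, int phb_id, int config_addr)
{
    struct pe pe;

    assert(ops && ops->set_option && ops->get_pe_addr);
    memset(&pe, 0, sizeof(pe));
    pe.phb_id = phb_id;
    pe.config_addr = config_addr;

    if (ops->set_option(&pe, OPT_ENABLE) != 0)
        return -1;

    pe.addr = ops->get_pe_addr(&pe);
    return pe.addr;
}
```

The temporary object plays the role of the "fake PE" in
eeh_early_enable(): it exists only so that PE-based callbacks can be
used before the PE tree has been built.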

[PATCH 15/21] ppc/eeh: I/O enable and log retrieval based on PE

2012-09-05 Thread Gavin Shan
The patch refactors the original implementation in order to enable
I/O and do log retrieval based on PE.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/ppc-pci.h   |4 +-
 arch/powerpc/platforms/pseries/eeh.c |   44 ++---
 2 files changed, 21 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 5cbe3f2..5e34b10 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -51,8 +51,8 @@ void pci_addr_cache_build(void);
 void pci_addr_cache_insert_device(struct pci_dev *dev);
 void pci_addr_cache_remove_device(struct pci_dev *dev);
 struct pci_dev *pci_addr_cache_get_device(unsigned long addr);
-void eeh_slot_error_detail(struct eeh_dev *edev, int severity);
-int eeh_pci_enable(struct eeh_dev *edev, int function);
+void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
+int eeh_pci_enable(struct eeh_pe *pe, int function);
 int eeh_reset_pe(struct eeh_dev *);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 28d0c04..031935d 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -207,22 +207,12 @@ static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
}
}
 
-   /* Gather status on devices under the bridge */
-   if (dev->class >> 16 == PCI_BASE_CLASS_BRIDGE) {
-   struct device_node *child;
-
-   for_each_child_of_node(dn, child) {
-   if (of_node_to_eeh_dev(child))
-   n += eeh_gather_pci_data(of_node_to_eeh_dev(child), buf+n, len-n);
-   }
-   }
-
return n;
 }
 
 /**
 * eeh_slot_error_detail - Generate combined log including driver log and error log
- * @edev: device to report error log for
+ * @pe: EEH PE
 * @severity: temporary or permanent error log
 *
 * This routine should be called to generate the combined log, which
@@ -230,17 +220,22 @@ static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
  * out from the config space of the corresponding PCI device, while
  * the error log is fetched through platform dependent function call.
  */
-void eeh_slot_error_detail(struct eeh_dev *edev, int severity)
+void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 {
size_t loglen = 0;
-   pci_regs_buf[0] = 0;
+   struct eeh_dev *edev;
 
-   eeh_pci_enable(edev, EEH_OPT_THAW_MMIO);
-   eeh_ops->configure_bridge(eeh_dev_to_of_node(edev));
-   eeh_restore_bars(edev);
-   loglen = eeh_gather_pci_data(edev, pci_regs_buf, EEH_PCI_REGS_LOG_LEN);
+   eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+   eeh_ops->configure_bridge(pe);
+   eeh_pe_restore_bars(pe);
 
-   eeh_ops->get_log(eeh_dev_to_of_node(edev), severity, pci_regs_buf, loglen);
+   pci_regs_buf[0] = 0;
+   eeh_pe_for_each_dev(pe, edev) {
+   loglen += eeh_gather_pci_data(edev, pci_regs_buf,
+   EEH_PCI_REGS_LOG_LEN);
+   }
+
+   eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
 }
 
 /**
@@ -427,23 +422,22 @@ EXPORT_SYMBOL(eeh_check_failure);
 
 /**
  * eeh_pci_enable - Enable MMIO or DMA transfers for this slot
- * @edev: pci device node
+ * @pe: EEH PE
  *
  * This routine should be called to reenable frozen MMIO or DMA
  * so that it would work correctly again. It's useful while doing
  * recovery or log collection on the indicated device.
  */
-int eeh_pci_enable(struct eeh_dev *edev, int function)
+int eeh_pci_enable(struct eeh_pe *pe, int function)
 {
int rc;
-   struct device_node *dn = eeh_dev_to_of_node(edev);
 
-   rc = eeh_ops->set_option(dn, function);
+   rc = eeh_ops->set_option(pe, function);
if (rc)
-   printk(KERN_WARNING "EEH: Unexpected state change %d, err=%d dn=%s\n",
-   function, rc, dn->full_name);
+   pr_warning("%s: Unexpected state change %d on PHB#%d-PE#%x, err=%d\n",
+   __func__, function, pe->phb->global_number, pe->addr, rc);
 
-   rc = eeh_ops->wait_state(dn, PCI_BUS_RESET_WAIT_MSEC);
+   rc = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
if (rc > 0 && (rc & EEH_STATE_MMIO_ENABLED) &&
   (function == EEH_OPT_THAW_MMIO))
return 0;
-- 
1.7.5.4
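
Collecting one log from several devices of a PE comes down to appending
each device's dump to a shared buffer. The sketch below shows one way to
do that in user space, advancing the write position by the running
length so that later devices do not overwrite earlier ones. gather_one()
and gather_all() are hypothetical helpers and the "devN" text is made
up; this is not the body of eeh_gather_pci_data().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write one device's line into @buf, using at most @len bytes including
 * the terminating NUL. Returns the number of characters stored, not
 * counting the NUL.
 */
static size_t gather_one(int devno, char *buf, size_t len)
{
    int n;

    if (len == 0)
        return 0;
    n = snprintf(buf, len, "dev%d\n", devno);
    if (n < 0)
        return 0;
    /* snprintf() reports the untruncated length; clamp to what fit. */
    return (size_t)n < len ? (size_t)n : len - 1;
}

/* Append the lines of all devices back to back; returns the total length. */
static size_t gather_all(const int *devs, size_t ndevs, char *buf, size_t len)
{
    size_t i, used = 0;

    assert(buf != NULL);
    if (len)
        buf[0] = '\0';
    for (i = 0; i < ndevs; i++)
        used += gather_one(devs[i], buf + used, len - used);
    return used;
}
```

Passing both "buf + used" and "len - used" keeps the running total and
the remaining space in step, so the returned length always describes
exactly what is in the buffer.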



[PATCH 14/21] ppc/eeh: device bars restore based on PE

2012-09-05 Thread Gavin Shan
The patch introduces a function to traverse the devices of the
specified PE and its child PEs. The restoration of device BARs is
then implemented on top of that traverse function.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |3 +
 arch/powerpc/include/asm/ppc-pci.h  |1 -
 arch/powerpc/platforms/pseries/eeh.c|   79 --
 arch/powerpc/platforms/pseries/eeh_pe.c |   93 +++
 4 files changed, 96 insertions(+), 80 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 96451b7..9a9fe28 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -171,6 +171,9 @@ typedef void *(*eeh_traverse_func)(void *data, void *flag);
 int __devinit eeh_phb_pe_create(struct pci_controller *phb);
 int eeh_pe_create(struct eeh_dev *edev);
 int eeh_pe_remove(struct eeh_dev *edev);
+void *eeh_pe_dev_traverse(struct eeh_pe *root,
+   eeh_traverse_func fn, void *flag);
+void eeh_pe_restore_bars(struct eeh_pe *pe);
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 3e301b1..5cbe3f2 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -54,7 +54,6 @@ struct pci_dev *pci_addr_cache_get_device(unsigned long addr);
 void eeh_slot_error_detail(struct eeh_dev *edev, int severity);
 int eeh_pci_enable(struct eeh_dev *edev, int function);
 int eeh_reset_pe(struct eeh_dev *);
-void eeh_restore_bars(struct eeh_dev *);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 void eeh_pe_state_mark(struct eeh_pe *pe, int state);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 636413f..28d0c04 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -610,85 +610,6 @@ int eeh_reset_pe(struct eeh_dev *edev)
return -1;
 }
 
-/** Save and restore of PCI BARs
- *
- * Although firmware will set up BARs during boot, it doesn't
- * set up device BAR's after a device reset, although it will,
- * if requested, set up bridge configuration. Thus, we need to
- * configure the PCI devices ourselves.  
- */
-
-/**
- * eeh_restore_one_device_bars - Restore the Base Address Registers for one device
- * @edev: PCI device associated EEH device
- *
- * Loads the PCI configuration space base address registers,
- * the expansion ROM base address, the latency timer, and etc.
- * from the saved values in the device node.
- */
-static inline void eeh_restore_one_device_bars(struct eeh_dev *edev)
-{
-   int i;
-   u32 cmd;
-   struct device_node *dn = eeh_dev_to_of_node(edev);
-
-   if (!edev->phb)
-   return;
-
-   for (i=4; i<10; i++) {
-   eeh_ops->write_config(dn, i*4, 4, edev->config_space[i]);
-   }
-
-   /* 12 == Expansion ROM Address */
-   eeh_ops->write_config(dn, 12*4, 4, edev->config_space[12]);
-
-#define BYTE_SWAP(OFF) (8*((OFF)/4)+3-(OFF))
-#define SAVED_BYTE(OFF) (((u8 *)(edev->config_space))[BYTE_SWAP(OFF)])
-
-   eeh_ops->write_config(dn, PCI_CACHE_LINE_SIZE, 1,
-   SAVED_BYTE(PCI_CACHE_LINE_SIZE));
-
-   eeh_ops->write_config(dn, PCI_LATENCY_TIMER, 1,
-   SAVED_BYTE(PCI_LATENCY_TIMER));
-
-   /* max latency, min grant, interrupt pin and line */
-   eeh_ops->write_config(dn, 15*4, 4, edev->config_space[15]);
-
-   /* Restore PERR & SERR bits, some devices require it,
-* don't touch the other command bits
-*/
-   eeh_ops->read_config(dn, PCI_COMMAND, 4, &cmd);
-   if (edev->config_space[1] & PCI_COMMAND_PARITY)
-   cmd |= PCI_COMMAND_PARITY;
-   else
-   cmd &= ~PCI_COMMAND_PARITY;
-   if (edev->config_space[1] & PCI_COMMAND_SERR)
-   cmd |= PCI_COMMAND_SERR;
-   else
-   cmd &= ~PCI_COMMAND_SERR;
-   eeh_ops->write_config(dn, PCI_COMMAND, 4, cmd);
-}
-
-/**
- * eeh_restore_bars - Restore the PCI config space info
- * @edev: EEH device
- *
- * This routine performs a recursive walk to the children
- * of this device as well.
- */
-void eeh_restore_bars(struct eeh_dev *edev)
-{
-   struct device_node *dn;
-   if (!edev)
-   return;
-   
-   if ((edev->mode & EEH_MODE_SUPPORTED) && !IS_BRIDGE(edev->class_code))
-   eeh_restore_one_device_bars(edev);
-
-   for_each_child_of_node(eeh_dev_to_of_node(edev), dn)
-   eeh_restore_bars(of_node_to_eeh_dev(dn));
-}
-
 /**
  * eeh_save_bars - Save device bars
  * @edev: PCI device associated EEH device
diff --git a/arch/powerpc/platforms/pseries/eeh_pe.c b/arch/powerpc/platforms/pseries/eeh_pe.c
index 
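
Since the eeh_pe.c part of this patch is cut off above, here is a rough
user-space sketch of what a device traversal with a callback of the
eeh_traverse_func shape could look like. It assumes that a non-NULL
return from the callback stops the walk; struct dev, struct pe,
restore_one() and pe_restore_bars() are hypothetical names, and the
"restore" is only a flag being set.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the eeh_traverse_func typedef in eeh.h. */
typedef void *(*traverse_func)(void *data, void *flag);

/* Hypothetical device and PE nodes for the demo. */
struct dev {
    int id;
    int restored;
    struct dev *next;
};

struct pe {
    struct dev *devs;           /* devices directly in this PE */
    struct pe *first_child;
    struct pe *next_sibling;
};

/*
 * Call @fn on every device of @root and of all PEs below it.
 * Assumption: a non-NULL return value from @fn ends the walk early
 * and is handed back to the caller.
 */
static void *pe_dev_traverse(struct pe *root, traverse_func fn, void *flag)
{
    struct dev *d;
    struct pe *child;
    void *ret;

    assert(fn != NULL);
    if (!root)
        return NULL;

    for (d = root->devs; d; d = d->next) {
        ret = fn(d, flag);
        if (ret)
            return ret;
    }
    for (child = root->first_child; child; child = child->next_sibling) {
        ret = pe_dev_traverse(child, fn, flag);
        if (ret)
            return ret;
    }
    return NULL;
}

/* Callback: pretend to restore one device and count it in *flag. */
static void *restore_one(void *data, void *flag)
{
    struct dev *d = data;
    int *count = flag;

    d->restored = 1;
    (*count)++;
    return NULL;
}

/* Restore every device under @pe; returns how many were visited. */
static int pe_restore_bars(struct pe *pe)
{
    int count = 0;

    pe_dev_traverse(pe, restore_one, &count);
    return count;
}
```

With the walk factored out like this, per-device work such as BAR
restoration only needs a small callback instead of its own recursion.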

[PATCH 16/21] ppc/eeh: do reset based on PE

2012-09-05 Thread Gavin Shan
The patch implements reset based on PE instead of EEH device. Also,
the functions used to retrieve the reset type, either hot or
fundamental reset, have been reworked a little. More specifically,
the retrieval is now implemented on top of the EEH device traverse
function.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/ppc-pci.h   |2 +-
 arch/powerpc/platforms/pseries/eeh.c |   91 +-
 2 files changed, 35 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index 5e34b10..2a80f08 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -53,7 +53,7 @@ void pci_addr_cache_remove_device(struct pci_dev *dev);
 struct pci_dev *pci_addr_cache_get_device(unsigned long addr);
 void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
-int eeh_reset_pe(struct eeh_dev *);
+int eeh_reset_pe(struct eeh_pe *);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 void eeh_pe_state_mark(struct eeh_pe *pe, int state);
diff --git a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
index 031935d..d855c20 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -455,17 +455,24 @@ int eeh_pci_enable(struct eeh_pe *pe, int function)
  */
 int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state state)
 {
-   struct device_node *dn = pci_device_to_OF_node(dev);
+   struct eeh_dev *edev = pci_dev_to_eeh_dev(dev);
+   struct eeh_pe *pe = edev->pe;
+
+   if (!pe) {
+   pr_err("%s: No PE found on PCI device %s\n",
+   __func__, pci_name(dev));
+   return -EINVAL;
+   }
 
switch (state) {
case pcie_deassert_reset:
-   eeh_ops->reset(dn, EEH_RESET_DEACTIVATE);
+   eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
break;
case pcie_hot_reset:
-   eeh_ops->reset(dn, EEH_RESET_HOT);
+   eeh_ops->reset(pe, EEH_RESET_HOT);
break;
case pcie_warm_reset:
-   eeh_ops->reset(dn, EEH_RESET_FUNDAMENTAL);
+   eeh_ops->reset(pe, EEH_RESET_FUNDAMENTAL);
break;
default:
return -EINVAL;
@@ -475,66 +482,37 @@ int pcibios_set_pcie_reset_state(struct pci_dev *dev, enum pcie_reset_state stat
 }
 
 /**
- * __eeh_set_pe_freset - Check the required reset for child devices
- * @parent: parent device
- * @freset: return value
- *
- * Each device might have its preferred reset type: fundamental or
- * hot reset. The routine is used to collect the information from
- * the child devices so that they could be reset accordingly.
- */
-void __eeh_set_pe_freset(struct device_node *parent, unsigned int *freset)
-{
-   struct device_node *dn;
-
-   for_each_child_of_node(parent, dn) {
-   if (of_node_to_eeh_dev(dn)) {
-   struct pci_dev *dev = of_node_to_eeh_dev(dn)->pdev;
-
-   if (dev && dev->driver)
-   *freset |= dev->needs_freset;
-
-   __eeh_set_pe_freset(dn, freset);
-   }
-   }
-}
-
-/**
- * eeh_set_pe_freset - Check the required reset for the indicated device and its children
- * @dn: parent device
- * @freset: return value
+ * eeh_set_pe_freset - Check the required reset for the indicated device
+ * @data: EEH device
+ * @flag: return value
  *
  * Each device might have its preferred reset type: fundamental or
  * hot reset. The routine is used to collected the information for
  * the indicated device and its children so that the bunch of the
  * devices could be reset properly.
  */
-void eeh_set_pe_freset(struct device_node *dn, unsigned int *freset)
+static void *eeh_set_dev_freset(void *data, void *flag)
 {
struct pci_dev *dev;
-   dn = eeh_find_device_pe(dn);
-
-   /* Back up one, since config addrs might be shared */
-   if (!pcibios_find_pci_bus(dn) && of_node_to_eeh_dev(dn->parent))
-   dn = dn->parent;
+   unsigned int *freset = (unsigned int *)flag;
+   struct eeh_dev *edev = (struct eeh_dev *)data;
 
-   dev = of_node_to_eeh_dev(dn)->pdev;
+   dev = eeh_dev_to_pci_dev(edev);
if (dev)
*freset |= dev->needs_freset;
 
-   __eeh_set_pe_freset(dn, freset);
+   return NULL;
 }
 
 /**
  * eeh_reset_pe_once - Assert the pci #RST line for 1/4 second
- * @edev: pci device node to be reset.
+ * @pe: EEH PE
  *
  * Assert the PCI #RST line for 1/4 second.
  */
-static void eeh_reset_pe_once(struct eeh_dev *edev)
+static void eeh_reset_pe_once(struct eeh_pe *pe)
 {
unsigned int freset = 0;
-   struct device_node *dn = eeh_dev_to_of_node(edev);
 
   

[PATCH 17/21] ppc/eeh: make EEH handler PE sensitive

2012-09-05 Thread Gavin Shan
Once an EEH error is found, an EEH event is created and put into
the global linked list. In the meantime, a kernel thread is
started to process it. The handler for the kernel thread was
originally EEH device sensitive.

The patch reworks the handler of the kernel thread so that it's PE
sensitive.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/eeh_event.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_event.c b/arch/powerpc/platforms/pseries/eeh_event.c
index 7f89f1e..ba7005a 100644
--- a/arch/powerpc/platforms/pseries/eeh_event.c
+++ b/arch/powerpc/platforms/pseries/eeh_event.c
@@ -57,7 +57,7 @@ static int eeh_event_handler(void * dummy)
 {
unsigned long flags;
struct eeh_event *event;
-   struct eeh_dev *edev;
+   struct eeh_pe *pe;
 
set_task_comm(current, "eehd");
 
@@ -76,28 +76,23 @@ static int eeh_event_handler(void * dummy)
 
/* Serialize processing of EEH events */
mutex_lock(&eeh_event_mutex);
-   edev = event->edev;
-   eeh_mark_slot(eeh_dev_to_of_node(edev), EEH_MODE_RECOVERING);
-
-   printk(KERN_INFO "EEH: Detected PCI bus error on device %s\n",
-  eeh_pci_name(edev->pdev));
+   pe = event->pe;
+   eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+   pr_info("EEH: Detected PCI bus error on PHB#%d-PE#%x\n",
+   pe->phb->global_number, pe->addr);
 
set_current_state(TASK_INTERRUPTIBLE);  /* Don't add to load average */
-   edev = handle_eeh_events(event);
-
-   if (edev) {
-   eeh_clear_slot(eeh_dev_to_of_node(edev), EEH_MODE_RECOVERING);
-   pci_dev_put(edev-pdev);
-   }
+   handle_eeh_events(event);
+   eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
 
kfree(event);
mutex_unlock(eeh_event_mutex);
 
/* If there are no new errors after an hour, clear the counter. */
-   if (edev  edev-freeze_count0) {
+   if (pe  pe-freeze_count  0) {
msleep_interruptible(3600*1000);
-   if (edev-freeze_count0)
-   edev-freeze_count--;
+   if (pe-freeze_count  0)
+   pe-freeze_count--;
 
}
 
-- 
1.7.5.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 18/21] ppc/eeh: handle EEH error based on PE

2012-09-05 Thread Gavin Shan
The patch reworks the current implementation so that EEH errors are
handled based on the PE instead of the EEH device.

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |1 +
 arch/powerpc/include/asm/eeh_event.h|2 +-
 arch/powerpc/platforms/pseries/eeh_driver.c |  229 +++
 arch/powerpc/platforms/pseries/eeh_event.c  |2 +-
 arch/powerpc/platforms/pseries/eeh_pe.c |   27 +++
 5 files changed, 124 insertions(+), 137 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 9a9fe28..e07ece1 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -174,6 +174,7 @@ int eeh_pe_remove(struct eeh_dev *edev);
 void *eeh_pe_dev_traverse(struct eeh_pe *root,
eeh_traverse_func fn, void *flag);
 void eeh_pe_restore_bars(struct eeh_pe *pe);
+struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
 void * __devinit eeh_dev_init(struct device_node *dn, void *data);
 void __devinit eeh_dev_phb_init_dynamic(struct pci_controller *phb);
diff --git a/arch/powerpc/include/asm/eeh_event.h 
b/arch/powerpc/include/asm/eeh_event.h
index dc722b5..de67d83 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -32,7 +32,7 @@ struct eeh_event {
 };
 
 int eeh_send_failure_event(struct eeh_pe *pe);
-struct eeh_dev *handle_eeh_events(struct eeh_event *);
+void eeh_handle_event(struct eeh_pe *pe);
 
 #endif /* __KERNEL__ */
 #endif /* ASM_POWERPC_EEH_EVENT_H */
diff --git a/arch/powerpc/platforms/pseries/eeh_driver.c 
b/arch/powerpc/platforms/pseries/eeh_driver.c
index baf92cd..343c807 100644
--- a/arch/powerpc/platforms/pseries/eeh_driver.c
+++ b/arch/powerpc/platforms/pseries/eeh_driver.c
@@ -116,28 +116,35 @@ static void eeh_enable_irq(struct pci_dev *dev)
 
 /**
  * eeh_report_error - Report pci error to each device driver
- * @dev: PCI device
+ * @data: eeh device
  * @userdata: return value
  * 
  * Report an EEH error to each device driver, collect up and 
  * merge the device driver responses. Cumulative response 
  * passed back in userdata.
  */
-static int eeh_report_error(struct pci_dev *dev, void *userdata)
+static void *eeh_report_error(void *data, void *userdata)
 {
+   struct eeh_dev *edev = (struct eeh_dev *)data;
+   struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
enum pci_ers_result rc, *res = userdata;
struct pci_driver *driver = dev->driver;
 
+   /* We might not have the associated PCI device,
+* then we should continue for next one.
+*/
+   if (!dev) return NULL;
+
dev->error_state = pci_channel_io_frozen;
 
if (!driver)
-   return 0;
+   return NULL;
 
eeh_disable_irq(dev);
 
if (!driver->err_handler ||
!driver->err_handler->error_detected)
-   return 0;
+   return NULL;
 
rc = driver->err_handler->error_detected(dev, pci_channel_io_frozen);
 
@@ -145,27 +152,31 @@ static int eeh_report_error(struct pci_dev *dev, void 
*userdata)
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
-   return 0;
+   return NULL;
 }
 
 /**
  * eeh_report_mmio_enabled - Tell drivers that MMIO has been enabled
- * @dev: PCI device
+ * @data: eeh device
  * @userdata: return value
  *
  * Tells each device driver that IO ports, MMIO and config space I/O
  * are now enabled. Collects up and merges the device driver responses.
  * Cumulative response passed back in userdata.
  */
-static int eeh_report_mmio_enabled(struct pci_dev *dev, void *userdata)
+static void *eeh_report_mmio_enabled(void *data, void *userdata)
 {
+   struct eeh_dev *edev = (struct eeh_dev *)data;
+   struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
enum pci_ers_result rc, *res = userdata;
-   struct pci_driver *driver = dev->driver;
+   struct pci_driver *driver;
 
-   if (!driver ||
+   if (!dev) return NULL;
+
+   if (!(driver = dev->driver) ||
!driver->err_handler ||
!driver->err_handler->mmio_enabled)
-   return 0;
+   return NULL;
 
rc = driver->err_handler->mmio_enabled(dev);
 
@@ -173,12 +184,12 @@ static int eeh_report_mmio_enabled(struct pci_dev *dev, 
void *userdata)
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
-   return 0;
+   return NULL;
 }
 
 /**
  * eeh_report_reset - Tell device that slot has been reset
- * @dev: PCI device
+ * @data: eeh device
  * @userdata: return value
  *
  * This routine must be called while EEH tries to reset particular
@@ -186,13 +197,15 @@ static int eeh_report_mmio_enabled(struct pci_dev *dev, 
void *userdata)
  * some actions, usually to save data the driver needs so that the
  * driver can work again while the device is recovered.
  */
-static 

[PATCH 20/21] ppc/eeh: probe mode support

2012-09-05 Thread Gavin Shan
When the EEH module is installed, PCI devices are checked one by one
to see whether they support EEH. The check is based either on OF nodes
or on the PCI device referred to by struct pci_dev. To distinguish the
two cases, the global variable eeh_probe_mode is introduced.

The patch implements support for the EEH probe mode. EEH on pSeries
sets it to EEH_PROBE_MODE_FDT, which means the probe is done based on
OF nodes on the pSeries platform.

In addition, the patch moves the probe function to the
platform-dependent backend and does some cleanup.
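
As a rough illustration of the probe-mode flag, here is a minimal userspace sketch. The helper and flag names follow the patch; the bit values and the eeh_probe_path() dispatcher are assumptions made for the example and are not taken from the kernel.

```c
/* Sketch only: userspace stand-in for the eeh_probe_mode helpers.
 * The flag values below are assumed for illustration. */
#define EEH_PROBE_MODE_DEV (1 << 0) /* probe from PCI devices */
#define EEH_PROBE_MODE_FDT (1 << 1) /* probe from the device tree */

static int eeh_probe_mode;

static void eeh_probe_mode_set(int flag)
{
	eeh_probe_mode = flag;
}

static int eeh_probe_mode_fdt(void)
{
	return eeh_probe_mode == EEH_PROBE_MODE_FDT;
}

static int eeh_probe_mode_dev(void)
{
	return eeh_probe_mode == EEH_PROBE_MODE_DEV;
}

/* Hypothetical dispatcher: names the eeh_ops callback a platform
 * would be expected to use for the current probe mode. */
static const char *eeh_probe_path(void)
{
	if (eeh_probe_mode_fdt())
		return "of_probe";
	if (eeh_probe_mode_dev())
		return "dev_probe";
	return "none";
}
```

A platform such as pSeries would call eeh_probe_mode_set(EEH_PROBE_MODE_FDT) once during setup, after which the generic code can branch on the two predicates.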

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |   21 
 arch/powerpc/include/asm/ppc-pci.h   |1 +
 arch/powerpc/platforms/pseries/eeh.c |  131 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |   96 +++
 4 files changed, 140 insertions(+), 109 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index b7ac3f7..91c38b7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -129,6 +129,8 @@ static inline struct pci_dev *eeh_dev_to_pci_dev(struct 
eeh_dev *edev)
 struct eeh_ops {
char *name;
int (*init)(void);
+   void* (*of_probe)(struct device_node *dn, void *flag);
+   void* (*dev_probe)(struct pci_dev *dev, void *flag);
int (*set_option)(struct eeh_pe *pe, int option);
int (*get_pe_addr)(struct eeh_pe *pe);
int (*get_state)(struct eeh_pe *pe, int *state);
@@ -143,6 +145,25 @@ struct eeh_ops {
 extern struct eeh_ops *eeh_ops;
 extern int eeh_subsystem_enabled;
 extern struct mutex eeh_mutex;
+extern int eeh_probe_mode;
+
+#define EEH_PROBE_MODE_DEV (1<<0)  /* From PCI device  */
+#define EEH_PROBE_MODE_FDT (1<<1)  /* From FDT */
+
+static inline void eeh_probe_mode_set(int flag)
+{
+   eeh_probe_mode = flag;
+}
+
+static inline int eeh_probe_mode_fdt(void)
+{
+   return (eeh_probe_mode == EEH_PROBE_MODE_FDT);
+}
+
+static inline int eeh_probe_mode_dev(void)
+{
+   return (eeh_probe_mode == EEH_PROBE_MODE_DEV);
+}
 
 static inline void eeh_lock(void)
 {
diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 2a80f08..56d55c7 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -54,6 +54,7 @@ struct pci_dev *pci_addr_cache_get_device(unsigned long addr);
 void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
 int eeh_reset_pe(struct eeh_pe *);
+void eeh_save_bars(struct eeh_dev *edev);
 int rtas_write_config(struct pci_dn *, int where, int size, u32 val);
 int rtas_read_config(struct pci_dn *, int where, int size, u32 *val);
 void eeh_pe_state_mark(struct eeh_pe *pe, int state);
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index 1438c4e..b2caf84 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -92,6 +92,17 @@ struct eeh_ops *eeh_ops = NULL;
 int eeh_subsystem_enabled;
 EXPORT_SYMBOL(eeh_subsystem_enabled);
 
+/*
+ * EEH probe mode support. The intention is to support multiple
+ * platforms for EEH. Some platforms like pSeries do PCI enumeration
+ * based on FDT (Flat Device Tree). However, other platforms like
+ * powernv probe PCI devices from hardware. The flag is used to
+ * distinguish that. In addition, struct eeh_ops::probe would be
+ * invoked for particular OF node or PCI device so that the
+ * corresponding PE would be created there.
+ */
+int eeh_probe_mode;
+
 /* Global EEH mutex */
 DEFINE_MUTEX(eeh_mutex);
 
@@ -590,7 +601,7 @@ int eeh_reset_pe(struct eeh_pe *pe)
  * PCI devices are added individually; but, for the restore,
  * an entire slot is reset at a time.
  */
-static void eeh_save_bars(struct eeh_dev *edev)
+void eeh_save_bars(struct eeh_dev *edev)
 {
int i;
struct device_node *dn;
@@ -604,108 +615,6 @@ static void eeh_save_bars(struct eeh_dev *edev)
 }
 
 /**
- * eeh_early_enable - Early enable EEH on the indicated device
- * @dn: device node
- * @data: BUID
- *
- * Enable EEH functionality on the specified PCI device. The function
- * is expected to be called before real PCI probing is done. However,
- * the PHBs have been initialized at this point.
- */
-static void *eeh_early_enable(struct device_node *dn, void *data)
-{
-   int ret;
-   const u32 *class_code = of_get_property(dn, "class-code", NULL);
-   const u32 *vendor_id = of_get_property(dn, "vendor-id", NULL);
-   const u32 *device_id = of_get_property(dn, "device-id", NULL);
-   const u32 *regs;
-   int enable;
-   struct eeh_dev *edev = of_node_to_eeh_dev(dn);
-   struct eeh_pe pe;
-
-   edev->class_code = 0;
-   edev->mode = 0;
-
-   if (!of_device_is_available(dn))
-   return NULL;
-
-   /* Ignore bad 

[PATCH 21/21] ppc/eeh: trace eeh device from I/O cache

2012-09-05 Thread Gavin Shan
The idea comes from Benjamin Herrenschmidt. The EEH cache helps fetch
the PCI device for a given I/O address. Since the EEH cache serves
EEH, it is reasonable for it to track the EEH device instead of the
PCI device.

The patch makes the EEH cache track the EEH device. Also, the major
EEH entry function eeh_dn_check_failure has been renamed to
eeh_dev_check_failure since it now takes an EEH device as its input
parameter.
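
To make the address-to-device lookup concrete, here is a minimal sketch, assuming a flat table of address ranges in place of the kernel's real cache structure. The struct layouts and the function name addr_cache_get_device() are hypothetical stand-ins, not the kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in; the kernel structure carries far more state. */
struct eeh_dev {
	int config_addr;
};

struct addr_range {
	unsigned long start;   /* first address covered (inclusive) */
	unsigned long end;     /* last address covered (inclusive) */
	struct eeh_dev *edev;  /* the cache entry points at the EEH device */
};

/* Linear lookup over a small table; returns NULL when no range matches. */
static struct eeh_dev *addr_cache_get_device(const struct addr_range *tbl,
					     size_t n, unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr <= tbl[i].end)
			return tbl[i].edev;
	return NULL;
}
```

Returning the EEH device directly means a caller that only has an I/O address can go straight to the failure check without first resolving a struct pci_dev.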

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h |7 +
 arch/powerpc/include/asm/pci-bridge.h  |2 +
 arch/powerpc/include/asm/ppc-pci.h |2 +-
 arch/powerpc/kernel/rtas_pci.c |2 +-
 arch/powerpc/platforms/pseries/eeh.c   |   33 +++
 arch/powerpc/platforms/pseries/eeh_cache.c |   14 ++-
 6 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 91c38b7..4d59da0 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -196,7 +196,7 @@ int __init eeh_ops_register(struct eeh_ops *ops);
 int __exit eeh_ops_unregister(const char *name);
 unsigned long eeh_check_failure(const volatile void __iomem *token,
unsigned long val);
-int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev);
+int eeh_dev_check_failure(struct eeh_dev *edev);
 void __init pci_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
@@ -231,10 +231,7 @@ static inline unsigned long eeh_check_failure(const 
volatile void __iomem *token
return val;
 }
 
-static inline int eeh_dn_check_failure(struct device_node *dn, struct pci_dev 
*dev)
-{
-   return 0;
-}
+#define eeh_dev_check_failure(x) (0)
 
 static inline void pci_addr_cache_build(void) { }
 
diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 8cccbee..973df4d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -184,6 +184,8 @@ static inline struct eeh_dev *of_node_to_eeh_dev(struct 
device_node *dn)
 {
return PCI_DN(dn)->edev;
 }
+#else
+#define of_node_to_eeh_dev(x) (NULL)
 #endif
 
 /** Find the bus corresponding to the indicated device node */
diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 56d55c7..962a902 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -50,7 +50,7 @@ extern int rtas_setup_phb(struct pci_controller *phb);
 void pci_addr_cache_build(void);
 void pci_addr_cache_insert_device(struct pci_dev *dev);
 void pci_addr_cache_remove_device(struct pci_dev *dev);
-struct pci_dev *pci_addr_cache_get_device(unsigned long addr);
+struct eeh_dev *pci_addr_cache_get_device(unsigned long addr);
 void eeh_slot_error_detail(struct eeh_pe *pe, int severity);
 int eeh_pci_enable(struct eeh_pe *pe, int function);
 int eeh_reset_pe(struct eeh_pe *);
diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 140735c..6de63e3 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -81,7 +81,7 @@ int rtas_read_config(struct pci_dn *pdn, int where, int size, 
u32 *val)
return PCIBIOS_DEVICE_NOT_FOUND;
 
if (returnval == EEH_IO_ERROR_VALUE(size) &&
-   eeh_dn_check_failure (pdn->node, NULL))
+   eeh_dev_check_failure(of_node_to_eeh_dev(pdn->node)))
return PCIBIOS_DEVICE_NOT_FOUND;
 
return PCIBIOS_SUCCESSFUL;
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index b2caf84..81e8c8e 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -270,9 +270,8 @@ static inline unsigned long eeh_token_to_phys(unsigned long 
token)
 }
 
 /**
- * eeh_dn_check_failure - Check if all 1's data is due to EEH slot freeze
- * @dn: device node
- * @dev: pci device, if known
+ * eeh_dev_check_failure - Check if all 1's data is due to EEH slot freeze
+ * @edev: eeh device
  *
  * Check for an EEH failure for the given device node.  Call this
  * routine if the result of a read was all 0xff's and you want to
@@ -284,12 +283,13 @@ static inline unsigned long eeh_token_to_phys(unsigned 
long token)
  *
  * It is safe to call this routine in an interrupt context.
  */
-int eeh_dn_check_failure(struct device_node *dn, struct pci_dev *dev)
+int eeh_dev_check_failure(struct eeh_dev *edev)
 {
int ret;
unsigned long flags;
+   struct device_node *dn;
+   struct pci_dev *dev;
struct eeh_pe *pe;
-   struct eeh_dev *edev;
int rc = 0;
const char *location;
 
@@ -298,15 +298,12 @@ int eeh_dn_check_failure(struct device_node *dn, struct 
pci_dev *dev)
if (!eeh_subsystem_enabled)
return 0;
 
-   if (dn) {
-  

[PATCH 19/21] ppc/eeh: move stats to PE

2012-09-05 Thread Gavin Shan
The patch removes the EEH-related statistics from the EEH device since
they are now maintained by the corresponding EEH PE. Also, the flags
used to track the state of the EEH device and PE have been reworked
slightly.
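
The following is a minimal sketch of what keeping state and statistics on the PE looks like. The function and flag names appear elsewhere in this series; the flag values, the flat struct layout, and the absence of any child-PE traversal are assumptions made for the example.

```c
/* Sketch only: flag values and the flat struct are assumptions. */
#define EEH_PE_ISOLATED   (1 << 0)
#define EEH_PE_RECOVERING (1 << 1)

struct eeh_pe {
	int state;        /* bitmask of EEH_PE_* flags */
	int freeze_count; /* statistic kept per PE, not per device */
};

/* Set the given state bits on the PE. */
static void eeh_pe_state_mark(struct eeh_pe *pe, int state)
{
	pe->state |= state;
}

/* Clear the given state bits on the PE, leaving the others alone. */
static void eeh_pe_state_clear(struct eeh_pe *pe, int state)
{
	pe->state &= ~state;
}
```

With the counters on the PE, every device that shares the PE sees the same freeze history, which matches how the hardware isolates a PE as a unit.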

Signed-off-by: Gavin Shan sha...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h  |9 +
 arch/powerpc/platforms/pseries/eeh.c|   13 +++--
 arch/powerpc/platforms/pseries/eeh_cache.c  |3 +--
 arch/powerpc/platforms/pseries/eeh_driver.c |6 +++---
 arch/powerpc/platforms/pseries/eeh_sysfs.c  |9 -
 5 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index e07ece1..b7ac3f7 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -77,20 +77,13 @@ struct eeh_pe {
  * another tree except the currently existing tree of PCI
  * buses and PCI devices
  */
-#define EEH_MODE_SUPPORTED (1<<0)  /* EEH supported on the device  */
-#define EEH_MODE_NOCHECK   (1<<1)  /* EEH check should be skipped  */
-#define EEH_MODE_ISOLATED  (1<<2)  /* The device has been isolated */
-#define EEH_MODE_RECOVERING (1<<3)  /* Recovering the device*/
-#define EEH_MODE_IRQ_DISABLED  (1<<4)  /* Interrupt disabled   */
+#define EEH_DEV_IRQ_DISABLED   (1<<0)  /* Interrupt disabled   */
 
 struct eeh_dev {
int mode;   /* EEH mode */
int class_code; /* Class code of the device */
int config_addr;/* Config address   */
int pe_config_addr; /* PE config address*/
-   int check_count;/* Times of ignored error   */
-   int freeze_count;   /* Times of froze up*/
-   int false_positives;/* Times of reported #ff's  */
u32 config_space[16];   /* Saved PCI config space   */
struct eeh_pe *pe;  /* Associated PE*/
struct list_head list;  /* Form link list in the PE */
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index d855c20..1438c4e 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -537,7 +537,7 @@ static void eeh_reset_pe_once(struct eeh_pe *pe)
 * pci slot reset line is dropped. Make sure we don't miss
 * these, and clear the flag now.
 */
-   eeh_pe_state_clear(pe, EEH_MODE_ISOLATED);
+   eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
 
eeh_ops->reset(pe, EEH_RESET_DEACTIVATE);
 
@@ -625,9 +625,6 @@ static void *eeh_early_enable(struct device_node *dn, void 
*data)
 
edev->class_code = 0;
edev->mode = 0;
-   edev->check_count = 0;
-   edev->freeze_count = 0;
-   edev->false_positives = 0;
 
if (!of_device_is_available(dn))
return NULL;
@@ -637,10 +634,8 @@ static void *eeh_early_enable(struct device_node *dn, void 
*data)
return NULL;
 
/* There is nothing to check on PCI to ISA bridges */
-   if (dn->type && !strcmp(dn->type, "isa")) {
-   edev->mode |= EEH_MODE_NOCHECK;
+   if (dn->type && !strcmp(dn->type, "isa"))
return NULL;
-   }
edev->class_code = *class_code;
 
/* Ok... see if this device supports EEH.  Some do, some don't,
@@ -679,7 +674,6 @@ static void *eeh_early_enable(struct device_node *dn, void 
*data)
 
if (enable) {
eeh_subsystem_enabled = 1;
-   edev->mode |= EEH_MODE_SUPPORTED;
 
eeh_pe_create(edev);
 
@@ -692,9 +686,8 @@ static void *eeh_early_enable(struct device_node *dn, void 
*data)
 * EEH parent, in which case we mark it as supported.
 */
if (dn->parent && of_node_to_eeh_dev(dn->parent) &&
-   (of_node_to_eeh_dev(dn->parent)->mode & EEH_MODE_SUPPORTED)) {
+   of_node_to_eeh_dev(dn->parent)->pe) {
/* Parent supports EEH. */
-   edev->mode |= EEH_MODE_SUPPORTED;
edev->config_addr = of_node_to_eeh_dev(dn->parent)->config_addr;
edev->pe_config_addr = of_node_to_eeh_dev(dn->parent)->pe_config_addr;
 
diff --git a/arch/powerpc/platforms/pseries/eeh_cache.c 
b/arch/powerpc/platforms/pseries/eeh_cache.c
index f50b717..a191057 100644
--- a/arch/powerpc/platforms/pseries/eeh_cache.c
+++ b/arch/powerpc/platforms/pseries/eeh_cache.c
@@ -192,8 +192,7 @@ static void __pci_addr_cache_insert_device(struct pci_dev 
*dev)
}
 
/* Skip any devices for which EEH is not enabled. */
-   if (!(edev->mode & EEH_MODE_SUPPORTED) ||
-   edev->mode & EEH_MODE_NOCHECK) {
+   if (!edev->pe) {
 

Re: [PATCH 13/25] macintosh/mediabay: add a const qualifier

2012-09-05 Thread Uwe Kleine-König
Hello,

On Wed, Sep 05, 2012 at 12:40:17PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2012-07-23 at 11:13 +0200, Uwe Kleine-König wrote:
> > This prepares *of_device_id.data becoming const. Without this change
> > the following warning would occur:
> > 
> > drivers/macintosh/mediabay.c: In function 'media_bay_attach':
> > drivers/macintosh/mediabay.c:589:11: warning: assignment discards 
> > 'const' qualifier from pointer target type [enabled by default]
> > 
> > Signed-off-by: Uwe Kleine-König u.kleine-koe...@pengutronix.de
> > ---
> 
> Ack all of these assuming you test built (I didn't).
"all" means the two mediabay patches? And yes, they are all build
tested.

> Do you need me to
> carry any of this via the powerpc tree ?
The patch that adds the const to of_device_id depends[1] on this patch.
And the other mediabay is only valid after that const is added. So the
easiest is if the series is pulled in one go. Arnd intended to let Linus
pull it for 3.6-rc1 but it was missed because of a communication
problem. So I intend to get it in during the 3.7 merge window.

Best regards
Uwe

[1] it's only a soft depend, because the result is only a warning, but
still.

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


RE: [PATCH] DMA/RaidEngine: Enable FSL RaidEngine

2012-09-05 Thread Shi Xuelin-B29237
Hi Dan,

Do you have any comments about this RaidEngine patch?

Thanks,
Forrest

-Original Message-
From: Linuxppc-dev 
[mailto:linuxppc-dev-bounces+qiang.liu=freescale@lists.ozlabs.org] On 
Behalf Of Shi Xuelin-B29237
Sent: 2012年8月22日 14:24
To: dan.j.willi...@gmail.com; vinod.k...@intel.com; 
linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
Cc: Rai Harninder-B01044; Rai Harninder-B01044; i...@ovro.caltech.edu; Burmi 
Naveen-B16502; Burmi Naveen-B16502; Shi Xuelin-B29237
Subject: [PATCH] DMA/RaidEngine: Enable FSL RaidEngine

From: Xuelin Shi b29...@freescale.com

The RaidEngine is a new FSL hardware block used as a hardware accelerator
for RAID5/6.

This patch enables the RaidEngine functionality and provides hardware 
offloading capability for memcpy, xor and raid6 pq computation. It works under 
dmaengine control with the async_layer interface.

Signed-off-by: Harninder Rai harninder@freescale.com
Signed-off-by: Naveen Burmi naveenbu...@freescale.com
Signed-off-by: Xuelin Shi b29...@freescale.com
---
 arch/powerpc/boot/dts/fsl/p5020si-post.dtsi|1 +
 arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi |6 +
 arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi |   85 ++
 drivers/dma/Kconfig|   14 +
 drivers/dma/Makefile   |1 +
 drivers/dma/fsl_raid.c | 1090 
 drivers/dma/fsl_raid.h |  294 +++
 7 files changed, 1491 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
 create mode 100644 drivers/dma/fsl_raid.c
 create mode 100644 drivers/dma/fsl_raid.h

diff --git a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi 
b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
index 64b6abe..5d7205b 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-post.dtsi
@@ -354,4 +354,5 @@
 /include/ qoriq-sata2-0.dtsi
 /include/ qoriq-sata2-1.dtsi
 /include/ qoriq-sec4.2-0.dtsi
+/include/ qoriq-raid1.0-0.dtsi
 };
diff --git a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi 
b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
index ae823a4..d54cd90 100644
--- a/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
+++ b/arch/powerpc/boot/dts/fsl/p5020si-pre.dtsi
@@ -70,6 +70,12 @@
rtic_c = &rtic_c;
rtic_d = &rtic_d;
sec_mon = &sec_mon;
+
+   raideng = &raideng;
+   raideng_jr0 = &raideng_jr0;
+   raideng_jr1 = &raideng_jr1;
+   raideng_jr2 = &raideng_jr2;
+   raideng_jr3 = &raideng_jr3;
};
 
cpus {
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
new file mode 100644
index 000..8d2e8aa
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-raid1.0-0.dtsi
@@ -0,0 +1,85 @@
+/*
+ * QorIQ RAID 1.0 device tree stub [ controller @ offset 0x32 ]
+ *
+ * Copyright 2012 Freescale Semiconductor Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+raideng: raideng@32 {
+   compatible = fsl,raideng-v1.0;
+   #address-cells = 1;
+   #size-cells = 1;
+   reg = 0x32 0x1;
+   ranges = 0 0x32 0x1;
+

Re: [PATCH v2] powerpc: fix personality handling in ppc64_personality()

2012-09-05 Thread Jiri Kosina
On Wed, 5 Sep 2012, Benjamin Herrenschmidt wrote:

> > Directly comparing current->personality against PER_LINUX32 doesn't work
> > in cases when any of the personality flags stored in the top three bytes
> > are used.
> > 
> > Directly forcefully setting personality to PER_LINUX32 or PER_LINUX
> > discards any flags stored in the top three bytes
> > 
> > Use personality() macro to compare only PER_MASK bytes and make sure that
> > we are setting only the bits that should be set, instead of
> > overwriting the whole value.
> > 
> > Signed-off-by: Jiri Kosina jkos...@suse.cz
> > ---
> > 
> > changed since v1: fix the bit ops to reflect the fact that PER_LINUX is 
> > actually 0
> 
> Had already merged v1 (oops.. didn't spot the issue with PER_LINUX being
> 0). Can you send an incremental fixup ?

Hi Benjamin,

actually commit 7256a5d2da56 seems to contain the correct PER_LINUX 
handling, so seems like you picked the right one :)

Thanks,

-- 
Jiri Kosina
SUSE Labs
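
For readers following the PER_LINUX discussion, here is a small standalone sketch of the masking issue. The constants mirror the values in the Linux personality header and are redefined locally so the example compiles on its own; set_base_personality() is a hypothetical helper, not the code from the commit.

```c
/* Values mirror <linux/personality.h>; redefined to keep the sketch standalone. */
#define PER_MASK          0x00ff
#define PER_LINUX         0x0000
#define PER_LINUX32       0x0008
#define ADDR_NO_RANDOMIZE 0x0040000 /* example flag stored above PER_MASK */

/* Extract only the base personality, ignoring the flag bits. */
#define personality(pers) ((pers) & PER_MASK)

/* Hypothetical helper: replace the base personality, keep the flag bits.
 * Because PER_LINUX is 0, OR-ing it in would change nothing, so the
 * old base must be masked out first. */
static unsigned long set_base_personality(unsigned long cur, unsigned long base)
{
	return (cur & ~(unsigned long)PER_MASK) | base;
}
```

A direct comparison such as `cur == PER_LINUX32` fails as soon as any flag is set, while `personality(cur) == PER_LINUX32` still matches.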


[RFC v9 PATCH 07/21] memory-hotplug: call acpi_bus_remove() to remove memory device

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

The memory device has been ejected and powered off, so we can call
acpi_bus_remove() to remove the memory device from the ACPI bus.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 9d47458..b152767 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -425,8 +425,9 @@ static void acpi_memory_device_notify(acpi_handle handle, 
u32 event, void *data)
}
 
/*
-* TBD: Invoke acpi_bus_remove to cleanup data structures
+* Invoke acpi_bus_remove() to remove memory device
 */
+   acpi_bus_remove(device, 1);
 
/* _EJ0 succeeded; _OST is not necessary */
return;
-- 
1.7.1



[RFC v9 PATCH 03/21] memory-hotplug: store the node id in acpi_memory_device

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

The memory device has only one node id. Store the node id when
enabling the memory device so that it can be reused when removing
the memory device.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 2a7beac..7873832 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -83,6 +83,7 @@ struct acpi_memory_info {
 struct acpi_memory_device {
struct acpi_device * device;
unsigned int state; /* State of the memory device */
+   int nid;
struct list_head res_list;
 };
 
@@ -256,6 +257,9 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
info->enabled = 1;
num_enabled++;
}
+
+   mem_device->nid = node;
+
if (!num_enabled) {
printk(KERN_ERR PREFIX "add_memory failed\n");
mem_device->state = MEMORY_INVALID_STATE;
-- 
1.7.1



[RFC v9 PATCH 09/21] memory-hotplug: does not release memory region in PAGES_PER_SECTION chunks

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Since commit de7f0cba96786c, release_mem_region() has been called in
PAGES_PER_SECTION chunks because register_memory_resource() is called in
PAGES_PER_SECTION chunks by add_memory(). But this appears to depend on the
firmware. If the CRS entries are written in PAGES_PER_SECTION chunks in the
ACPI DSDT table, register_memory_resource() is called in PAGES_PER_SECTION
chunks. But if the CRS entries are written per DIMM in the ACPI DSDT table,
register_memory_resource() is called per DIMM. So release_mem_region()
should not be called in PAGES_PER_SECTION chunks. The patch fixes it.
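
The per-section walk that remains after this change can be sketched as follows. This is a userspace illustration only: the PAGES_PER_SECTION value is assumed (it is architecture dependent), and for_each_section(), section_fn, struct walk_log and log_section() are hypothetical names, not kernel interfaces.

```c
#define PAGES_PER_SECTION 4096UL /* assumed value; architecture dependent */

/* Hypothetical callback type standing in for a per-section removal step. */
typedef int (*section_fn)(unsigned long pfn, void *ctx);

/* Walk [start_pfn, start_pfn + nr_pages) one section at a time.
 * Returns 0 on success or the first non-zero callback result. */
static int for_each_section(unsigned long start_pfn, unsigned long nr_pages,
			    section_fn fn, void *ctx)
{
	unsigned long sections = nr_pages / PAGES_PER_SECTION;
	unsigned long i;

	for (i = 0; i < sections; i++) {
		/* Each iteration must advance to the next section's first pfn. */
		unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
		int ret = fn(pfn, ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: remember the last pfn seen and count the calls. */
struct walk_log {
	unsigned long last_pfn;
	unsigned long calls;
};

static int log_section(unsigned long pfn, void *ctx)
{
	struct walk_log *log = ctx;

	log->last_pfn = pfn;
	log->calls++;
	return 0;
}
```

The resource release, by contrast, is a single call covering the whole range, so it no longer has to match the granularity at which the firmware described the memory.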

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |   13 +
 mm/memory_hotplug.c |4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 11d8e05..dc0a035 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -77,7 +77,8 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
 {
unsigned long start, start_pfn;
struct zone *zone;
-   int ret;
+   int i, ret;
+   int sections_to_remove;
 
start_pfn = base >> PAGE_SHIFT;
 
@@ -97,9 +98,13 @@ static int pseries_remove_memblock(unsigned long base, 
unsigned int memblock_siz
 * to sysfs state file and we can't remove sysfs entries
 * while writing to it. So we have to defer it to here.
 */
-   ret = __remove_pages(zone, start_pfn, memblock_size >> PAGE_SHIFT);
-   if (ret)
-   return ret;
+   sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
+   for (i = 0; i < sections_to_remove; i++) {
+   unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
+   ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
+   if (ret)
+   return ret;
+   }
 
/*
 * Update memory regions for memory remove
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e74a01d..2353887 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -358,11 +358,11 @@ int __remove_pages(struct zone *zone, unsigned long 
phys_start_pfn,
BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
BUG_ON(nr_pages % PAGES_PER_SECTION);
 
+   release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE);
+
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
-   release_mem_region(pfn << PAGE_SHIFT,
-  PAGES_PER_SECTION << PAGE_SHIFT);
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

remove_memory() only tries to offline pages. It is called in two cases:
1. hot-removing a memory device
2. echo offline > /sys/devices/system/memory/memoryXX/state

In the 1st case, we should also change the memory block's state and notify
userspace that the memory block's state has changed after offlining the
pages.

So rename remove_memory() to offline_memory()/offline_pages(). In the 1st
case, offline_memory() will be used; it is not implemented in this patch.
In the 2nd case, offline_pages() will be used.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c |2 +-
 drivers/base/memory.c  |9 +++--
 include/linux/memory_hotplug.h |3 ++-
 mm/memory_hotplug.c|   22 ++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 24c807f..2a7beac 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,7 @@ static int acpi_memory_disable_device(struct 
acpi_memory_device *mem_device)
 */
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
-   result = remove_memory(info->start_addr, info->length);
+   result = offline_memory(info->start_addr, info->length);
if (result)
return result;
}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..44e7de6 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -248,26 +248,23 @@ static bool pages_correctly_reserved(unsigned long 
start_pfn,
 static int
 memory_block_action(unsigned long phys_index, unsigned long action)
 {
-   unsigned long start_pfn, start_paddr;
+   unsigned long start_pfn;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
struct page *first_page;
int ret;
 
first_page = pfn_to_page(phys_index << PFN_SECTION_SHIFT);
+   start_pfn = page_to_pfn(first_page);
 
switch (action) {
case MEM_ONLINE:
-   start_pfn = page_to_pfn(first_page);
-
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
 
ret = online_pages(start_pfn, nr_pages);
break;
case MEM_OFFLINE:
-   start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
-   ret = remove_memory(start_paddr,
-   nr_pages << PAGE_SHIFT);
+   ret = offline_pages(start_pfn, nr_pages);
break;
default:
WARN(1, KERN_WARNING %s(%ld, %ld) unknown action: 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..c183f39 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -233,7 +233,8 @@ static inline int is_mem_section_removable(unsigned long 
pfn,
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
*ms);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3ad25f9..bb42316 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -866,7 +866,7 @@ check_pages_isolated(unsigned long start_pfn, unsigned long 
end_pfn)
return offlined;
 }
 
-static int __ref offline_pages(unsigned long start_pfn,
+static int __ref __offline_pages(unsigned long start_pfn,
  unsigned long end_pfn, unsigned long timeout)
 {
unsigned long pfn, nr_pages, expire;
@@ -994,18 +994,24 @@ out:
return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
-   unsigned long start_pfn, end_pfn;
+   return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
+}
 
-   start_pfn = 

[RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

  - acpi_memory_info  : [RFC PATCH 4/19]
  - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
  - iomem_resource: [RFC PATCH 9/19]
  - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
  - page table of removed memory  : [RFC PATCH 12/19]
  - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find any functionality missing for physical memory hot-remove, please
let me know.

How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
   ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device(it depends on your hardware)
   You will see the memory device under the directory /sys/bus/acpi/devices/.
   Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
   You can write online/offline to /sys/devices/system/memory/memoryX/state to
   online/offline pages provided by this memory device
5. hotremove the memory device
   You can hotremove the memory device by the hardware, or writing 1 to
   /sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.
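
Step 4 above can also be driven from a small program instead of the shell. The
following is only a sketch: the helper name set_memory_block_state() is made up
for illustration, and the sysfs root is passed in as a parameter (normally
/sys/devices/system/memory) so that the helper can be tried against a scratch
directory first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "online" or "offline" to <root>/memory<block>/state.
 * Returns 0 on success and -1 on any failure, including the kernel
 * rejecting the request when the buffered write is flushed.
 */
int set_memory_block_state(const char *root, unsigned int block,
                           const char *state)
{
    char path[256];
    FILE *f;
    int n, ok;

    assert(strcmp(state, "online") == 0 || strcmp(state, "offline") == 0);

    n = snprintf(path, sizeof(path), "%s/memory%u/state", root, block);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(state, f) >= 0;
    /* The write reaches the kernel at flush time, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as set_memory_block_state("/sys/devices/system/memory", 8, "offline")
would correspond to writing offline to memory8/state by hand.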

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
   For example: there is a memory device on node 1. The address range
   is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
   and memory11 under the directory /sys/devices/system/memory/.
   If CONFIG_MEMCG is selected, we allocate memory to store page cgroups
   when we online pages. When we online memory8, the memory that stores its
   page cgroup is not provided by this memory device. But when we online
   memory9, the memory that stores its page cgroup may be provided by memory8.
   So we can't offline memory8 now. We should offline the memory in the
   reverse order.
   When the memory device is hot-removed, we automatically offline the memory
   provided by this memory device. But we don't know which memory was onlined
   first, so offlining memory may fail. In such a case, you should offline the
   memory by hand before hot-removing the memory device.
2. hot-removing a memory device may cause a kernel panic
   This bug will be fixed by Liu Jiang's patch:
   https://lkml.org/lkml/2012/7/3/1
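
For known problem 1, offlining by hand in reverse order could be planned with
something like the sketch below. It assumes the blocks were onlined in
ascending order (memory8 first, memory11 last), which is exactly what the
kernel cannot know in general; the function name reverse_offline_order() is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] with the block numbers first .. first + count - 1 in
 * descending order, i.e. the order in which to offline them if they
 * were onlined in ascending order.
 */
void reverse_offline_order(unsigned int first, size_t count,
                           unsigned int *order)
{
    size_t i;

    assert(order != NULL || count == 0);
    for (i = 0; i < count; i++)
        order[i] = first + (unsigned int)(count - 1 - i);
}
```

For the example device above (memory8 to memory11) this gives 11, 10, 9, 8.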

change log of v9:
 [RFC PATCH v9 8/21]
   * add a lock to protect the list map_entries
   * add an indicator to firmware_map_entry to remember whether the memory
 is allocated from bootmem
 [RFC PATCH v9 10/21]
   * change the macro to inline function
 [RFC PATCH v9 19/21]
   * don't offline the node if the cpu on the node is onlined
 [RFC PATCH v9 21/21]
   * create new patch: auto offline page_cgroup when onlining memory block
 failed

change log of v8:
 [RFC PATCH v8 17/20]
   * Fix problems when one node's range includes the other nodes
 [RFC PATCH v8 18/20]
   * fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
 is not defined.
 [RFC PATCH v8 19/20]
   * don't offline node when some memory sections are not removed
 [RFC PATCH v8 20/20]
   * create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
 [RFC PATCH v7 4/19]
   * do not continue if acpi_memory_device_remove_memory() fails.
 [RFC PATCH v7 15/19]
   * handle usemap in register_page_bootmem_info_section() too.

change log of v6:
 [RFC PATCH v6 12/19]
   * fix building error on architectures other than x86

 [RFC PATCH v6 15-16/19]
   * fix building error on architectures other than x86

change log of v5:
 * merge the patchset to clear page table and the patchset to hot remove
   memory(from ishimatsu) to one big patchset.

 [RFC PATCH v5 1/19]
   * rename remove_memory() to offline_memory()/offline_pages()

 [RFC PATCH v5 2/19]
   * new patch: implement offline_memory(). This function offlines pages,
 update memory block's state, and notify the userspace that the memory
 block's state is changed.

 [RFC PATCH v5 4/19]
   * offline and remove memory in acpi_memory_disable_device() too.

 [RFC PATCH v5 17/19]
   * new patch: add a new function __remove_zone() to revert the things done
 in the function __add_zone().

 [RFC PATCH v5 18/19]
   * flush work before resetting node device.

change log of v4:
 * remove "memory-hotplug : unify argument of firmware_map_add_early/hotplug"
   from the patch series, since the patch is a bugfix. It is being discussed
   in another thread. But the patch is needed for testing the patch series,
   so I added it as [PATCH 0/13].

 [RFC PATCH v4 2/13]
   * check memory is online or not at remove_memory()
   * add memory_add_physaddr_to_nid() to acpi_memory_device_remove() for
 getting node id
 
 [RFC PATCH v4 3/13]
   * create new patch : check memory is online or not 

[RFC v9 PATCH 08/21] memory-hotplug: remove /sys/firmware/memmap/X sysfs

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type}
sysfs files are created. But there is no code to remove these files. The patch
implements the function to remove them.

Note : The code does not free a firmware_map_entry that was allocated from
   bootmem, since there is no way to free memory which is allocated by bootmem.
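
The idea behind the bootmem indicator can be sketched in user space as follows.
struct map_entry and release_map_entry() are hypothetical stand-ins, with
malloc()/free() playing the role of the slab allocator; this is an illustration
of the rule, not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct firmware_map_entry. */
struct map_entry {
    unsigned long long start;
    unsigned long long end;
    unsigned int bootmem:1;   /* set when the entry did not come from the heap */
};

/* Returns 1 if the entry was freed, 0 if it was deliberately left alone. */
int release_map_entry(struct map_entry *entry)
{
    assert(entry != NULL);

    /* Boot-time allocations have no allocator to be handed back to. */
    if (entry->bootmem)
        return 0;

    free(entry);
    return 1;
}
```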

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/firmware/memmap.c|   98 +-
 include/linux/firmware-map.h |6 +++
 mm/memory_hotplug.c  |9 +++-
 3 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/memmap.c b/drivers/firmware/memmap.c
index c1cdc92..6740d26 100644
--- a/drivers/firmware/memmap.c
+++ b/drivers/firmware/memmap.c
@@ -21,6 +21,7 @@
 #include <linux/types.h>
 #include <linux/bootmem.h>
 #include <linux/slab.h>
+#include <linux/mm.h>
 
 /*
  * Data types 
--
@@ -41,6 +42,7 @@ struct firmware_map_entry {
const char  *type;  /* type of the memory range */
struct list_headlist;   /* entry for the linked list */
struct kobject  kobj;   /* kobject for each entry */
+   unsigned intbootmem:1; /* allocated from bootmem */
 };
 
 /*
@@ -79,7 +81,26 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
 };
 
+
+static inline struct firmware_map_entry *
+to_memmap_entry(struct kobject *kobj)
+{
+   return container_of(kobj, struct firmware_map_entry, kobj);
+}
+
+static void release_firmware_map_entry(struct kobject *kobj)
+{
+   struct firmware_map_entry *entry = to_memmap_entry(kobj);
+
+   if (entry->bootmem)
+   /* There is no way to free memory allocated from bootmem */
+   return;
+
+   kfree(entry);
+}
+
 static struct kobj_type memmap_ktype = {
+   .release= release_firmware_map_entry,
.sysfs_ops  = &memmap_attr_ops,
.default_attrs  = def_attrs,
 };
@@ -94,6 +115,7 @@ static struct kobj_type memmap_ktype = {
  * in firmware initialisation code in one single thread of execution.
  */
 static LIST_HEAD(map_entries);
+static DEFINE_SPINLOCK(map_entries_lock);
 
 /**
  * firmware_map_add_entry() - Does the real work to add a firmware memmap 
entry.
@@ -118,11 +140,25 @@ static int firmware_map_add_entry(u64 start, u64 end,
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
 
+   spin_lock(&map_entries_lock);
list_add_tail(&entry->list, &map_entries);
+   spin_unlock(&map_entries_lock);
 
return 0;
 }
 
+/**
+ * firmware_map_remove_entry() - Does the real work to remove a firmware
+ * memmap entry.
+ * @entry: removed entry.
+ **/
+static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
+{
+   spin_lock(&map_entries_lock);
+   list_del(&entry->list);
+   spin_unlock(&map_entries_lock);
+}
+
 /*
  * Add memmap entry on sysfs
  */
@@ -144,6 +180,35 @@ static int add_sysfs_fw_map_entry(struct 
firmware_map_entry *entry)
return 0;
 }
 
+/*
+ * Remove memmap entry on sysfs
+ */
+static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
+{
+   kobject_put(&entry->kobj);
+}
+
+/*
+ * Search memmap entry
+ */
+
+static struct firmware_map_entry * __meminit
+firmware_map_find_entry(u64 start, u64 end, const char *type)
+{
+   struct firmware_map_entry *entry;
+
+   spin_lock(&map_entries_lock);
+   list_for_each_entry(entry, &map_entries, list)
+   if ((entry->start == start) && (entry->end == end) &&
+   (!strcmp(entry->type, type))) {
+   spin_unlock(&map_entries_lock);
+   return entry;
+   }
+
+   spin_unlock(&map_entries_lock);
+   return NULL;
+}
+
 /**
  * firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
  * memory hotplug.
@@ -193,9 +258,36 @@ int __init firmware_map_add_early(u64 start, u64 end, 
const char *type)
if (WARN_ON(!entry))
return -ENOMEM;
 
+   entry->bootmem = 1;
return firmware_map_add_entry(start, end, type, entry);
 }
 
+/**
+ * firmware_map_remove() - remove a firmware mapping entry
+ * @start: Start of the memory range.
+ * @end:   End of the memory range.
+ * @type:  Type of the memory range.
+ *
+ * removes a firmware mapping entry.
+ *
+ * Returns 0 on success, or -EINVAL if no entry.
+ **/
+int __meminit 

[RFC v9 PATCH 06/21] memory-hotplug: export the function acpi_bus_remove()

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

The function acpi_bus_remove() can remove an ACPI device from the ACPI bus.
When an ACPI device is removed, we need to call this function to remove
the device from the ACPI bus. So export this function.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/scan.c |3 ++-
 include/acpi/acpi_bus.h |1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index d1ecca2..1cefc34 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1224,7 +1224,7 @@ static int acpi_device_set_context(struct acpi_device 
*device)
return -ENODEV;
 }
 
-static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
if (!dev)
return -EINVAL;
@@ -1246,6 +1246,7 @@ static int acpi_bus_remove(struct acpi_device *dev, int 
rmdevice)
 
return 0;
 }
+EXPORT_SYMBOL(acpi_bus_remove);
 
 static int acpi_add_single_object(struct acpi_device **child,
  acpi_handle handle, int type,
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index bde976e..2ccf109 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -360,6 +360,7 @@ bool acpi_bus_power_manageable(acpi_handle handle);
 bool acpi_bus_can_wakeup(acpi_handle handle);
 int acpi_power_resource_register_device(struct device *dev, acpi_handle 
handle);
 void acpi_power_resource_unregister_device(struct device *dev, acpi_handle 
handle);
+int acpi_bus_remove(struct acpi_device *dev, int rmdevice);
 #ifdef CONFIG_ACPI_PROC_EVENT
 int acpi_bus_generate_proc_event(struct acpi_device *device, u8 type, int 
data);
 int acpi_bus_generate_proc_event4(const char *class, const char *bid, u8 type, 
int data);
-- 
1.7.1


[RFC v9 PATCH 05/21] memory-hotplug: check whether memory is present or not

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

If the system supports memory hot-remove, online_pages() may try to online
pages that have been removed. So online_pages() needs to check whether the
pages being onlined are present.
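
A user-space sketch of such a range check is shown below. It models presence
with one flag per section and steps section by section, which is a variation on
checking every pfn one at a time; the table size, the PAGES_PER_SECTION value
and the *_sketch names are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>

#define PAGES_PER_SECTION 32768UL   /* assumed: 128M sections, 4K pages */
#define NR_SECTIONS       64        /* assumed size of the model */

/* One presence flag per section; zero-initialised, so nothing is present. */
unsigned char section_present[NR_SECTIONS];

static int pfn_present_sketch(unsigned long pfn)
{
    unsigned long nr = pfn / PAGES_PER_SECTION;

    return nr < NR_SECTIONS && section_present[nr];
}

/* Returns 0 if every pfn in [pfn, pfn + nr_pages) is present, else -EINVAL. */
int pfns_present_sketch(unsigned long pfn, unsigned long nr_pages)
{
    unsigned long end = pfn + nr_pages;

    assert(end >= pfn);   /* reject ranges that wrap around */

    while (pfn < end) {
        if (!pfn_present_sketch(pfn))
            return -EINVAL;
        /* Presence is per section, so jump to the next section start. */
        pfn = (pfn / PAGES_PER_SECTION + 1) * PAGES_PER_SECTION;
    }
    return 0;
}
```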

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 include/linux/mmzone.h |   19 +++
 mm/memory_hotplug.c|   13 +
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2daa54f..ac3ae30 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1180,6 +1180,25 @@ void sparse_init(void);
 #define sparse_index_init(_sec, _nid)  do {} while (0)
 #endif /* CONFIG_SPARSEMEM */
 
+#ifdef CONFIG_SPARSEMEM
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   int i;
+   for (i = 0; i < nr_pages; i++) {
+   if (pfn_present(pfn + i))
+   continue;
+   else
+   return -EINVAL;
+   }
+   return 0;
+}
+#else
+static inline int pfns_present(unsigned long pfn, unsigned long nr_pages)
+{
+   return 0;
+}
+#endif /* CONFIG_SPARSEMEM*/
+
 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
 bool early_pfn_in_nid(unsigned long pfn, int nid);
 #else
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 49f7747..299747d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,19 @@ int __ref online_pages(unsigned long pfn, unsigned long 
nr_pages)
struct memory_notify arg;
 
lock_memory_hotplug();
+   /*
+* If system supports memory hot-remove, the memory may have been
+* removed. So we check whether the memory has been removed or not.
+*
+* Note: When CONFIG_SPARSEMEM is defined, pfns_present() become
+*   effective. If CONFIG_SPARSEMEM is not defined, pfns_present()
+*   always returns 0.
+*/
+   ret = pfns_present(pfn, nr_pages);
+   if (ret) {
+   unlock_memory_hotplug();
+   return ret;
+   }
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
-- 
1.7.1


[RFC v9 PATCH 04/21] memory-hotplug: offline and remove memory when removing the memory device

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

We should offline and remove memory when removing the memory device.
The memory device can be removed in 2 ways:
1. send eject request by SCI
2. echo 1 > /sys/bus/acpi/devices/PNP0C80:XX/eject

In the 1st case, acpi_memory_disable_device() will be called. In the 2nd
case, acpi_memory_device_remove() will be called. acpi_memory_device_remove()
will also be called when we unbind the memory device from the driver
acpi_memhotplug. If the type is ACPI_BUS_REMOVAL_EJECT, it means
that the user wants to eject the memory device, and we should offline
and remove memory in acpi_memory_device_remove().

The function remove_memory() is not implemented yet. For now it only checks
whether all memory has been offlined.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/acpi/acpi_memhotplug.c |   45 +--
 drivers/base/memory.c  |   39 ++
 include/linux/memory.h |5 
 include/linux/memory_hotplug.h |5 
 mm/memory_hotplug.c|   22 +++
 5 files changed, 109 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 7873832..9d47458 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -29,6 +29,7 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/types.h>
+#include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/slab.h>
 #include <acpi/acpi_drivers.h>
@@ -310,25 +311,44 @@ static int acpi_memory_powerdown_device(struct 
acpi_memory_device *mem_device)
return 0;
 }
 
-static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+static int
+acpi_memory_device_remove_memory(struct acpi_memory_device *mem_device)
 {
int result;
struct acpi_memory_info *info, *n;
+   int node = mem_device->nid;
 
-
-   /*
-* Ask the VM to offline this memory range.
-* Note: Assume that this function returns zero on success
-*/
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->enabled) {
result = offline_memory(info->start_addr, info->length);
if (result)
return result;
+
+   result = remove_memory(node, info->start_addr,
+  info->length);
+   if (result)
+   return result;
}
+
+   list_del(&info->list);
kfree(info);
}
 
+   return 0;
+}
+
+static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
+{
+   int result;
+
+   /*
+* Ask the VM to offline this memory range.
+* Note: Assume that this function returns zero on success
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+
/* Power-off and eject the device */
result = acpi_memory_powerdown_device(mem_device);
if (result) {
@@ -477,12 +497,23 @@ static int acpi_memory_device_add(struct acpi_device 
*device)
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
struct acpi_memory_device *mem_device = NULL;
-
+   int result;
 
if (!device || !acpi_driver_data(device))
return -EINVAL;
 
mem_device = acpi_driver_data(device);
+
+   if (type == ACPI_BUS_REMOVAL_EJECT) {
+   /*
+* offline and remove memory only when the memory device is
+* ejected.
+*/
+   result = acpi_memory_device_remove_memory(mem_device);
+   if (result)
+   return result;
+   }
+
kfree(mem_device);
 
return 0;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 86c8821..038be73 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -70,6 +70,45 @@ void unregister_memory_isolate_notifier(struct 
notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_memory_isolate_notifier);
 
+bool is_memblk_offline(unsigned long start, unsigned long size)
+{
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = PFN_UP(start + size);
+
+   for (pfn = start_pfn; pfn  

[RFC v9 PATCH 02/21] memory-hotplug: implement offline_memory()

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

The function offline_memory() will be called when hot removing a
memory device. The memory device may contain more than one memory
block. If the memory block has been offlined, __offline_pages()
will fail. So we should try to offline one memory block at a
time.

If the memory block is offlined in offline_memory(), we also
update its state, and notify userspace that its state has
changed.

The function offline_memory() also checks each memory block's
state. So there is no need to check the memory block's state
before calling offline_memory().
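
The block-at-a-time behaviour described above can be sketched in user space as
follows. The block layout (two sections per block), the state table and the
name offline_range_sketch() are assumptions for illustration; the sketch only
models the walk over sections and the "already offline, do nothing" rule.

```c
#include <assert.h>

#define PAGE_SHIFT         12
#define PAGES_PER_SECTION  32768UL   /* assumed: 128M sections, 4K pages */
#define SECTIONS_PER_BLOCK 2UL       /* assumed */
#define NR_BLOCKS          16        /* assumed size of the model */

enum { BLK_ONLINE, BLK_OFFLINE };

/* Zero-initialised, so every block starts out as BLK_ONLINE. */
int block_state[NR_BLOCKS];

/*
 * Walk [start, start + size) one section at a time and offline each
 * memory block it touches exactly once. Returns the number of blocks
 * whose state changed, or -1 if the range leaves the model.
 */
int offline_range_sketch(unsigned long long start, unsigned long long size)
{
    unsigned long start_pfn = (unsigned long)(start >> PAGE_SHIFT);
    unsigned long end_pfn = start_pfn + (unsigned long)(size >> PAGE_SHIFT);
    unsigned long pfn;
    long last_block = -1;
    int changed = 0;

    for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
        long block = (long)(pfn / PAGES_PER_SECTION / SECTIONS_PER_BLOCK);

        if (block == last_block)
            continue;               /* same block as the previous section */
        last_block = block;

        if (block >= NR_BLOCKS)
            return -1;
        if (block_state[block] == BLK_OFFLINE)
            continue;               /* already offline: nothing to do */

        block_state[block] = BLK_OFFLINE;
        changed++;
    }
    return changed;
}
```

Because already-offline blocks are skipped rather than treated as errors, a
caller does not need to inspect block states beforehand.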

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
CC: Vasilis Liaskovitis vasilis.liaskovi...@profitbricks.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/base/memory.c  |   31 +++
 include/linux/memory_hotplug.h |2 ++
 mm/memory_hotplug.c|   37 -
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 44e7de6..86c8821 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -275,13 +275,11 @@ memory_block_action(unsigned long phys_index, unsigned 
long action)
return ret;
 }
 
-static int memory_block_change_state(struct memory_block *mem,
+static int __memory_block_change_state(struct memory_block *mem,
unsigned long to_state, unsigned long from_state_req)
 {
int ret = 0;
 
-   mutex_lock(&mem->state_mutex);
-
if (mem->state != from_state_req) {
ret = -EINVAL;
goto out;
@@ -309,10 +307,20 @@ static int memory_block_change_state(struct memory_block 
*mem,
break;
}
 out:
-   mutex_unlock(&mem->state_mutex);
return ret;
 }
 
+static int memory_block_change_state(struct memory_block *mem,
+   unsigned long to_state, unsigned long from_state_req)
+{
+   int ret;
+
+   mutex_lock(&mem->state_mutex);
+   ret = __memory_block_change_state(mem, to_state, from_state_req);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
 static ssize_t
 store_mem_state(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
@@ -653,6 +661,21 @@ int unregister_memory_section(struct mem_section *section)
 }
 
 /*
+ * offline one memory block. If the memory block has been offlined, do nothing.
+ */
+int offline_memory_block(struct memory_block *mem)
+{
+   int ret = 0;
+
+   mutex_lock(&mem->state_mutex);
+   if (mem->state != MEM_OFFLINE)
+   ret = __memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+   mutex_unlock(&mem->state_mutex);
+
+   return ret;
+}
+
+/*
  * Initialize the sysfs support for memory devices...
  */
 int __init memory_dev_init(void)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index c183f39..0b040bb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -10,6 +10,7 @@ struct page;
 struct zone;
 struct pglist_data;
 struct mem_section;
+struct memory_block;
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 
@@ -234,6 +235,7 @@ extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
+extern int offline_memory_block(struct memory_block *mem);
 extern int offline_memory(u64 start, u64 size);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index bb42316..6fc1908 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1001,7 +1001,42 @@ int offline_pages(unsigned long start_pfn, unsigned long 
nr_pages)
 
 int offline_memory(u64 start, u64 size)
 {
-   return -EINVAL;
+   struct memory_block *mem = NULL;
+   struct mem_section *section;
+   unsigned long start_pfn, end_pfn;
+   unsigned long pfn, section_nr;
+   int ret;
+
+   start_pfn = PFN_DOWN(start);
+   end_pfn = start_pfn + PFN_DOWN(size);
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   section_nr = pfn_to_section_nr(pfn);
+   if (!present_section_nr(section_nr))
+   continue;
+
+   section = __nr_to_section(section_nr);
+   /* same memblock? */
+   if (mem)
+   if ((section_nr = 

[RFC v9 PATCH 20/21] memory-hotplug: clear hwpoisoned flag when onlining pages

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

The hwpoisoned flag may be set when we offline a page through the sysfs
interface /sys/devices/system/memory/soft_offline_page or
/sys/devices/system/memory/hard_offline_page. If we don't clear
this flag when onlining pages, the page can't be freed and will
not be on the free list, so we can't offline these pages again.
We should therefore clear this flag when onlining pages.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 270c249..140c080 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -661,6 +661,11 @@ EXPORT_SYMBOL_GPL(__online_page_increment_counters);
 
 void __online_page_free(struct page *page)
 {
+#ifdef CONFIG_MEMORY_FAILURE
+   /* The page may be marked HWPoisoned by soft/hard offline page */
+   ClearPageHWPoison(page);
+#endif
+
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
-- 
1.7.1


[RFC v9 PATCH 15/21] memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

To remove a sparse-vmemmap memmap region that was allocated from bootmem,
the region needs to be registered with get_page_bootmem().
So the patch walks the pages of the virtual mapping and registers them with
get_page_bootmem().

Note: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390,
and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/mm/discontig.c   |6 
 arch/powerpc/mm/init_64.c  |6 
 arch/s390/mm/vmem.c|6 
 arch/sparc/mm/init_64.c|6 
 arch/x86/mm/init_64.c  |   52 
 include/linux/memory_hotplug.h |2 +
 include/linux/mm.h |3 +-
 mm/memory_hotplug.c|   31 +--
 8 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index c641333..33943db 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -822,4 +822,10 @@ int __meminit vmemmap_populate(struct page *start_page,
 {
return vmemmap_populate_basepages(start_page, size, node);
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 620b7ac..3690c44 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -298,5 +298,11 @@ int __meminit vmemmap_populate(struct page *start_page,
 
return 0;
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index 6f896e7..eda55cd 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,12 @@ out:
return ret;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
+
 /*
  * Add memory segment to the segment list if it doesn't overlap with
  * an already present segment.
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index d58edf5..add1cc7 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2077,6 +2077,12 @@ void __meminit vmemmap_populate_print_last(void)
node_start = 0;
}
 }
+
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   /* TODO */
+}
 #endif /* CONFIG_SPARSEMEM_VMEMMAP */
 
 static void prot_init_common(unsigned long page_none,
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e0d88ba..0075592 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,58 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
 }
 
+void register_page_bootmem_memmap(unsigned long section_nr,
+ struct page *start_page, unsigned long size)
+{
+   unsigned long addr = (unsigned long)start_page;
+   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long next;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+
+   for (; addr < end; addr = next) {
+   pte_t *pte = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud)) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   continue;
+   }
+   get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   continue;
+   get_page_bootmem(section_nr, pmd_page(*pmd),
+MIX_SECTION_INFO);
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+  

[RFC v9 PATCH 21/21] memory-hotplug: auto offline page_cgroup when onlining memory block failed

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

When a memory block is onlined, we try to allocate memory on that node
to store page_cgroup. If onlining the memory block fails, we don't
offline the page cgroup, and we have no chance to offline this page cgroup
unless the memory block is later onlined successfully. As a result, we
can't hot-remove the memory device on that node, because some memory is
still used to store page cgroup. If onlining the memory block fails,
there is no need to store page cgroup for this memory. So auto offline
page_cgroup when onlining a memory block fails.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/page_cgroup.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..44db00e 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -251,6 +251,9 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
mn->nr_pages, mn->status_change_nid);
break;
case MEM_CANCEL_ONLINE:
+   offline_page_cgroup(mn->start_pfn,
+   mn->nr_pages, mn->status_change_nid);
+   break;
case MEM_GOING_OFFLINE:
break;
case MEM_ONLINE:
-- 
1.7.1
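
The effect of the added MEM_CANCEL_ONLINE case can be pictured with a minimal user-space sketch. Everything here is an assumption made for illustration: `enum mem_event`, `struct tracker` and `handle_event()` are hypothetical stand-ins for the notifier actions and the page_cgroup bookkeeping, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MEM_* hotplug notifier actions. */
enum mem_event { EV_GOING_ONLINE, EV_CANCEL_ONLINE, EV_ONLINE, EV_OFFLINE };

/* Hypothetical model of the per-block page_cgroup storage. */
struct tracker {
    bool cgroup_allocated;
};

/* Returns 0 on success, -1 for an unknown event. */
int handle_event(struct tracker *t, enum mem_event ev)
{
    assert(t != NULL);

    switch (ev) {
    case EV_GOING_ONLINE:
        t->cgroup_allocated = true;   /* storage is set up before onlining */
        return 0;
    case EV_CANCEL_ONLINE:            /* onlining failed: undo the setup */
    case EV_OFFLINE:
        t->cgroup_allocated = false;
        return 0;
    case EV_ONLINE:
        return 0;                     /* nothing more to do */
    }
    return -1;
}
```

Without the cancel branch, the storage would stay allocated after a failed online, which is the situation the patch description says blocks hot-remove.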



Re: [PATCH 13/25] macintosh/mediabay: add a const qualifier

2012-09-05 Thread Benjamin Herrenschmidt
On Wed, 2012-09-05 at 10:02 +0200, Uwe Kleine-König wrote:
> Hello,
> 
> On Wed, Sep 05, 2012 at 12:40:17PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2012-07-23 at 11:13 +0200, Uwe Kleine-König wrote:
> > > This prepares *of_device_id.data becoming const. Without this change
> > > the following warning would occur:
> > > 
> > >   drivers/macintosh/mediabay.c: In function 'media_bay_attach':
> > >   drivers/macintosh/mediabay.c:589:11: warning: assignment discards 'const' qualifier from pointer target type [enabled by default]
> > > 
> > > Signed-off-by: Uwe Kleine-König u.kleine-koe...@pengutronix.de
> > > ---
> > 
> > Ack all of these assuming you test built (I didn't).
> all means the two mediabay patches? And yes, they are all built
> tested.

Yeah, anything powerpc you had in your series, but the two mediabay ones
are what I spotted when digging through stuff today :-)

> > Do you need me to
> > carry any of this via the powerpc tree ?
> The patch that adds the const to of_device_id depends[1] on this patch.
> And the other mediabay is only valid after that const is added. So the
> easiest is if the series is pulled in one go. Arnd intended to let Linus
> pull it for 3.6-rc1 but it was missed because of a communication
> problem. So I intend to get it in during the 3.7 merge window.

Ok, I'm happy for Arnd to carry all of these, I trust his
judgement/review for that stuff including for ppc specific bits.

Cheers,
Ben.

> Best regards
> Uwe
> 
> [1] it's only a soft depend, because the result is only a warning, but
> still.
> 



[RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

The function get_page_bootmem() may be called more than once on the same
page. There is no need to set the page's type and private fields again on
any call after the first one for that page.

Note: the patch is just optimization and does not fix any problem.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 mm/memory_hotplug.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d736df3..26a5012 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long)page->lru.next;
+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+   page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+   page->lru.next = (struct list_head *)type;
+   SetPagePrivate(page);
+   set_page_private(page, info);
+   atomic_inc(&page->_count);
+   } else
+   atomic_inc(&page->_count);
 }
 
 /* reference to __meminit __free_pages_bootmem is valid
-- 
1.7.1
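
The "tag once, count every time" behaviour of the patch can be sketched in user space as below. This is only an illustration under stated assumptions: `struct fake_page`, `get_ref()` and the `TYPE_MIN`/`TYPE_MAX` values are hypothetical and merely mirror the role of page->lru.next, page_private(), page->_count and the MEMORY_HOTPLUG_MIN/MAX_BOOTMEM_TYPE bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag range; the values are arbitrary for this sketch. */
#define TYPE_MIN 12UL
#define TYPE_MAX 15UL

struct fake_page {
    unsigned long type;   /* stands in for page->lru.next */
    unsigned long info;   /* stands in for page_private(page) */
    int refcount;         /* stands in for page->_count */
};

void get_ref(struct fake_page *p, unsigned long info, unsigned long type)
{
    assert(p != NULL);

    if (p->type < TYPE_MIN || p->type > TYPE_MAX) {
        /* First registration: the tag is not valid yet, so record it. */
        p->type = type;
        p->info = info;
    }
    p->refcount++;        /* every call takes one reference */
}
```

The sketch hoists the reference increment out of the branch; the patch itself increments in both arms of the if/else, which is equivalent.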



[RFC v9 PATCH 10/21] memory-hotplug: add memory_block_release

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

When remove_memory_block() is called, device_release() prints the
following message:

Device 'memory528' does not have a release() function, it is broken and must
be fixed.

remove_memory_block() calls kfree(mem). I think it should be called from
device_release(). So the patch implements memory_block_release().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 drivers/base/memory.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 038be73..f44d624 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -109,6 +109,13 @@ bool is_memblk_offline(unsigned long start, unsigned long size)
 }
 EXPORT_SYMBOL(is_memblk_offline);
 
+static void release_memory_block(struct device *dev)
+{
+   struct memory_block *mem = container_of(dev, struct memory_block, dev);
+
+   kfree(mem);
+}
+
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
@@ -119,6 +126,7 @@ int register_memory(struct memory_block *memory)
 
memory->dev.bus = &memory_subsys;
memory->dev.id = memory->start_section_nr / sections_per_block;
+   memory->dev.release = release_memory_block;
 
error = device_register(&memory->dev);
return error;
@@ -674,7 +682,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section,
mem_remove_simple_file(mem, phys_device);
mem_remove_simple_file(mem, removable);
unregister_memory(mem);
-   kfree(mem);
} else
kobject_put(&mem->dev.kobj);
 
-- 
1.7.1
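
Why the free belongs in a release callback can be shown with a small user-space model. It is a sketch under assumptions: the `struct device` here is a hypothetical two-field stand-in, `device_put()` and `memory_block_new()` are invented helpers, and only the `container_of()` pattern matches what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device {                       /* hypothetical, minimal device */
    int refcount;
    void (*release)(struct device *dev);
};

struct memory_block {                 /* hypothetical embedding object */
    unsigned long start_section_nr;
    struct device dev;
};

static int released;                  /* counts release() calls for the demo */

static void release_memory_block(struct device *dev)
{
    struct memory_block *mem = container_of(dev, struct memory_block, dev);

    free(mem);
    released++;
}

/* Drop one reference; the object is freed only when the last one goes. */
void device_put(struct device *dev)
{
    assert(dev != NULL && dev->refcount > 0);
    if (--dev->refcount == 0)
        dev->release(dev);
}

struct memory_block *memory_block_new(unsigned long nr)
{
    struct memory_block *mem = calloc(1, sizeof(*mem));

    if (!mem)
        return NULL;
    mem->start_section_nr = nr;
    mem->dev.refcount = 1;
    mem->dev.release = release_memory_block;
    return mem;
}
```

Freeing from the callback means a second holder of a reference never sees freed memory, which a direct kfree() after unregistering cannot guarantee.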



[RFC v9 PATCH 19/21] memory-hotplug: remove sysfs file of node

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

This patch introduces a new function try_offline_node() to
remove sysfs file of node when all memory sections of this
node are removed. If some memory sections of this node are
not removed, this function does nothing.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |   54 +++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index afda7e9..270c249 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -29,6 +29,7 @@
 #include <linux/suspend.h>
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
+#include <linux/stop_machine.h>
 
 #include <asm/tlbflush.h>
 
@@ -1285,6 +1286,57 @@ int offline_memory(u64 start, u64 size)
return 0;
 }
 
+static int check_cpu_on_node(void *data)
+{
+   struct pglist_data *pgdat = data;
+   int cpu;
+
+   for_each_online_cpu(cpu) {
+   if (cpu_to_node(cpu) == pgdat->node_id)
+   /*
+* the cpu on this node is onlined, and we can't
+* offline this node.
+*/
+   return -EBUSY;
+   }
+
+   return 0;
+}
+
+/* offline the node if all memory sections of this node are removed */
+static void try_offline_node(int nid)
+{
+   unsigned long start_pfn = NODE_DATA(nid)->node_start_pfn;
+   unsigned long end_pfn = start_pfn + NODE_DATA(nid)->node_spanned_pages;
+   unsigned long pfn;
+
+   for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
+   unsigned long section_nr = pfn_to_section_nr(pfn);
+
+   if (!present_section_nr(section_nr))
+   continue;
+
+   if (pfn_to_nid(pfn) != nid)
+   continue;
+
+   /*
+* some memory sections of this node are not removed, and we
+* can't offline node now.
+*/
+   return;
+   }
+
+   if (stop_machine(check_cpu_on_node, NODE_DATA(nid), NULL))
+   return;
+
+   /*
+* all memory sections of this node are removed, we can offline this
+* node now.
+*/
+   node_set_offline(nid);
+   unregister_one_node(nid);
+}
+
 int __ref remove_memory(int nid, u64 start, u64 size)
 {
int ret = 0;
@@ -1305,6 +1357,8 @@ int __ref remove_memory(int nid, u64 start, u64 size)
firmware_map_remove(start, start + size, "System RAM");
 
arch_remove_memory(start, size);
+
+   try_offline_node(nid);
 out:
unlock_memory_hotplug();
return ret;
-- 
1.7.1
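
The two conditions that try_offline_node() checks can be summarised in a minimal user-space sketch. The `struct section` and `struct cpu` arrays and `node_can_go_offline()` are hypothetical models chosen for illustration; the real code walks pfns and runs the CPU check under stop_machine().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct section { bool present; int nid; };   /* hypothetical section model */
struct cpu     { bool online;  int nid; };   /* hypothetical CPU model */

/*
 * A node may go offline only if no present memory section belongs to it
 * and no online CPU sits on it.
 */
bool node_can_go_offline(int nid,
                         const struct section *secs, size_t nsecs,
                         const struct cpu *cpus, size_t ncpus)
{
    size_t i;

    assert(secs != NULL || nsecs == 0);
    assert(cpus != NULL || ncpus == 0);

    for (i = 0; i < nsecs; i++)
        if (secs[i].present && secs[i].nid == nid)
            return false;             /* memory still present on the node */

    for (i = 0; i < ncpus; i++)
        if (cpus[i].online && cpus[i].nid == nid)
            return false;             /* a CPU on the node is still online */

    return true;
}
```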



[RFC v9 PATCH 14/21] memory-hotplug: move register_page_bootmem_info_node and put_page_bootmem for sparse-vmemmap

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

For implementing register_page_bootmem_info_node of sparse-vmemmap,
register_page_bootmem_info_node and put_page_bootmem are moved to
memory_hotplug.c

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 include/linux/memory_hotplug.h |9 -
 mm/memory_hotplug.c|8 ++--
 2 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cdbbd79..1133e63 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -162,17 +162,8 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
 #endif /* CONFIG_NUMA */
 #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
 
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
-{
-}
-static inline void put_page_bootmem(struct page *page)
-{
-}
-#else
 extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
-#endif
 
 /*
  * Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 26a5012..df6857b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -91,7 +91,6 @@ static void release_memory_resource(struct resource *res)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
 {
@@ -127,6 +126,7 @@ void __ref put_page_bootmem(struct page *page)
 
 }
 
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
 static void register_page_bootmem_info_section(unsigned long start_pfn)
 {
unsigned long *usemap, mapsize, section_nr, i;
@@ -163,6 +163,11 @@ static void register_page_bootmem_info_section(unsigned long start_pfn)
get_page_bootmem(section_nr, page, MIX_SECTION_INFO);
 
 }
+#else
+static inline void register_page_bootmem_info_section(unsigned long start_pfn)
+{
+}
+#endif
 
 void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
@@ -198,7 +203,6 @@ void register_page_bootmem_info_node(struct pglist_data *pgdat)
register_page_bootmem_info_section(pfn);
 
 }
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
   unsigned long end_pfn)
-- 
1.7.1



[RFC v9 PATCH 11/21] memory-hotplug: remove_memory calls __remove_pages

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

The patch adds a call to __remove_pages() in remove_memory(). The range
given by the phys_start_pfn and nr_pages arguments of __remove_pages() may
then span more than one zone. So the zone argument is removed from
__remove_pages(), and __remove_pages() calculates the zone for each section.

When CONFIG_SPARSEMEM_VMEMMAP is defined, there is no way to remove a memmap.
So __remove_section() only calls unregister_memory_section().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/powerpc/platforms/pseries/hotplug-memory.c |5 +
 include/linux/memory_hotplug.h  |3 +--
 mm/memory_hotplug.c |   17 ++---
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index dc0a035..cc14da4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -76,7 +76,6 @@ unsigned long memory_block_size_bytes(void)
 static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
 {
unsigned long start, start_pfn;
-   struct zone *zone;
int i, ret;
int sections_to_remove;
 
@@ -87,8 +86,6 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
return 0;
}
 
-   zone = page_zone(pfn_to_page(start_pfn));
-
/*
 * Remove section mappings and sysfs entries for the
 * section of the memory we are removing.
@@ -101,7 +98,7 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
-   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
+   ret = __remove_pages(start_pfn,  PAGES_PER_SECTION);
if (ret)
return ret;
}
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index fd84ea9..8bf820d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -90,8 +90,7 @@ extern bool is_pageblock_removable_nolock(struct page *page);
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(int nid, struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
-extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
-   unsigned long nr_pages);
+extern int __remove_pages(unsigned long start_pfn, unsigned long nr_pages);
 
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 2353887..7fbfc9f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -275,11 +275,14 @@ static int __meminit __add_section(int nid, struct zone *zone,
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static int __remove_section(struct zone *zone, struct mem_section *ms)
 {
-   /*
-* XXX: Freeing memmap with vmemmap is not implement yet.
-*  This should be removed later.
-*/
-   return -EBUSY;
+   int ret = -EINVAL;
+
+   if (!valid_section(ms))
+   return ret;
+
+   ret = unregister_memory_section(ms);
+
+   return ret;
 }
 #else
 static int __remove_section(struct zone *zone, struct mem_section *ms)
@@ -346,8 +349,7 @@ EXPORT_SYMBOL_GPL(__add_pages);
  * sure that pages are marked reserved and zones are adjust properly by
  * calling offline_pages().
  */
-int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
-unsigned long nr_pages)
+int __remove_pages(unsigned long phys_start_pfn, unsigned long nr_pages)
 {
unsigned long i, ret = 0;
int sections_to_remove;
@@ -363,6 +365,7 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
sections_to_remove = nr_pages / PAGES_PER_SECTION;
for (i = 0; i < sections_to_remove; i++) {
unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
+   struct zone *zone = page_zone(pfn_to_page(pfn));
ret = __remove_section(zone, __pfn_to_section(pfn));
if (ret)
break;
-- 
1.7.1
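
The reason for dropping the zone argument is easiest to see with a range that crosses a zone boundary. The sketch below is a user-space illustration only: `PAGES_PER_SECTION`, `ZONE_BOUNDARY_PFN`, `zone_of_pfn()` and `remove_range()` are hypothetical, with `zone_of_pfn()` standing in for page_zone(pfn_to_page(pfn)).

```c
#include <assert.h>

#define PAGES_PER_SECTION 4UL   /* hypothetical, tiny for the demo */
#define ZONE_BOUNDARY_PFN 8UL   /* hypothetical: pfns below this are zone 0 */
#define MAX_ZONES 2

/* Hypothetical stand-in for page_zone(pfn_to_page(pfn)). */
static int zone_of_pfn(unsigned long pfn)
{
    return pfn < ZONE_BOUNDARY_PFN ? 0 : 1;
}

/*
 * Walk a pfn range section by section and charge each removed section to
 * the zone it actually lives in. removed[] receives the per-zone counts.
 */
void remove_range(unsigned long start_pfn, unsigned long nr_pages,
                  unsigned long removed[MAX_ZONES])
{
    unsigned long i, sections = nr_pages / PAGES_PER_SECTION;

    assert(start_pfn % PAGES_PER_SECTION == 0);

    for (i = 0; i < sections; i++) {
        unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;

        removed[zone_of_pfn(pfn)]++;   /* zone looked up per section */
    }
}
```

A caller that computed the zone once from the first pfn would have charged all three sections in the usage below to zone 0.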



[RFC v9 PATCH 17/21] memory_hotplug: clear zone when the memory is removed

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

When memory is added, we update the zone's and pgdat's start_pfn and
spanned_pages in the function __add_zone(). So we should revert these when
the memory is removed. Add a new function __remove_zone() to do this.
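
The span bookkeeping can be sketched in user space as follows. This is a simplified model under assumptions, not the patch's algorithm: `struct span`, `NSECTIONS` and `remove_section()` are hypothetical, the zone is tracked in whole sections, and holes are trimmed from both edges instead of searching for the next valid section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSECTIONS 8   /* hypothetical: the zone is tracked in whole sections */

struct span {
    size_t start;     /* index of the first section in the span */
    size_t count;     /* number of sections spanned, holes included */
};

/* Remove section idx, then shrink the span if an edge became a hole. */
void remove_section(struct span *z, bool valid[NSECTIONS], size_t idx)
{
    assert(z != NULL && idx < NSECTIONS && valid[idx]);
    valid[idx] = false;

    /* Trim holes at the low edge of the span. */
    while (z->count > 0 && !valid[z->start]) {
        z->start++;
        z->count--;
    }
    /* Trim holes at the high edge of the span. */
    while (z->count > 0 && !valid[z->start + z->count - 1])
        z->count--;

    if (z->count == 0)
        z->start = 0;   /* the span held only holes: reset it */
}
```

Removing a section from the middle leaves the span unchanged, which matches the "only creates a hole" case described in the patch comments.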

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 mm/memory_hotplug.c |  207 +++
 1 files changed, 207 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c54922c..afda7e9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -308,10 +308,213 @@ static int __meminit __add_section(int nid, struct zone *zone,
return register_new_memory(nid, __pfn_to_section(phys_start_pfn));
 }
 
+/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
+static int find_smallest_section_pfn(int nid, struct zone *zone,
+unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   struct mem_section *ms;
+
+   for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
+   ms = __pfn_to_section(start_pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   if (unlikely(pfn_to_nid(start_pfn) != nid))
+   continue;
+
+   if (zone && zone != page_zone(pfn_to_page(start_pfn)))
+   continue;
+
+   return start_pfn;
+   }
+
+   return 0;
+}
+
+/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
+static int find_biggest_section_pfn(int nid, struct zone *zone,
+   unsigned long start_pfn,
+   unsigned long end_pfn)
+{
+   struct mem_section *ms;
+   unsigned long pfn;
+
+   /* pfn is the end pfn of a memory section. */
+   pfn = end_pfn - 1;
+   for (; pfn >= start_pfn; pfn -= PAGES_PER_SECTION) {
+   ms = __pfn_to_section(pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   if (unlikely(pfn_to_nid(pfn) != nid))
+   continue;
+
+   if (zone && zone != page_zone(pfn_to_page(pfn)))
+   continue;
+
+   return pfn;
+   }
+
+   return 0;
+}
+
+static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
+unsigned long end_pfn)
+{
+   unsigned long zone_start_pfn =  zone->zone_start_pfn;
+   unsigned long zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+   unsigned long pfn;
+   struct mem_section *ms;
+   int nid = zone_to_nid(zone);
+
+   zone_span_writelock(zone);
+   if (zone_start_pfn == start_pfn) {
+   /*
+* If the section is smallest section in the zone, it need
+* shrink zone->zone_start_pfn and zone->zone_spanned_pages.
+* In this case, we find second smallest valid mem_section
+* for shrinking zone.
+*/
+   pfn = find_smallest_section_pfn(nid, zone, end_pfn,
+   zone_end_pfn);
+   if (pfn) {
+   zone->zone_start_pfn = pfn;
+   zone->spanned_pages = zone_end_pfn - pfn;
+   }
+   } else if (zone_end_pfn == end_pfn) {
+   /*
+* If the section is biggest section in the zone, it need
+* shrink zone->spanned_pages.
+* In this case, we find second biggest valid mem_section for
+* shrinking zone.
+*/
+   pfn = find_biggest_section_pfn(nid, zone, zone_start_pfn,
+  start_pfn);
+   if (pfn)
+   zone->spanned_pages = pfn - zone_start_pfn + 1;
+   }
+
+   /*
+* The section is not biggest or smallest mem_section in the zone, it
+* only creates a hole in the zone. So in this case, we need not
+* change the zone. But perhaps, the zone has only hole data. Thus
+* it check the zone has only hole or not.
+*/
+   pfn = zone_start_pfn;
+   for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
+   ms = __pfn_to_section(pfn);
+
+   if (unlikely(!valid_section(ms)))
+   continue;
+
+   if (page_zone(pfn_to_page(pfn)) != zone)
+   

[RFC v9 PATCH 16/21] memory-hotplug: free memmap of sparse-vmemmap

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

Not all pages of the virtual mapping for removed memory can be freed, since
a page used as PGD/PUD may cover not only the removed memory but also other
memory. So the patch checks whether a page can be freed or not.

How to check whether a page can be freed or not?
 1. When removing memory, the page structs of the removed memory are filled
with 0xFD.
 2. If all page structs on a PT/PMD page are filled with 0xFD, the PT/PMD
can be cleared. In this case, the page used as PT/PMD can be freed.
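
The two steps above can be sketched in user space as below. It is a minimal illustration under assumptions: `mark_removed()` and `backing_page_unused()` are hypothetical helper names, and a plain byte buffer stands in for the page that backs the page structs; only the 0xFD marker value comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_INUSE 0xFD   /* marker byte, as defined in the patch */

/* Step 1: mark the part of a backing page that held removed page structs. */
void mark_removed(unsigned char *page, size_t off, size_t len)
{
    assert(page != NULL);
    memset(page + off, PAGE_INUSE, len);
}

/* Step 2: the backing page may be freed once every byte carries the marker. */
bool backing_page_unused(const unsigned char *page, size_t page_size)
{
    size_t i;

    assert(page != NULL);
    for (i = 0; i < page_size; i++)
        if (page[i] != PAGE_INUSE)
            return false;   /* some page struct here is still live */
    return true;
}
```

A backing page shared with memory that stays online fails the second check, so it is kept.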

With this patch, the two variants of __remove_section() are unified into
one, so the CONFIG_SPARSEMEM_VMEMMAP variant of __remove_section() is deleted.

Note:  vmemmap_kfree() and vmemmap_free_bootmem() are not implemented for ia64,
ppc, s390, and sparc.

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
 arch/ia64/mm/discontig.c  |8 +++
 arch/powerpc/mm/init_64.c |8 +++
 arch/s390/mm/vmem.c   |8 +++
 arch/sparc/mm/init_64.c   |8 +++
 arch/x86/mm/init_64.c |  119 +
 include/linux/mm.h|2 +
 mm/memory_hotplug.c   |   17 +--
 mm/sparse.c   |5 +-
 8 files changed, 158 insertions(+), 17 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 33943db..0d23b69 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -823,6 +823,14 @@ int __meminit vmemmap_populate(struct page *start_page,
return vmemmap_populate_basepages(start_page, size, node);
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 3690c44..835a2b3 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -299,6 +299,14 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index eda55cd..4b42b0b 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -227,6 +227,14 @@ out:
return ret;
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index add1cc7..1384826 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2078,6 +2078,14 @@ void __meminit vmemmap_populate_print_last(void)
}
 }
 
+void vmemmap_kfree(struct page *memmap, unsigned long nr_pages)
+{
+}
+
+void vmemmap_free_bootmem(struct page *memmap, unsigned long nr_pages)
+{
+}
+
 void register_page_bootmem_memmap(unsigned long section_nr,
  struct page *start_page, unsigned long size)
 {
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0075592..4e8f8a4 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1138,6 +1138,125 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
 }
 
+#define PAGE_INUSE 0xFD
+
+unsigned long find_and_clear_pte_page(unsigned long addr, unsigned long end,
+   struct page **pp, int *page_size)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   void *page_addr;
+   unsigned long next;
+
+   *pp = NULL;
+
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd))
+   return pgd_addr_end(addr, end);
+
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud))
+   return pud_addr_end(addr, end);
+
+   if (!cpu_has_pse) {
+   next = (addr + PAGE_SIZE) & PAGE_MASK;
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd))
+   return next;
+
+   pte = pte_offset_kernel(pmd, addr);
+   if (pte_none(*pte))
+   return next;
+
+  

[RFC v9 PATCH 12/21] memory-hotplug: introduce new function arch_remove_memory()

2012-09-05 Thread wency
From: Wen Congyang we...@cn.fujitsu.com

We don't call __add_pages() directly in the function add_memory()
because some other architecture related things need to be done
before or after calling __add_pages(). So we should introduce
a new function arch_remove_memory() to revert the things
done in arch_add_memory().

Note: the function for s390 is not implemented(I don't know how to
implement it for s390).

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 arch/ia64/mm/init.c  |   16 
 arch/powerpc/mm/mem.c|   14 +++
 arch/s390/mm/init.c  |   12 +++
 arch/sh/mm/init.c|   15 +++
 arch/tile/mm/init.c  |8 ++
 arch/x86/include/asm/pgtable_types.h |1 +
 arch/x86/mm/init_32.c|   10 ++
 arch/x86/mm/init_64.c|  160 ++
 arch/x86/mm/pageattr.c   |   47 +-
 include/linux/memory_hotplug.h   |1 +
 mm/memory_hotplug.c  |1 +
 11 files changed, 263 insertions(+), 22 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 0eab454..1e345ed 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -688,6 +688,22 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
return ret;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (ret)
+   pr_warn("%s: Problem encountered in __remove_pages() as"
+   " ret=%d\n", __func__,  ret);
+
+   return ret;
+}
+#endif
 #endif
 
 /*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index fbdad0e..011170b 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -133,6 +133,20 @@ int arch_add_memory(int nid, u64 start, u64 size)
 
return __add_pages(nid, zone, start_pfn, nr_pages);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+
+   start = (unsigned long)__va(start);
+   if (remove_section_mapping(start, start + size))
+   return -EINVAL;
+
+   return __remove_pages(start_pfn, nr_pages);
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 6adbc08..501b20e 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -257,4 +257,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /*
+* There is no hardware or firmware interface which could trigger a
+* hot memory remove on s390. So there is nothing that needs to be
+* implemented.
+*/
+   return -EBUSY;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 82cc576..fc84491 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -558,4 +558,19 @@ int memory_add_physaddr_to_nid(u64 addr)
 EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = __remove_pages(start_pfn, nr_pages);
+   if (unlikely(ret))
+   pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
+   ret);
+
+   return ret;
+}
+#endif
 #endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index ef29d6c..2749515 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
 {
return -EINVAL;
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int arch_remove_memory(u64 start, u64 size)
+{
+   /* TODO */
+   return -EBUSY;
+}
+#endif
 #endif
 
 struct kmem_cache *pgd_cache;
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 013286a..b725af2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -334,6 +334,7 @@ static inline void update_page_count(int level, unsigned 
long pages) { }
  * as a pte too.
  */
 extern pte_t *lookup_address(unsigned long address, 

[RFC v9 PATCH 18/21] memory-hotplug: add node_device_release

2012-09-05 Thread wency
From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

When unregister_node() is called, device_release() prints the following
message:

Device 'node2' does not have a release() function, it is broken and must be
fixed.

So this patch implements node_device_release().

CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 drivers/base/node.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..07523fb 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -252,6 +252,16 @@ static inline void hugetlb_register_node(struct node 
*node) {}
 static inline void hugetlb_unregister_node(struct node *node) {}
 #endif
 
+static void node_device_release(struct device *dev)
+{
+   struct node *node_dev = to_node(dev);
+
+#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
+   flush_work(&node_dev->node_work);
+#endif
+
+   memset(node_dev, 0, sizeof(struct node));
+}
 
 /*
  * register_node - Setup a sysfs device for a node.
@@ -265,6 +275,7 @@ int register_node(struct node *node, int num, struct node 
*parent)
 
node->dev.id = num;
node->dev.bus = &node_subsys;
+   node->dev.release = node_device_release;
error = device_register(&node->dev);
 
if (!error){
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V7 04/12] arch/powerpc: Convert virtual address to vpn

2012-09-05 Thread Paul Mackerras
On Tue, Sep 04, 2012 at 02:31:21PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch convert different functions to take virtual page number
 instead of virtual address. Virtual page number is virtual address
 shifted right by VPN_SHIFT (12) bits. This enable us to have an
 address range of upto 76 bits.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

A few comments below...

 diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
 b/arch/powerpc/include/asm/mmu-hash64.h
 index 1c65a59..d3a1139 100644
 --- a/arch/powerpc/include/asm/mmu-hash64.h
 +++ b/arch/powerpc/include/asm/mmu-hash64.h
 @@ -15,6 +15,10 @@
  #include asm/asm-compat.h
  #include asm/page.h
  
 +#ifndef __ASSEMBLY__
 +#include linux/bug.h
 +#endif
 +

This is unnecessary, since you haven't added any BUG_ONs or WARN_ONs
in this file.

 @@ -233,13 +276,19 @@ static inline unsigned long hpt_va(unsigned long ea, 
 unsigned long vsid,
  static inline unsigned long hpt_hash(unsigned long va, unsigned int shift,
int ssize)
  {
 + int mask;
   unsigned long hash, vsid;
  
 + /* VPN_SHIFT can be atmost 12 */
   if (ssize == MMU_SEGSIZE_256M) {
 - hash = (va  28) ^ ((va  0x0fffUL)  shift);
 + mask = (1ul  (SID_SHIFT - VPN_SHIFT)) - 1;
 + hash = ((va  (SID_SHIFT - VPN_SHIFT))  0x007f) ^

You have added the & 0x007f part, which is unnecessary
since the result is anded with that same value before being returned.

 + (((va  mask)  (shift - VPN_SHIFT))  0x);

Similarly the & 0x is completely redundant, since you have anded
the va (really vpn) with the mask already.

   } else {
 - vsid = va  40;
 - hash = vsid ^ (vsid  25) ^ ((va  0xffUL)  shift);
 + mask = (1ul  (SID_SHIFT_1T - VPN_SHIFT)) - 1;
 + vsid = va  (SID_SHIFT_1T - VPN_SHIFT);
 + hash = (vsid  0xff) ^ ((vsid  25)  0x7f) ^

Here the vsid & 0xff is actually wrong, since the architecture
says that this term takes the whole VSID, not just the bottom 24 bits.
I realise you were copying what happened before, where the VSID was
restricted to 24 bits anyway because the VA was restricted to 64 bits,
but now that you are extending the VA size you need to include the
whole VSID.

As in the previous case, the & 0x7f part is redundant.

Also, it's very confusing to review this patch with variables called
va containing vpns.  It would actually be easier to review if you
combined this patch and the following one (that renames va to vpn).

 + (((va  mask)  (shift - VPN_SHIFT))  0xfff);

Once again the & 0xfff is redundant because the & mask has
already removed any bits that the second and would remove.

Paul.
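
For readers following the mask discussion, here is a minimal userspace
sketch of the two points above. It is not the kernel code. It assumes
VPN_SHIFT = 12 (from the patch description) and SID_SHIFT = 28 (the shift
used by the old 256M-segment hash); uint64_t stands in for unsigned long,
the function names are made up for illustration, and the "extra" mask is
simply mask itself rather than the literal constant from the patch.

```c
#include <stdint.h>

#define VPN_SHIFT 12	/* from the patch description */
#define SID_SHIFT 28	/* assumed: 256M segment shift */

/* A virtual page number is the virtual address shifted right by VPN_SHIFT. */
static uint64_t va_to_vpn(uint64_t va)
{
	return va >> VPN_SHIFT;
}

/* 256M-segment hash on a VPN, with no masks beyond "vpn & mask". */
static uint64_t hash_256m(uint64_t vpn, unsigned int shift)
{
	uint64_t mask = (UINT64_C(1) << (SID_SHIFT - VPN_SHIFT)) - 1;

	return (vpn >> (SID_SHIFT - VPN_SHIFT)) ^
	       ((vpn & mask) >> (shift - VPN_SHIFT));
}

/*
 * Same hash with a second AND applied after the shift. Because
 * (vpn & mask) >> n can never exceed mask, the second AND changes nothing.
 */
static uint64_t hash_256m_extra_mask(uint64_t vpn, unsigned int shift)
{
	uint64_t mask = (UINT64_C(1) << (SID_SHIFT - VPN_SHIFT)) - 1;

	return (vpn >> (SID_SHIFT - VPN_SHIFT)) ^
	       (((vpn & mask) >> (shift - VPN_SHIFT)) & mask);
}
```

Both hash variants return the same value for any vpn and any shift >= 12,
which is why the extra masks can be dropped. A 64-bit VPN plus the 12 bits
shifted out also accounts for the 76-bit address range mentioned in the
patch description.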


Re: [PATCH] usb: gadget: fsl_udc_core: remove mapped flag

2012-09-05 Thread Felipe Balbi
Hi,

On Tue, Sep 04, 2012 at 07:24:59PM +0200, Enrico Scholz wrote:
 The 'mapped' flag in 'struct fsl_req' flag is redundant with checking
 for 'req.dma != DMA_ADDR_INVALID' and it was also set to a wrong value

You should not be using DMA_ADDR_INVALID anymore. Use the generic
map/unmap routines from udc-core.c.

 (see 2nd hunk of patch).
 
 Replacing it in the way described above saves 60 bytes:
 
   function                 old     new   delta
   fsl_udc_irq             2952    2940     -12
   ep0_prime_status         380     368     -12
   done                     448     432     -16
   fsl_ep_queue             668     648     -20
 
 and has same (or less) runtime costs like evaluating 'req->mapped'.
 
 Signed-off-by: Enrico Scholz enrico.sch...@sigma-chemnitz.de
 ---
  drivers/usb/gadget/fsl_udc_core.c | 10 ++
  drivers/usb/gadget/fsl_usb2_udc.h |  1 -
  2 files changed, 2 insertions(+), 9 deletions(-)
 
 diff --git a/drivers/usb/gadget/fsl_udc_core.c 
 b/drivers/usb/gadget/fsl_udc_core.c
 index 55c4a61..1282a11 100644
 --- a/drivers/usb/gadget/fsl_udc_core.c
 +++ b/drivers/usb/gadget/fsl_udc_core.c
 @@ -195,14 +195,13 @@ static void done(struct fsl_ep *ep, struct fsl_req 
 *req, int status)
   dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
   }
  
 - if (req->mapped) {
 + if (req->req.dma != DMA_ADDR_INVALID) {
   dma_unmap_single(ep->udc->gadget.dev.parent,
   req->req.dma, req->req.length,
   ep_is_in(ep)
   ? DMA_TO_DEVICE
   : DMA_FROM_DEVICE);
   req->req.dma = DMA_ADDR_INVALID;
 - req->mapped = 0;
   } else
   dma_sync_single_for_cpu(ep->udc->gadget.dev.parent,
   req->req.dma, req->req.length,
 @@ -915,15 +914,12 @@ fsl_ep_queue(struct usb_ep *_ep, struct usb_request 
 *_req, gfp_t gfp_flags)
   req->req.length, ep_is_in(ep)
   ? DMA_TO_DEVICE
   : DMA_FROM_DEVICE);
 - req->mapped = 1;
 - } else {
 + } else
   dma_sync_single_for_device(ep->udc->gadget.dev.parent,
   req->req.dma, req->req.length,
   ep_is_in(ep)
   ? DMA_TO_DEVICE
   : DMA_FROM_DEVICE);
 - req->mapped = 0;
 - }
  
   req->req.status = -EINPROGRESS;
   req->req.actual = 0;
 @@ -1306,7 +1302,6 @@ static int ep0_prime_status(struct fsl_udc *udc, int 
 direction)
   req->req.dma = dma_map_single(ep->udc->gadget.dev.parent,
   req->req.buf, req->req.length,
   ep_is_in(ep) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
 - req->mapped = 1;
  
   if (fsl_req_to_dtd(req, GFP_ATOMIC) == 0)
   fsl_queue_td(ep, req);
 @@ -1389,7 +1384,6 @@ static void ch9getstatus(struct fsl_udc *udc, u8 
 request_type, u16 value,
   req->req.dma = dma_map_single(ep->udc->gadget.dev.parent,
   req->req.buf, req->req.length,
   ep_is_in(ep) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
 - req->mapped = 1;
  
   /* prime the data phase */
   if ((fsl_req_to_dtd(req, GFP_ATOMIC) == 0))
 diff --git a/drivers/usb/gadget/fsl_usb2_udc.h 
 b/drivers/usb/gadget/fsl_usb2_udc.h
 index fbd77ba..9aab166 100644
 --- a/drivers/usb/gadget/fsl_usb2_udc.h
 +++ b/drivers/usb/gadget/fsl_usb2_udc.h
 @@ -436,7 +436,6 @@ struct fsl_req {
   /* ep_queue() func will add
  a request->queue into a udc_ep->queue 'd tail */
   struct fsl_ep *ep;
 - unsigned mapped:1;
  
   struct ep_td_struct *head, *tail;   /* For dTD List
  cpu endian Virtual addr */
 -- 
 1.7.11.4
 

-- 
balbi



Re: [PATCH -V7 04/12] arch/powerpc: Convert virtual address to vpn

2012-09-05 Thread Aneesh Kumar K.V
Paul Mackerras pau...@samba.org writes:

 On Tue, Sep 04, 2012 at 02:31:21PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch convert different functions to take virtual page number
 instead of virtual address. Virtual page number is virtual address
 shifted right by VPN_SHIFT (12) bits. This enable us to have an
 address range of upto 76 bits.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 A few comments below...

 diff --git a/arch/powerpc/include/asm/mmu-hash64.h 
 b/arch/powerpc/include/asm/mmu-hash64.h
 index 1c65a59..d3a1139 100644
 --- a/arch/powerpc/include/asm/mmu-hash64.h
 +++ b/arch/powerpc/include/asm/mmu-hash64.h
 @@ -15,6 +15,10 @@
  #include asm/asm-compat.h
  #include asm/page.h
  
 +#ifndef __ASSEMBLY__
 +#include linux/bug.h
 +#endif
 +

 This is unnecessary, since you haven't added any BUG_ONs or WARN_ONs
 in this file.

 @@ -233,13 +276,19 @@ static inline unsigned long hpt_va(unsigned long ea, 
 unsigned long vsid,
  static inline unsigned long hpt_hash(unsigned long va, unsigned int shift,
   int ssize)
  {
 +int mask;
  unsigned long hash, vsid;
  
 +/* VPN_SHIFT can be atmost 12 */
  if (ssize == MMU_SEGSIZE_256M) {
 -hash = (va  28) ^ ((va  0x0fffUL)  shift);
 +mask = (1ul  (SID_SHIFT - VPN_SHIFT)) - 1;
 +hash = ((va  (SID_SHIFT - VPN_SHIFT))  0x007f) ^

 You have added the  0x007f part, which is unnecessary
 since the result is anded with that same value before being returned.

 +(((va  mask)  (shift - VPN_SHIFT))  0x);

 Similarly the  0x is completely redundant, since you have anded
 the va (really vpn) with the mask already.

  } else {
 -vsid = va  40;
 -hash = vsid ^ (vsid  25) ^ ((va  0xffUL)  shift);
 +mask = (1ul  (SID_SHIFT_1T - VPN_SHIFT)) - 1;
 +vsid = va  (SID_SHIFT_1T - VPN_SHIFT);
 +hash = (vsid  0xff) ^ ((vsid  25)  0x7f) ^

 Here the vsid  0xff is actually wrong, since the architecture
 says that this term takes the whole VSID, not just the bottom 24 bits.
 I realise you were copying what happened before, where the VSID was
 restricted to 24 bits anyway because the VA was restricted to 64 bits,
 but now that you are extending the VA size you need to include the
 whole VSID.

 As in the previous case, the  0x7f part is redundant.

Updated the patch with all the above changes.


 Also, it's very confusing to review this patch with variables called
 va containing vpns.  It would actually be easier to review if you
 combined this patch and the following one (that renames va to vpn).

One of the reasons I avoided the va -> vpn rename was to avoid unnecessary
hunks. What I can do instead is apply the new name only to those parts of
the hunks that this patch already touches. Would that help, or do you
want me to fold the two patches?


 +(((va  mask)  (shift - VPN_SHIFT))  0xfff);

 Once again the  0xfff is redundant because the  mask has
 already removed any bits that the second and would remove.


done


Thanks for the review
-aneesh



Re: [PATCH V3 1/3] drivers/char/tpm: Add new device driver to support IBM vTPM

2012-09-05 Thread Kent Yoder
On Wed, Sep 05, 2012 at 01:40:07PM +1000, Benjamin Herrenschmidt wrote:
 On Wed, 2012-08-22 at 16:42 -0500, Kent Yoder wrote:
  On Wed, Aug 22, 2012 at 04:17:43PM -0500, Ashley Lai wrote:
   This patch adds a new device driver to support IBM virtual TPM
   (vTPM) for PPC64.  IBM vTPM is supported through the adjunct
   partition with firmware release 740 or higher.  With vTPM
   support, each lpar is able to have its own vTPM without the
   physical TPM hardware.
   
   This driver provides TPM functionalities by communicating with
   the vTPM adjunct partition through Hypervisor calls (Hcalls)
   and Command/Response Queue (CRQ) commands.
  
   Thanks Ashley, I'll include this in my next pull request to James.
 
 Oh ? I was about to put it in the powerpc tree ... But yeah, I notice
 there's a change to tpm.h so it probably should be at least acked by the
 TPM folks. As for the subsequent patches, they are powerpc specific so I
 can add them, I don't think there's a strict dependency, is there ?

  James did accept my pull request, so these are already in
security-next...

Kent



Re: [PATCH V3 1/3] drivers/char/tpm: Add new device driver to support IBM vTPM

2012-09-05 Thread Ashley Lai
Hi Ben,

Thank you so much for the comments.  Please see my response below.

   + */
   +static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
   +{
   + struct ibmvtpm_dev *ibmvtpm;
   + u16 len;
   +
   + ibmvtpm = (struct ibmvtpm_dev *)chip->vendor.data;
   +
   + if (!ibmvtpm->rtce_buf) {
   + dev_err(ibmvtpm->dev, "ibmvtpm device is not ready\n");
   + return 0;
   + }
   +
   + wait_event_interruptible(wq, ibmvtpm->crq_res.len != 0);
   +
   + if (count < ibmvtpm->crq_res.len) {
 
 That doesn't look right. The other side as far as I can tell is:
 
   + case VTPM_TPM_COMMAND_RES:
   + ibmvtpm->crq_res.valid = crq->valid;
   + ibmvtpm->crq_res.msg = crq->msg;
   + ibmvtpm->crq_res.len = crq->len;
   + ibmvtpm->crq_res.data = crq->data;
   + wake_up_interruptible(&wq);
 
 That looks racy to me. At the very least it should be doing:
 
   ibmvtpm->crq_res.data = crq->data;
   smp_wmb();
   ibmvtpm->crq_res.len = crq->len;
 
 IE. Ensure that len is written last, and possibly have an
 corresponding smp_rmb() after wait_event_interruptible() in the receive
 case.

Good catch. I agree len should be written last, and adding a memory
barrier is a good idea.
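
A userspace analogue of the ordering Ben suggests, as a sketch only. C11
fences stand in for smp_wmb()/smp_rmb() (a release fence is somewhat
stronger than smp_wmb()), and struct crq_response plus both function names
are hypothetical stand-ins, not the driver's actual layout.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical response slot: len == 0 means "no response yet". */
struct crq_response {
	uint64_t data;		/* payload, written first */
	atomic_uint len;	/* length, written last */
};

/* Writer side (the interrupt path in the driver). */
static void publish_response(struct crq_response *res, uint64_t data,
			     unsigned int len)
{
	res->data = data;
	/* Analogue of smp_wmb(): the payload store is ordered before len. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&res->len, len, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out once a response is visible. */
static int consume_response(struct crq_response *res, uint64_t *out)
{
	unsigned int len;

	len = atomic_load_explicit(&res->len, memory_order_relaxed);
	if (len == 0)
		return 0;
	/* Analogue of smp_rmb(): the payload load is ordered after len. */
	atomic_thread_fence(memory_order_acquire);
	*out = res->data;
	return 1;
}
```

The point of the pairing is that a reader that observes a non-zero len is
guaranteed to observe the payload that was stored before it.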

 
 Also, I dislike having a global symbol called wq. You also don't seem
 to have any synchronization on access to that wq, can't you end up with
 several clients trying to send messages & wait for responses getting all
 mixed up ? You might need to make sure that at least you do a
 wake_up_interruptible_all() to ensure you wake them all up (inefficient
 but works). Unless you can track per-client ?

The TPM layer above allows only one request at any given point in time.
Therefore we don't need to wake_up_interruptible_all().  I can move wq
to the private structure.

 
 Or is the above TPM layer making sure only one command/response happens
 at a given point in time ?

You are right.  See above.
 
 You also do an interruptible wait but don't seem to be checking for
 signals (and not testing the result from wait_event_interruptible which
 might be returning -ERESTARTSYS in this case).
 
Good point.  I will check for signals in the next version.

 That all sound a bit wrong to me ...
   +static irqreturn_t ibmvtpm_interrupt(int irq, void *vtpm_instance)
   +{
   + struct ibmvtpm_dev *ibmvtpm = (struct ibmvtpm_dev *) vtpm_instance;
   + unsigned long flags;
   +
   + spin_lock_irqsave(&ibmvtpm->lock, flags);
   + vio_disable_interrupts(ibmvtpm->vdev);
   + tasklet_schedule(&ibmvtpm->tasklet);
   + spin_unlock_irqrestore(&ibmvtpm->lock, flags);
   +
   + return IRQ_HANDLED;
   +}
 
 Tasklets ? We still use those things ? That's softirq iirc, do you
 really need to offload ? I mean, it allows to limit the amount of time
 an interrupt is disabled, but that's only useful if you assume you'll
 have quite a lot of traffic here, is that the case ?
 
 You might be actually increasing latency here in favor of throughput, is
 that what you are aiming for ? Also keep in mind that
 vio_disable/enable_interrupt() being an hcall, it can have significant
 overhead. So again, only worth it if you're going to process a bulk of
 data at a time.

My original thought was to have the bottom half process the crq data
while the top half services another request.  There is some work in
processing crq data, but not that much.  You are right that
vio_disable/enable_interrupt() can have significant overhead.  I will
remove the tasklet in the next version.

 
 Cheers,
 Ben.
 
 

Thanks,
--Ashley Lai




Re: [PATCH V3 1/3] drivers/char/tpm: Add new device driver to support IBM vTPM

2012-09-05 Thread Benjamin Herrenschmidt
On Wed, 2012-09-05 at 10:46 -0500, Kent Yoder wrote:
 On Wed, Sep 05, 2012 at 01:40:07PM +1000, Benjamin Herrenschmidt wrote:
  On Wed, 2012-08-22 at 16:42 -0500, Kent Yoder wrote:
   On Wed, Aug 22, 2012 at 04:17:43PM -0500, Ashley Lai wrote:
This patch adds a new device driver to support IBM virtual TPM
(vTPM) for PPC64.  IBM vTPM is supported through the adjunct
partition with firmware release 740 or higher.  With vTPM
support, each lpar is able to have its own vTPM without the
physical TPM hardware.

This driver provides TPM functionalities by communicating with
the vTPM adjunct partition through Hypervisor calls (Hcalls)
and Command/Response Queue (CRQ) commands.
   
Thanks Ashley, I'll include this in my next pull request to James.
  
  Oh ? I was about to put it in the powerpc tree ... But yeah, I notice
  there's a change to tpm.h so it probably should be at least acked by the
  TPM folks. As for the subsequent patches, they are powerpc specific so I
  can add them, I don't think there's a strict dependency, is there ?
 
   James did accept my pull request, so these are already in
 security-next...

For the driver itself, it's not a big issue (though I did find issues
while reviewing it, so it will need another round of updates). For the
code that changes arch/powerpc, especially prom_init.c, that stuff must
at the very least be acked by me (or the acting powerpc person if I'm
away) if it's going to go via a different tree.

Cheers,
Ben.



Re: [PATCH v2] powerpc: fix personality handling in ppc64_personality()

2012-09-05 Thread Benjamin Herrenschmidt
On Wed, 2012-09-05 at 10:56 +0200, Jiri Kosina wrote:

 Hi Benjamin,
 
 actually commit 7256a5d2da56 seems to contain the correct PER_LINUX 
 handling, so seems like you picked the right one :)
 

Odd, they looked different around the use of PER_MASK when I looked but
I was tired & jet lagged, so I might have just had a brain fail...

Cheers,
Ben.




Re: [git pull] Please pull powerpc.git merge branch

2012-09-05 Thread Benjamin Herrenschmidt
On Wed, 2012-09-05 at 16:12 +1000, Benjamin Herrenschmidt wrote:
 The following changes since commit
 5b716ac728bcc01b1f2a7ed6e437196602237c27:
 
   Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
 (2012-09-02 11:30:10 -0700)
 
 are available in the git repository at:
 
 
   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git 

Little glitch: the mirrors hadn't caught up yet when I ran git
request-pull, so it didn't append the branch name. It's merge, as the
subject implies.

Cheers,
Ben.




Re: [PATCH -V7 11/12] arch/powerpc: Add 64TB support

2012-09-05 Thread Paul Mackerras
On Tue, Sep 04, 2012 at 02:31:28PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Increase max addressable range to 64TB. This is not tested on
 real hardware yet.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Reviewed-by: Paul Mackerras pau...@samba.org


Re: [PATCH -V7 07/12] arch/powerpc: Increase the slice range to 64TB

2012-09-05 Thread Paul Mackerras
On Tue, Sep 04, 2012 at 02:31:24PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This patch makes the high psizes mask as an unsigned char array
 so that we can have more than 16TB. Currently we support upto
 64TB
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Reviewed-by: Paul Mackerras pau...@samba.org


Re: [PATCH -V7 12/12] arch/powerpc: Update VSID allocation documentation

2012-09-05 Thread Paul Mackerras
On Tue, Sep 04, 2012 at 02:31:29PM +0530, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 This update the proto-VSID and VSID scramble related information
 to be more generic by using names instead of current values.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Reviewed-by: Paul Mackerras pau...@samba.org
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH -V7 04/12] arch/powerpc: Convert virtual address to vpn

2012-09-05 Thread Paul Mackerras
On Wed, Sep 05, 2012 at 08:15:11PM +0530, Aneesh Kumar K.V wrote:
 
 One of the reason i avoided va - vpn rename is to avoid unnecessary
 hunks. But then what i can do is to apply the name to those part of the
 hunks which is being touched in this patch. Will that help. Or do you
 want me to fold the two patches ?

Either would do, whichever is easier for you.

Paul.


Re: [powerpc:next 11/29] arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of function 'memblock_end_of_DRAM'

2012-09-05 Thread Benjamin Herrenschmidt
On Thu, 2012-09-06 at 10:20 +0800, Fengguang Wu wrote:
 Hi Michael,
 
 FYI, kernel build failed on
 
 tree:   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next
 head:   8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad
 commit: 474e3d569b63f7275cfec072d7ef7b2ffb8904c8 [11/29] powerpc/pseries: 
 Remove uses of abs_to_virt() and virt_to_abs()
 config: powerpc-allmodconfig (attached as .config)
 
 All related error/warning messages:
 
 arch/powerpc/platforms/pseries/iommu.c: In function 'enable_ddw':
 arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of 
 function 'memblock_end_of_DRAM' [-Werror=implicit-function-declaration]
 cc1: some warnings being treated as errors

Thanks. That seems to be building in my tree oddly, I didn't even have a
warning. Maybe some conditional inclusion ? I will add a commit that
adds an explicit #include of memblock.h

Cheers,
Ben.




Re: [powerpc:next 11/29] arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of function 'memblock_end_of_DRAM'

2012-09-05 Thread Benjamin Herrenschmidt
On Thu, 2012-09-06 at 12:44 +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2012-09-06 at 10:20 +0800, Fengguang Wu wrote:
  Hi Michael,
  
  FYI, kernel build failed on
  
  tree:   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next
  head:   8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad
  commit: 474e3d569b63f7275cfec072d7ef7b2ffb8904c8 [11/29] powerpc/pseries: 
  Remove uses of abs_to_virt() and virt_to_abs()
  config: powerpc-allmodconfig (attached as .config)
  
  All related error/warning messages:
  
  arch/powerpc/platforms/pseries/iommu.c: In function 'enable_ddw':
  arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration 
  of function 'memblock_end_of_DRAM' [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors
 
 Thanks. That seems to be building in my tree oddly, I didn't even have a
 warning. Maybe some conditional inclusion ? I will add a commit that
 adds an explicit #include of memblock.h

Ok, I see. It's a subtle bisection breakage, the next commit fixes it.
Oh well, I won't rebase for that, but thanks for the heads up.

Cheers,
Ben.




Re: [powerpc:next 24/29] drivers/atm/fore200e.h:263:3: error: redefinition of typedef 'opcode_t' with different type

2012-09-05 Thread Benjamin Herrenschmidt
On Thu, 2012-09-06 at 10:19 +0800, Fengguang Wu wrote:
 Hi Ananth,
 
 FYI, kernel build failed on
 
 tree:   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next
 head:   8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad
 commit: 8b7b80b9ebb46dd88fbb94e918297295cf312b59 [24/29] powerpc: Uprobes 
 port to powerpc
 config: powerpc-allmodconfig (attached as .config)
 
 All related error/warning messages:
 
 In file included from drivers/atm/fore200e.c:70:0:
 drivers/atm/fore200e.h:263:3: error: redefinition of typedef 'opcode_t' with 
 different type
 arch/powerpc/include/asm/probes.h:25:13: note: previous declaration of 
 'opcode_t' was here

This is a bit more annoying. Ananth, do we need that to be called
opcode_t for generic reasons or can we make it ppc_opcode_t ? If it has
to remain, I suppose we can try to change that ATM driver to use a
different type name...

(CC'ing Dave and Meelis who from the git history *might* have HW access
to test a possible patch).

Cheers,
Ben.
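
A minimal sketch of the rename option, under stated assumptions: the arch
typedef is taken to be a 32-bit integer, and the driver enum members below
are made-up placeholders, not the real fore200e definitions. With the arch
side using ppc_opcode_t, both types can be visible in one translation unit.

```c
#include <stdint.h>

/* Arch side: hypothetical renamed type for a powerpc instruction word. */
typedef uint32_t ppc_opcode_t;

/* Driver side: stand-in for a driver that already owns the name opcode_t. */
typedef enum opcode {
	OPCODE_EXAMPLE_A = 1,	/* placeholder member */
	OPCODE_EXAMPLE_B	/* placeholder member */
} opcode_t;

/* Uses the arch type only. */
static ppc_opcode_t make_insn(uint32_t raw)
{
	return raw;
}

/* Uses the driver type only; no clash with the arch typedef. */
static opcode_t next_opcode(opcode_t op)
{
	return op == OPCODE_EXAMPLE_A ? OPCODE_EXAMPLE_B : OPCODE_EXAMPLE_A;
}
```

If both typedefs were named opcode_t, the compiler would reject the second
one with the "redefinition of typedef with different type" error quoted
above.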




Re: [powerpc:next 11/29] arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of function 'memblock_end_of_DRAM'

2012-09-05 Thread Fengguang Wu
On Thu, Sep 06, 2012 at 12:49:14PM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2012-09-06 at 12:44 +1000, Benjamin Herrenschmidt wrote:
  On Thu, 2012-09-06 at 10:20 +0800, Fengguang Wu wrote:
   Hi Michael,
   
   FYI, kernel build failed on
   
   tree:   git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git 
   next
   head:   8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad
   commit: 474e3d569b63f7275cfec072d7ef7b2ffb8904c8 [11/29] powerpc/pseries: 
   Remove uses of abs_to_virt() and virt_to_abs()
   config: powerpc-allmodconfig (attached as .config)
   
   All related error/warning messages:
   
   arch/powerpc/platforms/pseries/iommu.c: In function 'enable_ddw':
   arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit 
   declaration of function 'memblock_end_of_DRAM' 
   [-Werror=implicit-function-declaration]
   cc1: some warnings being treated as errors
  
  Thanks. That seems to be building in my tree oddly, I didn't even have a
  warning. Maybe some conditional inclusion ? I will add a commit that
  adds an explicit #include of memblock.h
 
 Ok, I see. It's a subtle bisection breakage, the next commit fixes it.

Yes, I'm doing bisectibility tests :)

 Oh well, I won't rebase for that, but thanks for the heads up.

No problem. Sorry I didn't know that. Will test HEAD commits only for
this branch in future.

Thanks,
Fengguang


Re: [powerpc:next 11/29] arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of function 'memblock_end_of_DRAM'

2012-09-05 Thread Benjamin Herrenschmidt
On Thu, 2012-09-06 at 11:03 +0800, Fengguang Wu wrote:
 
 No problem. Sorry I didn't know that. Will test HEAD commits only for
 this branch in future.

Actually, I'd rather you continue doing bisection tests, it's good, I
can whack on the head of people who submit stuff with breakage.

In fact, next time around, I'll try to be more pro-active at putting
things in my rebase-able test branch so you get a chance to run your
tests on it and we can find & fix these before it hits next. I'll let
you know when I do it.

In the meantime, maybe just blacklist those commits ?

Cheers,
Ben.




Re: [powerpc:next 11/29] arch/powerpc/platforms/pseries/iommu.c:1064:2: error: implicit declaration of function 'memblock_end_of_DRAM'

2012-09-05 Thread Fengguang Wu
On Thu, Sep 06, 2012 at 01:05:56PM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2012-09-06 at 11:03 +0800, Fengguang Wu wrote:
  
  No problem. Sorry I didn't know that. Will test HEAD commits only for
  this branch in future.
 
 Actually, I'd rather you continue doing bisection tests, it's good, I
 can whack on the head of people who submit stuff with breakage.

OK.

 In fact, next time around, I'll try to be more pro-active at putting
 things in my rebase-able test branch so you get a chance to run your
 tests on it and we can find & fix these before it hits next. I'll let
 you know when I do it.

That would be great.

 In the meantime, maybe just blacklist those commits ?

That should not be necessary, if I understand you right. FYI the build
system will automatically ignore all notified build errors and skip
compiled commits.


Thanks,
Fengguang


linux-next: manual merge of the trivial tree with the powerpc tree

2012-09-05 Thread Stephen Rothwell
Hi Jiri,

Today's linux-next merge of the trivial tree got a conflict in
drivers/scsi/ipr.c between commit d3dbeef657fd ("powerpc: Rename 64-bit
PVR constants to PVR_foo") from the powerpc tree and commit 203fa3fe9c9d
("ipr: fix small coding style issues") from the trivial tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary (no action required).

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au

diff --cc drivers/scsi/ipr.c
index cff6503,1059c99..000
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@@ -6398,8 -6330,8 +6398,8 @@@ static int ipr_invalid_adapter(struct i
int i;
  
if ((ioa_cfg->type == 0x5702) && (ioa_cfg->pdev->revision < 4)) {
-   for (i = 0; i < ARRAY_SIZE(ipr_blocked_processors); i++){
+   for (i = 0; i < ARRAY_SIZE(ipr_blocked_processors); i++) {
 -  if (__is_processor(ipr_blocked_processors[i]))
 +  if (pvr_version_is(ipr_blocked_processors[i]))
return 1;
}
}



Re: [powerpc:next 24/29] drivers/atm/fore200e.h:263:3: error: redefinition of typedef 'opcode_t' with different type

2012-09-05 Thread David Miller
From: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Date: Thu, 06 Sep 2012 12:56:12 +1000

> (CC'ing Dave and Meelis who from the git history *might* have HW access
> to test a possible patch).

Hardware isn't necessary, just make sure the resulting binary is
identical both before and after the change.


[PATCH] [v2] sata_fsl: add workaround for data length mismatch on freescale V2 controller

2012-09-05 Thread Shaohui Xie
The Freescale V2 SATA controller checks whether the received data length
matches the programmed length 'ttl'; if not, it assumes that this is an error.
In ATAPI, the 'ttl' is based on the max allocation length and not the actual
data transfer length, so the controller will raise the 'DLM' (Data Length
Mismatch) error bit in the Hstatus register. Along with 'DLM', the DE (Device
Error) and FE (Fatal Error) bits are also set in the Hstatus register, the 'E'
(Internal Error) bit is set in the Serror register, and the CE (Command Error)
and DE (Device Error) registers have the corresponding bit set. In this
condition, we need to clear the errors in the following way: in the service
routine, based on the 'DLM' flag, an HCONTROL[27] operation clears the Hstatus,
CE and DE registers, and then the Serror register is cleared.

Signed-off-by: Shaohui Xie <shaohui@freescale.com>
Signed-off-by: Anju Bhartiya <anju.bhart...@freescale.com>
---
changes for V2:
1. remove the use of the quirk;
2. wrap the errata code in a condition;

 drivers/ata/sata_fsl.c |   40 +++-
 1 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/ata/sata_fsl.c b/drivers/ata/sata_fsl.c
index d6577b9..6b7b73e 100644
--- a/drivers/ata/sata_fsl.c
+++ b/drivers/ata/sata_fsl.c
@@ -143,6 +143,7 @@ enum {
FATAL_ERR_CRC_ERR_RX |
FATAL_ERR_FIFO_OVRFL_TX | FATAL_ERR_FIFO_OVRFL_RX,
 
+   INT_ON_DATA_LENGTH_MISMATCH = (1 << 12),
    INT_ON_FATAL_ERR = (1 << 5),
    INT_ON_PHYRDY_CHG = (1 << 4),
 
@@ -1180,26 +1181,55 @@ static void sata_fsl_host_intr(struct ata_port *ap)
    void __iomem *hcr_base = host_priv->hcr_base;
    u32 hstatus, done_mask = 0;
    struct ata_queued_cmd *qc;
-   u32 SError;
+   u32 SError, tag;
+   u32 status_mask = INT_ON_ERROR;
 
    hstatus = ioread32(hcr_base + HSTATUS);
 
    sata_fsl_scr_read(&ap->link, SCR_ERROR, &SError);
 
+   /* Read command completed register */
+   done_mask = ioread32(hcr_base + CC);
+
+   /* Workaround for data length mismatch errata */
+   if (unlikely(hstatus & INT_ON_DATA_LENGTH_MISMATCH)) {
+       for (tag = 0; tag < ATA_MAX_QUEUE; tag++) {
+           qc = ata_qc_from_tag(ap, tag);
+           if (qc && ata_is_atapi(qc->tf.protocol)) {
+               u32 Hcontrol;
+#define HCONTROL_CLEAR_ERROR   (1 << 27)
+               /* Set HControl[27] to clear error registers */
+               Hcontrol = ioread32(hcr_base + HCONTROL);
+               iowrite32(Hcontrol | HCONTROL_CLEAR_ERROR,
+                       hcr_base + HCONTROL);
+
+               /* Clear HControl[27] */
+               iowrite32(Hcontrol & (~HCONTROL_CLEAR_ERROR),
+                       hcr_base + HCONTROL);
+
+               /* Clear SError[E] bit */
+               sata_fsl_scr_write(&ap->link, SCR_ERROR,
+                       SError);
+
+               /* Ignore fatal error and device error */
+               status_mask &= ~(INT_ON_SINGL_DEVICE_ERR
+                       | INT_ON_FATAL_ERR);
+               break;
+           }
+       }
+   }
+
    if (unlikely(SError & 0x)) {
        DPRINTK("serror @host_intr : 0x%x\n", SError);
        sata_fsl_error_intr(ap);
    }
 
-   if (unlikely(hstatus & INT_ON_ERROR)) {
+   if (unlikely(hstatus & status_mask)) {
        DPRINTK("error interrupt!!\n");
        sata_fsl_error_intr(ap);
        return;
    }
 
-   /* Read command completed register */
-   done_mask = ioread32(hcr_base + CC);
-
    VPRINTK("Status of all queues :\n");
    VPRINTK("done_mask/CC = 0x%x, CA = 0x%x, CE=0x%x,CQ=0x%x,apqa=0x%x\n",
done_mask,
-- 
1.6.4
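
The error-clearing sequence in the workaround above can be tried outside the
driver. What follows is only a minimal sketch under stated assumptions, not
the driver code: a plain variable stands in for the memory-mapped register,
pulse_bit() and effective_error_mask() are hypothetical helpers, and the
value given to INT_ON_SINGL_DEVICE_ERR is assumed for illustration (the patch
only shows INT_ON_FATAL_ERR, INT_ON_DATA_LENGTH_MISMATCH and HCONTROL bit 27).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the patch. */
#define INT_ON_DATA_LENGTH_MISMATCH (1u << 12)
#define INT_ON_FATAL_ERR            (1u << 5)
#define HCONTROL_CLEAR_ERROR        (1u << 27)
/* Assumed value, for illustration only; it is not shown in the patch. */
#define INT_ON_SINGL_DEVICE_ERR     (1u << 1)

/*
 * Hypothetical helper: set a bit in a register, then clear it again,
 * preserving every other bit. A plain variable stands in for the MMIO
 * register that the driver accesses with ioread32()/iowrite32().
 */
static void pulse_bit(volatile uint32_t *reg, uint32_t bit)
{
    uint32_t val = *reg;

    *reg = val | bit;   /* set the bit to request the clear */
    *reg = val & ~bit;  /* drop the bit again */
}

/*
 * Hypothetical helper: when a data length mismatch is flagged for an
 * ATAPI command, stop treating device and fatal errors as real errors.
 */
static uint32_t effective_error_mask(uint32_t hstatus, uint32_t mask,
                                     int is_atapi)
{
    if ((hstatus & INT_ON_DATA_LENGTH_MISMATCH) && is_atapi)
        mask &= ~(INT_ON_SINGL_DEVICE_ERR | INT_ON_FATAL_ERR);
    return mask;
}
```

The mask is narrowed rather than the handler being skipped, so any other
error bit that is set at the same time still reaches the error path.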




[PATCH 1/5] powerpc: Pack arch_hw_breakpoint to avoid holes in struct

2012-09-05 Thread Michael Neuling
No functional change

Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/hw_breakpoint.h |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index be04330..39b323e 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -27,10 +27,10 @@
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 
 struct arch_hw_breakpoint {
-   bool            extraneous_interrupt;
-   u8              len; /* length of the target data symbol */
-   int             type;
    unsigned long   address;
+   int             type;
+   u8              len; /* length of the target data symbol */
+   bool            extraneous_interrupt;
 };
 
 #include <linux/kdebug.h>
-- 
1.7.9.5
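
The effect of the reordering in this patch can be checked with offsetof().
The following is a user-space sketch, not kernel code: struct bp_before and
struct bp_after are hypothetical copies of the two layouts, with u8 spelled
as uint8_t, and hole_before() and hole_after() are helper names invented for
the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field order before the patch. */
struct bp_before {
    bool          extraneous_interrupt;
    uint8_t       len;
    int           type;
    unsigned long address;
};

/* Field order after the patch: largest member first. */
struct bp_after {
    unsigned long address;
    int           type;
    uint8_t       len;
    bool          extraneous_interrupt;
};

/* Padding bytes between 'len' and the member that follows it (old order). */
static size_t hole_before(void)
{
    return offsetof(struct bp_before, type) -
           (offsetof(struct bp_before, len) + sizeof(uint8_t));
}

/* Padding bytes between 'len' and the member that follows it (new order). */
static size_t hole_after(void)
{
    return offsetof(struct bp_after, extraneous_interrupt) -
           (offsetof(struct bp_after, len) + sizeof(uint8_t));
}
```

On common ABIs the old order leaves padding in front of the int member,
while the new order moves any remaining padding to the end of the struct,
which is consistent with the "no functional change" note.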



[PATCH 2/5] powerpc: Use consistent name info for arch_hw_breakpoint

2012-09-05 Thread Michael Neuling
Change bp_info to info to be consistent with the rest of this file.

Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/kernel/hw_breakpoint.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 956a4c4..6767445 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -294,7 +294,7 @@ int __kprobes single_step_dabr_instruction(struct die_args *args)
 {
    struct pt_regs *regs = args->regs;
    struct perf_event *bp = NULL;
-   struct arch_hw_breakpoint *bp_info;
+   struct arch_hw_breakpoint *info;
 
    bp = current->thread.last_hit_ubp;
    /*
@@ -304,16 +304,16 @@ int __kprobes single_step_dabr_instruction(struct die_args *args)
    if (!bp)
        return NOTIFY_DONE;
 
-   bp_info = counter_arch_bp(bp);
+   info = counter_arch_bp(bp);
 
    /*
     * We shall invoke the user-defined callback function in the single
     * stepping handler to confirm to 'trigger-after-execute' semantics
     */
-   if (!bp_info->extraneous_interrupt)
+   if (!info->extraneous_interrupt)
        perf_bp_event(bp, regs);
 
-   set_dabr(bp_info->address | bp_info->type | DABR_TRANSLATION);
+   set_dabr(info->address | info->type | DABR_TRANSLATION);
    current->thread.last_hit_ubp = NULL;
 
/*
-- 
1.7.9.5



[PATCH 3/5] powerpc: Use the XDABR hcall

2012-09-05 Thread Michael Neuling
We never use the XDABR hcall since we check for the DABR hcall first.
The XDABR hcall is better since it also allows us to set the DABRX.

Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/platforms/pseries/setup.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 51ecac9..36b7744 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -529,10 +529,10 @@ static void __init pSeries_init_early(void)
    if (firmware_has_feature(FW_FEATURE_LPAR))
        hvc_vio_init_early();
 #endif
-   if (firmware_has_feature(FW_FEATURE_DABR))
-       ppc_md.set_dabr = pseries_set_dabr;
-   else if (firmware_has_feature(FW_FEATURE_XDABR))
+   if (firmware_has_feature(FW_FEATURE_XDABR))
        ppc_md.set_dabr = pseries_set_xdabr;
+   else if (firmware_has_feature(FW_FEATURE_DABR))
+       ppc_md.set_dabr = pseries_set_dabr;
 
    pSeries_cmo_feature_init();
    iommu_init_early_pSeries();
-- 
1.7.9.5
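
The ordering problem fixed by this patch is easy to reproduce in isolation:
when firmware advertises both features, whichever test comes first wins. The
sketch below is a stand-alone illustration; FEAT_DABR, FEAT_XDABR, enum
dabr_hook and both pick_*() functions are hypothetical names, not the kernel's
firmware feature API.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for FW_FEATURE_DABR/XDABR. */
#define FEAT_DABR  (1u << 0)
#define FEAT_XDABR (1u << 1)

/* Which hook would end up in ppc_md.set_dabr. */
enum dabr_hook { HOOK_NONE, HOOK_DABR, HOOK_XDABR };

/* Old order: DABR is tested first, so XDABR is only a fallback. */
static enum dabr_hook pick_old(unsigned int features)
{
    if (features & FEAT_DABR)
        return HOOK_DABR;
    else if (features & FEAT_XDABR)
        return HOOK_XDABR;
    return HOOK_NONE;
}

/* New order: prefer XDABR and fall back to DABR. */
static enum dabr_hook pick_new(unsigned int features)
{
    if (features & FEAT_XDABR)
        return HOOK_XDABR;
    else if (features & FEAT_DABR)
        return HOOK_DABR;
    return HOOK_NONE;
}
```

The two orderings only differ when both bits are set, which is the case
the commit message is concerned with.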



[PATCH 4/5] powerpc: Rework set_dabr so it can take a DABRX value as well

2012-09-05 Thread Michael Neuling
Rework set_dabr to take a DABRX value as well. We are not actually
changing any functionality at this stage, just preparing for that.

The SET_XDABR hcall checks to make sure DABRX is non-zero, and fails if
it is zero.  So in this case, when we are clearing both DABR and
DABRX, just set 1 bit to make sure the hcall doesn't fail.

Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/debug.h |2 +-
 arch/powerpc/include/asm/hw_breakpoint.h |2 +-
 arch/powerpc/include/asm/machdep.h   |3 ++-
 arch/powerpc/include/asm/processor.h |1 +
 arch/powerpc/include/asm/reg.h   |3 +++
 arch/powerpc/kernel/hw_breakpoint.c  |   12 ++--
 arch/powerpc/kernel/process.c|   14 +++---
 arch/powerpc/kernel/ptrace.c |3 +++
 arch/powerpc/kernel/signal.c |2 +-
 arch/powerpc/platforms/cell/beat.c   |4 ++--
 arch/powerpc/platforms/cell/beat.h   |2 +-
 arch/powerpc/platforms/ps3/setup.c   |6 ++
 arch/powerpc/platforms/pseries/setup.c   |9 ++---
 arch/powerpc/xmon/xmon.c |4 ++--
 14 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 716d2f0..32de257 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -44,7 +44,7 @@ static inline int debugger_dabr_match(struct pt_regs *regs) { return 0; }
 static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
-extern int set_dabr(unsigned long dabr);
+extern int set_dabr(unsigned long dabr, unsigned long dabrx);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 extern void do_send_trap(struct pt_regs *regs, unsigned long address,
 unsigned long error_code, int signal_code, int brkpt);
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index 39b323e..c6f48eb 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -61,7 +61,7 @@ extern void ptrace_triggered(struct perf_event *bp,
struct perf_sample_data *data, struct pt_regs *regs);
 static inline void hw_breakpoint_disable(void)
 {
-   set_dabr(0);
+   set_dabr(0, 0);
 }
 extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
 
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 42ce570..236b477 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -180,7 +180,8 @@ struct machdep_calls {
    void        (*enable_pmcs)(void);
 
    /* Set DABR for this platform, leave empty for default implemenation */
-   int     (*set_dabr)(unsigned long dabr);
+   int     (*set_dabr)(unsigned long dabr,
+                       unsigned long dabrx);
 
 #ifdef CONFIG_PPC32    /* XXX for now */
/* A general init function, called by ppc_init in init/main.c.
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 54b73a2..17b58e5 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -219,6 +219,7 @@ struct thread_struct {
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif
unsigned long   dabr;   /* Data address breakpoint register */
+   unsigned long   dabrx;  /*  ... extension  */
 #ifdef CONFIG_ALTIVEC
/* Complete AltiVec register set */
vector128   vr[32] __attribute__((aligned(16)));
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 6386086..334be34 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -208,6 +208,9 @@
 #define SPRN_DABRX 0x3F7   /* Data Address Breakpoint Register Extension */
 #define   DABRX_USER   (1UL << 0)
 #define   DABRX_KERNEL (1UL << 1)
+#define   DABRX_HYP    (1UL << 2)
+#define   DABRX_BTI    (1UL << 3)
+#define   DABRX_ALL    (DABRX_BTI | DABRX_HYP | DABRX_KERNEL | DABRX_USER)
 #define SPRN_DAR   0x013   /* Data Address Register */
 #define SPRN_DBCR  0x136   /* e300 Data Breakpoint Control Reg */
 #define SPRN_DSISR 0x012   /* Data Storage Interrupt Status Register */
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 6767445..6891d79 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -73,7 +73,7 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
 * If so, DABR will be populated in single_step_dabr_instruction().
 */
    if (current->thread.last_hit_ubp != bp)
-       set_dabr(info->address | info->type | DABR_TRANSLATION);
+       set_dabr(info->address | info->type | DABR_TRANSLATION, DABRX_ALL);
 
return 0;
 }
@@ -97,7 +97,7 @@ void arch_uninstall_hw_breakpoint(struct perf_event *bp)
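
The zero-DABRX case described in the commit message of this patch (4/5) can
be sketched as a small argument fix-up. This is only an illustration under
assumptions: xdabr_hcall_arg() is a hypothetical helper, and the choice of
DABRX_USER as the one bit to set is assumed, since the part of the patch that
makes this choice is not shown here.

```c
#include <assert.h>

/* Bit definitions as they appear in the reg.h hunk of this patch. */
#define DABRX_USER   (1UL << 0)
#define DABRX_KERNEL (1UL << 1)

/*
 * Hypothetical helper: return the DABRX value to hand to the hcall.
 * Clearing the breakpoint (dabr == 0 and dabrx == 0) would be rejected
 * by the hcall, so one bit is set in that case. DABR itself stays zero.
 */
static unsigned long xdabr_hcall_arg(unsigned long dabr, unsigned long dabrx)
{
    if (dabr == 0 && dabrx == 0)
        dabrx = DABRX_USER;  /* assumed choice of bit */
    return dabrx;
}
```

Any non-zero DABRX passes through unchanged, so only the "clear everything"
call is affected.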
  

[PATCH 5/5] powerpc: Dynamically calculate the dabrx based on kernel/user/hypervisor

2012-09-05 Thread Michael Neuling
Currently we mark the DABRX to interrupt on all matches
(hypervisor/kernel/user) and then filter in software.  We can be a lot
smarter now that we can set the DABRX dynamically.

This sets the DABRX based on the flags passed by the user.

Signed-off-by: Michael Neuling <mi...@neuling.org>
---
 arch/powerpc/include/asm/hw_breakpoint.h |1 +
 arch/powerpc/kernel/hw_breakpoint.c  |   15 +++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index c6f48eb..4234245 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -28,6 +28,7 @@
 
 struct arch_hw_breakpoint {
    unsigned long   address;
+   unsigned long   dabrx;
    int             type;
    u8              len; /* length of the target data symbol */
    bool            extraneous_interrupt;
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index 6891d79..a89cae4 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -73,7 +73,7 @@ int arch_install_hw_breakpoint(struct perf_event *bp)
     * If so, DABR will be populated in single_step_dabr_instruction().
     */
    if (current->thread.last_hit_ubp != bp)
-       set_dabr(info->address | info->type | DABR_TRANSLATION, DABRX_ALL);
+       set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
 
    return 0;
 }
@@ -170,6 +170,13 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
 
    info->address = bp->attr.bp_addr;
    info->len = bp->attr.bp_len;
+   info->dabrx = DABRX_ALL;
+   if (bp->attr.exclude_user)
+       info->dabrx &= ~DABRX_USER;
+   if (bp->attr.exclude_kernel)
+       info->dabrx &= ~DABRX_KERNEL;
+   if (bp->attr.exclude_hv)
+       info->dabrx &= ~DABRX_HYP;
 
    /*
     * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
@@ -197,7 +204,7 @@ void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs)
 
    info = counter_arch_bp(tsk->thread.last_hit_ubp);
    regs->msr &= ~MSR_SE;
-   set_dabr(info->address | info->type | DABR_TRANSLATION, DABRX_ALL);
+   set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
    tsk->thread.last_hit_ubp = NULL;
 }
 
@@ -281,7 +288,7 @@ int __kprobes hw_breakpoint_handler(struct die_args *args)
    if (!info->extraneous_interrupt)
        perf_bp_event(bp, regs);
 
-   set_dabr(info->address | info->type | DABR_TRANSLATION, DABRX_ALL);
+   set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
 out:
    rcu_read_unlock();
    return rc;
@@ -313,7 +320,7 @@ int __kprobes single_step_dabr_instruction(struct die_args *args)
    if (!info->extraneous_interrupt)
        perf_bp_event(bp, regs);
 
-   set_dabr(info->address | info->type | DABR_TRANSLATION, DABRX_ALL);
+   set_dabr(info->address | info->type | DABR_TRANSLATION, info->dabrx);
    current->thread.last_hit_ubp = NULL;
 
    /*
-- 
1.7.9.5
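
The mask calculation added to arch_validate_hwbkpt_settings() can be
exercised on its own. The sketch below mirrors the logic of the hunk in a
stand-alone function; dabrx_from_excludes() is a hypothetical name, and its
three parameters stand in for the exclude_user, exclude_kernel and exclude_hv
bits of the perf event attributes.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit definitions as introduced in patch 4/5. */
#define DABRX_USER   (1UL << 0)
#define DABRX_KERNEL (1UL << 1)
#define DABRX_HYP    (1UL << 2)
#define DABRX_BTI    (1UL << 3)
#define DABRX_ALL    (DABRX_BTI | DABRX_HYP | DABRX_KERNEL | DABRX_USER)

/* Start from "match everywhere" and drop each excluded privilege level. */
static unsigned long dabrx_from_excludes(bool exclude_user,
                                         bool exclude_kernel,
                                         bool exclude_hv)
{
    unsigned long dabrx = DABRX_ALL;

    if (exclude_user)
        dabrx &= ~DABRX_USER;
    if (exclude_kernel)
        dabrx &= ~DABRX_KERNEL;
    if (exclude_hv)
        dabrx &= ~DABRX_HYP;
    return dabrx;
}
```

For example, a user-only breakpoint (kernel and hypervisor excluded) keeps
DABRX_USER and DABRX_BTI set. The DABRX_BTI bit is never cleared by this
logic.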



Re: [PATCH 5/5] powerpc: Dynamically calculate the dabrx based on kernel/user/hypervisor

2012-09-05 Thread Michael Neuling
Michael Neuling <mi...@neuling.org> wrote:

> Currently we mark the DABRX to interrupt on all matches
> (hypervisor/kernel/user) and then filter in software.  We can be a lot
> smarter now that we can set the DABRX dynamically.
> 
> This sets the DABRX based on the flags passed by the user.
> 
> Signed-off-by: Michael Neuling <mi...@neuling.org>


For what it's worth, I have this test case now:

  http://neuling.org/devel/junkcode/hw_brk_test.c

Mikey


Re: [PATCH 4/5] powerpc: Rework set_dabr so it can take a DABRX value as well

2012-09-05 Thread Geert Uytterhoeven
On Thu, Sep 6, 2012 at 7:17 AM, Michael Neuling <mi...@neuling.org> wrote:
> Rework set_dabr to take a DABRX value as well. We are not actually
> changing any functionality at this stage, just preparing for that.

You are changing functionality.

>  #define   DABRX_USER   (1UL << 0)
>  #define   DABRX_KERNEL (1UL << 1)
> +#define   DABRX_HYP    (1UL << 2)
> +#define   DABRX_BTI    (1UL << 3)
> +#define   DABRX_ALL    (DABRX_BTI | DABRX_HYP | DABRX_KERNEL | DABRX_USER)

> --- a/arch/powerpc/platforms/cell/beat.c
> +++ b/arch/powerpc/platforms/cell/beat.c
> @@ -136,9 +136,9 @@ ssize_t beat_nvram_get_size(void)
>         return BEAT_NVRAM_SIZE;
>  }
>
> -int beat_set_xdabr(unsigned long dabr)
> +int beat_set_xdabr(unsigned long dabr, unsigned long dabrx)
>  {
> -       if (beat_set_dabr(dabr, DABRX_KERNEL | DABRX_USER))
> +       if (beat_set_dabr(dabr, dabrx))
>                 return -1;
>         return 0;
>  }

> --- a/arch/powerpc/platforms/ps3/setup.c
> +++ b/arch/powerpc/platforms/ps3/setup.c
> @@ -184,11 +184,9 @@ early_param("ps3flash", early_parse_ps3flash);
>  #define prealloc_ps3flash_bounce_buffer()  do { } while (0)
>  #endif
>
> -static int ps3_set_dabr(unsigned long dabr)
> +static int ps3_set_dabr(unsigned long dabr, unsigned long dabrx)
>  {
> -       enum {DABR_USER = 1, DABR_KERNEL = 2,};
> -
> -       return lv1_set_dabr(dabr, DABR_KERNEL | DABR_USER) ? -1 : 0;
> +       return lv1_set_dabr(dabr, dabrx) ? -1 : 0;
>  }

> -       set_dabr(dabr.address | (dabr.enabled & 7));
> +       set_dabr(dabr.address | (dabr.enabled & 7), DABRX_ALL);

Before, beat_set_dabr() and lv1_set_dabr() would have been called with dabrx = 3
(DABRX_KERNEL | DABRX_USER). Now they're called with dabrx = 15
(DABRX_ALL = DABRX_BTI | DABRX_HYP | DABRX_KERNEL | DABRX_USER).

No idea what's the impact of this...
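
The two values quoted above follow directly from the bit definitions. The
sketch below works them out; dabrx_before(), dabrx_after() and
dabrx_clamped() are hypothetical helpers, and the clamp is only one possible
way to keep the old behaviour on these platforms, not something taken from
the patch.

```c
#include <assert.h>

/* Bit definitions as quoted from the patch. */
#define DABRX_USER   (1UL << 0)
#define DABRX_KERNEL (1UL << 1)
#define DABRX_HYP    (1UL << 2)
#define DABRX_BTI    (1UL << 3)
#define DABRX_ALL    (DABRX_BTI | DABRX_HYP | DABRX_KERNEL | DABRX_USER)

/* Value the beat and ps3 hooks hard-coded before the patch. */
static unsigned long dabrx_before(void)
{
    return DABRX_KERNEL | DABRX_USER;   /* 2 + 1 */
}

/* Value they receive after the patch when callers pass DABRX_ALL. */
static unsigned long dabrx_after(void)
{
    return DABRX_ALL;                   /* 8 + 4 + 2 + 1 */
}

/* Hypothetical fix-up: keep only the bits the old code used. */
static unsigned long dabrx_clamped(unsigned long dabrx)
{
    return dabrx & (DABRX_KERNEL | DABRX_USER);
}
```

With such a clamp in the platform hook, callers could pass DABRX_ALL and the
hypervisor call would still see the same value as before the patch.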

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds