Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore

2014-02-24 Thread Deepthi Dharwar
On 02/22/2014 05:44 AM, Cody P Schafer wrote:
> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
> nothing. Add a pr_warn() to convince any users that they should stop
> using it.
> 
> The commit message from the removing commit notes that this
> functionality should move into the cpuidle driver, essentially by
> adjusting target_residency to the specified value. At the moment,
> target_residency is not exposed by cpuidle's sysfs, so there isn't a
> drop in replacement for this.
> 
> Signed-off-by: Cody P Schafer 



smt-snooze-delay was used to delay an entry into NAP state
or disable NAP state completely. This was before we adopted cpuidle
framework for idle state management on powerpc. This is per-cpu based
tunable, where we could have cores with  different target residencies
and idle states.

Now that we have moved towards cpuidle framework, which provides a
better way of idle state management and this framework expects a single
target residency for all the cpus. We can no longer honour
smt-snooze-delay functionality of providing per-cpu target residency.
This was badly broken in the kernel before the patch to clean it up.
By removing this we would honour cpuidle framework through which we
carry out idle state management.

And generic cpuidle framework does not provide the flexibility to change
target residency on the go as there are multiple idle states supported
and trying to change target residency of one state (incorrectly) may
result in undefined behavior.

Also, the second functionality to disable/enable states can be done
using the cpuidle sysfs files. So this is functionality is preserved.

We currently do not use smt-snooze-delay in the kernel.
The sysfs entries needs to  be retained until we do a clean up ppc64_cpu
util that uses these entries to determine SMT,
clean up patch for this has already been posted out by Prerna.
Once, we have the ppc64_cpu changes in, we can look to clean up these
parts from the kernel.

Regards,
Deepthi





> ---
>  arch/powerpc/kernel/sysfs.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 97e1dc9..84097b4 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
>   if (ret != 1)
>   return -EINVAL;
>  
> + pr_warn_ratelimited("%s (%d): 
> /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +   current->comm, task_pid_nr(current), cpu->dev.id);
> +
>   per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
>   return count;
>  }
> @@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  {
>   struct cpu *cpu = container_of(dev, struct cpu, dev);
>  
> + pr_warn_ratelimited("%s (%d): 
> /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +   current->comm, task_pid_nr(current), cpu->dev.id);
> +
>   return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
>  }
>  
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 5/5] powerpc/powernv: Refactor PHB diag-data dump

2014-02-24 Thread Gavin Shan
As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl: 0002
  RootSts: 000f 0040 b0830008 00100147 2000
  nFir: 0030006e 
  PhbSts:  001c 
  Lem: 0010 42498e327f502eae 
  InAErr:  8000 8000 04020300 \
   
  PE[  8] A/B: 8480002b 8000

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci.c |  220 +++---
 1 file changed, 125 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 3955fc0..114e1a7 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -134,57 +134,72 @@ static void pnv_pci_dump_p7ioc_diag_data(struct 
pci_controller *hose,
pr_info("P7IOC PHB#%d Diag-data (Version: %d)\n\n",
hose->global_number, common->version);
 
-   pr_info("  brdgCtl:  %08x\n", data->brdgCtl);
-
-   pr_info("  portStatusReg:%08x\n", data->portStatusReg);
-   pr_info("  rootCmplxStatus:  %08x\n", data->rootCmplxStatus);
-   pr_info("  busAgentStatus:   %08x\n", data->busAgentStatus);
-
-   pr_info("  deviceStatus: %08x\n", data->deviceStatus);
-   pr_info("  slotStatus:   %08x\n", data->slotStatus);
-   pr_info("  linkStatus:   %08x\n", data->linkStatus);
-   pr_info("  devCmdStatus: %08x\n", data->devCmdStatus);
-   pr_info("  devSecStatus: %08x\n", data->devSecStatus);
-
-   pr_info("  rootErrorStatus:  %08x\n", data->rootErrorStatus);
-   pr_info("  uncorrErrorStatus:%08x\n", data->uncorrErrorStatus);
-   pr_info("  corrErrorStatus:  %08x\n", data->corrErrorStatus);
-   pr_info("  tlpHdr1:  %08x\n", data->tlpHdr1);
-   pr_info("  tlpHdr2:  %08x\n", data->tlpHdr2);
-   pr_info("  tlpHdr3:  %08x\n", data->tlpHdr3);
-   pr_info("  tlpHdr4:  %08x\n", data->tlpHdr4);
-   pr_info("  sourceId: %08x\n", data->sourceId);
-   pr_info("  errorClass:   %016llx\n", data->errorClass);
-   pr_info("  correlator:   %016llx\n", data->correlator);
-   pr_info("  p7iocPlssr:   %016llx\n", data->p7iocPlssr);
-   pr_info("  p7iocCsr: %016llx\n", data->p7iocCsr);
-   pr_info("  lemFir:   %016llx\n", data->lemFir);
-   pr_info("  lemErrorMask: %016llx\n", data->lemErrorMask);
-   pr_info("  lemWOF:   %016llx\n", data->lemWOF);
-   pr_info("  phbErrorStatus:   %016llx\n", data->phbErrorStatus);
-   pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
-   pr_info("  phbErrorLog0: %016llx\n", data->phbErrorLog0);
-   pr_info("  phbErrorLog1: %016llx\n", data->phbErrorLog1);
-   pr_info("  mmioErrorStatus:  %016llx\n", data->mmioErrorStatus);
-   pr_info("  mmioFirstErrorStatus: %016llx\n", 
data->mmioFirstErrorStatus);
-   pr_info("  mmioErrorLog0:%016llx\n", data->mmioErrorLog0);
-   pr_info("  mmioErrorLog1:%016llx\n", data->mmioErrorLog1);
-   pr_info("  dma0ErrorStatus:  %016llx\n", data->dma0ErrorStatus);
-   pr_info("  dma0FirstErrorStatus: %016llx\n", 
data->dma0FirstErrorStatus);
-   pr_info("  dma0ErrorLog0:%016llx\n", data->dma0ErrorLog0);
-   pr_info("  dma0ErrorLog1:%016llx\n", data->dma0ErrorLog1);
-   pr_info("  dma1ErrorStatus:  %016llx\n", data->dma1ErrorStatus);
-   pr_info("  dma1FirstErrorStatus: %016llx\n", 
data->dma1FirstErrorStatus);
-   pr_info("  dma1ErrorLog0:%016llx\n", data->dma1ErrorLog0);
-   pr_info("  dma1ErrorLog1:%016llx\n", data->dma1ErrorLog1);
+   if (data->brdgCtl)
+   pr_info("  brdgCtl: %08x\n",
+   data->brdgCtl);
+   if (data->portStatusReg || data->rootCmplxStatus ||
+   data->busAgentStatus)
+   pr_info("  UtlSts:  %08x %08x %08x\n",
+   data->portStatusReg, data->rootCmplxStatus,
+   data->busAgentStatus);
+   if (data->deviceStatus || data->slotStatus   ||
+   data->linkStatus   || data->devCmdStatus ||
+   data->devSecStatus)
+   pr_info("  RootSts: %08x %08x %08x %08x %08x\n",
+   data->deviceStatus, data->slotStatus,
+   data->linkStatus, data->devCmdStatus,
+   data->devSecStatus);
+   if (data->rootErrorStatus   || data->uncorrErrorStatus ||
+

[PATCH 4/5] powerpc/powernv: Dump PHB diag-data immediately

2014-02-24 Thread Gavin Shan
The PHB diag-data is useful to help locating the root cause for
frozen PE or fenced PHB. However, EEH core enables IO path by clearing
part of HW registers before collecting it and eventually we got broken
PHB diag-data.

The patch intends to fix it by dumping the PHB diag-data immediately
when frozen/fenced state on PE or PHB is detected for the first time
in eeh_ops::get_state() or next_error() backend.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   79 +++--
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 04b4710..6dba684 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -114,6 +114,22 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, 
ioda_eeh_inbB_dbgfs_get,
ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
+static void ioda_eeh_phb_diag(struct pci_controller *hose)
+{
+   struct pnv_phb *phb = hose->private_data;
+   long rc;
+
+   rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
+PNV_PCI_DIAG_BUF_SIZE);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Failed to get diag-data for PHB#%x (%ld)\n",
+   __func__, hose->global_number, rc);
+   return;
+   }
+
+   pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
+}
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -272,6 +288,9 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
result |= EEH_STATE_DMA_ACTIVE;
result |= EEH_STATE_MMIO_ENABLED;
result |= EEH_STATE_DMA_ENABLED;
+   } else if (!(pe->state & EEH_PE_ISOLATED)) {
+   eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
+   ioda_eeh_phb_diag(hose);
}
 
return result;
@@ -315,6 +334,15 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
   __func__, fstate, hose->global_number, pe_no);
}
 
+   /* Dump PHB diag-data for frozen PE */
+   if (result != EEH_STATE_NOT_SUPPORT &&
+   (result & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) !=
+   (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) &&
+   !(pe->state & EEH_PE_ISOLATED)) {
+   eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
+   ioda_eeh_phb_diag(hose);
+   }
+
return result;
 }
 
@@ -541,27 +569,6 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
 static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
char *drv_log, unsigned long len)
 {
-   s64 ret;
-   unsigned long flags;
-   struct pci_controller *hose = pe->phb;
-   struct pnv_phb *phb = hose->private_data;
-
-   spin_lock_irqsave(&phb->lock, flags);
-
-   ret = opal_pci_get_phb_diag_data2(phb->opal_id,
-   phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
-   if (ret) {
-   spin_unlock_irqrestore(&phb->lock, flags);
-   pr_warning("%s: Can't get log for PHB#%x-PE#%x (%lld)\n",
-  __func__, hose->global_number, pe->addr, ret);
-   return -EIO;
-   }
-
-   /* The PHB diag-data is always indicative */
-   pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
-
-   spin_unlock_irqrestore(&phb->lock, flags);
-
return 0;
 }
 
@@ -646,22 +653,6 @@ static void ioda_eeh_hub_diag(struct pci_controller *hose)
}
 }
 
-static void ioda_eeh_phb_diag(struct pci_controller *hose)
-{
-   struct pnv_phb *phb = hose->private_data;
-   long rc;
-
-   rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
-PNV_PCI_DIAG_BUF_SIZE);
-   if (rc != OPAL_SUCCESS) {
-   pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n",
-   __func__, hose->global_number, rc);
-   return;
-   }
-
-   pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
-}
-
 static int ioda_eeh_get_pe(struct pci_controller *hose,
   u16 pe_no, struct eeh_pe **pe)
 {
@@ -809,6 +800,20 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
}
 
/*
+* EEH core will try recover from fenced PHB or
+* frozen PE. In the time for frozen PE, EEH core
+* enable IO path for that before collecting logs,
+* but it ruins the site. So we have to dump the
+* log in advance here.
+*/
+   if ((ret == EEH_NEXT_ERR_FROZEN_PE  ||
+   ret == EEH_NEXT_ERR_FENCED_PHB) &&
+   !((*pe)->state & EEH_PE_ISOLATED)) {
+ 

[PATCH 3/5] powerpc/powernv: Move PNV_EEH_STATE_ENABLED around

2014-02-24 Thread Gavin Shan
The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
which is protected by CONFIG_EEH. We needn't that. Instead, we
can have pnv_phb::flags and maintain all flags there, which is
the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
to PNV_PHB_FLAG_EEH.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |2 +-
 arch/powerpc/platforms/powernv/pci.c  |8 ++--
 arch/powerpc/platforms/powernv/pci.h  |7 +++
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0d1d424..04b4710 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -153,7 +153,7 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
}
 #endif
 
-   phb->eeh_state |= PNV_EEH_STATE_ENABLED;
+   phb->flags |= PNV_PHB_FLAG_EEH;
 
return 0;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 95633d7..3955fc0 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -396,7 +396,7 @@ int pnv_pci_cfg_read(struct device_node *dn,
if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
return PCIBIOS_SUCCESSFUL;
 
-   if (phb->eeh_state & PNV_EEH_STATE_ENABLED) {
+   if (phb->flags & PNV_PHB_FLAG_EEH) {
if (*val == EEH_IO_ERROR_VALUE(size) &&
eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
return PCIBIOS_DEVICE_NOT_FOUND;
@@ -434,12 +434,8 @@ int pnv_pci_cfg_write(struct device_node *dn,
}
 
/* Check if the PHB got frozen due to an error (no response) */
-#ifdef CONFIG_EEH
-   if (!(phb->eeh_state & PNV_EEH_STATE_ENABLED))
+   if (!(phb->flags & PNV_PHB_FLAG_EEH))
pnv_pci_config_check_eeh(phb, dn);
-#else
-   pnv_pci_config_check_eeh(phb, dn);
-#endif
 
return PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 6870f60..94e3495 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -81,24 +81,23 @@ struct pnv_eeh_ops {
int (*configure_bridge)(struct eeh_pe *pe);
int (*next_error)(struct eeh_pe **pe);
 };
-
-#define PNV_EEH_STATE_ENABLED  (1 << 0)/* EEH enabled  */
-
 #endif /* CONFIG_EEH */
 
+#define PNV_PHB_FLAG_EEH   (1 << 0)
+
 struct pnv_phb {
struct pci_controller   *hose;
enum pnv_phb_type   type;
enum pnv_phb_model  model;
u64 hub_id;
u64 opal_id;
+   int flags;
void __iomem*regs;
int initialized;
spinlock_t  lock;
 
 #ifdef CONFIG_EEH
struct pnv_eeh_ops  *eeh_ops;
-   int eeh_state;
 #endif
 
 #ifdef CONFIG_DEBUG_FS
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/5] powerpc/powernv: Remove PNV_EEH_STATE_REMOVED

2014-02-24 Thread Gavin Shan
The PHB state PNV_EEH_STATE_REMOVED maintained in pnv_phb isn't
so useful any more and it's duplicated to EEH_PE_ISOLATED. The
patch replaces PNV_EEH_STATE_REMOVED with EEH_PE_ISOLATED.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   56 -
 arch/powerpc/platforms/powernv/pci.h  |1 -
 2 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index f514743..0d1d424 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -662,22 +662,6 @@ static void ioda_eeh_phb_diag(struct pci_controller *hose)
pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
 }
 
-static int ioda_eeh_get_phb_pe(struct pci_controller *hose,
-  struct eeh_pe **pe)
-{
-   struct eeh_pe *phb_pe;
-
-   phb_pe = eeh_phb_pe_get(hose);
-   if (!phb_pe) {
-   pr_warning("%s Can't find PE for PHB#%d\n",
-  __func__, hose->global_number);
-   return -EEXIST;
-   }
-
-   *pe = phb_pe;
-   return 0;
-}
-
 static int ioda_eeh_get_pe(struct pci_controller *hose,
   u16 pe_no, struct eeh_pe **pe)
 {
@@ -685,7 +669,8 @@ static int ioda_eeh_get_pe(struct pci_controller *hose,
struct eeh_dev dev;
 
/* Find the PHB PE */
-   if (ioda_eeh_get_phb_pe(hose, &phb_pe))
+   phb_pe = eeh_phb_pe_get(hose);
+   if (!phb_pe)
return -EEXIST;
 
/* Find the PE according to PE# */
@@ -713,6 +698,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 {
struct pci_controller *hose;
struct pnv_phb *phb;
+   struct eeh_pe *phb_pe;
u64 frozen_pe_no;
u16 err_type, severity;
long rc;
@@ -729,10 +715,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
list_for_each_entry(hose, &hose_list, list_node) {
/*
 * If the subordinate PCI buses of the PHB has been
-* removed, we needn't take care of it any more.
+* removed or is exactly under error recovery, we
+* needn't take care of it any more.
 */
phb = hose->private_data;
-   if (phb->eeh_state & PNV_EEH_STATE_REMOVED)
+   phb_pe = eeh_phb_pe_get(hose);
+   if (!phb_pe || (phb_pe->state & EEH_PE_ISOLATED))
continue;
 
rc = opal_pci_next_error(phb->opal_id,
@@ -765,12 +753,6 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
switch (err_type) {
case OPAL_EEH_IOC_ERROR:
if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-   list_for_each_entry(hose, &hose_list,
-   list_node) {
-   phb = hose->private_data;
-   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
-   }
-
pr_err("EEH: dead IOC detected\n");
ret = EEH_NEXT_ERR_DEAD_IOC;
} else if (severity == OPAL_EEH_SEV_INF) {
@@ -783,17 +765,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
break;
case OPAL_EEH_PHB_ERROR:
if (severity == OPAL_EEH_SEV_PHB_DEAD) {
-   if (ioda_eeh_get_phb_pe(hose, pe))
-   break;
-
+   *pe = phb_pe;
pr_err("EEH: dead PHB#%x detected\n",
hose->global_number);
-   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
ret = EEH_NEXT_ERR_DEAD_PHB;
} else if (severity == OPAL_EEH_SEV_PHB_FENCED) {
-   if (ioda_eeh_get_phb_pe(hose, pe))
-   break;
-
+   *pe = phb_pe;
pr_err("EEH: fenced PHB#%x detected\n",
hose->global_number);
ret = EEH_NEXT_ERR_FENCED_PHB;
@@ -813,15 +790,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 * fenced PHB so that it can be recovered.
 */
if (ioda_eeh_get_pe(hose, frozen_pe_no, pe)) {
-   if (!ioda_eeh_get_phb_pe(hose, pe)) {
-   pr_err("EEH: Escalated fenced PHB#%x "
-  "detected for PE#%llx\n",
-   hose->global_number,
-   frozen_pe_no);
-  

[PATCH v3 0/5] EEH improvement

2014-02-24 Thread Gavin Shan
The series of patches intends to improve reliability of EEH on PowerNV
platform. First all, we have had multiple duplicate states (flags) for
PHB and PE, so we remove those duplicate states to simplify the code.
Besides, we had corrupted PHB diag-data for case of frozen PE. In order
to solve the problem, we introduce eeh_ops->event() and notifications
are sent from EEH core to (PowerNV) platform on creating or destroying
PE instance so that we can allocate or free PHB diag-data backend. Then
we cache the PHB diag-data on the first call to eeh_ops->get_state()
and dump it afterwards, which helps to get correct PHB diag-data.

With the patchset applied, we never dump PHB diag-data for INF errors.
Instead, we just maintain statistics in /proc/powerpc/eeh_inf_err. Also,
we changed the PHB diag-data dump format for a bit to have multiple
fields per line and omits the line with all zero'd fields as Ben suggested.

v2 -> v3:
* We don't cache the PHB diag-data, instead we just grab and
  dump PHB diag-data on the first catch-up to avoid broken
  PHB diag-data.
v1 -> v2:
* Amending commit logs
* Support eeh_ops->event() and maintain PHB diag-data on basis
  of PE instance
* When dumping PHB diag-data, to replace "-" with "" and
  omit the line if the fields of it are all zeros.

---

arch/powerpc/include/asm/eeh.h|1 -
arch/powerpc/kernel/eeh.c |   10 +---
arch/powerpc/kernel/eeh_driver.c  |   10 ++--
arch/powerpc/platforms/powernv/eeh-ioda.c |  137 
--
arch/powerpc/platforms/powernv/pci.c  |  228 
++---
arch/powerpc/platforms/powernv/pci.h  |8 +--
6 files changed, 195 insertions(+), 199 deletions(-)

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/5] powerpc/eeh: Remove EEH_PE_PHB_DEAD

2014-02-24 Thread Gavin Shan
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to
EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD
would be removed from the system. However, it's safe to replace
that with EEH_PE_ISOLATED.

The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled,
either failure or success. It makes the PHB PE state consistent with:

PHB functions normallyNONE
PHB has been removed  EEH_PE_ISOLATED
PHB fenced, recovery in progress  EEH_PE_ISOLATED | RECOVERING

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h   |1 -
 arch/powerpc/kernel/eeh.c|   10 ++
 arch/powerpc/kernel/eeh_driver.c |   10 +-
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d4dd41f..a61b06f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -53,7 +53,6 @@ struct device_node;
 
 #define EEH_PE_ISOLATED(1 << 0)/* Isolated PE  
*/
 #define EEH_PE_RECOVERING  (1 << 1)/* Recovering PE*/
-#define EEH_PE_PHB_DEAD(1 << 2)/* Dead PHB 
*/
 
 #define EEH_PE_KEEP(1 << 8)/* Keep PE on hotplug   */
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e7b76a6..f167676 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -232,7 +232,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 {
size_t loglen = 0;
struct eeh_dev *edev, *tmp;
-   bool valid_cfg_log = true;
 
/*
 * When the PHB is fenced or dead, it's pointless to collect
@@ -240,12 +239,7 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 * 0xFF's. For ER, we still retrieve the data from the PCI
 * config space.
 */
-   if (eeh_probe_mode_dev() &&
-   (pe->type & EEH_PE_PHB) &&
-   (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
-   valid_cfg_log = false;
-
-   if (valid_cfg_log) {
+   if (!(pe->type & EEH_PE_PHB)) {
eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
eeh_ops->configure_bridge(pe);
eeh_pe_restore_bars(pe);
@@ -309,7 +303,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 
/* If the PHB has been in problematic state */
eeh_serialize_lock(&flags);
-   if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) {
+   if (phb_pe->state & EEH_PE_ISOLATED) {
ret = 0;
goto out;
}
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fdc679d..4cf0467 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -665,8 +665,7 @@ static void eeh_handle_special_event(void)
phb_pe = eeh_phb_pe_get(hose);
if (!phb_pe) continue;
 
-   eeh_pe_state_mark(phb_pe,
-   EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+   eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
}
 
eeh_serialize_unlock(flags);
@@ -682,8 +681,7 @@ static void eeh_handle_special_event(void)
eeh_remove_event(pe);
 
if (rc == EEH_NEXT_ERR_DEAD_PHB)
-   eeh_pe_state_mark(pe,
-   EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+   eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
else
eeh_pe_state_mark(pe,
EEH_PE_ISOLATED | EEH_PE_RECOVERING);
@@ -707,12 +705,14 @@ static void eeh_handle_special_event(void)
if (rc == EEH_NEXT_ERR_FROZEN_PE ||
rc == EEH_NEXT_ERR_FENCED_PHB) {
eeh_handle_normal_event(pe);
+   eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
} else {
pci_lock_rescan_remove();
list_for_each_entry(hose, &hose_list, list_node) {
phb_pe = eeh_phb_pe_get(hose);
if (!phb_pe ||
-   !(phb_pe->state & EEH_PE_PHB_DEAD))
+   !(phb_pe->state & EEH_PE_ISOLATED) ||
+   (phb_pe->state & EEH_PE_RECOVERING))
continue;
 
/* Notify all devices to be down */
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/9] EEH improvement

2014-02-24 Thread Gavin Shan
On Tue, Feb 25, 2014 at 01:37:41PM +0800, Gavin Shan wrote:
>The series of patches intends to improve reliability of EEH on PowerNV
>platform. First all, we have had multiple duplicate states (flags) for
>PHB and PE, so we remove those duplicate states to simplify the code.
>Besides, we had corrupted PHB diag-data for case of frozen PE. In order
>to solve the problem, we introduce eeh_ops->event() and notifications
>are sent from EEH core to (PowerNV) platform on creating or destroying
>PE instance so that we can allocate or free PHB diag-data backend. Then
>we cache the PHB diag-data on the first call to eeh_ops->get_state()
>and dump it afterwards, which helps to get correct PHB diag-data.
>
>With the patchset applied, we never dump PHB diag-data for INF errors.
>Instead, we just maintain statistics in /proc/powerpc/eeh_inf_err. Also,
>we changed the PHB diag-data dump format for a bit to have multiple
>fields per line and omits the line with all zero'd fields as Ben suggested.
>
>
>v1 -> v2:
>   * Amending commit logs
>   * Support eeh_ops->event() and maintain PHB diag-data on basis
> of PE instance
>   * When dumping PHB diag-data, to replace "-" with "" and
> omit the line if the fields of it are all zeros.
>

Please ignore this and I'm going to send out v3 where we just
grab and dump the PHB diag-data (without cache any more) as
Ben suggested :-)

Thanks,
Gavin

>---
>
>arch/powerpc/include/asm/eeh.h   |7 ++-
>arch/powerpc/kernel/eeh.c|   10 +---
>arch/powerpc/kernel/eeh_driver.c |   10 ++--
>arch/powerpc/kernel/eeh_pe.c |   39 -
>arch/powerpc/platforms/powernv/eeh-ioda.c|  193 
>-
>arch/powerpc/platforms/powernv/eeh-powernv.c |   74 +++-
>arch/powerpc/platforms/powernv/pci.c |  228 
>+-
>arch/powerpc/platforms/powernv/pci.h |   11 ++--
>arch/powerpc/platforms/pseries/eeh_pseries.c |3 +-
>9 files changed, 358 insertions(+), 217 deletions(-)
>
>Thanks,
>Gavin
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_ranges

2014-02-24 Thread Andrew Murray
This patch updates the implementation of pci_process_bridge_OF_ranges to use
the of_pci_range_parser helpers.

Signed-off-by: Andrew Murray 
---
I've verified that this builds, however I have no hardware to test this.
---
 arch/powerpc/kernel/pci-common.c | 88 +---
 1 file changed, 29 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index d9476c1..a05fe18 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -666,60 +666,36 @@ void pci_resource_to_user(const struct pci_dev *dev, int 
bar,
 void pci_process_bridge_OF_ranges(struct pci_controller *hose,
  struct device_node *dev, int primary)
 {
-   const __be32 *ranges;
-   int rlen;
-   int pna = of_n_addr_cells(dev);
-   int np = pna + 5;
int memno = 0;
-   u32 pci_space;
-   unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
struct resource *res;
+   struct of_pci_range range;
+   struct of_pci_range_parser parser;
 
printk(KERN_INFO "PCI host bridge %s %s ranges:\n",
   dev->full_name, primary ? "(primary)" : "");
 
-   /* Get ranges property */
-   ranges = of_get_property(dev, "ranges", &rlen);
-   if (ranges == NULL)
+   /* Check for ranges property */
+   if (of_pci_range_parser_init(&parser, dev))
return;
 
/* Parse it */
-   while ((rlen -= np * 4) >= 0) {
-   /* Read next ranges element */
-   pci_space = of_read_number(ranges, 1);
-   pci_addr = of_read_number(ranges + 1, 2);
-   cpu_addr = of_translate_address(dev, ranges + 3);
-   size = of_read_number(ranges + pna + 3, 2);
-   ranges += np;
-
+   for_each_of_pci_range(&parser, &range) {
/* If we failed translation or got a zero-sized region
 * (some FW try to feed us with non sensical zero sized regions
 * such as power3 which look like some kind of attempt at 
exposing
 * the VGA memory hole)
 */
-   if (cpu_addr == OF_BAD_ADDR || size == 0)
+   if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
continue;
 
-   /* Now consume following elements while they are contiguous */
-   for (; rlen >= np * sizeof(u32);
-ranges += np, rlen -= np * 4) {
-   if (of_read_number(ranges, 1) != pci_space)
-   break;
-   pci_next = of_read_number(ranges + 1, 2);
-   cpu_next = of_translate_address(dev, ranges + 3);
-   if (pci_next != pci_addr + size ||
-   cpu_next != cpu_addr + size)
-   break;
-   size += of_read_number(ranges + pna + 3, 2);
-   }
-
/* Act based on address space type */
res = NULL;
-   switch ((pci_space >> 24) & 0x3) {
-   case 1: /* PCI IO space */
+   switch (range.flags & IORESOURCE_TYPE_BITS) {
+   case IORESOURCE_IO:
printk(KERN_INFO
   "  IO 0x%016llx..0x%016llx -> 0x%016llx\n",
-  cpu_addr, cpu_addr + size - 1, pci_addr);
+  range.cpu_addr, range.cpu_addr + range.size - 1,
+  range.pci_addr);
 
/* We support only one IO range */
if (hose->pci_io_size) {
@@ -729,11 +705,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller 
*hose,
}
 #ifdef CONFIG_PPC32
/* On 32 bits, limit I/O space to 16MB */
-   if (size > 0x0100)
-   size = 0x0100;
+   if (range.size > 0x0100)
+   range.size = 0x0100;
 
/* 32 bits needs to map IOs here */
-   hose->io_base_virt = ioremap(cpu_addr, size);
+   hose->io_base_virt = ioremap(range.cpu_addr,
+   range.size);
 
/* Expect trouble if pci_addr is not 0 */
if (primary)
@@ -743,20 +720,20 @@ void pci_process_bridge_OF_ranges(struct pci_controller 
*hose,
/* pci_io_size and io_base_phys always represent IO
 * space starting at 0 so we factor in pci_addr
 */
-   hose->pci_io_size = pci_addr + size;
-   hose->io_base_phys = cpu_addr - pci_addr;
+   hose->pci_io_size = range.pci_addr + range.size;
+   h

Re: [PATCH 7/7] powerpc: Added PCI MSI support using the HSTA module

2014-02-24 Thread Alistair Popple
On Sat, 22 Feb 2014 07:41:26 Benjamin Herrenschmidt wrote:
> On Fri, 2014-02-21 at 15:33 +0100, Arnd Bergmann wrote:

[...]

> 
> Should we (provided it's possible in HW) create two ranges instead ? One
> covering RAM and one covering MSIs ? To avoid stray DMAs whacking random
> HW registers in the chip ...
> 

The thought occurred to me but I figured if we had stray DMAs then they could 
already whack random bits of system memory which would likely break your 
system anyway so I wasn't sure how much we'd gain. I guess whacking random HW 
registers is arguably a bit worse though.

I did a bit of digging into the HW documentation and it looks like it _may_ be 
possible to create a second range that would limit access to a subset of HW 
registers, although there doesn't seem to be much flexibility. Personally I'm 
not sure it justifies the work, but I'm happy to look into it a bit more if 
you feel it's important?

- Alistair

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 5/9] powerpc/eeh: Introduce eeh_ops->event()

2014-02-24 Thread Gavin Shan
The patch introduces eeh_ops->event() so that we can pass various
events to underly platform. One reason to have that is to allocate
or free PHB diag-data for individual PEs on PowerNV platform in
future when EEH core to create or destroy PE instances.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h |6 ++
 arch/powerpc/kernel/eeh_pe.c   |   14 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index a61b06f..8fd1c2d 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -71,6 +71,7 @@ struct eeh_pe {
struct list_head child_list;/* Link PE to the child list*/
struct list_head edevs; /* Link list of EEH devices */
struct list_head child; /* Child PEs*/
+   void *data; /* Platform dependent data  */
 };
 
 #define eeh_pe_for_each_dev(pe, edev, tmp) \
@@ -151,6 +152,10 @@ enum {
 #define EEH_LOG_TEMP   1   /* EEH temporary error log  */
 #define EEH_LOG_PERM   2   /* EEH permanent error log  */
 
+/* EEH events sent to platform */
+#define EEH_EVENT_PE_ALLOC 0
+#define EEH_EVENT_PE_FREE  1
+
 struct eeh_ops {
char *name;
int (*init)(void);
@@ -168,6 +173,7 @@ struct eeh_ops {
int (*write_config)(struct device_node *dn, int where, int size, u32 
val);
int (*next_error)(struct eeh_pe **pe);
int (*restore_config)(struct device_node *dn);
+   int (*event)(int event, void *data);
 };
 
 extern struct eeh_ops *eeh_ops;
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 2add834..6cdc7a8 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -44,6 +44,7 @@ static LIST_HEAD(eeh_phb_pe);
 static struct eeh_pe *eeh_pe_alloc(struct pci_controller *phb, int type)
 {
struct eeh_pe *pe;
+   int ret;
 
/* Allocate PHB PE */
pe = kzalloc(sizeof(struct eeh_pe), GFP_KERNEL);
@@ -56,6 +57,16 @@ static struct eeh_pe *eeh_pe_alloc(struct pci_controller 
*phb, int type)
INIT_LIST_HEAD(&pe->child);
INIT_LIST_HEAD(&pe->edevs);
 
+   if (eeh_ops->event) {
+   ret = eeh_ops->event(EEH_EVENT_PE_ALLOC, pe);
+   if (ret) {
+   pr_warn("%s: Can't alloc PE (%d)\n",
+   __func__, ret);
+   kfree(pe);
+   return NULL;
+   }
+   }
+
return pe;
 }
 
@@ -77,6 +88,9 @@ static void eeh_pe_free(struct eeh_pe *pe)
return;
}
 
+   if (eeh_ops->event)
+   eeh_ops->event(EEH_EVENT_PE_FREE, pe);
+
kfree(pe);
 }
 
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 9/9] powerpc/powernv: Refactor PHB diag-data dump

2014-02-24 Thread Gavin Shan
As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl: 0002
  RootSts: 000f 0040 b0830008 00100147 2000
  nFir: 0030006e 
  PhbSts:  001c 
  Lem: 0010 42498e327f502eae 
  InAErr:  8000 8000 04020300 \
   
  PE[  8] A/B: 8480002b 8000

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci.c |  220 +++---
 1 file changed, 125 insertions(+), 95 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 3955fc0..114e1a7 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -134,57 +134,72 @@ static void pnv_pci_dump_p7ioc_diag_data(struct 
pci_controller *hose,
pr_info("P7IOC PHB#%d Diag-data (Version: %d)\n\n",
hose->global_number, common->version);
 
-   pr_info("  brdgCtl:  %08x\n", data->brdgCtl);
-
-   pr_info("  portStatusReg:%08x\n", data->portStatusReg);
-   pr_info("  rootCmplxStatus:  %08x\n", data->rootCmplxStatus);
-   pr_info("  busAgentStatus:   %08x\n", data->busAgentStatus);
-
-   pr_info("  deviceStatus: %08x\n", data->deviceStatus);
-   pr_info("  slotStatus:   %08x\n", data->slotStatus);
-   pr_info("  linkStatus:   %08x\n", data->linkStatus);
-   pr_info("  devCmdStatus: %08x\n", data->devCmdStatus);
-   pr_info("  devSecStatus: %08x\n", data->devSecStatus);
-
-   pr_info("  rootErrorStatus:  %08x\n", data->rootErrorStatus);
-   pr_info("  uncorrErrorStatus:%08x\n", data->uncorrErrorStatus);
-   pr_info("  corrErrorStatus:  %08x\n", data->corrErrorStatus);
-   pr_info("  tlpHdr1:  %08x\n", data->tlpHdr1);
-   pr_info("  tlpHdr2:  %08x\n", data->tlpHdr2);
-   pr_info("  tlpHdr3:  %08x\n", data->tlpHdr3);
-   pr_info("  tlpHdr4:  %08x\n", data->tlpHdr4);
-   pr_info("  sourceId: %08x\n", data->sourceId);
-   pr_info("  errorClass:   %016llx\n", data->errorClass);
-   pr_info("  correlator:   %016llx\n", data->correlator);
-   pr_info("  p7iocPlssr:   %016llx\n", data->p7iocPlssr);
-   pr_info("  p7iocCsr: %016llx\n", data->p7iocCsr);
-   pr_info("  lemFir:   %016llx\n", data->lemFir);
-   pr_info("  lemErrorMask: %016llx\n", data->lemErrorMask);
-   pr_info("  lemWOF:   %016llx\n", data->lemWOF);
-   pr_info("  phbErrorStatus:   %016llx\n", data->phbErrorStatus);
-   pr_info("  phbFirstErrorStatus:  %016llx\n", data->phbFirstErrorStatus);
-   pr_info("  phbErrorLog0: %016llx\n", data->phbErrorLog0);
-   pr_info("  phbErrorLog1: %016llx\n", data->phbErrorLog1);
-   pr_info("  mmioErrorStatus:  %016llx\n", data->mmioErrorStatus);
-   pr_info("  mmioFirstErrorStatus: %016llx\n", 
data->mmioFirstErrorStatus);
-   pr_info("  mmioErrorLog0:%016llx\n", data->mmioErrorLog0);
-   pr_info("  mmioErrorLog1:%016llx\n", data->mmioErrorLog1);
-   pr_info("  dma0ErrorStatus:  %016llx\n", data->dma0ErrorStatus);
-   pr_info("  dma0FirstErrorStatus: %016llx\n", 
data->dma0FirstErrorStatus);
-   pr_info("  dma0ErrorLog0:%016llx\n", data->dma0ErrorLog0);
-   pr_info("  dma0ErrorLog1:%016llx\n", data->dma0ErrorLog1);
-   pr_info("  dma1ErrorStatus:  %016llx\n", data->dma1ErrorStatus);
-   pr_info("  dma1FirstErrorStatus: %016llx\n", 
data->dma1FirstErrorStatus);
-   pr_info("  dma1ErrorLog0:%016llx\n", data->dma1ErrorLog0);
-   pr_info("  dma1ErrorLog1:%016llx\n", data->dma1ErrorLog1);
+   if (data->brdgCtl)
+   pr_info("  brdgCtl: %08x\n",
+   data->brdgCtl);
+   if (data->portStatusReg || data->rootCmplxStatus ||
+   data->busAgentStatus)
+   pr_info("  UtlSts:  %08x %08x %08x\n",
+   data->portStatusReg, data->rootCmplxStatus,
+   data->busAgentStatus);
+   if (data->deviceStatus || data->slotStatus   ||
+   data->linkStatus   || data->devCmdStatus ||
+   data->devSecStatus)
+   pr_info("  RootSts: %08x %08x %08x %08x %08x\n",
+   data->deviceStatus, data->slotStatus,
+   data->linkStatus, data->devCmdStatus,
+   data->devSecStatus);
+   if (data->rootErrorStatus   || data->uncorrErrorStatus ||
+

[PATCH 8/9] powerpc/powernv: Add /proc/powerpc/eeh_inf_err

2014-02-24 Thread Gavin Shan
The patch adds /proc/powerpc/eeh_inf_err to count the INF errors
happened on PHBs as Ben suggested.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   51 +
 arch/powerpc/platforms/powernv/pci.h  |1 +
 2 files changed, 52 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index cd06c52..3ddd706 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -35,6 +36,8 @@
 #include "powernv.h"
 #include "pci.h"
 
+static u64 ioda_eeh_ioc_inf_err = 0;
+static int ioda_eeh_proc_init = 0;
 static int ioda_eeh_nb_init = 0;
 
 static int ioda_eeh_event(struct notifier_block *nb,
@@ -114,6 +117,44 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, 
ioda_eeh_inbB_dbgfs_get,
ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
+#ifdef CONFIG_PROC_FS
+static int ioda_eeh_proc_show(struct seq_file *m, void *v)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+
+   if (!eeh_enabled()) {
+seq_printf(m, "EEH Subsystem disabled\n");
+   return 0;
+   }
+
+   seq_printf(m, "EEH Subsystem enabled\n");
+   if (ioda_eeh_ioc_inf_err > 0)
+   seq_printf(m, "\nIOC INF Errors: %llu\n\n",
+  ioda_eeh_ioc_inf_err);
+
+   list_for_each_entry(hose, &hose_list, list_node) {
+   phb = hose->private_data;
+   seq_printf(m, "PHB#%d INF Errors: %llu\n",
+  hose->global_number, phb->inf_err);
+   }
+
+   return 0;
+}
+
+static int ioda_eeh_proc_open(struct inode *inode, struct file *file)
+{
+return single_open(file, ioda_eeh_proc_show, NULL);
+}
+
+static const struct file_operations ioda_eeh_proc_ops = {
+   .open   = ioda_eeh_proc_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+#endif /* CONFIG_PROC_FS */
+
 static void ioda_eeh_phb_diag(struct pci_controller *hose, char *buf)
 {
struct pnv_phb *phb = hose->private_data;
@@ -170,6 +211,14 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
}
 #endif
 
+#ifdef CONFIG_PROC_FS
+   if (!ioda_eeh_proc_init) {
+   ioda_eeh_proc_init = 1;
+   proc_create("powerpc/eeh_inf_err", 0,
+   NULL, &ioda_eeh_proc_ops);
+   }
+#endif
+
phb->flags |= PNV_PHB_FLAG_EEH;
 
return 0;
@@ -755,6 +804,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
} else if (severity == OPAL_EEH_SEV_INF) {
pr_info("EEH: IOC informative error "
"detected\n");
+   ioda_eeh_ioc_inf_err++;
ioda_eeh_hub_diag(hose);
ret = EEH_NEXT_ERR_NONE;
}
@@ -775,6 +825,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
pr_info("EEH: PHB#%x informative error "
"detected\n",
hose->global_number);
+   phb->inf_err++;
ioda_eeh_phb_diag(hose, phb->diag.blob);
ret = EEH_NEXT_ERR_NONE;
}
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 3645fc4..64ca719 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -97,6 +97,7 @@ struct pnv_phb {
spinlock_t  lock;
 
 #ifdef CONFIG_EEH
+   u64 inf_err;
struct pnv_eeh_ops  *eeh_ops;
 #endif
 
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 6/9] powerpc/powernv: Support eeh_ops->event()

2014-02-24 Thread Gavin Shan
The patch implements the backend for eeh_ops->event() on PowerNV
platform so that we can allocate or destroy PHB diag-data buffer,
which is attached to eeh_pe::data.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   42 +-
 arch/powerpc/platforms/pseries/eeh_pseries.c |3 +-
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a59788e..cfba40a 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -365,6 +365,45 @@ static int powernv_eeh_restore_config(struct device_node 
*dn)
return 0;
 }
 
+static int powernv_eeh_event(int event, void *data)
+{
+   struct eeh_pe *pe = data;
+   struct pnv_phb *phb;
+   int ret = 0;
+
+   switch (event) {
+   case EEH_EVENT_PE_ALLOC:
+   if (!pe) {
+   ret = -EINVAL;
+   break;
+   } else if (pe->data) {
+   ret = -EEXIST;
+   break;
+   }
+
+   phb = pe->phb->private_data;
+   if (phb->model == PNV_PHB_MODEL_P7IOC ||
+   phb->model == PNV_PHB_MODEL_PHB3) {
+   pe->data = kzalloc(PNV_PCI_DIAG_BUF_SIZE, GFP_KERNEL);
+   if (!pe->data)
+   ret = -ENOMEM;
+   }
+
+   break;
+   case EEH_EVENT_PE_FREE:
+   if (pe->data) {
+   kfree(pe->data);
+   pe->data = NULL;
+   }
+
+   break;
+   default:
+   return 0;
+   }
+
+   return ret;
+}
+
 static struct eeh_ops powernv_eeh_ops = {
.name   = "powernv",
.init   = powernv_eeh_init,
@@ -381,7 +420,8 @@ static struct eeh_ops powernv_eeh_ops = {
.read_config= pnv_pci_cfg_read,
.write_config   = pnv_pci_cfg_write,
.next_error = powernv_eeh_next_error,
-   .restore_config = powernv_eeh_restore_config
+   .restore_config = powernv_eeh_restore_config,
+   .event  = powernv_eeh_event
 };
 
 /**
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 8a8f047..b9a4ddb 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -691,7 +691,8 @@ static struct eeh_ops pseries_eeh_ops = {
.read_config= pseries_eeh_read_config,
.write_config   = pseries_eeh_write_config,
.next_error = NULL,
-   .restore_config = NULL
+   .restore_config = NULL,
+   .event  = NULL
 };
 
 /**
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 7/9] powerpc/powernv: Cache PHB diag-data

2014-02-24 Thread Gavin Shan
The PHB diag-data is useful to help locating the root cause for
frozen PE or fenced PHB. However, EEH core enables IO path by clearing
part of HW registers before collecting it and eventually we got broken
PHB diag-data.

The patch intends to fix it by caching the PHB diag-data in advance
to eeh_pe::data when frozen/fenced state on PE or PHB is detected
for the first time in eeh_ops::get_state() or next_error() backend.
Also, we collect diag-data for INF error without dumping it.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c|   84 ++
 arch/powerpc/platforms/powernv/eeh-powernv.c |   32 ++
 arch/powerpc/platforms/powernv/pci.h |2 +-
 3 files changed, 67 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 04b4710..cd06c52 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -114,6 +114,23 @@ DEFINE_SIMPLE_ATTRIBUTE(ioda_eeh_inbB_dbgfs_ops, 
ioda_eeh_inbB_dbgfs_get,
ioda_eeh_inbB_dbgfs_set, "0x%llx\n");
 #endif /* CONFIG_DEBUG_FS */
 
+static void ioda_eeh_phb_diag(struct pci_controller *hose, char *buf)
+{
+   struct pnv_phb *phb = hose->private_data;
+   long rc;
+
+   if (!buf)
+   return;
+
+   rc = opal_pci_get_phb_diag_data2(phb->opal_id, buf,
+ PNV_PCI_DIAG_BUF_SIZE);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Failed to get PHB#%x diag-data (%ld)\n",
+   __func__, hose->global_number, rc);
+   return;
+   }
+}
+
 /**
  * ioda_eeh_post_init - Chip dependent post initialization
  * @hose: PCI controller
@@ -224,12 +241,13 @@ static int ioda_eeh_set_option(struct eeh_pe *pe, int 
option)
 /**
  * ioda_eeh_get_state - Retrieve the state of PE
  * @pe: EEH PE
+ * @cache_diag: Cache PHB diag-data or not
  *
  * The PE's state should be retrieved from the PEEV, PEST
  * IODA tables. Since the OPAL has exported the function
  * to do it, it'd better to use that.
  */
-static int ioda_eeh_get_state(struct eeh_pe *pe)
+static int ioda_eeh_get_state(struct eeh_pe *pe, bool cache_diag)
 {
s64 ret = 0;
u8 fstate;
@@ -272,6 +290,9 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
result |= EEH_STATE_DMA_ACTIVE;
result |= EEH_STATE_MMIO_ENABLED;
result |= EEH_STATE_DMA_ENABLED;
+   } else if (cache_diag && !(pe->state & EEH_PE_ISOLATED)) {
+   /* Cache diag-data for fenced PHB */
+   ioda_eeh_phb_diag(hose, pe->data);
}
 
return result;
@@ -315,6 +336,14 @@ static int ioda_eeh_get_state(struct eeh_pe *pe)
   __func__, fstate, hose->global_number, pe_no);
}
 
+   /* Cache PHB diag-data for frozen PE */
+   if (cache_diag &&
+   result != EEH_STATE_NOT_SUPPORT &&
+   (result & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) !=
+   (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) &&
+   !(pe->state & EEH_PE_ISOLATED))
+   ioda_eeh_phb_diag(hose, pe->data);
+
return result;
 }
 
@@ -541,26 +570,10 @@ static int ioda_eeh_reset(struct eeh_pe *pe, int option)
 static int ioda_eeh_get_log(struct eeh_pe *pe, int severity,
char *drv_log, unsigned long len)
 {
-   s64 ret;
-   unsigned long flags;
-   struct pci_controller *hose = pe->phb;
-   struct pnv_phb *phb = hose->private_data;
+   if (!pe->data)
+   return 0;
 
-   spin_lock_irqsave(&phb->lock, flags);
-
-   ret = opal_pci_get_phb_diag_data2(phb->opal_id,
-   phb->diag.blob, PNV_PCI_DIAG_BUF_SIZE);
-   if (ret) {
-   spin_unlock_irqrestore(&phb->lock, flags);
-   pr_warning("%s: Can't get log for PHB#%x-PE#%x (%lld)\n",
-  __func__, hose->global_number, pe->addr, ret);
-   return -EIO;
-   }
-
-   /* The PHB diag-data is always indicative */
-   pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
-
-   spin_unlock_irqrestore(&phb->lock, flags);
+   pnv_pci_dump_phb_diag_data(pe->phb, pe->data);
 
return 0;
 }
@@ -646,22 +659,6 @@ static void ioda_eeh_hub_diag(struct pci_controller *hose)
}
 }
 
-static void ioda_eeh_phb_diag(struct pci_controller *hose)
-{
-   struct pnv_phb *phb = hose->private_data;
-   long rc;
-
-   rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob,
-PNV_PCI_DIAG_BUF_SIZE);
-   if (rc != OPAL_SUCCESS) {
-   pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n",
-   __func__, hose->global_number, rc);
-   return;
-   }
-
-  

[PATCH 1/9] powerpc/eeh: Remove EEH_PE_PHB_DEAD

2014-02-24 Thread Gavin Shan
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to
EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD
would be removed from the system. However, it's safe to replace
that with EEH_PE_ISOLATED.

The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled,
either failure or success. It makes the PHB PE state consistent with:

PHB functions normallyNONE
PHB has been removed  EEH_PE_ISOLATED
PHB fenced, recovery in progress  EEH_PE_ISOLATED | RECOVERING

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h   |1 -
 arch/powerpc/kernel/eeh.c|   10 ++
 arch/powerpc/kernel/eeh_driver.c |   10 +-
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index d4dd41f..a61b06f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -53,7 +53,6 @@ struct device_node;
 
 #define EEH_PE_ISOLATED(1 << 0)/* Isolated PE  
*/
 #define EEH_PE_RECOVERING  (1 << 1)/* Recovering PE*/
-#define EEH_PE_PHB_DEAD(1 << 2)/* Dead PHB 
*/
 
 #define EEH_PE_KEEP(1 << 8)/* Keep PE on hotplug   */
 
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e7b76a6..f167676 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -232,7 +232,6 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 {
size_t loglen = 0;
struct eeh_dev *edev, *tmp;
-   bool valid_cfg_log = true;
 
/*
 * When the PHB is fenced or dead, it's pointless to collect
@@ -240,12 +239,7 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 * 0xFF's. For ER, we still retrieve the data from the PCI
 * config space.
 */
-   if (eeh_probe_mode_dev() &&
-   (pe->type & EEH_PE_PHB) &&
-   (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
-   valid_cfg_log = false;
-
-   if (valid_cfg_log) {
+   if (!(pe->type & EEH_PE_PHB)) {
eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
eeh_ops->configure_bridge(pe);
eeh_pe_restore_bars(pe);
@@ -309,7 +303,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 
/* If the PHB has been in problematic state */
eeh_serialize_lock(&flags);
-   if (phb_pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)) {
+   if (phb_pe->state & EEH_PE_ISOLATED) {
ret = 0;
goto out;
}
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fdc679d..4cf0467 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -665,8 +665,7 @@ static void eeh_handle_special_event(void)
phb_pe = eeh_phb_pe_get(hose);
if (!phb_pe) continue;
 
-   eeh_pe_state_mark(phb_pe,
-   EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+   eeh_pe_state_mark(phb_pe, EEH_PE_ISOLATED);
}
 
eeh_serialize_unlock(flags);
@@ -682,8 +681,7 @@ static void eeh_handle_special_event(void)
eeh_remove_event(pe);
 
if (rc == EEH_NEXT_ERR_DEAD_PHB)
-   eeh_pe_state_mark(pe,
-   EEH_PE_ISOLATED | EEH_PE_PHB_DEAD);
+   eeh_pe_state_mark(pe, EEH_PE_ISOLATED);
else
eeh_pe_state_mark(pe,
EEH_PE_ISOLATED | EEH_PE_RECOVERING);
@@ -707,12 +705,14 @@ static void eeh_handle_special_event(void)
if (rc == EEH_NEXT_ERR_FROZEN_PE ||
rc == EEH_NEXT_ERR_FENCED_PHB) {
eeh_handle_normal_event(pe);
+   eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
} else {
pci_lock_rescan_remove();
list_for_each_entry(hose, &hose_list, list_node) {
phb_pe = eeh_phb_pe_get(hose);
if (!phb_pe ||
-   !(phb_pe->state & EEH_PE_PHB_DEAD))
+   !(phb_pe->state & EEH_PE_ISOLATED) ||
+   (phb_pe->state & EEH_PE_RECOVERING))
continue;
 
/* Notify all devices to be down */
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/9] powerpc/powernv: Move PNV_EEH_STATE_ENABLED around

2014-02-24 Thread Gavin Shan
The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
which is protected by CONFIG_EEH. We needn't that. Instead, we
can have pnv_phb::flags and maintain all flags there, which is
the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
to PNV_PHB_FLAG_EEH.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |2 +-
 arch/powerpc/platforms/powernv/pci.c  |8 ++--
 arch/powerpc/platforms/powernv/pci.h  |7 +++
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index 0d1d424..04b4710 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -153,7 +153,7 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
}
 #endif
 
-   phb->eeh_state |= PNV_EEH_STATE_ENABLED;
+   phb->flags |= PNV_PHB_FLAG_EEH;
 
return 0;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 95633d7..3955fc0 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -396,7 +396,7 @@ int pnv_pci_cfg_read(struct device_node *dn,
if (phb_pe && (phb_pe->state & EEH_PE_ISOLATED))
return PCIBIOS_SUCCESSFUL;
 
-   if (phb->eeh_state & PNV_EEH_STATE_ENABLED) {
+   if (phb->flags & PNV_PHB_FLAG_EEH) {
if (*val == EEH_IO_ERROR_VALUE(size) &&
eeh_dev_check_failure(of_node_to_eeh_dev(dn)))
return PCIBIOS_DEVICE_NOT_FOUND;
@@ -434,12 +434,8 @@ int pnv_pci_cfg_write(struct device_node *dn,
}
 
/* Check if the PHB got frozen due to an error (no response) */
-#ifdef CONFIG_EEH
-   if (!(phb->eeh_state & PNV_EEH_STATE_ENABLED))
+   if (!(phb->flags & PNV_PHB_FLAG_EEH))
pnv_pci_config_check_eeh(phb, dn);
-#else
-   pnv_pci_config_check_eeh(phb, dn);
-#endif
 
return PCIBIOS_SUCCESSFUL;
 }
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 6870f60..94e3495 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -81,24 +81,23 @@ struct pnv_eeh_ops {
int (*configure_bridge)(struct eeh_pe *pe);
int (*next_error)(struct eeh_pe **pe);
 };
-
-#define PNV_EEH_STATE_ENABLED  (1 << 0)/* EEH enabled  */
-
 #endif /* CONFIG_EEH */
 
+#define PNV_PHB_FLAG_EEH   (1 << 0)
+
 struct pnv_phb {
struct pci_controller   *hose;
enum pnv_phb_type   type;
enum pnv_phb_model  model;
u64 hub_id;
u64 opal_id;
+   int flags;
void __iomem*regs;
int initialized;
spinlock_t  lock;
 
 #ifdef CONFIG_EEH
struct pnv_eeh_ops  *eeh_ops;
-   int eeh_state;
 #endif
 
 #ifdef CONFIG_DEBUG_FS
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/9] powerpc/powernv: Remove PNV_EEH_STATE_REMOVED

2014-02-24 Thread Gavin Shan
The PHB state PNV_EEH_STATE_REMOVED maintained in pnv_phb isn't
so useful any more and it's duplicated to EEH_PE_ISOLATED. The
patch replaces PNV_EEH_STATE_REMOVED with EEH_PE_ISOLATED.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-ioda.c |   56 -
 arch/powerpc/platforms/powernv/pci.h  |1 -
 2 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c 
b/arch/powerpc/platforms/powernv/eeh-ioda.c
index f514743..0d1d424 100644
--- a/arch/powerpc/platforms/powernv/eeh-ioda.c
+++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
@@ -662,22 +662,6 @@ static void ioda_eeh_phb_diag(struct pci_controller *hose)
pnv_pci_dump_phb_diag_data(hose, phb->diag.blob);
 }
 
-static int ioda_eeh_get_phb_pe(struct pci_controller *hose,
-  struct eeh_pe **pe)
-{
-   struct eeh_pe *phb_pe;
-
-   phb_pe = eeh_phb_pe_get(hose);
-   if (!phb_pe) {
-   pr_warning("%s Can't find PE for PHB#%d\n",
-  __func__, hose->global_number);
-   return -EEXIST;
-   }
-
-   *pe = phb_pe;
-   return 0;
-}
-
 static int ioda_eeh_get_pe(struct pci_controller *hose,
   u16 pe_no, struct eeh_pe **pe)
 {
@@ -685,7 +669,8 @@ static int ioda_eeh_get_pe(struct pci_controller *hose,
struct eeh_dev dev;
 
/* Find the PHB PE */
-   if (ioda_eeh_get_phb_pe(hose, &phb_pe))
+   phb_pe = eeh_phb_pe_get(hose);
+   if (!phb_pe)
return -EEXIST;
 
/* Find the PE according to PE# */
@@ -713,6 +698,7 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 {
struct pci_controller *hose;
struct pnv_phb *phb;
+   struct eeh_pe *phb_pe;
u64 frozen_pe_no;
u16 err_type, severity;
long rc;
@@ -729,10 +715,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
list_for_each_entry(hose, &hose_list, list_node) {
/*
 * If the subordinate PCI buses of the PHB has been
-* removed, we needn't take care of it any more.
+* removed or is exactly under error recovery, we
+* needn't take care of it any more.
 */
phb = hose->private_data;
-   if (phb->eeh_state & PNV_EEH_STATE_REMOVED)
+   phb_pe = eeh_phb_pe_get(hose);
+   if (!phb_pe || (phb_pe->state & EEH_PE_ISOLATED))
continue;
 
rc = opal_pci_next_error(phb->opal_id,
@@ -765,12 +753,6 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
switch (err_type) {
case OPAL_EEH_IOC_ERROR:
if (severity == OPAL_EEH_SEV_IOC_DEAD) {
-   list_for_each_entry(hose, &hose_list,
-   list_node) {
-   phb = hose->private_data;
-   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
-   }
-
pr_err("EEH: dead IOC detected\n");
ret = EEH_NEXT_ERR_DEAD_IOC;
} else if (severity == OPAL_EEH_SEV_INF) {
@@ -783,17 +765,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
break;
case OPAL_EEH_PHB_ERROR:
if (severity == OPAL_EEH_SEV_PHB_DEAD) {
-   if (ioda_eeh_get_phb_pe(hose, pe))
-   break;
-
+   *pe = phb_pe;
pr_err("EEH: dead PHB#%x detected\n",
hose->global_number);
-   phb->eeh_state |= PNV_EEH_STATE_REMOVED;
ret = EEH_NEXT_ERR_DEAD_PHB;
} else if (severity == OPAL_EEH_SEV_PHB_FENCED) {
-   if (ioda_eeh_get_phb_pe(hose, pe))
-   break;
-
+   *pe = phb_pe;
pr_err("EEH: fenced PHB#%x detected\n",
hose->global_number);
ret = EEH_NEXT_ERR_FENCED_PHB;
@@ -813,15 +790,12 @@ static int ioda_eeh_next_error(struct eeh_pe **pe)
 * fenced PHB so that it can be recovered.
 */
if (ioda_eeh_get_pe(hose, frozen_pe_no, pe)) {
-   if (!ioda_eeh_get_phb_pe(hose, pe)) {
-   pr_err("EEH: Escalated fenced PHB#%x "
-  "detected for PE#%llx\n",
-   hose->global_number,
-   frozen_pe_no);
-  

[PATCH v2 0/9] EEH improvement

2014-02-24 Thread Gavin Shan
The series of patches intends to improve reliability of EEH on PowerNV
platform. First all, we have had multiple duplicate states (flags) for
PHB and PE, so we remove those duplicate states to simplify the code.
Besides, we had corrupted PHB diag-data for case of frozen PE. In order
to solve the problem, we introduce eeh_ops->event() and notifications
are sent from EEH core to (PowerNV) platform on creating or destroying
PE instance so that we can allocate or free PHB diag-data backend. Then
we cache the PHB diag-data on the first call to eeh_ops->get_state()
and dump it afterwards, which helps to get correct PHB diag-data.

With the patchset applied, we never dump PHB diag-data for INF errors.
Instead, we just maintain statistics in /proc/powerpc/eeh_inf_err. Also,
we changed the PHB diag-data dump format for a bit to have multiple
fields per line and omits the line with all zero'd fields as Ben suggested.


v1 -> v2:
* Amending commit logs
* Support eeh_ops->event() and maintain PHB diag-data on basis
  of PE instance
* When dumping PHB diag-data, to replace "-" with "" and
  omit the line if the fields of it are all zeros.

---

arch/powerpc/include/asm/eeh.h   |7 ++-
arch/powerpc/kernel/eeh.c|   10 +---
arch/powerpc/kernel/eeh_driver.c |   10 ++--
arch/powerpc/kernel/eeh_pe.c |   39 -
arch/powerpc/platforms/powernv/eeh-ioda.c|  193 
-
arch/powerpc/platforms/powernv/eeh-powernv.c |   74 +++-
arch/powerpc/platforms/powernv/pci.c |  228 
+-
arch/powerpc/platforms/powernv/pci.h |   11 ++--
arch/powerpc/platforms/pseries/eeh_pseries.c |3 +-
9 files changed, 358 insertions(+), 217 deletions(-)

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 4/9] powerpc/eeh: Introduce eeh_pe_free()

2014-02-24 Thread Gavin Shan
The patch introduces eeh_pe_free() to replace original kfree(pe)
so that we could have more checks there and calls to platform
interface supplied by eeh_ops in future.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_pe.c |   25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index f0c353f..2add834 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -60,6 +60,27 @@ static struct eeh_pe *eeh_pe_alloc(struct pci_controller 
*phb, int type)
 }
 
 /**
+ * eeh_pe_free - Free PE
+ * @pe: EEH PE
+ *
+ * Free PE instance dynamically
+ */
+static void eeh_pe_free(struct eeh_pe *pe)
+{
+   if (!pe)
+   return;
+
+   if (!list_empty(&pe->child_list) ||
+   !list_empty(&pe->edevs)) {
+   pr_warn("%s: PHB#%x-PE#%x has child PE or EEH dev\n",
+   __func__, pe->phb->global_number, pe->addr);
+   return;
+   }
+
+   kfree(pe);
+}
+
+/**
  * eeh_phb_pe_create - Create PHB PE
  * @phb: PCI controller
  *
@@ -374,7 +395,7 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
pr_err("%s: No PHB PE is found (PHB Domain=%d)\n",
__func__, edev->phb->global_number);
edev->pe = NULL;
-   kfree(pe);
+   eeh_pe_free(pe);
return -EEXIST;
}
}
@@ -433,7 +454,7 @@ int eeh_rmv_from_parent_pe(struct eeh_dev *edev)
if (list_empty(&pe->edevs) &&
list_empty(&pe->child_list)) {
list_del(&pe->child);
-   kfree(pe);
+   eeh_pe_free(pe);
} else {
break;
}
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.

2014-02-24 Thread Mahesh Jagannath Salgaonkar
On 02/21/2014 05:41 AM, Stewart Smith wrote:
>  Mahesh J Salgaonkar  writes:
>  > This patch adds support to read error logs from OPAL and export them
>  > to userspace through sysfs interface /sys/firmware/opa/opal-elog.

Hi Stewart,

Thanks for the review. This code definitely needs improvement.

> 
>  I think we could provide a better interface with instead having a file
>  per log message appear in sysfs. We're never going to have more than 128
>  of these at any one time on the Linux side, so it's not going to bee too
>  many files.

It is not just about 128 files, we may be adding/removing sysfs node for
every new log id that gets informed to kernel and ack-ed. In worst case,
when we have flood of elog errors with user daemon consuming it and
ack-ing back to get ready for next log in a tight poll, we may
continuously add/remove the sysfs node for each new .

> 
>  e.g. /sys/firmware/opal/elog/
> 
>  that way, any new file in /sys/firmware/opal/elog/ means there's a new
>  log entry available. I believe there's 
> 
>  To ack a log, you could just echo 'ack' to the file.
> 
>  The other option woudl be to more closely follow what sysfs is meant to
>  be - ascii text. This would mean having more (any) of the parser in
>  kernel for the error logs - which may/may not be a bad idea.
> 
>  However, it would make the end user code for consuming them much much
>  simpler, and that may be a good thing.
> 
>  Having some way of getting some information out without a userspace
>  parser is probably good though, I'm pretty sure having only a binary
>  interface in /sys is at least partially frowned upon.
> 
>  > This is what user space tool would do:
>  > - Read error log from /sys/firmware/opa/opal-elog.
>  > - Save it to the disk.
>  > - Send an acknowledgement on successful consumption by writing error log
>  >   id to /sys/firmware/opa/opal-elog-ack.
> 
>  A userspace tool may want to explicitly *not* ack the log too, or only
>  ack some entries, so the interface sohuld be sane for this use case too.
> 
>  e.g. we could display them in petitboot.
> 
>  > diff --git a/arch/powerpc/platforms/powernv/opal-elog.c
>  > b/arch/powerpc/platforms/powernv/opal-elog.c
>  [ 2 more citation lines. Click/Enter to show. ]
>  > new file mode 100644
>  > index 000..fc891ae
>  > --- /dev/null
>  > +++ b/arch/powerpc/platforms/powernv/opal-elog.c
>  > @@ -0,0 +1,309 @@
>  
>  > +/* Maximum size of a single log on FSP is 16KB */
>  > +#define OPAL_MAX_ERRLOG_SIZE  16384
> 
>  I've seen some conflicting things on this - is it 2kb or 16kb?

We choose 16kb because we want to pull all the log data and not partial.

> 
>  > +
>  > +struct opal_err_log {
>  > +  struct list_head link;
>  > +  uint64_t opal_log_id;
> 
>  why is this uint64_t and not uint32_t? It appears that the log id is 32bits.

Agree, This needs to be changed to uint_32.

> 
>  > +  size_t opal_log_size;
>  > +  uint8_t data[OPAL_MAX_ERRLOG_SIZE];
>  > +};
>  > +
>  > +/* Pre-allocated temp buffer to pull error log from opal. */
>  > +static uint8_t err_log_data[OPAL_MAX_ERRLOG_SIZE];
> 
>  Why do we need temporary space? Why not just store directly into struct
>  opal_err_log?
> 
>  > +/* Protect err_log_data buf */
>  > +static DEFINE_MUTEX(err_log_data_mutex);
>  [ 15 more citation lines. Click/Enter to show. ]
>  > +
>  > +static uint64_t total_log_size;
>  > +static bool opal_log_available;
>  > +static LIST_HEAD(elog_list);
>  > +static LIST_HEAD(elog_ack_list);
>  > +
>  > +/* lock to protect elog_list and elog-ack_list. */
>  > +static DEFINE_SPINLOCK(opal_elog_lock);
>  > +
>  > +static DECLARE_WAIT_QUEUE_HEAD(opal_log_wait);
>  > +
>  > +/*
>  > + * Interface for user to acknowledge the error log.
>  > + *
>  > + * Once user acknowledge the log, we delete that record entry from the
>  > + * list and move it ack list.
>  > + */
>  > +void opal_elog_ack(uint64_t ack_id)
> 
>  s/ack_id/log_id/

Yup. makes sense.

> 
>  > +
>  > +static ssize_t elog_ack_store(struct kobject *kobj,
>  [ 7 more citation lines. Click/Enter to show. ]
>  > +  struct kobj_attribute *attr,
>  > +  const char *buf, size_t count)
>  > +{
>  > +  uint32_t log_ack_id;
>  > +  log_ack_id = *(uint32_t *) buf;
>  > +
>  > +  /* send acknowledgment to FSP */
>  > +  opal_elog_ack(log_ack_id);
>  > +  return 0;
>  > +}
> 
>  This function has a few problems:
> 
>  Consider the following actions:
>  $ echo 1 > /sys/firmware/opal/opal-elog-ack
>  $ echo 'abcde' > /sys/firmware/opal/opal-elog-ack
> 
>  The former will read undefined memory and the latter will make a kernel
>  thread, rsyslogd and systemd-journal all each a CPU each.
> 
>  Basically, the problems are:
>  1) not endian safe
>  2) not following store API of returning nr bytes read
>  3) binary interface. Use sscanf to read numbers instead.
> 
>  > +/*
>  > + * Show error log records to user.
>  [ 9 more citation lines. Click/Enter to show. ]
>  > + */
>  > +

Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore

2014-02-24 Thread Madhavan Srinivasan
On Saturday 22 February 2014 05:44 AM, Cody P Schafer wrote:
> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
> nothing. Add a pr_warn() to convince any users that they should stop
> using it.
> 
> The commit message from the removing commit notes that this
> functionality should move into the cpuidle driver, essentially by

Would prefer to cleanup the code since the functionality is moved,
instead of adding to it.

> adjusting target_residency to the specified value. At the moment,
> target_residency is not exposed by cpuidle's sysfs, so there isn't a
> drop in replacement for this.
> 
> Signed-off-by: Cody P Schafer 
> ---
>  arch/powerpc/kernel/sysfs.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 97e1dc9..84097b4 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
>   if (ret != 1)
>   return -EINVAL;
>  
> + pr_warn_ratelimited("%s (%d): 
> /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +   current->comm, task_pid_nr(current), cpu->dev.id);
> +
>   per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
>   return count;
>  }
> @@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  {
>   struct cpu *cpu = container_of(dev, struct cpu, dev);
>  
> + pr_warn_ratelimited("%s (%d): 
> /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +   current->comm, task_pid_nr(current), cpu->dev.id);
> +
>   return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
>  }
>  
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
> Signed-off-by: Cody P Schafer 
> ---
>  arch/powerpc/perf/Makefile | 2 ++
>  arch/powerpc/platforms/Kconfig.cputype | 6 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index 60d71ee..f9c083a 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)   += mpc7450-pmu.o
>  obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
>  obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
>  
> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
> +
>  obj-$(CONFIG_PPC64)  += $(obj64-y)
>  obj-$(CONFIG_PPC32)  += $(obj32-y)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index 434fda3..dcc67cd 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -364,6 +364,12 @@ config PPC_PERF_CTRS
> help
>   This enables the powerpc-specific perf_event back-end.
>  
> +config HV_PERF_CTRS
> +   def_bool y

This was bool, why did you change it?

> +   depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT

Should be:

depends on PERF_EVENTS && PPC_PSERIES

> +   help
> + Enable access to perf counters provided by the hypervisor
> +

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 09/11] powerpc/perf: add support for the hv 24x7 interface

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:13 UTC, Cody P Schafer wrote:
> This provides a basic interface between hv_24x7 and perf. Similar to
> the one provided for gpci, it lacks transaction support and does not
> list any events.
> 
> Signed-off-by: Cody P Schafer 
> ---
>  arch/powerpc/perf/hv-24x7.c | 491 
> 
>  1 file changed, 491 insertions(+)
>  create mode 100644 arch/powerpc/perf/hv-24x7.c
> 
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> new file mode 100644
> index 000..13de140
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-24x7.c
...
> +
> +/*
> + * read_offset_data - copy data from one buffer to another while treating the
> + *source buffer as a small view on the total avaliable
> + *source data.
> + *
> + * @dest: buffer to copy into
> + * @dest_len: length of @dest in bytes
> + * @requested_offset: the offset within the source data we want. Must be > 0
> + * @src: buffer to copy data from
> + * @src_len: length of @src in bytes
> + * @source_offset: the offset in the sorce data that (src,src_len) refers to.
> + * Must be > 0
> + *
> + * returns the number of bytes copied.
> + *
> + * '.' areas in d are written to.
> + *
> + *   u
> + *   x w  v  z
> + * d   |.|
> + * s |--|
> + *
> + *  u
> + *   x w z v
> + * d   |--|
> + * s |--|
> + *
> + *   x wu,z,v
> + * d   ||
> + * s |--|
> + *
> + *   x,wu,v,z
> + * d |--|
> + * s |--|
> + *
> + *   xu
> + *   wv  z
> + * d ||
> + * s |--|
> + *
> + *   x  z   w  v
> + * d|--|
> + * s |--|
> + *
> + * x = source_offset
> + * w = requested_offset
> + * z = source_offset + src_len
> + * v = requested_offset + dest_len
> + *
> + * w_offset_in_s = w - x = requested_offset - source_offset
> + * z_offset_in_s = z - x = src_len
> + * v_offset_in_s = v - x = request_offset + dest_len - src_len
> + * u_offset_in_s = min(z_offset_in_s, v_offset_in_s)
> + *
> + * copy_len = u_offset_in_s - w_offset_in_s = min(z_offset_in_s, 
> v_offset_in_s)
> + *   - w_offset_in_s

Comments are great, especially for complicated code like this. But at a glance
I don't actually understand what this comment is trying to tell me.

> + */
> +static ssize_t read_offset_data(void *dest, size_t dest_len,
> + loff_t requested_offset, void *src,
> + size_t src_len, loff_t source_offset)
> +{
> + size_t w_offset_in_s = requested_offset - source_offset;
> + size_t z_offset_in_s = src_len;
> + size_t v_offset_in_s = requested_offset + dest_len - src_len;
> + size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
> + size_t copy_len = u_offset_in_s - w_offset_in_s;
> +
> + if (requested_offset < 0 || source_offset < 0)
> + return -EINVAL;
> +
> + if (z_offset_in_s <= w_offset_in_s)
> + return 0;
> +
> + memcpy(dest, src + w_offset_in_s, copy_len);
> + return copy_len;
> +}
> +
> +static unsigned long h_get_24x7_catalog_page(char page[static 4096],
> +  u32 version, u32 index)
> +{
> + WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
> + return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
> + virt_to_phys(page),
> + version,
> + index);
> +}
> +
> +static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *bin_attr, char *buf,
> + loff_t offset, size_t count)
> +{
> + unsigned long hret;
> + ssize_t ret = 0;
> + size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
> + loff_t page_offset = 0;
> + uint32_t catalog_version_num = 0;
> + void *page = kmalloc(4096, GFP_USER);
> + struct hv_24x7_catalog_page_0 *page_0 = page;
> + if (!page)
> + return -ENOMEM;
> +
> +
> + hret = h_get_24x7_catalog_page(page, 0, 0);
> + if (hret) {
> + ret = -EIO;
> + goto e_free;
> + }
> +
> + catalog_version_num = be32_to_cpu(page_0->version);
> + catalog_page_len = be32_to_cpu(page_0->length);
> + catalog_len = catalog_page_len * 4096;
> +
> + page_offset = offset / 4096;
> + page_count  = count  / 4096;
> +
> + if (page_offset >= catalog_page_len)
> + goto e_free;
> +
> + if (page_offset != 0) {
> + hret = h_get_24x7_catalog_page(page, catalog_version_num,
> +page_offset);
> + if (hret) {
> + ret = -EIO;
> + go

Re: [PATCH v2 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:12 UTC, Cody P Schafer wrote:
> This provides a basic link between perf and hv_gpci. Notably, it does
> not yet support transactions and does not list any events (they can
> still be manually composed).

Can you explain how the HV_CAPS stuff ends up looking.

I'm not against adding it, but I'd like to understand how we expect it to be
used a bit better.

cheers

> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> new file mode 100644
> index 000..1f5d96d
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-gpci.c
> +
> +static struct pmu h_gpci_pmu = {
> + .task_ctx_nr = perf_invalid_context,
> +
> + .name = "hv_gpci",
> + .attr_groups = attr_groups,
> + .event_init  = h_gpci_event_init,
> + .add = h_gpci_event_add,
> + .del = h_gpci_event_del,
 = h_gpci_event_stop,

> + .start   = h_gpci_event_start,
> + .stop= h_gpci_event_stop,
> + .read= h_gpci_event_read,
 = h_gpci_event_update

> + .event_idx = perf_swevent_event_idx,
> +};


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 05/11] powerpc: add hv_gpci interface header

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:09 UTC, Cody P Schafer wrote:
> "H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
> here on) is an interface to retrieve specific performance counters and
> other data from the hypervisor. All outputs have a fixed format (and
> are represented as structs in this patch).

I still see unused stuff in here, can you strip it back to just what we need.
Same goes for the next patch.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities

2014-02-24 Thread Michael Ellerman
[PATCH v2 07/11] powerpc: add a shared interface to get gpci version and 
capabilities

All the patches that touch perf should be "powerpc/perf: foo"

On Fri, 2014-14-02 at 22:02:11 UTC, Cody P Schafer wrote:
> ...

I realise this is a fairly small patch but a changelog is still nice. You could
for example mention that we don't currently use .ga, .expanded or .lab but
we're adding the logic anyway because ...


> Signed-off-by: Cody P Schafer 
> ---
>  arch/powerpc/perf/hv-common.c | 39 +++
>  arch/powerpc/perf/hv-common.h | 17 +
>  2 files changed, 56 insertions(+)
>  create mode 100644 arch/powerpc/perf/hv-common.c
>  create mode 100644 arch/powerpc/perf/hv-common.h
> 
> diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
> new file mode 100644
> index 000..47e02b3
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-common.c
> @@ -0,0 +1,39 @@
> +#include 
> +#include 
> +
> +#include "hv-gpci.h"
> +#include "hv-common.h"
> +
> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
> +{
> + unsigned long r;
> + struct p {
> + struct hv_get_perf_counter_info_params params;
> + struct cv_system_performance_capabilities caps;
> + } __packed __aligned(sizeof(uint64_t));
> +
> + struct p arg = {
> + .params = {
> + .counter_request = cpu_to_be32(
> + CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
> + .starting_index = cpu_to_be32(-1),
> + .counter_info_version_in = 0,
> + }
> + };
> +
> + r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
> +virt_to_phys(&arg), sizeof(arg));
> +
> + if (r)
> + return r;
> +
> + pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
> +
> + caps->version = arg.params.counter_info_version_out;
> + caps->collect_privileged = !!arg.caps.perf_collect_privileged;
> + caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
> + caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
> + caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
> +
> + return r;
> +}
> diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
> new file mode 100644
> index 000..7e615bd
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-common.h
> @@ -0,0 +1,17 @@
> +#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
> +#define LINUX_POWERPC_PERF_HV_COMMON_H_
> +
> +#include 
> +
> +struct hv_perf_caps {
> + u16 version;
> + u16 collect_privileged:1,
> + ga:1,
> + expanded:1,
> + lab:1,
> + unused:12;
> +};
> +
> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
> +
> +#endif
> -- 
> 1.8.5.4
> 
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:08 UTC, Cody P Schafer wrote:
> Signed-off-by: Cody P Schafer 
> ---
>  arch/powerpc/include/asm/hvcall.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h 
> b/arch/powerpc/include/asm/hvcall.h
> index d8b600b..652f7e4 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -274,6 +274,11 @@
>  /* Platform specific hcalls, used by KVM */
>  #define H_RTAS   0xf000
>  
> +/* "Platform specific hcalls", provided by PHYP */
> +#define H_GET_24X7_CATALOG_PAGE 0xF078
> +#define H_GET_24X7_DATA  0xF07C
> +#define H_GET_PERF_COUNTER_INFO 0xF080

Some tabs some spaces, use tabs.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 03/11] sysfs: create bin_attributes under the requested group

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:07 UTC, Cody P Schafer wrote:
> bin_attributes created/updated in create_files() (such as those listed
> via (struct device).attribute_groups) were not placed under the
> specified group, and instead appeared in the base kobj directory.
> 
> Fix this by making bin_attributes use creating code similar to normal
> attributes.
> 
> A quick grep shows that no one is using bin_attrs in a named attribute
> group yet, so we can do this without breaking anything in usespace.
> 
> Note that I do not add is_visible() support to
> bin_attributes, though that could be done as well.
> 
> Signed-off-by: Cody P Schafer 

Greg has already taken this, so we'll consider that as good as an ack from him,
unless he wants to give us one.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
> Export the swevent hrtimer helpers currently only used in events/core.c
> to allow the addition of architecture specific sw-like pmus.

Peter, Ingo, can we get your ACK on this please?

cheers


> Signed-off-by: Cody P Schafer 
> ---
>  include/linux/perf_event.h | 5 -
>  kernel/events/core.c   | 8 
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2702e91..24378a9 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -559,7 +559,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
>   int src_cpu, int dst_cpu);
>  extern u64 perf_event_read_value(struct perf_event *event,
>u64 *enabled, u64 *running);
> -
> +extern void perf_swevent_init_hrtimer(struct perf_event *event);
> +extern void perf_swevent_start_hrtimer(struct perf_event *event);
> +extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
> +extern int perf_swevent_event_idx(struct perf_event *event);
>  
>  struct perf_sample_data {
>   u64 type;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 56003c6..feb0347 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
>   return 0;
>  }
>  
> -static int perf_swevent_event_idx(struct perf_event *event)
> +int perf_swevent_event_idx(struct perf_event *event)
>  {
>   return 0;
>  }
> @@ -6045,7 +6045,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct 
> hrtimer *hrtimer)
>   return ret;
>  }
>  
> -static void perf_swevent_start_hrtimer(struct perf_event *event)
> +void perf_swevent_start_hrtimer(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = &event->hw;
>   s64 period;
> @@ -6067,7 +6067,7 @@ static void perf_swevent_start_hrtimer(struct 
> perf_event *event)
>   HRTIMER_MODE_REL_PINNED, 0);
>  }
>  
> -static void perf_swevent_cancel_hrtimer(struct perf_event *event)
> +void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = &event->hw;
>  
> @@ -6079,7 +6079,7 @@ static void perf_swevent_cancel_hrtimer(struct 
> perf_event *event)
>   }
>  }
>  
> -static void perf_swevent_init_hrtimer(struct perf_event *event)
> +void perf_swevent_init_hrtimer(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = &event->hw;
>  
> -- 
> 1.8.5.4
> 
> ___
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus

2014-02-24 Thread Michael Ellerman
On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
> generate functions to extract the relevent bits from
> event->attr.config{,1,2} for use by sw-like pmus where the
> 'config{,1,2}' values don't map directly to hardware registers.
> 
> Signed-off-by: Cody P Schafer 
> ---
>  include/linux/perf_event.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e56b07f..2702e91 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -871,4 +871,21 @@ _name##_show(struct device *dev, 
> \
>   \
>  static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
>  
> +#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)   \
> +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);
> \
> +PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
> +
> +#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)   \
> +static u64 event_get_##name##_max(void)  
> \
> +{\
> + int bits = (bit_end) - (bit_start) + 1; \
> + return ((0x1ULL << (bits - 1ULL)) - 1ULL) | \
> + (0xFULL << (bits - 4ULL));  \
> +}\
> +static u64 event_get_##name(struct perf_event *event)
> \
> +{\
> + return (event->attr.attr_var >> (bit_start)) &  \
> + event_get_##name##_max();   \
> +}

I still don't like the names.

EVENT_GETTER_AND_FORMAT()
EVENT_RESERVED()

?

It's not clear to me the max routine is useful in general. Can't we just do:

> +#define EVENT_RESERVED(name, attr_var, bit_start, bit_end)   \
> +static u64 event_get_##name(struct perf_event *event)\
> +{\
> + return (event->attr.attr_var >> (bit_start)) &  \
> + ((0x1ULL << ((bit_end) - (bit_start) + 1)) - 1ULL); \
> +}


cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup

2014-02-24 Thread Nishanth Aravamudan
On 24.02.2014 [13:43:31 -0600], Christoph Lameter wrote:
> On Fri, 21 Feb 2014, Nishanth Aravamudan wrote:
> 
> > I added two calls to local_memory_node(), I *think* both are necessary,
> > but am willing to be corrected.
> >
> > One is in map_cpu_to_node() and one is in start_secondary(). The
> > start_secondary() path is fine, AFAICT, as we are up & running at that
> > point. But in [the renamed function] update_numa_cpu_node() which is
> > used by hotplug, we get called from do_init_bootmem(), which is before
> > the zonelists are setup.
> >
> > I think both calls are necessary because I believe the
> > arch_update_cpu_topology() is used for supporting firmware-driven
> > home-noding, which does not invoke start_secondary() again (the
> > processor is already running, we're just updating the topology in that
> > situation).
> >
> > Then again, I could special-case the do_init_bootmem callpath, which is
> > only called at kernel init time?
> 
> Well taht looks to be simpler.

Ok, I'll work on this.

> > > I do agree that calling local_memory_node() too early then trying to
> > > fudge around the consequences seems rather wrong.
> >
> > If the answer is to simply not call local_memory_node() early, I'll
> > submit a patch to at least add a comment, as there's nothing in the code
> > itself to prevent this from happening and is guaranteed to oops.
> 
> Ok.

Thanks!
-Nish

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/powernv Platform dump interface

2014-02-24 Thread Stewart Smith
This enables support for userspace to fetch and initiate FSP and
Platform dumps from the service processor (via firmware) through sysfs.

Based on original patch from Vasant Hegde 

Flow:
  - We register for OPAL notification events.
  - OPAL sends new dump available notification.
  - We make information on dump available via sysfs
  - Userspace requests dump contents
  - We retrieve the dump via OPAL interface
  - User copies the dump data
  - userspace sends ack for dump
  - We send ACK to OPAL.

sysfs files:
  - We add the /sys/firmware/opal/dump directory
  - echoing 1 (well, anything, but in future we may support
different dump types) to /sys/firmware/opal/dump/initiate_dump
will initiate a dump.
  - Each dump that we've been notified of gets a directory
in /sys/firmware/opal/dump/ with a name of the dump ID (in hex,
as this is what's used elsewhere to identify the dump).
  - Each dump has files: id, type, dump and acknowledge
dump is binary and is the dump itself.
echoing 'ack' to acknowledge (currently any string will do) will
acknowledge the dump and it will soon after disappear from sysfs.

OPAL APIs:
  - opal_dump_init()
  - opal_dump_info()
  - opal_dump_read()
  - opal_dump_ack()
  - opal_dump_resend_notification()

Currently we are only ever notified for one dump at a time (until
the user explicitly acks the current dump, then we get a notification
of the next dump), but this kernel code should "just work" when OPAL
starts notifying us of all the dumps present.

Signed-off-by: Stewart Smith 
---
 Documentation/ABI/stable/sysfs-firmware-opal-dump |   29 ++
 arch/powerpc/include/asm/opal.h   |   12 +
 arch/powerpc/platforms/powernv/Makefile   |2 +-
 arch/powerpc/platforms/powernv/opal-dump.c|  511 +
 arch/powerpc/platforms/powernv/opal-wrappers.S|5 +
 arch/powerpc/platforms/powernv/opal.c |2 +
 6 files changed, 560 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/stable/sysfs-firmware-opal-dump
 create mode 100644 arch/powerpc/platforms/powernv/opal-dump.c

diff --git a/Documentation/ABI/stable/sysfs-firmware-opal-dump 
b/Documentation/ABI/stable/sysfs-firmware-opal-dump
new file mode 100644
index 000..3c2d252
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-firmware-opal-dump
@@ -0,0 +1,29 @@
+What:  /sys/firmware/opal/dump
+Date:  Feb 2014
+Contact:   Stewart Smith 
+Description:
+   This directory exposes interfaces for interacting with
+   the FSP and platform dumps through OPAL firmware interface.
+
+   This is only for the powerpc/powernv platform.
+
+   initiate_dump:  When '1' is written to it,
+   we will initiate a dump.
+   Read this file for supported commands.
+
+   0x: A directory for dump 0x (in hex).
+
+   Each dump has the following files:
+   id: An ASCII representation of the dump ID
+   in hex.
+   type:   An ASCII representation of the type of
+   dump (or 'unknown').
+   dump:   A binary file containing the dump.
+   The size of the dump is the size of this file.
+   acknowledge:When 'ack' is written to this, we will
+   acknowledge that we've retrieved the
+   dump to the service processor. It will
+   then remove it, making the dump
+   inaccessible.
+   Reading this file will get a list of
+   supported actions.
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 40157e2..3194870 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -154,8 +154,13 @@ extern int opal_enter_rtas(struct rtas_args *args,
 #define OPAL_FLASH_VALIDATE76
 #define OPAL_FLASH_MANAGE  77
 #define OPAL_FLASH_UPDATE  78
+#define OPAL_DUMP_INIT 81
+#define OPAL_DUMP_INFO 82
+#define OPAL_DUMP_READ 83
+#define OPAL_DUMP_ACK  84
 #define OPAL_GET_MSG   85
 #define OPAL_CHECK_ASYNC_COMPLETION86
+#define OPAL_DUMP_RESEND   91
 #define OPAL_SYNC_HOST_REBOOT  87
 
 #ifndef __ASSEMBLY__
@@ -237,6 +242,7 @@ enum OpalPendingState {
OPAL_EVENT_EPOW = 0x80,
OPAL_EVENT_LED_STATUS   = 0x100,
OPAL_EVENT_PCI_ERROR= 0x200,
+   OPAL_EVENT_DUMP_AVAIL   = 0x400,
OPAL_EVENT_MSG_PENDING  = 0x800,
 };
 
@@ -826,6 +832,11 @@ int64_t 

Re: [PATCH] powerpc/crashdump : fix page frame number check in copy_oldmem_page

2014-02-24 Thread Michael Ellerman
On Mon, 2014-02-24 at 17:30 +0100, Laurent Dufour wrote:
> In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
> decide if the page is backed or not, is not valid when the memory layout is
> not continuous.
> 
> This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
> in the memory. In that case max_pfn points to the end of RTAS, and a hole
> between the end of the kdump kernel and RTAS is not backed by PTEs. As a
> consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
> in a direct way the pages in that hole.
> 
> This fix relies on the memblock's service memblock_is_region_memory to
> check if the read page is part or not of the directly accessible memory.

Hi Laurent,

This looks good to me, assuming you've tested it on a PowerVM system as well as
under KVM.

cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] ppc476: Enable a linker work around for IBM errata #46

2014-02-24 Thread Alistair Popple
On Mon, 24 Feb 2014 08:35:06 Josh Boyer wrote:
> On Mon, Feb 24, 2014 at 2:00 AM, Alistair Popple  
wrote:
> > This patch adds an option to enable a work around for an icache bug on
> > 476 that can cause execution of stale instructions when falling
> > through pages (IBM errata #46). It requires a recent version of
> > binutils which supports the --ppc476-workaround option.
> > 
> > The work around enables the appropriate linker options and ensures
> > that all module output sections are aligned to 4K page boundaries. The
> > work around is only required when building modules.
> 
> What happens if you're using 64K pages?  Is the alignment 4K always,
> or does it need to be aligned to PAGE_SIZE?

The work around inserts an extra instruction on 4K page boundaries. As a 64K 
(or a 16K) page boundary is also a 4K page boundary the work around should 
cover those page sizes as well.

- Alistair

> josh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] watchdog: mpc8xxx_wdt: MPC8xx is HW enabled

2014-02-24 Thread Wim Van Sebroeck
Hi Christophe,

> MPC8xx watchdog is enabled at startup by HW.
> If the bootloader disables it, it cannot be reenabled.
> 
> Signed-off-by: Christophe Leroy 
> 
> diff -ur a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
> --- a/drivers/watchdog/mpc8xxx_wdt.c  2013-05-11 22:57:46.0 +0200
> +++ b/drivers/watchdog/mpc8xxx_wdt.c  2013-08-08 02:12:15.0 +0200
> @@ -273,6 +310,7 @@
>   .compatible = "fsl,mpc823-wdt",
>   .data = &(struct mpc8xxx_wdt_type) {
>   .prescaler = 0x800,
> + .hw_enabled = true,
>   },
>   },
>   {},
> 
> ---
> Ce courrier électronique ne contient aucun virus ou logiciel malveillant 
> parce que la protection avast! Antivirus est active.
> http://www.avast.com
> 

This patch has been added to linux-watchdog-next.

Kind regards,
Wim.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-24 Thread Christoph Lameter
On Mon, 24 Feb 2014, Joonsoo Kim wrote:

> > It will not common get there because of the tracking. Instead a per cpu
> > object will be used.
> > > get_partial_node() always fails even if there are some partial slab on
> > > memoryless node's neareast node.
> >
> > Correct and that leads to a page allocator action whereupon the node will
> > be marked as empty.
>
> Why do we need to request to a page allocator if there is partial slab?
> Checking whether node is memoryless or not is really easy, so we don't need
> to skip this. To skip this is suboptimal solution.

The page allocator action is also used to determine to which other node we
should fall back if the node is empty. So we need to call the page
allocator when the per cpu slab is exhaused with the node of the
memoryless node to get memory from the proper fallback node.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup

2014-02-24 Thread Christoph Lameter
On Fri, 21 Feb 2014, Nishanth Aravamudan wrote:

> I added two calls to local_memory_node(), I *think* both are necessary,
> but am willing to be corrected.
>
> One is in map_cpu_to_node() and one is in start_secondary(). The
> start_secondary() path is fine, AFAICT, as we are up & running at that
> point. But in [the renamed function] update_numa_cpu_node() which is
> used by hotplug, we get called from do_init_bootmem(), which is before
> the zonelists are setup.
>
> I think both calls are necessary because I believe the
> arch_update_cpu_topology() is used for supporting firmware-driven
> home-noding, which does not invoke start_secondary() again (the
> processor is already running, we're just updating the topology in that
> situation).
>
> Then again, I could special-case the do_init_bootmem callpath, which is
> only called at kernel init time?

Well taht looks to be simpler.

> > I do agree that calling local_memory_node() too early then trying to
> > fudge around the consequences seems rather wrong.
>
> If the answer is to simply not call local_memory_node() early, I'll
> submit a patch to at least add a comment, as there's nothing in the code
> itself to prevent this from happening and is guaranteed to oops.

Ok.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] powerpc/crashdump : fix page frame number check in copy_oldmem_page

2014-02-24 Thread Laurent Dufour
In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.

This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.

This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/kernel/crash_dump.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 11c1d06..7a13f37 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -98,17 +98,19 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
size_t csize, unsigned long offset, int userbuf)
 {
void  *vaddr;
+   phys_addr_t paddr;
 
if (!csize)
return 0;
 
csize = min_t(size_t, csize, PAGE_SIZE);
+   paddr = pfn << PAGE_SHIFT;
 
-   if ((min_low_pfn < pfn) && (pfn < max_pfn)) {
-   vaddr = __va(pfn << PAGE_SHIFT);
+   if (memblock_is_region_memory(paddr, csize)) {
+   vaddr = __va(paddr);
csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf);
} else {
-   vaddr = __ioremap(pfn << PAGE_SHIFT, PAGE_SIZE, 0);
+   vaddr = __ioremap(paddr, PAGE_SIZE, 0);
csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf);
iounmap(vaddr);
}

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] ppc476: Enable a linker work around for IBM errata #46

2014-02-24 Thread Josh Boyer
On Mon, Feb 24, 2014 at 2:00 AM, Alistair Popple  wrote:
> This patch adds an option to enable a work around for an icache bug on
> 476 that can cause execution of stale instructions when falling
> through pages (IBM errata #46). It requires a recent version of
> binutils which supports the --ppc476-workaround option.
>
> The work around enables the appropriate linker options and ensures
> that all module output sections are aligned to 4K page boundaries. The
> work around is only required when building modules.

What happens if you're using 64K pages?  Is the alignment 4K always,
or does it need to be aligned to PAGE_SIZE?

josh
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH RFC v8 2/5] dma: mpc512x: add support for peripheral transfers

2014-02-24 Thread Andy Shevchenko
On Mon, 2014-02-24 at 15:09 +0400, Alexander Popov wrote:
> Introduce support for slave s/g transfer preparation and the associated
> device control callback in the MPC512x DMA controller driver, which adds
> support for data transfers between memory and peripheral I/O to the
> previously supported mem-to-mem transfers.
> 

Few comments below.

> Signed-off-by: Alexander Popov 
> ---
>  drivers/dma/mpc512x_dma.c | 235 
> +-
>  1 file changed, 230 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
> index 2ce248b..8f504cb 100644
> --- a/drivers/dma/mpc512x_dma.c
> +++ b/drivers/dma/mpc512x_dma.c
> @@ -2,6 +2,7 @@
>   * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
>   * Copyright (C) Semihalf 2009
>   * Copyright (C) Ilya Yanok, Emcraft Systems 2010
> + * Copyright (C) Alexander Popov, Promcontroller 2013
>   *
>   * Written by Piotr Ziecik . Hardware description
>   * (defines, structures and comments) was taken from MPC5121 DMA driver
> @@ -29,8 +30,17 @@
>   */
>  
>  /*
> - * This is initial version of MPC5121 DMA driver. Only memory to memory
> - * transfers are supported (tested using dmatest module).
> + * MPC512x and MPC8308 DMA driver. It supports
> + * memory to memory data transfers (tested using dmatest module) and
> + * data transfers between memory and peripheral I/O memory
> + * by means of slave s/g with these limitations:
> + *  - chunked transfers (transfers with more than one part) are refused
> + * as long as proper support for scatter/gather is missing;
> + *  - transfers on MPC8308 always start from software as this SoC appears
> + * not to have external request lines for peripheral flow control;
> + *  - minimal memory <-> I/O memory transfer chunk is 4 bytes and 
> consequently
> + * source and destination addresses must be 4-byte aligned
> + * and transfer size must be aligned on (4 * maxburst) boundary;
>   */
>  
>  #include 
> @@ -189,6 +199,7 @@ struct mpc_dma_desc {
>   dma_addr_t  tcd_paddr;
>   int error;
>   struct list_headnode;
> + int will_access_peripheral;
>  };
>  
>  struct mpc_dma_chan {
> @@ -201,6 +212,10 @@ struct mpc_dma_chan {
>   struct mpc_dma_tcd  *tcd;
>   dma_addr_t  tcd_paddr;
>  
> + /* Settings for access to peripheral FIFO */
> + dma_addr_t  per_paddr;  /* FIFO address */
> + u32 tcd_nunits;
> +
>   /* Lock for this structure */
>   spinlock_t  lock;
>  };
> @@ -251,8 +266,23 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
>   struct mpc_dma_desc *mdesc;
>   int cid = mchan->chan.chan_id;
>  
> - /* Move all queued descriptors to active list */
> - list_splice_tail_init(&mchan->queued, &mchan->active);
> + while (!list_empty(&mchan->queued)) {
> + mdesc = list_first_entry(&mchan->queued,
> + struct mpc_dma_desc, node);
> + /*
> +  * Grab either several mem-to-mem transfer descriptors
> +  * or one peripheral transfer descriptor,
> +  * don't mix mem-to-mem and peripheral transfer descriptors
> +  * within the same 'active' list.
> +  */
> + if (mdesc->will_access_peripheral) {
> + if (list_empty(&mchan->active))
> + list_move_tail(&mdesc->node, &mchan->active);
> + break;
> + } else {
> + list_move_tail(&mdesc->node, &mchan->active);
> + }
> + }
>  
>   /* Chain descriptors into one transaction */
>   list_for_each_entry(mdesc, &mchan->active, node) {
> @@ -278,7 +308,17 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
>  
>   if (first != prev)
>   mdma->tcd[cid].e_sg = 1;
> - out_8(&mdma->regs->dmassrt, cid);
> +
> + if (mdma->is_mpc8308) {
> + /* MPC8308, no request lines, software initiated start */
> + out_8(&mdma->regs->dmassrt, cid);
> + } else if (first->will_access_peripheral) {
> + /* peripherals involved, start by external request signal */

Probably you have to keep style of all comments in the code. For
example, let's start sentences from a capital letter.

> + out_8(&mdma->regs->dmaserq, cid);
> + } else {
> + /* memory to memory transfer, software initiated start */
> + out_8(&mdma->regs->dmassrt, cid);
> + }
>  }
>  
>  /* Handle interrupt on one half of DMA controller (32 channels) */
> @@ -596,6 +636,7 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t 
> dst, dma_addr_t src,
>   }
>  
>   mdesc->error = 0;
> + mdesc->will_access_peripheral = 0;
>   tcd = 

Re: [PATCH RFC v8 5/5] dma: mpc512x: register for device tree channel lookup

2014-02-24 Thread Andy Shevchenko
On Mon, 2014-02-24 at 15:09 +0400, Alexander Popov wrote:
> From: Gerhard Sittig 
> 
> register the controller for device tree based lookup of DMA channels
> (non-fatal for backwards compatibility with older device trees) and
> provide the '#dma-cells' property in the shared mpc5121.dtsi file
> 
> Signed-off-by: Gerhard Sittig 
> [ a13xp0p0...@gmail.com: resolve little patch conflict and put
>   MPC512x DMA controller bindings document to a separate patch ]
> ---
>  arch/powerpc/boot/dts/mpc5121.dtsi |  1 +
>  drivers/dma/mpc512x_dma.c  | 21 ++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/boot/dts/mpc5121.dtsi 
> b/arch/powerpc/boot/dts/mpc5121.dtsi
> index 2c0e155..7f9d14f 100644
> --- a/arch/powerpc/boot/dts/mpc5121.dtsi
> +++ b/arch/powerpc/boot/dts/mpc5121.dtsi
> @@ -498,6 +498,7 @@
>   compatible = "fsl,mpc5121-dma";
>   reg = <0x14000 0x1800>;
>   interrupts = <65 0x8>;
> + #dma-cells = <1>;
>   };
>   };
>  
> diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
> index 8f504cb..d9f8740 100644
> --- a/drivers/dma/mpc512x_dma.c
> +++ b/drivers/dma/mpc512x_dma.c
> @@ -52,6 +52,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -1018,11 +1019,23 @@ static int mpc_dma_probe(struct platform_device *op)
>   /* Register DMA engine */
>   dev_set_drvdata(dev, mdma);
>   retval = dma_async_device_register(dma);
> - if (retval) {
> - devm_free_irq(dev, mdma->irq, mdma);
> - irq_dispose_mapping(mdma->irq);
> + if (retval)
> + goto out_irq;
> +
> + /* register with OF helpers for DMA lookups (nonfatal) */
> + if (dev->of_node) {
> + retval = of_dma_controller_register(dev->of_node,
> + of_dma_xlate_by_chan_id,
> + mdma);
> + if (retval)
> + dev_warn(dev, "could not register for OF lookup\n");
>   }
>  
> + return 0;
> +
> +out_irq:
> + devm_free_irq(dev, mdma->irq, mdma);

Something wrong either with devm_request_irq() or you don't need to call
devm_free_irq() explicitly. Once we already try to discuss this earlier
in this mailing list with Lars-Peter(?), though there were no solution
how to keep devm_*_irq usability.

> + irq_dispose_mapping(mdma->irq);
>   return retval;
>  }
>  
> @@ -1031,6 +1044,8 @@ static int mpc_dma_remove(struct platform_device *op)
>   struct device *dev = &op->dev;
>   struct mpc_dma *mdma = dev_get_drvdata(dev);
>  
> + if (dev->of_node)
> + of_dma_controller_free(dev->of_node);
>   dma_async_device_unregister(&mdma->dma);
>   devm_free_irq(dev, mdma->irq, mdma);
>   irq_dispose_mapping(mdma->irq);


-- 
Andy Shevchenko 
Intel Finland Oy

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [patch 03/26] powerpc: eeh: Kill another abuse of irq_desc

2014-02-24 Thread Thomas Gleixner
On Mon, 24 Feb 2014, Gavin Shan wrote:
> On Sun, Feb 23, 2014 at 09:40:09PM -, Thomas Gleixner wrote:
> >commit 91150af3a (powerpc/eeh: Fix unbalanced enable for IRQ) is
> >another brilliant example of trainwreck engineering.
> >
> >The patch "fixes" the issue of an unbalanced call to irq_enable()
> >which causes a prominent warning by checking the disabled state of the
> >interrupt line and call conditionally into the core code.
> >
> >This is wrong in two aspects:
> >
> >1) The warning is there to tell users, that they need to fix their
> >   asymetric enable/disable patterns by finding the root cause and
> >   solving it there.
> >
> >   It's definitely not meant to work around it by conditionally
> >   calling into the core code depending on the random state of the irq
> >   line.
> >
> >   Asymetric irq_disable/enable calls are a clear sign of wrong usage
> >   of the interfaces which have to be cured at the root and not by
> >   somehow hacking around it.
> >
> >2) The abuse of core internal data structure instead of using the
> >   proper interfaces for retrieving the information for the 'hack
> >   around'
> >
> >   irq_desc is core internal and it's clear enough stated.
> >
> >Replace at least the irq_desc abuse with the proper functions and add
> >a big fat comment why this is absurd and completely wrong.
> >
> 
> Thanks for pointing it out. I think we might have this patch for now
> and I'll look into individual drivers to fix the unbalanced function
> calls later one by one.

Fine with me. You wont escape my scan scripts :)

Thanks,

tglx
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RFC v8 5/5] dma: mpc512x: register for device tree channel lookup

2014-02-24 Thread Alexander Popov
From: Gerhard Sittig 

register the controller for device tree based lookup of DMA channels
(non-fatal for backwards compatibility with older device trees) and
provide the '#dma-cells' property in the shared mpc5121.dtsi file

Signed-off-by: Gerhard Sittig 
[ a13xp0p0...@gmail.com: resolve little patch conflict and put
  MPC512x DMA controller bindings document to a separate patch ]
---
 arch/powerpc/boot/dts/mpc5121.dtsi |  1 +
 drivers/dma/mpc512x_dma.c  | 21 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc5121.dtsi 
b/arch/powerpc/boot/dts/mpc5121.dtsi
index 2c0e155..7f9d14f 100644
--- a/arch/powerpc/boot/dts/mpc5121.dtsi
+++ b/arch/powerpc/boot/dts/mpc5121.dtsi
@@ -498,6 +498,7 @@
compatible = "fsl,mpc5121-dma";
reg = <0x14000 0x1800>;
interrupts = <65 0x8>;
+   #dma-cells = <1>;
};
};
 
diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 8f504cb..d9f8740 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -1018,11 +1019,23 @@ static int mpc_dma_probe(struct platform_device *op)
/* Register DMA engine */
dev_set_drvdata(dev, mdma);
retval = dma_async_device_register(dma);
-   if (retval) {
-   devm_free_irq(dev, mdma->irq, mdma);
-   irq_dispose_mapping(mdma->irq);
+   if (retval)
+   goto out_irq;
+
+   /* register with OF helpers for DMA lookups (nonfatal) */
+   if (dev->of_node) {
+   retval = of_dma_controller_register(dev->of_node,
+   of_dma_xlate_by_chan_id,
+   mdma);
+   if (retval)
+   dev_warn(dev, "could not register for OF lookup\n");
}
 
+   return 0;
+
+out_irq:
+   devm_free_irq(dev, mdma->irq, mdma);
+   irq_dispose_mapping(mdma->irq);
return retval;
 }
 
@@ -1031,6 +1044,8 @@ static int mpc_dma_remove(struct platform_device *op)
struct device *dev = &op->dev;
struct mpc_dma *mdma = dev_get_drvdata(dev);
 
+   if (dev->of_node)
+   of_dma_controller_free(dev->of_node);
dma_async_device_unregister(&mdma->dma);
devm_free_irq(dev, mdma->irq, mdma);
irq_dispose_mapping(mdma->irq);
-- 
1.8.4.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RFC v8 4/5] dma: mpc512x: add device tree binding document

2014-02-24 Thread Alexander Popov
From: Gerhard Sittig 

introduce a device tree binding document for the MPC512x DMA controller

Signed-off-by: Gerhard Sittig 
[ a13xp0p0...@gmail.com: turn this into a separate patch ]
---
 .../devicetree/bindings/dma/mpc512x-dma.txt| 55 ++
 1 file changed, 55 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/mpc512x-dma.txt

diff --git a/Documentation/devicetree/bindings/dma/mpc512x-dma.txt 
b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
new file mode 100644
index 000..a4867d5
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/mpc512x-dma.txt
@@ -0,0 +1,55 @@
+* Freescale MPC512x DMA Controller
+
+The DMA controller in the Freescale MPC512x SoC can move blocks of
+memory contents between memory and peripherals or memory to memory.
+
+Refer to the "Generic DMA Controller and DMA request bindings" description
+in the dma.txt file for a more detailled discussion of the binding.  The
+MPC512x DMA engine binding follows the common scheme, but doesn't provide
+support for the optional channels and requests counters (those values are
+derived from the detected hardware features) and has a fixed client
+specifier length of 1 integer cell (the value is the DMA channel, since
+the DMA controller uses a fixed assignment of request lines per channel).
+
+
+DMA controller node properties:
+
+Required properties:
+- compatible:  should be "fsl,mpc5121-dma"
+- reg: address and size of the DMA controller's register set
+- interrupts:  interrupt spec for the DMA controller
+
+Optional properties:
+- #dma-cells:  must be <1>, describes the number of integer cells
+   needed to specify the 'dmas' property in client nodes,
+   strongly recommended since common client helper code
+   uses this property
+
+Example:
+
+   dma0: dma@14000 {
+   compatible = "fsl,mpc5121-dma";
+   reg = <0x14000 0x1800>;
+   interrupts = <65 0x8>;
+   #dma-cells = <1>;
+   };
+
+
+Client node properties:
+
+Required properties:
+- dmas:list of DMA specifiers, consisting each of a 
handle
+   for the DMA controller and integer cells to specify
+   the channel used within the DMA controller
+- dma-names:   list of identifier strings for the DMA specifiers,
+   client device driver code uses these strings to
+   have DMA channels looked up at the controller
+
+Example:
+
+   sdhc@1500 {
+   compatible = "fsl,mpc5121-sdhc";
+   /* ... */
+   dmas = <&dma0 30>;
+   dma-names = "rx-tx";
+   };
-- 
1.8.4.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RFC v8 3/5] dma: of: Add common xlate function for matching by channel id

2014-02-24 Thread Alexander Popov
This patch adds a new common OF dma xlate callback function which will match a
channel by it's id. The binding expects one integer argument which it will use 
to
lookup the channel by the id.

Unlike of_dma_simple_xlate this function is able to handle a system with
multiple DMA controllers. When registering the of dma provider with
of_dma_controller_register a pointer to the dma_device struct which is
associated with the dt node needs to passed as the data parameter.
New function will use this pointer to match only channels which belong to the
specified DMA controller.

Signed-off-by: Alexander Popov 
---
 drivers/dma/of-dma.c   | 35 +++
 include/linux/of_dma.h |  4 
 2 files changed, 39 insertions(+)

diff --git a/drivers/dma/of-dma.c b/drivers/dma/of-dma.c
index e8fe9dc..d5fbeaa 100644
--- a/drivers/dma/of-dma.c
+++ b/drivers/dma/of-dma.c
@@ -218,3 +218,38 @@ struct dma_chan *of_dma_simple_xlate(struct 
of_phandle_args *dma_spec,
&dma_spec->args[0]);
 }
 EXPORT_SYMBOL_GPL(of_dma_simple_xlate);
+
+/**
+ * of_dma_xlate_by_chan_id - Translate dt property to DMA channel by channel id
+ * @dma_spec:  pointer to DMA specifier as found in the device tree
+ * @of_dma:pointer to DMA controller data
+ *
+ * This function can be used as the of xlate callback for DMA driver which 
wants
+ * to match the channel based on the channel id. When using this xlate function
+ * the #dma-cells propety of the DMA controller dt node needs to be set to 1.
+ * The data parameter of of_dma_controller_register must be a pointer to the
+ * dma_device struct the function should match upon.
+ *
+ * Returns pointer to appropriate dma channel on success or NULL on error.
+ */
+struct dma_chan *of_dma_xlate_by_chan_id(struct of_phandle_args *dma_spec,
+struct of_dma *ofdma)
+{
+   struct dma_device *dev = ofdma->of_dma_data;
+   struct dma_chan *chan, *candidate = NULL;
+
+   if (!dev || dma_spec->args_count != 1)
+   return NULL;
+
+   list_for_each_entry(chan, &dev->channels, device_node)
+   if (chan->chan_id == dma_spec->args[0]) {
+   candidate = chan;
+   break;
+   }
+
+   if (!candidate)
+   return NULL;
+
+   return dma_get_slave_channel(candidate);
+}
+EXPORT_SYMBOL_GPL(of_dma_xlate_by_chan_id);
diff --git a/include/linux/of_dma.h b/include/linux/of_dma.h
index ae36298..56bc026 100644
--- a/include/linux/of_dma.h
+++ b/include/linux/of_dma.h
@@ -41,6 +41,8 @@ extern struct dma_chan *of_dma_request_slave_channel(struct 
device_node *np,
 const char *name);
 extern struct dma_chan *of_dma_simple_xlate(struct of_phandle_args *dma_spec,
struct of_dma *ofdma);
+extern struct dma_chan *of_dma_xlate_by_chan_id(struct of_phandle_args 
*dma_spec,
+   struct of_dma *ofdma);
 #else
 static inline int of_dma_controller_register(struct device_node *np,
struct dma_chan *(*of_dma_xlate)
@@ -66,6 +68,8 @@ static inline struct dma_chan *of_dma_simple_xlate(struct 
of_phandle_args *dma_s
return NULL;
 }
 
+#define of_dma_xlate_by_chan_id NULL
+
 #endif
 
 #endif /* __LINUX_OF_DMA_H */
-- 
1.8.4.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RFC v8 2/5] dma: mpc512x: add support for peripheral transfers

2014-02-24 Thread Alexander Popov
Introduce support for slave s/g transfer preparation and the associated
device control callback in the MPC512x DMA controller driver, which adds
support for data transfers between memory and peripheral I/O to the
previously supported mem-to-mem transfers.

Signed-off-by: Alexander Popov 
---
 drivers/dma/mpc512x_dma.c | 235 +-
 1 file changed, 230 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 2ce248b..8f504cb 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -2,6 +2,7 @@
  * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
  * Copyright (C) Semihalf 2009
  * Copyright (C) Ilya Yanok, Emcraft Systems 2010
+ * Copyright (C) Alexander Popov, Promcontroller 2013
  *
  * Written by Piotr Ziecik . Hardware description
  * (defines, structures and comments) was taken from MPC5121 DMA driver
@@ -29,8 +30,17 @@
  */
 
 /*
- * This is initial version of MPC5121 DMA driver. Only memory to memory
- * transfers are supported (tested using dmatest module).
+ * MPC512x and MPC8308 DMA driver. It supports
+ * memory to memory data transfers (tested using dmatest module) and
+ * data transfers between memory and peripheral I/O memory
+ * by means of slave s/g with these limitations:
+ *  - chunked transfers (transfers with more than one part) are refused
+ * as long as proper support for scatter/gather is missing;
+ *  - transfers on MPC8308 always start from software as this SoC appears
+ * not to have external request lines for peripheral flow control;
+ *  - minimal memory <-> I/O memory transfer chunk is 4 bytes and consequently
+ * source and destination addresses must be 4-byte aligned
+ * and transfer size must be aligned on (4 * maxburst) boundary;
  */
 
 #include 
@@ -189,6 +199,7 @@ struct mpc_dma_desc {
dma_addr_t  tcd_paddr;
int error;
struct list_headnode;
+   int will_access_peripheral;
 };
 
 struct mpc_dma_chan {
@@ -201,6 +212,10 @@ struct mpc_dma_chan {
struct mpc_dma_tcd  *tcd;
dma_addr_t  tcd_paddr;
 
+   /* Settings for access to peripheral FIFO */
+   dma_addr_t  per_paddr;  /* FIFO address */
+   u32 tcd_nunits;
+
/* Lock for this structure */
spinlock_t  lock;
 };
@@ -251,8 +266,23 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
struct mpc_dma_desc *mdesc;
int cid = mchan->chan.chan_id;
 
-   /* Move all queued descriptors to active list */
-   list_splice_tail_init(&mchan->queued, &mchan->active);
+   while (!list_empty(&mchan->queued)) {
+   mdesc = list_first_entry(&mchan->queued,
+   struct mpc_dma_desc, node);
+   /*
+* Grab either several mem-to-mem transfer descriptors
+* or one peripheral transfer descriptor,
+* don't mix mem-to-mem and peripheral transfer descriptors
+* within the same 'active' list.
+*/
+   if (mdesc->will_access_peripheral) {
+   if (list_empty(&mchan->active))
+   list_move_tail(&mdesc->node, &mchan->active);
+   break;
+   } else {
+   list_move_tail(&mdesc->node, &mchan->active);
+   }
+   }
 
/* Chain descriptors into one transaction */
list_for_each_entry(mdesc, &mchan->active, node) {
@@ -278,7 +308,17 @@ static void mpc_dma_execute(struct mpc_dma_chan *mchan)
 
if (first != prev)
mdma->tcd[cid].e_sg = 1;
-   out_8(&mdma->regs->dmassrt, cid);
+
+   if (mdma->is_mpc8308) {
+   /* MPC8308, no request lines, software initiated start */
+   out_8(&mdma->regs->dmassrt, cid);
+   } else if (first->will_access_peripheral) {
+   /* peripherals involved, start by external request signal */
+   out_8(&mdma->regs->dmaserq, cid);
+   } else {
+   /* memory to memory transfer, software initiated start */
+   out_8(&mdma->regs->dmassrt, cid);
+   }
 }
 
 /* Handle interrupt on one half of DMA controller (32 channels) */
@@ -596,6 +636,7 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t dst, 
dma_addr_t src,
}
 
mdesc->error = 0;
+   mdesc->will_access_peripheral = 0;
tcd = mdesc->tcd;
 
/* Prepare Transfer Control Descriptor for this transaction */
@@ -643,6 +684,187 @@ mpc_dma_prep_memcpy(struct dma_chan *chan, dma_addr_t 
dst, dma_addr_t src,
return &mdesc->desc;
 }
 
+static struct dma_async_tx_descriptor *
+mpc_dma_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
+ 

[PATCH RFC v8 0/5] MPC512x DMA slave s/g support, OF DMA lookup

2014-02-24 Thread Alexander Popov
2013/7/14 Gerhard Sittig :
> this series
> - introduces slave s/g support (that's support for DMA transfers which
>involve peripherals in contrast to mem-to-mem transfers)
> - adds device tree based lookup support for DMA channels
> - combines floating patches and related feedback which already covered
>several aspects of what the suggested LPB driver needs, to demonstrate
>how integration might be done
> - carries Q&D SD card support to enable another DMA client during test,
>while this patch needs to get dropped upon pickup

Changes in v2:
> - re-order mpc8308 related code paths for improved readability, no
>change in behaviour, introduction of symbolic channel names here
>already
> - squash 'execute() start condition' and 'terminate all' into the
>introduction of 'slave s/g prep' and 'device control' support; refuse
>s/g lists with more than one item since slave support is operational
>yet proper s/g support is missing (can get addressed later)
> - always start transfers from software on MPC8308 as there are no
>external request lines for peripheral flow control
> - drop dt-bindings header file and symbolic channel names in OF nodes

Changes in v3 and v4:
 Part 1/5:
 - use #define instead of enum since individual channels don't require
special handling.
 Part 2/5:
 - add a flag "will_access_peripheral" to DMA transfer descriptor
according recommendations of Gerhard Sittig.
This flag is set in mpc_dma_prep_memcpy() and mpc_dma_prep_slave_sg()
and is evaluated in mpc_dma_execute() to choose a type of start for
the transfer.
 - prevent descriptors of transfers which involve peripherals from
being chained together;
each of such transfers needs hardware initiated start.
 - add locking while working with struct mpc_dma_chan
according recommendations of Lars-Peter Clausen.
 - remove default nbytes value. Client kernel modules must set
src_maxburst and dst_maxburst fields of struct dma_slave_config 
(dmaengine.h).

Changes in v5:
 Part 2/5:
 - add and improve comments;
 - improve the code moving transfer descriptors from 'queued' to 'active' list
in mpc_dma_execute();
 - allow mpc_dma_prep_slave_sg() to run with non-empty 'active' list;
 - take 'mdesc' back to 'free' list in case of error in mpc_dma_prep_slave_sg();
 - improve checks of the transfer parameters;
 - provide the default value for 'maxburst' in mpc_dma_device_control().

Changes in v6:
 Part 2/5:
 - remove doubtful comment;
 - fix coding style issues;
 - set default value for 'maxburst' to 1 which applies to most cases;
 Part 3/5:
 - use dma_get_slave_channel() instead of dma_request_channel()
in new function of_dma_xlate_by_chan_id() according recommendations of
Arnd Bergmann;
 Part 4/5:
 - set DMA_PRIVATE flag for MPC512x DMA controller since its driver relies on
of_dma_xlate_by_chan_id() which doesn't use dma_request_channel()
any more; (removed in v7)
 - resolve little patch conflict;
 Part 5/5:
 - resolve little patch conflict;

Changes in v7:
 Part 2:
 - improve comment;
 Part 4:
 - split in two separate patches. Part 4/6 contains device tree
binding document and in part 5/6 MPC512x DMA controller is registered
for device tree channel lookup;
 - remove setting DMA_PRIVATE flag for MPC512x DMA controller from part 5/6;

Changes in v8:
 Part 2:
 - improve comments;
 - fix style issues;
 Part 6:
 - remove since it has become obsolete;

> known issues:
> - it's yet to get confirmed whether MPC8308 can use slave support or
>whether the DMA controller's driver shall actively reject it, the
>information that's available so far suggests that peripheral transfers
>to IP bus attached I/O is useful and shall not get blocked right away
 - adding support for transfers which don't increment the RAM address or
do increment the peripheral "port's" address is easy with
this implementation; but which options of the common API
should be used for specifying such transfers?
2014/02/13 Gerhard Sittig :
> - The MPC512x DMA completely lacks a binding document, so one
>should get added.
> - The MPC8308 hardware is similar and can re-use the MPC512x
>binding, which should be stated.
> - The Linux implementation currently has no OF based channel
>lookup support, so '#dma-cells' is "a future feature".  I guess
>the binding can and should already discuss the feature,
>regardless of whether all implementations support it.


Alexander Popov (3):
  dma: mpc512x: reorder mpc8308 specific instructions
  dma: mpc512x: add support for peripheral transfers
  dma: of: Add common xlate function for matching by channel id

Gerhard Sittig (2):
  dma: mpc512x: add device tree binding document
  dma: mpc512x: register for device tree channel lookup

 .../devicetree/bindings/dma/mpc512x-dma.txt|  55 
 arch/powerpc/boot/dts/mpc5121.dtsi |   1 +
 drivers/dma/mpc512x_dma.c  | 298 

[PATCH RFC v8 1/5] dma: mpc512x: reorder mpc8308 specific instructions

2014-02-24 Thread Alexander Popov
Concentrate the specific code for MPC8308 in the 'if' branch
and handle MPC512x in the 'else' branch.
This modification only reorders instructions but doesn't change behaviour.

Signed-off-by: Alexander Popov 
Acked-by: Anatolij Gustschin 
Acked-by: Gerhard Sittig 
---
 drivers/dma/mpc512x_dma.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
index 448750d..2ce248b 100644
--- a/drivers/dma/mpc512x_dma.c
+++ b/drivers/dma/mpc512x_dma.c
@@ -52,9 +52,17 @@
 #define MPC_DMA_DESCRIPTORS64
 
 /* Macro definitions */
-#define MPC_DMA_CHANNELS   64
 #define MPC_DMA_TCD_OFFSET 0x1000
 
+/*
+ * Maximum channel counts for individual hardware variants
+ * and the maximum channel count over all supported controllers,
+ * used for data structure size
+ */
+#define MPC8308_DMACHAN_MAX16
+#define MPC512x_DMACHAN_MAX64
+#define MPC_DMA_CHANNELS   64
+
 /* Arbitration mode of group and channel */
 #define MPC_DMA_DMACR_EDCG (1 << 31)
 #define MPC_DMA_DMACR_ERGA (1 << 3)
@@ -710,10 +718,10 @@ static int mpc_dma_probe(struct platform_device *op)
 
dma = &mdma->dma;
dma->dev = dev;
-   if (!mdma->is_mpc8308)
-   dma->chancnt = MPC_DMA_CHANNELS;
+   if (mdma->is_mpc8308)
+   dma->chancnt = MPC8308_DMACHAN_MAX;
else
-   dma->chancnt = 16; /* MPC8308 DMA has only 16 channels */
+   dma->chancnt = MPC512x_DMACHAN_MAX;
dma->device_alloc_chan_resources = mpc_dma_alloc_chan_resources;
dma->device_free_chan_resources = mpc_dma_free_chan_resources;
dma->device_issue_pending = mpc_dma_issue_pending;
@@ -747,7 +755,19 @@ static int mpc_dma_probe(struct platform_device *op)
 * - Round-robin group arbitration,
 * - Round-robin channel arbitration.
 */
-   if (!mdma->is_mpc8308) {
+   if (mdma->is_mpc8308) {
+   /* MPC8308 has 16 channels and lacks some registers */
+   out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_ERCA);
+
+   /* enable snooping */
+   out_be32(&mdma->regs->dmagpor, MPC_DMA_DMAGPOR_SNOOP_ENABLE);
+   /* Disable error interrupts */
+   out_be32(&mdma->regs->dmaeeil, 0);
+
+   /* Clear interrupts status */
+   out_be32(&mdma->regs->dmaintl, 0x);
+   out_be32(&mdma->regs->dmaerrl, 0x);
+   } else {
out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_EDCG |
MPC_DMA_DMACR_ERGA | 
MPC_DMA_DMACR_ERCA);
 
@@ -768,18 +788,6 @@ static int mpc_dma_probe(struct platform_device *op)
/* Route interrupts to IPIC */
out_be32(&mdma->regs->dmaihsa, 0);
out_be32(&mdma->regs->dmailsa, 0);
-   } else {
-   /* MPC8308 has 16 channels and lacks some registers */
-   out_be32(&mdma->regs->dmacr, MPC_DMA_DMACR_ERCA);
-
-   /* enable snooping */
-   out_be32(&mdma->regs->dmagpor, MPC_DMA_DMAGPOR_SNOOP_ENABLE);
-   /* Disable error interrupts */
-   out_be32(&mdma->regs->dmaeeil, 0);
-
-   /* Clear interrupts status */
-   out_be32(&mdma->regs->dmaintl, 0x);
-   out_be32(&mdma->regs->dmaerrl, 0x);
}
 
/* Register DMA engine */
-- 
1.8.4.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] dt/bindings: fsl-fec: add "per" to clock properties

2014-02-24 Thread Gerhard Sittig
a recent FEC binding document update that was motivated by i.MX
development revealed that ARM and PowerPC implementations in Linux
did not agree on the clock names to use for the FEC nodes

update the FEC (fast ethernet controller) binding to document the
"per" clock name as an obsolete alias for "ipg"

Signed-off-by: Gerhard Sittig 
---

this patch depends on "dt/bindings: fsl-fec: add clock properties"
by Shawn Guo which introduces the context of this patch

the patch only is necessary if the MPC5121 .dtsi update (switch
FEC nodes from "per" to "ipg") won't make it for v3.14

---
 Documentation/devicetree/bindings/net/fsl-fec.txt |2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt 
b/Documentation/devicetree/bindings/net/fsl-fec.txt
index 468736d4323d..f59b58e29da6 100644
--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
@@ -24,6 +24,8 @@ Optional properties:
  or external oscillator via pad depending on board design.
- "enet_out": the phy reference clock provided by SoC via pad, which
  is available on SoC like i.MX28.
+   - "per": obsolete alias for "ipg" for compatibility with early
+ MPC5121 implementations, not recommended for new .dts files
 - clock-names: Must contain the clock names described just above
 
 Example:
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/3] dts: mpc512x: adjust clock specs for FEC nodes

2014-02-24 Thread Gerhard Sittig
a recent FEC binding document update that was motivated by i.MX
development revealed that ARM and PowerPC implementations in Linux
did not agree on the clock names to use for the FEC nodes

change clock names from "per" to "ipg" in the FEC nodes of the
mpc5121.dtsi include file such that the .dts specs comply with
the common FEC binding

this "incompatible" change does not break operation, because
- COMMON_CLK support for MPC5121/23/25 and adjusted .dts files
  were only introduced in Linux v3.14-rc1, no mainline release
  provided these specs before
- if this change won't make it for v3.14, the MPC512x CCF support
  provides full backwards compability, and keeps operating with
  device trees which lack clock specs or don't match in the names

Signed-off-by: Gerhard Sittig 
---
 arch/powerpc/boot/dts/mpc5121.dtsi |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc5121.dtsi 
b/arch/powerpc/boot/dts/mpc5121.dtsi
index 2c0e1552d20b..a5a375598ed8 100644
--- a/arch/powerpc/boot/dts/mpc5121.dtsi
+++ b/arch/powerpc/boot/dts/mpc5121.dtsi
@@ -281,7 +281,7 @@
#address-cells = <1>;
#size-cells = <0>;
clocks = <&clks MPC512x_CLK_FEC>;
-   clock-names = "per";
+   clock-names = "ipg";
};
 
eth0: ethernet@2800 {
@@ -291,7 +291,7 @@
local-mac-address = [ 00 00 00 00 00 00 ];
interrupts = <4 0x8>;
clocks = <&clks MPC512x_CLK_FEC>;
-   clock-names = "per";
+   clock-names = "ipg";
};
 
/* USB1 using external ULPI PHY */
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/3] fs_enet: update clock names to comply with FEC binding

2014-02-24 Thread Gerhard Sittig
a recent FEC binding document update that was motivated by i.MX
development revealed that ARM and PowerPC implementations in Linux
did not agree on the clock names to use for the FEC nodes

change the OF clock lookup to prefer "ipg" over "per", which
improves compliance with the binding, and keeps compatibility
with former device trees

Signed-off-by: Gerhard Sittig 
---
 drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c 
b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
index 62f042d4aaa9..ce20184b96cb 100644
--- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
+++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
@@ -1037,11 +1037,20 @@ static int fs_enet_probe(struct platform_device *ofdev)
fpi->use_rmii = 1;
}
 
-   /* make clock lookup non-fatal (the driver is shared among platforms),
+   /* the driver is shared across several PowerPC platforms, not all
+* of them provide COMMON_CLK support, and newer kernels are supposed
+* to keep working with older DT blobs, so clock lookup is non-fatal
+*
 * but require enable to succeed when a clock was specified/found,
 * keep a reference to the clock upon successful acquisition
+*
+* the FEC binding is shared with ARM platforms, so we lookup several
+* clock names to prefer the common naming convention yet support
+* names that were used before unification
 */
-   clk = devm_clk_get(&ofdev->dev, "per");
+   clk = devm_clk_get(&ofdev->dev, "ipg");
+   if (IS_ERR(clk))
+   clk = devm_clk_get(&ofdev->dev, "per");
if (!IS_ERR(clk)) {
err = clk_prepare_enable(clk);
if (err) {
-- 
1.7.10.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev