RE: linux-4.5-rc4/arch/powerpc/boot/treeboot-akebono.c:90: possible bad test ?

2016-02-16 Thread David Binderman
Hello there Daniel,


> That looks like a good suggestion: maybe make the test 'if (!emac)'
> rather than explicitly comparing with zero.

Righto.

> Are you comfortable sending a patch to that effect?

Sorry, no. 

My email provider can't implement the somewhat strict whitespace
rules of kernel patches.

Nothing stopping any keen person implementing that patch, however ;->

I checked the rest of the treeboot-akebono.c source code file with
the static analyser and found nothing wrong. No warnings from
gcc -Wextra, either, so after the patch, it looks ok to me.


Regards

David Binderman

  
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC, kernel] powerpc/ioda: Set "read" permission when "write" is set

2016-02-16 Thread Alexey Kardashevskiy

On 02/16/2016 02:18 PM, Michael Ellerman wrote:

On Tue, 2016-12-01 at 04:40:20 UTC, Alexey Kardashevskiy wrote:

Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of platforms.
However IODA2 is strict about this and produces an EEH when "read"
permission is not set and reading happens.

This adds a workaround in IODA code to always add the "read" bit when
the "write" bit is set.

Cc: Benjamin Herrenschmidt 
Signed-off-by: Alexey Kardashevskiy 
Tested-by: Douglas Miller 


Are you planning on sending a non-RFC version of this?


Just posted.


If so is it an urgent fix I should send upstream now?


Yes.

> And if so should it also be CC'ed to stable?

Ben suggested that yes, it should. Thanks. Sorry about breaking things.


--
Alexey

[PATCH kernel v2] powerpc/ioda: Set "read" permission when "write" is set

2016-02-16 Thread Alexey Kardashevskiy
Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of platforms.
However IODA2 is strict about this and produces an EEH when "read"
permission is not set and reading happens.

This adds a workaround in IODA code to always add the "read" bit when
the "write" bit is set.

This fixes breakage introduced in
10b35b2b74 powerpc/powernv: Do not set "read" flag if direction==DMA_NONE

Cc: sta...@vger.kernel.org # 4.2+
Cc: Benjamin Herrenschmidt 
Signed-off-by: Alexey Kardashevskiy 
Tested-by: Douglas Miller 
---
 arch/powerpc/platforms/powernv/pci.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 2f55c86..6a97ba4 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -599,6 +599,9 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
u64 rpn = __pa(uaddr) >> tbl->it_page_shift;
long i;
 
+   if (proto_tce & TCE_PCI_WRITE)
+   proto_tce |= TCE_PCI_READ;
+
for (i = 0; i < npages; i++) {
unsigned long newtce = proto_tce |
((rpn + i) << tbl->it_page_shift);
@@ -620,6 +623,9 @@ int pnv_tce_xchg(struct iommu_table *tbl, long index,
 
BUG_ON(*hpa & ~IOMMU_PAGE_MASK(tbl));
 
+   if (newtce & TCE_PCI_WRITE)
+   newtce |= TCE_PCI_READ;
+
oldtce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
*hpa = be64_to_cpu(oldtce) & ~(TCE_PCI_READ | TCE_PCI_WRITE);
*direction = iommu_tce_direction(oldtce);
-- 
2.5.0.rc3
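
The workaround in the patch above can be tried outside the kernel. The following is a minimal sketch, assuming the two permission bits have the values shown (they are placeholders here; the kernel headers define the real ones) and using a hypothetical helper name, tce_fixup_perms(), that does not appear in the patch:

```c
#include <stdint.h>

/* Assumed bit values, for illustration only; the kernel defines the real ones. */
#define TCE_PCI_READ  0x1UL
#define TCE_PCI_WRITE 0x2UL

/*
 * Hypothetical helper: return the TCE value with the "read" bit forced on
 * whenever the "write" bit is set. All other bits are left untouched.
 */
static uint64_t tce_fixup_perms(uint64_t proto_tce)
{
	if (proto_tce & TCE_PCI_WRITE)
		proto_tce |= TCE_PCI_READ;
	return proto_tce;
}
```

A write-only request comes back as read+write, while read-only and no-access requests are returned unchanged, which is the behaviour the commit message describes.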


Re: Fix BUG_ON() reporting in real mode on powerpc

2016-02-16 Thread Balbir Singh

> It might be a little better to do this:
> 
>   bugaddr = regs->nip;
>   if (REGION_ID(bugaddr) == 0 && !(regs->msr & MSR_IR))
>   bugaddr += PAGE_OFFSET;
> 
> It is possible to execute from addresses with the 0xc000... on top in
> real mode, because the CPU ignores the top 4 address bits in real
> mode.

Good catch! Thank you

Changelog:
 Don't add PAGE_OFFSET blindly, check if REGION_ID is 0

I ran into this issue while debugging an early boot problem.
The system hit a BUG_ON() but report_bug() failed to print the
line number and file name. The reason is that the system
was running in real mode and report_bug() searches for
addresses in the PAGE_OFFSET+ region.

Suggested-by: Paul Mackerras 
Signed-off-by: Balbir Singh 
---
 arch/powerpc/kernel/traps.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index b6becc7..4de4fe7 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1148,6 +1148,7 @@ void __kprobes program_check_exception(struct pt_regs *regs)
    goto bail;
    }
    if (reason & REASON_TRAP) {
+   unsigned long bugaddr;
    /* Debugger is first in line to stop recursive faults in
     * rcu_lock, notify_die, or atomic_notifier_call_chain */
    if (debugger_bpt(regs))
@@ -1158,8 +1159,12 @@ void __kprobes program_check_exception(struct pt_regs *regs)
    == NOTIFY_STOP)
    goto bail;
 
+   bugaddr = regs->nip;
+   if ((REGION_ID(bugaddr) == 0) && !(regs->msr & MSR_IR))
+   bugaddr += PAGE_OFFSET;
+
    if (!(regs->msr & MSR_PR) &&  /* not user-mode */
-   report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) {
+   report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
    regs->nip += 4;
    goto bail;
    }
-- 
2.5.0
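
The address fix-up in this version of the patch can be exercised in user space. This is a sketch under stated assumptions: the values of PAGE_OFFSET, the region shift and MSR_IR below are taken to be those of a 64-bit hash-MMU kernel and are redefined locally, and bug_lookup_addr() is a hypothetical name for the logic the patch adds inline.

```c
#include <stdint.h>

/* Assumed values, redefined locally for illustration. */
#define PAGE_OFFSET   0xc000000000000000ULL
#define REGION_SHIFT  60
#define REGION_ID(ea) ((uint64_t)(ea) >> REGION_SHIFT)
#define MSR_IR        (1ULL << 5)	/* instruction relocation enabled */

/*
 * Hypothetical helper: compute the address to hand to report_bug().
 * PAGE_OFFSET is added only when the trap happened with instruction
 * relocation off (real mode) AND the address has no region bits set.
 */
static uint64_t bug_lookup_addr(uint64_t nip, uint64_t msr)
{
	uint64_t bugaddr = nip;

	if (REGION_ID(bugaddr) == 0 && !(msr & MSR_IR))
		bugaddr += PAGE_OFFSET;
	return bugaddr;
}
```

With these definitions a real-mode nip of 0x1000 maps to 0xc000000000001000, while a nip that already carries the 0xc prefix is passed through unchanged whether or not MSR_IR is set.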


[RFC 2/2] selftests/powerpc: Add tests for various hash page fault paths

2016-02-16 Thread Anshuman Khandual
This new test case tries to create virtual memory scenarios to
drive different types of hash page faults. It also uses the perf
API to capture the number of times it went through the intended
hash fault paths.

Signed-off-by: Anshuman Khandual 
---
The test result looks like this now. The objective is to automatically
verify the count of these traces for the various buffer sizes and
scenarios created.

vm.nr_hugepages = 10
HugeTLB allocation
[  faults]:  1
[major-faults]:  0
[minor-faults]:  1
[ hash_faults]:  2
[ hash_faults_thp]:  0
[ hash_faults_64K]:  0
[  hash_faults_4K]:  0
[ hash_faults_hugetlb]:  1
THP allocation
[  faults]:  256
[major-faults]:  0
[minor-faults]:  256
[ hash_faults]:  256
[ hash_faults_thp]:  0
[ hash_faults_64K]:  256
[  hash_faults_4K]:  0
[ hash_faults_hugetlb]:  0
SUBPAGE protection
[  faults]:  0
[major-faults]:  0
[minor-faults]:  0
[ hash_faults]:  4096
[ hash_faults_thp]:  0
[ hash_faults_64K]:  0
[  hash_faults_4K]:  4096
[ hash_faults_hugetlb]:  0
PFN flush
[  faults]:  256
[major-faults]:  0
[minor-faults]:  256
[ hash_faults]:  4352
[ hash_faults_thp]:  0
[ hash_faults_64K]:  0
[  hash_faults_4K]:  4096
[ hash_faults_hugetlb]:  0
vm.nr_hugepages = 0

THP allocation on a free system is not happening even after a call to
madvise() with MADV_HUGEPAGE. The problem seems to be related to the NUMA
memory configuration, which I will debug further.
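
The counts in the output above are consistent with a buffer of 16 MiB touched once per page: 256 faults at 64K granularity and 4096 at 4K. A small sketch of that arithmetic follows. The 16 MiB buffer size is an assumption inferred from the numbers, not something the posting states, and expected_faults() is a hypothetical helper.

```c
/* Assumed buffer size, inferred from the reported counts. */
#define BUF_SIZE  (16UL * 1024 * 1024)
#define PSIZE_64K (64UL * 1024)
#define PSIZE_4K  (4UL * 1024)

/*
 * Hypothetical helper: number of first-touch faults expected when every
 * page of a buffer is touched exactly once at the given page size.
 */
static unsigned long expected_faults(unsigned long buf_size,
				     unsigned long page_size)
{
	return buf_size / page_size;
}
```

Under the same assumption, the 4352 hash faults reported for the "PFN flush" case equal 256 + 4096, although the output itself does not say how that total is composed.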

 tools/testing/selftests/powerpc/mm/Makefile |   2 +-
 tools/testing/selftests/powerpc/mm/mem_perf.c   | 198 
 tools/testing/selftests/powerpc/mm/run_mem_perf |   3 +
 3 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/mem_perf.c
 create mode 100755 tools/testing/selftests/powerpc/mm/run_mem_perf

diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index ee179e2..13bc5c3 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -1,7 +1,7 @@
 noarg:
$(MAKE) -C ../
 
-TEST_PROGS := hugetlb_vs_thp_test subpage_prot
+TEST_PROGS := hugetlb_vs_thp_test subpage_prot mem_perf
 TEST_FILES := tempfile
 
 all: $(TEST_PROGS) $(TEST_FILES)
diff --git a/tools/testing/selftests/powerpc/mm/mem_perf.c b/tools/testing/selftests/powerpc/mm/mem_perf.c
new file mode 100644
index 0000000..f5d8348
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/mem_perf.c
@@ -0,0 +1,198 @@
+/*
+ * Copyright 2016, Anshuman Khandual, IBM Corp.
+ * Licensed under GPLv2.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../pmu/event.c"
+
+#define ADDR_INPUT 0xa00UL
+#define HPAGE_SIZE 0x100
+#define PSIZE_64K  0x10000
+#define PSIZE_4K   0x1000
+
+#define MAX_MM_EVENTS  8
+
+struct event mm_events[MAX_MM_EVENTS];
+
+static void setup_event(struct event *e, u64 config, char *name)
+{
+   event_init_opts(e, config, PERF_TYPE_SOFTWARE, name);
+   e->attr.disabled = 1;
+   e->attr.exclude_kernel = 1;
+   e->attr.exclude_hv = 1;
+   e->attr.exclude_idle = 1;
+}
+
+static void setup_event_tr(struct event *e, u64 config, char *name)
+{
+   memset(e, 0, sizeof(*e));
+
+   e->name = name;
+   e->attr.type = PERF_TYPE_TRACEPOINT;
+   e->attr.config = config;
+   e->attr.size = sizeof(e->attr);
+   e->attr.sample_period = PERF_SAMPLE_IDENTIFIER;
+   e->attr.inherit = 1;
+   e->attr.enable_on_exec = 1;
+   e->attr.exclude_guest = 1;
+
+   /* This has to match the structure layout in the header */
+   e->attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | \
+   PERF_FORMAT_TOTAL_TIME_RUNNING;
+   e->attr.disabled = 1;
+}
+
+
+static void prepare_events(void)
+{
+   int i;
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   event_reset(&mm_events[i]);
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   event_enable(&mm_events[i]);
+}
+
+static void close_events(void)
+{
+   int i;
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   event_close(&mm_events[i]);
+}
+
+static void display_events(void)
+{
+   int i;
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   event_disable(&mm_events[i]);
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   event_read(&mm_events[i]);
+
+   for (i = 0; i < MAX_MM_EVENTS; i++)
+   printf("[%20s]: \t %llu\n", mm_events[i].name, mm_events[i].result.value);
+}
+
+static void 

[RFC 1/2] powerpc/mm: Add trace points for various types of hash faults

2016-02-16 Thread Anshuman Khandual
This adds trace point definitions and invocations for all types
of hash faults like THP, HugeTLB, 64K, 4K mappings. These are
intended to be used in user space for performance and functional
evaluation of various memory management paths.

Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/include/asm/trace.h | 82 
 arch/powerpc/mm/hash64_64k.c |  3 ++
 arch/powerpc/mm/hugepage-hash64.c|  2 +
 arch/powerpc/mm/hugetlbpage-hash64.c |  3 ++
 4 files changed, 90 insertions(+)

diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index 8e86b48..4f0a829 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -164,6 +164,88 @@ TRACE_EVENT(hash_fault,
  __entry->addr, __entry->access, __entry->trap)
 );
 
+TRACE_EVENT(hash_fault_hugetlb,
+
+   TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
+   TP_ARGS(addr, access, trap),
+   TP_STRUCT__entry(
+   __field(unsigned long, addr)
+   __field(unsigned long, access)
+   __field(unsigned long, trap)
+   ),
+
+   TP_fast_assign(
+   __entry->addr = addr;
+   __entry->access = access;
+   __entry->trap = trap;
+   ),
+
+   TP_printk("HugeTLB hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
+ __entry->addr, __entry->access, __entry->trap)
+);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+TRACE_EVENT(hash_fault_thp,
+
+   TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
+   TP_ARGS(addr, access, trap),
+   TP_STRUCT__entry(
+   __field(unsigned long, addr)
+   __field(unsigned long, access)
+   __field(unsigned long, trap)
+   ),
+
+   TP_fast_assign(
+   __entry->addr = addr;
+   __entry->access = access;
+   __entry->trap = trap;
+   ),
+
+   TP_printk("THP hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
+ __entry->addr, __entry->access, __entry->trap)
+);
+#endif
+
+TRACE_EVENT(hash_fault_64K,
+
+   TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
+   TP_ARGS(addr, access, trap),
+   TP_STRUCT__entry(
+   __field(unsigned long, addr)
+   __field(unsigned long, access)
+   __field(unsigned long, trap)
+   ),
+
+   TP_fast_assign(
+   __entry->addr = addr;
+   __entry->access = access;
+   __entry->trap = trap;
+   ),
+
+   TP_printk("64K hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
+ __entry->addr, __entry->access, __entry->trap)
+);
+
+TRACE_EVENT(hash_fault_4K,
+
+   TP_PROTO(unsigned long addr, unsigned long access, unsigned long trap),
+   TP_ARGS(addr, access, trap),
+   TP_STRUCT__entry(
+   __field(unsigned long, addr)
+   __field(unsigned long, access)
+   __field(unsigned long, trap)
+   ),
+
+   TP_fast_assign(
+   __entry->addr = addr;
+   __entry->access = access;
+   __entry->trap = trap;
+   ),
+
+   TP_printk("4K hash fault with addr 0x%lx and access = 0x%lx trap = 0x%lx",
+ __entry->addr, __entry->access, __entry->trap)
+);
+
 #endif /* _TRACE_POWERPC_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 0762c1e..7966fee 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 /*
  * index from 0 - 15
  */
@@ -58,6 +59,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
unsigned long vpn, hash, slot;
unsigned long shift = mmu_psize_defs[MMU_PAGE_4K].shift;
 
+   trace_hash_fault_4K(ea, access, trap);
/*
 * atomically mark the linux large page PTE busy and dirty
 */
@@ -221,6 +223,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
unsigned long vpn, hash, slot;
unsigned long shift = mmu_psize_defs[MMU_PAGE_64K].shift;
 
+   trace_hash_fault_64K(ea, access, trap);
/*
 * atomically mark the linux large page PTE busy and dirty
 */
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 49b152b..daa588c 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -17,6 +17,7 @@
  */
 #include 
 #include 
+#include 
 
 int 
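
Since the trace points are meant to be consumed from user space, a perf-based consumer first needs each trace point's numeric id, which it then passes as the config of a PERF_TYPE_TRACEPOINT event. Below is a minimal sketch of reading such an id. The helper name read_tracepoint_id() is hypothetical, and the example path in the comment is an assumption: it depends on where tracefs/debugfs is mounted and on the new events being registered under the "powerpc" trace system.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a numeric trace point id from a file such as
 *   /sys/kernel/debug/tracing/events/powerpc/hash_fault_4K/id
 * (path assumed, see above). Returns the id, or -1 on any failure.
 */
static long read_tracepoint_id(const char *path)
{
	FILE *f = fopen(path, "r");
	long id = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}
```

Returning -1 rather than aborting lets a test skip a trace point that is compiled out, for example hash_fault_thp when CONFIG_TRANSPARENT_HUGEPAGE is not set.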

[PATCH] MAINTAINERS: Update EEH details and maintainership

2016-02-16 Thread Russell Currey
Enhanced Error Handling could mean anything in the context of the entire
kernel, so change the name to reference that it is both for PCI and
powerpc.

EEH covers a bit more than the previously listed files, so add the headers
and platform-specific code to the EEH maintained section.

In addition, I am taking over the maintainership.

Signed-off-by: Russell Currey 
---
 MAINTAINERS | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 28eb61b..95d999e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4222,13 +4222,6 @@ M:   Maxim Levitsky 
 S: Maintained
 F: drivers/media/rc/ene_ir.*
 
-ENHANCED ERROR HANDLING (EEH)
-M: Gavin Shan 
-L: linuxppc-dev@lists.ozlabs.org
-S: Supported
-F: Documentation/powerpc/eeh-pci-error-recovery.txt
-F: arch/powerpc/kernel/eeh*.c
-
 EPSON S1D13XXX FRAMEBUFFER DRIVER
 M: Kristoffer Ericson 
 S: Maintained
@@ -8244,6 +8237,15 @@ L:   linux-...@vger.kernel.org
 S: Supported
 F: Documentation/PCI/pci-error-recovery.txt
 
+PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC
+M: Russell Currey 
+L: linuxppc-dev@lists.ozlabs.org
+S: Supported
+F: Documentation/powerpc/eeh-pci-error-recovery.txt
+F: arch/powerpc/kernel/eeh*.c
+F: arch/powerpc/platforms/*/eeh*.c
+F: arch/powerpc/include/*/eeh*.h
+
 PCI SUBSYSTEM
 M: Bjorn Helgaas 
 L: linux-...@vger.kernel.org
-- 
2.7.1
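
To get a feel for which files the new "F:" patterns cover, the sketch below matches paths against them with POSIX fnmatch(). This is only an approximation under stated assumptions: scripts/get_maintainer.pl has its own matching rules, which may differ from fnmatch() with FNM_PATHNAME, and eeh_section_covers() is a hypothetical helper.

```c
#include <fnmatch.h>
#include <stddef.h>

/* The four "F:" patterns from the new MAINTAINERS entry. */
static const char *const eeh_patterns[] = {
	"Documentation/powerpc/eeh-pci-error-recovery.txt",
	"arch/powerpc/kernel/eeh*.c",
	"arch/powerpc/platforms/*/eeh*.c",
	"arch/powerpc/include/*/eeh*.h",
};

/* Hypothetical helper: return 1 if path matches any pattern, else 0. */
static int eeh_section_covers(const char *path)
{
	size_t i;

	for (i = 0; i < sizeof(eeh_patterns) / sizeof(eeh_patterns[0]); i++)
		if (fnmatch(eeh_patterns[i], path, FNM_PATHNAME) == 0)
			return 1;
	return 0;
}
```

With FNM_PATHNAME a "*" does not cross a "/", so the platforms pattern covers exactly one directory level below arch/powerpc/platforms/.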


Re: [PATCH 2/2] powerpc: Add POWER9 cputable entry

2016-02-16 Thread Madhavan Srinivasan


On Wednesday 17 February 2016 10:37 AM, Michael Neuling wrote:
> Add a cputable entry for POWER9.  More code is required to actually
> boot and run on a POWER9 but this gets the base piece in, on which we
> can start building.
>
> Copies over from POWER8 except for:
> - Adds a new CPU_FTR_ARCH_30 bit to start hanging new architecture
>   features from (in subsequent patches).
> - Advertises new user features bits PPC_FEATURE2_ARCH_3_00 &
>   HAS_IEEE128 when on POWER9.
> - Drops CPU_FTR_SUBCORE.
>
> Signed-off-by: Michael Neuling 
> ---
>  arch/powerpc/include/asm/cputable.h   | 14 --
>  arch/powerpc/include/asm/mmu-hash64.h |  1 +
>  arch/powerpc/include/asm/mmu.h|  1 +
>  arch/powerpc/kernel/cpu_setup_power.S | 48 +++
>  arch/powerpc/kernel/cputable.c| 36 ++
>  arch/powerpc/kernel/mce_power.c   | 15 +--
>  6 files changed, 105 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index a47e175..7fb238c 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -171,7 +171,7 @@ enum {
>  #define CPU_FTR_ARCH_201 LONG_ASM_CONST(0x0002)
>  #define CPU_FTR_ARCH_206 LONG_ASM_CONST(0x0004)
>  #define CPU_FTR_ARCH_207S LONG_ASM_CONST(0x0008)
> -/* Free LONG_ASM_CONST(0x0010) */
> +#define CPU_FTR_ARCH_30 LONG_ASM_CONST(0x0010)
>  #define CPU_FTR_MMCRA LONG_ASM_CONST(0x0020)
>  #define CPU_FTR_CTRL LONG_ASM_CONST(0x0040)
>  #define CPU_FTR_SMT  LONG_ASM_CONST(0x0080)
> @@ -447,6 +447,16 @@ enum {
>   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE)
>  #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
>  #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
> +#define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
> + CPU_FTR_MMCRA | CPU_FTR_SMT | \
> + CPU_FTR_COHERENT_ICACHE | \
> + CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
> + CPU_FTR_DSCR | CPU_FTR_SAO  | \
> + CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
> + CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> + CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
> + CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_30)
>  #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>   CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
>   CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
> @@ -465,7 +475,7 @@ enum {
>   (CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
>CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
>CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
> -  CPU_FTRS_PA6T | CPU_FTR_VSX)
> +  CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9)
>  #endif
>  #else
>  enum {
> diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
> index 7352d3f..e36dc90 100644
> --- a/arch/powerpc/include/asm/mmu-hash64.h
> +++ b/arch/powerpc/include/asm/mmu-hash64.h
> @@ -114,6 +114,7 @@
>  
>  #define POWER7_TLB_SETS  128 /* # sets in POWER7 TLB */
>  #define POWER8_TLB_SETS  512 /* # sets in POWER8 TLB */
> +#define POWER9_TLB_SETS_HASH 256 /* # sets in POWER9 TLB Hash mode */
>  
>  #ifndef __ASSEMBLY__
>  
> diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
> index 3d5abfe..54d4650 100644
> --- a/arch/powerpc/include/asm/mmu.h
> +++ b/arch/powerpc/include/asm/mmu.h
> @@ -97,6 +97,7 @@
>  #define MMU_FTRS_POWER6  MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
>  #define MMU_FTRS_POWER7  MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
>  #define MMU_FTRS_POWER8  MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
> +#define MMU_FTRS_POWER9  MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
>  #define MMU_FTRS_CELL MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
>   MMU_FTR_CI_LARGE_PAGE
>  #define MMU_FTRS_PA6T MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 9c9b741..1785480 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -83,6 +83,43 @@ _GLOBAL(__restore_cpu_power8)
>   mtlr r11
>   blr
>  
> +_GLOBAL(__setup_cpu_power9)
> + mflr r11
> + bl  __init_FSCR
> + bl  __init_PMU
Just to keep in mind, I am not sure whether
Power ISA 3.0 supports the MMCRS SPR, 

[PATCH 2/2] powerpc: Add POWER9 cputable entry

2016-02-16 Thread Michael Neuling
Add a cputable entry for POWER9.  More code is required to actually
boot and run on a POWER9 but this gets the base piece in, on which we
can start building.

Copies over from POWER8 except for:
- Adds a new CPU_FTR_ARCH_30 bit to start hanging new architecture
  features from (in subsequent patches).
- Advertises new user features bits PPC_FEATURE2_ARCH_3_00 &
  HAS_IEEE128 when on POWER9.
- Drops CPU_FTR_SUBCORE.

Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h   | 14 --
 arch/powerpc/include/asm/mmu-hash64.h |  1 +
 arch/powerpc/include/asm/mmu.h|  1 +
 arch/powerpc/kernel/cpu_setup_power.S | 48 +++
 arch/powerpc/kernel/cputable.c| 36 ++
 arch/powerpc/kernel/mce_power.c   | 15 +--
 6 files changed, 105 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index a47e175..7fb238c 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -171,7 +171,7 @@ enum {
 #define CPU_FTR_ARCH_201   LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_206   LONG_ASM_CONST(0x0004)
 #define CPU_FTR_ARCH_207S  LONG_ASM_CONST(0x0008)
-/* Free LONG_ASM_CONST(0x0010) */
+#define CPU_FTR_ARCH_30 LONG_ASM_CONST(0x0010)
 #define CPU_FTR_MMCRA  LONG_ASM_CONST(0x0020)
 #define CPU_FTR_CTRL   LONG_ASM_CONST(0x0040)
 #define CPU_FTR_SMT    LONG_ASM_CONST(0x0080)
@@ -447,6 +447,16 @@ enum {
CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE)
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
+#define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
+   CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
+   CPU_FTR_MMCRA | CPU_FTR_SMT | \
+   CPU_FTR_COHERENT_ICACHE | \
+   CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
+   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
+   CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
+   CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
+   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_30)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
@@ -465,7 +475,7 @@ enum {
(CPU_FTRS_POWER4 | CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \
 CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \
 CPU_FTRS_POWER8 | CPU_FTRS_POWER8_DD1 | CPU_FTRS_CELL | \
-CPU_FTRS_PA6T | CPU_FTR_VSX)
+CPU_FTRS_PA6T | CPU_FTR_VSX | CPU_FTRS_POWER9)
 #endif
 #else
 enum {
diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h
index 7352d3f..e36dc90 100644
--- a/arch/powerpc/include/asm/mmu-hash64.h
+++ b/arch/powerpc/include/asm/mmu-hash64.h
@@ -114,6 +114,7 @@
 
 #define POWER7_TLB_SETS        128 /* # sets in POWER7 TLB */
 #define POWER8_TLB_SETS        512 /* # sets in POWER8 TLB */
+#define POWER9_TLB_SETS_HASH   256 /* # sets in POWER9 TLB Hash mode */
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index 3d5abfe..54d4650 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -97,6 +97,7 @@
 #define MMU_FTRS_POWER6 MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_POWER7 MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_POWER8 MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
+#define MMU_FTRS_POWER9 MMU_FTRS_POWER4 | MMU_FTR_LOCKLESS_TLBIE
 #define MMU_FTRS_CELL  MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
MMU_FTR_CI_LARGE_PAGE
 #define MMU_FTRS_PA6T  MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 9c9b741..1785480 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -83,6 +83,43 @@ _GLOBAL(__restore_cpu_power8)
mtlr    r11
blr
 
+_GLOBAL(__setup_cpu_power9)
+   mflr    r11
+   bl  __init_FSCR
+   bl  __init_PMU
+   bl  __init_hvmode_206
+   mtlr    r11
+   beqlr
+   li  r0,0
+   mtspr   SPRN_LPID,r0
+   mfspr   r3,SPRN_LPCR
+   ori r3, r3, LPCR_PECEDH
+   bl  __init_LPCR
+   bl  __init_HFSCR
+   bl  __init_tlb_power9
+   bl  

[PATCH 1/2] powerpc/powernv: Create separate subcores CPU feature bit

2016-02-16 Thread Michael Neuling
Subcores isn't really part of the 2.07 architecture but currently we
turn it on using the 2.07 feature bit.  Subcores is really a POWER8
specific feature.

This adds a new CPU_FTR bit just for subcores and moves the subcore
init code over to use this.

Signed-off-by: Michael Neuling 
---
 arch/powerpc/include/asm/cputable.h  | 3 ++-
 arch/powerpc/platforms/powernv/subcore.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index b118072..a47e175 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -196,6 +196,7 @@ enum {
 #define CPU_FTR_DAWR   LONG_ASM_CONST(0x0400)
 #define CPU_FTR_DABRX  LONG_ASM_CONST(0x0800)
 #define CPU_FTR_PMAO_BUG   LONG_ASM_CONST(0x1000)
+#define CPU_FTR_SUBCORE    LONG_ASM_CONST(0x2000)
 
 #ifndef __ASSEMBLY__
 
@@ -443,7 +444,7 @@ enum {
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
-   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
+   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_SUBCORE)
 #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
 #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
index 503a73f..0babef1 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -407,7 +407,7 @@ static DEVICE_ATTR(subcores_per_core, 0644,
 
 static int subcore_init(void)
 {
-   if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+   if (!cpu_has_feature(CPU_FTR_SUBCORE))
return 0;
 
/*
-- 
2.5.0
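
The reason for giving subcores their own bit shows up once a second CPU shares the 2.07 bit but not the feature. The sketch below uses hypothetical bit positions and abbreviated feature sets (the real values and full masks live in cputable.h), and has_feature() and subcore_supported() are illustrative stand-ins, not kernel functions:

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FTR_ARCH_207S (1ULL << 0)
#define FTR_SUBCORE   (1ULL << 1)
#define FTR_ARCH_30   (1ULL << 2)

/* Abbreviated feature sets: POWER9 keeps 2.07 but drops subcores. */
#define FTRS_POWER8 (FTR_ARCH_207S | FTR_SUBCORE)
#define FTRS_POWER9 (FTR_ARCH_207S | FTR_ARCH_30)

/* Illustrative stand-in for a feature test on a given feature mask. */
static int has_feature(uint64_t cpu_features, uint64_t feature)
{
	return (cpu_features & feature) != 0;
}

/* Same shape as the guard in subcore_init(): 1 if subcore setup should run. */
static int subcore_supported(uint64_t cpu_features)
{
	return has_feature(cpu_features, FTR_SUBCORE);
}
```

Testing FTR_ARCH_207S instead would report subcores as present on the POWER9 set as well, which is what the dedicated bit avoids.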


Re: Fix BUG_ON() reporting in real mode on powerpc

2016-02-16 Thread Paul Mackerras
On Wed, Feb 17, 2016 at 03:43:11PM +1100, Balbir Singh wrote:
> From: Balbir Singh 
> 
> I ran into this issue while debugging an early boot problem.
> The system hit a BUG_ON() but report_bug() failed to print the
> line number and file name. The reason is that the system
> was running in real mode and report_bug() searches for
> addresses in the PAGE_OFFSET+ region.
> 
> Suggested-by: Paul Mackerras 
> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/kernel/traps.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index b6becc7..8f28120 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1148,6 +1148,7 @@ void __kprobes program_check_exception(struct pt_regs *regs)
>   goto bail;
>   }
>   if (reason & REASON_TRAP) {
> + unsigned long bugaddr;
>   /* Debugger is first in line to stop recursive faults in
>    * rcu_lock, notify_die, or atomic_notifier_call_chain */
>   if (debugger_bpt(regs))
> @@ -1158,8 +1159,13 @@ void __kprobes program_check_exception(struct pt_regs *regs)
>   == NOTIFY_STOP)
>   goto bail;
>  
> + if (!(regs->msr & MSR_IR))
> + bugaddr = regs->nip + PAGE_OFFSET;
> + else
> + bugaddr = regs->nip;

It might be a little better to do this:

bugaddr = regs->nip;
if (REGION_ID(bugaddr) == 0 && !(regs->msr & MSR_IR))
bugaddr += PAGE_OFFSET;

It is possible to execute from addresses with the 0xc000... on top in
real mode, because the CPU ignores the top 4 address bits in real
mode.

Paul.
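
A short sketch of why the unconditional addition in the quoted patch goes wrong for addresses that already carry the 0xc000... prefix. PAGE_OFFSET is assumed to be 0xc000000000000000 here, and both helper names are hypothetical:

```c
#include <stdint.h>

#define PAGE_OFFSET 0xc000000000000000ULL	/* assumed linear-map base */

/* Real-mode view of an address: the top 4 bits are ignored. */
static uint64_t real_mode_addr(uint64_t ea)
{
	return ea & 0x0fffffffffffffffULL;
}

/* The quoted patch's real-mode branch: add PAGE_OFFSET unconditionally. */
static uint64_t naive_bugaddr(uint64_t nip)
{
	return nip + PAGE_OFFSET;	/* unsigned arithmetic wraps modulo 2^64 */
}
```

Both 0x1000 and 0xc000000000001000 name the same real-mode location, but the unconditional addition turns the second into 0x8000000000001000, which is outside the PAGE_OFFSET+ region that report_bug() searches.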

Fix BUG_ON() reporting in real mode on powerpc

2016-02-16 Thread Balbir Singh
From: Balbir Singh 

I ran into this issue while debugging an early boot problem.
The system hit a BUG_ON() but report_bug() failed to print the
line number and file name. The reason is that the system
was running in real mode and report_bug() searches for
addresses in the PAGE_OFFSET+ region.

Suggested-by: Paul Mackerras 
Signed-off-by: Balbir Singh 
---
 arch/powerpc/kernel/traps.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index b6becc7..8f28120 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1148,6 +1148,7 @@ void __kprobes program_check_exception(struct pt_regs *regs)
    goto bail;
    }
    if (reason & REASON_TRAP) {
+   unsigned long bugaddr;
    /* Debugger is first in line to stop recursive faults in
     * rcu_lock, notify_die, or atomic_notifier_call_chain */
    if (debugger_bpt(regs))
@@ -1158,8 +1159,13 @@ void __kprobes program_check_exception(struct pt_regs *regs)
    == NOTIFY_STOP)
    goto bail;
 
+   if (!(regs->msr & MSR_IR))
+   bugaddr = regs->nip + PAGE_OFFSET;
+   else
+   bugaddr = regs->nip;
+
    if (!(regs->msr & MSR_PR) &&  /* not user-mode */
-   report_bug(regs->nip, regs) == BUG_TRAP_TYPE_WARN) {
+   report_bug(bugaddr, regs) == BUG_TRAP_TYPE_WARN) {
    regs->nip += 4;
    goto bail;
    }
-- 
2.5.0


Re: [PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances

2016-02-16 Thread Andrew Donnellan

On 17/02/16 14:43, Gavin Shan wrote:

This cleans up the data struct instances below to use tabs instead of
spaces for statement indentation, to avoid complaints from
scripts/checkpatch.pl. No logical changes introduced.

   @pnv_pci_ioda_controller_ops
   @pnv_npu_ioda_controller_ops

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited


Re: [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()

2016-02-16 Thread Andrew Donnellan

On 17/02/16 14:44, Gavin Shan wrote:

This drops unnecessary nested if statements in pnv_eeh_reset() to
improve the code readability. After the changes, the unused local
variable "ret" is dropped as well. No logical changes introduced.

Signed-off-by: Gavin Shan 


This looks good to me.

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited
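
The kind of cleanup described in the quoted commit message, dropping nested if statements and a local "ret", can be illustrated generically. Everything below is hypothetical: the function names, the three reset targets and their return values are made up for the sketch and are not taken from pnv_eeh_reset().

```c
/* Hypothetical stand-ins for the real reset helpers. */
static int reset_phb(int option)      { return 100 + option; }
static int reset_root_bus(int option) { return 200 + option; }
static int reset_bridge(int option)   { return 300 + option; }

/* Before: nested branches that all funnel into a local "ret". */
static int reset_nested(int is_phb, int is_root_bus, int option)
{
	int ret;

	if (is_phb) {
		ret = reset_phb(option);
	} else {
		if (is_root_bus)
			ret = reset_root_bus(option);
		else
			ret = reset_bridge(option);
	}

	return ret;
}

/* After: early returns, one nesting level, no local variable. */
static int reset_flat(int is_phb, int is_root_bus, int option)
{
	if (is_phb)
		return reset_phb(option);
	if (is_root_bus)
		return reset_root_bus(option);
	return reset_bridge(option);
}
```

Both versions return the same value for every combination of inputs, which is what "no logical changes" means for a refactor of this shape.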


Re: [PATCH V11_RESEND 00/10] powerpc/perf: Enable SW branch filters

2016-02-16 Thread Michael Ellerman
On Tue, 2016-02-16 at 14:08 +0100, Christophe Leroy wrote:

> Your patches seem to be under review in Patchwork, see 
> https://patchwork.ozlabs.org/patch/526275/
> Why do you resend it ?

Because I haven't *actually* got around to reviewing them. Or at least I
started and found problems and haven't had time to debug them and reply.

So I'm OK for Anshuman to post a rebase of these.

cheers


Re: [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops

2016-02-16 Thread Andrew Donnellan

On 17/02/16 14:43, Gavin Shan wrote:

Each PHB has one instance of "struct pci_controller_ops", which
includes various callbacks called by the PCI subsystem. In the definition
of this struct, some callbacks have explicit names for their arguments,
but the rest don't.

This adds all explicit names of the arguments to the callbacks in
"struct pci_controller_ops" so that the code looks consistent.

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  Software Engineer, OzLabs
andrew.donnel...@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)  IBM Australia Limited


Re: [PATCH V11_RESEND 00/10] powerpc/perf: Enable SW branch filters

2016-02-16 Thread Anshuman Khandual
On 02/16/2016 06:38 PM, Christophe Leroy wrote:
> Your patches seem to be under review in Patchwork, see
> https://patchwork.ozlabs.org/patch/526275/
> Why do you resend it ?

It's been 3-4 months since I posted the last version, V11. I wanted to check
that it all still works, so I rebased the series, tested it and resent it.


[PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list

2016-02-16 Thread Gavin Shan
PEs are put into the PHB DMA32 list (phb->ioda.pe_dma_list) according
to their DMA32 weight. The PEs on the list are iterated to set up
their TCE32 tables at system boot time. The list is used only
once, so there is no need to keep it.

This moves the logic calculating the DMA32 weight of PHB and PE to
pnv_ioda_setup_dma() to drop the PHB's DMA32 list. Also, the fields
@tce32_seg and @tce32_segcount, with which every PE tracked its
consumed DMA32 segments, are useless and are removed.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 168 +-
 arch/powerpc/platforms/powernv/pci.h  |  19 
 2 files changed, 75 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e60cff6..0fc2309 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -886,44 +886,6 @@ out:
return 0;
 }
 
-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
-  struct pnv_ioda_pe *pe)
-{
-   struct pnv_ioda_pe *lpe;
-
-   list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
-   if (lpe->dma_weight < pe->dma_weight) {
-   list_add_tail(&pe->dma_link, &lpe->dma_link);
-   return;
-   }
-   }
-   list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
-}
-
-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
-{
-   /* This is quite simplistic. The "base" weight of a device
-* is 10. 0 means no DMA is to be accounted for it.
-*/
-
-   /* If it's a bridge, no DMA */
-   if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-   return 0;
-
-   /* Reduce the weight of slow USB controllers */
-   if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
-   dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
-   dev->class == PCI_CLASS_SERIAL_USB_EHCI)
-   return 3;
-
-   /* Increase the weight of RAID (includes Obsidian) */
-   if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
-   return 15;
-
-   /* Default */
-   return 10;
-}
-
 #ifdef CONFIG_PCI_IOV
 static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 {
@@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
pe->flags = PNV_IODA_PE_DEV;
pe->pdev = dev;
pe->pbus = NULL;
-   pe->tce32_seg = -1;
pe->mve_number = -1;
pe->rid = dev->bus->number << 8 | pdn->devfn;
 
@@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
return NULL;
}
 
-   /* Assign a DMA weight to the device */
-   pe->dma_weight = pnv_ioda_dma_weight(dev);
-   if (pe->dma_weight != 0) {
-   phb->ioda.dma_weight += pe->dma_weight;
-   phb->ioda.dma_pe_count++;
-   }
-
-   /* Link the PE */
-   pnv_ioda_link_pe_by_weight(phb, pe);
-
return pe;
 }
 
@@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
}
pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
-   pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
pnv_ioda_setup_same_PE(dev->subordinate, pe);
}
@@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
pe->pbus = bus;
pe->pdev = NULL;
-   pe->tce32_seg = -1;
pe->mve_number = -1;
pe->rid = bus->busn_res.start << 8;
-   pe->dma_weight = 0;
 
if (all)
pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
@@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
 
/* Put PE to the list */
list_add_tail(&pe->list, &phb->ioda.pe_list);
-
-   /* Account for one DMA PE if at least one DMA capable device exist
-* below the bridge
-*/
-   if (pe->dma_weight != 0) {
-   phb->ioda.dma_weight += pe->dma_weight;
-   phb->ioda.dma_pe_count++;
-   }
-
-   /* Link the PE */
-   pnv_ioda_link_pe_by_weight(phb, pe);
 }
 
 static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
@@ -1184,7 +1121,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
npu_pdn->pcidev = npu_pdev;
npu_pdn->pe_number = pe_num;
-   pe->dma_weight += pnv_ioda_dma_weight(npu_pdev);
phb->ioda.pe_rmap[rid] = pe->pe_number;
 
/* Map the PE to this link */
@@ -1532,7 +1468,6 @@ 
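
For reference, the weighting heuristic that this patch relocates can be restated on its own. What follows is a minimal user-space sketch, not the kernel code: struct sketch_dev, enum sketch_class and both function names are hypothetical stand-ins for struct pci_dev and the PCI_CLASS_* checks. Only the weight values (0, 3, 15, 10) are taken from the removed pnv_ioda_dma_weight(); summing them per PE is an assumption based on the removed "pe->dma_weight += ..." lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum sketch_class {
	SKETCH_CLASS_OTHER,
	SKETCH_CLASS_USB,	/* UHCI/OHCI/EHCI in the original code */
	SKETCH_CLASS_RAID,
};

struct sketch_dev {
	int is_bridge;		/* non-zero: a bridge, which does no DMA itself */
	enum sketch_class class;
};

/* Same weights as the removed pnv_ioda_dma_weight(): 0, 3, 15 or 10. */
static unsigned int sketch_dev_dma_weight(const struct sketch_dev *dev)
{
	if (dev->is_bridge)
		return 0;
	if (dev->class == SKETCH_CLASS_USB)
		return 3;
	if (dev->class == SKETCH_CLASS_RAID)
		return 15;
	return 10;
}

/* Weight of a PE, assumed here to be the sum over its devices. */
static unsigned int sketch_pe_dma_weight(const struct sketch_dev *devs,
					 size_t count)
{
	unsigned int weight = 0;
	size_t i;

	assert(devs != NULL || count == 0);
	for (i = 0; i < count; i++)
		weight += sketch_dev_dma_weight(&devs[i]);
	return weight;
}
```

Computing the weight on demand like this is what makes a separately maintained, weight-sorted list unnecessary.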

[PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()

2016-02-16 Thread Gavin Shan
The original implementation of pnv_ioda_setup_pe_seg() configures
IO and M32 segments with separate logic, which can be merged by
caching @segmap, @segsize and @win in advance. This shouldn't
cause any behavioural changes.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 62 ++-
 1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 44cc5f3..fd7d382 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
struct pnv_phb *phb = hose->private_data;
struct pci_bus_region region;
struct resource *res;
-   int i, index;
-   int rc;
+   unsigned int segsize;
+   int *segmap, index, i;
+   uint16_t win;
+   int64_t rc;
 
/*
 * NOTE: We only care PCI bus based PE for now. For PCI
@@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
if (res->flags & IORESOURCE_IO) {
region.start = res->start - phb->ioda.io_pci_base;
region.end   = res->end - phb->ioda.io_pci_base;
-   index = region.start / phb->ioda.io_segsize;
-
-   while (index < phb->ioda.total_pe_num &&
-  region.start <= region.end) {
-   phb->ioda.io_segmap[index] = pe->pe_number;
-   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-   pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, index);
-   if (rc != OPAL_SUCCESS) {
-   pr_err("%s: OPAL error %d when mapping IO "
-  "segment #%d to PE#%d\n",
-  __func__, rc, index, pe->pe_number);
-   break;
-   }
-
-   region.start += phb->ioda.io_segsize;
-   index++;
-   }
+   segsize  = phb->ioda.io_segsize;
+   segmap   = phb->ioda.io_segmap;
+   win  = OPAL_IO_WINDOW_TYPE;
} else if ((res->flags & IORESOURCE_MEM) &&
   !pnv_pci_is_mem_pref_64(res->flags)) {
region.start = res->start -
@@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
region.end   = res->end -
   hose->mem_offset[0] -
   phb->ioda.m32_pci_base;
-   index = region.start / phb->ioda.m32_segsize;
-
-   while (index < phb->ioda.total_pe_num &&
-  region.start <= region.end) {
-   phb->ioda.m32_segmap[index] = pe->pe_number;
-   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
-   pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, index);
-   if (rc != OPAL_SUCCESS) {
-   pr_err("%s: OPAL error %d when mapping M32 "
-  "segment#%d to PE#%d",
-  __func__, rc, index, pe->pe_number);
-   break;
-   }
+   segsize  = phb->ioda.m32_segsize;
+   segmap   = phb->ioda.m32_segmap;
+   win  = OPAL_M32_WINDOW_TYPE;
+   } else {
+   continue;
+   }
 
-   region.start += phb->ioda.m32_segsize;
-   index++;
+   index = region.start / segsize;
+   while (index < phb->ioda.total_pe_num &&
+  region.start <= region.end) {
+   segmap[index] = pe->pe_number;
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, win, 0, index);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+   __func__, rc, win, index,
+   pe->phb->hose->global_number,
+   pe->pe_number);
+   break;
}
+
+   region.start += segsize;
+   index++;
}
}
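
The shape of the merged loop may be easier to see outside the diff. This is a rough stand-alone sketch under stated assumptions: struct sketch_phb, enum sketch_win, SKETCH_TOTAL_SEGS and sketch_map_segments() are invented for illustration, and the OPAL call is left out entirely. It only shows the idea of selecting segsize and segmap once per window type and then running a single loop.

```c
#include <assert.h>

#define SKETCH_TOTAL_SEGS 8	/* hypothetical; stands in for total_pe_num */

enum sketch_win { SKETCH_WIN_IO, SKETCH_WIN_M32 };

struct sketch_phb {
	unsigned int io_segsize, m32_segsize;
	int io_segmap[SKETCH_TOTAL_SEGS];
	int m32_segmap[SKETCH_TOTAL_SEGS];
};

/*
 * Walk [start, end] in segsize steps and record pe_number for each
 * segment visited. Returns the number of segments claimed.
 */
static int sketch_map_segments(struct sketch_phb *phb, enum sketch_win win,
			       unsigned long start, unsigned long end,
			       int pe_number)
{
	unsigned int segsize;
	int *segmap, index, claimed = 0;

	/* Pick the per-window parameters once... */
	if (win == SKETCH_WIN_IO) {
		segsize = phb->io_segsize;
		segmap = phb->io_segmap;
	} else {
		segsize = phb->m32_segsize;
		segmap = phb->m32_segmap;
	}
	assert(segsize != 0);

	/* ...then share one loop between both window types. */
	index = (int)(start / segsize);
	while (index < SKETCH_TOTAL_SEGS && start <= end) {
		segmap[index] = pe_number;
		claimed++;
		start += segsize;
		index++;
	}
	return claimed;
}
```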

[PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time

2016-02-16 Thread Gavin Shan
Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup(), except those used by SRIOV VFs. The
function is called once after PCI probing and resource
assignment are completed, so it isn't hotplug friendly.

This creates PEs dynamically in ppc_md.pcibios_setup_bridge(), which
is called during system bootup and PCI hotplug when a PCI bridge's
windows are updated after resource assignment/reassignment is done.
For the partial hotplug case, where not all PCI devices belonging to the
PE are unplugged and plugged again, we just need to unbind/bind
the affected PCI devices from/to the corresponding PE without creating
a new one.

As there is no upstream bridge for the root bus that needs to be covered
by a PE, we have to create the PE for the root bus in
ppc_md.pcibios_setup_bridge() before any other PEs can be created, as the
PE for the root bus is the ancestor of all others.

Also, the windows of the root port, or of the upstream port of the PCIe
switch behind the root port, are extended to the PHB's apertures to
accommodate the additional resources needed by newly plugged devices,
based on the fact that a hotpluggable slot is behind the root port or a
downstream port of the PCIe switch behind the root port. The extension of
those PCI bridges' windows is done in ppc_md.pcibios_setup_bridge() as well.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 294 +-
 arch/powerpc/platforms/powernv/pci.h  |   2 +
 2 files changed, 168 insertions(+), 128 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 565725b..d360607 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -197,14 +197,14 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
 
/*
-* Strip off the segment used by the reserved PE, which is
-* expected to be 0 or last one of PE capabicity.
+* Exclude the segments for reserved and root bus PE, which
+* are first or last two PEs.
 */
r = &phb->hose->mem_resources[1];
if (phb->ioda.reserved_pe_idx == 0)
-   r->start += phb->ioda.m64_segsize;
+   r->start += (2 * phb->ioda.m64_segsize);
else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-   r->end -= phb->ioda.m64_segsize;
+   r->end -= (2 * phb->ioda.m64_segsize);
else
pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
phb->ioda.reserved_pe_idx);
@@ -284,14 +284,14 @@ static int pnv_ioda1_init_m64(struct pnv_phb *phb)
}
 
/*
-* Exclude the segment used by the reserved PE, which
-* is expected to be 0 or last supported PE#.
+* Exclude the segments for reserved and root bus PE, which
+* are first or last two PEs.
 */
r = &phb->hose->mem_resources[1];
if (phb->ioda.reserved_pe_idx == 0)
-   r->start += phb->ioda.m64_segsize;
+   r->start += (2 * phb->ioda.m64_segsize);
else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
-   r->end -= phb->ioda.m64_segsize;
+   r->end -= (2 * phb->ioda.m64_segsize);
else
pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
phb->ioda.reserved_pe_idx);
@@ -1022,6 +1022,15 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
+
+   /*
+* In partial hotplug case, the PCI device might be still
+* associated with the PE and needn't be attached to the
+* PE again.
+*/
+   if (pdn->pe_number != IODA_INVALID_PE)
+   continue;
+
pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
@@ -1040,9 +1049,26 @@ static struct pnv_ioda_pe *pnv_ioda_setup_bus_PE(struct pci_bus *bus, bool all)
struct pci_controller *hose = pci_bus_to_host(bus);
struct pnv_phb *phb = hose->private_data;
struct pnv_ioda_pe *pe = NULL;
+   int pe_num;
+
+   /*
+* In partial hotplug case, the PE instance might be still alive.
+* We should reuse it instead of allocating a new one.
+*/
+   pe_num = phb->ioda.pe_rmap[bus->number << 8];
+   if (pe_num != IODA_INVALID_PE) {
+   pe = &phb->ioda.pe_array[pe_num];
+   pnv_ioda_setup_same_PE(bus, pe);
+   return NULL;
+   }
+
+   /* PE number for root bus should have been reserved */
+   if (pci_is_root_bus(bus) &&
+   phb->ioda.root_pe_idx != IODA_INVALID_PE)
+
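
The partial hotplug lookup in the hunk above boils down to "consult the reverse map before allocating". Below is a simplified user-space sketch of that idea only. All names (sketch_phb, sketch_pe, sketch_get_bus_pe, SKETCH_INVALID_PE) are hypothetical, the free-slot search is a placeholder rather than the kernel's allocator, and unlike the patch this sketch returns the reused PE to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_INVALID_PE (-1)		/* stand-in for IODA_INVALID_PE */
#define SKETCH_NUM_PES    4
#define SKETCH_NUM_RIDS   (256 * 256)	/* bus number << 8 | devfn */

struct sketch_pe {
	int pe_number;
	int in_use;
};

struct sketch_phb {
	int pe_rmap[SKETCH_NUM_RIDS];	/* RID -> PE number */
	struct sketch_pe pe_array[SKETCH_NUM_PES];
};

static void sketch_phb_init(struct sketch_phb *phb)
{
	size_t i;

	for (i = 0; i < SKETCH_NUM_RIDS; i++)
		phb->pe_rmap[i] = SKETCH_INVALID_PE;
	for (i = 0; i < SKETCH_NUM_PES; i++) {
		phb->pe_array[i].pe_number = (int)i;
		phb->pe_array[i].in_use = 0;
	}
}

/* Return the PE for a bus: the live one if the rmap knows it, else a new one. */
static struct sketch_pe *sketch_get_bus_pe(struct sketch_phb *phb,
					   unsigned char bus)
{
	unsigned int rid = (unsigned int)bus << 8;
	int pe_num, i;

	assert(phb != NULL);
	pe_num = phb->pe_rmap[rid];
	if (pe_num != SKETCH_INVALID_PE)
		return &phb->pe_array[pe_num];	/* partial hotplug: reuse */

	for (i = 0; i < SKETCH_NUM_PES; i++) {
		if (!phb->pe_array[i].in_use) {
			phb->pe_array[i].in_use = 1;
			phb->pe_rmap[rid] = i;
			return &phb->pe_array[i];
		}
	}
	return NULL;	/* no PE left */
}
```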

[PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table

2016-02-16 Thread Gavin Shan
pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
TCE table when releasing IODA1 PE in subsequent patches.

This renames the following functions to support releasing IODA1 TCE
table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d360607..077f9db 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -51,7 +51,7 @@
 #define POWERNV_IOMMU_DEFAULT_LEVELS   1
 #define POWERNV_IOMMU_MAX_LEVELS   5
 
-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
 
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
@@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe
iommu_group_put(pe->table_group.group);
BUG_ON(pe->table_group.group);
}
-   pnv_pci_ioda2_table_free_pages(tbl);
+   pnv_pci_ioda_table_free_pages(tbl);
iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
 }
 
@@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, long index,
 
 static void pnv_ioda2_table_free(struct iommu_table *tbl)
 {
-   pnv_pci_ioda2_table_free_pages(tbl);
+   pnv_pci_ioda_table_free_pages(tbl);
iommu_free_table(tbl, "pnv");
 }
 
@@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int nid, unsigned shift,
return addr;
 }
 
-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
unsigned long size, unsigned level);
 
 static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
@@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 * release partially allocated table.
 */
if (offset < tce_table_size) {
-   pnv_pci_ioda2_table_do_free_pages(addr,
+   pnv_pci_ioda_table_do_free_pages(addr,
1ULL << (level_shift - 3), levels - 1);
return -ENOMEM;
}
@@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
return 0;
 }
 
-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
unsigned long size, unsigned level)
 {
const unsigned long addr_ul = (unsigned long) addr &
@@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
continue;
 
-   pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
+   pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
level - 1);
}
}
@@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
free_pages(addr_ul, get_order(size << 3));
 }
 
-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
 {
const unsigned long size = tbl->it_indirect_levels ?
tbl->it_level_size : tbl->it_size;
@@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
if (!tbl->it_size)
return;
 
-   pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
+   pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
tbl->it_indirect_levels);
 }
 
-- 
2.1.0


[PATCH v8 27/45] powerpc/pci: Export pci_add_device_node_info()

2016-02-16 Thread Gavin Shan
This renames update_dn_pci_info() to pci_add_device_node_info(),
adjusts its parameter types accordingly and exports it.
The function is used to create the pdn (struct pci_dn) for the indicated
device node. Another function, add_pdn(), is a thin wrapper around
pci_add_device_node_info() to be used in traverse_pci_devices(). No
logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h  |  3 ++-
 arch/powerpc/kernel/pci_dn.c   | 30 +++---
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 03f4ee7..72a9d4e 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -238,7 +238,8 @@ extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
 extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
-extern void *update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+  struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 38102cb..0a249ff 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -282,13 +282,9 @@ void remove_dev_pci_data(struct pci_dev *pdev)
 #endif /* CONFIG_PCI_IOV */
 }
 
-/*
- * Traverse_func that inits the PCI fields of the device node.
- * NOTE: this *must* be done before read/write config to the device.
- */
-void *update_dn_pci_info(struct device_node *dn, void *data)
+struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
+   struct device_node *dn)
 {
-   struct pci_controller *phb = data;
const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL);
const __be32 *regs;
struct device_node *parent;
@@ -299,7 +295,7 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
return NULL;
dn->data = pdn;
pdn->node = dn;
-   pdn->phb = phb;
+   pdn->phb = hose;
 #ifdef CONFIG_PPC_POWERNV
pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -331,8 +327,9 @@ void *update_dn_pci_info(struct device_node *dn, void *data)
if (pdn->parent)
list_add_tail(&pdn->list, &pdn->parent->child_list);
 
-   return NULL;
+   return pdn;
 }
+EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
 /*
  * Traverse a device tree stopping each PCI device in the tree.
@@ -432,6 +429,18 @@ void *traverse_pci_dn(struct pci_dn *root,
return NULL;
 }
 
+static void *add_pdn(struct device_node *dn, void *data)
+{
+   struct pci_controller *hose = data;
+   struct pci_dn *pdn;
+
+   pdn = pci_add_device_node_info(hose, dn);
+   if (!pdn)
+   return ERR_PTR(-ENOMEM);
+
+   return NULL;
+}
+
 /** 
  * pci_devs_phb_init_dynamic - setup pci devices under this PHB
  * phb: pci-to-host bridge (top-level bridge connecting to cpu)
@@ -446,8 +455,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
struct pci_dn *pdn;
 
/* PHB nodes themselves must not match */
-   update_dn_pci_info(dn, phb);
-   pdn = dn->data;
+   pdn = pci_add_device_node_info(phb, dn);
if (pdn) {
pdn->devfn = pdn->busno = -1;
pdn->vendor_id = pdn->device_id = pdn->class_code = 0;
@@ -456,7 +464,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
}
 
/* Update dn->phb ptrs for new phb and children devices */
-   traverse_pci_devices(dn, update_dn_pci_info, phb);
+   traverse_pci_devices(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 36df46e..6f8d020 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -265,7 +265,7 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
pdn = parent ? PCI_DN(parent) : NULL;
if (pdn) {
/* Create pdn and EEH device */
-   update_dn_pci_info(np, pdn->phb);
+   pci_add_device_node_info(pdn->phb, np);
eeh_dev_init(PCI_DN(np), pdn->phb);
}
 
-- 
2.1.0
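
The split between the typed helper and the add_pdn() wrapper is a common adapter pattern for void * traversal callbacks. Here is a small user-space sketch of it, with hypothetical sketch_* names throughout and calloc() standing in for the kernel allocation. It assumes the walker stops at the first non-NULL return; the wrapper returns the failing node as its error marker instead of the ERR_PTR(-ENOMEM) used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_hose { int id; };
struct sketch_node { void *data; };
struct sketch_pdn {
	struct sketch_hose *phb;
	struct sketch_node *node;
};

/* Typed helper: creates the per-node data and returns it (NULL on failure). */
static struct sketch_pdn *sketch_add_node_info(struct sketch_hose *hose,
					       struct sketch_node *dn)
{
	struct sketch_pdn *pdn;

	assert(dn != NULL);
	pdn = calloc(1, sizeof(*pdn));
	if (!pdn)
		return NULL;
	pdn->phb = hose;
	pdn->node = dn;
	dn->data = pdn;
	return pdn;
}

/* Callback type for the walker: NULL means "continue", non-NULL stops it. */
typedef void *(*sketch_visit_fn)(struct sketch_node *dn, void *data);

/* Adapter: unpacks the void * argument and maps the result to the protocol. */
static void *sketch_add_pdn(struct sketch_node *dn, void *data)
{
	struct sketch_hose *hose = data;

	return sketch_add_node_info(hose, dn) ? NULL : (void *)dn;
}

/* Flat stand-in for a device tree walk. */
static void *sketch_traverse(struct sketch_node *nodes, size_t count,
			     sketch_visit_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < count; i++) {
		void *ret = fn(&nodes[i], data);

		if (ret)
			return ret;
	}
	return NULL;
}
```

Callers that want the new object call the typed helper directly, while the walker only ever sees the adapter.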


[PATCH v8 23/45] powerpc/powernv: Dynamically release PEs

2016-02-16 Thread Gavin Shan
This supports releasing PEs dynamically. Firstly, this moves
pnv_pci_ioda2_release_dma_pe() around, which is called to
release the DMA resources when releasing an IODA2 PE. Secondly, several
functions are implemented to release the consumed resources
when releasing the PE:

   * pnv_pci_ioda1_unset_window() to unset TVEs for the PE.
   * pnv_pci_ioda1_release_dma_pe() to unset TVEs for the PE and
 destroy the IOMMU table.
   * pnv_ioda_release_pe_seg() releases the consumed IO/M32/M64
 segments by the PE.

Lastly, this adds a reference count to the PE, representing the number
of PCI devices associated with the PE. The reference count is
increased when a PCI device joins the PE. It's decreased when a PCI
device leaves the PE in pnv_pci_release_device(). When the count
becomes zero, its consumed resources are released by the functions
mentioned above. Note that the count isn't accessed concurrently,
so a "counter" with "int" type is enough here.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 236 ++
 arch/powerpc/platforms/powernv/pci.h  |   1 +
 2 files changed, 209 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 077f9db..fa428a8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -119,6 +119,158 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long flags)
(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static unsigned int pnv_pci_ioda_pe_dma_weight(struct pnv_ioda_pe *pe);
+static long pnv_pci_ioda1_unset_window(struct iommu_table_group *table_group,
+  int num);
+static void pnv_pci_ioda1_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+   struct iommu_table *tbl;
+   unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
+   int64_t rc;
+
+   if (!weight)
+   return;
+
+   tbl = pe->table_group.tables[0];
+   rc = pnv_pci_ioda1_unset_window(&pe->table_group, 0);
+   if (rc)
+   pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+   if (pe->table_group.group) {
+   iommu_group_put(pe->table_group.group);
+   WARN_ON(pe->table_group.group);
+   }
+
+   pnv_pci_ioda_table_free_pages(tbl);
+   iommu_free_table(tbl, "pnv");
+}
+
+static long pnv_pci_ioda2_unset_window(struct iommu_table_group *table_group,
+  int num);
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
+static void pnv_pci_ioda2_release_dma_pe(struct pnv_ioda_pe *pe)
+{
+   struct iommu_table *tbl;
+   unsigned int weight = pnv_pci_ioda_pe_dma_weight(pe);
+   int64_t rc;
+
+   if (!weight)
+   return;
+
+   tbl = pe->table_group.tables[0];
+   rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
+   if (rc)
+   pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
+
+   pnv_pci_ioda2_set_bypass(pe, false);
+   if (pe->table_group.group) {
+   iommu_group_put(pe->table_group.group);
+   WARN_ON(pe->table_group.group);
+   }
+
+   pnv_pci_ioda_table_free_pages(tbl);
+   iommu_free_table(tbl, "pnv");
+}
+
+static void pnv_ioda_release_pe_seg(struct pnv_ioda_pe *pe)
+{
+   struct pnv_phb *phb = pe->phb;
+   int win, index, *segmap = NULL;
+   int64_t rc;
+
+   for (win = OPAL_M32_WINDOW_TYPE; win <= OPAL_IO_WINDOW_TYPE; win++) {
+   if (phb->type == PNV_PHB_IODA2 &&
+   (win == OPAL_IO_WINDOW_TYPE || win == OPAL_M64_WINDOW_TYPE))
+   continue;
+
+   switch (win) {
+   case OPAL_IO_WINDOW_TYPE:
+   segmap = phb->ioda.io_segmap;
+   break;
+   case OPAL_M32_WINDOW_TYPE:
+   segmap = phb->ioda.m32_segmap;
+   break;
+   case OPAL_M64_WINDOW_TYPE:
+   segmap = phb->ioda.m64_segmap;
+   break;
+   }
+
+   for (index = 0; index < phb->ioda.total_pe_num; index++) {
+   if (segmap[index] != pe->pe_number)
+   continue;
+
+   if (win == OPAL_M64_WINDOW_TYPE)
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   phb->ioda.reserved_pe_idx, win,
+   index / PNV_IODA1_M64_SEGS,
+   index % PNV_IODA1_M64_SEGS);
+   else
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   phb->ioda.reserved_pe_idx, win,
+   0, index);
+ 
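
The device count described in the commit message behaves like a plain, non-atomic reference count. A minimal sketch follows, assuming attach and detach are never run concurrently. The struct and function names are hypothetical, and the "released" flag merely stands in for the real teardown of DMA windows and segments.

```c
#include <assert.h>

struct sketch_pe {
	int device_count;	/* PCI devices currently attached to the PE */
	int released;		/* set once the PE's resources are torn down */
};

/* Placeholder for releasing DMA windows, segments and the PE itself. */
static void sketch_pe_release(struct sketch_pe *pe)
{
	pe->released = 1;
}

/* A device joins the PE. */
static void sketch_pe_attach(struct sketch_pe *pe)
{
	pe->device_count++;
}

/* A device leaves the PE; the last one out triggers the release. */
static void sketch_pe_detach(struct sketch_pe *pe)
{
	assert(pe->device_count > 0);
	if (--pe->device_count == 0)
		sketch_pe_release(pe);
}
```

If the callers could race, the plain int would have to become an atomic type or be protected by a lock.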

[PATCH v8 24/45] powerpc/pci: Rename pcibios_{add, remove}_pci_devices()

2016-02-16 Thread Gavin Shan
This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
with names of the weak functions in PCI subsystem, which have the
prefix "pcibios". No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |  4 ++--
 arch/powerpc/kernel/eeh_driver.c  | 12 ++--
 arch/powerpc/kernel/pci-hotplug.c | 15 +++
 drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
 drivers/pci/hotplug/rpaphp_core.c |  4 ++--
 drivers/pci/hotplug/rpaphp_pci.c  |  2 +-
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 4dd6ef4..c817f38 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn *pdn)
 extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
+extern void pci_remove_pci_devices(struct pci_bus *bus);
 
 /** Discover new pci devices under this bus, and add them */
-extern void pcibios_add_pci_devices(struct pci_bus *bus);
+extern void pci_add_pci_devices(struct pci_bus *bus);
 
 
 extern void isa_bridge_find_early(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fb6207d..59e53fe 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
 * We don't remove the corresponding PE instances because
 * we need the information afterwords. The attached EEH
 * devices are expected to be attached soon when calling
-* into pcibios_add_pci_devices().
+* into pci_add_pci_devices().
 */
eeh_pe_state_mark(pe, EEH_PE_KEEP);
if (bus) {
@@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
} else {
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
pci_lock_rescan_remove();
-   pcibios_remove_pci_devices(bus);
+   pci_remove_pci_devices(bus);
pci_unlock_rescan_remove();
}
} else if (frozen_bus) {
@@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
if (pe->type & EEH_PE_VF)
eeh_add_virt_device(edev, NULL);
else
-   pcibios_add_pci_devices(bus);
+   pci_add_pci_devices(bus);
} else if (frozen_bus && rmv_data->removed) {
pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
ssleep(5);
@@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
if (pe->type & EEH_PE_VF)
eeh_add_virt_device(edev, NULL);
else
-   pcibios_add_pci_devices(frozen_bus);
+   pci_add_pci_devices(frozen_bus);
}
eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -896,7 +896,7 @@ perm_error:
eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
pci_lock_rescan_remove();
-   pcibios_remove_pci_devices(frozen_bus);
+   pci_remove_pci_devices(frozen_bus);
pci_unlock_rescan_remove();
}
}
@@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
bus = eeh_pe_bus_get(phb_pe);
eeh_pe_dev_traverse(pe,
eeh_report_failure, NULL);
-   pcibios_remove_pci_devices(bus);
+   pci_remove_pci_devices(bus);
}
pci_unlock_rescan_remove();
}
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 59c4361..78bf2a1 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -38,20 +38,20 @@ void pcibios_release_device(struct pci_dev *dev)
 }
 
 /**
- * pcibios_remove_pci_devices - remove all devices under this bus
+ * pci_remove_pci_devices - remove all devices under this bus
  * @bus: the indicated PCI bus
  *
  * Remove all of the PCI devices under this bus both from the
  * linux pci device tree, and from the powerpc EEH address cache.
  */
-void pcibios_remove_pci_devices(struct pci_bus *bus)
+void pci_remove_pci_devices(struct pci_bus *bus)
 {
struct pci_dev *dev, *tmp;
struct pci_bus *child_bus;
 
/* First go down child busses */
list_for_each_entry(child_bus, &bus->children, node)
-

[PATCH v8 04/45] powerpc/powernv: Cleanup on pci_controller_ops instances

2016-02-16 Thread Gavin Shan
This cleans up the data struct instances below to use tabs instead of
spaces for indentation, avoiding complaints from scripts/checkpatch.pl.
No logical changes introduced.

  @pnv_pci_ioda_controller_ops
  @pnv_npu_ioda_controller_ops

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 36 +++
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c5baaf3..524c9c7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3210,31 +3210,31 @@ static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 }
 
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
-   .dma_dev_setup = pnv_pci_dma_dev_setup,
-   .dma_bus_setup = pnv_pci_dma_bus_setup,
+   .dma_dev_setup  = pnv_pci_dma_dev_setup,
+   .dma_bus_setup  = pnv_pci_dma_bus_setup,
 #ifdef CONFIG_PCI_MSI
-   .setup_msi_irqs = pnv_setup_msi_irqs,
-   .teardown_msi_irqs = pnv_teardown_msi_irqs,
+   .setup_msi_irqs = pnv_setup_msi_irqs,
+   .teardown_msi_irqs  = pnv_teardown_msi_irqs,
 #endif
-   .enable_device_hook = pnv_pci_enable_device_hook,
-   .window_alignment = pnv_pci_window_alignment,
-   .reset_secondary_bus = pnv_pci_reset_secondary_bus,
-   .dma_set_mask = pnv_pci_ioda_dma_set_mask,
-   .dma_get_required_mask = pnv_pci_ioda_dma_get_required_mask,
-   .shutdown = pnv_pci_ioda_shutdown,
+   .enable_device_hook = pnv_pci_enable_device_hook,
+   .window_alignment   = pnv_pci_window_alignment,
+   .reset_secondary_bus= pnv_pci_reset_secondary_bus,
+   .dma_set_mask   = pnv_pci_ioda_dma_set_mask,
+   .dma_get_required_mask  = pnv_pci_ioda_dma_get_required_mask,
+   .shutdown   = pnv_pci_ioda_shutdown,
 };
 
 static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
-   .dma_dev_setup = pnv_pci_dma_dev_setup,
+   .dma_dev_setup  = pnv_pci_dma_dev_setup,
 #ifdef CONFIG_PCI_MSI
-   .setup_msi_irqs = pnv_setup_msi_irqs,
-   .teardown_msi_irqs = pnv_teardown_msi_irqs,
+   .setup_msi_irqs = pnv_setup_msi_irqs,
+   .teardown_msi_irqs  = pnv_teardown_msi_irqs,
 #endif
-   .enable_device_hook = pnv_pci_enable_device_hook,
-   .window_alignment = pnv_pci_window_alignment,
-   .reset_secondary_bus = pnv_pci_reset_secondary_bus,
-   .dma_set_mask = pnv_npu_dma_set_mask,
-   .shutdown = pnv_pci_ioda_shutdown,
+   .enable_device_hook = pnv_pci_enable_device_hook,
+   .window_alignment   = pnv_pci_window_alignment,
+   .reset_secondary_bus= pnv_pci_reset_secondary_bus,
+   .dma_set_mask   = pnv_npu_dma_set_mask,
+   .shutdown   = pnv_pci_ioda_shutdown,
 };
 
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
-- 
2.1.0


[PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order

2016-02-16 Thread Gavin Shan
The PE number for a particular PE can be allocated dynamically or
reserved according to the M64 (64-bit prefetchable) segments that
the PE consumes. The M64 resources, and hence their segments and
PE numbers, are assigned/reserved in ascending order. PE numbers
are allocated dynamically in ascending order as well. That is not
a problem today, as the PE numbers are reserved first and then
allocated, all at once and in order. However, it will introduce
conflicts once PCI hotplug is supported: the PE number to be
reserved for a newly added PE might already have been allocated.

To resolve these conflicts, this forces PE numbers to be allocated
dynamically in reverse order. With this patch applied, PE numbers
are reserved in ascending order, but allocated dynamically in
descending order.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f182ca7..565725b 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -144,16 +144,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int 
pe_no)
 
 static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
-   unsigned long pe;
+   long pe;
 
-   do {
-   pe = find_next_zero_bit(phb->ioda.pe_alloc,
-   phb->ioda.total_pe_num, 0);
-   if (pe >= phb->ioda.total_pe_num)
-   return NULL;
-   } while(test_and_set_bit(pe, phb->ioda.pe_alloc));
+   for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
+   if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
+   return pnv_ioda_init_pe(phb, pe);
+   }
 
-   return pnv_ioda_init_pe(phb, pe);
+   return NULL;
 }
 
 static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
-- 
2.1.0
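
The reverse-order allocation described in this patch can be sketched in
standalone C as follows. This is a minimal illustration under stated
assumptions, not the kernel code: pe_alloc, TOTAL_PE_NUM, test_and_set(),
reserve_pe() and alloc_pe() are hypothetical stand-ins for
phb->ioda.pe_alloc, phb->ioda.total_pe_num, test_and_set_bit(),
pnv_ioda_reserve_pe() and pnv_ioda_alloc_pe(). The sketch uses a signed
loop index, since a "pe >= 0" test can never fail for an unsigned one.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOTAL_PE_NUM  8	/* hypothetical PHB with 8 PEs */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for phb->ioda.pe_alloc: one bit per PE number. */
static unsigned long pe_alloc[(TOTAL_PE_NUM + BITS_PER_WORD - 1) / BITS_PER_WORD];

/* Non-atomic stand-in for the kernel's test_and_set_bit(). */
static bool test_and_set(unsigned long *map, unsigned long nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Reserve one specific PE number; returns false if it was already taken. */
static bool reserve_pe(unsigned long pe_no)
{
	return !test_and_set(pe_alloc, pe_no);
}

/* Allocate dynamically from the top; returns -1 when every PE is taken. */
static long alloc_pe(void)
{
	long pe;	/* signed, so that "pe >= 0" can become false */

	for (pe = TOTAL_PE_NUM - 1; pe >= 0; pe--) {
		if (!test_and_set(pe_alloc, (unsigned long)pe))
			return pe;
	}

	return -1;
}
```

With reservations growing up from PE#0 and dynamic allocations growing
down from the last PE, the two ranges only meet when the bitmap is
nearly full.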


[PATCH v8 25/45] powerpc/pci: Rename pcibios_find_pci_bus()

2016-02-16 Thread Gavin Shan
This renames pcibios_find_pci_bus() to pci_find_bus_by_node() to
avoid conflicts with the PCI subsystem's weak functions, whose names
carry the "pcibios" prefix. No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h  | 2 +-
 arch/powerpc/platforms/pseries/pci_dlpar.c | 5 ++---
 drivers/pci/hotplug/rpadlpar_core.c| 6 +++---
 drivers/pci/hotplug/rpaphp_pci.c   | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index c817f38..03f4ee7 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -260,7 +260,7 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct pci_dn 
*pdn)
 #endif
 
 /** Find the bus corresponding to the indicated device node */
-extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
+extern struct pci_bus *pci_find_bus_by_node(struct device_node *dn);
 
 /** Remove all of the PCI devices under this bus */
 extern void pci_remove_pci_devices(struct pci_bus *bus);
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c 
b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 5d4a3df..aee22b4 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -54,8 +54,7 @@ find_bus_among_children(struct pci_bus *bus,
return child;
 }
 
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
 {
struct pci_dn *pdn = dn->data;
 
@@ -64,7 +63,7 @@ pcibios_find_pci_bus(struct device_node *dn)
 
return find_bus_among_children(pdn->phb->bus, dn);
 }
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
 
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
diff --git a/drivers/pci/hotplug/rpadlpar_core.c 
b/drivers/pci/hotplug/rpadlpar_core.c
index 730982b..acbf041 100644
--- a/drivers/pci/hotplug/rpadlpar_core.c
+++ b/drivers/pci/hotplug/rpadlpar_core.c
@@ -175,7 +175,7 @@ static int dlpar_add_pci_slot(char *drc_name, struct 
device_node *dn)
struct pci_dev *dev;
struct pci_controller *phb;
 
-   if (pcibios_find_pci_bus(dn))
+   if (pci_find_bus_by_node(dn))
return -EINVAL;
 
/* Add pci bus */
@@ -212,7 +212,7 @@ static int dlpar_remove_phb(char *drc_name, struct 
device_node *dn)
struct pci_dn *pdn;
int rc = 0;
 
-   if (!pcibios_find_pci_bus(dn))
+   if (!pci_find_bus_by_node(dn))
return -EINVAL;
 
/* If pci slot is hotpluggable, use hotplug to remove it */
@@ -356,7 +356,7 @@ int dlpar_remove_pci_slot(char *drc_name, struct 
device_node *dn)
 
pci_lock_rescan_remove();
 
-   bus = pcibios_find_pci_bus(dn);
+   bus = pci_find_bus_by_node(dn);
if (!bus) {
ret = -EINVAL;
goto out;
diff --git a/drivers/pci/hotplug/rpaphp_pci.c b/drivers/pci/hotplug/rpaphp_pci.c
index 1099b38..a9180bb 100644
--- a/drivers/pci/hotplug/rpaphp_pci.c
+++ b/drivers/pci/hotplug/rpaphp_pci.c
@@ -93,7 +93,7 @@ int rpaphp_enable_slot(struct slot *slot)
if (rc)
return rc;
 
-   bus = pcibios_find_pci_bus(slot->dn);
+   bus = pci_find_bus_by_node(slot->dn);
if (!bus) {
err("%s: no pci_bus for dn %s\n", __func__, 
slot->dn->full_name);
return -EINVAL;
-- 
2.1.0


[PATCH v8 42/45] drivers/of: Rename unflatten_dt_node()

2016-02-16 Thread Gavin Shan
This renames unflatten_dt_node() to unflatten_dt_nodes(), as it
populates multiple device nodes from the FDT blob. No logical
changes introduced.

Signed-off-by: Gavin Shan 
---
 drivers/of/fdt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 667a5b2..3fc9a30 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -381,7 +381,7 @@ static void reverse_nodes(struct device_node *parent)
 }
 
 /**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
+ * unflatten_dt_nodes - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
  * @dad: Parent struct device_node
@@ -389,10 +389,10 @@ static void reverse_nodes(struct device_node *parent)
  *
  * It returns the size of unflattened device tree or error code
  */
-static int unflatten_dt_node(const void *blob,
-void *mem,
-struct device_node *dad,
-struct device_node **nodepp)
+static int unflatten_dt_nodes(const void *blob,
+ void *mem,
+ struct device_node *dad,
+ struct device_node **nodepp)
 {
struct device_node *root;
int offset = 0, depth = 0;
@@ -479,7 +479,7 @@ static void __unflatten_device_tree(const void *blob,
}
 
/* First pass, scan for size */
-   size = unflatten_dt_node(blob, NULL, NULL, NULL);
+   size = unflatten_dt_nodes(blob, NULL, NULL, NULL);
if (size < 0)
return;
 
@@ -495,7 +495,7 @@ static void __unflatten_device_tree(const void *blob,
pr_debug("  unflattening %p...\n", mem);
 
/* Second pass, do actual unflattening */
-   unflatten_dt_node(blob, mem, NULL, mynodes);
+   unflatten_dt_nodes(blob, mem, NULL, mynodes);
if (be32_to_cpup(mem + size) != 0xdeadbeef)
pr_warning("End of tree marker overwritten: %08x\n",
   be32_to_cpup(mem + size));
-- 
2.1.0


[PATCH v8 31/45] powerpc/pci: Don't scan empty slot

2016-02-16 Thread Gavin Shan
In the hotplug case, pci_add_pci_devices() is called to rescan the
specified PCI bus, which might not have any child devices. Accessing
the bus's child device node will then crash the kernel.

This adds a check to skip scanning a PCI bus that has no subordinate
devices in the device tree, in order to avoid the kernel crash.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 7929a1c..3628c38 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -120,7 +120,8 @@ void pci_add_pci_devices(struct pci_bus *bus)
if (mode == PCI_PROBE_DEVTREE) {
/* use ofdt-based probe */
of_rescan_bus(dn, bus);
-   } else if (mode == PCI_PROBE_NORMAL) {
+   } else if (mode == PCI_PROBE_NORMAL &&
+  dn->child && PCI_DN(dn->child)) {
/*
 * Use legacy probe. In the partial hotplug case, we
 * probably have grandchildren devices unplugged. So
-- 
2.1.0


[PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption

2016-02-16 Thread Gavin Shan
When PCI devices are unplugged, their parent PEs might go offline.
The M64 resources consumed by those PEs should be released at that
time. As is already done for M32 segment consumption, this adds an
array to the PHB to track the mapping between M64 segments and
PE numbers.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 10 --
 arch/powerpc/platforms/powernv/pci.h  |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7330a73..fc0374a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
phb->ioda.total_pe_num) {
pe = &phb->ioda.pe_array[i];
 
+   phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
if (!master_pe) {
pe->flags |= PNV_IODA_PE_MASTER;
INIT_LIST_HEAD(&pe->slaves);
@@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 {
struct pci_controller *hose;
struct pnv_phb *phb;
-   unsigned long size, m32map_off, pemap_off, iomap_off = 0;
+   unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
const __be64 *prop64;
const __be32 *prop32;
int i, len;
@@ -3332,6 +,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 
/* Allocate aux data & arrays. We don't have IO ports on PHB3 */
size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
+   m64map_off = size;
+   size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
m32map_off = size;
size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
if (phb->type == PNV_PHB_IODA1) {
@@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
aux = memblock_virt_alloc(size, 0);
phb->ioda.pe_alloc = aux;
+   phb->ioda.m64_segmap = aux + m64map_off;
phb->ioda.m32_segmap = aux + m32map_off;
-   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   for (i = 0; i < phb->ioda.total_pe_num; i++) {
+   phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+   }
if (phb->type == PNV_PHB_IODA1) {
phb->ioda.io_segmap = aux + iomap_off;
for (i = 0; i < phb->ioda.total_pe_num; i++)
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 36c4965..866a5ea 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,6 +146,7 @@ struct pnv_phb {
struct pnv_ioda_pe  *pe_array;
 
/* M32 & IO segment maps */
+   int *m64_segmap;
int *m32_segmap;
int *io_segmap;
 
-- 
2.1.0
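
The layout used in this patch, where the PE bitmap and the segment maps
share one auxiliary allocation and every map entry starts out invalid,
can be sketched in standalone C as follows. This is a minimal sketch
under stated assumptions: struct segmaps, INVALID_PE, align_up(),
segmaps_init() and release_m64_segments() are hypothetical names, calloc()
stands in for memblock_virt_alloc(), and the release helper only
illustrates what the new map makes possible; it is not part of the patch.

```c
#include <stddef.h>
#include <stdlib.h>

#define INVALID_PE (-1)	/* stand-in for IODA_INVALID_PE */

struct segmaps {
	void *aux;			/* single backing allocation */
	unsigned long *pe_alloc;	/* PE bitmap at offset 0 */
	int *m64_segmap;		/* M64 segment -> PE number */
	int *m32_segmap;		/* M32 segment -> PE number */
	unsigned int total_pe_num;
};

static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) / a * a;
}

/* Carve the bitmap and both maps out of one zeroed chunk. */
static int segmaps_init(struct segmaps *s, unsigned int total_pe_num)
{
	size_t size, m64map_off, m32map_off;
	unsigned int i;

	size = align_up((total_pe_num + 7) / 8, sizeof(unsigned long));
	m64map_off = size;
	size += total_pe_num * sizeof(s->m64_segmap[0]);
	m32map_off = size;
	size += total_pe_num * sizeof(s->m32_segmap[0]);

	s->aux = calloc(1, size);
	if (!s->aux)
		return -1;

	s->pe_alloc = s->aux;
	s->m64_segmap = (int *)((char *)s->aux + m64map_off);
	s->m32_segmap = (int *)((char *)s->aux + m32map_off);
	s->total_pe_num = total_pe_num;

	for (i = 0; i < total_pe_num; i++) {
		s->m64_segmap[i] = INVALID_PE;
		s->m32_segmap[i] = INVALID_PE;
	}

	return 0;
}

/* Give back every M64 segment owned by @pe_number; returns the count. */
static unsigned int release_m64_segments(struct segmaps *s, int pe_number)
{
	unsigned int i, freed = 0;

	for (i = 0; i < s->total_pe_num; i++) {
		if (s->m64_segmap[i] == pe_number) {
			s->m64_segmap[i] = INVALID_PE;
			freed++;
		}
	}

	return freed;
}
```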


[PATCH v8 05/45] powerpc/powernv: Drop phb->bdfn_to_pe()

2016-02-16 Thread Gavin Shan
This drops struct pnv_phb::bdfn_to_pe() as nobody uses it.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 -
 arch/powerpc/platforms/powernv/pci.h  | 1 -
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 524c9c7..10ecd97 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3195,12 +3195,6 @@ static bool pnv_pci_enable_device_hook(struct pci_dev 
*dev)
return true;
 }
 
-static u32 pnv_ioda_bdfn_to_pe(struct pnv_phb *phb, struct pci_bus *bus,
-  u32 devfn)
-{
-   return phb->ioda.pe_rmap[(bus->number << 8) | devfn];
-}
-
 static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
 {
struct pnv_phb *phb = hose->private_data;
@@ -3377,9 +3371,6 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
phb->freeze_pe = pnv_ioda_freeze_pe;
phb->unfreeze_pe = pnv_ioda_unfreeze_pe;
 
-   /* Setup RID -> PE mapping function */
-   phb->bdfn_to_pe = pnv_ioda_bdfn_to_pe;
-
/* Setup TCEs */
phb->dma_dev_setup = pnv_pci_ioda_dma_dev_setup;
 
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 3f814f3..78f035e 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -110,7 +110,6 @@ struct pnv_phb {
 unsigned int is_64, struct msi_msg *msg);
void (*dma_dev_setup)(struct pnv_phb *phb, struct pci_dev *pdev);
void (*fixup_phb)(struct pci_controller *hose);
-   u32 (*bdfn_to_pe)(struct pnv_phb *phb, struct pci_bus *bus, u32 devfn);
int (*init_m64)(struct pnv_phb *phb);
void (*reserve_m64_pe)(struct pci_bus *bus,
   unsigned long *pe_bitmap, bool all);
-- 
2.1.0


[PATCH v8 35/45] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()

2016-02-16 Thread Gavin Shan
In pnv_pci_reset_secondary_bus(), we should issue a fundamental reset
if any subordinate device of the specified bus requests one.
Otherwise, the device might not come up after the reset.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 593b8dc..c7454ba 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -866,9 +866,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
option)
return 0;
 }
 
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
+{
+   int *freset = data;
+
+   /*
+* Stop the iteration immediately if there has any one
+* PCI device requesting fundamental reset.
+*/
+   *freset |= pdev->needs_freset;
+   return *freset;
+}
+
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+   int option, freset = 0;
+
+   if (dev->subordinate)
+   pci_walk_bus(dev->subordinate,
+pnv_pci_dev_reset_type, &freset);
+
+   option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
+   pnv_eeh_bridge_reset(dev, option);
pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
-- 
2.1.0
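
The decision made in this patch, walking the subordinate bus and
stopping at the first device that asks for a fundamental reset, can be
sketched in standalone C as follows. This is a minimal sketch under
stated assumptions: struct dev, walk_bus() and pick_reset_type() are
hypothetical stand-ins for struct pci_dev, pci_walk_bus() and the option
selection in pnv_pci_reset_secondary_bus(), and the enum values stand in
for EEH_RESET_HOT and EEH_RESET_FUNDAMENTAL.

```c
#include <stddef.h>

enum reset_type { RESET_HOT, RESET_FUNDAMENTAL };

struct dev {			/* much reduced stand-in for struct pci_dev */
	int needs_freset;
	struct dev *child;	/* first device on the subordinate bus */
	struct dev *sibling;	/* next device on the same bus */
};

/* Callback: a non-zero return value stops the walk. */
static int dev_reset_type(struct dev *d, void *data)
{
	int *freset = data;

	*freset |= d->needs_freset;
	return *freset;
}

/* Depth-first walk over a bus and every bus below it. */
static int walk_bus(struct dev *first,
		    int (*cb)(struct dev *, void *), void *data)
{
	struct dev *d;

	for (d = first; d; d = d->sibling) {
		if (cb(d, data))
			return 1;
		if (d->child && walk_bus(d->child, cb, data))
			return 1;
	}

	return 0;
}

/* Only devices below the bridge influence the choice. */
static enum reset_type pick_reset_type(struct dev *bridge)
{
	int freset = 0;

	if (bridge->child)
		walk_bus(bridge->child, dev_reset_type, &freset);

	return freset ? RESET_FUNDAMENTAL : RESET_HOT;
}
```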


[PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status

2016-02-16 Thread Gavin Shan
This exports 4 functions, based on the corresponding OPAL APIs, to
get/set PCI slot status. These functions are going to be used by
the PowerNV PCI hotplug driver:

   pnv_pci_get_device_tree()      opal_get_device_tree()
   pnv_pci_get_presence_state()   opal_pci_get_presence_state()
   pnv_pci_get_power_state()      opal_pci_get_power_state()
   pnv_pci_set_power_state()      opal_pci_set_power_state()

Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
unregister}() to allow registration and unregistration of a PCI
hotplug notifier, which will be used to receive PCI hotplug messages
from skiboot firmware in the PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/opal-api.h| 17 ++-
 arch/powerpc/include/asm/opal.h|  4 ++
 arch/powerpc/include/asm/pnv-pci.h |  7 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
 arch/powerpc/platforms/powernv/pci.c   | 66 ++
 5 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index f8faaae..a6af338 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -158,7 +158,11 @@
 #define OPAL_LEDS_SET_INDICATOR115
 #define OPAL_CEC_REBOOT2   116
 #define OPAL_CONSOLE_FLUSH 117
-#define OPAL_LAST  117
+#define OPAL_GET_DEVICE_TREE   118
+#define OPAL_PCI_GET_PRESENCE_STATE119
+#define OPAL_PCI_GET_POWER_STATE   120
+#define OPAL_PCI_SET_POWER_STATE   121
+#define OPAL_LAST  121
 
 /* Device tree flags */
 
@@ -344,6 +348,16 @@ enum OpalPciResetState {
OPAL_ASSERT_RESET   = 1
 };
 
+enum OpalPciSlotPresentenceState {
+   OPAL_PCI_SLOT_EMPTY = 0,
+   OPAL_PCI_SLOT_PRESENT   = 1
+};
+
+enum OpalPciSlotPowerState {
+   OPAL_PCI_SLOT_POWER_OFF = 0,
+   OPAL_PCI_SLOT_POWER_ON  = 1
+};
+
 enum OpalSlotLedType {
OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
@@ -378,6 +392,7 @@ enum opal_msg_type {
OPAL_MSG_DPO,
OPAL_MSG_PRD,
OPAL_MSG_OCC,
+   OPAL_MSG_PCI_HOTPLUG,
OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9e0039f..899bcb941 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, 
uint64_t buf,
uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h 
b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..d9d095b 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
 #include 
 #include 
 
+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..3ea1a855 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg, 
OPAL_PRD_MSG);
 OPAL_CALL(opal_leds_get_ind,   OPAL_LEDS_GET_INDICATOR);
 OPAL_CALL(opal_leds_set_ind,   OPAL_LEDS_SET_INDICATOR);
 OPAL_CALL(opal_console_flush,  OPAL_CONSOLE_FLUSH);
+OPAL_CALL(opal_get_device_tree,OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);
+OPAL_CALL(opal_pci_get_power_state,OPAL_PCI_GET_POWER_STATE);
+OPAL_CALL(opal_pci_set_power_state,OPAL_PCI_SET_POWER_STATE);
diff --git a/arch/powerpc/platforms/powernv/pci.c 

[PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track

2016-02-16 Thread Gavin Shan
In the current implementation, the DMA32 segments required by a
specific PE aren't calculated independently from the information held
in the PE. That conflicts with the PE-centric PCI hotplug design, in
which a PE's DMA32 segments should be calculated from the information
held in that PE alone.

This introduces an array (@dma32_segmap) for every PHB to track
DMA32 segment usage. It also moves the logic that calculates a PE's
consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe(), so that the
segments are calculated/allocated from the information held in the
PE (its DMA32 weight). The logic is improved as well: we try to
allocate as many DMA32 segments as we can, and it is acceptable to end
up with fewer DMA32 segments than the expected number.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 111 +-
 arch/powerpc/platforms/powernv/pci.h  |   7 +-
 2 files changed, 66 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0fc2309..59782fba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2007,20 +2007,54 @@ static unsigned int 
pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
 }
 
 static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
-  struct pnv_ioda_pe *pe,
-  unsigned int base,
-  unsigned int segs)
+  struct pnv_ioda_pe *pe)
 {
 
struct page *tce_mem = NULL;
struct iommu_table *tbl;
-   unsigned int tce32_segsz, i;
+   unsigned int weight, total_weight;
+   unsigned int tce32_segsz, base, segs, i;
int64_t rc;
void *addr;
 
/* XXX FIXME: Handle 64-bit only DMA devices */
/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
/* XXX FIXME: Allocate multi-level tables on PHB3 */
+   total_weight = pnv_pci_ioda_total_dma_weight(phb);
+   weight = pnv_pci_ioda_pe_dma_weight(pe);
+
+   segs = (weight * phb->ioda.dma32_count) / total_weight;
+   if (!segs)
+   segs = 1;
+
+   /*
+* Allocate contiguous DMA32 segments. We begin with the expected
+* number of segments. With one more attempt, the number of DMA32
+* segments to be allocated is decreased by one until one segment
+* is allocated successfully.
+*/
+   while (segs) {
+   for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
+   for (i = base; i < base + segs; i++) {
+   if (phb->ioda.dma32_segmap[i] !=
+   IODA_INVALID_PE)
+   break;
+   }
+
+   if (i >= base + segs)
+   break;
+   }
+
+   if (i >= base + segs)
+   break;
+
+   segs--;
+   }
+
+   if (!segs) {
+   pe_warn(pe, "No available DMA32 segments\n");
+   return;
+   }
 
tbl = pnv_pci_table_alloc(phb->hose->node);
iommu_register_group(&pe->table_group, phb->hose->global_number,
@@ -2028,6 +2062,8 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,
pnv_pci_link_table_and_group(phb->hose->node, 0, tbl, &pe->table_group);
 
/* Grab a 32-bit TCE table */
+   pe_info(pe, "DMA weight %d (%d), assigned (%d) %d DMA32 segments\n",
+   weight, total_weight, base, segs);
pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
base * PNV_IODA1_DMA32_SEGSIZE,
(base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
@@ -2064,6 +2100,10 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb 
*phb,
}
}
 
+   /* Setup DMA32 segment mapping */
+   for (i = base; i < base + segs; i++)
+   phb->ioda.dma32_segmap[i] = pe->pe_number;
+
/* Setup linux iommu table */
pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
  base * PNV_IODA1_DMA32_SEGSIZE,
@@ -2538,70 +2578,34 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
 {
struct pci_controller *hose = phb->hose;
-   unsigned int weight, total_weight, dma_pe_count;
-   unsigned int residual, remaining, segs, base;
struct pnv_ioda_pe *pe;
-
-   total_weight = pnv_pci_ioda_total_dma_weight(phb);
-   dma_pe_count = 0;
-   list_for_each_entry(pe, &phb->ioda.pe_list, list) {
-   weight = pnv_pci_ioda_pe_dma_weight(pe);
-   if (weight > 0)
-   dma_pe_count++;
-   }
+   unsigned int weight;
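
The segment search described in this patch, which starts from the share
implied by the PE's DMA weight and shrinks the request by one segment
until a contiguous free run is found, can be sketched in standalone C
as follows. This is a minimal sketch under stated assumptions:
DMA32_COUNT, INVALID_PE, dma32_init() and dma32_alloc() are hypothetical
names, the map is a plain global array rather than
phb->ioda.dma32_segmap, and the run is claimed immediately instead of
after the TCE table has been set up.

```c
#include <stddef.h>

#define DMA32_COUNT 8	/* hypothetical number of DMA32 segments */
#define INVALID_PE  (-1)	/* stand-in for IODA_INVALID_PE */

static int dma32_segmap[DMA32_COUNT];

static void dma32_init(void)
{
	unsigned int i;

	for (i = 0; i < DMA32_COUNT; i++)
		dma32_segmap[i] = INVALID_PE;
}

/*
 * Claim a contiguous run of segments for @pe_number. Returns the number
 * of segments claimed (0 if none are free) and stores the first segment
 * index in @base_out.
 */
static unsigned int dma32_alloc(int pe_number, unsigned int weight,
				unsigned int total_weight,
				unsigned int *base_out)
{
	unsigned int segs, base, i;

	if (!total_weight)
		return 0;

	/* Expected share, at least one segment, at most all of them. */
	segs = (weight * DMA32_COUNT) / total_weight;
	if (!segs)
		segs = 1;
	if (segs > DMA32_COUNT)
		segs = DMA32_COUNT;

	for (; segs; segs--) {
		for (base = 0; base + segs <= DMA32_COUNT; base++) {
			for (i = base; i < base + segs; i++) {
				if (dma32_segmap[i] != INVALID_PE)
					break;
			}
			if (i < base + segs)
				continue;	/* run is interrupted */

			for (i = base; i < base + segs; i++)
				dma32_segmap[i] = pe_number;
			*base_out = base;
			return segs;
		}
	}

	return 0;
}
```

A PE that would ideally get four segments still gets two when only two
adjacent ones remain, which matches the "fewer than expected is
acceptable" rule in the description.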
 

[PATCH v8 44/45] drivers/of: Return allocated memory from of_fdt_unflatten_tree()

2016-02-16 Thread Gavin Shan
This returns the allocated memory chunk that stores the unflattened
device tree from of_fdt_unflatten_tree(), so that the chunk can be
released on demand in the PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan 
Acked-by: Rob Herring 
---
 drivers/of/fdt.c   | 33 ++---
 include/linux/of_fdt.h |  6 +++---
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 16a1ba5..47ec278 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -454,11 +454,14 @@ static int unflatten_dt_nodes(const void *blob,
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-static void __unflatten_device_tree(const void *blob,
-struct device_node *dad,
-struct device_node **mynodes,
-void * (*dt_alloc)(u64 size, u64 align))
+static void *__unflatten_device_tree(const void *blob,
+struct device_node *dad,
+struct device_node **mynodes,
+void *(*dt_alloc)(u64 size, u64 align))
 {
int size;
void *mem;
@@ -467,7 +470,7 @@ static void __unflatten_device_tree(const void *blob,
 
if (!blob) {
pr_debug("No device tree pointer\n");
-   return;
+   return NULL;
}
 
pr_debug("Unflattening device tree:\n");
@@ -477,13 +480,13 @@ static void __unflatten_device_tree(const void *blob,
 
if (fdt_check_header(blob)) {
pr_err("Invalid device tree blob header\n");
-   return;
+   return NULL;
}
 
/* First pass, scan for size */
size = unflatten_dt_nodes(blob, NULL, dad, NULL);
if (size < 0)
-   return;
+   return NULL;
 
size = ALIGN(size, 4);
pr_debug("  size is %d, allocating...\n", size);
@@ -503,6 +506,7 @@ static void __unflatten_device_tree(const void *blob,
   be32_to_cpup(mem + size));
 
pr_debug(" <- unflatten_device_tree()\n");
+   return mem;
 }
 
 static void *kernel_tree_alloc(u64 size, u64 align)
@@ -522,14 +526,21 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
  * tree of struct device_node. It also fills the "name" and "type"
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
+ *
+ * Returns NULL on failure or the memory chunk containing the unflattened
+ * device tree on success.
  */
-void of_fdt_unflatten_tree(const unsigned long *blob,
-   struct device_node *dad,
-   struct device_node **mynodes)
+void *of_fdt_unflatten_tree(const unsigned long *blob,
+   struct device_node *dad,
+   struct device_node **mynodes)
 {
+   void *mem;
+
mutex_lock(&of_fdt_unflatten_mutex);
-   __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
+   mem = __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
mutex_unlock(&of_fdt_unflatten_mutex);
+
+   return mem;
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
 
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 3644960..b87b26a7 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,9 +37,9 @@ extern bool of_fdt_is_big_endian(const void *blob,
 unsigned long node);
 extern int of_fdt_match(const void *blob, unsigned long node,
const char *const *compat);
-extern void of_fdt_unflatten_tree(const unsigned long *blob,
-  struct device_node *dad,
-  struct device_node **mynodes);
+extern void *of_fdt_unflatten_tree(const unsigned long *blob,
+  struct device_node *dad,
+  struct device_node **mynodes);
 
 /* TBD: Temporary export of fdt globals - remove when code fully merged */
 extern int __initdata dt_root_addr_cells;
-- 
2.1.0
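
The ownership pattern introduced by this patch, where the tree root is
handed back through an output parameter and the backing memory through
the return value so that the caller can free it later, can be sketched
in standalone C as follows. This is a minimal sketch under stated
assumptions: struct node and unflatten() are hypothetical stand-ins for
struct device_node and of_fdt_unflatten_tree(), and the "blob" is just a
string used as the root name.

```c
#include <stddef.h>
#include <stdlib.h>

struct node {			/* much reduced stand-in for struct device_node */
	const char *name;
	struct node *child;
};

/*
 * Build a two-node tree inside one allocation. The root is stored in
 * @root; the allocation itself is returned, or NULL on failure.
 */
static void *unflatten(const char *blob, struct node **root)
{
	struct node *nodes;

	*root = NULL;
	if (!blob)
		return NULL;

	nodes = calloc(2, sizeof(*nodes));
	if (!nodes)
		return NULL;

	nodes[0].name = blob;
	nodes[0].child = &nodes[1];
	nodes[1].name = "child";
	*root = &nodes[0];

	return nodes;
}
```

A caller keeps the returned pointer next to the root and passes it to
free() once the nodes are no longer referenced.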


[PATCH v8 43/45] drivers/of: Specify parent node in of_fdt_unflatten_tree()

2016-02-16 Thread Gavin Shan
This adds one more argument to of_fdt_unflatten_tree() to specify
the parent node of the FDT blob that is going to be unflattened.
As a result, the function can be used to unflatten an FDT blob that
represents a device sub-tree in the PowerNV PCI hotplug driver.

Cc: Jyri Sarha 
Signed-off-by: Gavin Shan 
---
 drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c |  2 +-
 drivers/of/fdt.c | 14 ++
 drivers/of/unittest.c|  2 +-
 include/linux/of_fdt.h   |  1 +
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c 
b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
index 106679b..f9c79da 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_slave_compat.c
@@ -157,7 +157,7 @@ struct device_node * __init tilcdc_get_overlay(struct 
kfree_table *kft)
if (!overlay_data || kfree_table_add(kft, overlay_data))
return NULL;
 
-   of_fdt_unflatten_tree(overlay_data, &overlay);
+   of_fdt_unflatten_tree(overlay_data, NULL, &overlay);
if (!overlay) {
pr_warn("%s: Unfattening overlay tree failed\n", __func__);
return NULL;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3fc9a30..16a1ba5 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -450,11 +450,13 @@ static int unflatten_dt_nodes(const void *blob,
  * pointers of the nodes so the normal device-tree walking functions
  * can be used.
  * @blob: The blob to expand
+ * @dad: Parent device node
  * @mynodes: The device_node tree created by the call
  * @dt_alloc: An allocator that provides a virtual address to memory
  * for the resulting tree
  */
 static void __unflatten_device_tree(const void *blob,
+struct device_node *dad,
 struct device_node **mynodes,
 void * (*dt_alloc)(u64 size, u64 align))
 {
@@ -479,7 +481,7 @@ static void __unflatten_device_tree(const void *blob,
}
 
/* First pass, scan for size */
-   size = unflatten_dt_nodes(blob, NULL, NULL, NULL);
+   size = unflatten_dt_nodes(blob, NULL, dad, NULL);
if (size < 0)
return;
 
@@ -495,7 +497,7 @@ static void __unflatten_device_tree(const void *blob,
pr_debug("  unflattening %p...\n", mem);
 
/* Second pass, do actual unflattening */
-   unflatten_dt_nodes(blob, mem, NULL, mynodes);
+   unflatten_dt_nodes(blob, mem, dad, mynodes);
if (be32_to_cpup(mem + size) != 0xdeadbeef)
pr_warning("End of tree marker overwritten: %08x\n",
   be32_to_cpup(mem + size));
@@ -512,6 +514,9 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
 
 /**
  * of_fdt_unflatten_tree - create tree of device_nodes from flat blob
+ * @blob: Flat device tree blob
+ * @dad: Parent device node
+ * @mynodes: The device tree created by the call
  *
  * unflattens the device-tree passed by the firmware, creating the
  * tree of struct device_node. It also fills the "name" and "type"
@@ -519,10 +524,11 @@ static DEFINE_MUTEX(of_fdt_unflatten_mutex);
  * can be used.
  */
 void of_fdt_unflatten_tree(const unsigned long *blob,
+   struct device_node *dad,
struct device_node **mynodes)
 {
mutex_lock(&of_fdt_unflatten_mutex);
-   __unflatten_device_tree(blob, mynodes, &kernel_tree_alloc);
+   __unflatten_device_tree(blob, dad, mynodes, &kernel_tree_alloc);
mutex_unlock(&of_fdt_unflatten_mutex);
 }
 EXPORT_SYMBOL_GPL(of_fdt_unflatten_tree);
@@ -1180,7 +1186,7 @@ bool __init early_init_dt_scan(void *params)
  */
 void __init unflatten_device_tree(void)
 {
-   __unflatten_device_tree(initial_boot_params, &of_root,
+   __unflatten_device_tree(initial_boot_params, NULL, &of_root,
early_init_dt_alloc_memory_arch);
 
/* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 979b6e4..ec36f93 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -921,7 +921,7 @@ static int __init unittest_data_add(void)
"not running tests\n", __func__);
return -ENOMEM;
}
-   of_fdt_unflatten_tree(unittest_data, &unittest_data_node);
+   of_fdt_unflatten_tree(unittest_data, NULL, &unittest_data_node);
if (!unittest_data_node) {
pr_warn("%s: No tree to attach; not running tests\n", __func__);
return -ENODATA;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index df9ef38..3644960 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -38,6 +38,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
 extern int of_fdt_match(const void *blob, unsigned long node,
const char *const *compat);
 extern void 

[PATCH v8 12/45] powerpc/powernv: Rename M64 related functions

2016-02-16 Thread Gavin Shan
This renames the functions that pick a PE number based on the consumed
M64 segments and map M64 segments to PEs, as those functions are
going to be shared by IODA1 and IODA2 in the next patch. No logical
changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index fc0374a..1dc663a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -219,7 +219,7 @@ fail:
return -EIO;
 }
 
-static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev *pdev,
+static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev *pdev,
 unsigned long *pe_bitmap)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
@@ -246,22 +246,22 @@ static void pnv_ioda2_reserve_dev_m64_pe(struct pci_dev 
*pdev,
}
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pci_bus *bus,
-unsigned long *pe_bitmap,
-bool all)
+static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
+   unsigned long *pe_bitmap,
+   bool all)
 {
struct pci_dev *pdev;
 
list_for_each_entry(pdev, &bus->devices, bus_list) {
-   pnv_ioda2_reserve_dev_m64_pe(pdev, pe_bitmap);
+   pnv_ioda_reserve_dev_m64_pe(pdev, pe_bitmap);
 
if (all && pdev->subordinate)
-   pnv_ioda2_reserve_m64_pe(pdev->subordinate,
-pe_bitmap, all);
+   pnv_ioda_reserve_m64_pe(pdev->subordinate,
+   pe_bitmap, all);
}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool all)
+static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
struct pci_controller *hose = pci_bus_to_host(bus);
struct pnv_phb *phb = hose->private_data;
@@ -283,7 +283,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
}
 
/* Figure out reserved PE numbers by the PE */
-   pnv_ioda2_reserve_m64_pe(bus, pe_alloc, all);
+   pnv_ioda_reserve_m64_pe(bus, pe_alloc, all);
 
/*
 * the current bus might not own M64 window and that's all
@@ -365,8 +365,8 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb 
*phb)
/* Use last M64 BAR to cover M64 window */
phb->ioda.m64_bar_idx = 15;
phb->init_m64 = pnv_ioda2_init_m64;
-   phb->reserve_m64_pe = pnv_ioda2_reserve_m64_pe;
-   phb->pick_m64_pe = pnv_ioda2_pick_m64_pe;
+   phb->reserve_m64_pe = pnv_ioda_reserve_m64_pe;
+   phb->pick_m64_pe = pnv_ioda_pick_m64_pe;
 }
 
 static void pnv_ioda_freeze_pe(struct pnv_phb *phb, int pe_no)
-- 
2.1.0

[PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC

2016-02-16 Thread Gavin Shan
The device tree will change dynamically in the PowerNV PCI hotplug
driver. This selects CONFIG_OF_DYNAMIC to support that.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig 
b/arch/powerpc/platforms/powernv/Kconfig
index 604190c..e7b1ad7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,6 +18,7 @@ config PPC_POWERNV
select CPU_FREQ_GOV_ONDEMAND
select CPU_FREQ_GOV_CONSERVATIVE
select PPC_DOORBELL
+   select OF_DYNAMIC
default y
 
 config OPAL_PRD
-- 
2.1.0

[PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

2016-02-16 Thread Gavin Shan
This adds a standalone driver to support PCI hotplug on the PowerPC
PowerNV platform, which runs on top of skiboot firmware. The firmware
identifies hotpluggable slots and marks their device tree nodes with the
"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver
scans the device tree nodes to create and register PCI hotplug slots
accordingly.

The PCI slots are organized as a tree: a PCI slot might have a parent
PCI slot, and a parent PCI slot may contain multiple child PCI slots.
At plug time, the parent PCI slot is populated before its children.
The child PCI slots are removed before their parent PCI slot can be
removed from the system.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the "ibm,reset-by-firmware" property.
In that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't export the capability to access attention
LEDs yet; that is still to be done.

Signed-off-by: Gavin Shan 
Acked-by: Bjorn Helgaas 
---
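Note on the ordering rule in the description (parent populated before
children, children removed before parent): it amounts to a pre-order
walk on plug and a post-order walk on unplug. Below is a minimal
userspace sketch of just that ordering. struct slot, plug_order() and
unplug_order() are hypothetical names for illustration and are not
taken from pnv_php.c.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a hotplug slot; not the driver's struct. */
struct slot {
	int id;
	size_t nr_children;
	struct slot *children[MAX_CHILDREN];
};

/* Plug: record the parent first, then each child (pre-order). */
static size_t plug_order(const struct slot *s, int *out, size_t n)
{
	size_t i;

	out[n++] = s->id;
	for (i = 0; i < s->nr_children; i++)
		n = plug_order(s->children[i], out, n);
	return n;
}

/* Unplug: record every child first, then the parent (post-order). */
static size_t unplug_order(const struct slot *s, int *out, size_t n)
{
	size_t i;

	for (i = 0; i < s->nr_children; i++)
		n = unplug_order(s->children[i], out, n);
	out[n++] = s->id;
	return n;
}
```

Both helpers append slot ids to out[] starting at index n and return the
new count, so the caller can check the visiting order directly.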
 drivers/pci/hotplug/Kconfig   |  12 +
 drivers/pci/hotplug/Makefile  |   3 +
 drivers/pci/hotplug/pnv_php.c | 870 ++
 3 files changed, 885 insertions(+)
 create mode 100644 drivers/pci/hotplug/pnv_php.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..167c8ce 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+   tristate "PowerPC PowerNV PCI Hotplug driver"
+   depends on PPC_POWERNV && EEH
+   help
+ Say Y here if you run PowerPC PowerNV platform that supports
+ PCI Hotplug
+
+ To compile this driver as a module, choose M here: the
+ module will be called pnv-php.
+
+ When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
tristate "RPA PCI Hotplug driver"
depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..e33cdda 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)  += cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC) += cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC) += shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)  += pnv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)  += rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)  += sgi_hotplug.o
@@ -50,6 +51,8 @@ ibmphp-objs   :=  ibmphp_core.o   \
 acpiphp-objs   :=  acpiphp_core.o  \
acpiphp_glue.o
 
+pnv-php-objs   :=  pnv_php.o
+
 rpaphp-objs:=  rpaphp_core.o   \
rpaphp_pci.o\
rpaphp_slot.o
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
new file mode 100644
index 000..364ec36
--- /dev/null
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -0,0 +1,870 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR  "Gavin Shan, IBM Corporation"
+#define DRIVER_DESC"PowerPC PowerNV PCI Hotplug Driver"
+
+struct pnv_php_slot {
+   struct hotplug_slot slot;
+   struct hotplug_slot_infoslot_info;
+   uint64_tid;
+   char*name;
+   int slot_no;
+   struct kref kref;
+#define PNV_PHP_STATE_INITIALIZED  0
+#define PNV_PHP_STATE_REGISTERED   1
+#define PNV_PHP_STATE_POPULATED2
+   int state;
+   struct device_node  *dn;
+   struct pci_dev  *pdev;
+   struct pci_bus  *bus;
+   boolpower_state_check;
+   int power_state_confirmed;
+#define PNV_PHP_POWER_CONFIRMED_INVALID0
+#define PNV_PHP_POWER_CONFIRMED_SUCCESS1
+#define PNV_PHP_POWER_CONFIRMED_FAIL   2
+   struct opal_msg *msg;
+   void*fdt;
+   void*dt;
+   struct of_changeset ocs;
+   struct work_struct  work;
+   wait_queue_head_t

[PATCH v8 19/45] powerpc/powernv: Use PE instead of number during setup and release

2016-02-16 Thread Gavin Shan
In the current implementation, the PEs that are allocated or picked
from the reserved list are identified by PE number. The PE instance
has to be picked according to the PE number eventually. We have the
same issue when a PE is released.

pnv_ioda_pick_m64_pe() and pnv_ioda_alloc_pe() now return the PE
instance so that pnv_ioda_setup_bus_PE() can use the allocated
or reserved PE instance directly. Also, pnv_ioda_setup_bus_PE()
returns the reserved/allocated PE instance to be used in subsequent
patches. On the other hand, pnv_ioda_free_pe() takes the PE instance
(not the number) as its argument. No logical changes introduced.

Signed-off-by: Gavin Shan 
---
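Note on the "instance instead of number" change described above: here is
a minimal userspace sketch of the same shape, assuming a plain bool array
in place of the kernel bitmap helpers. struct pe, struct phb, init_pe(),
alloc_pe() and free_pe() are illustrative names only. One detail the
sketch is deliberate about: it reads pe_number before the memset(). In
the pnv_ioda_free_pe() hunk of this patch, clear_bit() appears to read
pe->pe_number after the structure has been zeroed, which may be worth a
second look.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOTAL_PE 8	/* arbitrary size for the sketch */

struct phb;

struct pe {
	struct phb *phb;
	int pe_number;
};

struct phb {
	bool pe_alloc[TOTAL_PE];	/* stand-in for the allocation bitmap */
	struct pe pe_array[TOTAL_PE];
};

/* Fill in the back-pointer and number, then hand out the instance. */
static struct pe *init_pe(struct phb *phb, int pe_no)
{
	phb->pe_array[pe_no].phb = phb;
	phb->pe_array[pe_no].pe_number = pe_no;
	return &phb->pe_array[pe_no];
}

/* Return the first free PE instance, or NULL when none is left. */
static struct pe *alloc_pe(struct phb *phb)
{
	int i;

	for (i = 0; i < TOTAL_PE; i++) {
		if (!phb->pe_alloc[i]) {
			phb->pe_alloc[i] = true;
			return init_pe(phb, i);
		}
	}
	return NULL;
}

/* Release by instance; the owner and number come from the PE itself. */
static void free_pe(struct pe *pe)
{
	struct phb *phb = pe->phb;
	int pe_no = pe->pe_number;	/* read before zeroing the PE */

	memset(pe, 0, sizeof(*pe));
	phb->pe_alloc[pe_no] = false;
}
```

Callers test the returned pointer against NULL rather than comparing a
number with an "invalid PE" constant, which is the caller-side change
the description refers to.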
 arch/powerpc/platforms/powernv/pci-ioda.c | 104 +-
 arch/powerpc/platforms/powernv/pci.h  |   2 +-
 2 files changed, 59 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7800897..f182ca7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -119,6 +119,14 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long 
flags)
(IORESOURCE_MEM_64 | IORESOURCE_PREFETCH));
 }
 
+static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int pe_no)
+{
+   phb->ioda.pe_array[pe_no].phb = phb;
+   phb->ioda.pe_array[pe_no].pe_number = pe_no;
+
+   return &phb->ioda.pe_array[pe_no];
+}
+
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
@@ -131,11 +139,10 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int 
pe_no)
pr_debug("%s: PE %d was reserved on PHB#%x\n",
 __func__, pe_no, phb->hose->global_number);
 
-   phb->ioda.pe_array[pe_no].phb = phb;
-   phb->ioda.pe_array[pe_no].pe_number = pe_no;
+   pnv_ioda_init_pe(phb, pe_no);
 }
 
-static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
+static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
 {
unsigned long pe;
 
@@ -143,20 +150,20 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
pe = find_next_zero_bit(phb->ioda.pe_alloc,
phb->ioda.total_pe_num, 0);
if (pe >= phb->ioda.total_pe_num)
-   return IODA_INVALID_PE;
+   return NULL;
} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
-   phb->ioda.pe_array[pe].phb = phb;
-   phb->ioda.pe_array[pe].pe_number = pe;
-   return pe;
+   return pnv_ioda_init_pe(phb, pe);
 }
 
-static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe)
+static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
 {
-   WARN_ON(phb->ioda.pe_array[pe].pdev);
+   struct pnv_phb *phb = pe->phb;
 
-   memset(&phb->ioda.pe_array[pe], 0, sizeof(struct pnv_ioda_pe));
-   clear_bit(pe, phb->ioda.pe_alloc);
+   WARN_ON(pe->pdev);
+
+   memset(pe, 0, sizeof(struct pnv_ioda_pe));
+   clear_bit(pe->pe_number, phb->ioda.pe_alloc);
 }
 
 /* The default M64 BAR is shared by all PEs */
@@ -316,7 +323,7 @@ static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
}
 }
 
-static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
+static struct pnv_ioda_pe *pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool all)
 {
struct pci_controller *hose = pci_bus_to_host(bus);
struct pnv_phb *phb = hose->private_data;
@@ -326,7 +333,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
 
/* Root bus shouldn't use M64 */
if (pci_is_root_bus(bus))
-   return IODA_INVALID_PE;
+   return NULL;
 
/* Allocate bitmap */
size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
@@ -334,7 +341,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
if (!pe_alloc) {
pr_warn("%s: Out of memory !\n",
__func__);
-   return IODA_INVALID_PE;
+   return NULL;
}
 
/* Figure out reserved PE numbers by the PE */
@@ -347,7 +354,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
 */
if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
kfree(pe_alloc);
-   return IODA_INVALID_PE;
+   return NULL;
}
 
/*
@@ -393,7 +400,7 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
}
 
kfree(pe_alloc);
-   return master_pe->pe_number;
+   return master_pe;
 }
 
 static void __init pnv_ioda_parse_m64_window(struct pnv_phb *phb)
@@ -959,7 +966,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
pci_dev *dev)
struct pnv_phb *phb = hose->private_data;
struct pci_dn *pdn = pci_get_pdn(dev);
struct pnv_ioda_pe *pe;
-   int pe_num;
 
if (!pdn) {
pr_err("%s: 

[PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC

2016-02-16 Thread Gavin Shan
This enables the M64 window on P7IOC, which has already been enabled
on PHB3. On PHB3, 16 M64 BARs are supported and each of them can be
owned by one particular PE# exclusively or divided evenly into 256
segments. In contrast, every P7IOC PHB has 16 M64 BARs and each of
them is divided into 8 segments, so every P7IOC PHB supports 128 M64
segments in total. P7IOC has M64DT, which helps map one particular
M64 segment# to an arbitrary PE#. PHB3 doesn't have M64DT, meaning
that one M64 segment can only be pinned to a fixed PE#. In order to
have the same code support M64 on P7IOC and PHB3, we just provide
128 M64 segments on every P7IOC PHB and each of them is pinned to a
fixed PE# by bypassing the function of M64DT. In turn, we just need
a different phb->init_m64() for P7IOC and PHB3 to support M64.

Signed-off-by: Gavin Shan 
---
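Note on the arithmetic in the description: 16 BARs of 8 segments each
give 128 segments, and the fixed mapping uses PE# / 8 as the BAR index
and PE# % 8 as the segment within that BAR. A small sketch of those
calculations follows. M64_BARS, M64_SEGS and the helper names are
placeholders chosen here; the patch itself uses PNV_IODA1_M64_NUM and
PNV_IODA1_M64_SEGS, whose values are assumed to match the description.

```c
#include <stdint.h>

#define M64_BARS 16	/* M64 BARs per P7IOC PHB, per the description */
#define M64_SEGS 8	/* segments per BAR, per the description */

/* Total M64 segments on one PHB. */
static unsigned int m64_total_segs(void)
{
	return M64_BARS * M64_SEGS;
}

/* BAR that holds the segment pinned to a given PE#. */
static unsigned int m64_bar_of_pe(unsigned int pe_no)
{
	return pe_no / M64_SEGS;
}

/* Segment index inside that BAR. */
static unsigned int m64_seg_of_pe(unsigned int pe_no)
{
	return pe_no % M64_SEGS;
}

/* Start address of a BAR: each BAR spans M64_SEGS segments. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * M64_SEGS * segsz;
}
```

For example, PE#19 lands in BAR 2, segment 3 under this scheme.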
 arch/powerpc/platforms/powernv/pci-ioda.c | 86 +--
 arch/powerpc/platforms/powernv/pci.h  |  3 ++
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1dc663a..8488238 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev 
*pdev,
}
 }
 
+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+   struct resource *r;
+   int index;
+
+   /*
+* There are 16 M64 BARs, each of which has 8 segments. So
+* there are as many M64 segments as the maximum number of
+* PEs, which is 128.
+*/
+   for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
+   unsigned long base, segsz = phb->ioda.m64_segsize;
+   int64_t rc;
+
+   base = phb->ioda.m64_base +
+  index * PNV_IODA1_M64_SEGS * segsz;
+   rc = opal_pci_set_phb_mem_window(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, base, 0,
+   PNV_IODA1_M64_SEGS * segsz);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+
+   rc = opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index,
+   OPAL_ENABLE_M64_SPLIT);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+   }
+
+   /*
+* Exclude the segment used by the reserved PE, which
+* is expected to be 0 or last supported PE#.
+*/
+   r = &phb->hose->mem_resources[1];
+   if (phb->ioda.reserved_pe_idx == 0)
+   r->start += phb->ioda.m64_segsize;
+   else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
+   r->end -= phb->ioda.m64_segsize;
+   else
+   pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
+   phb->ioda.reserved_pe_idx);
+
+   return 0;
+
+fail:
+   for ( ; index >= 0; index--)
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
+
+   return -EIO;
+}
+
 static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
unsigned long *pe_bitmap,
bool all)
@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
pe->master = master_pe;
list_add_tail(&pe->list, &master_pe->slaves);
}
+
+   /*
+* P7IOC supports M64DT, which helps mapping M64 segment
+* to one particular PE#. However, PHB3 has fixed mapping
+* between M64 segment and PE#. In order to have same logic
+* for P7IOC and PHB3, we enforce fixed mapping between M64
+* segment and PE# on P7IOC.
+*/
+   if (phb->type == PNV_PHB_IODA1) {
+   int64_t rc;
+
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, OPAL_M64_WINDOW_TYPE,
+   pe->pe_number / PNV_IODA1_M64_SEGS,
+   pe->pe_number % PNV_IODA1_M64_SEGS);
+   if (rc != OPAL_SUCCESS)
+   pr_warn("%s: Error %lld mapping M64 for PHB#%d-PE#%d\n",
+   __func__, rc, phb->hose->global_number,
+   pe->pe_number);
+   }
}
 
 

[PATCH v8 34/45] powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()

2016-02-16 Thread Gavin Shan
The function pnv_pci_reset_secondary_bus() is called as shown below.
It's impossible to call the function on the root bus, so it's safe
to remove the root bus case from the function. No functional changes
introduced.

   pci_parent_bus_reset() / pci_bus_reset() / pci_try_reset_bus()
   pci_reset_bridge_secondary_bus()
   pcibios_reset_secondary_bus()
   pnv_pci_reset_secondary_bus()

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 9226df1..593b8dc 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -868,16 +868,8 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
option)
 
 void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
 {
-   struct pci_controller *hose;
-
-   if (pci_is_root_bus(dev->bus)) {
-   hose = pci_bus_to_host(dev->bus);
-   pnv_eeh_root_reset(hose, EEH_RESET_HOT);
-   pnv_eeh_root_reset(hose, EEH_RESET_DEACTIVATE);
-   } else {
-   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-   pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
-   }
+   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
+   pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
 }
 
 static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, const char *type,
-- 
2.1.0

[PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure

2016-02-16 Thread Gavin Shan
The skiboot firmware might provide the PCI slot reset capability
which is identified by property "ibm,reset-by-firmware" on the
PCI slot associated device node.

This checks the property. If it exists, the reset request is routed
to firmware. Otherwise, the reset is done by kernel as before.

Signed-off-by: Gavin Shan 
---
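Note on the slot id passed to opal_pci_reset() in this patch: it is
built by ORing a marker bit at position 60 with the bus number, devfn
and PHB id. A minimal sketch of that packing is below. make_slot_id()
is a hypothetical helper, and the 16-bit width given to the PHB id here
is an assumption; the patch simply ORs in phb->opal_id.

```c
#include <stdint.h>

/*
 * Pack a slot id the way the hunk does:
 *   bit 60      marker
 *   bits 24-31  bus number
 *   bits 16-23  devfn
 *   low bits    PHB id (16-bit width assumed for the sketch)
 */
static uint64_t make_slot_id(uint8_t bus, uint8_t devfn, uint16_t phb_id)
{
	return (UINT64_C(1) << 60) |
	       ((uint64_t)bus << 24) |
	       ((uint64_t)devfn << 16) |
	       (uint64_t)phb_id;
}
```

The explicit uint64_t casts keep every shift in 64-bit arithmetic, so
the result does not depend on integer promotion of the narrower fields.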
 arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e23b063..c8a5217 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, 
int option)
return ret;
 }
 
-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
 {
struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
option)
return 0;
 }
 
+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
+   uint64_t id = (0x1ul << 60);
+   uint8_t scope;
+   int64_t rc;
+
+   /*
+* If the firmware can't handle it, we will issue hot reset
+* on the secondary bus despite the requested reset type.
+*/
+   if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+   return __pnv_eeh_bridge_reset(pdev, option);
+
+   /* The firmware can handle the request */
+   switch (option) {
+   case EEH_RESET_HOT:
+   scope = OPAL_RESET_PCI_HOT;
+   break;
+   case EEH_RESET_FUNDAMENTAL:
+   scope = OPAL_RESET_PCI_FUNDAMENTAL;
+   break;
+   case EEH_RESET_DEACTIVATE:
+   return 0;
+   default:
+   dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
+__func__, option);
+   return -EINVAL;
+   }
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+   id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
+   rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+   return pnv_pci_poll(id, rc, NULL);
+}
+
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
int *freset = data;
-- 
2.1.0

[PATCH v8 30/45] powerpc/pci: Delay populating pdn

2016-02-16 Thread Gavin Shan
The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when creating PCI controllers (hoses) in setup_arch(). PCI
hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pdn on an unplugging event.
The memory chunks for pdn instances allocated from memblock or
bootmem are hard to reuse after being released.

This delays creating pdn by pci_devs_phb_init() from setup_arch()
to core_initcall() so that they are allocated from slab. The memory
consumed by pdn can then be released to the system without problem
at PCI unplugging time. It means that pci_dn is unavailable in
setup_arch() and the fixup on pdn (like AGP's) can't be carried
out at that time. We have to do that in
ppc_md.pcibios_root_bridge_prepare() on maple/pasemi/powermac
platforms, where the pdn is available.

Meanwhile, the EEH device is created when pdn is populated,
meaning pdn and EEH device have the same life cycle. In turn, we
needn't call eeh_dev_init() to create the EEH device explicitly.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h |  2 +-
 arch/powerpc/include/asm/ppc-pci.h |  2 --
 arch/powerpc/kernel/eeh_dev.c  | 17 +++
 arch/powerpc/kernel/pci_dn.c   | 23 
 arch/powerpc/platforms/maple/pci.c | 34 ++
 arch/powerpc/platforms/pasemi/pci.c|  3 ---
 arch/powerpc/platforms/powermac/pci.c  | 38 +-
 arch/powerpc/platforms/powernv/pci.c   |  3 ---
 arch/powerpc/platforms/pseries/setup.c |  6 +-
 9 files changed, 69 insertions(+), 59 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fb9f376..8721580 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -274,7 +274,7 @@ void eeh_pe_restore_bars(struct eeh_pe *pe);
 const char *eeh_pe_loc_get(struct eeh_pe *pe);
 struct pci_bus *eeh_pe_bus_get(struct eeh_pe *pe);
 
-void *eeh_dev_init(struct pci_dn *pdn, void *data);
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn);
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
 int eeh_init(void);
 int __init eeh_ops_register(struct eeh_ops *ops);
diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index 8753e4e..0f73de0 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -39,8 +39,6 @@ void *pci_traverse_device_nodes(struct device_node *start,
 void *traverse_pci_dn(struct pci_dn *root,
  void *(*fn)(struct pci_dn *, void *),
  void *data);
-
-extern void pci_devs_phb_init(void);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
 /* From rtas_pci.h */
diff --git a/arch/powerpc/kernel/eeh_dev.c b/arch/powerpc/kernel/eeh_dev.c
index 7815095..d6b2ca7 100644
--- a/arch/powerpc/kernel/eeh_dev.c
+++ b/arch/powerpc/kernel/eeh_dev.c
@@ -44,14 +44,13 @@
 /**
  * eeh_dev_init - Create EEH device according to OF node
  * @pdn: PCI device node
- * @data: PHB
  *
  * It will create EEH device according to the given OF node. The function
  * might be called by PCI emunation, DR, PHB hotplug.
  */
-void *eeh_dev_init(struct pci_dn *pdn, void *data)
+struct eeh_dev *eeh_dev_init(struct pci_dn *pdn)
 {
-   struct pci_controller *phb = data;
+   struct pci_controller *phb = pdn->phb;
struct eeh_dev *edev;
 
/* Allocate EEH device */
@@ -69,7 +68,7 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
INIT_LIST_HEAD(&edev->list);
INIT_LIST_HEAD(&edev->rmv_list);
 
-   return NULL;
+   return edev;
 }
 
 /**
@@ -81,16 +80,8 @@ void *eeh_dev_init(struct pci_dn *pdn, void *data)
  */
 void eeh_dev_phb_init_dynamic(struct pci_controller *phb)
 {
-   struct pci_dn *root = phb->pci_data;
-
/* EEH PE for PHB */
eeh_phb_pe_create(phb);
-
-   /* EEH device for PHB */
-   eeh_dev_init(root, phb);
-
-   /* EEH devices for children OF nodes */
-   traverse_pci_dn(root, eeh_dev_init, phb);
 }
 
 /**
@@ -106,8 +97,6 @@ static int __init eeh_dev_phb_init(void)
list_for_each_entry_safe(phb, tmp, &hose_list, list_node)
eeh_dev_phb_init_dynamic(phb);
 
-   pr_info("EEH: devices created\n");
-
return 0;
 }
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index ecdccce..9cbf95a 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -209,8 +209,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
}
 
/* Create the EEH device for the VF */
-   eeh_dev_init(pdn, pci_bus_to_host(pdev->bus));
-   edev = pdn_to_eeh_dev(pdn);
+   edev = eeh_dev_init(pdn);
BUG_ON(!edev);
edev->physfn = pdev;
}
@@ -289,8 +288,11 @@ struct pci_dn *pci_add_device_node_info(struct 
pci_controller 

[PATCH v8 40/45] drivers/of: Split unflatten_dt_node()

2016-02-16 Thread Gavin Shan
The function unflatten_dt_node() is called recursively to unflatten
device nodes and properties in the FDT blob. It looks complicated
and is hard to understand.

This splits the function into 3 functions: populate_properties(),
populate_node() and unflatten_dt_node(). populate_properties(),
which is called by populate_node(), creates properties for the
indicated device node. The latter creates the device nodes
from the FDT blob. unflatten_dt_node() gets the offset in the FDT
blob for the next device node and then calls populate_node(). No
logical changes introduced.

Signed-off-by: Gavin Shan 
---
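Note on the "name" fallback in populate_properties(): when a node has no
"name" property, the name is rebuilt from the last path component with
any "@unit-address" suffix dropped. Below is a standalone sketch of that
scan, under the assumption that the input may be either a bare node name
or a full path. node_name_from_path() is a hypothetical helper, and it
copies into a caller buffer instead of using unflatten_dt_alloc().

```c
#include <stddef.h>
#include <string.h>

/*
 * Derive a node name from a path such as "/soc/serial@1000".
 * Returns the number of bytes written including the NUL (the same
 * quantity the patch calls "len"), or 0 if the buffer is too small.
 */
static size_t node_name_from_path(const char *path, char *buf, size_t buflen)
{
	const char *p = path, *ps = path, *pa = NULL;
	size_t len;

	for (; *p; p++) {
		if (*p == '@')
			pa = p;		/* last '@' seen so far */
		else if (*p == '/')
			ps = p + 1;	/* start of the last component */
	}

	/* No '@' inside the last component: the name runs to the end. */
	if (!pa || pa < ps)
		pa = p;

	len = (size_t)(pa - ps);
	if (len + 1 > buflen)
		return 0;

	memcpy(buf, ps, len);
	buf[len] = '\0';
	return len + 1;
}
```

The sketch checks pa against NULL before comparing it with ps, which
avoids relying on an ordered comparison involving a null pointer.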
 drivers/of/fdt.c | 249 ---
 1 file changed, 147 insertions(+), 102 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 655f79d..3c69002 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -161,39 +161,127 @@ static void *unflatten_dt_alloc(void **mem, unsigned 
long size,
return res;
 }
 
-/**
- * unflatten_dt_node - Alloc and populate a device_node from the flat tree
- * @blob: The parent device tree blob
- * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
- * @dad: Parent struct device_node
- * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
- * @dryrun: If true, do not allocate device nodes but still calculate needed
- * memory size
- */
-static void * unflatten_dt_node(const void *blob,
-   void *mem,
-   int *poffset,
-   struct device_node *dad,
-   struct device_node **nodepp,
-   unsigned long fpsize,
+static void populate_properties(const void *blob,
+   int offset,
+   void **mem,
+   struct device_node *np,
+   const char *nodename,
bool dryrun)
 {
-   const __be32 *p;
+   struct property *pp, **pprev = NULL;
+   int cur;
+   bool has_name = false;
+
+   pprev = &np->properties;
+   for (cur = fdt_first_property_offset(blob, offset);
+cur >= 0;
+cur = fdt_next_property_offset(blob, cur)) {
+   const __be32 *val;
+   const char *pname;
+   u32 sz;
+
+   val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
+   if (!val) {
+   pr_warn("%s: Cannot locate property at 0x%x\n",
+   __func__, cur);
+   continue;
+   }
+
+   if (!pname) {
+   pr_warn("%s: Cannot find property name at 0x%x\n",
+   __func__, cur);
+   continue;
+   }
+
+   if (!strcmp(pname, "name"))
+   has_name = true;
+
+   pp = unflatten_dt_alloc(mem, sizeof(struct property),
+   __alignof__(struct property));
+   if (dryrun)
+   continue;
+
+   /* We accept flattened tree phandles either in
+* ePAPR-style "phandle" properties, or the
+* legacy "linux,phandle" properties.  If both
+* appear and have different values, things
+* will get weird. Don't do that.
+*/
+   if (!strcmp(pname, "phandle") ||
+   !strcmp(pname, "linux,phandle")) {
+   if (!np->phandle)
+   np->phandle = be32_to_cpup(val);
+   }
+
+   /* And we process the "ibm,phandle" property
+* used in pSeries dynamic device tree
+* stuff
+*/
+   if (!strcmp(pname, "ibm,phandle"))
+   np->phandle = be32_to_cpup(val);
+
+   pp->name   = (char *)pname;
+   pp->length = sz;
+   pp->value  = (__be32 *)val;
+   *pprev = pp;
+   pprev  = &pp->next;
+   }
+
+   /* With version 0x10 we may not have the name property,
+* recreate it here from the unit name if absent
+*/
+   if (!has_name) {
+   const char *p = nodename, *ps = p, *pa = NULL;
+   int len;
+
+   while (*p) {
+   if ((*p) == '@')
+   pa = p;
+   else if ((*p) == '/')
+   ps = p + 1;
+   p++;
+   }
+
+   if (pa < ps)
+   pa = p;
+   len = (pa - ps) + 1;
+   pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
+   __alignof__(struct property));
+   

[PATCH v8 14/45] powerpc/powernv/ioda1: Rename pnv_pci_ioda_setup_dma_pe()

2016-02-16 Thread Gavin Shan
This renames pnv_pci_ioda_setup_dma_pe() to pnv_pci_ioda1_setup_dma_pe()
as it's the counterpart of IODA2's pnv_pci_ioda2_setup_dma_pe().
No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 8488238..d18b95e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2026,9 +2026,10 @@ static struct iommu_table_ops pnv_ioda2_iommu_ops = {
.free = pnv_ioda2_table_free,
 };
 
-static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
- struct pnv_ioda_pe *pe, unsigned int base,
- unsigned int segs)
+static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
+  struct pnv_ioda_pe *pe,
+  unsigned int base,
+  unsigned int segs)
 {
 
struct page *tce_mem = NULL;
@@ -2616,7 +2617,7 @@ static void pnv_ioda_setup_dma(struct pnv_phb *phb)
if (phb->type == PNV_PHB_IODA1) {
pe_info(pe, "DMA weight %d, assigned %d DMA32 segments\n",
pe->dma_weight, segs);
-   pnv_pci_ioda_setup_dma_pe(phb, pe, base, segs);
+   pnv_pci_ioda1_setup_dma_pe(phb, pe, base, segs);
} else if (phb->type == PNV_PHB_IODA2) {
pe_info(pe, "Assign DMA32 space\n");
segs = 0;
-- 
2.1.0

[PATCH v8 06/45] powerpc/powernv: Reorder fields in struct pnv_phb

2016-02-16 Thread Gavin Shan
This groups together the fields in struct pnv_phb that are related
to PE allocation. No logical change.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 78f035e..f2a1452 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -140,15 +140,14 @@ struct pnv_phb {
unsigned intio_segsize;
unsigned intio_pci_base;
 
-   /* PE allocation bitmap */
-   unsigned long   *pe_alloc;
-   /* PE allocation mutex */
+   /* PE allocation */
struct mutexpe_alloc_mutex;
+   unsigned long   *pe_alloc;
+   struct pnv_ioda_pe  *pe_array;
 
/* M32 & IO segment maps */
unsigned int*m32_segmap;
unsigned int*io_segmap;
-   struct pnv_ioda_pe  *pe_array;
 
/* IRQ chip */
int irq_chip_init;
-- 
2.1.0

[PATCH v8 10/45] powerpc/powernv: IO and M32 mapping based on PCI device resources

2016-02-16 Thread Gavin Shan
Currently, the IO and M32 segments are mapped to the corresponding
PE based on the windows of the parent bridge of the PE's primary bus.
That is not going to work once the windows of the root port, or of the
upstream port of the PCIe switch behind the root port, are extended to
the PHB's apertures in order to support hotplug in a subsequent patch.

This fixes the issue by mapping IO and M32 segments based on the
resources of the PCI devices included in the PE, instead of the
windows of the parent bridge of the PE's primary bus.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 127 +-
 1 file changed, 71 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fd7d382..7330a73 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2929,71 +2929,86 @@ truncate_iov:
 }
 #endif /* CONFIG_PCI_IOV */
 
-/*
- * This function is supposed to be called on basis of PE from top
- * to bottom style. So the the I/O or MMIO segment assigned to
- * parent PE could be overrided by its child PEs if necessary.
- */
-static void pnv_ioda_setup_pe_seg(struct pci_controller *hose,
- struct pnv_ioda_pe *pe)
+static void pnv_ioda_setup_one_res(struct pnv_ioda_pe *pe,
+  struct resource *res)
 {
-   struct pnv_phb *phb = hose->private_data;
+   struct pnv_phb *phb = pe->phb;
struct pci_bus_region region;
-   struct resource *res;
-   unsigned int segsize;
-   int *segmap, index, i;
+   unsigned int index, segsize;
+   int *segmap;
uint16_t win;
int64_t rc;
 
-   /*
-* NOTE: We only care PCI bus based PE for now. For PCI
-* device based PE, for example SRIOV sensitive VF should
-* be figured out later.
-*/
-   BUG_ON(!(pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL)));
+   if (!res->parent || !res->flags  || res->start > res->end)
+   return;
+   if (!(res->flags & (IORESOURCE_IO | IORESOURCE_MEM)) ||
+   pnv_pci_is_mem_pref_64(res->flags))
+   return;
 
-   pci_bus_for_each_resource(pe->pbus, res, i) {
-   if (!res || !res->flags ||
-   res->start > res->end)
-   continue;
+   if (res->flags & IORESOURCE_IO) {
+   region.start = res->start - phb->ioda.io_pci_base;
+   region.end   = res->end - phb->ioda.io_pci_base;
+   segsize  = phb->ioda.io_segsize;
+   segmap   = phb->ioda.io_segmap;
+   win  = OPAL_IO_WINDOW_TYPE;
+   } else {
+   region.start = res->start -
+  phb->hose->mem_offset[0] -
+  phb->ioda.m32_pci_base;
+   region.end   = res->end -
+  phb->hose->mem_offset[0] -
+  phb->ioda.m32_pci_base;
+   segsize  = phb->ioda.m32_segsize;
+   segmap   = phb->ioda.m32_segmap;
+   win  = OPAL_M32_WINDOW_TYPE;
+   }
+
+   region.start = _ALIGN_DOWN(region.start, segsize);
+   region.end   = _ALIGN_UP(region.end, segsize);
+   index = region.start / segsize;
+   while (index < phb->ioda.total_pe_num && region.start < region.end) {
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, win, 0, index);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Error %lld mapping (%d) seg#%d to PHB#%d-PE#%d\n",
+   __func__, rc, win, index,
+   phb->hose->global_number,
+   pe->pe_number);
+   return;
+   }
 
-   if (res->flags & IORESOURCE_IO) {
-   region.start = res->start - phb->ioda.io_pci_base;
-   region.end   = res->end - phb->ioda.io_pci_base;
-   segsize  = phb->ioda.io_segsize;
-   segmap   = phb->ioda.io_segmap;
-   win  = OPAL_IO_WINDOW_TYPE;
-   } else if ((res->flags & IORESOURCE_MEM) &&
-  !pnv_pci_is_mem_pref_64(res->flags)) {
-   region.start = res->start -
-  hose->mem_offset[0] -
-  phb->ioda.m32_pci_base;
-   region.end   = res->end -
-  hose->mem_offset[0] -
-  phb->ioda.m32_pci_base;
-   segsize  = phb->ioda.m32_segsize;
-   segmap   = phb->ioda.m32_segmap;
-   win  = OPAL_M32_WINDOW_TYPE;
-  
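
The segment arithmetic in pnv_ioda_setup_one_res() can be tried out in user space. The following is a minimal sketch, not the kernel code: align_down(), align_up() and segs_for_range() are hypothetical stand-ins, and it assumes the mapping loop advances one segment per iteration, which the truncated diff above does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's _ALIGN_DOWN()/_ALIGN_UP(); size must be a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

static uint64_t align_up(uint64_t addr, uint64_t size)
{
	return (addr + size - 1) & ~(size - 1);
}

/*
 * For a window-relative range [start, end], store the index of the first
 * covered segment in *first and return how many segments lie between the
 * aligned bounds (the total_pe_num limit of the patch is ignored here).
 */
static unsigned int segs_for_range(uint64_t start, uint64_t end,
				   uint64_t segsize, unsigned int *first)
{
	uint64_t lo, hi;

	assert(segsize != 0 && (segsize & (segsize - 1)) == 0);

	lo = align_down(start, segsize);
	hi = align_up(end, segsize);
	*first = (unsigned int)(lo / segsize);

	return (unsigned int)((hi - lo) / segsize);
}
```

With 1MB segments, a resource occupying 0x180000..0x27ffff straddles segments 1 and 2, so both would be mapped to the PE that owns the device.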

[PATCH v8 26/45] powerpc/pci: Move pci_find_bus_by_node() around

2016-02-16 Thread Gavin Shan
This moves pci_find_bus_by_node() from
arch/powerpc/platforms/pseries/pci_dlpar.c to
arch/powerpc/kernel/pci-hotplug.c so that the function can be used by
both the pSeries and PowerNV platforms. The following cleanups are
applied as well. No functional changes introduced.

   * Remove variable "busdn" in find_bus_among_children()
   * Use PCI_DN() to convert device node to pci_dn

Signed-off-by: Gavin Shan 
Acked-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kernel/pci-hotplug.c  | 29 
 arch/powerpc/platforms/pseries/pci_dlpar.c | 31 --
 2 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 78bf2a1..7929a1c 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,35 @@
 #include 
 #include 
 
+static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
+  struct device_node *dn)
+{
+   struct pci_bus *child = NULL;
+   struct pci_bus *tmp;
+
+   if (pci_bus_to_OF_node(bus) == dn)
+   return bus;
+
+   list_for_each_entry(tmp, &bus->children, node) {
+   child = find_bus_among_children(tmp, dn);
+   if (child)
+   break;
+   }
+
+   return child;
+}
+
+struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
+{
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn  || !pdn->phb || !pdn->phb->bus)
+   return NULL;
+
+   return find_bus_among_children(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index aee22b4..906dbaa 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -34,37 +34,6 @@
 
 #include "pseries.h"
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-struct device_node *dn)
-{
-   struct pci_bus *child = NULL;
-   struct pci_bus *tmp;
-   struct device_node *busdn;
-
-   busdn = pci_bus_to_OF_node(bus);
-   if (busdn == dn)
-   return bus;
-
-   list_for_each_entry(tmp, &bus->children, node) {
-   child = find_bus_among_children(tmp, dn);
-   if (child)
-   break;
-   };
-   return child;
-}
-
-struct pci_bus *pci_find_bus_by_node(struct device_node *dn)
-{
-   struct pci_dn *pdn = dn->data;
-
-   if (!pdn  || !pdn->phb || !pdn->phb->bus)
-   return NULL;
-
-   return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
struct pci_controller *phb;
-- 
2.1.0
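
The depth-first search done by find_bus_among_children() can be illustrated outside the kernel. This is only a sketch under simplifying assumptions: struct bus and find_bus() are hypothetical, and children are linked through plain child/sibling pointers rather than the list_head used by struct pci_bus.

```c
#include <stddef.h>

/* Hypothetical, simplified bus; not the kernel's struct pci_bus. */
struct bus {
	const void *of_node;	/* device node this bus was created for */
	struct bus *child;	/* first child bus */
	struct bus *sibling;	/* next bus on the same level */
};

/* Depth-first search for the bus whose device node is dn. */
static struct bus *find_bus(struct bus *bus, const void *dn)
{
	struct bus *tmp, *found;

	if (bus->of_node == dn)
		return bus;

	for (tmp = bus->child; tmp; tmp = tmp->sibling) {
		found = find_bus(tmp, dn);
		if (found)
			return found;
	}

	return NULL;
}
```

As in the patch, the search returns as soon as one subtree yields a match and reports NULL when no bus corresponds to the node.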


[PATCH v8 18/45] powerpc/powernv: Increase PE# capacity

2016-02-16 Thread Gavin Shan
Each PHB maintains an array that translates the 2-byte Request
ID (RID) to a PE#, on the assumption that the PE# fits in one byte,
meaning that we can't have more than 256 PEs. However, pci_dn->pe_number
already has 4 bytes for the PE#.

This extends the PE# capacity for every PHB. After that, the PE number
is represented by a 4-byte value. Then we can reuse IODA_INVALID_PE to
check whether the PE# in phb->pe_rmap[] is valid.

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 6 +-
 arch/powerpc/platforms/powernv/pci.h  | 7 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 59782fba..7800897 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -757,7 +757,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
 
/* Clear the reverse map */
for (rid = pe->rid; rid < rid_end; rid++)
-   phb->ioda.pe_rmap[rid] = 0;
+   phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
 
/* Release from all parents PELT-V */
while (parent) {
@@ -3387,6 +3387,10 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
if (prop32)
phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
 
+   /* Invalidate RID to PE# mapping */
+   for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
+   phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
+
/* Parse 64-bit MMIO range */
pnv_ioda_parse_m64_window(phb);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 350e630..928cf81 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -160,11 +160,8 @@ struct pnv_phb {
struct list_head   pe_list;
struct mutex   pe_list_mutex;
 
-   /* Reverse map of PEs, will have to extend if
-* we are to support more than 256 PEs, indexed
-* bus { bus, devfn }
-*/
-   unsigned char   pe_rmap[0x10000];
+   /* Reverse map of PEs, indexed by {bus, devfn} */
+   int pe_rmap[0x10000];
 
/* TCE cache invalidate registers (physical and
 * remapped)
-- 
2.1.0
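
To see why a one-byte element cannot serve as a reverse map once PE numbers exceed 255, and how a sentinel marks unused entries, consider the sketch below. It is an illustration only: INVALID_PE is assumed to equal IODA_INVALID_PE (-1), and make_rid(), rmap_init() and rmap_lookup() are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define INVALID_PE	(-1)		/* assumed to match IODA_INVALID_PE */
#define RID_COUNT	0x10000		/* one entry per 16-bit RID */

static int pe_rmap[RID_COUNT];

/* A RID carries the bus number in the high byte and devfn in the low byte. */
static uint16_t make_rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* Mark every RID as not mapped to any PE. */
static void rmap_init(void)
{
	unsigned int i;

	for (i = 0; i < RID_COUNT; i++)
		pe_rmap[i] = INVALID_PE;
}

/* Return the PE# recorded for a RID, or INVALID_PE if none was configured. */
static int rmap_lookup(uint8_t bus, uint8_t devfn)
{
	return pe_rmap[make_rid(bus, devfn)];
}
```

With int elements, a PE number such as 300 survives the round trip, while storing it in an unsigned char would silently truncate it.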


[PATCH v8 36/45] powerpc/powernv: Support PCI slot ID

2016-02-16 Thread Gavin Shan
PowerNV platforms run on top of skiboot firmware, which includes
changes to support PCI slots. A PCI slot is identified by the PHB's
ID, or by the combination of that and the PCI slot ID.

This changes the EEH PowerNV backend to support PCI slots:

   * Rename arguments of opal_pci_reset() and opal_pci_poll().
   * One more argument (PCI slot's state) added to opal_pci_poll().
   * Drop pnv_eeh_phb_poll() and introduce a similar, enhanced
     function pnv_pci_poll() that will be used by PowerNV hotplug
     backends.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/opal.h  |  4 +--
 arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++--
 arch/powerpc/platforms/powernv/pci.c | 21 ++
 arch/powerpc/platforms/powernv/pci.h |  1 +
 4 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 07a99e6..9e0039f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
uint16_t dma_window_number, uint64_t pci_start_addr,
uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
   uint64_t diag_buffer_len);
@@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
__be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c7454ba..e23b063 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
-{
-   s64 rc = OPAL_HARDWARE;
-
-   while (1) {
-   rc = opal_pci_poll(phb->opal_id);
-   if (rc <= 0)
-   break;
-
-   if (system_state < SYSTEM_RUNNING)
-   udelay(1000 * rc);
-   else
-   msleep(rc);
-   }
-
-   return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
struct pnv_phb *phb = hose->private_data;
s64 rc = OPAL_HARDWARE;
+   int ret;
 
pr_debug("%s: Reset PHB#%x, option=%d\n",
 __func__, hose->global_number, option);
@@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
rc = opal_pci_reset(phb->opal_id,
OPAL_RESET_PHB_COMPLETE,
OPAL_DEASSERT_RESET);
-   if (rc < 0)
-   goto out;
 
/*
 * Poll state of the PHB until the request is done
@@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 * reset followed by hot reset on root bus. So we also
 * need the PCI bus settlement delay.
 */
-   rc = pnv_eeh_phb_poll(phb);
-   if (option == EEH_RESET_DEACTIVATE) {
+   ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+   if (option == EEH_RESET_DEACTIVATE && !ret) {
if (system_state < SYSTEM_RUNNING)
udelay(1000 * EEH_PE_RST_SETTLE_TIME);
else
msleep(EEH_PE_RST_SETTLE_TIME);
}
-out:
-   if (rc != OPAL_SUCCESS)
-   return -EIO;
 
-   return 0;
+   return ret;
 }
 
 static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
 {
struct pnv_phb *phb = hose->private_data;
s64 rc = OPAL_HARDWARE;
+   int ret;
 
pr_debug("%s: Reset PHB#%x, option=%d\n",
 __func__, hose->global_number, option);
@@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
rc = opal_pci_reset(phb->opal_id,
OPAL_RESET_PCI_HOT,
OPAL_DEASSERT_RESET);
-   if (rc < 0)
-   goto out;
 
/* Poll state of the PHB until the request is done */
-   rc = pnv_eeh_phb_poll(phb);
-   if (option == 
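
The body of pnv_pci_poll() is not part of this excerpt, but the polling convention is visible in the removed pnv_eeh_phb_poll(): a positive return value is a delay in milliseconds, zero means done, and a negative value is an error. A minimal user-space sketch of that loop follows; struct fake_fw, fake_poll() and poll_until_done() are hypothetical, and -1 stands in for the kernel's -EIO.

```c
#include <stdint.h>

/* Hypothetical stand-in for firmware state; not a kernel structure. */
struct fake_fw {
	int pending;		/* polls still needed before completion */
	int64_t final_rc;	/* 0 on success, negative on failure */
	int64_t waited_ms;	/* total delay requested so far */
};

/* Mimics the return convention: >0 means wait that many ms, <=0 means finished. */
static int64_t fake_poll(struct fake_fw *fw)
{
	if (fw->pending > 0) {
		fw->pending--;
		return 10;
	}

	return fw->final_rc;
}

/* Poll until the firmware reports completion; 0 on success, -1 on failure. */
static int poll_until_done(struct fake_fw *fw)
{
	int64_t rc;

	for (;;) {
		rc = fake_poll(fw);
		if (rc <= 0)
			break;

		/* The kernel would udelay() or msleep() here; we only record it. */
		fw->waited_ms += rc;
	}

	return rc == 0 ? 0 : -1;
}
```

Returning a plain status from the helper is what lets the callers in the patch drop their own "rc != OPAL_SUCCESS" translation.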

[PATCH v8 41/45] drivers/of: Avoid recursively calling unflatten_dt_node()

2016-02-16 Thread Gavin Shan
In the current implementation, unflatten_dt_node() is called recursively
to unflatten the device nodes in the FDT blob. This stresses the limited
stack capacity, especially when the function is adapted to unflatten a
device sub-tree that possibly has multiple root nodes. In that case, we
run out of stack and the system can't boot up successfully.

In order to reuse the function to unflatten a device sub-tree, this avoids
calling the function recursively, meaning the device nodes are unflattened
in one call to unflatten_dt_node(): two arrays are introduced to track the
parent path size and the device node at the current level of depth, which
will be used by the device nodes on the next level of depth to be
unflattened. All device nodes deeper than 64 levels are dropped and,
hopefully, the system can still boot up successfully with the partial
device tree.

Also, the parameters "poffset" and "fpsize" are unused and dropped, and the
parameter "dryrun" is derived from "mem == NULL". Besides, the return
value of the function is changed to indicate the size of memory consumed by
the unflattened device tree, or an error code.

Signed-off-by: Gavin Shan 
---
 drivers/of/fdt.c | 122 +--
 1 file changed, 74 insertions(+), 48 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 3c69002..667a5b2 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -356,63 +356,90 @@ static unsigned long populate_node(const void *blob,
return fpsize;
 }
 
+static void reverse_nodes(struct device_node *parent)
+{
+   struct device_node *child, *next;
+
+   /* In-depth first */
+   child = parent->child;
+   while (child) {
+   reverse_nodes(child);
+
+   child = child->sibling;
+   }
+
+   /* Reverse the nodes in the child list */
+   child = parent->child;
+   parent->child = NULL;
+   while (child) {
+   next = child->sibling;
+
+   child->sibling = parent->child;
+   parent->child = child;
+   child = next;
+   }
+}
+
 /**
  * unflatten_dt_node - Alloc and populate a device_node from the flat tree
  * @blob: The parent device tree blob
  * @mem: Memory chunk to use for allocating device nodes and properties
- * @poffset: pointer to node in flat tree
  * @dad: Parent struct device_node
  * @nodepp: The device_node tree created by the call
- * @fpsize: Size of the node path up at the current depth.
- * @dryrun: If true, do not allocate device nodes but still calculate needed
- * memory size
+ *
+ * It returns the size of unflattened device tree or error code
  */
-static void *unflatten_dt_node(const void *blob,
-  void *mem,
-  int *poffset,
-  struct device_node *dad,
-  struct device_node **nodepp,
-  unsigned long fpsize,
-  bool dryrun)
+static int unflatten_dt_node(const void *blob,
+void *mem,
+struct device_node *dad,
+struct device_node **nodepp)
 {
-   struct device_node *np;
-   static int depth;
-   int old_depth;
+   struct device_node *root;
+   int offset = 0, depth = 0;
+#define FDT_MAX_DEPTH  64
+   unsigned long fpsizes[FDT_MAX_DEPTH];
+   struct device_node *nps[FDT_MAX_DEPTH];
+   void *base = mem;
+   bool dryrun = !base;
 
-   fpsize = populate_node(blob, *poffset, &mem, dad, fpsize, &np, dryrun);
-   if (!fpsize)
-   return mem;
+   if (nodepp)
+   *nodepp = NULL;
+
+   root = dad;
+   fpsizes[depth] = dad ? strlen(of_node_full_name(dad)) : 0;
+   nps[depth++] = dad;
+   for (offset = 0;
+offset >= 0;
+offset = fdt_next_node(blob, offset, &depth)) {
+   if (WARN_ON_ONCE(depth >= FDT_MAX_DEPTH))
+   continue;
 
-   old_depth = depth;
-   *poffset = fdt_next_node(blob, *poffset, &depth);
-   if (depth < 0)
-   depth = 0;
-   while (*poffset > 0 && depth > old_depth)
-   mem = unflatten_dt_node(blob, mem, poffset, np, NULL,
-   fpsize, dryrun);
+   fpsizes[depth] = populate_node(blob, offset, &mem,
+  nps[depth - 1],
+  fpsizes[depth - 1],
+  &nps[depth], dryrun);
+   if (!fpsizes[depth])
+   return mem - base;
+
+   if (!dryrun && nodepp && !*nodepp)
+   *nodepp = nps[depth];
+   if (!dryrun && !root)
+   root = nps[depth];
+   }
 
-   if (*poffset < 0 && *poffset != -FDT_ERR_NOTFOUND)
-   pr_err("unflatten: error %d processing FDT\n", 
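
The core idea, replacing recursion with an array indexed by depth, can be shown with a small stand-alone sketch. It is not the kernel implementation: struct node and link_parents() are hypothetical, and the input is assumed to arrive in depth-first order with the depth already known, as fdt_next_node() would report it.

```c
#include <stddef.h>

#define MAX_DEPTH 64	/* same bound as FDT_MAX_DEPTH in the patch */

/* Hypothetical node; depth is 1 for nodes directly below the starting point. */
struct node {
	struct node *parent;
	int depth;
};

/*
 * Link nodes that arrive in depth-first order to their parents without
 * recursion. parents[d] remembers the most recent node seen at depth d,
 * so a node at depth d finds its parent in parents[d - 1].
 * Returns the number of nodes linked; nodes outside the bound are skipped.
 */
static int link_parents(struct node *nodes, int count)
{
	struct node *parents[MAX_DEPTH] = { NULL };
	int i, linked = 0;

	for (i = 0; i < count; i++) {
		int d = nodes[i].depth;

		if (d < 1 || d >= MAX_DEPTH)
			continue;

		nodes[i].parent = parents[d - 1];
		parents[d] = &nodes[i];
		linked++;
	}

	return linked;
}
```

Stack usage is now fixed by MAX_DEPTH instead of growing with the tree, which is the trade-off the commit message describes: nodes nested too deeply are dropped rather than overflowing the stack.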

[PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info()

2016-02-16 Thread Gavin Shan
This implements and exports pci_remove_device_node_info(). It's
used to remove the pdn (struct pci_dn) for the indicated device
node. The function is going to be used by PowerNV PCI hotplug
driver.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |  1 +
 arch/powerpc/kernel/pci_dn.c  | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 72a9d4e..c6310e2 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -240,6 +240,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev *pdev);
 extern void remove_dev_pci_data(struct pci_dev *pdev);
 extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
   struct device_node *dn);
+extern void pci_remove_device_node_info(struct device_node *dn);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 0a249ff..ce10281 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -331,6 +331,29 @@ struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
 }
 EXPORT_SYMBOL_GPL(pci_add_device_node_info);
 
+void pci_remove_device_node_info(struct device_node *dn)
+{
+   struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
+#ifdef CONFIG_EEH
+   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+
+   if (edev)
+   edev->pdn = NULL;
+#endif
+
+   if (!pdn)
+   return;
+
+   WARN_ON(!list_empty(&pdn->child_list));
+   list_del(&pdn->list);
+   if (pdn->parent)
+   of_node_put(pdn->parent->node);
+
+   dn->data = NULL;
+   kfree(pdn);
+}
+EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
+
 /*
  * Traverse a device tree stopping each PCI device in the tree.
  * This is done depth first.  As each node is processed, a "pre"
-- 
2.1.0


[PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()

2016-02-16 Thread Gavin Shan
This drops the unnecessary nested if statements in pnv_eeh_reset() to
improve code readability. With that change, the local variable "ret"
is no longer used and is dropped as well. No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 67 +---
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 69e41ce..9226df1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1009,8 +1009,9 @@ static int pnv_eeh_reset_vf_pe(struct eeh_pe *pe, int option)
 static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 {
struct pci_controller *hose = pe->phb;
+   struct pnv_phb *phb;
struct pci_bus *bus;
-   int ret;
+   int64_t rc;
 
/*
 * For PHB reset, we always have complete reset. For those PEs whose
@@ -1026,45 +1027,39 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 * reset. The side effect is that EEH core has to clear the frozen
 * state explicitly after BAR restore.
 */
-   if (pe->type & EEH_PE_PHB) {
-   ret = pnv_eeh_phb_reset(hose, option);
-   } else {
-   struct pnv_phb *phb;
-   s64 rc;
+   if (pe->type & EEH_PE_PHB)
+   return pnv_eeh_phb_reset(hose, option);
 
-   /*
-* The frozen PE might be caused by PAPR error injection
-* registers, which are expected to be cleared after hitting
-* frozen PE as stated in the hardware spec. Unfortunately,
-* that's not true on P7IOC. So we have to clear it manually
-* to avoid recursive EEH errors during recovery.
-*/
-   phb = hose->private_data;
-   if (phb->model == PNV_PHB_MODEL_P7IOC &&
-   (option == EEH_RESET_HOT ||
-   option == EEH_RESET_FUNDAMENTAL)) {
-   rc = opal_pci_reset(phb->opal_id,
-   OPAL_RESET_PHB_ERROR,
-   OPAL_ASSERT_RESET);
-   if (rc != OPAL_SUCCESS) {
-   pr_warn("%s: Failure %lld clearing "
-   "error injection registers\n",
-   __func__, rc);
-   return -EIO;
-   }
+   /*
+* The frozen PE might be caused by PAPR error injection
+* registers, which are expected to be cleared after hitting
+* frozen PE as stated in the hardware spec. Unfortunately,
+* that's not true on P7IOC. So we have to clear it manually
+* to avoid recursive EEH errors during recovery.
+*/
+   phb = hose->private_data;
+   if (phb->model == PNV_PHB_MODEL_P7IOC &&
+   (option == EEH_RESET_HOT ||
+option == EEH_RESET_FUNDAMENTAL)) {
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_RESET_PHB_ERROR,
+   OPAL_ASSERT_RESET);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Failure %lld clearing error injection registers\n",
+   __func__, rc);
+   return -EIO;
}
-
-   bus = eeh_pe_bus_get(pe);
-   if (pe->type & EEH_PE_VF)
-   ret = pnv_eeh_reset_vf_pe(pe, option);
-   else if (pci_is_root_bus(bus) ||
-   pci_is_root_bus(bus->parent))
-   ret = pnv_eeh_root_reset(hose, option);
-   else
-   ret = pnv_eeh_bridge_reset(bus->self, option);
}
 
-   return ret;
+   bus = eeh_pe_bus_get(pe);
+   if (pe->type & EEH_PE_VF)
+   return pnv_eeh_reset_vf_pe(pe, option);
+
+   if (pci_is_root_bus(bus) ||
+   pci_is_root_bus(bus->parent))
+   return pnv_eeh_root_reset(hose, option);
+
+   return pnv_eeh_bridge_reset(bus->self, option);
 }
 
 /**
-- 
2.1.0


[PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()

2016-02-16 Thread Gavin Shan
This renames traverse_pci_devices() to pci_traverse_device_nodes().
The function traverses all subordinate device nodes of the specified
one. The following cleanups are applied to the function as well. No
logical changes introduced.

   * Rename "pre" to "fn".
   * Avoid the assignment in an if condition reported by checkpatch.pl.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
 arch/powerpc/kernel/pci_dn.c | 15 ++-
 arch/powerpc/platforms/pseries/msi.c |  4 ++--
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index ca0c5bf..8753e4e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev; /* may be NULL if no ISA bus */
 struct device_node;
 struct pci_dn;
 
-typedef void *(*traverse_func)(struct device_node *me, void *data);
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-   void *data);
+void *pci_traverse_device_nodes(struct device_node *start,
+   void *(*fn)(struct device_node *, void *),
+   void *data);
 void *traverse_pci_dn(struct pci_dn *root,
  void *(*fn)(struct pci_dn *, void *),
  void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index ce10281..ecdccce 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
  * one of these nodes we also assume its siblings are non-pci for
  * performance.
  */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-   void *data)
+void *pci_traverse_device_nodes(struct device_node *start,
+   void *(*fn)(struct device_node *, void *),
+   void *data)
 {
struct device_node *dn, *nextdn;
void *ret;
@@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
if (classp)
class = of_read_number(classp, 1);
 
-   if (pre && ((ret = pre(dn, data)) != NULL))
-   return ret;
+   if (fn) {
+   ret = fn(dn, data);
+   if (ret)
+   return ret;
+   }
 
/* If we are a PCI bridge, go down */
if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
@@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, traverse_func pre,
}
return NULL;
 }
+EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
 
 static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
  struct pci_dn *pdn)
@@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
}
 
/* Update dn->phb ptrs for new phb and children devices */
-   traverse_pci_devices(dn, add_pdn, phb);
+   pci_traverse_device_nodes(dn, add_pdn, phb);
 }
 
 /** 
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 272e9ec..543a638 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
memset(&counts, 0, sizeof(struct msi_counts));
 
/* Work out how many devices we have below this PE */
-   traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+   pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
 
if (counts.num_devices == 0) {
pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int request)
/* else, we have some more calculating to do */
counts.requestor = pci_device_to_OF_node(dev);
counts.request = request;
-   traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+   pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
 
/* If the quota isn't an integer multiple of the total, we can
 * use the remainder as spare MSIs for anyone that wants them. */
-- 
2.1.0
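
The callback contract of pci_traverse_device_nodes(), where the walk stops as soon as fn returns a non-NULL pointer, can be sketched as follows. This is a simplified illustration: struct dnode, traverse(), count_nodes() and first_device() are hypothetical, and the walk is recursive and visits every child, whereas the kernel function is iterative and only descends through PCI bridges.

```c
#include <stddef.h>

/* Hypothetical, simplified device node. */
struct dnode {
	struct dnode *child;
	struct dnode *sibling;
	int is_device;
};

/* Visit every node below start; stop early when fn returns non-NULL. */
static void *traverse(struct dnode *start,
		      void *(*fn)(struct dnode *, void *),
		      void *data)
{
	struct dnode *dn;
	void *ret;

	for (dn = start->child; dn; dn = dn->sibling) {
		if (fn) {
			ret = fn(dn, data);
			if (ret)
				return ret;
		}

		ret = traverse(dn, fn, data);
		if (ret)
			return ret;
	}

	return NULL;
}

/* Example callback: count every node and never stop the walk. */
static void *count_nodes(struct dnode *dn, void *data)
{
	(void)dn;
	++*(int *)data;
	return NULL;
}

/* Example callback: stop at the first node flagged as a device. */
static void *first_device(struct dnode *dn, void *data)
{
	(void)data;
	return dn->is_device ? dn : NULL;
}
```

Splitting "ret = fn(dn, data)" from the test on ret is the same shape as the checkpatch.pl cleanup in the patch.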


[PATCH v8 32/45] powerpc/pci: Update bridge windows on PCI plug

2016-02-16 Thread Gavin Shan
On a PCI hotplug event, the PCI slot's subordinate devices are
scanned and their IO and MMIO resources are assigned. Platform
dependent resources (PE#, IO/MMIO/DMA windows) are allocated or
created when the windows of the slot's upstream bridge are updated.

This updates the windows of the hot plugged slot's upstream bridge
in pcibios_finish_adding_to_bus() so that the platform resources
(PE#, IO/MMIO/DMA segments) are allocated or created accordingly.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/kernel/pci-common.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 40df3a5..be9e515 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1444,8 +1444,12 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
/* Allocate bus and devices resources */
pcibios_allocate_bus_resources(bus);
pcibios_claim_one_bus(bus);
-   if (!pci_has_flag(PCI_PROBE_ONLY))
-   pci_assign_unassigned_bus_resources(bus);
+   if (!pci_has_flag(PCI_PROBE_ONLY)) {
+   if (bus->self)
+   pci_assign_unassigned_bridge_resources(bus->self);
+   else
+   pci_assign_unassigned_bus_resources(bus);
+   }
 
/* Fixup EEH */
eeh_add_device_tree_late(bus);
-- 
2.1.0


[PATCH v8 15/45] powerpc/powernv/ioda1: Introduce PNV_IODA1_DMA32_SEGSIZE

2016-02-16 Thread Gavin Shan
Currently, there is one macro (TCE32_TABLE_SIZE) representing the
TCE table size for one DMA32 segment. The constant representing
the DMA32 segment size (1 << 28) is still used in the code.

This defines PNV_IODA1_DMA32_SEGSIZE to represent the size of one
DMA32 segment. The TCE table size can be calculated from it when the
page has a fixed 4KB size, so all the related calculations depend on
one macro (PNV_IODA1_DMA32_SEGSIZE). No logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 +-
 arch/powerpc/platforms/powernv/pci.h  |  1 +
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d18b95e..e60cff6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -48,9 +48,6 @@
 #include "powernv.h"
 #include "pci.h"
 
-/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
-#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
-
 #define POWERNV_IOMMU_DEFAULT_LEVELS   1
 #define POWERNV_IOMMU_MAX_LEVELS   5
 
@@ -2034,7 +2031,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
 
struct page *tce_mem = NULL;
struct iommu_table *tbl;
-   unsigned int i;
+   unsigned int tce32_segsz, i;
int64_t rc;
void *addr;
 
@@ -2054,29 +2051,34 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
/* Grab a 32-bit TCE table */
pe->tce32_seg = base;
pe_info(pe, " Setting up 32-bit TCE table at %08x..%08x\n",
-   (base << 28), ((base + segs) << 28) - 1);
+   base * PNV_IODA1_DMA32_SEGSIZE,
+   (base + segs) * PNV_IODA1_DMA32_SEGSIZE - 1);
 
/* XXX Currently, we allocate one big contiguous table for the
 * TCEs. We only really need one chunk per 256M of TCE space
 * (ie per segment) but that's an optimization for later, it
 * requires some added smarts with our get/put_tce implementation
+*
+* Each TCE page is 4KB in size and each TCE entry occupies 8
+* bytes
 */
+   tce32_segsz = PNV_IODA1_DMA32_SEGSIZE >> (IOMMU_PAGE_SHIFT_4K - 3);
tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-  get_order(TCE32_TABLE_SIZE * segs));
+  get_order(tce32_segsz * segs));
if (!tce_mem) {
pe_err(pe, " Failed to allocate a 32-bit TCE memory\n");
goto fail;
}
addr = page_address(tce_mem);
-   memset(addr, 0, TCE32_TABLE_SIZE * segs);
+   memset(addr, 0, tce32_segsz * segs);
 
/* Configure HW */
for (i = 0; i < segs; i++) {
rc = opal_pci_map_pe_dma_window(phb->opal_id,
  pe->pe_number,
  base + i, 1,
- __pa(addr) + TCE32_TABLE_SIZE * i,
- TCE32_TABLE_SIZE, 0x1000);
+ __pa(addr) + tce32_segsz * i,
+ tce32_segsz, 0x1000);
if (rc) {
pe_err(pe, " Failed to configure 32-bit TCE table,"
   " err %ld\n", rc);
@@ -2085,8 +2087,9 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
- base << 28, IOMMU_PAGE_SHIFT_4K);
+   pnv_pci_setup_iommu_table(tbl, addr, tce32_segsz * segs,
+ base * PNV_IODA1_DMA32_SEGSIZE,
+ IOMMU_PAGE_SHIFT_4K);
 
/* OPAL variant of P7IOC SW invalidated TCEs */
if (phb->ioda.tce_inval_reg)
@@ -2116,7 +2119,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
if (pe->tce32_seg >= 0)
pe->tce32_seg = -1;
if (tce_mem)
-   __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
+   __free_pages(tce_mem, get_order(tce32_segsz * segs));
if (tbl) {
pnv_pci_unlink_table_and_group(tbl, &pe->table_group);
iommu_free_table(tbl, "pnv");
@@ -3445,7 +3448,8 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
mutex_init(&phb->ioda.pe_list_mutex);
 
/* Calculate how many 32-bit TCE segments we have */
-   phb->ioda.tce32_count = phb->ioda.m32_pci_base >> 28;
+   phb->ioda.tce32_count = phb->ioda.m32_pci_base /
+   PNV_IODA1_DMA32_SEGSIZE;
 
 #if 0 /* We should really do that ... */
rc = opal_pci_set_phb_mem_window(opal->phb_id,
diff --git a/arch/powerpc/platforms/powernv/pci.h 
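
The sizing arithmetic in the hunks above can be checked with a small
standalone sketch. It is not kernel code: the DEMO_* names and helper
functions are hypothetical, the 256MB segment size is inferred from
"base << 28" becoming "base * PNV_IODA1_DMA32_SEGSIZE", and the M32 base
used in the usage note is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Values inferred from the diff, not copied from kernel headers */
#define DEMO_DMA32_SEGSIZE   (UINT64_C(1) << 28)  /* 256MB per DMA32 segment */
#define DEMO_PAGE_SHIFT_4K   12                   /* 4KB TCE page */
#define DEMO_TCE_ENTRY_SHIFT 3                    /* 8-byte TCE entry */

/* Bytes of TCE table needed to map one DMA32 segment */
static uint64_t demo_tce32_segsz(void)
{
    /* (segment size / page size) entries, 8 bytes each */
    return DEMO_DMA32_SEGSIZE >> (DEMO_PAGE_SHIFT_4K - DEMO_TCE_ENTRY_SHIFT);
}

/* Bytes of TCE table needed for a PE that owns 'segs' segments */
static uint64_t demo_tce32_table_bytes(unsigned int segs)
{
    assert(segs >= 1);  /* demo-only check: a PE needs at least one segment */
    return demo_tce32_segsz() * segs;
}

/* Number of DMA32 segments that fit below the start of the M32 window */
static uint64_t demo_tce32_count(uint64_t m32_pci_base)
{
    return m32_pci_base / DEMO_DMA32_SEGSIZE;
}
```

With these values one segment needs 512KB of TCE table, which matches
what the old TCE32_TABLE_SIZE-based code allocated per segment, and an
assumed M32 base of 0x80000000 would give 8 segments.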

[PATCH v8 08/45] powerpc/powernv: Fix initial IO and M32 segmap

2016-02-16 Thread Gavin Shan
There are two arrays for the IO and M32 segment maps on every PHB.
The arrays are indexed by segment number, and the value stored in
each element is the number of the PE that the segment is assigned
to. Initially, all elements in those two arrays are zero, which
means all segments are assigned to PE#0. That is wrong.

This changes the initial values of the elements in those two arrays
to IODA_INVALID_PE, meaning that no segment is assigned to any PE.
In order to use IODA_INVALID_PE (-1) to represent an invalid PE
number, the types of those two arrays are changed from "unsigned int"
to "int".

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 9 +++--
 arch/powerpc/platforms/powernv/pci.h  | 4 ++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1d2514f..44cc5f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3239,7 +3239,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
unsigned long size, m32map_off, pemap_off, iomap_off = 0;
const __be64 *prop64;
const __be32 *prop32;
-   int len;
+   int i, len;
u64 phb_id;
void *aux;
long rc;
@@ -3334,8 +3334,13 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
aux = memblock_virt_alloc(size, 0);
phb->ioda.pe_alloc = aux;
phb->ioda.m32_segmap = aux + m32map_off;
-   if (phb->type == PNV_PHB_IODA1)
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
+   if (phb->type == PNV_PHB_IODA1) {
phb->ioda.io_segmap = aux + iomap_off;
+   for (i = 0; i < phb->ioda.total_pe_num; i++)
+   phb->ioda.io_segmap[i] = IODA_INVALID_PE;
+   }
phb->ioda.pe_array = aux + pemap_off;
set_bit(phb->ioda.reserved_pe_idx, phb->ioda.pe_alloc);
 
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 784882a..36c4965 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -146,8 +146,8 @@ struct pnv_phb {
struct pnv_ioda_pe  *pe_array;
 
/* M32 & IO segment maps */
-   unsigned int*m32_segmap;
-   unsigned int*io_segmap;
+   int *m32_segmap;
+   int *io_segmap;
 
/* IRQ chip */
int irq_chip_init;
-- 
2.1.0
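
The idea behind this patch can be shown outside the kernel with a
minimal sketch. The demo_* names are hypothetical and malloc() stands
in for the kernel allocator; only the -1 sentinel and the switch to a
signed element type come from the patch description.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_INVALID_PE (-1)   /* stands in for IODA_INVALID_PE */

/* Allocate a segment map with every segment marked as unassigned. */
static int *demo_segmap_alloc(int total_pe_num)
{
    int *segmap;
    int i;

    assert(total_pe_num > 0);
    segmap = malloc(sizeof(*segmap) * (size_t)total_pe_num);
    if (!segmap)
        return NULL;

    /* A zeroed map would claim that every segment belongs to PE#0 */
    for (i = 0; i < total_pe_num; i++)
        segmap[i] = DEMO_INVALID_PE;

    return segmap;
}

/* Return 1 if the segment is owned by some PE, 0 otherwise. */
static int demo_segment_is_assigned(const int *segmap, int seg)
{
    return segmap[seg] != DEMO_INVALID_PE;
}
```

Because the element type is signed, PE#0 stays a valid owner and can
be told apart from "no owner", which an all-zero unsigned map could
not express.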


[PATCH v8 07/45] powerpc/powernv: Rename PE# fields in struct pnv_phb

2016-02-16 Thread Gavin Shan
This renames the fields related to PE numbers in "struct pnv_phb"
to better reflect their usage, as Alexey suggested. No
logical changes introduced.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c| 58 ++--
 arch/powerpc/platforms/powernv/pci.c |  2 +-
 arch/powerpc/platforms/powernv/pci.h |  4 +-
 4 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 950b3e5..69e41ce 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -75,7 +75,7 @@ static int pnv_eeh_init(void)
 * and P7IOC separately. So we should regard
 * PE#0 as valid for PHB3 and P7IOC.
 */
-   if (phb->ioda.reserved_pe != 0)
+   if (phb->ioda.reserved_pe_idx != 0)
eeh_add_flag(EEH_VALID_PE_ZERO);
 
break;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 10ecd97..1d2514f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -124,7 +124,7 @@ static inline bool pnv_pci_is_mem_pref_64(unsigned long 
flags)
 
 static void pnv_ioda_reserve_pe(struct pnv_phb *phb, int pe_no)
 {
-   if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe)) {
+   if (!(pe_no >= 0 && pe_no < phb->ioda.total_pe_num)) {
pr_warn("%s: Invalid PE %d on PHB#%x\n",
__func__, pe_no, phb->hose->global_number);
return;
@@ -144,8 +144,8 @@ static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
 
do {
pe = find_next_zero_bit(phb->ioda.pe_alloc,
-   phb->ioda.total_pe, 0);
-   if (pe >= phb->ioda.total_pe)
+   phb->ioda.total_pe_num, 0);
+   if (pe >= phb->ioda.total_pe_num)
return IODA_INVALID_PE;
} while(test_and_set_bit(pe, phb->ioda.pe_alloc));
 
@@ -199,13 +199,13 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 * expected to be 0 or last one of PE capabicity.
 */
r = &phb->hose->mem_resources[1];
-   if (phb->ioda.reserved_pe == 0)
+   if (phb->ioda.reserved_pe_idx == 0)
r->start += phb->ioda.m64_segsize;
-   else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1))
+   else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
r->end -= phb->ioda.m64_segsize;
else
pr_warn("  Cannot strip M64 segment for reserved PE#%d\n",
-   phb->ioda.reserved_pe);
+   phb->ioda.reserved_pe_idx);
 
return 0;
 
@@ -274,7 +274,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
return IODA_INVALID_PE;
 
/* Allocate bitmap */
-   size = _ALIGN_UP(phb->ioda.total_pe / 8, sizeof(unsigned long));
+   size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
pe_alloc = kzalloc(size, GFP_KERNEL);
if (!pe_alloc) {
pr_warn("%s: Out of memory !\n",
@@ -290,7 +290,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
 * contributed by its child buses. For the case, we needn't
 * pick M64 dependent PE#.
 */
-   if (bitmap_empty(pe_alloc, phb->ioda.total_pe)) {
+   if (bitmap_empty(pe_alloc, phb->ioda.total_pe_num)) {
kfree(pe_alloc);
return IODA_INVALID_PE;
}
@@ -301,8 +301,8 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, bool 
all)
 */
master_pe = NULL;
i = -1;
-   while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe, i + 1)) <
-   phb->ioda.total_pe) {
+   while ((i = find_next_bit(pe_alloc, phb->ioda.total_pe_num, i + 1)) <
+   phb->ioda.total_pe_num) {
pe = &phb->ioda.pe_array[i];
 
if (!master_pe) {
@@ -355,7 +355,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb 
*phb)
hose->mem_offset[1] = res->start - pci_addr;
 
phb->ioda.m64_size = resource_size(res);
-   phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe;
+   phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
phb->ioda.m64_base = pci_addr;
 
pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
@@ -456,7 +456,7 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int 
pe_no)
s64 rc;
 
/* Sanity check on PE number */
-   if (pe_no < 0 || pe_no >= phb->ioda.total_pe)
+   if (pe_no < 0 || pe_no >= phb->ioda.total_pe_num)
return OPAL_EEH_STOPPED_PERM_UNAVAIL;
 

[PATCH v8 02/45] powerpc/pci: Override pcibios_setup_bridge()

2016-02-16 Thread Gavin Shan
This overrides pcibios_setup_bridge(), which is called to update PCI
bridge windows when PCI resource assignment is completed, so that
subsequent patches can assign the PE and set up various (resource)
mappings for it.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h | 2 ++
 arch/powerpc/kernel/pci-common.c  | 8 
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 9f165e8..b688d04 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -33,6 +33,8 @@ struct pci_controller_ops {
 
/* Called during PCI resource reassignment */
resource_size_t (*window_alignment)(struct pci_bus *, unsigned long 
type);
+   void(*setup_bridge)(struct pci_bus *bus,
+   unsigned long type);
void(*reset_secondary_bus)(struct pci_dev *dev);
 
 #ifdef CONFIG_PCI_MSI
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..40df3a5 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -124,6 +124,14 @@ resource_size_t pcibios_window_alignment(struct pci_bus 
*bus,
return 1;
 }
 
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+   struct pci_controller *hose = pci_bus_to_host(bus);
+
+   if (hose->controller_ops.setup_bridge)
+   hose->controller_ops.setup_bridge(bus, type);
+}
+
 void pcibios_reset_secondary_bus(struct pci_dev *dev)
 {
struct pci_controller *phb = pci_bus_to_host(dev->bus);
-- 
2.1.0
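
The dispatch pattern used by this patch, an optional per-controller
callback that is only called when the platform has set it, can be
sketched in isolation as follows. All demo_* types and the pe_assigned
field are hypothetical stand-ins, not the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

struct demo_controller;

struct demo_bus {
    struct demo_controller *hose;
    int pe_assigned;   /* demo-only state the platform hook fills in */
};

struct demo_controller_ops {
    /* Optional: platforms that need no extra work leave this NULL */
    void (*setup_bridge)(struct demo_bus *bus, unsigned long type);
};

struct demo_controller {
    struct demo_controller_ops controller_ops;
};

/* Hypothetical platform hook, standing in for PE and mapping setup */
static void demo_platform_setup_bridge(struct demo_bus *bus, unsigned long type)
{
    (void)type;
    bus->pe_assigned = 1;
}

/* Generic entry point, mirroring the shape of pcibios_setup_bridge() */
static void demo_setup_bridge(struct demo_bus *bus, unsigned long type)
{
    struct demo_controller *hose = bus->hose;

    assert(hose != NULL);
    if (hose->controller_ops.setup_bridge)
        hose->controller_ops.setup_bridge(bus, type);
}
```

A controller whose ops leave setup_bridge as NULL is simply skipped,
so platforms that do not need the hook are unaffected.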


[PATCH v8 01/45] PCI: Add pcibios_setup_bridge()

2016-02-16 Thread Gavin Shan
Currently, the PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(),
which is called once after PCI probing and resource assignment
are completed, to allocate platform-required resources for PCI devices:
PE#, IO and MMIO mapping, DMA address translation (TCE) table etc.
Because it runs only once, it's not hotplug friendly.

This adds the weak function pcibios_setup_bridge(), which is called by
pci_setup_bridge(). The PowerPC PowerNV platform will reuse the function
to assign the above platform-required resources to newly plugged PCI
devices during PCI hotplug in subsequent patches.

Signed-off-by: Gavin Shan 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/setup-bus.c | 5 +
 include/linux/pci.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..acda514 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -696,11 +696,16 @@ static void __pci_setup_bridge(struct pci_bus *bus, 
unsigned long type)
pci_write_config_word(bridge, PCI_BRIDGE_CONTROL, bus->bridge_ctl);
 }
 
+void __weak pcibios_setup_bridge(struct pci_bus *bus, unsigned long type)
+{
+}
+
 void pci_setup_bridge(struct pci_bus *bus)
 {
unsigned long type = IORESOURCE_IO | IORESOURCE_MEM |
  IORESOURCE_PREFETCH;
 
+   pcibios_setup_bridge(bus, type);
__pci_setup_bridge(bus, type);
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index bc435d62..8161c79 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -842,6 +842,7 @@ void pci_stop_and_remove_bus_device_locked(struct pci_dev 
*dev);
 void pci_stop_root_bus(struct pci_bus *bus);
 void pci_remove_root_bus(struct pci_bus *bus);
 void pci_setup_cardbus(struct pci_bus *bus);
+void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
-- 
2.1.0


[PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops

2016-02-16 Thread Gavin Shan
Each PHB has one instance of "struct pci_controller_ops", which
includes various callbacks called by the PCI subsystem. In the definition
of this struct, some callbacks have explicit names for their arguments,
but the rest don't.

This adds explicit names for all arguments of the callbacks in
"struct pci_controller_ops" so that the code looks consistent.

Signed-off-by: Gavin Shan 
Reviewed-by: Daniel Axtens 
---
 arch/powerpc/include/asm/pci-bridge.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index b688d04..4dd6ef4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -21,18 +21,19 @@ struct pci_controller_ops {
void(*dma_dev_setup)(struct pci_dev *dev);
void(*dma_bus_setup)(struct pci_bus *bus);
 
-   int (*probe_mode)(struct pci_bus *);
+   int (*probe_mode)(struct pci_bus *bus);
 
/* Called when pci_enable_device() is called. Returns true to
 * allow assignment/enabling of the device. */
-   bool(*enable_device_hook)(struct pci_dev *);
+   bool(*enable_device_hook)(struct pci_dev *dev);
 
-   void(*disable_device)(struct pci_dev *);
+   void(*disable_device)(struct pci_dev *dev);
 
-   void(*release_device)(struct pci_dev *);
+   void(*release_device)(struct pci_dev *dev);
 
/* Called during PCI resource reassignment */
-   resource_size_t (*window_alignment)(struct pci_bus *, unsigned long 
type);
+   resource_size_t (*window_alignment)(struct pci_bus *bus,
+   unsigned long type);
void(*setup_bridge)(struct pci_bus *bus,
unsigned long type);
void(*reset_secondary_bus)(struct pci_dev *dev);
@@ -46,7 +47,7 @@ struct pci_controller_ops {
int (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
u64 (*dma_get_required_mask)(struct pci_dev *dev);
 
-   void(*shutdown)(struct pci_controller *);
+   void(*shutdown)(struct pci_controller *hose);
 };
 
 /*
-- 
2.1.0


[PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-02-16 Thread Gavin Shan
This series of patches rebases on powerpc/next branch, plus below additional
patches:

   
   
   https://patchwork.ozlabs.org/patch/581315/   (PATCH[1/9] Richard's SRIOV EEH)
   https://patchwork.ozlabs.org/patch/582639/   (PATCH[1/1] Gavin's EEH fix)
   https://patchwork.ozlabs.org/patch/582093/   (PATCH[1/1] Gavin's EEH fix)
   https://patchwork.ozlabs.org/patch/580626/   (PATCH[1/4] Gavin's PCI fix)
   https://patchwork.ozlabs.org/patch/580153/   (PATCH[1/1] Andrew's EEH minor fix)
   https://patchwork.ozlabs.org/patch/566827/   (PATCH[1/1] Russell's P5IOC2 removal)
   https://patchwork.ozlabs.org/patch/534154/   (PATCH[1/7] Richard's SRIOV rework)
   commit 388f7b1 ("Linux 4.5-rc3")
   
This series of patches intends to support PCI slots on the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The patchset requires
corresponding changes to skiboot firmware, which have been sent to
skib...@lists.ozlabs.org for review. The PCI slots are exposed by skiboot with
device node properties, and the kernel utilizes those properties to populate
PCI slots accordingly.

The original PCI infrastructure on the PowerNV platform can't support hotplug
because the PE is assigned at PHB fixup time, which happens only once during
system boot. For this reason, the PCI infrastructure on the PowerNV platform
has been reworked substantially. After that, the PE and its corresponding
resources (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon
updating the PCI bridge's resources, which might decide the PE# assigned to the
PE (e.g. M64 resources, on P8 strictly speaking). Each PE maintains a reference
count, which is (number of child PCI devices + 1). When the last child PCI
device leaves the PE, the PE and its included resources are released and put
back into the free pool. With this design, the PE is released when the EEH PE
is released. PATCH[1 - 23] are related to this part.
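
One possible reading of the reference-count rule above, as a minimal
standalone sketch. The demo_* names are hypothetical, and the point at
which the base reference is dropped (as soon as no child devices
remain) is an assumption, not something the cover letter spells out.

```c
#include <assert.h>

struct demo_pe {
    int refcount;   /* number of child devices + 1 while the PE is in use */
    int released;   /* demo-only flag: set when resources go back to the pool */
};

static void demo_pe_init(struct demo_pe *pe)
{
    pe->refcount = 1;   /* the "+ 1": the PE's own base reference */
    pe->released = 0;
}

static void demo_pe_attach_dev(struct demo_pe *pe)
{
    assert(!pe->released);
    pe->refcount++;
}

static void demo_pe_detach_dev(struct demo_pe *pe)
{
    assert(pe->refcount > 1);   /* there must be a child to detach */
    pe->refcount--;

    /* Assumption: once no children remain, the base reference goes too */
    if (pe->refcount == 1) {
        pe->refcount = 0;
        pe->released = 1;
    }
}
```

Attaching two devices and detaching both leaves the PE released, while
detaching only one keeps it alive.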

From the skiboot perspective, a PCI slot provides (hot/fundamental/complete)
resets to EEH. The kernel learns whether skiboot supports the various resets on
a particular PCI slot through the device-tree node. If it does, EEH will utilize
the functionality provided by skiboot. Besides, the device-tree nodes have to
change in order to support PCI hotplug. For example, when a PCI adapter is
inserted into a slot, its device-tree node should be added to the system
dynamically. Conversely, the device-tree node should be removed from the system
when the PCI adapter is going offline. Since pci_dn and eeh_dev have the same
life cycle as PCI device nodes, they should be added/removed accordingly during
PCI hotplug. PATCH[24 - 39] do the related work.

The OF driver is changed to support unflattening an FDT blob for a sub-tree,
which is covered by PATCH[40 - 44].

The last one, PATCH[45], is the standalone PCI hotplug driver for the PowerPC
PowerNV platform.

===
Testing
===
1. Unplug adapters behind non-empty slot, then plug them.

   1.1 Check status
   # cat /sys/bus/pci/slots/C10/address 
   0003:09:00
   # cat /sys/bus/pci/slots/C10/adapter 
   1
   # cat /sys/bus/pci/slots/C10/power 
   1
   # lspci
   0003:09:00.0 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.1 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.2 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   0003:09:00.3 Ethernet controller: \
   Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
   # lspci -t
   -+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
|   +-08.0-[04-08]--
|   +-09.0-[09]--+-00.0
|   |+-00.1
|   |+-00.2
|   |\-00.3
|   +-10.0-[0a-0e]--
|   \-11.0-[0f-13]--

   1.2 Unplug adapter 0003:09.00.x
   # echo 0 > /sys/bus/pci/slots/C10/power 
   # lspci -t
   -+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
|   +-08.0-[04-08]--
|   +-09.0-[09]--
|   +-10.0-[0a-0e]--
|   \-11.0-[0f-13]--

   1.3 Plug adapter 0003:09.00.x
   # echo 1 > /sys/bus/pci/slots/C10/power 
   # lspci -t
   -+-[0003:00]---00.0-[01-13]00.0-[02-13]--+-01.0-[03]00.0
|   +-08.0-[04-08]--
|   +-09.0-[09]--+-00.0
|   |+-00.1
|   |+-00.2

Re: [PATCH v8 8/8] livepatch: Detect offset for the ftrace location during build

2016-02-16 Thread Michael Ellerman
On Tue, 2016-02-16 at 14:57 +0100, Petr Mladek wrote:
> 
> Some digging has shown an Oops in the function int_to_scsilun()
> called from ibmvscsi_queuecommand(). So, I rebooted and
> did the following test:
> 
> $> echo ibmvscsi_queuecommand >/sys/kernel/debug/tracing/set_ftrace_filter
> $> echo function > /sys/kernel/debug/tracing/current_tracer 
> $> echo 1 > /sys/kernel/debug/tracing/tracing_on
> $> cat /sys/kernel/debug/tracing/trace
> # tracer: function
> #
> # entries-in-buffer/entries-written: 7/7   #P:4
> #
> #  _-=> irqs-off
> # / _=> need-resched
> #| / _---=> hardirq/softirq
> #|| / _--=> preempt-depth
> #||| / delay
> #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
> #  | |   |      | |
> bash-3488  [000]    100.278622: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>  kworker/1:2-223   [001]    101.048569: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>  kworker/1:2-223   [001]    103.048575: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>  jbd2/sda3-8-1021  [003]    104.008645: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>  jbd2/sda3-8-1021  [003]    104.008883: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>   -0 [000] ..s.   104.017672: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
>   -0 [003] ..s.   104.017771: ibmvscsi_queuecommand 
> <-scsi_dispatch_cmd
> 
> It means that ibmvscsi_queuecommand can be traced. Then I did
> 
> c79:/sys/kernel/debug/tracing # echo int_to_scsilun >set_ftrace_filter
> 
> BANG!
> 
> Unable to handle kernel paging request for data at address 0xd108b148
> Faulting instruction address: 0xd0bde35c
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: af_packet(E) dm_mod(E) e1000(E) rtc_generic(E) ext4(E) 
> crc16(E) mbcache(E) jbd2(E) sr_mod(E) cdrom(E) sd_mod(E) ibmvscsi(E) 
> scsi_transport_srp(E) sg(E) scsi_mod(E) autofs4(E)
> CPU: 1 PID: 223 Comm: kworker/1:2 Tainted: GE   
> 4.5.0-rc2-11-default+ #90
> Workqueue: events_freezable_power_ disk_events_workfn
> task: c000f7d99aa0 ti: c000f7cb8000 task.ti: c000f7cb8000
> NIP: d0bde35c LR: d0bcffec CTR: d0bcffe0
> REGS: c000f7cbb3b0 TRAP: 0300   Tainted: GE
> (4.5.0-rc2-11-default+)
> MSR: 80019033   CR: 24c82220  XER: 
> 
> CFAR: d0bdd144 DAR: d108b148 DSISR: 4000 SOFTE: 0 
> GPR00: d10a21cc c000f7cbb630 d10aeda0 8200 
> GPR04: c000fa0b33fc 014a c000fa0b3408 0010 
> GPR08: 0008 c370a4e0  d108b128 
> GPR12: d0bcffe0 c7e80300 c00d8b18 c000fe04ba80 
> GPR16:  c000f7b6f208  0001 
> GPR20: c000f7b6f144 c000f7b6f140 d0bcbcf0  
> GPR24: 0200  0001 c000f7b6f810 
> GPR28: c000f7b6f000 c000f7b6f000 c000fa0b33a0 c000fe837e00 
> NIP [d0bde35c] scsi_inq_str+0x21b0/0x41ac [scsi_mod]
> LR [d0bcffec] int_to_scsilun+0xc/0x60 [scsi_mod]
> Call Trace:
> [c000f7cbb630] [d10a21cc] ibmvscsi_queuecommand+0x10c/0x4e0 
> [ibmvscsi] (unreliable)
> [c000f7cbb6e0] [d0bcbea8] scsi_dispatch_cmd+0xe8/0x2c0 [scsi_mod]
> [c000f7cbb760] [d0bceb0c] scsi_request_fn+0x50c/0x8b0 [scsi_mod]
> [c000f7cbb850] [c0407280] __blk_run_queue+0x60/0x90
> [c000f7cbb880] [c0413640] blk_execute_rq_nowait+0x100/0x1a0
> [c000f7cbb8d0] [c0413768] blk_execute_rq+0x88/0x170
> [c000f7cbb9b0] [d0bca048] scsi_execute+0x108/0x1d0 [scsi_mod]
> [c000f7cbba20] [d0bca2e8] scsi_execute_req_flags+0xc8/0x150 
> [scsi_mod]
> [c000f7cbbae0] [d12209e4] sr_check_events+0xb4/0x340 [sr_mod]
> [c000f7cbbb90] [d11c00b4] cdrom_check_events+0x44/0x80 [cdrom]
> [c000f7cbbbc0] [d1220fa4] sr_block_check_events+0x44/0x60 [sr_mod]
> [c000f7cbbbe0] [c04222f8] disk_check_events+0x78/0x1b0
> [c000f7cbbc50] [c00d0610] process_one_work+0x1a0/0x480
> [c000f7cbbce0] [c00d0998] worker_thread+0xa8/0x5c0
> [c000f7cbbd80] [c00d8c24] kthread+0x114/0x140
> [c000f7cbbe30] [c0009538] ret_from_kernel_thread+0x5c/0xa4
> Instruction dump:
> 396bc360 f8410018 e98b0020 7d8903a6 4e800420   0044fb30 
> c000 3d62fffe 396bc388 6000  7d8903a6 4e800420  
> ---[ end trace 3b830c669dd7adb5 ]---
> 
> 
> Note that ibmvscsi_queuecommand() handles the TOC and int_to_scsilun()
> does not handle the TOC
> 
> $> objdump -hdr  drivers/scsi/ibmvscsi/ibmvscsi.ko
> 20c0 :
>

Re: [PATCH v2 3/7] ibmvscsi: Replace magic values in set_adpater_info() with defines

2016-02-16 Thread Martin K. Petersen
> "Tyrel" == Tyrel Datwyler  writes:

>> Is there some reason you didn't carry the review tag over from this:
>> 
>> http://mid.gmane.org/20160204084459.gw27...@c203.arch.suse.de
>> 
>> ?
>> 
>> James

Tyrel> The patch is slightly changed from v1. A define for AIX os type
Tyrel> was added as mentioned in the cover letter v2 changes, and I
Tyrel> moved the defines to the mad_adapter_info_data structure around
Tyrel> the fields they apply.

Johannes: Mind checking this out?

https://patchwork.kernel.org/patch/8276101/

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH] MAINTAINERS: rename EEH from "enhanced" to "extended" error handling

2016-02-16 Thread Michael Ellerman
On Wed, 2016-02-17 at 10:49 +1100, Russell Currey wrote:
> On Wed, 2016-02-17 at 09:58 +1100, Gavin Shan wrote:
> > On Wed, Feb 17, 2016 at 09:49:08AM +1100, Gavin Shan wrote:
> > > 
> > > On Tue, Feb 16, 2016 at 07:52:00PM +1100, Russell Currey wrote:
> > > > 
> > > > Incredibly, IBM online documentation for EEH uses "extended error
> > > > handling"
> > > > and "enhanced error handling" to refer to the same thing, in
> > > > different
> > > > places.  In other parts of the kernel, namely the EEH documentation
> > > > (found
> > > > in Documentation/powerpc/eeh-pci-error-recovery.txt), it's referred
> > > > to as
> > > > "extended", and in my opinion "extended" makes more sense for what
> > > > EEH
> > > > does.
> > > > 
> > > > The only place "enhanced error handling" shows up in the kernel is in
> > > > MAINTAINERS, so fix it.
> > > > 
> > > Russell, Thanks for fixing it up. Since you're at it, Please replace
> > > the
> > > maintainer to yourself. Also, the components mentioned in this file are
> > > listed in alphabetic order according to their names. As the name of EEH
> > > is changed, it would be put in front of "Extended Verification Module
> > > (EVM)".
> > > 
> > I agree with Michael as discussed in another thread: we're going to use
> > "enhanced", not "extended" though some chip datasheet (P7IOC) talks about
> > "extended error handling".
> > 
> > Russell, please change the maintainer to you and repost.
> Sure.
> 
> While I'm changing MAINTAINERS, what are your thoughts on renaming the EEH
> entry from "ENHANCED ERROR HANDLING (EEH)" to something like "PCI EXTENDED
> ERROR HANDLING (EEH)" or "EXTENDED ERROR HANDLING (EEH) FOR PCI" - so it's
> clear it refers to PCI and not something more generic?

Yep, having PCI at the front would be good as it will sort next to the other
PCI things. Having POWERPC in there is probably also good.

cheers


Re: [PATCH] MAINTAINERS: rename EEH from "enhanced" to "extended" error handling

2016-02-16 Thread Russell Currey
On Wed, 2016-02-17 at 10:49 +1100, Russell Currey wrote:
> On Wed, 2016-02-17 at 09:58 +1100, Gavin Shan wrote:
> > 
> > On Wed, Feb 17, 2016 at 09:49:08AM +1100, Gavin Shan wrote:
> > > 
> > > 
> > > On Tue, Feb 16, 2016 at 07:52:00PM +1100, Russell Currey wrote:
> > > > 
> > > > 
> > > > Incredibly, IBM online documentation for EEH uses "extended error
> > > > handling"
> > > > and "enhanced error handling" to refer to the same thing, in
> > > > different
> > > > places.  In other parts of the kernel, namely the EEH documentation
> > > > (found
> > > > in Documentation/powerpc/eeh-pci-error-recovery.txt), it's referred
> > > > to as
> > > > "extended", and in my opinion "extended" makes more sense for what
> > > > EEH
> > > > does.
> > > > 
> > > > The only place "enhanced error handling" shows up in the kernel is
> > > > in
> > > > MAINTAINERS, so fix it.
> > > > 
> > > Russell, Thanks for fixing it up. Since you're at it, Please replace
> > > the
> > > maintainer to yourself. Also, the components mentioned in this file
> > > are
> > > listed in alphabetic order according to their names. As the name of
> > > EEH
> > > is changed, it would be put in front of "Extended Verification Module
> > > (EVM)".
> > > 
> > I agree with Michael as discussed in another thread: we're going to use
> > "enhanced", not "extended" though some chip datasheet (P7IOC) talks
> > about
> > "extended error handling".
> > 
> > Russell, please change the maintainer to you and repost.
> Sure.
> 
> While I'm changing MAINTAINERS, what are your thoughts on renaming the
> EEH
> entry from "ENHANCED ERROR HANDLING (EEH)" to something like "PCI
> EXTENDED
> ERROR HANDLING (EEH)" or "EXTENDED ERROR HANDLING (EEH) FOR PCI" - so
> it's
> clear it refers to PCI and not something more generic?

and of course, s/EXTENDED/ENHANCED...-ENOCOFFEE
> > 
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > Signed-off-by: Russell Currey 
> > > With above issues fixed:
> > > 
> > > Acked-by: Gavin Shan 
> > > 
> > Thanks,
> > Gavin
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > ---
> > > > I don't know what kind of things parse MAINTAINERS, and if there's
> > > > a
> > > > chance
> > > > this will break them.  Also, the powerpc tree *is* the right place
> > > > to
> > > > send
> > > > this, right?
> > > > ---
> > > > MAINTAINERS | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > > index 28eb61b..e5d9134 100644
> > > > --- a/MAINTAINERS
> > > > +++ b/MAINTAINERS
> > > > @@ -4222,7 +4222,7 @@ M:Maxim Levitsky  > > > .c
> > > > om>
> > > > S:  Maintained
> > > > F:  drivers/media/rc/ene_ir.*
> > > > 
> > > > -ENHANCED ERROR HANDLING (EEH)
> > > > +EXTENDED ERROR HANDLING (EEH)
> > > > M:  Gavin Shan 
> > > > L:  linuxppc-dev@lists.ozlabs.org
> > > > S:  Supported

[PATCH 2/2] powerpc/perf/24x7: Eliminate domain suffix in event names

2016-02-16 Thread Sukadev Bhattiprolu
From 1520e8087d047e8ab6c1bda027a74eb33956e5a0 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Tue, 16 Feb 2016 22:21:25 -0500
Subject: [PATCH 2/2] powerpc/perf/24x7: Eliminate domain suffix in event names

The Physical Core events of the 24x7 PMU can be monitored across various
domains (physical core, vcpu home core, vcpu home node etc). For each of
these core events, we currently create multiple events in sysfs, one for
each domain the event can be monitored in. These events are distinguished
by their suffixes like __PHYS_CORE, __VCPU_HOME_CORE etc.

Rather than creating multiple such entries, we could make the 'domain'
index a required parameter and let the user specify a value for it
(like they currently specify the core index).

$ cat /sys/bus/event_source/devices/hv_24x7/events/HPM_CCYC
domain=?,offset=0x98,core=?,lpar=0x0

$ perf stat -C 0 -e hv_24x7/HPM_CCYC,domain=2,core=1/ true

(the 'domain=?' and 'core=?' in sysfs tell perf tool to enforce them as
required parameters).

This simplifies the interface and allows users to identify events by the
name specified in the catalog (User can determine the domain index by
referring to '/sys/bus/event_source/devices/hv_24x7/interface/domains').

Eliminating the event suffix eliminates several functions and simplifies
code.

Note that Physical Chip events can only be monitored in the chip domain
so those events have the domain set to 1 (rather than =?) and users don't
need to specify the domain index for the Chip events.

$ cat /sys/bus/event_source/devices/hv_24x7/events/PM_XLINK_CYCLES
domain=1,offset=0x230,chip=?,lpar=0x0

$ perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES,chip=1/ true

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/hv-24x7.c | 149 
 1 file changed, 66 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 36b29fd..59012e7 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -27,20 +27,6 @@
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
 
-static const char *event_domain_suffix(unsigned domain)
-{
-   switch (domain) {
-#define DOMAIN(n, v, x, c) \
-   case HV_PERF_DOMAIN_##n:\
-   return "__" #n;
-#include "hv-24x7-domains.h"
-#undef DOMAIN
-   default:
-   WARN(1, "unknown domain %d\n", domain);
-   return "__UNKNOWN_DOMAIN_SUFFIX";
-   }
-}
-
 static bool domain_is_valid(unsigned domain)
 {
switch (domain) {
@@ -294,38 +280,70 @@ static unsigned long h_get_24x7_catalog_page(char page[],
version, index);
 }
 
-static unsigned core_domains[] = {
-   HV_PERF_DOMAIN_PHYS_CORE,
-   HV_PERF_DOMAIN_VCPU_HOME_CORE,
-   HV_PERF_DOMAIN_VCPU_HOME_CHIP,
-   HV_PERF_DOMAIN_VCPU_HOME_NODE,
-   HV_PERF_DOMAIN_VCPU_REMOTE_NODE,
-};
-/* chip event data always yeilds a single event, core yeilds multiple */
-#define MAX_EVENTS_PER_EVENT_DATA ARRAY_SIZE(core_domains)
-
+/*
+ * Each event we find in the catalog, will have a sysfs entry. Format the
+ * data for this sysfs entry based on the event's domain.
+ *
+ * Events belonging to the Chip domain can only be monitored in that domain.
+ * i.e the domain for these events is a fixed/known value.
+ *
+ * Events belonging to the Core domain can be monitored either in the physical
+ * core or in one of the virtual CPU domains. So the domain value for these
+ * events must be specified by the user (i.e is a required parameter). Format
+ * the Core events with 'domain=?' so the perf-tool can error check required
+ * parameters.
+ *
+ * NOTE: For the Core domain events, rather than making domain a required
+ *  parameter we could default it to PHYS_CORE and allow users to
+ *  override the domain to one of the VCPU domains.
+ *
+ *  However, this can make the interface a little inconsistent.
+ *
+ *  If we set domain=2 (PHYS_CORE) and allow the user to override this field,
+ *  the user may be tempted to also modify the "offset=x" field, which
+ *  can lead to confusing usage. Consider the HPM_PCYC (offset=0x18) and
+ *  HPM_INST (offset=0x20) events. With:
+ *
+ * perf stat -e hv_24x7/HPM_PCYC,offset=0x20/
+ *
+ * we end up monitoring HPM_INST, while the command line has HPM_PCYC.
+ *
+ * By not assigning a default value to the domain for the Core events,
+ * we can have simple guidelines:
+ *
+ * - Specifying values for parameters with "=?" is required.
+ *
+ * - Specifying (i.e overriding) values for other parameters
+ *   is undefined.
+ */
 static char *event_fmt(struct hv_24x7_event_data *event, unsigned domain)
 {
const char *sindex;
const char *lpar;
+   const char *domain_str;
+   char buf[8];
 
   

[PATCH 1/2] powerpc/perf/hv-24x7: Display domain indices in sysfs

2016-02-16 Thread Sukadev Bhattiprolu
From aff5a822e873522b9a3f355f816547394b452a64 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Tue, 16 Feb 2016 20:07:51 -0500
Subject: [PATCH 1/2] powerpc/perf/hv-24x7: Display domain indices in sysfs

To help users determine domains, display the domain indices used by the
kernel in sysfs.

$ cat /sys/bus/event_source/devices/hv_24x7/interface/domains
1: Physical Chip
2: Physical Core
3: VCPU Home Core
4: VCPU Home Chip
5: VCPU Home Node
6: VCPU Remote Node

Signed-off-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/hv-24x7.c | 41 +
 arch/powerpc/perf/hv-24x7.h |  1 +
 2 files changed, 42 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 77b958f..36b29fd 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -68,6 +68,24 @@ static bool is_physical_domain(unsigned domain)
}
 }
 
+static const char *domain_name(unsigned domain)
+{
+   if (!domain_is_valid(domain))
+   return NULL;
+
+   switch (domain) {
+   case HV_PERF_DOMAIN_PHYS_CHIP:  return "Physical Chip";
+   case HV_PERF_DOMAIN_PHYS_CORE:  return "Physical Core";
+   case HV_PERF_DOMAIN_VCPU_HOME_CORE: return "VCPU Home Core";
+   case HV_PERF_DOMAIN_VCPU_HOME_CHIP: return "VCPU Home Chip";
+   case HV_PERF_DOMAIN_VCPU_HOME_NODE: return "VCPU Home Node";
+   case HV_PERF_DOMAIN_VCPU_REMOTE_NODE:   return "VCPU Remote Node";
+   }
+
+   WARN_ON_ONCE(domain);
+   return NULL;
+}
+
 static bool catalog_entry_domain_is_valid(unsigned domain)
 {
return is_physical_domain(domain);
@@ -969,6 +987,27 @@ e_free:
return ret;
 }
 
+static ssize_t domains_show(struct device *dev, struct device_attribute *attr,
+   char *page)
+{
+   int d, n, count = 0;
+   const char *str;
+
+   for (d = 0; d < HV_PERF_DOMAIN_MAX; d++) {
+   str = domain_name(d);
+   if (!str)
+   continue;
+
+   n = sprintf(page, "%d: %s\n", d, str);
+   if (n < 0)
+   break;
+
+   count += n;
+   page += n;
+   }
+   return count;
+}
+
 #define PAGE_0_ATTR(_name, _fmt, _expr)\
 static ssize_t _name##_show(struct device *dev,\
struct device_attribute *dev_attr,  \
@@ -997,6 +1036,7 @@ PAGE_0_ATTR(catalog_version, "%lld\n",
 PAGE_0_ATTR(catalog_len, "%lld\n",
(unsigned long long)be32_to_cpu(page_0->length) * 4096);
 static BIN_ATTR_RO(catalog, 0/* real length varies */);
+static DEVICE_ATTR_RO(domains);
 
 static struct bin_attribute *if_bin_attrs[] = {
&bin_attr_catalog,
@@ -1006,6 +1046,7 @@ static struct bin_attribute *if_bin_attrs[] = {
 static struct attribute *if_attrs[] = {
&dev_attr_catalog_len.attr,
&dev_attr_catalog_version.attr,
+   &dev_attr_domains.attr,
NULL,
 };
 
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
index 0f9fa21..2d7f4e4 100644
--- a/arch/powerpc/perf/hv-24x7.h
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -7,6 +7,7 @@ enum hv_perf_domains {
 #define DOMAIN(n, v, x, c) HV_PERF_DOMAIN_##n = v,
 #include "hv-24x7-domains.h"
 #undef DOMAIN
+   HV_PERF_DOMAIN_MAX,
 };
 
 struct hv_24x7_request {
-- 
1.8.3.1

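For illustration only, and not part of the posted patch: a userspace sketch of the "index: name" formatting loop that domains_show() performs. The demo_* names and the local enum are hypothetical stand-ins for the kernel's HV_PERF_DOMAIN_* values, taken from the sample output in the patch description. Unlike the kernel function, this version uses a bounded snprintf() and stops before it would truncate.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace copy of the indices listed in the patch description. */
enum demo_domain {
	DEMO_DOMAIN_PHYS_CHIP = 1,
	DEMO_DOMAIN_PHYS_CORE,
	DEMO_DOMAIN_VCPU_HOME_CORE,
	DEMO_DOMAIN_VCPU_HOME_CHIP,
	DEMO_DOMAIN_VCPU_HOME_NODE,
	DEMO_DOMAIN_VCPU_REMOTE_NODE,
	DEMO_DOMAIN_MAX,
};

/* Map a domain index to its display name; NULL for unknown indices (including 0). */
static const char *demo_domain_name(unsigned int domain)
{
	switch (domain) {
	case DEMO_DOMAIN_PHYS_CHIP:        return "Physical Chip";
	case DEMO_DOMAIN_PHYS_CORE:        return "Physical Core";
	case DEMO_DOMAIN_VCPU_HOME_CORE:   return "VCPU Home Core";
	case DEMO_DOMAIN_VCPU_HOME_CHIP:   return "VCPU Home Chip";
	case DEMO_DOMAIN_VCPU_HOME_NODE:   return "VCPU Home Node";
	case DEMO_DOMAIN_VCPU_REMOTE_NODE: return "VCPU Remote Node";
	}
	return NULL;
}

/*
 * Write one "index: name" line per valid domain into buf.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t demo_domains_show(char *buf, size_t size)
{
	size_t count = 0;
	unsigned int d;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (d = 0; d < DEMO_DOMAIN_MAX; d++) {
		const char *str = demo_domain_name(d);
		int n;

		if (!str)
			continue;	/* skip indices without a name, as the patch does */

		n = snprintf(buf + count, size - count, "%u: %s\n", d, str);
		if (n < 0 || (size_t)n >= size - count) {
			buf[count] = '\0';	/* drop the partial line */
			break;
		}
		count += (size_t)n;
	}
	return count;
}
```

Calling demo_domains_show() with a sufficiently large buffer yields the same six lines shown in the patch description.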

Re: [PATCH] powerpc/eeh: rename EEH from "extended" to "enhanced" error handling

2016-02-16 Thread Gavin Shan
On Tue, Feb 16, 2016 at 11:06:05PM +1100, Russell Currey wrote:
>IBM online documentation for EEH uses "extended error handling" and
>"enhanced error handling" to refer to the same thing, in different
>places.  The only place mentioning it as "enhanced error handling" in the
>kernel is the MAINTAINERS file, and it's "extended" in some documentation.
>
>IBM originally defined EEH as "enhanced error handling", so standardise
>all mentions of EEH to use that term.
>
>Signed-off-by: Russell Currey 

Acked-by: Gavin Shan 

Thanks,
Gavin

>---
>This is essentially a V2 (though it has the inverse result) of this patch:
>https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-February/139367.html
>---
> Documentation/powerpc/eeh-pci-error-recovery.txt | 2 +-
> arch/powerpc/kernel/eeh.c| 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt
>index 9d4e33d..6781892 100644
>--- a/Documentation/powerpc/eeh-pci-error-recovery.txt
>+++ b/Documentation/powerpc/eeh-pci-error-recovery.txt
>@@ -12,7 +12,7 @@ Overview:
> The IBM POWER-based pSeries and iSeries computers include PCI bus
> controller chips that have extended capabilities for detecting and
> reporting a large variety of PCI bus error conditions.  These features
>-go under the name of "EEH", for "Extended Error Handling".  The EEH
>+go under the name of "EEH", for "Enhanced Error Handling".  The EEH
> hardware features allow PCI bus errors to be cleared and a PCI
> card to be "rebooted", without also having to reboot the operating
> system.
>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>index 40e4d4a..2e78466 100644
>--- a/arch/powerpc/kernel/eeh.c
>+++ b/arch/powerpc/kernel/eeh.c
>@@ -48,7 +48,7 @@
> 
> 
> /** Overview:
>- *  EEH, or "Extended Error Handling" is a PCI bridge technology for
>+ *  EEH, or "Enhanced Error Handling" is a PCI bridge technology for
>  *  dealing with PCI bus errors that can't be dealt with within the
>  *  usual PCI framework, except by check-stopping the CPU.  Systems
>  *  that are designed for high-availability/reliability cannot afford
>-- 
>2.7.1
>

Re: [PATCH] MAINTAINERS: rename EEH from "enhanced" to "extended" error handling

2016-02-16 Thread Gavin Shan
On Wed, Feb 17, 2016 at 09:49:08AM +1100, Gavin Shan wrote:
>On Tue, Feb 16, 2016 at 07:52:00PM +1100, Russell Currey wrote:
>>Incredibly, IBM online documentation for EEH uses "extended error handling"
>>and "enhanced error handling" to refer to the same thing, in different
>>places.  In other parts of the kernel, namely the EEH documentation (found
>>in Documentation/powerpc/eeh-pci-error-recovery.txt), it's referred to as
>>"extended", and in my opinion "extended" makes more sense for what EEH
>>does.
>>
>>The only place "enhanced error handling" shows up in the kernel is in
>>MAINTAINERS, so fix it.
>>
>
>Russell, thanks for fixing it up. Since you're at it, please change the
>maintainer to yourself. Also, the components mentioned in this file are
>listed in alphabetical order by name. As the name of EEH is changed, the
>entry should be moved in front of "Extended Verification Module (EVM)".
>

I agree with Michael as discussed in another thread: we're going to use
"enhanced", not "extended", even though some chip datasheets (P7IOC) talk
about "extended error handling".

Russell, please change the maintainer to yourself and repost.

>>Signed-off-by: Russell Currey 
>
>With above issues fixed:
>
>Acked-by: Gavin Shan 
>

Thanks,
Gavin

>>---
>>I don't know what kind of things parse MAINTAINERS, and if there's a chance
>>this will break them.  Also, the powerpc tree *is* the right place to send
>>this, right?
>>---
>> MAINTAINERS | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/MAINTAINERS b/MAINTAINERS
>>index 28eb61b..e5d9134 100644
>>--- a/MAINTAINERS
>>+++ b/MAINTAINERS
>>@@ -4222,7 +4222,7 @@ M:  Maxim Levitsky 
>> S:   Maintained
>> F:   drivers/media/rc/ene_ir.*
>> 
>>-ENHANCED ERROR HANDLING (EEH)
>>+EXTENDED ERROR HANDLING (EEH)
>> M:   Gavin Shan 
>> L:   linuxppc-dev@lists.ozlabs.org
>> S:   Supported
>>-- 
>>2.7.1
>>

Re: [PATCH] MAINTAINERS: rename EEH from "enhanced" to "extended" error handling

2016-02-16 Thread Gavin Shan
On Tue, Feb 16, 2016 at 07:52:00PM +1100, Russell Currey wrote:
>Incredibly, IBM online documentation for EEH uses "extended error handling"
>and "enhanced error handling" to refer to the same thing, in different
>places.  In other parts of the kernel, namely the EEH documentation (found
>in Documentation/powerpc/eeh-pci-error-recovery.txt), it's referred to as
>"extended", and in my opinion "extended" makes more sense for what EEH
>does.
>
>The only place "enhanced error handling" shows up in the kernel is in
>MAINTAINERS, so fix it.
>

Russell, thanks for fixing it up. Since you're at it, please change the
maintainer to yourself. Also, the components mentioned in this file are
listed in alphabetical order by name. As the name of EEH is changed, the
entry should be moved in front of "Extended Verification Module (EVM)".

>Signed-off-by: Russell Currey 

With above issues fixed:

Acked-by: Gavin Shan 

Thanks,
Gavin

>---
>I don't know what kind of things parse MAINTAINERS, and if there's a chance
>this will break them.  Also, the powerpc tree *is* the right place to send
>this, right?
>---
> MAINTAINERS | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 28eb61b..e5d9134 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -4222,7 +4222,7 @@ M:   Maxim Levitsky 
> S:Maintained
> F:drivers/media/rc/ene_ir.*
> 
>-ENHANCED ERROR HANDLING (EEH)
>+EXTENDED ERROR HANDLING (EEH)
> M:Gavin Shan 
> L:linuxppc-dev@lists.ozlabs.org
> S:Supported
>-- 
>2.7.1
>

Re: linux-4.5-rc4/arch/powerpc/boot/treeboot-akebono.c:90: possible bad test ?

2016-02-16 Thread Daniel Axtens
Hi David,

>
> [linux-4.5-rc4/arch/powerpc/boot/treeboot-akebono.c:90]: (style) A pointer 
> can not be negative so it is either pointless or an error to check if it is 
> not.
>
> Source code is
>
>     emac = finddevice("/plb/opb/ethernet");
>     if (emac> 0) {
>
> but
>
>     void *emac;
>
> Suggest new code
>
>     emac = finddevice("/plb/opb/ethernet");
>     if (emac != 0) {
>
>

That looks like a good suggestion: maybe make the test 'if (!emac)'
rather than explicitly comparing with zero.

Are you comfortable sending a patch to that effect? A patch generated by
git format-patch and sent with git send-email is usually the easiest.

If you have any difficulties, feel free to ping me off-list and I can
walk you through the process.

Regards,
Daniel
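For illustration only: a small sketch of the pointer test under discussion. It assumes that finddevice() returns a null pointer when the node is missing, which the excerpt implies but does not state. demo_finddevice() and demo_have_emac() are hypothetical stand-ins, not boot wrapper code. Note that the shorthand equivalent to 'emac != 0' is 'if (emac)'; 'if (!emac)' would test the opposite condition.

```c
#include <stdbool.h>
#include <stddef.h>

static int demo_node;	/* dummy object standing in for a device tree node */

/* Hypothetical stand-in for finddevice(): NULL means "not found". */
static void *demo_finddevice(int present)
{
	return present ? (void *)&demo_node : NULL;
}

/* Returns true when the (pretend) ethernet node was found. */
static bool demo_have_emac(int present)
{
	void *emac = demo_finddevice(present);

	/*
	 * A relational comparison such as "emac > 0" is what the static
	 * analyser flagged. Testing the pointer for non-NULL expresses
	 * the intent directly and matches "emac != 0".
	 */
	if (emac)
		return true;

	return false;
}
```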

Re: [PATCH] add POWER Virtual Management Channel driver

2016-02-16 Thread Greg Kroah-Hartman
On Tue, Feb 16, 2016 at 02:43:13PM -0600, Steven Royer wrote:
> From: Steven Royer 
> 
> The ibmvmc driver is a device driver for the POWER Virtual Management
> Channel virtual adapter on the PowerVM platform.  It is used to
> communicate with the hypervisor for virtualization management.  It
> provides both request/response and asynchronous message support through
> the /dev/ibmvmc node.

What is the protocol for that device node?

Where is the documentation here?  Why does this have to be a character
device?  Why can't it fit in with other drivers of this type?

> 
> Signed-off-by: Steven Royer 
> ---
> This is used by the PowerVM NovaLink project.  You can see development 
> history on github:
> https://github.com/powervm/ibmvmc
> 
>  Documentation/ioctl/ioctl-number.txt |1 +
>  MAINTAINERS  |5 +
>  arch/powerpc/include/asm/hvcall.h|3 +-
>  drivers/misc/Kconfig |9 +
>  drivers/misc/Makefile|1 +
>  drivers/misc/ibmvmc.c| 1882 
> ++
>  drivers/misc/ibmvmc.h|  203 
>  7 files changed, 2103 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/misc/ibmvmc.c
>  create mode 100644 drivers/misc/ibmvmc.h
> 
> diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
> index 91261a3..d5f5f4f 100644
> --- a/Documentation/ioctl/ioctl-number.txt
> +++ b/Documentation/ioctl/ioctl-number.txt
> @@ -324,6 +324,7 @@ Code  Seq#(hex)   Include FileComments
>  0xCA 80-8F   uapi/scsi/cxlflash_ioctl.h
>  0xCB 00-1F   CBM serial IEC bus  in development:
>   
> 
> +0xCC 00-0F   drivers/misc/ibmvmc.h   pseries VMC driver
>  0xCD 01  linux/reiserfs_fs.h
>  0xCF 02  fs/cifs/ioctl.c
>  0xDB 00-0F   drivers/char/mwave/mwavepub.h
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cc2f753..c39dca2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5353,6 +5353,11 @@ L: net...@vger.kernel.org
>  S:   Supported
>  F:   drivers/net/ethernet/ibm/ibmvnic.*
>  
> +IBM Power Virtual Management Channel Driver
> +M:   Steven Royer 
> +S:   Supported
> +F:   drivers/misc/ibmvmc.*
> +
>  IBM Power Virtual SCSI Device Drivers
>  M:   Tyrel Datwyler 
>  L:   linux-s...@vger.kernel.org
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index e3b54dd..1ee6f2b 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -274,7 +274,8 @@
>  #define H_COP0x304
>  #define H_GET_MPP_X  0x314
>  #define H_SET_MODE   0x31C
> -#define MAX_HCALL_OPCODE H_SET_MODE
> +#define H_REQUEST_VMC0x360
> +#define MAX_HCALL_OPCODE H_REQUEST_VMC
>  
>  /* H_VIOCTL functions */
>  #define H_GET_VIOA_DUMP_SIZE 0x01
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 054fc10..f8d9113 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -526,6 +526,15 @@ config VEXPRESS_SYSCFG
> bus. System Configuration interface is one of the possible means
> of generating transactions on this bus.
>  
> +config IBMVMC
> + tristate "IBM Virtual Management Channel support"
> + depends on PPC_PSERIES
> + help
> +   This is the IBM POWER Virtual Management Channel
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called ibmvmc.
> +
>  source "drivers/misc/c2port/Kconfig"
>  source "drivers/misc/eeprom/Kconfig"
>  source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 537d7f3..08336b3 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -56,3 +56,4 @@ obj-$(CONFIG_GENWQE)+= genwqe/
>  obj-$(CONFIG_ECHO)   += echo/
>  obj-$(CONFIG_VEXPRESS_SYSCFG)+= vexpress-syscfg.o
>  obj-$(CONFIG_CXL_BASE)   += cxl/
> +obj-$(CONFIG_IBMVMC) += ibmvmc.o
> diff --git a/drivers/misc/ibmvmc.c b/drivers/misc/ibmvmc.c
> new file mode 100644
> index 0000000..fb943b7
> --- /dev/null
> +++ b/drivers/misc/ibmvmc.c
> @@ -0,0 +1,1882 @@
> +/*
> + * IBM Power Systems Virtual Management Channel Support.
> + *
> + * Copyright (c) 2004, 2016 IBM Corp.
> + *   Dave Engebretsen engeb...@us.ibm.com
> + *   Steven Royer sero...@linux.vnet.ibm.com
> + *   Adam Reznechek adrez...@linux.vnet.ibm.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.

I have to ask, but do you really mean "or any later version"?

> + *
> + * This program is distributed in the hope 

Re: Question about code which uses MPIC_NO_RESET on 85XX

2016-02-16 Thread Scott Wood
On Thu, 2016-02-04 at 14:56 +0100, Alessio Igor Bogani wrote:
> Hi,
> 
> Can we change this code (from mpc85xx_ds.c)
> 
> if (of_flat_dt_is_compatible(root, "fsl,MPC8572DS-CAMP")) {
> mpic = mpic_alloc(NULL, 0,
> MPIC_NO_RESET |
> MPIC_BIG_ENDIAN |
> MPIC_SINGLE_DEST_CPU,
> 0, 256, " OpenPIC  ");
> } else {
> mpic = mpic_alloc(NULL, 0,
> MPIC_BIG_ENDIAN |
> MPIC_SINGLE_DEST_CPU,
> 0, 256, " OpenPIC  ");
> }
> 
> in this one
> 
> mpic = mpic_alloc(NULL, 0,
> MPIC_BIG_ENDIAN |
> MPIC_SINGLE_DEST_CPU,
> 0, 256, " OpenPIC  ");
> 
> using "pic-no-reset" in the device tree?

In theory that breaks existing device trees that don't specify pic-no-reset.
I'm not sure how much it matters in this case as it's primarily meant as an
example of how to do AMP rather than something that works out-of-the-box.

BTW, Kyle, it looks like there was a meaningful difference between
!MPIC_WANTS_RESET and MPIC_NO_RESET -- the former did not inhibit
initialization of non-protected vectors and was the behavior of
fsl,MPC8572DS-CAMP before your "powerpc/mpic: Remove duplicate
MPIC_WANTS_RESET flag" patch.  It's no longer possible to initialize all
MPIC vectors except protected ones.  Again, I'm not sure it matters much,
but I'm also not sure how much continued value the protected-source code has.

-Scott
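For illustration only: a sketch of the simplification asked about above, where the board code always passes the same flags and the no-reset behaviour depends solely on whether the device tree carries "pic-no-reset". The DEMO_* bit values are placeholders, not the kernel's real flag values, and the boolean parameter stands in for the device tree lookup. As noted in the reply, a device tree that lacks the property would no longer get the no-reset behaviour.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration; the real flags are defined by the kernel. */
#define DEMO_MPIC_BIG_ENDIAN		0x1u
#define DEMO_MPIC_SINGLE_DEST_CPU	0x2u
#define DEMO_MPIC_NO_RESET		0x4u

/*
 * Compute the effective flags. dt_has_pic_no_reset models the presence
 * of a "pic-no-reset" property on the MPIC node.
 */
static unsigned int demo_mpic_flags(bool dt_has_pic_no_reset)
{
	unsigned int flags = DEMO_MPIC_BIG_ENDIAN | DEMO_MPIC_SINGLE_DEST_CPU;

	if (dt_has_pic_no_reset)
		flags |= DEMO_MPIC_NO_RESET;

	return flags;
}
```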


[PATCH v4 18/18] cxl: Add tracepoints around the CAPI hcall

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

To ease debugging, add a few tracepoints around the CAPI hcalls.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/hcalls.c |   9 +++
 drivers/misc/cxl/trace.h  | 193 ++
 2 files changed, 202 insertions(+)

diff --git a/drivers/misc/cxl/hcalls.c b/drivers/misc/cxl/hcalls.c
index f592e80..7a5eab0 100644
--- a/drivers/misc/cxl/hcalls.c
+++ b/drivers/misc/cxl/hcalls.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include "hcalls.h"
+#include "trace.h"
 
 #define CXL_HCALL_TIMEOUT 6
 #define CXL_HCALL_TIMEOUT_DOWNLOAD 12
@@ -142,6 +143,7 @@ long cxl_h_attach_process(u64 unit_address,
CXL_H_WAIT_UNTIL_DONE(rc, retbuf, H_ATTACH_CA_PROCESS, unit_address, 
virt_to_phys(element));
_PRINT_MSG(rc, "cxl_h_attach_process(%#.16llx, %#.16lx): %li\n",
unit_address, virt_to_phys(element), rc);
+   trace_cxl_hcall_attach(unit_address, virt_to_phys(element), retbuf[0], retbuf[1], retbuf[2], rc);
 
pr_devel("token: 0x%.8lx mmio_addr: 0x%lx mmio_size: 0x%lx\nProcess 
Element Structure:\n",
retbuf[0], retbuf[1], retbuf[2]);
@@ -181,6 +183,7 @@ long cxl_h_detach_process(u64 unit_address, u64 
process_token)
 
CXL_H_WAIT_UNTIL_DONE(rc, retbuf, H_DETACH_CA_PROCESS, unit_address, 
process_token);
_PRINT_MSG(rc, "cxl_h_detach_process(%#.16llx, 0x%.8llx): %li\n", 
unit_address, process_token, rc);
+   trace_cxl_hcall_detach(unit_address, process_token, rc);
 
switch (rc) {
case H_SUCCESS:   /* The process was detached from the coherent 
platform function */
@@ -213,6 +216,7 @@ static long cxl_h_control_function(u64 unit_address, u64 op,
CXL_H9_WAIT_UNTIL_DONE(rc, retbuf, H_CONTROL_CA_FUNCTION, unit_address, 
op, p1, p2, p3, p4);
_PRINT_MSG(rc, "cxl_h_control_function(%#.16llx, %s(%#llx, %#llx, 
%#llx, %#llx, R4: %#lx)): %li\n",
unit_address, OP_STR_AFU(op), p1, p2, p3, p4, retbuf[0], rc);
+   trace_cxl_hcall_control_function(unit_address, OP_STR_AFU(op), p1, p2, p3, p4, retbuf[0], rc);
 
switch (rc) {
case H_SUCCESS:   /* The operation is completed for the coherent 
platform function */
@@ -406,6 +410,7 @@ long cxl_h_collect_int_info(u64 unit_address, u64 
process_token,
unit_address, process_token);
_PRINT_MSG(rc, "cxl_h_collect_int_info(%#.16llx, 0x%llx): %li\n",
unit_address, process_token, rc);
+   trace_cxl_hcall_collect_int_info(unit_address, process_token, rc);
 
switch (rc) {
case H_SUCCESS: /* The interrupt info is returned in return 
registers. */
@@ -449,6 +454,8 @@ long cxl_h_control_faults(u64 unit_address, u64 
process_token,
_PRINT_MSG(rc, "cxl_h_control_faults(%#.16llx, 0x%llx, %#llx, %#llx): 
%li (%#lx)\n",
unit_address, process_token, control_mask, reset_mask,
rc, retbuf[0]);
+   trace_cxl_hcall_control_faults(unit_address, process_token,
+   control_mask, reset_mask, retbuf[0], rc);
 
switch (rc) {
case H_SUCCESS:/* Faults were successfully controlled for the 
function. */
@@ -482,6 +489,7 @@ static long cxl_h_control_facility(u64 unit_address, u64 op,
CXL_H9_WAIT_UNTIL_DONE(rc, retbuf, H_CONTROL_CA_FACILITY, unit_address, 
op, p1, p2, p3, p4);
_PRINT_MSG(rc, "cxl_h_control_facility(%#.16llx, %s(%#llx, %#llx, 
%#llx, %#llx, R4: %#lx)): %li\n",
unit_address, OP_STR_CONTROL_ADAPTER(op), p1, p2, p3, p4, 
retbuf[0], rc);
+   trace_cxl_hcall_control_facility(unit_address, OP_STR_CONTROL_ADAPTER(op), p1, p2, p3, p4, retbuf[0], rc);
 
switch (rc) {
case H_SUCCESS:   /* The operation is completed for the coherent 
platform facility */
@@ -588,6 +596,7 @@ static long cxl_h_download_facility(u64 unit_address, u64 
op,
}
_PRINT_MSG(rc, "cxl_h_download_facility(%#.16llx, %s(%#llx, %#llx), 
%#lx): %li\n",
 unit_address, OP_STR_DOWNLOAD_ADAPTER(op), list_address, num, 
retbuf[0], rc);
+   trace_cxl_hcall_download_facility(unit_address, OP_STR_DOWNLOAD_ADAPTER(op), list_address, num, retbuf[0], rc);
 
switch (rc) {
case H_SUCCESS:   /* The operation is completed for the coherent 
platform facility */
diff --git a/drivers/misc/cxl/trace.h b/drivers/misc/cxl/trace.h
index 6e1e2ad..751d611 100644
--- a/drivers/misc/cxl/trace.h
+++ b/drivers/misc/cxl/trace.h
@@ -450,6 +450,199 @@ DEFINE_EVENT(cxl_pe_class, cxl_slbia,
TP_ARGS(ctx)
 );
 
+TRACE_EVENT(cxl_hcall,
+   TP_PROTO(u64 unit_address, u64 process_token, long rc),
+
+   TP_ARGS(unit_address, process_token, rc),
+
+   

[PATCH v4 17/18] cxl: Adapter failure handling

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

Check the AFU state whenever an API is called. The hypervisor may
issue a reset of the adapter when it detects a fault. When that happens,
it launches an error recovery which will move the AFU either to a
permanent failure state or to the disabled state.
If the AFU is found to be disabled, detach all existing contexts from
it before issuing an AFU reset to re-enable it.

Before detaching contexts, notify any kernel driver through the EEH
callbacks of the AFU pci device.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/context.c |   2 +-
 drivers/misc/cxl/cxl.h |  18 ++---
 drivers/misc/cxl/file.c|  10 +--
 drivers/misc/cxl/guest.c   | 167 -
 drivers/misc/cxl/main.c|   2 +-
 drivers/misc/cxl/native.c  |  32 -
 drivers/misc/cxl/vphb.c|   2 +-
 7 files changed, 198 insertions(+), 35 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 180c85a..10370f2 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -220,7 +220,7 @@ int __detach_context(struct cxl_context *ctx)
 * If detach fails when hw is down, we don't care.
 */
WARN_ON(cxl_ops->detach_process(ctx) &&
-   cxl_ops->link_ok(ctx->afu->adapter));
+   cxl_ops->link_ok(ctx->afu->adapter, ctx->afu));
flush_work(&ctx->fault_work); /* Only needed for dedicated process */
 
/* release the reference to the group leader and mm handling pid */
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 6c2521c..e9150c3 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -373,6 +373,8 @@ struct cxl_afu_guest {
phys_addr_t p2n_phys;
u64 p2n_size;
int max_ints;
+   struct mutex recovery_lock;
+   int previous_state;
 };
 
 struct cxl_afu {
@@ -611,7 +613,7 @@ struct cxl_process_element {
__be32 software_state;
 } __packed;
 
-static inline bool cxl_adapter_link_ok(struct cxl *cxl)
+static inline bool cxl_adapter_link_ok(struct cxl *cxl, struct cxl_afu *afu)
 {
struct pci_dev *pdev;
 
@@ -630,13 +632,13 @@ static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, 
cxl_p1_reg_t reg)
 
 static inline void cxl_p1_write(struct cxl *cxl, cxl_p1_reg_t reg, u64 val)
 {
-   if (likely(cxl_adapter_link_ok(cxl)))
+   if (likely(cxl_adapter_link_ok(cxl, NULL)))
out_be64(_cxl_p1_addr(cxl, reg), val);
 }
 
 static inline u64 cxl_p1_read(struct cxl *cxl, cxl_p1_reg_t reg)
 {
-   if (likely(cxl_adapter_link_ok(cxl)))
+   if (likely(cxl_adapter_link_ok(cxl, NULL)))
return in_be64(_cxl_p1_addr(cxl, reg));
else
return ~0ULL;
@@ -650,13 +652,13 @@ static inline void __iomem *_cxl_p1n_addr(struct cxl_afu 
*afu, cxl_p1n_reg_t reg
 
 static inline void cxl_p1n_write(struct cxl_afu *afu, cxl_p1n_reg_t reg, u64 
val)
 {
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
+   if (likely(cxl_adapter_link_ok(afu->adapter, afu)))
out_be64(_cxl_p1n_addr(afu, reg), val);
 }
 
 static inline u64 cxl_p1n_read(struct cxl_afu *afu, cxl_p1n_reg_t reg)
 {
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
+   if (likely(cxl_adapter_link_ok(afu->adapter, afu)))
return in_be64(_cxl_p1n_addr(afu, reg));
else
return ~0ULL;
@@ -669,13 +671,13 @@ static inline void __iomem *_cxl_p2n_addr(struct cxl_afu 
*afu, cxl_p2n_reg_t reg
 
 static inline void cxl_p2n_write(struct cxl_afu *afu, cxl_p2n_reg_t reg, u64 
val)
 {
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
+   if (likely(cxl_adapter_link_ok(afu->adapter, afu)))
out_be64(_cxl_p2n_addr(afu, reg), val);
 }
 
 static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
 {
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
+   if (likely(cxl_adapter_link_ok(afu->adapter, afu)))
return in_be64(_cxl_p2n_addr(afu, reg));
else
return ~0ULL;
@@ -851,7 +853,7 @@ struct cxl_backend_ops {
u64 wed, u64 amr);
int (*detach_process)(struct cxl_context *ctx);
bool (*support_attributes)(const char *attr_name);
-   bool (*link_ok)(struct cxl *cxl);
+   bool (*link_ok)(struct cxl *cxl, struct cxl_afu *afu);
void (*release_afu)(struct device *dev);
ssize_t (*afu_read_err_buffer)(struct cxl_afu *afu, char *buf,
loff_t off, size_t count);
diff --git a/drivers/misc/cxl/file.c b/drivers/misc/cxl/file.c
index e160462..eec468f 100644
--- a/drivers/misc/cxl/file.c
+++ b/drivers/misc/cxl/file.c
@@ -76,7 +76,7 @@ static int __afu_open(struct inode *inode, struct 

[PATCH v4 14/18] cxl: Support to flash a new image on the adapter from a guest

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

The new flash.c file contains the logic to flash a new image on the
adapter, through a hcall. It is an iterative process, with chunks of
data of 1M at a time. There are also 2 phases: write and verify. The
flash operation itself is driven from a user-land tool.

Once flashing is successful, an rtas call is made to update the device
tree with the new properties values for the adapter and the AFU(s)

Add a new char device for the adapter, so that the flash tool can
access the card, even if there is no valid AFU on it.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 Documentation/powerpc/cxl.txt |  55 +
 drivers/misc/cxl/Makefile |   2 +-
 drivers/misc/cxl/base.c   |   7 +
 drivers/misc/cxl/cxl.h|   6 +
 drivers/misc/cxl/file.c   |  11 +-
 drivers/misc/cxl/flash.c  | 538 ++
 drivers/misc/cxl/guest.c  |  15 ++
 include/uapi/misc/cxl.h   |  24 ++
 8 files changed, 653 insertions(+), 5 deletions(-)
 create mode 100644 drivers/misc/cxl/flash.c

diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
index 205c1b8..1440044 100644
--- a/Documentation/powerpc/cxl.txt
+++ b/Documentation/powerpc/cxl.txt
@@ -116,6 +116,8 @@ Work Element Descriptor (WED)
 User API
 
 
+1. AFU character devices
+
 For AFUs operating in AFU directed mode, two character device
 files will be created. /dev/cxl/afu0.0m will correspond to a
 master context and /dev/cxl/afu0.0s will correspond to a slave
@@ -362,6 +364,59 @@ read
 reserved fields:
 For future extensions and padding
 
+
+2. Card character device (powerVM guest only)
+
+In a powerVM guest, an extra character device is created for the
+card. The device is only used to write (flash) a new image on the
+FPGA accelerator. Once the image is written and verified, the
+device tree is updated and the card is reset to reload the updated
+image.
+
+open
+
+
+Opens the device and allocates a file descriptor to be used with
+the rest of the API. The device can only be opened once.
+
+ioctl
+-
+
+CXL_IOCTL_DOWNLOAD_IMAGE:
+CXL_IOCTL_VALIDATE_IMAGE:
+Starts and controls flashing a new FPGA image. Partial
+reconfiguration is not supported (yet), so the image must contain
+a copy of the PSL and AFU(s). Since an image can be quite large,
+the caller may have to iterate, splitting the image in smaller
+chunks.
+
+Takes a pointer to a struct cxl_adapter_image:
+struct cxl_adapter_image {
+__u64 flags;
+__u64 *data;
+__u64 len_data;
+__u64 len_image;
+__u64 reserved1;
+__u64 reserved2;
+__u64 reserved3;
+__u64 reserved4;
+};
+
+flags:
+These flags indicate which optional fields are present in
+this struct. Currently all fields are mandatory.
+
+data:
+Pointer to a buffer with part of the image to write to the
+card.
+
+len_data:
+Size of the buffer pointed to by data.
+
+len_image:
+Full size of the image.
+
+
 Sysfs Class
 ===
 
diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index a3d4bef..9ab874f 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -4,7 +4,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
 cxl-y  += main.o file.o irq.o fault.o native.o
 cxl-y  += context.o sysfs.o debugfs.o pci.o trace.o
 cxl-y  += vphb.o api.o
-cxl-y  += guest.o of.o hcalls.o
+cxl-y  += flash.o guest.o of.o hcalls.o
 obj-$(CONFIG_CXL)  += cxl.o
 obj-$(CONFIG_CXL_BASE) += base.o
 
diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index a9f0dd3..957f4dd 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -84,3 +84,10 @@ void unregister_cxl_calls(struct cxl_calls *calls)
synchronize_rcu();
 }
 EXPORT_SYMBOL_GPL(unregister_cxl_calls);
+
+int cxl_update_properties(struct device_node *dn,
+ struct property *new_prop)
+{
+   return of_update_property(dn, new_prop);
+}
+EXPORT_SYMBOL_GPL(cxl_update_properties);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index e54bf4f..0bf536c 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -324,6 +324,10 @@ static const cxl_p2n_reg_t CXL_PSL_WED_An = {0x0A0};
 #define CXL_MODE_TIME_SLICED 0x4
 #define CXL_SUPPORTED_MODES (CXL_MODE_DEDICATED | CXL_MODE_DIRECTED)
 
+#define CXL_DEV_MINORS 13   /* 1 control + 4 AFUs * 3 (dedicated/master/shared) */
+#define CXL_CARD_MINOR(adapter) (adapter->adapter_num * CXL_DEV_MINORS)
+#define 

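For illustration only, and not taken from the patch: a sketch of how a user-land flash tool might split an image into chunks and fill the structure described in the documentation hunk above. struct demo_adapter_image is a local copy of the quoted layout rather than the real uapi header, and demo_next_chunk() is a hypothetical helper. The CXL_IOCTL_DOWNLOAD_IMAGE and CXL_IOCTL_VALIDATE_IMAGE calls are omitted, and the flags field is left to the caller because the excerpt does not give its required value.

```c
#include <stdint.h>
#include <stddef.h>

/* Local copy of the layout quoted in the documentation; not the uapi header. */
struct demo_adapter_image {
	uint64_t flags;
	uint64_t *data;
	uint64_t len_data;
	uint64_t len_image;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
};

/* 1M chunks, as stated in the commit message. */
#define DEMO_CHUNK_SIZE ((size_t)1024 * 1024)

/*
 * Describe the chunk of 'image' that starts at byte 'offset'.
 * 'chunk' is the maximum chunk size (DEMO_CHUNK_SIZE in normal use).
 * Returns the chunk length in bytes, or 0 when offset is past the end,
 * in which case 'ai' is left untouched.
 */
static size_t demo_next_chunk(struct demo_adapter_image *ai, uint64_t *image,
			      size_t image_len, size_t offset, size_t chunk)
{
	size_t len;

	if (chunk == 0 || offset >= image_len)
		return 0;

	len = image_len - offset;
	if (len > chunk)
		len = chunk;

	/* ai->flags is intentionally not set here; see the note above. */
	ai->data = (uint64_t *)((unsigned char *)image + offset);
	ai->len_data = len;		/* size of this buffer */
	ai->len_image = image_len;	/* full size of the image */
	ai->reserved1 = 0;
	ai->reserved2 = 0;
	ai->reserved3 = 0;
	ai->reserved4 = 0;

	return len;
}
```

A caller would loop, advancing the offset by the returned length and issuing one ioctl per chunk, until demo_next_chunk() returns 0.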
[PATCH v4 15/18] cxl: Parse device tree and create CAPI device(s) at boot

2016-02-16 Thread Frederic Barrat
Add new entry point to scan the device tree at boot in a guest,
looking for CAPI devices.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/base.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/misc/cxl/base.c b/drivers/misc/cxl/base.c
index 957f4dd..9b90ec6 100644
--- a/drivers/misc/cxl/base.c
+++ b/drivers/misc/cxl/base.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "cxl.h"
 
 /* protected by rcu */
@@ -91,3 +92,27 @@ int cxl_update_properties(struct device_node *dn,
return of_update_property(dn, new_prop);
 }
 EXPORT_SYMBOL_GPL(cxl_update_properties);
+
+static int __init cxl_base_init(void)
+{
+   struct device_node *np = NULL;
+   struct platform_device *dev;
+   int count = 0;
+
+   /*
+* Scan for compatible devices in guest only
+*/
+   if (cpu_has_feature(CPU_FTR_HVMODE))
+   return 0;
+
+   while ((np = of_find_compatible_node(np, NULL,
+"ibm,coherent-platform-facility"))) {
+   dev = of_platform_device_create(np, NULL, NULL);
+   if (dev)
+   count++;
+   }
+   pr_devel("Found %d cxl device(s)\n", count);
+   return 0;
+}
+
+module_init(cxl_base_init);
-- 
1.9.1


[PATCH v4 16/18] cxl: Support the cxl kernel API from a guest

2016-02-16 Thread Frederic Barrat
As on bare-metal, the cxl driver creates a virtual PHB and a PCI
device for the AFU. The configuration space of the device is mapped to
the configuration record of the AFU.

Reuse the code defined in afu_cr_read8|16|32() when reading the
configuration space of the AFU device.

Even though the (virtual) AFU device is a PCI device, the adapter is
not, so a driver using the cxl kernel API cannot read the VPD of the
adapter through the usual PCI interface. Therefore, add a call to
the cxl kernel API:
ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count);

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/api.c|  63 ++-
 drivers/misc/cxl/cxl.h|   6 +-
 drivers/misc/cxl/guest.c  |  26 
 drivers/misc/cxl/native.c |  50 +++
 drivers/misc/cxl/pci.c|   9 ++-
 drivers/misc/cxl/vphb.c   | 154 +++---
 include/misc/cxl.h|   5 ++
 7 files changed, 203 insertions(+), 110 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 325f957..75ec2f9 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -89,28 +89,11 @@ int cxl_release_context(struct cxl_context *ctx)
 }
 EXPORT_SYMBOL_GPL(cxl_release_context);
 
-int cxl_allocate_afu_irqs(struct cxl_context *ctx, int num)
-{
-   if (num == 0)
-   num = ctx->afu->pp_irqs;
-   return afu_allocate_irqs(ctx, num);
-}
-EXPORT_SYMBOL_GPL(cxl_allocate_afu_irqs);
-
-void cxl_free_afu_irqs(struct cxl_context *ctx)
-{
-   afu_irq_name_free(ctx);
-   cxl_ops->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
-}
-EXPORT_SYMBOL_GPL(cxl_free_afu_irqs);
-
 static irq_hw_number_t cxl_find_afu_irq(struct cxl_context *ctx, int num)
 {
__u16 range;
int r;
 
-   WARN_ON(num == 0);
-
for (r = 0; r < CXL_IRQ_RANGES; r++) {
range = ctx->irqs.range[r];
if (num < range) {
@@ -121,6 +104,44 @@ static irq_hw_number_t cxl_find_afu_irq(struct cxl_context *ctx, int num)
return 0;
 }
 
+int cxl_allocate_afu_irqs(struct cxl_context *ctx, int num)
+{
+   int res;
+   irq_hw_number_t hwirq;
+
+   if (num == 0)
+   num = ctx->afu->pp_irqs;
+   res = afu_allocate_irqs(ctx, num);
+   if (!res && !cpu_has_feature(CPU_FTR_HVMODE)) {
+   /* In a guest, the PSL interrupt is not multiplexed. It was
+* allocated above, and we need to set its handler
+*/
+   hwirq = cxl_find_afu_irq(ctx, 0);
+   if (hwirq)
+   cxl_map_irq(ctx->afu->adapter, hwirq, cxl_ops->psl_interrupt, ctx, "psl");
+   }
+   return res;
+}
+EXPORT_SYMBOL_GPL(cxl_allocate_afu_irqs);
+
+void cxl_free_afu_irqs(struct cxl_context *ctx)
+{
+   irq_hw_number_t hwirq;
+   unsigned int virq;
+
+   if (!cpu_has_feature(CPU_FTR_HVMODE)) {
+   hwirq = cxl_find_afu_irq(ctx, 0);
+   if (hwirq) {
+   virq = irq_find_mapping(NULL, hwirq);
+   if (virq)
+   cxl_unmap_irq(virq, ctx);
+   }
+   }
+   afu_irq_name_free(ctx);
+   cxl_ops->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+}
+EXPORT_SYMBOL_GPL(cxl_free_afu_irqs);
+
 int cxl_map_afu_irq(struct cxl_context *ctx, int num,
irq_handler_t handler, void *cookie, char *name)
 {
@@ -356,3 +377,11 @@ void cxl_perst_reloads_same_image(struct cxl_afu *afu,
afu->adapter->perst_same_image = perst_reloads_same_image;
 }
 EXPORT_SYMBOL_GPL(cxl_perst_reloads_same_image);
+
+ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count)
+{
+   struct cxl_afu *afu = cxl_pci_to_afu(dev);
+
+   return cxl_ops->read_adapter_vpd(afu->adapter, buf, count);
+}
+EXPORT_SYMBOL_GPL(cxl_read_adapter_vpd);
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 0bf536c..6c2521c 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -581,6 +581,7 @@ int cxl_pci_setup_irq(struct cxl *adapter, unsigned int hwirq, unsigned int virq
 int cxl_update_image_control(struct cxl *adapter);
 int cxl_pci_reset(struct cxl *adapter);
 void cxl_pci_release_afu(struct device *dev);
+ssize_t cxl_pci_read_adapter_vpd(struct cxl *adapter, void *buf, size_t len);
 
 /* common == phyp + powernv */
 struct cxl_process_element_common {
@@ -802,7 +803,6 @@ int cxl_psl_purge(struct cxl_afu *afu);
 
 void cxl_stop_trace(struct cxl *cxl);
 int cxl_pci_vphb_add(struct cxl_afu *afu);
-void cxl_pci_vphb_reconfigure(struct cxl_afu *afu);
 void cxl_pci_vphb_remove(struct cxl_afu *afu);
 
 extern struct pci_driver cxl_pci_driver;
@@ -863,6 +863,10 @@ struct cxl_backend_ops {
int (*afu_cr_read16)(struct cxl_afu *afu, int cr_idx, u64 

[PATCH v4 13/18] cxl: sysfs support for guests

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

Filter out a few adapter parameters which don't make sense in a guest.
Document the changes.
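
As a rough illustration of the filtering described above, here is a
user-space sketch of a per-backend attribute check. It is table-driven,
unlike the strcmp() chain in the patch; guest_supports(),
native_supports() and count_visible() are hypothetical names, and the
attribute list passed to count_visible() is whatever the caller chooses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Adapter attributes this patch hides when running in a guest. */
static const char *const guest_hidden[] = {
	"base_image",
	"load_image_on_perst",
	"perst_reloads_same_image",
	"image_loaded",
};

/* Guest backend: everything is supported except the hidden attributes. */
static bool guest_supports(const char *attr_name)
{
	size_t n = sizeof(guest_hidden) / sizeof(guest_hidden[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(attr_name, guest_hidden[i]) == 0)
			return false;
	return true;
}

/* Bare-metal backend: every attribute is supported. */
static bool native_supports(const char *attr_name)
{
	(void)attr_name;
	return true;
}

/*
 * Hypothetical stand-in for the sysfs setup loop: count the attributes
 * that would be created for a given backend.
 */
static size_t count_visible(const char *const *attrs, size_t n,
			    bool (*supports)(const char *))
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++)
		if (supports(attrs[i]))
			visible++;
	return visible;
}
```

Keeping the decision behind a callback lets the common sysfs code stay
free of guest/bare-metal conditionals.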

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 Documentation/ABI/testing/sysfs-class-cxl |  8 +++
 drivers/misc/cxl/cxl.h|  1 +
 drivers/misc/cxl/guest.c  | 12 +++
 drivers/misc/cxl/native.c |  6 ++
 drivers/misc/cxl/sysfs.c  | 36 +++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index b07e86d..4d0da47 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -183,7 +183,7 @@ Description:read only
 Identifies the revision level of the PSL.
 Users: https://github.com/ibm-capi/libcxl
 
-What:   /sys/class/cxl//base_image
+What:   /sys/class/cxl//base_image (not in a guest)
 Date:   September 2014
 Contact:linuxppc-dev@lists.ozlabs.org
 Description:read only
@@ -193,7 +193,7 @@ Description:read only
 during the initial program load.
 Users: https://github.com/ibm-capi/libcxl
 
-What:   /sys/class/cxl//image_loaded
+What:   /sys/class/cxl//image_loaded (not in a guest)
 Date:   September 2014
 Contact:linuxppc-dev@lists.ozlabs.org
 Description:read only
@@ -201,7 +201,7 @@ Description:read only
 onto the card.
 Users: https://github.com/ibm-capi/libcxl
 
-What:   /sys/class/cxl//load_image_on_perst
+What:   /sys/class/cxl//load_image_on_perst (not in a guest)
 Date:   December 2014
 Contact:linuxppc-dev@lists.ozlabs.org
 Description:read/write
@@ -224,7 +224,7 @@ Description:write only
 to reload the FPGA depending on load_image_on_perst.
 Users: https://github.com/ibm-capi/libcxl
 
-What:  /sys/class/cxl//perst_reloads_same_image
+What:  /sys/class/cxl//perst_reloads_same_image (not in a guest)
 Date:  July 2015
 Contact:   linuxppc-dev@lists.ozlabs.org
 Description:   read/write
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 4372a87..e54bf4f 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -844,6 +844,7 @@ struct cxl_backend_ops {
int (*attach_process)(struct cxl_context *ctx, bool kernel,
u64 wed, u64 amr);
int (*detach_process)(struct cxl_context *ctx);
+   bool (*support_attributes)(const char *attr_name);
bool (*link_ok)(struct cxl *cxl);
void (*release_afu)(struct device *dev);
ssize_t (*afu_read_err_buffer)(struct cxl_afu *afu, char *buf,
diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 03eb83d..d02ff03 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -596,6 +596,17 @@ static int guest_afu_check_and_enable(struct cxl_afu *afu)
return 0;
 }
 
+static bool guest_support_attributes(const char *attr_name)
+{
+   if ((strcmp(attr_name, "base_image") == 0) ||
+   (strcmp(attr_name, "load_image_on_perst") == 0) ||
+   (strcmp(attr_name, "perst_reloads_same_image") == 0) ||
+   (strcmp(attr_name, "image_loaded") == 0))
+   return false;
+
+   return true;
+}
+
 static int activate_afu_directed(struct cxl_afu *afu)
 {
int rc;
@@ -936,6 +947,7 @@ const struct cxl_backend_ops cxl_guest_ops = {
.ack_irq = guest_ack_irq,
.attach_process = guest_attach_process,
.detach_process = guest_detach_process,
+   .support_attributes = guest_support_attributes,
.link_ok = guest_link_ok,
.release_afu = guest_release_afu,
.afu_read_err_buffer = guest_afu_read_err_buffer,
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index c0bca59..acb9486 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -967,6 +967,11 @@ int cxl_check_error(struct cxl_afu *afu)
return (cxl_p1n_read(afu, CXL_PSL_SCNTL_An) == ~0ULL);
 }
 
+static bool native_support_attributes(const char *attr_name)
+{
+   return true;
+}
+
 static int native_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off, u64 *out)
 {
if (unlikely(!cxl_ops->link_ok(afu->adapter)))
@@ -1026,6 +1031,7 @@ const struct cxl_backend_ops cxl_native_ops = {
.ack_irq = native_ack_irq,
.attach_process = native_attach_process,
.detach_process = native_detach_process,
+   .support_attributes = native_support_attributes,
.link_ok = cxl_adapter_link_ok,
.release_afu = cxl_pci_release_afu,
.afu_read_err_buffer = 

[PATCH v4 12/18] cxl: Add guest-specific code

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

The new of.c file contains code to parse the device tree to find out
about CAPI adapters and AFUs.

guest.c implements the guest-specific callbacks for the backend API.

The process element ID is not known until the context is attached, so
we have to separate the context ID assigned by the cxl driver from the
process element ID visible to user applications. On bare-metal,
the two IDs match.
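
The split between the two IDs can be sketched in user space as follows.
This is a simplified model, not the driver code: struct context,
context_init() and guest_attach() are hypothetical names, and the
hypervisor-assigned handle is simply passed in as an argument.

```c
#include <stdbool.h>

/* Trimmed-down model of the two handles kept in a context. */
struct context {
	int pe;          /* handle chosen by the driver at init time */
	int external_pe; /* handle reported outside the driver */
};

/*
 * On bare-metal (HV mode) the driver decides the handle, so both values
 * match. In a guest the external value is unknown until attach time.
 */
static void context_init(struct context *ctx, int pe, bool hv_mode)
{
	ctx->pe = pe;
	ctx->external_pe = hv_mode ? pe : -1;
}

/* Guest only: record the handle returned when the context is attached. */
static void guest_attach(struct context *ctx, int hypervisor_pe)
{
	ctx->external_pe = hypervisor_pe;
}

/* What an API such as cxl_process_element() would report. */
static int process_element(const struct context *ctx)
{
	return ctx->external_pe;
}
```

A caller that reads the process element before attaching in a guest
sees -1, which is why the value is only meaningful after the attach.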

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/Makefile  |   1 +
 drivers/misc/cxl/api.c |   2 +-
 drivers/misc/cxl/context.c |   6 +-
 drivers/misc/cxl/cxl.h |  37 +-
 drivers/misc/cxl/file.c|   2 +-
 drivers/misc/cxl/guest.c   | 950 +
 drivers/misc/cxl/main.c|  18 +-
 drivers/misc/cxl/of.c  | 513 
 8 files changed, 1519 insertions(+), 10 deletions(-)
 create mode 100644 drivers/misc/cxl/guest.c
 create mode 100644 drivers/misc/cxl/of.c

diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile
index be2ac5c..a3d4bef 100644
--- a/drivers/misc/cxl/Makefile
+++ b/drivers/misc/cxl/Makefile
@@ -4,6 +4,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
 cxl-y  += main.o file.o irq.o fault.o native.o
 cxl-y  += context.o sysfs.o debugfs.o pci.o trace.o
 cxl-y  += vphb.o api.o
+cxl-y  += guest.o of.o hcalls.o
 obj-$(CONFIG_CXL)  += cxl.o
 obj-$(CONFIG_CXL_BASE) += base.o
 
diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index 31eb842..325f957 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(cxl_start_context);
 
 int cxl_process_element(struct cxl_context *ctx)
 {
-   return ctx->pe;
+   return ctx->external_pe;
 }
 EXPORT_SYMBOL_GPL(cxl_process_element);
 
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 200837f..180c85a 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -95,8 +95,12 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
return i;
 
ctx->pe = i;
-   if (cpu_has_feature(CPU_FTR_HVMODE))
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
ctx->elem = &ctx->afu->native->spa[i];
+   ctx->external_pe = ctx->pe;
+   } else {
+   ctx->external_pe = -1; /* assigned when attaching */
+   }
ctx->pe_inserted = false;
 
/*
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 3a1fabd..4372a87 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -433,6 +433,12 @@ struct cxl_irq_name {
char *name;
 };
 
+struct irq_avail {
+   irq_hw_number_t offset;
+   irq_hw_number_t range;
+   unsigned long   *bitmap;
+};
+
 /*
  * This is a cxl context.  If the PSL is in dedicated mode, there will be one
  * of these per AFU.  If in AFU directed there can be lots of these.
@@ -488,7 +494,19 @@ struct cxl_context {
 
struct cxl_process_element *elem;
 
-   int pe; /* process element handle */
+   /*
+* pe is the process element handle, assigned by this driver when the
+* context is initialized.
+*
+* external_pe is the PE shown outside of cxl.
+* On bare-metal, pe=external_pe, because we decide what the handle is.
+* In a guest, we only find out about the pe used by pHyp when the
+* context is attached, and that's the value we want to report outside
+* of cxl.
+*/
+   int pe;
+   int external_pe;
+
u32 irq_count;
bool pe_inserted;
bool master;
@@ -782,6 +800,7 @@ void cxl_pci_vphb_reconfigure(struct cxl_afu *afu);
 void cxl_pci_vphb_remove(struct cxl_afu *afu);
 
 extern struct pci_driver cxl_pci_driver;
+extern struct platform_driver cxl_of_driver;
 int afu_allocate_irqs(struct cxl_context *ctx, u32 count);
 
 int afu_open(struct inode *inode, struct file *file);
@@ -792,6 +811,21 @@ unsigned int afu_poll(struct file *file, struct poll_table_struct *poll);
 ssize_t afu_read(struct file *file, char __user *buf, size_t count, loff_t *off);
 extern const struct file_operations afu_fops;
 
+struct cxl *cxl_guest_init_adapter(struct device_node *np, struct platform_device *dev);
+void cxl_guest_remove_adapter(struct cxl *adapter);
+int cxl_of_read_adapter_handle(struct cxl *adapter, struct device_node *np);
+int cxl_of_read_adapter_properties(struct cxl *adapter, struct device_node *np);
+ssize_t cxl_guest_read_adapter_vpd(struct cxl *adapter, void *buf, size_t len);
+ssize_t cxl_guest_read_afu_vpd(struct cxl_afu *afu, void *buf, size_t len);
+int cxl_guest_init_afu(struct cxl *adapter, int slice, struct device_node 

[PATCH v4 11/18] cxl: Separate bare-metal fields in adapter and AFU data structures

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

Introduce sub-structures containing the bare-metal specific fields in
the structures describing the adapter (struct cxl) and AFU (struct
cxl_afu).
Update all their references.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/context.c |  2 +-
 drivers/misc/cxl/cxl.h | 84 +++-
 drivers/misc/cxl/irq.c |  2 +-
 drivers/misc/cxl/main.c|  1 -
 drivers/misc/cxl/native.c  | 85 
 drivers/misc/cxl/pci.c | 96 --
 drivers/misc/cxl/sysfs.c   |  2 +-
 drivers/misc/cxl/vphb.c|  4 +-
 8 files changed, 165 insertions(+), 111 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 46f9844..200837f 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -96,7 +96,7 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
 
ctx->pe = i;
if (cpu_has_feature(CPU_FTR_HVMODE))
-   ctx->elem = &ctx->afu->spa[i];
+   ctx->elem = &ctx->afu->native->spa[i];
ctx->pe_inserted = false;
 
/*
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index ac655a6..3a1fabd 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -344,18 +344,44 @@ struct cxl_sste {
 #define to_cxl_adapter(d) container_of(d, struct cxl, dev)
 #define to_cxl_afu(d) container_of(d, struct cxl_afu, dev)
 
-struct cxl_afu {
+struct cxl_afu_native {
+   void __iomem *p1n_mmio;
+   void __iomem *afu_desc_mmio;
irq_hw_number_t psl_hwirq;
+   unsigned int psl_virq;
+   struct mutex spa_mutex;
+   /*
+* Only the first part of the SPA is used for the process element
+* linked list. The only other part that software needs to worry about
+* is sw_command_status, which we store a separate pointer to.
+* Everything else in the SPA is only used by hardware
+*/
+   struct cxl_process_element *spa;
+   __be64 *sw_command_status;
+   unsigned int spa_size;
+   int spa_order;
+   int spa_max_procs;
+   u64 pp_offset;
+};
+
+struct cxl_afu_guest {
+   u64 handle;
+   phys_addr_t p2n_phys;
+   u64 p2n_size;
+   int max_ints;
+};
+
+struct cxl_afu {
+   struct cxl_afu_native *native;
+   struct cxl_afu_guest *guest;
irq_hw_number_t serr_hwirq;
-   char *err_irq_name;
-   char *psl_irq_name;
unsigned int serr_virq;
-   void __iomem *p1n_mmio;
+   char *psl_irq_name;
+   char *err_irq_name;
void __iomem *p2n_mmio;
phys_addr_t psn_phys;
-   u64 pp_offset;
u64 pp_size;
-   void __iomem *afu_desc_mmio;
+
struct cxl *adapter;
struct device dev;
struct cdev afu_cdev_s, afu_cdev_m, afu_cdev_d;
@@ -363,26 +389,12 @@ struct cxl_afu {
struct idr contexts_idr;
struct dentry *debugfs;
struct mutex contexts_lock;
-   struct mutex spa_mutex;
spinlock_t afu_cntl_lock;
 
/* AFU error buffer fields and bin attribute for sysfs */
u64 eb_len, eb_offset;
struct bin_attribute attr_eb;
 
-   /*
-* Only the first part of the SPA is used for the process element
-* linked list. The only other part that software needs to worry about
-* is sw_command_status, which we store a separate pointer to.
-* Everything else in the SPA is only used by hardware
-*/
-   struct cxl_process_element *spa;
-   __be64 *sw_command_status;
-   unsigned int spa_size;
-   int spa_order;
-   int spa_max_procs;
-   unsigned int psl_virq;
-
/* pointer to the vphb */
struct pci_controller *phb;
 
@@ -488,11 +500,34 @@ struct cxl_context {
struct rcu_head rcu;
 };
 
-struct cxl {
+struct cxl_native {
+   u64 afu_desc_off;
+   u64 afu_desc_size;
void __iomem *p1_mmio;
void __iomem *p2_mmio;
irq_hw_number_t err_hwirq;
unsigned int err_virq;
+   u64 ps_off;
+};
+
+struct cxl_guest {
+   struct platform_device *pdev;
+   int irq_nranges;
+   struct cdev cdev;
+   irq_hw_number_t irq_base_offset;
+   struct irq_avail *irq_avail;
+   spinlock_t irq_alloc_lock;
+   u64 handle;
+   char *status;
+   u16 vendor;
+   u16 device;
+   u16 subsystem_vendor;
+   u16 subsystem;
+};
+
+struct cxl {
+   struct cxl_native *native;
+   struct cxl_guest *guest;
spinlock_t afu_list_lock;
struct cxl_afu *afu[CXL_MAX_SLICES];
struct device dev;
@@ -503,9 +538,6 @@ struct cxl {
struct bin_attribute cxl_attr;
int adapter_num;
int user_irqs;
-   u64 afu_desc_off;
- 

[PATCH v4 10/18] cxl: New hcalls to support CAPI adapters

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

The hypervisor calls provide an interface with a coherent platform
facility and function. It matches version 0.16 of the 'PAPR changes'
document.

The following hcalls are supported:
H_ATTACH_CA_PROCESS     Attach a process element to a coherent platform
                        function.
H_DETACH_CA_PROCESS     Detach a process element from a coherent
                        platform function.
H_CONTROL_CA_FUNCTION   Allow the partition to manipulate or query
                        certain coherent platform function behaviors.
H_COLLECT_CA_INT_INFO   Collect interrupt info about a coherent
                        platform function after an interrupt occurred.
H_CONTROL_CA_FAULTS     Control the operation of a coherent platform
                        function after a fault occurs.
H_DOWNLOAD_CA_FACILITY  Support for downloading a base adapter image to
                        the coherent platform facility, and for
                        validating the entire image after the download.
H_CONTROL_CA_FACILITY   Allow the partition to manipulate or query
                        certain coherent platform facility behaviors.
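
The cxl_irq_info change in this patch orders the pid and tid fields
according to CPU endianness, because both 32-bit values come back in a
single 64-bit register. The user-space sketch below models that. It
assumes the pid sits in the upper 32 bits of the register (which is
what the big-endian field order in the patch implies) and relies on the
GCC/Clang __BYTE_ORDER__ macro rather than CONFIG_CPU_LITTLE_ENDIAN;
struct pid_tid and unpack_pid_tid() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/*
 * Two 32-bit values overlaid on one 64-bit register image. The field
 * order follows the byte order so that pid always maps to the upper
 * 32 bits of the register and tid to the lower 32 bits.
 */
struct pid_tid {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	uint32_t pid;
	uint32_t tid;
#else
	uint32_t tid;
	uint32_t pid;
#endif
};

_Static_assert(sizeof(struct pid_tid) == sizeof(uint64_t),
	       "pid/tid pair must cover exactly one 64-bit register");

/* Copy the raw register image into the structure, byte for byte. */
static struct pid_tid unpack_pid_tid(uint64_t reg)
{
	struct pid_tid out;

	memcpy(&out, &reg, sizeof(out));
	return out;
}
```

Since the register is stored to memory as a whole, no byte swapping is
needed; only the order of the two fields changes.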

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/cxl.h|  23 +-
 drivers/misc/cxl/hcalls.c | 639 ++
 drivers/misc/cxl/hcalls.h | 203 +++
 drivers/misc/cxl/main.c   |  26 ++
 drivers/misc/cxl/native.c |   1 +
 5 files changed, 890 insertions(+), 2 deletions(-)
 create mode 100644 drivers/misc/cxl/hcalls.c
 create mode 100644 drivers/misc/cxl/hcalls.h

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index c7ed265..ac655a6 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -688,6 +688,7 @@ void cxl_prefault(struct cxl_context *ctx, u64 wed);
 
 struct cxl *get_cxl_adapter(int num);
 int cxl_alloc_sst(struct cxl_context *ctx);
+void cxl_dump_debug_buffer(void *addr, size_t size);
 
 void init_cxl_native(void);
 
@@ -701,16 +702,34 @@ unsigned int cxl_map_irq(struct cxl *adapter, irq_hw_number_t hwirq,
 void cxl_unmap_irq(unsigned int virq, void *cookie);
 int __detach_context(struct cxl_context *ctx);
 
-/* This matches the layout of the H_COLLECT_CA_INT_INFO retbuf */
+/*
+ * This must match the layout of the H_COLLECT_CA_INT_INFO retbuf defined
+ * in PAPR.
+ * A word about endianness: a pointer to this structure is passed when
+ * calling the hcall. However, it is not a block of memory filled up by
+ * the hypervisor. The return values are found in registers, and copied
+ * one by one when returning from the hcall. See the end of the call to
+ * plpar_hcall9() in hvCall.S
+ * As a consequence:
+ * - we don't need to do any endianness conversion
+ * - the pid and tid are an exception. They are 32-bit values returned in
+ *   the same 64-bit register. So we do need to worry about byte ordering.
+ */
 struct cxl_irq_info {
u64 dsisr;
u64 dar;
u64 dsr;
+#ifndef CONFIG_CPU_LITTLE_ENDIAN
u32 pid;
u32 tid;
+#else
+   u32 tid;
+   u32 pid;
+#endif
u64 afu_err;
u64 errstat;
-   u64 padding[3]; /* to match the expected retbuf size for plpar_hcall9 */
+   u64 proc_handle;
+   u64 padding[2]; /* to match the expected retbuf size for plpar_hcall9 */
 };
 
 void cxl_assign_psn_space(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/hcalls.c b/drivers/misc/cxl/hcalls.c
new file mode 100644
index 000..f592e80
--- /dev/null
+++ b/drivers/misc/cxl/hcalls.c
@@ -0,0 +1,639 @@
+/*
+ * Copyright 2015 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "hcalls.h"
+
+#define CXL_HCALL_TIMEOUT 6
+#define CXL_HCALL_TIMEOUT_DOWNLOAD 12
+
+#define H_ATTACH_CA_PROCESS    0x344
+#define H_CONTROL_CA_FUNCTION  0x348
+#define H_DETACH_CA_PROCESS    0x34C
+#define H_COLLECT_CA_INT_INFO  0x350
+#define H_CONTROL_CA_FAULTS    0x354
+#define H_DOWNLOAD_CA_FUNCTION 0x35C
+#define H_DOWNLOAD_CA_FACILITY 0x364
+#define H_CONTROL_CA_FACILITY  0x368
+
+#define H_CONTROL_CA_FUNCTION_RESET   1 /* perform a reset */
+#define H_CONTROL_CA_FUNCTION_SUSPEND_PROCESS 2 /* suspend a process from being executed */
+#define H_CONTROL_CA_FUNCTION_RESUME_PROCESS  3 /* resume a process to be executed */
+#define H_CONTROL_CA_FUNCTION_READ_ERR_STATE  4 /* read the error state */
+#define H_CONTROL_CA_FUNCTION_GET_AFU_ERR 5 /* collect the AFU error buffer */
+#define H_CONTROL_CA_FUNCTION_GET_CONFIG  6 /* collect 

[PATCH v4 09/18] cxl: New possible return value from hcall

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

The hcalls introduced for CAPI can return a new value:
H_STATE (invalid state).

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 arch/powerpc/include/asm/hvcall.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b54dd..0bc9c28 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -94,6 +94,7 @@
 #define H_SG_LIST  -72
 #define H_OP_MODE  -73
 #define H_COP_HW   -74
+#define H_STATE    -75
 #define H_UNSUPPORTED_FLAG_START   -256
 #define H_UNSUPPORTED_FLAG_END -511
 #define H_MULTI_THREADS_ACTIVE -9005
-- 
1.9.1


[PATCH v4 08/18] cxl: IRQ allocation for guests

2016-02-16 Thread Frederic Barrat
The PSL interrupt is not going to be multiplexed in a guest, so an
interrupt will be allocated for it for each context. It will still be
the first interrupt found in the first interrupt range, but is treated
almost like any other AFU interrupt when creating/deleting the
context. Only the handler is different. Rework the code so that the
range 0 is treated like the other ranges.
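
A user-space sketch of the range handling described above, under
simplifying assumptions: IRQ_RANGES stands in for CXL_IRQ_RANGES,
struct irq_ranges keeps only the offset and range arrays, and
alloc_count() and afu_irq_index() are hypothetical helpers. It shows
that counting from range 0 gives the same per-context interrupt number
on bare-metal (range 0 holds only the PSL interrupt) and in a guest
(range 0 starts with the PSL interrupt, then AFU interrupts).

```c
#include <stdbool.h>

#define IRQ_RANGES 4 /* stand-in for CXL_IRQ_RANGES */

struct irq_ranges {
	unsigned long offset[IRQ_RANGES]; /* first hwirq of each range */
	unsigned int range[IRQ_RANGES];   /* number of hwirqs in each range */
};

/*
 * Number of interrupts to request from the allocator. On bare-metal the
 * multiplexed PSL interrupt already exists; in a guest one extra
 * interrupt is needed for the per-context PSL interrupt.
 */
static unsigned int alloc_count(unsigned int count, bool hv_mode)
{
	return hv_mode ? count : count + 1;
}

/*
 * Walk every range, starting at range 0, and return the per-context
 * index of hwirq (0 is the PSL interrupt), or -1 if it is not found.
 */
static int afu_irq_index(const struct irq_ranges *irqs, unsigned long hwirq)
{
	int index = 0;

	for (int r = 0; r < IRQ_RANGES; r++) {
		unsigned long off = hwirq - irqs->offset[r];

		if (hwirq >= irqs->offset[r] && off < irqs->range[r])
			return index + (int)off;
		index += (int)irqs->range[r];
	}
	return -1;
}
```

In both layouts the first AFU interrupt ends up with index 1, which is
why the loop no longer needs to special-case range 0.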

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/irq.c | 78 +-
 1 file changed, 64 insertions(+), 14 deletions(-)

diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 5033869..3c04c14 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -19,6 +19,13 @@
 #include "cxl.h"
 #include "trace.h"
 
+static int afu_irq_range_start(void)
+{
+   if (cpu_has_feature(CPU_FTR_HVMODE))
+   return 1;
+   return 0;
+}
+
 static irqreturn_t schedule_cxl_fault(struct cxl_context *ctx, u64 dsisr, u64 dar)
 {
ctx->dsisr = dsisr;
@@ -117,11 +124,23 @@ static irqreturn_t cxl_irq_afu(int irq, void *data)
 {
struct cxl_context *ctx = data;
irq_hw_number_t hwirq = irqd_to_hwirq(irq_get_irq_data(irq));
-   int irq_off, afu_irq = 1;
+   int irq_off, afu_irq = 0;
__u16 range;
int r;
 
-   for (r = 1; r < CXL_IRQ_RANGES; r++) {
+   /*
+* Look for the interrupt number.
+* On bare-metal, we know range 0 only contains the PSL
+* interrupt so we could start counting at range 1 and initialize
+* afu_irq at 1.
+* In a guest, range 0 also contains AFU interrupts, so it must
+* be counted for. Therefore we initialize afu_irq at 0 to take into
+* account the PSL interrupt.
+*
+* For code-readability, it just seems easier to go over all
+* the ranges on bare-metal and guest. The end result is the same.
+*/
+   for (r = 0; r < CXL_IRQ_RANGES; r++) {
irq_off = hwirq - ctx->irqs.offset[r];
range = ctx->irqs.range[r];
if (irq_off >= 0 && irq_off < range) {
@@ -131,7 +150,7 @@ static irqreturn_t cxl_irq_afu(int irq, void *data)
afu_irq += range;
}
if (unlikely(r >= CXL_IRQ_RANGES)) {
-   WARN(1, "Recieved AFU IRQ out of range for pe %i (virq %i hwirq %lx)\n",
+   WARN(1, "Received AFU IRQ out of range for pe %i (virq %i hwirq %lx)\n",
 ctx->pe, irq, hwirq);
return IRQ_HANDLED;
}
@@ -141,7 +160,7 @@ static irqreturn_t cxl_irq_afu(int irq, void *data)
   afu_irq, ctx->pe, irq, hwirq);
 
if (unlikely(!ctx->irq_bitmap)) {
-   WARN(1, "Recieved AFU IRQ for context with no IRQ bitmap\n");
+   WARN(1, "Received AFU IRQ for context with no IRQ bitmap\n");
return IRQ_HANDLED;
}
spin_lock(&ctx->lock);
@@ -227,17 +246,33 @@ int afu_allocate_irqs(struct cxl_context *ctx, u32 count)
 {
int rc, r, i, j = 1;
struct cxl_irq_name *irq_name;
+   int alloc_count;
+
+   /*
+* In native mode, range 0 is reserved for the multiplexed
+* PSL interrupt. It has been allocated when the AFU was initialized.
+*
+* In a guest, the PSL interrupt is not multiplexed, but per-context,
+* and is the first interrupt from range 0. It still needs to be
+* allocated, so bump the count by one.
+*/
+   if (cpu_has_feature(CPU_FTR_HVMODE))
+   alloc_count = count;
+   else
+   alloc_count = count + 1;
 
/* Initialize the list head to hold irq names */
INIT_LIST_HEAD(&ctx->irq_names);
 
if ((rc = cxl_ops->alloc_irq_ranges(&ctx->irqs, ctx->afu->adapter,
-   count)))
+   alloc_count)))
return rc;
 
-   /* Multiplexed PSL Interrupt */
-   ctx->irqs.offset[0] = ctx->afu->psl_hwirq;
-   ctx->irqs.range[0] = 1;
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   /* Multiplexed PSL Interrupt */
+   ctx->irqs.offset[0] = ctx->afu->psl_hwirq;
+   ctx->irqs.range[0] = 1;
+   }
 
ctx->irq_count = count;
ctx->irq_bitmap = kcalloc(BITS_TO_LONGS(count),
@@ -249,7 +284,7 @@ int afu_allocate_irqs(struct cxl_context *ctx, u32 count)
 * Allocate names first.  If any fail, bail out before allocating
 * actual hardware IRQs.
 */
-   for (r = 1; r < CXL_IRQ_RANGES; r++) {
+   for (r = afu_irq_range_start(); r < CXL_IRQ_RANGES; r++) {
for (i = 0; i < ctx->irqs.range[r]; i++) {
irq_name = kmalloc(sizeof(struct cxl_irq_name),
   

[PATCH v4 06/18] cxl: Isolate a few bare-metal-specific calls

2016-02-16 Thread Frederic Barrat
A few functions are mostly common between bare-metal and guest and
just need minor tuning. To avoid crowding the backend API, introduce a
few 'if' checks based on whether the CPU is in HV mode.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/context.c |  3 ++-
 drivers/misc/cxl/cxl.h |  7 +--
 drivers/misc/cxl/debugfs.c |  4 
 drivers/misc/cxl/fault.c   | 19 +++
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index aa65262..46f9844 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -95,7 +95,8 @@ int cxl_context_init(struct cxl_context *ctx, struct cxl_afu *afu, bool master,
return i;
 
ctx->pe = i;
-   ctx->elem = &ctx->afu->spa[i];
+   if (cpu_has_feature(CPU_FTR_HVMODE))
+   ctx->elem = &ctx->afu->spa[i];
ctx->pe_inserted = false;
 
/*
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 02065b4..40f6783 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -560,8 +560,11 @@ static inline bool cxl_adapter_link_ok(struct cxl *cxl)
 {
struct pci_dev *pdev;
 
-   pdev = to_pci_dev(cxl->dev.parent);
-   return !pci_channel_offline(pdev);
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   pdev = to_pci_dev(cxl->dev.parent);
+   return !pci_channel_offline(pdev);
+   }
+   return true;
 }
 
 static inline void __iomem *_cxl_p1_addr(struct cxl *cxl, cxl_p1_reg_t reg)
diff --git a/drivers/misc/cxl/debugfs.c b/drivers/misc/cxl/debugfs.c
index 18df6f4..5751899 100644
--- a/drivers/misc/cxl/debugfs.c
+++ b/drivers/misc/cxl/debugfs.c
@@ -118,6 +118,10 @@ void cxl_debugfs_afu_remove(struct cxl_afu *afu)
 int __init cxl_debugfs_init(void)
 {
struct dentry *ent;
+
+   if (!cpu_has_feature(CPU_FTR_HVMODE))
+   return 0;
+
ent = debugfs_create_dir("cxl", NULL);
if (IS_ERR(ent))
return PTR_ERR(ent);
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index ab740a1..9a8650b 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -254,14 +254,17 @@ void cxl_handle_fault(struct work_struct *fault_work)
u64 dar = ctx->dar;
struct mm_struct *mm = NULL;
 
-   if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
-   cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An) != dar ||
-   cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) != ctx->pe) {
-   /* Most likely explanation is harmless - a dedicated process
-* has detached and these were cleared by the PSL purge, but
-* warn about it just in case */
-   dev_notice(&ctx->afu->dev, "cxl_handle_fault: Translation fault regs changed\n");
-   return;
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   if (cxl_p2n_read(ctx->afu, CXL_PSL_DSISR_An) != dsisr ||
+   cxl_p2n_read(ctx->afu, CXL_PSL_DAR_An) != dar ||
+   cxl_p2n_read(ctx->afu, CXL_PSL_PEHandle_An) != ctx->pe) {
+   /* Most likely explanation is harmless - a dedicated
+* process has detached and these were cleared by the
+* PSL purge, but warn about it just in case
+*/
+   dev_notice(&ctx->afu->dev, "cxl_handle_fault: Translation fault regs changed\n");
+   return;
+   }
}
 
/* Early return if the context is being / has been detached */
-- 
1.9.1


[PATCH v4 07/18] cxl: Update cxl_irq() prototype

2016-02-16 Thread Frederic Barrat
The context parameter when calling cxl_irq() should be strongly typed.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/cxl.h | 2 +-
 drivers/misc/cxl/irq.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 40f6783..c7ed265 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -714,7 +714,7 @@ struct cxl_irq_info {
 };
 
 void cxl_assign_psn_space(struct cxl_context *ctx);
-irqreturn_t cxl_irq(int irq, void *ctx, struct cxl_irq_info *irq_info);
+irqreturn_t cxl_irq(int irq, struct cxl_context *ctx, struct cxl_irq_info *irq_info);
 int cxl_register_one_irq(struct cxl *adapter, irq_handler_t handler,
void *cookie, irq_hw_number_t *dest_hwirq,
unsigned int *dest_virq, const char *name);
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 56ad301..5033869 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -27,9 +27,8 @@ static irqreturn_t schedule_cxl_fault(struct cxl_context *ctx, u64 dsisr, u64 da
return IRQ_HANDLED;
 }
 
-irqreturn_t cxl_irq(int irq, void *data, struct cxl_irq_info *irq_info)
+irqreturn_t cxl_irq(int irq, struct cxl_context *ctx, struct cxl_irq_info *irq_info)
 {
-   struct cxl_context *ctx = data;
u64 dsisr, dar;
 
dsisr = irq_info->dsisr;
-- 
1.9.1


[PATCH v4 04/18] cxl: Introduce implementation-specific API

2016-02-16 Thread Frederic Barrat
The backend API (in cxl.h) lists some low-level functions whose
implementation is different on bare-metal and in a guest. Each
environment implements its own functions, and the common code uses
them through function pointers, defined in cxl_backend_ops.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/api.c |   8 +--
 drivers/misc/cxl/context.c |   4 +-
 drivers/misc/cxl/cxl.h |  53 +++---
 drivers/misc/cxl/fault.c   |   6 +-
 drivers/misc/cxl/file.c|  15 ++---
 drivers/misc/cxl/irq.c |  19 ---
 drivers/misc/cxl/main.c|  11 ++--
 drivers/misc/cxl/native.c  | 135 -
 drivers/misc/cxl/pci.c |  16 +++---
 drivers/misc/cxl/sysfs.c   |  32 +++
 drivers/misc/cxl/vphb.c|   6 +-
 11 files changed, 185 insertions(+), 120 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index b45d857..31eb842 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(cxl_allocate_afu_irqs);
 void cxl_free_afu_irqs(struct cxl_context *ctx)
 {
afu_irq_name_free(ctx);
-   cxl_release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
+   cxl_ops->release_irq_ranges(&ctx->irqs, ctx->afu->adapter);
 }
 EXPORT_SYMBOL_GPL(cxl_free_afu_irqs);
 
@@ -176,7 +176,7 @@ int cxl_start_context(struct cxl_context *ctx, u64 wed,
 
cxl_ctx_get();
 
-   if ((rc = cxl_attach_process(ctx, kernel, wed , 0))) {
+   if ((rc = cxl_ops->attach_process(ctx, kernel, wed, 0))) {
put_pid(ctx->pid);
cxl_ctx_put();
goto out;
@@ -342,11 +342,11 @@ int cxl_afu_reset(struct cxl_context *ctx)
struct cxl_afu *afu = ctx->afu;
int rc;
 
-   rc = __cxl_afu_reset(afu);
+   rc = cxl_ops->afu_reset(afu);
if (rc)
return rc;
 
-   return cxl_afu_check_and_enable(afu);
+   return cxl_ops->afu_check_and_enable(afu);
 }
 EXPORT_SYMBOL_GPL(cxl_afu_reset);
 
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index 262b88e..aa65262 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -214,8 +214,8 @@ int __detach_context(struct cxl_context *ctx)
/* Only warn if we detached while the link was OK.
 * If detach fails when hw is down, we don't care.
 */
-   WARN_ON(cxl_detach_process(ctx) &&
-   cxl_adapter_link_ok(ctx->afu->adapter));
+   WARN_ON(cxl_ops->detach_process(ctx) &&
+   cxl_ops->link_ok(ctx->afu->adapter));
flush_work(&ctx->fault_work); /* Only needed for dedicated process */
 
/* release the reference to the group leader and mm handling pid */
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 3b824e3..8233af3 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -623,11 +623,6 @@ static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
return ~0ULL;
 }
 
-u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off);
-u32 cxl_afu_cr_read32(struct cxl_afu *afu, int cr, u64 off);
-u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off);
-u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off);
-
 ssize_t cxl_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
loff_t off, size_t count);
 
@@ -666,10 +661,6 @@ void cxl_sysfs_afu_m_remove(struct cxl_afu *afu);
 
 struct cxl *cxl_alloc_adapter(void);
 struct cxl_afu *cxl_alloc_afu(struct cxl *adapter, int slice);
-
-int cxl_afu_activate_mode(struct cxl_afu *afu, int mode);
-int _cxl_afu_deactivate_mode(struct cxl_afu *afu, int mode);
-int cxl_afu_deactivate_mode(struct cxl_afu *afu);
 int cxl_afu_select_best_mode(struct cxl_afu *afu);
 
 int cxl_register_psl_irq(struct cxl_afu *afu);
@@ -681,8 +672,6 @@ void cxl_release_serr_irq(struct cxl_afu *afu);
 int afu_register_irqs(struct cxl_context *ctx, u32 count);
 void afu_release_irqs(struct cxl_context *ctx, void *cookie);
 void afu_irq_name_free(struct cxl_context *ctx);
-irqreturn_t handle_psl_slice_error(struct cxl_context *ctx, u64 dsisr,
-   u64 errstat);
 
 int cxl_debugfs_init(void);
 void cxl_debugfs_exit(void);
@@ -727,18 +716,10 @@ int cxl_register_one_irq(struct cxl *adapter, irq_handler_t handler,
void *cookie, irq_hw_number_t *dest_hwirq,
unsigned int *dest_virq, const char *name);
 
-int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed,
-   u64 amr);
-int cxl_detach_process(struct cxl_context *ctx);
-
-int cxl_ack_irq(struct cxl_context *ctx, u64 tfc, u64 psl_reset_mask);
-
 int cxl_check_error(struct cxl_afu *afu);
 int cxl_afu_slbia(struct cxl_afu *afu);
 int 

[PATCH v4 05/18] cxl: Rename some bare-metal specific functions

2016-02-16 Thread Frederic Barrat
Rename a few functions, changing the 'cxl_' prefix to either
'cxl_pci_' or 'cxl_native_', to make clear that the implementation is
bare-metal specific.

Those functions will have an equivalent implementation for a guest in
a later patch.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/cxl.h| 28 +++---
 drivers/misc/cxl/native.c | 98 ---
 drivers/misc/cxl/pci.c| 78 +++--
 3 files changed, 104 insertions(+), 100 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 8233af3..02065b4 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -519,14 +519,14 @@ struct cxl {
bool perst_same_image;
 };
 
-int cxl_alloc_one_irq(struct cxl *adapter);
-void cxl_release_one_irq(struct cxl *adapter, int hwirq);
-int cxl_alloc_irq_ranges(struct cxl_irq_ranges *irqs, struct cxl *adapter, unsigned int num);
-void cxl_release_irq_ranges(struct cxl_irq_ranges *irqs, struct cxl *adapter);
-int cxl_setup_irq(struct cxl *adapter, unsigned int hwirq, unsigned int virq);
+int cxl_pci_alloc_one_irq(struct cxl *adapter);
+void cxl_pci_release_one_irq(struct cxl *adapter, int hwirq);
+int cxl_pci_alloc_irq_ranges(struct cxl_irq_ranges *irqs, struct cxl *adapter, unsigned int num);
+void cxl_pci_release_irq_ranges(struct cxl_irq_ranges *irqs, struct cxl *adapter);
+int cxl_pci_setup_irq(struct cxl *adapter, unsigned int hwirq, unsigned int virq);
 int cxl_update_image_control(struct cxl *adapter);
-int cxl_reset(struct cxl *adapter);
-void cxl_release_afu(struct device *dev);
+int cxl_pci_reset(struct cxl *adapter);
+void cxl_pci_release_afu(struct device *dev);
 
 /* common == phyp + powernv */
 struct cxl_process_element_common {
@@ -623,7 +623,7 @@ static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
return ~0ULL;
 }
 
-ssize_t cxl_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
+ssize_t cxl_pci_afu_read_err_buffer(struct cxl_afu *afu, char *buf,
loff_t off, size_t count);
 
 
@@ -663,12 +663,12 @@ struct cxl *cxl_alloc_adapter(void);
 struct cxl_afu *cxl_alloc_afu(struct cxl *adapter, int slice);
 int cxl_afu_select_best_mode(struct cxl_afu *afu);
 
-int cxl_register_psl_irq(struct cxl_afu *afu);
-void cxl_release_psl_irq(struct cxl_afu *afu);
-int cxl_register_psl_err_irq(struct cxl *adapter);
-void cxl_release_psl_err_irq(struct cxl *adapter);
-int cxl_register_serr_irq(struct cxl_afu *afu);
-void cxl_release_serr_irq(struct cxl_afu *afu);
+int cxl_native_register_psl_irq(struct cxl_afu *afu);
+void cxl_native_release_psl_irq(struct cxl_afu *afu);
+int cxl_native_register_psl_err_irq(struct cxl *adapter);
+void cxl_native_release_psl_err_irq(struct cxl *adapter);
+int cxl_native_register_serr_irq(struct cxl_afu *afu);
+void cxl_native_release_serr_irq(struct cxl_afu *afu);
 int afu_register_irqs(struct cxl_context *ctx, u32 count);
 void afu_release_irqs(struct cxl_context *ctx, void *cookie);
 void afu_irq_name_free(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 16d3b1a..b8a6ad5 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -80,7 +80,7 @@ int cxl_afu_disable(struct cxl_afu *afu)
 }
 
 /* This will disable as well as reset */
-static int __cxl_afu_reset(struct cxl_afu *afu)
+static int native_afu_reset(struct cxl_afu *afu)
 {
pr_devel("AFU reset request\n");
 
@@ -90,7 +90,7 @@ static int __cxl_afu_reset(struct cxl_afu *afu)
   false);
 }
 
-static int cxl_afu_check_and_enable(struct cxl_afu *afu)
+static int native_afu_check_and_enable(struct cxl_afu *afu)
 {
if (!cxl_ops->link_ok(afu->adapter)) {
WARN(1, "Refusing to enable afu while link down!\n");
@@ -631,7 +631,7 @@ static int deactivate_dedicated_process(struct cxl_afu *afu)
return 0;
 }
 
-static int cxl_afu_deactivate_mode(struct cxl_afu *afu, int mode)
+static int native_afu_deactivate_mode(struct cxl_afu *afu, int mode)
 {
if (mode == CXL_MODE_DIRECTED)
return deactivate_afu_directed(afu);
@@ -640,7 +640,7 @@ static int cxl_afu_deactivate_mode(struct cxl_afu *afu, int mode)
return 0;
 }
 
-static int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
+static int native_afu_activate_mode(struct cxl_afu *afu, int mode)
 {
if (!mode)
return 0;
@@ -660,7 +660,8 @@ static int cxl_afu_activate_mode(struct cxl_afu *afu, int mode)
return -EINVAL;
 }
 
-static int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed, u64 amr)
+static int native_attach_process(struct cxl_context *ctx, bool kernel,
+   u64 wed, u64 amr)
 {
if 

[PATCH v4 01/18] cxl: Move common code away from bare-metal-specific files

2016-02-16 Thread Frederic Barrat
From: Christophe Lombard 

Move around some functions which will be accessed from the bare-metal
and guest environments.
Code in native.c and pci.c is meant to be bare-metal specific.
Other files contain code which may be shared with guests.

Co-authored-by: Frederic Barrat 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/cxl.h|  9 +++
 drivers/misc/cxl/irq.c| 14 +-
 drivers/misc/cxl/main.c   | 67 +++
 drivers/misc/cxl/native.c | 21 ---
 drivers/misc/cxl/pci.c| 48 +
 5 files changed, 84 insertions(+), 75 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index a521bc7..3f88140 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -526,6 +526,7 @@ void cxl_release_irq_ranges(struct cxl_irq_ranges *irqs, struct cxl *adapter);
 int cxl_setup_irq(struct cxl *adapter, unsigned int hwirq, unsigned int virq);
 int cxl_update_image_control(struct cxl *adapter);
 int cxl_reset(struct cxl *adapter);
+void cxl_release_afu(struct device *dev);
 
 /* common == phyp + powernv */
 struct cxl_process_element_common {
@@ -679,6 +680,9 @@ void cxl_sysfs_afu_remove(struct cxl_afu *afu);
 int cxl_sysfs_afu_m_add(struct cxl_afu *afu);
 void cxl_sysfs_afu_m_remove(struct cxl_afu *afu);
 
+struct cxl *cxl_alloc_adapter(void);
+struct cxl_afu *cxl_alloc_afu(struct cxl *adapter, int slice);
+
 int cxl_afu_activate_mode(struct cxl_afu *afu, int mode);
 int _cxl_afu_deactivate_mode(struct cxl_afu *afu, int mode);
 int cxl_afu_deactivate_mode(struct cxl_afu *afu);
@@ -733,6 +737,11 @@ struct cxl_irq_info {
 };
 
 void cxl_assign_psn_space(struct cxl_context *ctx);
+irqreturn_t cxl_irq(int irq, void *ctx, struct cxl_irq_info *irq_info);
+int cxl_register_one_irq(struct cxl *adapter, irq_handler_t handler,
+   void *cookie, irq_hw_number_t *dest_hwirq,
+   unsigned int *dest_virq, const char *name);
+
 int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed,
u64 amr);
 int cxl_detach_process(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index 09a4060..e468e6c 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -93,7 +93,7 @@ static irqreturn_t schedule_cxl_fault(struct cxl_context *ctx, u64 dsisr, u64 da
return IRQ_HANDLED;
 }
 
-static irqreturn_t cxl_irq(int irq, void *data, struct cxl_irq_info *irq_info)
+irqreturn_t cxl_irq(int irq, void *data, struct cxl_irq_info *irq_info)
 {
struct cxl_context *ctx = data;
u64 dsisr, dar;
@@ -291,12 +291,12 @@ void cxl_unmap_irq(unsigned int virq, void *cookie)
irq_dispose_mapping(virq);
 }
 
-static int cxl_register_one_irq(struct cxl *adapter,
-   irq_handler_t handler,
-   void *cookie,
-   irq_hw_number_t *dest_hwirq,
-   unsigned int *dest_virq,
-   const char *name)
+int cxl_register_one_irq(struct cxl *adapter,
+   irq_handler_t handler,
+   void *cookie,
+   irq_hw_number_t *dest_hwirq,
+   unsigned int *dest_virq,
+   const char *name)
 {
int hwirq, virq;
 
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 9fde75e..7ef5b43 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -32,6 +32,27 @@ uint cxl_verbose;
 module_param_named(verbose, cxl_verbose, uint, 0600);
 MODULE_PARM_DESC(verbose, "Enable verbose dmesg output");
 
+int cxl_afu_slbia(struct cxl_afu *afu)
+{
+   unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
+
+   pr_devel("cxl_afu_slbia issuing SLBIA command\n");
+   cxl_p2n_write(afu, CXL_SLBIA_An, CXL_TLB_SLB_IQ_ALL);
+   while (cxl_p2n_read(afu, CXL_SLBIA_An) & CXL_TLB_SLB_P) {
+   if (time_after_eq(jiffies, timeout)) {
+   dev_warn(&afu->dev, "WARNING: CXL AFU SLBIA timed out!\n");
+   return -EBUSY;
+   }
+   /* If the adapter has gone down, we can assume that we
+* will PERST it and that will invalidate everything.
+*/
+   if (!cxl_adapter_link_ok(afu->adapter))
+   return -EIO;
+   cpu_relax();
+   }
+   return 0;
+}
+
 static inline void _cxl_slbia(struct cxl_context *ctx, struct mm_struct *mm)
 {
struct task_struct *task;
@@ -174,6 +195,52 @@ void cxl_remove_adapter_nr(struct cxl *adapter)
idr_remove(&cxl_adapter_idr, adapter->adapter_num);
 }
 
+struct cxl *cxl_alloc_adapter(void)
+{
+   

[PATCH v4 03/18] cxl: Define process problem state area at attach time only

2016-02-16 Thread Frederic Barrat
The cxl kernel API defined the process problem state area during
context initialization, which made it possible to map the problem state
area before attaching the context. This won't work on a powerVM
guest. So enforce the logical order, as in userspace: attach first,
then map the problem state area.
Remove the calls to cxl_assign_psn_space() during init. The function is
already called on the attach paths.
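
A minimal sketch of the ordering described above, assuming a simplified, hypothetical context type (demo_context) in place of struct cxl_context: the problem state area is only defined by the attach step, and the map helper fails right away for a context that has not been started.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified context: only the fields the check needs. */
enum demo_status { DEMO_OPENED, DEMO_STARTED, DEMO_CLOSED };

struct demo_context {
	enum demo_status status;
	void *psn_base;		/* problem state area, valid once attached */
	size_t psn_size;
};

/* Attach is the only place where the problem state area gets defined. */
void demo_attach(struct demo_context *ctx, void *base, size_t size)
{
	assert(ctx != NULL);
	ctx->psn_base = base;
	ctx->psn_size = size;
	ctx->status = DEMO_STARTED;
}

/* Mapping fails immediately unless the context has been attached. */
void *demo_psa_map(const struct demo_context *ctx)
{
	if (ctx->status != DEMO_STARTED)
		return NULL;
	return ctx->psn_base;
}
```

A caller that maps before attaching gets NULL back instead of a mapping of an area that was never assigned.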

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
---
 drivers/misc/cxl/api.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
index ea3eeb7..b45d857 100644
--- a/drivers/misc/cxl/api.c
+++ b/drivers/misc/cxl/api.c
@@ -51,8 +51,6 @@ struct cxl_context *cxl_dev_context_init(struct pci_dev *dev)
if (rc)
goto err_mapping;
 
-   cxl_assign_psn_space(ctx);
-
return ctx;
 
 err_mapping:
@@ -207,7 +205,6 @@ EXPORT_SYMBOL_GPL(cxl_stop_context);
 void cxl_set_master(struct cxl_context *ctx)
 {
ctx->master = true;
-   cxl_assign_psn_space(ctx);
 }
 EXPORT_SYMBOL_GPL(cxl_set_master);
 
@@ -325,15 +322,11 @@ EXPORT_SYMBOL_GPL(cxl_start_work);
 
 void __iomem *cxl_psa_map(struct cxl_context *ctx)
 {
-   struct cxl_afu *afu = ctx->afu;
-   int rc;
-
-   rc = cxl_afu_check_and_enable(afu);
-   if (rc)
+   if (ctx->status != STARTED)
return NULL;
 
pr_devel("%s: psn_phys%llx size:%llx\n",
-__func__, afu->psn_phys, afu->adapter->ps_size);
+   __func__, ctx->psn_phys, ctx->psn_size);
return ioremap(ctx->psn_phys, ctx->psn_size);
 }
 EXPORT_SYMBOL_GPL(cxl_psa_map);
-- 
1.9.1


[PATCH v4 02/18] cxl: Move bare-metal specific code to specialized files

2016-02-16 Thread Frederic Barrat
Move a few functions around to better separate code specific to
bare-metal environment from code which will be commonly used between
guest and bare-metal.

Code specific to bare-metal is meant to be in native.c or pci.c
only. It's basically anything which touches the capi p1 registers,
some p2 registers not needed from a guest and the PCI interface.

Co-authored-by: Christophe Lombard 
Signed-off-by: Frederic Barrat 
Signed-off-by: Christophe Lombard 
Acked-by: Ian Munsie 
---
 drivers/misc/cxl/cxl.h|  24 +
 drivers/misc/cxl/irq.c| 205 +--
 drivers/misc/cxl/main.c   |   2 +-
 drivers/misc/cxl/native.c | 240 +-
 drivers/misc/cxl/pci.c|  18 
 5 files changed, 245 insertions(+), 244 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index 3f88140..3b824e3 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -623,23 +623,8 @@ static inline u64 cxl_p2n_read(struct cxl_afu *afu, cxl_p2n_reg_t reg)
return ~0ULL;
 }
 
-static inline u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off)
-{
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
-   return in_le64((afu)->afu_desc_mmio + (afu)->crs_offset +
-  ((cr) * (afu)->crs_len) + (off));
-   else
-   return ~0ULL;
-}
-
-static inline u32 cxl_afu_cr_read32(struct cxl_afu *afu, int cr, u64 off)
-{
-   if (likely(cxl_adapter_link_ok(afu->adapter)))
-   return in_le32((afu)->afu_desc_mmio + (afu)->crs_offset +
-  ((cr) * (afu)->crs_len) + (off));
-   else
-   return 0x;
-}
+u64 cxl_afu_cr_read64(struct cxl_afu *afu, int cr, u64 off);
+u32 cxl_afu_cr_read32(struct cxl_afu *afu, int cr, u64 off);
 u16 cxl_afu_cr_read16(struct cxl_afu *afu, int cr, u64 off);
 u8 cxl_afu_cr_read8(struct cxl_afu *afu, int cr, u64 off);
 
@@ -654,7 +639,6 @@ struct cxl_calls {
 int register_cxl_calls(struct cxl_calls *calls);
 void unregister_cxl_calls(struct cxl_calls *calls);
 
-int cxl_alloc_adapter_nr(struct cxl *adapter);
 void cxl_remove_adapter_nr(struct cxl *adapter);
 
 int cxl_alloc_spa(struct cxl_afu *afu);
@@ -697,7 +681,8 @@ void cxl_release_serr_irq(struct cxl_afu *afu);
 int afu_register_irqs(struct cxl_context *ctx, u32 count);
 void afu_release_irqs(struct cxl_context *ctx, void *cookie);
 void afu_irq_name_free(struct cxl_context *ctx);
-irqreturn_t cxl_slice_irq_err(int irq, void *data);
+irqreturn_t handle_psl_slice_error(struct cxl_context *ctx, u64 dsisr,
+   u64 errstat);
 
 int cxl_debugfs_init(void);
 void cxl_debugfs_exit(void);
@@ -746,7 +731,6 @@ int cxl_attach_process(struct cxl_context *ctx, bool kernel, u64 wed,
u64 amr);
 int cxl_detach_process(struct cxl_context *ctx);
 
-int cxl_get_irq(struct cxl_afu *afu, struct cxl_irq_info *info);
 int cxl_ack_irq(struct cxl_context *ctx, u64 tfc, u64 psl_reset_mask);
 
 int cxl_check_error(struct cxl_afu *afu);
diff --git a/drivers/misc/cxl/irq.c b/drivers/misc/cxl/irq.c
index e468e6c..16fd67f 100644
--- a/drivers/misc/cxl/irq.c
+++ b/drivers/misc/cxl/irq.c
@@ -19,72 +19,6 @@
 #include "cxl.h"
 #include "trace.h"
 
-/* XXX: This is implementation specific */
-static irqreturn_t handle_psl_slice_error(struct cxl_context *ctx, u64 dsisr, u64 errstat)
-{
-   u64 fir1, fir2, fir_slice, serr, afu_debug;
-
-   fir1 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR1);
-   fir2 = cxl_p1_read(ctx->afu->adapter, CXL_PSL_FIR2);
-   fir_slice = cxl_p1n_read(ctx->afu, CXL_PSL_FIR_SLICE_An);
-   serr = cxl_p1n_read(ctx->afu, CXL_PSL_SERR_An);
-   afu_debug = cxl_p1n_read(ctx->afu, CXL_AFU_DEBUG_An);
-
-   dev_crit(&ctx->afu->dev, "PSL ERROR STATUS: 0x%016llx\n", errstat);
-   dev_crit(&ctx->afu->dev, "PSL_FIR1: 0x%016llx\n", fir1);
-   dev_crit(&ctx->afu->dev, "PSL_FIR2: 0x%016llx\n", fir2);
-   dev_crit(&ctx->afu->dev, "PSL_SERR_An: 0x%016llx\n", serr);
-   dev_crit(&ctx->afu->dev, "PSL_FIR_SLICE_An: 0x%016llx\n", fir_slice);
-   dev_crit(&ctx->afu->dev, "CXL_PSL_AFU_DEBUG_An: 0x%016llx\n", afu_debug);
-
-   dev_crit(&ctx->afu->dev, "STOPPING CXL TRACE\n");
-   cxl_stop_trace(ctx->afu->adapter);
-
-   return cxl_ack_irq(ctx, 0, errstat);
-}
-
-irqreturn_t cxl_slice_irq_err(int irq, void *data)
-{
-   struct cxl_afu *afu = data;
-   u64 fir_slice, errstat, serr, afu_debug;
-
-   WARN(irq, "CXL SLICE ERROR interrupt %i\n", irq);
-
-   serr = cxl_p1n_read(afu, CXL_PSL_SERR_An);
-   fir_slice = cxl_p1n_read(afu, CXL_PSL_FIR_SLICE_An);
-   errstat = cxl_p2n_read(afu, CXL_PSL_ErrStat_An);
-   afu_debug = cxl_p1n_read(afu, CXL_AFU_DEBUG_An);
-   dev_crit(&afu->dev, "PSL_SERR_An: 0x%016llx\n", serr);
-   dev_crit(&afu->dev, "PSL_FIR_SLICE_An: 

[PATCH v4 00/18] cxl: Add support for powerVM guest

2016-02-16 Thread Frederic Barrat
This series adds support for a CAPI card in a powerVM guest.

It requires firmware FW840 and an activation code for CAPI.

Note that pHyp only claims support for cxlflash, and not generic
support for FPGA CAPI accelerators. cxlflash uses the (slightly
modified) Nallatech card, so the memcopy AFU is expected to
work. There's no support for the dedicated mode programming model.

It builds on top of the existing cxl driver for bare-metal. The cxl
module registers either a pci driver or a platform driver, based on
the environment (bare-metal or guest).

The differences in implementation have mostly 2 root causes:
1/ The CAPI card is not showing as a PCI device in the device tree
over pHyp. Instead, it is a new type of device found at the root of
the device tree. The AFU description(s) can be found in subfolder(s).
2/ When interacting with the card, the operating system doesn't have
access to the p1 registers, which are the most privileged and reserved
for the hypervisor. It can theoretically access the p2 registers,
though there are hcalls defined for most cases. We only rely on p2
registers for SLB invalidation, for which there is no hcall.

The interactions with the hypervisor are defined in an extension of
the PAPR.

Roughly 3/4 of the code is common between the 2 types of driver. When
the code needs to call a platform-specific implementation, it does so
through an API. The bare-metal and guest implementations each describe
their own definition. See struct cxl_backend_ops.
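
The dispatch described above can be sketched as follows. This is a minimal illustration, not the driver's code: demo_backend_ops is a hypothetical, much smaller cousin of struct cxl_backend_ops, and the behaviour of the two backends (in particular the guest always reporting the link as up) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_afu { int resets; bool link_up; };

/* Hypothetical ops table: one function pointer per backend-specific call. */
struct demo_backend_ops {
	const char *name;
	int (*afu_reset)(struct demo_afu *afu);
	bool (*link_ok)(const struct demo_afu *afu);
};

/* Bare-metal flavour: looks at the (simulated) link state. */
static int native_afu_reset(struct demo_afu *afu) { afu->resets++; return 0; }
static bool native_link_ok(const struct demo_afu *afu) { return afu->link_up; }

/* Guest flavour: assumed here to have no direct view of the link. */
static int guest_afu_reset(struct demo_afu *afu) { afu->resets++; return 0; }
static bool guest_link_ok(const struct demo_afu *afu) { (void)afu; return true; }

const struct demo_backend_ops demo_native_ops = {
	.name = "native",
	.afu_reset = native_afu_reset,
	.link_ok = native_link_ok,
};

const struct demo_backend_ops demo_guest_ops = {
	.name = "guest",
	.afu_reset = guest_afu_reset,
	.link_ok = guest_link_ok,
};

/* Set once at init time, depending on the environment. */
const struct demo_backend_ops *demo_ops;

void demo_select_backend(bool bare_metal)
{
	demo_ops = bare_metal ? &demo_native_ops : &demo_guest_ops;
}

/* Common code: never names a backend, only goes through the table. */
int demo_afu_reset(struct demo_afu *afu)
{
	assert(demo_ops != NULL);
	if (!demo_ops->link_ok(afu))
		return -1;
	return demo_ops->afu_reset(afu);
}
```

The common code stays identical in both environments; only the pointer chosen in demo_select_backend() differs.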

It has been tested in little-endian and big-endian environments, using
the memcpy and cxlflash AFUs.

The first 3 patches are mostly cleanup and fixes, separating the
bare-metal-specific code from the code which will also be used in a
guest.
Patches 4-8 restructure existing code, to easily add the guest
implementation.
Patches 9 and 10 define the interactions with pHyp.
Patch 11 prepares the main data structures, separating common and
implementation-specific fields.
Patch 12 introduces the core of the guest-specific code.
The rest adds smaller, independent changes to better support guests:
sysfs, kernel API, flash of the adapter and failure handling, ...


Changelog:
v3->v4:
 Address comments from Ian:
 - patch 3: cxl_psa_map() now fails immediately if context is not attached
 - patch 5: rephrase commit message
 - patch 8: add better comment
 - patch 10: improve pr_devel traces
 - patch 11: rework some error paths
 - patch 12: improve debug traces
 - patch 14: (big) rework of the user API for adapter device
 - patch 16: return EPERM instead of EIO
v2->v3:
 - comment from Ian: external_pe is now an integer
 - add patch 3: the problem state is only defined at attach time
   when using the kernel API, not during context initialization
 - add patch 18: add tracepoints around CAPI hcalls for debugging
v1->v2: (v1 was privately reviewed)
 - integrate comments from Michael Neuling and Ian Munsie
 - add another patch to the series: adapter failure handling
 - base patchset on 4.5-rc1


Frederic Barrat (18):
  cxl: Move common code away from bare-metal-specific files
  cxl: Move bare-metal specific code to specialized files
  cxl: Define process problem state area at attach time only
  cxl: Introduce implementation-specific API
  cxl: Rename some bare-metal specific functions
  cxl: Isolate a few bare-metal-specific calls
  cxl: Update cxl_irq() prototype
  cxl: IRQ allocation for guests
  cxl: New possible return value from hcall
  cxl: New hcalls to support CAPI adapters
  cxl: Separate bare-metal fields in adapter and AFU data structures
  cxl: Add guest-specific code
  cxl: sysfs support for guests
  cxl: Support to flash a new image on the adapter from a guest
  cxl: Parse device tree and create CAPI device(s) at boot
  cxl: Support the cxl kernel API from a guest
  cxl: Adapter failure handling
  cxl: Add tracepoints around the CAPI hcall

 Documentation/ABI/testing/sysfs-class-cxl |8 +-
 Documentation/powerpc/cxl.txt |   55 ++
 arch/powerpc/include/asm/hvcall.h |1 +
 drivers/misc/cxl/Makefile |1 +
 drivers/misc/cxl/api.c|   82 +-
 drivers/misc/cxl/base.c   |   32 +
 drivers/misc/cxl/context.c|   11 +-
 drivers/misc/cxl/cxl.h|  280 ---
 drivers/misc/cxl/debugfs.c|4 +
 drivers/misc/cxl/fault.c  |   25 +-
 drivers/misc/cxl/file.c   |   28 +-
 drivers/misc/cxl/flash.c  |  538 +
 drivers/misc/cxl/guest.c  | 1164 +
 drivers/misc/cxl/hcalls.c |  648 
 drivers/misc/cxl/hcalls.h |  203 +
 drivers/misc/cxl/irq.c|  309 ++--
 drivers/misc/cxl/main.c   |  117 ++-
 drivers/misc/cxl/native.c |  468 ++--
 drivers/misc/cxl/of.c |  513 +
 drivers/misc/cxl/pci.c 

Re: [PATCH v5] powerpc32: provide VIRT_CPU_ACCOUNTING

2016-02-16 Thread Scott Wood
On Thu, 2016-02-11 at 17:16 +0100, Christophe Leroy wrote:
> This patch provides VIRT_CPU_ACCOUNTING to PPC32 architecture.
> PPC32 doesn't have the PACA structure, so we use the task_info
> structure to store the accounting data.
> 
> In order to reuse on PPC32 the PPC64 functions, all u64 data has
> been replaced by 'unsigned long' so that it is u32 on PPC32 and
> u64 on PPC64
> 
> Signed-off-by: Christophe Leroy 
> ---
> Changes in v3: unlike previous version of the patch that was inspired
> from IA64 architecture, this new version tries to reuse as much as
> possible the PPC64 implementation.
> 
> PPC32 doesn't have PACA and past discussion on v2 version has shown
> that it is not worth implementing a PACA in PPC32 architecture
> (see below benh opinion)
> 
> benh: PACA is actually a data structure and you really really don't want it
> on ppc32 :-) Having a register point to current works, having a register
> point to per-cpu data instead works too (ie, change what we do today),
> but don't introduce a PACA *please* :-)

And Ben never replied to my reply at the time:

"What is special about 64-bit that warrants doing things differently from
32-bit?  What is the difference between PACA and "per-cpu data", other than
the obscure name?"

I can understand wanting to avoid churn, but other than that, doing things 
differently on 64-bit versus 32-bit sucks.

> b/arch/powerpc/include/asm/cputime.h
> index e245255..c4c33be 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -230,7 +230,11 @@ static inline cputime_t clock_t_to_cputime(const unsigned long clk)
>  
>  #define cputime64_to_clock_t(ct) cputime_to_clock_t((cputime_t)(ct))
>  
> +#ifdef CONFIG_PPC64
>  static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> +#else
> +void arch_vtime_task_switch(struct task_struct *tsk);
> +#endif

Add a comment explaining why this is empty on 64-bit but non-empty on 32-bit.

> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 07cebc3..b04b957 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -256,6 +256,13 @@ int main(void)
>   DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
>   DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct,
> nap_state_lost));
>   DEFINE(PACA_SPRG_VDSO, offsetof(struct paca_struct, sprg_vdso));
> +#else /* CONFIG_PPC64 */
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> + DEFINE(PACA_STARTTIME, offsetof(struct thread_info, starttime));
> + DEFINE(PACA_STARTTIME_USER, offsetof(struct thread_info,
> starttime_user));
> + DEFINE(PACA_USER_TIME, offsetof(struct thread_info, user_time));
> + DEFINE(PACA_SYSTEM_TIME, offsetof(struct thread_info,
> system_time));
> +#endif
>  #endif /* CONFIG_PPC64 */

Can you change the name if it's not always going to be relative to a PACA?

> +#ifdef CONFIG_PPC32
> +#define get_paca()   task_thread_info(tsk)
> +#endif

Likewise, this is just going to cause confusion.

Can you bundle up the time accounting fields into a struct, that you share
between the paca and the 32-bit thread_info, and then have a macro to grab a
pointer to that?
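
A minimal sketch of that suggestion, under loudly stated assumptions: every name below (demo_accounting, demo_get_accounting, the DEMO_PPC64 switch) is hypothetical, and the structures are stand-ins for the real paca and thread_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the fields the patch adds one by one. */
struct demo_accounting {
	unsigned long starttime;
	unsigned long starttime_user;
	unsigned long user_time;
	unsigned long system_time;
};

/* Stand-ins for the 64-bit paca and the 32-bit thread_info. */
struct demo_paca { int cpu; struct demo_accounting accounting; };
struct demo_thread_info { int flags; struct demo_accounting accounting; };
struct demo_task { struct demo_thread_info ti; };

struct demo_paca demo_local_paca;

/* One accessor, two homes: DEMO_PPC64 is an assumed build-time switch. */
#ifdef DEMO_PPC64
#define demo_get_accounting(tsk)	((void)(tsk), &demo_local_paca.accounting)
#else
#define demo_get_accounting(tsk)	(&(tsk)->ti.accounting)
#endif

/* Shared code only ever sees the bundle, never its container. */
void demo_account_system(struct demo_task *tsk, unsigned long now)
{
	struct demo_accounting *acct;

	assert(tsk != NULL);
	acct = demo_get_accounting(tsk);
	acct->system_time += now - acct->starttime;
	acct->starttime = now;
}
```

With this shape neither the macro nor the asm-offsets names need to mention the PACA, which addresses the naming concerns raised above.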

-Scott


[PATCH] add POWER Virtual Management Channel driver

2016-02-16 Thread Steven Royer
From: Steven Royer 

The ibmvmc driver is a device driver for the POWER Virtual Management
Channel virtual adapter on the PowerVM platform.  It is used to
communicate with the hypervisor for virtualization management.  It
provides both request/response and asynchronous message support through
the /dev/ibmvmc node.

Signed-off-by: Steven Royer 
---
This is used by the PowerVM NovaLink project.  You can see the development
history on GitHub:
https://github.com/powervm/ibmvmc

 Documentation/ioctl/ioctl-number.txt |1 +
 MAINTAINERS  |5 +
 arch/powerpc/include/asm/hvcall.h|3 +-
 drivers/misc/Kconfig |9 +
 drivers/misc/Makefile|1 +
 drivers/misc/ibmvmc.c| 1882 ++
 drivers/misc/ibmvmc.h|  203 
 7 files changed, 2103 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/ibmvmc.c
 create mode 100644 drivers/misc/ibmvmc.h

diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 91261a3..d5f5f4f 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -324,6 +324,7 @@ Code  Seq#(hex) Include FileComments
 0xCA   80-8F   uapi/scsi/cxlflash_ioctl.h
 0xCB   00-1F   CBM serial IEC bus  in development:


+0xCC   00-0F   drivers/misc/ibmvmc.h   pseries VMC driver
 0xCD   01  linux/reiserfs_fs.h
 0xCF   02  fs/cifs/ioctl.c
 0xDB   00-0F   drivers/char/mwave/mwavepub.h
diff --git a/MAINTAINERS b/MAINTAINERS
index cc2f753..c39dca2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5353,6 +5353,11 @@ L:   net...@vger.kernel.org
 S: Supported
 F: drivers/net/ethernet/ibm/ibmvnic.*
 
+IBM Power Virtual Management Channel Driver
+M: Steven Royer 
+S: Supported
+F: drivers/misc/ibmvmc.*
+
 IBM Power Virtual SCSI Device Drivers
 M: Tyrel Datwyler 
 L: linux-s...@vger.kernel.org
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index e3b54dd..1ee6f2b 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,7 +274,8 @@
 #define H_COP  0x304
 #define H_GET_MPP_X0x314
 #define H_SET_MODE 0x31C
-#define MAX_HCALL_OPCODE   H_SET_MODE
+#define H_REQUEST_VMC  0x360
+#define MAX_HCALL_OPCODE   H_REQUEST_VMC
 
 /* H_VIOCTL functions */
 #define H_GET_VIOA_DUMP_SIZE   0x01
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 054fc10..f8d9113 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -526,6 +526,15 @@ config VEXPRESS_SYSCFG
  bus. System Configuration interface is one of the possible means
  of generating transactions on this bus.
 
+config IBMVMC
+   tristate "IBM Virtual Management Channel support"
+   depends on PPC_PSERIES
+   help
+ This is the IBM POWER Virtual Management Channel
+
+ To compile this driver as a module, choose M here: the
+ module will be called ibmvmc.
+
 source "drivers/misc/c2port/Kconfig"
 source "drivers/misc/eeprom/Kconfig"
 source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 537d7f3..08336b3 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_GENWQE)  += genwqe/
 obj-$(CONFIG_ECHO) += echo/
 obj-$(CONFIG_VEXPRESS_SYSCFG)  += vexpress-syscfg.o
 obj-$(CONFIG_CXL_BASE) += cxl/
+obj-$(CONFIG_IBMVMC)   += ibmvmc.o
diff --git a/drivers/misc/ibmvmc.c b/drivers/misc/ibmvmc.c
new file mode 100644
index 000..fb943b7
--- /dev/null
+++ b/drivers/misc/ibmvmc.c
@@ -0,0 +1,1882 @@
+/*
+ * IBM Power Systems Virtual Management Channel Support.
+ *
+ * Copyright (c) 2004, 2016 IBM Corp.
+ *   Dave Engebretsen engeb...@us.ibm.com
+ *   Steven Royer sero...@linux.vnet.ibm.com
+ *   Adam Reznechek adrez...@linux.vnet.ibm.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "ibmvmc.h"
+
+#define IBMVMC_DRIVER_VERSION "1.0"
+
+MODULE_DESCRIPTION("IBM 

Re: Writes, smp_wmb(), and transitivity?

2016-02-16 Thread Paul E. McKenney
On Tue, Feb 16, 2016 at 10:59:08AM -0800, Linus Torvalds wrote:
> On Mon, Feb 15, 2016 at 9:58 AM, Paul E. McKenney
>  wrote:
> >
> > Two threads:
> >
> > int a, b;
> >
> > void thread0(void)
> > {
> > WRITE_ONCE(a, 1);
> > smp_wmb();
> > WRITE_ONCE(b, 2);
> > }
> >
> > void thread1(void)
> > {
> > WRITE_ONCE(b, 1);
> > smp_wmb();
> > WRITE_ONCE(a, 2);
> > }
> >
> > /* After all threads have completed and the dust has settled... */
> >
> > BUG_ON(a == 1 && b == 1);
> 
> So the more I look at that kind of litmus test, the less I think that
> we should care, because I can't come up with a scenario where that
> kind of test makes sense. Without even a possibility of any causal
> relationship between the two, I can't say why we'd ever care about the
> ordering of the (independent) writes to the individual variables.
> 
> If somebody can make up a causal chain, things differ. But as long as
> all the CPUs are just doing locally ordered writes, I don't think we
> need to care about a global store ordering.

Works for me!  (Yes, I can artificially generate a use case for this
thing, but all the ones I have come up with have some better and more
sane way to get the job done.  So I completely agree with your not caring
about it.)
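
For reference, the outcome the litmus test asks about can be checked by brute force under sequential consistency. The sketch below is only an illustration, not kernel code: it enumerates every interleaving of the four stores that respects each thread's program order and counts how many end with a == 1 && b == 1. The names `explore` and `count_forbidden` are hypothetical. It says nothing about what smp_wmb() alone guarantees on weakly ordered hardware, which is the actual question in this thread.

```c
#include <assert.h>

/* One store: which variable it targets (0 = a, 1 = b) and the value. */
struct store { int var; int val; };

/* Program order of each thread, copied from the litmus test. */
static const struct store thread0[2] = { {0, 1}, {1, 2} }; /* a = 1; b = 2 */
static const struct store thread1[2] = { {1, 1}, {0, 2} }; /* b = 1; a = 2 */

/*
 * Try every interleaving that respects program order. i0 and i1 count
 * the stores each thread has already performed. Returns the number of
 * complete interleavings ending in a == 1 && b == 1 and increments
 * *total once per complete interleaving.
 */
static int explore(int i0, int i1, int a, int b, int *total)
{
    int hits = 0;

    assert(i0 <= 2 && i1 <= 2);
    if (i0 == 2 && i1 == 2) {
        (*total)++;
        return (a == 1 && b == 1) ? 1 : 0;
    }
    if (i0 < 2) {
        int mem[2] = { a, b };
        mem[thread0[i0].var] = thread0[i0].val;
        hits += explore(i0 + 1, i1, mem[0], mem[1], total);
    }
    if (i1 < 2) {
        int mem[2] = { a, b };
        mem[thread1[i1].var] = thread1[i1].val;
        hits += explore(i0, i1 + 1, mem[0], mem[1], total);
    }
    return hits;
}

/* Count sequentially consistent executions that would trip the BUG_ON(). */
int count_forbidden(int *total)
{
    *total = 0;
    return explore(0, 0, 0, 0, total);
}
```

Under this model no interleaving trips the BUG_ON(): for a to end up as 1, all of thread1 must run before thread0 starts, and then b ends up as 2.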

So for transitivity, we focus on causal chains, where one task writes
to some variable that the next task reads.

In addition, if all threads use full memory barriers throughout, as in
smp_mb(), then full ordering is of course provided regardless of the
pattern of reads and writes.
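
A minimal sketch of the simplest such causal chain, written as a userspace analogue with C11 atomics rather than the kernel primitives: one task writes data and then a flag, and the next task reads the flag and then the data. The release store and acquire load here stand in, as an assumption, for a paired smp_wmb()/smp_rmb() (or smp_store_release()/smp_load_acquire()). The names `publish`, `try_consume`, `payload` and `ready` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data written before the flag */
static atomic_bool ready;  /* flag the next task reads; starts false */

/* Writer side: store the data, then publish the flag with release order. */
void publish(int value)
{
    /* This sketch supports a single publication only. */
    assert(!atomic_load_explicit(&ready, memory_order_relaxed));
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Reader side: if the acquire load observes the flag, the earlier store
 * to payload is guaranteed to be visible as well.
 */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The reader's load of the flag is what links the two tasks. Without it, as in the write-only litmus test, there is no chain to reason about.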

Thanx, Paul


Re: Writes, smp_wmb(), and transitivity?

2016-02-16 Thread Linus Torvalds
On Mon, Feb 15, 2016 at 9:58 AM, Paul E. McKenney
 wrote:
>
> Two threads:
>
> int a, b;
>
> void thread0(void)
> {
> WRITE_ONCE(a, 1);
> smp_wmb();
> WRITE_ONCE(b, 2);
> }
>
> void thread1(void)
> {
> WRITE_ONCE(b, 1);
> smp_wmb();
> WRITE_ONCE(a, 2);
> }
>
> /* After all threads have completed and the dust has settled... */
>
> BUG_ON(a == 1 && b == 1);

So the more I look at that kind of litmus test, the less I think that
we should care, because I can't come up with a scenario where that
kind of test makes sense. Without even a possibility of any causal
relationship between the two, I can't say why we'd ever care about the
ordering of the (independent) writes to the individual variables.

If somebody can make up a causal chain, things differ. But as long as
all the CPUs are just doing locally ordered writes, I don't think we
need to care about a global store ordering.

  Linus

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-16 Thread Christian Borntraeger
On 02/15/2016 10:35 PM, Kirill A. Shutemov wrote:
> 
> Is there any chance that I'll be able to trigger the bug using QEMU?
> Does anybody have a QEMU image I can use?

QEMU/TCG on s390 provides neither SMP nor large pages (only QEMU/KVM does),
so this will probably not help you here.

Christian


Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-16 Thread Gerald Schaefer
On Mon, 15 Feb 2016 23:35:26 +0200
"Kirill A. Shutemov"  wrote:

> On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote:
> > On Mon, 15 Feb 2016 13:31:59 +0200
> > "Kirill A. Shutemov"  wrote:
> > 
> > > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote:
> > > > 
> > > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote:
> > > > > Could you check if reverting fecffad25458 helps?
> > > > 
> > > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with:
> > > > 
> > > > [ 1851.721062] Unable to handle kernel pointer dereference in virtual 
> > > > kernel address space
> > > > [ 1851.721075] failing address:  TEID: 0483
> > > > [ 1851.721078] Fault in home space mode while using kernel ASCE.
> > > > [ 1851.721085] AS:00d5c007 R3:0007 
> > > > S:a800 P:003d
> > > > [ 1851.721128] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > [ 1851.721135] Modules linked in: bridge stp llc btrfs mlx4_ib mlx4_en 
> > > > ib_sa ib_mad vxlan xor ip6_udp_tunnel ib_core udp_tunnel ptp pps_core 
> > > > ib_addr ghash_s390raid6_pq prng ecb aes_s390 mlx4_core des_s390 
> > > > des_generic genwqe_card sha512_s390 sha256_s390 sha1_s390 sha_common 
> > > > crc_itu_t dm_mod scm_block vhost_net tun vhost eadm_sch macvtap macvlan 
> > > > kvm autofs4
> > > > [ 1851.721183] CPU: 7 PID: 256422 Comm: bash Not tainted 
> > > > 4.5.0-rc3-00058-g07923d7-dirty #178
> > > > [ 1851.721186] task: 7fbfd290 ti: 8c604000 task.ti: 
> > > > 8c604000
> > > > [ 1851.721189] Krnl PSW : 0704d0018000 0045d3b8 
> > > > (__rb_erase_color+0x280/0x308)
> > > > [ 1851.721200]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 
> > > > PM:0 EA:3
> > > >Krnl GPRS: 0001 0020 
> > > >  bd07eff1
> > > > [ 1851.721205]0027ca10  
> > > > 83e45898 77b61198
> > > > [ 1851.721207]7ce1a490 bd07eff0 
> > > > 7ce1a548 0027ca10
> > > > [ 1851.721210]bd07c350 bd07eff0 
> > > > 8c607aa8 8c607a68
> > > > [ 1851.721221] Krnl Code: 0045d3aa: e3c0d0080024   stg 
> > > > %%r12,8(%%r13)
> > > >   0045d3b0: b9040039   lgr 
> > > > %%r3,%%r9
> > > >  #0045d3b4: a53b0001   oill
> > > > %%r3,1
> > > >  >0045d3b8: e3301024   stg 
> > > > %%r3,0(%%r1)
> > > >   0045d3be: ec28000e007c   cgij
> > > > %%r2,0,8,45d3da
> > > >   0045d3c4: e3402004   lg  
> > > > %%r4,0(%%r2)
> > > >   0045d3ca: b904001c   lgr 
> > > > %%r1,%%r12
> > > >   0045d3ce: ec143f3f0056   rosbg   
> > > > %%r1,%%r4,63,63,0
> > > > [ 1851.721269] Call Trace:
> > > > [ 1851.721273] ([<83e45898>] 0x83e45898)
> > > > [ 1851.721279]  [<0029342a>] unlink_anon_vmas+0x9a/0x1d8
> > > > [ 1851.721282]  [<00283f34>] free_pgtables+0xcc/0x148
> > > > [ 1851.721285]  [<0028c376>] exit_mmap+0xd6/0x300
> > > > [ 1851.721289]  [<00134db8>] mmput+0x90/0x118
> > > > [ 1851.721294]  [<002d76bc>] flush_old_exec+0x5d4/0x700
> > > > [ 1851.721298]  [<003369f4>] load_elf_binary+0x2f4/0x13e8
> > > > [ 1851.721301]  [<002d6e4a>] search_binary_handler+0x9a/0x1f8
> > > > [ 1851.721304]  [<002d8970>] 
> > > > do_execveat_common.isra.32+0x668/0x9a0
> > > > [ 1851.721307]  [<002d8cec>] do_execve+0x44/0x58
> > > > [ 1851.721310]  [<002d8f92>] SyS_execve+0x3a/0x48
> > > > [ 1851.721315]  [<006fb096>] system_call+0xd6/0x258
> > > > [ 1851.721317]  [<03ff997436d6>] 0x3ff997436d6
> > > > [ 1851.721319] INFO: lockdep is turned off.
> > > > [ 1851.721321] Last Breaking-Event-Address:
> > > > [ 1851.721323]  [<0045d31a>] __rb_erase_color+0x1e2/0x308
> > > > [ 1851.721327]
> > > > [ 1851.721329] ---[ end trace 0d80041ac00cfae2 ]---
> > > > 
> > > > 
> > > > > 
> > > > > And could you share what the crashes look like? I haven't seen any
> > > > > backtraces yet.
> > > > > 
> > > > 
> > > > Sure. I didn't because they really looked random to me. Most of the
> > > > time they were in RCU or list debugging, but I thought those were just
> > > > the messenger observing a corruption first. Anyhow, here is an older
> > > > one that might look interesting:
> > > > 
> > > > [   59.851421] list_del corruption. next->prev should be 6e1eb000, but was 0400
> > > 
> > > This is kind of interesting: 0x400 is TAIL_MAPPING. Hm..
> > > 
> > > Could you check if you see the problem on commit 1c290f642101 and its
> > > immediate parent?
> > > 
