Re: [PATCH 18/26] KVM: PPC: Book3S PR: make mtspr/mfspr emulation behavior based on active TM SPRs

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:31PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> The mfspr/mtspr instructions on the TM SPRs (TEXASR/TFIAR/TFHAR) are
> non-privileged, so a PR KVM guest in problem state can execute them
> without trapping into the host. We only emulate mtspr/mfspr
> texasr/tfiar/tfhar at guest PR=0 state.
> 
> When we are emulating mtspr on the TM SPRs at guest PR=0 state, the
> emulation result needs to be visible to guest PR=1 state. That is, the
> actual TM SPR value should be loaded into the actual registers.
> 
> We already flush TM SPRs into vcpu when switching out of CPU, and load
> TM SPRs when switching back.
> 
> This patch corrects mfspr()/mtspr() emulation for TM SPRs to make the
> actual source/dest based on actual TM SPRs.
> 
> Signed-off-by: Simon Guo 
> ---
>  arch/powerpc/kvm/book3s_emulate.c | 35 +++
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index e096d01..c2836330 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -521,13 +521,26 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>   break;
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   case SPRN_TFHAR:
> - vcpu->arch.tfhar = spr_val;
> - break;
>   case SPRN_TEXASR:
> - vcpu->arch.texasr = spr_val;
> - break;
>   case SPRN_TFIAR:
> - vcpu->arch.tfiar = spr_val;
> + if (MSR_TM_ACTIVE(kvmppc_get_msr(vcpu))) {
> + /* it is illegal to mtspr() TM regs in
> +  * other than non-transactional state.
> +  */
> + kvmppc_core_queue_program(vcpu, SRR1_PROGTM);
> + emulated = EMULATE_AGAIN;
> + break;
> + }

We also need to check that the guest has TM enabled in the guest MSR,
and give them a facility unavailable interrupt if not.

> +
> + tm_enable();
> + if (sprn == SPRN_TFHAR)
> + mtspr(SPRN_TFHAR, spr_val);
> + else if (sprn == SPRN_TEXASR)
> + mtspr(SPRN_TEXASR, spr_val);
> + else
> + mtspr(SPRN_TFIAR, spr_val);
> + tm_disable();

I haven't seen any checks that we are on a CPU that has TM.  What
happens if a guest does a mtmsrd with TM=1 and then a mtspr to TEXASR
when running on a POWER7 (assuming the host kernel was compiled with
CONFIG_PPC_TRANSACTIONAL_MEM=y)?

Ideally, if the host CPU does not have TM functionality, these mtsprs
would be treated as no-ops and attempts to set the TM or TS fields in
the guest MSR would be ignored.

> +
>   break;
>  #endif
>  #endif
> @@ -674,13 +687,19 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>   break;
>  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   case SPRN_TFHAR:
> - *spr_val = vcpu->arch.tfhar;
> + tm_enable();
> + *spr_val = mfspr(SPRN_TFHAR);
> + tm_disable();
>   break;
>   case SPRN_TEXASR:
> - *spr_val = vcpu->arch.texasr;
> + tm_enable();
> + *spr_val = mfspr(SPRN_TEXASR);
> + tm_disable();
>   break;
>   case SPRN_TFIAR:
> - *spr_val = vcpu->arch.tfiar;
> + tm_enable();
> + *spr_val = mfspr(SPRN_TFIAR);
> + tm_disable();
>   break;

These need to check MSR_TM in the guest MSR, and become no-ops on
machines without TM capability.
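
A minimal sketch of the check ordering asked for in the comments above,
written as a standalone C model rather than kernel code. The enum, the
function tm_spr_check() and the MODEL_* constants are hypothetical names
introduced for illustration only; the bit positions are assumed to mirror
MSR_TM and the MSR_TS field in asm/reg.h. Like the patch as posted, it
treats any active or suspended transaction as illegal for mtspr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror MSR_TM and MSR_TS_MASK in asm/reg.h. */
#define MODEL_MSR_TM      (1ULL << 32)  /* TM facility enabled for the guest */
#define MODEL_MSR_TS_MASK (3ULL << 33)  /* transaction state (suspended/transactional) */

/* Hypothetical outcomes; real code would queue interrupts instead. */
enum tm_spr_action {
	TM_SPR_NOP,           /* host CPU has no TM: ignore the access */
	TM_SPR_FAC_UNAVAIL,   /* guest MSR[TM] = 0: facility unavailable */
	TM_SPR_PROGRAM_CHECK, /* mtspr while a transaction is active/suspended */
	TM_SPR_DO_ACCESS,     /* safe to touch the real SPR */
};

static enum tm_spr_action tm_spr_check(bool host_has_tm, uint64_t guest_msr,
				       bool is_write)
{
	/* No TM hardware at all: the access becomes a no-op. */
	if (!host_has_tm)
		return TM_SPR_NOP;

	/* The guest has not enabled TM in its MSR. */
	if (!(guest_msr & MODEL_MSR_TM))
		return TM_SPR_FAC_UNAVAIL;

	/* Writes are only legal in non-transactional state. */
	if (is_write && (guest_msr & MODEL_MSR_TS_MASK))
		return TM_SPR_PROGRAM_CHECK;

	return TM_SPR_DO_ACCESS;
}
```

In the real emulation paths the three non-access outcomes would presumably
map to a plain return, a facility unavailable interrupt, and
kvmppc_core_queue_program(vcpu, SRR1_PROGTM) respectively.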

Paul.


Re: [PATCH 19/26] KVM: PPC: Book3S PR: always fail transaction in guest privilege state

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:32PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> Currently the kernel doesn't use transactional memory.
> There is also an issue for a privileged guest:
> the tbegin/tsuspend/tresume/tabort TM instructions can change the MSR TM
> bits without trapping into the PR host. So the following code will lead
> to a false mfmsr result:
>   tbegin  <- MSR bits update to Transaction active.
>   beq <- failover handler branch
>   mfmsr   <- still read MSR bits from magic page with
>   transaction inactive.
> 
> It is not an issue for a non-privileged guest, since its mfmsr is not
> patched with the magic page and will always trap into the PR host.
> 
> This patch always fails a tbegin attempt by a privileged guest, so that
> the above issue is prevented. It is benign since the (guest) kernel
> currently doesn't initiate a transaction.
> 
> Test case:
> https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.c
> 
> Signed-off-by: Simon Guo 

You need to handle the case where MSR_TM is not set in the guest MSR,
and give the guest a facility unavailable interrupt.

[snip]

> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -255,7 +255,7 @@ static inline void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu)
>   tm_disable();
>  }
>  
> -static inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu)
> +inline void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu)

You should probably remove the 'inline' here too.

Paul.


Re: [PATCH 20/26] KVM: PPC: Book3S PR: enable NV reg restore for reading TM SPR at guest privilege state

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:33PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> Currently kvmppc_handle_fac() does not update NV GPRs and thus it can
> return with GUEST_RESUME.
> 
> However, a PR KVM guest always has the MSR_TM bit disabled in privileged
> state. If a privileged PR guest tries to read the TM SPRs, it triggers a
> TM facility unavailable exception and falls into kvmppc_handle_fac(). The
> emulation is then done by kvmppc_core_emulate_mfspr_pr(). The mfspr
> instruction can use an NV reg as RT, so it is necessary to restore NV
> GPRs in this case, to reflect the update to the NV RT.
> 
> This patch makes kvmppc_handle_fac() return GUEST_RESUME_NV on a TM fac
> exception taken in guest privileged state.
> 
> Signed-off-by: Simon Guo 

Reviewed-by: Paul Mackerras 


Re: [PATCH 21/26] KVM: PPC: Book3S PR: adds emulation for treclaim.

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:34PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> This patch adds support for "treclaim." emulation when a PR KVM guest
> executes treclaim. and traps to the host.
> 
> We first perform treclaim. and save the TM checkpoint. Then it is
> necessary to update the vcpu's current register content with the
> checkpointed values. On rfid into the guest again, that vcpu current
> register content (now the checkpointed values) will be loaded into
> the registers.
> 
> Signed-off-by: Simon Guo 
> ---
>  arch/powerpc/include/asm/reg.h|  4 +++
>  arch/powerpc/kvm/book3s_emulate.c | 66 ++-
>  2 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 6c293bc..b3bcf6b 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -244,12 +244,16 @@
>  #define SPRN_TEXASR   0x82    /* Transaction EXception & Summary */
>  #define SPRN_TEXASRU  0x83    /* ''  ''  ''  Upper 32  */
>  #define TEXASR_FC_LG  (63 - 7)    /* Failure Code */
> +#define TEXASR_AB_LG  (63 - 31)   /* Abort */
> +#define TEXASR_SU_LG  (63 - 32)   /* Suspend */
>  #define TEXASR_HV_LG  (63 - 34)   /* Hypervisor state */
>  #define TEXASR_PR_LG  (63 - 35)   /* Privilege level */
>  #define TEXASR_FS_LG  (63 - 36)   /* failure summary */
>  #define TEXASR_EX_LG  (63 - 37)   /* TFIAR exact bit */
>  #define TEXASR_ROT_LG (63 - 38)   /* ROT bit */
>  #define TEXASR_FC     (ASM_CONST(0xFF) << TEXASR_FC_LG)
> +#define TEXASR_AB     __MASK(TEXASR_AB_LG)
> +#define TEXASR_SU     __MASK(TEXASR_SU_LG)
>  #define TEXASR_HV     __MASK(TEXASR_HV_LG)
>  #define TEXASR_PR     __MASK(TEXASR_PR_LG)
>  #define TEXASR_FS     __MASK(TEXASR_FS_LG)

It would be good to collect up all the modifications you need to make
to reg.h into a single patch at the beginning of the patch series --
that will make it easier to merge it all.

> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 1eb1900..51c0e20 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c

[snip]

> @@ -127,6 +130,42 @@ void kvmppc_copyfrom_vcpu_tm(struct kvm_vcpu *vcpu)
>   vcpu->arch.vrsave = vcpu->arch.vrsave_tm;
>  }
>  
> +static void kvmppc_emulate_treclaim(struct kvm_vcpu *vcpu, int ra_val)
> +{
> + unsigned long guest_msr = kvmppc_get_msr(vcpu);
> + int fc_val = ra_val ? ra_val : 1;
> +
> + kvmppc_save_tm_pr(vcpu);
> +
> + preempt_disable();
> + kvmppc_copyfrom_vcpu_tm(vcpu);
> + preempt_enable();
> +
> + /*
> +  * treclaim need quit to non-transactional state.
> +  */
> + guest_msr &= ~(MSR_TS_MASK);
> + kvmppc_set_msr(vcpu, guest_msr);
> +
> + preempt_disable();
> + tm_enable();
> + vcpu->arch.texasr = mfspr(SPRN_TEXASR);
> + vcpu->arch.texasr &= ~TEXASR_FC;
> + vcpu->arch.texasr |= ((u64)fc_val << TEXASR_FC_LG);

You're doing failure recording here unconditionally, but the
architecture says that treclaim. only does failure recording if
TEXASR_FS is not already set.
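
A minimal sketch of conditional failure recording, as a standalone C model
rather than the kernel change itself. The function name
treclaim_record_failure() and the MODEL_* constants are hypothetical; the
bit positions follow the TEXASR_FC_LG and TEXASR_FS_LG definitions quoted
earlier in this patch. Setting FS as part of recording is an assumption
about what "failure recording" covers, and the PR/HV/TFIAR updates are
left out for brevity.

```c
#include <stdint.h>

/* Positions taken from the TEXASR_*_LG definitions quoted above. */
#define MODEL_TEXASR_FC_LG (63 - 7)
#define MODEL_TEXASR_FC    (0xFFULL << MODEL_TEXASR_FC_LG)
#define MODEL_TEXASR_FS    (1ULL << (63 - 36))

/*
 * Return the TEXASR value after an emulated treclaim. with failure
 * code fc_val. If a failure was already recorded (FS set), the
 * existing value is kept untouched.
 */
static uint64_t treclaim_record_failure(uint64_t texasr, uint8_t fc_val)
{
	if (texasr & MODEL_TEXASR_FS)
		return texasr;

	texasr &= ~MODEL_TEXASR_FC;
	texasr |= (uint64_t)fc_val << MODEL_TEXASR_FC_LG;
	texasr |= MODEL_TEXASR_FS;	/* assumed part of failure recording */
	return texasr;
}
```

With this shape, the PR/HV and TFIAR updates further down would also move
inside the "FS not set" branch.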

> + vcpu->arch.texasr &= ~(TEXASR_PR | TEXASR_HV);
> + if (kvmppc_get_msr(vcpu) & MSR_PR)
> + vcpu->arch.texasr |= TEXASR_PR;
> +
> + if (kvmppc_get_msr(vcpu) & MSR_HV)
> + vcpu->arch.texasr |= TEXASR_HV;
> +
> + vcpu->arch.tfiar = kvmppc_get_pc(vcpu);
> + mtspr(SPRN_TEXASR, vcpu->arch.texasr);
> + mtspr(SPRN_TFIAR, vcpu->arch.tfiar);
> + tm_disable();
> + preempt_enable();
> +}
>  #endif

Paul.


Re: [PATCH 22/26] KVM: PPC: Book3S PR: add emulation for trechkpt in PR KVM.

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:35PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> This patch adds host emulation for when a PR KVM guest executes
> "trechkpt.", which is a privileged instruction and will trap into host.
> 
> We first copy the vcpu's current content into the vcpu TM checkpoint
> content, then call kvmppc_restore_tm_pr() to do trechkpt.
> with the updated vcpu TM checkpoint values.
> 
> Signed-off-by: Simon Guo 

[snip]

> +static void kvmppc_emulate_trchkpt(struct kvm_vcpu *vcpu)
> +{
> + unsigned long guest_msr = kvmppc_get_msr(vcpu);
> +
> + preempt_disable();
> + vcpu->arch.save_msr_tm = MSR_TS_S;
> + vcpu->arch.save_msr_tm &= ~(MSR_FP | MSR_VEC | MSR_VSX);

This looks odd, since you are clearing bits when you have just set
save_msr_tm to a constant value that doesn't have these bits set.
This could be taken as a sign that the previous line has a bug and you
meant "|=" or something similar instead of "=".  I think you probably
did mean "=", in which case you should remove the line clearing
FP/VEC/VSX.
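
A small standalone C model of why the second line has no effect. The
MODEL_* constants and both function names are hypothetical; the bit
positions are assumed to match MSR_FP, MSR_VSX, MSR_VEC and MSR_TS_S in
asm/reg.h.

```c
#include <stdint.h>

/* Assumed to mirror the MSR bit positions in asm/reg.h. */
#define MODEL_MSR_FP   (1ULL << 13)
#define MODEL_MSR_VSX  (1ULL << 23)
#define MODEL_MSR_VEC  (1ULL << 25)
#define MODEL_MSR_TS_S (1ULL << 33)

/* The sequence as posted: assign a constant, then clear bits it never had. */
static uint64_t save_msr_tm_as_posted(void)
{
	uint64_t msr = MODEL_MSR_TS_S;

	msr &= ~(MODEL_MSR_FP | MODEL_MSR_VEC | MODEL_MSR_VSX);
	return msr;
}

/* The simplified form: the plain assignment alone. */
static uint64_t save_msr_tm_simplified(void)
{
	return MODEL_MSR_TS_S;
}
```

Both functions return the same value, since MSR_TS_S does not overlap any
of the three facility bits.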

Paul.


Re: [PATCH 23/26] KVM: PPC: Book3S PR: add emulation for tabort. for privilege guest

2018-01-23 Thread Paul Mackerras
On Thu, Jan 11, 2018 at 06:11:36PM +0800, wei.guo.si...@gmail.com wrote:
> From: Simon Guo 
> 
> Currently a privileged guest is run with TM disabled.
> 
> Although the privileged guest cannot initiate a new transaction,
> it can use tabort to terminate its problem state's transaction.
> So it is still necessary to emulate tabort. for a privileged guest.
> 
> This patch adds emulation of tabort. for a privileged guest.
> 
> Tested with:
> https://github.com/justdoitqd/publicFiles/blob/master/test_tabort.c
> 
> Signed-off-by: Simon Guo 
> ---
>  arch/powerpc/include/asm/kvm_book3s.h |  1 +
>  arch/powerpc/kvm/book3s_emulate.c | 31 +++
>  arch/powerpc/kvm/book3s_pr.c  |  2 +-
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> index 524cd82..8bd454c 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -258,6 +258,7 @@ extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
>  void kvmppc_save_tm_pr(struct kvm_vcpu *vcpu);
>  void kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu);
>  void kvmppc_restore_tm_sprs(struct kvm_vcpu *vcpu);
> +void kvmppc_save_tm_sprs(struct kvm_vcpu *vcpu);

Why do you add this declaration, and change it from "static inline" to
"inline" below, when this patch doesn't use it?  Also, making it
"inline" is pointless if it has a caller outside the source file where
it's defined (if gcc wants to inline uses of it inside the same source
file, it will do so anyway even without the "inline" keyword.)

Paul.


[PATCH v2 00/13] New driver to support OpenCAPI devices on POWER9

2018-01-23 Thread Frederic Barrat

This series adds support for Open Coherent Accelerator (ocxl) devices
on POWER9 processors. OpenCAPI is a consortium developing the
specifications for an interface between processors and accelerators,
which lets accelerators share the host memory using virtual
addresses.

The OpenCAPI device can also have its own local memory and provide
access to the host, though that is not supported by this series.

The OpenCAPI specification is processor agnostic, but this series adds
support specifically for powerpc.

Even though the underlying transport is not PCI, the firmware
abstracts the hardware like a PCI host bridge and Linux sees the
OpenCAPI devices as PCI devices. So a lot of existing infrastructure
and commands can be reused.

Patches 1-5:  add the platform-specific services needed by the driver
Patches 6-10: driver code
Patch 11: small correction to existing cxl driver
Patch 12: documentation
Patch 13: MAINTAINERS entry

Current limitations, which will be addressed in later patches:
 - no capability to trigger a reset of the opencapi adapter
 - no support for the 'wake_host_thread' command
 - no support for adapters with a dual-link connection (none exists yet)
 - no access to the adapter-local memory

Many people contributed directly or indirectly, from the software,
hardware and bringup teams, in particular Andrew Donnellan and
Alastair D'Silva, who are developing the related firmware and library.

Feedback welcome!

Changelog:
v2:
All/many patches:
  use new SPDX tag for licensing info
  fix sparse warnings
patch 2:  set "From" field correctly
          don't activate the PCI fixup on platforms other than powernv
patch 4:  fix typos
patch 7:  map AFU interrupt trigger page write-only
patch 10: rephrase CONFIG help message
patch 12: add documentation for new sysfs files
patch 13: follow alphabetical order for new entry in MAINTAINERS



Andrew Donnellan (1):
  powerpc/powernv: Set correct configuration space size for opencapi
devices

Frederic Barrat (12):
  powerpc/powernv: Introduce new PHB type for opencapi links
  powerpc/powernv: Add opal calls for opencapi
  powerpc/powernv: Add platform-specific services for opencapi
  powerpc/powernv: Capture actag information for the device
  ocxl: Driver code for 'generic' opencapi devices
  ocxl: Add AFU interrupt support
  ocxl: Add a kernel API for other opencapi drivers
  ocxl: Add trace points
  ocxl: Add Makefile and Kconfig
  cxl: Remove support for "Processing accelerators" class
  ocxl: Documentation
  ocxl: add MAINTAINERS entry

 Documentation/ABI/testing/sysfs-class-ocxl |  35 ++
 Documentation/accelerators/ocxl.rst| 160 ++
 Documentation/ioctl/ioctl-number.txt   |   1 +
 MAINTAINERS|  12 +
 arch/powerpc/include/asm/opal-api.h|   5 +-
 arch/powerpc/include/asm/opal.h|   6 +
 arch/powerpc/include/asm/pnv-ocxl.h|  36 ++
 arch/powerpc/platforms/powernv/Makefile|   1 +
 arch/powerpc/platforms/powernv/npu-dma.c   |   2 +-
 arch/powerpc/platforms/powernv/ocxl.c  | 515 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
 arch/powerpc/platforms/powernv/pci-ioda.c  |  54 +-
 arch/powerpc/platforms/powernv/pci.c   |   4 +
 arch/powerpc/platforms/powernv/pci.h   |   8 +-
 drivers/misc/Kconfig   |   1 +
 drivers/misc/Makefile  |   1 +
 drivers/misc/cxl/pci.c |   2 -
 drivers/misc/ocxl/Kconfig  |  31 ++
 drivers/misc/ocxl/Makefile |  11 +
 drivers/misc/ocxl/afu_irq.c| 202 +++
 drivers/misc/ocxl/config.c | 723 +
 drivers/misc/ocxl/context.c| 279 ++
 drivers/misc/ocxl/file.c   | 432 +++
 drivers/misc/ocxl/link.c   | 647 ++
 drivers/misc/ocxl/main.c   |  33 ++
 drivers/misc/ocxl/ocxl_internal.h  | 131 +
 drivers/misc/ocxl/pasid.c  | 107 
 drivers/misc/ocxl/pci.c| 585 
 drivers/misc/ocxl/sysfs.c  | 142 +
 drivers/misc/ocxl/trace.c  |   6 +
 drivers/misc/ocxl/trace.h  | 182 +++
 include/misc/ocxl-config.h |  45 ++
 include/misc/ocxl.h| 214 
 include/uapi/misc/ocxl.h   |  49 ++
 34 files changed, 4650 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-ocxl
 create mode 100644 Documentation/accelerators/ocxl.rst
 create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
 create mode 100644 arch/powerpc/platforms/powernv/ocxl.c
 create mode 100644 drivers/misc/ocxl/Kconfig
 create mode 100644 drivers/misc/ocxl/Makefile
 create mode 100644 drivers

[PATCH v2 02/13] powerpc/powernv: Set correct configuration space size for opencapi devices

2018-01-23 Thread Frederic Barrat
From: Andrew Donnellan 

The configuration space for opencapi devices doesn't have a PCI
Express capability, which confuses Linux into thinking the device is of
an old PCI type with a 256-byte configuration space, instead of the
desired 4k. So add a PCI fixup to declare the correct size.

Signed-off-by: Andrew Donnellan 
Signed-off-by: Frederic Barrat 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index e780263a14ee..d5af700820f3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -4080,6 +4080,19 @@ void __init pnv_pci_init_npu2_opencapi_phb(struct device_node *np)
pnv_pci_init_ioda_phb(np, 0, PNV_PHB_NPU_OCAPI);
 }
 
+static void pnv_npu2_opencapi_cfg_size_fixup(struct pci_dev *dev)
+{
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+   struct pnv_phb *phb = hose->private_data;
+
+   if (!machine_is(powernv))
+       return;
+
+   if (phb->type == PNV_PHB_NPU_OCAPI)
+       dev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pnv_npu2_opencapi_cfg_size_fixup);
+
 void __init pnv_pci_init_ioda_hub(struct device_node *np)
 {
struct device_node *phbn;
-- 
2.14.1



[PATCH v2 03/13] powerpc/powernv: Add opal calls for opencapi

2018-01-23 Thread Frederic Barrat
Add opal calls to interact with the NPU:

OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
The SPA is a table containing one entry (Process Element) per memory
context which can be accessed by the opencapi device.

OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must
be cleared.

OPAL_NPU_TL_SET: configure the Transaction Layer
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and
at what rates those messages can be sent.

Signed-off-by: Frederic Barrat 
Acked-by: Andrew Donnellan 
---
 arch/powerpc/include/asm/opal-api.h| 5 -
 arch/powerpc/include/asm/opal.h| 6 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S | 3 +++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 233c7504b1f2..24c73f5575ee 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -201,7 +201,10 @@
 #define OPAL_SET_POWER_SHIFT_RATIO      155
 #define OPAL_SENSOR_GROUP_CLEAR         156
 #define OPAL_PCI_SET_P2P                157
-#define OPAL_LAST                       157
+#define OPAL_NPU_SPA_SETUP              159
+#define OPAL_NPU_SPA_CLEAR_CACHE        160
+#define OPAL_NPU_TL_SET                 161
+#define OPAL_LAST                       161
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0c545f7fc77b..12e70fb58700 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -34,6 +34,12 @@ int64_t opal_npu_init_context(uint64_t phb_id, int pasid, uint64_t msr,
uint64_t bdf);
 int64_t opal_npu_map_lpar(uint64_t phb_id, uint64_t bdf, uint64_t lparid,
uint64_t lpcr);
+int64_t opal_npu_spa_setup(uint64_t phb_id, uint32_t bdfn,
+   uint64_t addr, uint64_t PE_mask);
+int64_t opal_npu_spa_clear_cache(uint64_t phb_id, uint32_t bdfn,
+   uint64_t PE_handle);
+int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t bdfn, long cap,
+   uint64_t rate_phys, uint32_t size);
 int64_t opal_console_write(int64_t term_number, __be64 *length,
   const uint8_t *buffer);
 int64_t opal_console_read(int64_t term_number, __be64 *length,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6f4b00a2ac46..1b2936ba6040 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -320,3 +320,6 @@ OPAL_CALL(opal_set_powercap, OPAL_SET_POWERCAP);
 OPAL_CALL(opal_get_power_shift_ratio,   OPAL_GET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_set_power_shift_ratio,   OPAL_SET_POWER_SHIFT_RATIO);
 OPAL_CALL(opal_sensor_group_clear,      OPAL_SENSOR_GROUP_CLEAR);
+OPAL_CALL(opal_npu_spa_setup,           OPAL_NPU_SPA_SETUP);
+OPAL_CALL(opal_npu_spa_clear_cache,     OPAL_NPU_SPA_CLEAR_CACHE);
+OPAL_CALL(opal_npu_tl_set,              OPAL_NPU_TL_SET);
-- 
2.14.1



[PATCH v2 04/13] powerpc/powernv: Add platform-specific services for opencapi

2018-01-23 Thread Frederic Barrat
Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
  driver can find some common ground and configure the device and host
  appropriately.

- provide the hw interrupt to be used for translation faults raised by
  the NPU

- map/unmap some NPU mmio registers to get the fault context when the
  NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.
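
As an illustration of the 4-bit-per-template rate encoding used by the TL
capability buffer, here is a minimal standalone C sketch. It mirrors the
index and shift arithmetic of set_templ_rate() in this patch; the model_
and MODEL_ prefixed names are hypothetical, and the range check with an
error return is an addition made only for this example.

```c
#include <stdint.h>

#define MODEL_TL_MAX_TEMPLATE   63
#define MODEL_TL_BITS_PER_RATE  4
#define MODEL_TL_RATE_BUF_SIZE \
	((MODEL_TL_MAX_TEMPLATE + 1) * MODEL_TL_BITS_PER_RATE / 8)

/*
 * Pack a 4-bit receiving rate for template 'templ' into buf.
 * Template 63 lands in the high nibble of byte 0, template 0 in the
 * low nibble of the last byte. Returns 0 on success, -1 on bad input.
 */
static int model_set_templ_rate(unsigned int templ, unsigned int rate,
				uint8_t buf[MODEL_TL_RATE_BUF_SIZE])
{
	unsigned int pos, idx, shift;

	if (templ > MODEL_TL_MAX_TEMPLATE || rate > 0xF)
		return -1;

	pos = MODEL_TL_MAX_TEMPLATE - templ;	/* distance from template 63 */
	idx = pos / 2;				/* two templates per byte */
	shift = 4 * (1 - (pos % 2));		/* even pos: high nibble */
	buf[idx] |= (uint8_t)(rate << shift);
	return 0;
}
```

For the P9 defaults described in the patch (template 2 with rate 1), this
sets the low nibble of byte 30 in the 32-byte buffer.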

Signed-off-by: Frederic Barrat 
---
 arch/powerpc/include/asm/pnv-ocxl.h |  29 +
 arch/powerpc/platforms/powernv/Makefile |   1 +
 arch/powerpc/platforms/powernv/ocxl.c   | 180 
 3 files changed, 210 insertions(+)
 create mode 100644 arch/powerpc/include/asm/pnv-ocxl.h
 create mode 100644 arch/powerpc/platforms/powernv/ocxl.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
new file mode 100644
index ..36868d49aeed
--- /dev/null
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#ifndef _ASM_PNV_OCXL_H
+#define _ASM_PNV_OCXL_H
+
+#include 
+
+#define PNV_OCXL_TL_MAX_TEMPLATE        63
+#define PNV_OCXL_TL_BITS_PER_RATE       4
+#define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
+
+extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+   char *rate_buf, int rate_buf_size);
+extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+   uint64_t rate_buf_phys, int rate_buf_size);
+
+extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
+extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+   void __iomem *tfc, void __iomem *pe_handle);
+extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+   void __iomem **dar, void __iomem **tfc,
+   void __iomem **pe_handle);
+
+extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
+   void **platform_data);
+extern void pnv_ocxl_spa_release(void *platform_data);
+extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+
+#endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 3732118a0482..6c9d5199a7e2 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_PERF_EVENTS) += opal-imc.o
 obj-$(CONFIG_PPC_MEMTRACE) += memtrace.o
 obj-$(CONFIG_PPC_VAS)  += vas.o vas-window.o vas-debug.o
 obj-$(CONFIG_PPC_FTW)  += nx-ftw.o
+obj-$(CONFIG_OCXL_BASE)+= ocxl.o
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
new file mode 100644
index ..d61186805a07
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include 
+#include "pci.h"
+
+#define PNV_OCXL_TL_P9_RECV_CAP         0x000Full
+/* PASIDs are 20-bit, but on P9, NPU can only handle 15 bits */
+#define PNV_OCXL_PASID_BITS             15
+#define PNV_OCXL_PASID_MAX              ((1 << PNV_OCXL_PASID_BITS) - 1)
+
+
+static void set_templ_rate(unsigned int templ, unsigned int rate, char *buf)
+{
+   int shift, idx;
+
+   WARN_ON(templ > PNV_OCXL_TL_MAX_TEMPLATE);
+   idx = (PNV_OCXL_TL_MAX_TEMPLATE - templ) / 2;
+   shift = 4 * (1 - ((PNV_OCXL_TL_MAX_TEMPLATE - templ) % 2));
+   buf[idx] |= rate << shift;
+}
+
+int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+   char *rate_buf, int rate_buf_size)
+{
+   if (rate_buf_size != PNV_OCXL_TL_RATE_BUF_SIZE)
+   return -EINVAL;
+   /*
+* The TL capabilities are a characteristic of the NPU, so
+* we go with hard-coded values.
+*
+* The receiving rate of each template is encoded on 4 bits.
+*
+* On P9:
+* - templates 0 -> 3 are supported
+* - templates 0, 1 and 3 have a 0 receiving rate
+* - template 2 has receiving rate of 1 (extra cycle)
+*/
+   memset(rate_buf, 0, rate_buf_size);
+   set_templ_rate(2, 1, rate_buf);
+   *cap = PNV_OCXL_TL_P9_RECV_CAP;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_get_tl_cap);
+
+int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+   uint64_t rate_buf_phys, int rate_buf_size)
+{
+   struct pci_controller *hose = pci_bus_to_host(dev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   int rc;
+
+   if (rate_buf_size != PNV_OCXL_TL_RATE_BUF_SIZE)
+   return -EINVAL;
+
+   rc = opal_npu_tl_set(phb->opal_id, dev->devfn, cap,
+   rate_buf_phys

[PATCH v2 05/13] powerpc/powernv: Capture actag information for the device

2018-01-23 Thread Frederic Barrat
In the opencapi protocol, host memory contexts are referenced by an
'actag'. During setup, a driver must tell the device how many actags
it can use, and what values are acceptable.

On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.

This is not powernv specific per se, but rather a consequence of the
opencapi configuration specification being quite general. The limited
number of available actags on POWER9 makes it more likely to be hit.
This is somewhat mitigated by the fact that existing AFUs request a
reasonable count of actags and existing devices carry only one AFU.
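
To illustrate what sharing actags "fairly" could look like, here is a
minimal standalone C sketch of a pro-rata split across the functions of
one link. It is not the implementation in this patch: the model_ and
MODEL_ prefixed names are hypothetical, and the rounding-down policy is an
assumption chosen for simplicity.

```c
#include <stdint.h>

#define MODEL_ACTAG_MAX 64	/* actags per link on POWER9, per the text above */
#define MODEL_NUM_FN    8	/* PCI functions per device */

struct model_actag_range {
	uint16_t start;
	uint16_t count;
};

/*
 * Assign a contiguous actag range to each function. If the total
 * desired count fits in MODEL_ACTAG_MAX, every function gets what it
 * asked for; otherwise each request is scaled down proportionally.
 */
static void model_assign_actags(const uint16_t desired[MODEL_NUM_FN],
				struct model_actag_range out[MODEL_NUM_FN])
{
	unsigned int total = 0, next = 0;
	int i;

	for (i = 0; i < MODEL_NUM_FN; i++)
		total += desired[i];

	for (i = 0; i < MODEL_NUM_FN; i++) {
		unsigned int count = desired[i];

		if (total > MODEL_ACTAG_MAX)
			count = count * MODEL_ACTAG_MAX / total; /* rounds down */

		out[i].start = (uint16_t)next;
		out[i].count = (uint16_t)count;
		next += count;
	}
}
```

Rounding down can leave a few actags unassigned and can give a small
requester zero actags when the link is heavily oversubscribed; a real
implementation would need a policy for both cases.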

Signed-off-by: Frederic Barrat 
---
 arch/powerpc/include/asm/pnv-ocxl.h   |   4 +
 arch/powerpc/platforms/powernv/ocxl.c | 305 ++
 include/misc/ocxl-config.h|  45 +
 3 files changed, 354 insertions(+)
 create mode 100644 include/misc/ocxl-config.h

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 36868d49aeed..398d05b30600 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -9,6 +9,10 @@
 #define PNV_OCXL_TL_BITS_PER_RATE   4
 #define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled,
+   u16 *supported);
+extern int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
+
 extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
char *rate_buf, int rate_buf_size);
 extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index d61186805a07..1faaa4ef6903 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -2,13 +2,318 @@
 // Copyright 2017 IBM Corp.
 #include 
 #include 
+#include 
 #include "pci.h"
 
 #define PNV_OCXL_TL_P9_RECV_CAP         0x000Full
+#define PNV_OCXL_ACTAG_MAX              64
 /* PASIDs are 20-bit, but on P9, NPU can only handle 15 bits */
 #define PNV_OCXL_PASID_BITS             15
 #define PNV_OCXL_PASID_MAX              ((1 << PNV_OCXL_PASID_BITS) - 1)
 
+#define AFU_PRESENT (1 << 31)
+#define AFU_INDEX_MASK 0x3F000000
+#define AFU_INDEX_SHIFT 24
+#define ACTAG_MASK 0xFFF
+
+
+struct actag_range {
+   u16 start;
+   u16 count;
+};
+
+struct npu_link {
+   struct list_head list;
+   int domain;
+   int bus;
+   int dev;
+   u16 fn_desired_actags[8];
+   struct actag_range fn_actags[8];
+   bool assignment_done;
+};
+static struct list_head links_list = LIST_HEAD_INIT(links_list);
+static DEFINE_MUTEX(links_list_lock);
+
+
+/*
+ * opencapi actags handling:
+ *
+ * When sending commands, the opencapi device references the memory
+ * context it's targeting with an 'actag', which is really an alias
+ * for a (BDF, pasid) combination. When it receives a command, the NPU
+ * must do a lookup of the actag to identify the memory context. The
+ * hardware supports a finite number of actags per link (64 for
+ * POWER9).
+ *
+ * The device can carry multiple functions, and each function can have
+ * multiple AFUs. Each AFU advertises in its config space the number
+ * of desired actags. The host must configure in the config space of
+ * the AFU how many actags the AFU is really allowed to use (which can
+ * be less than what the AFU desires).
+ *
+ * When a PCI function is probed by the driver, it has no visibility
+ * about the other PCI functions and how many actags they'd like,
+ * which makes it impossible to distribute actags fairly among AFUs.
+ *
+ * Unfortunately, the only way to know how many actags a function
+ * desires is by looking at the data for each AFU in the config space
+ * and add them up. Similarly, the only way to know how many actags
+ * all the functions of the physical device desire is by adding the
+ * previously computed function counts. Then we can match that against
+ * what the hardware supports.
+ *
+ * To get a comprehensive view, we use a 'pci fixup': at the end of
+ * PCI enumeration, each function counts how many actags its AFUs
+ * desire and we save it in a 'npu_link' structure, shared between all
+ * the PCI functions of the same device. Therefore, when the first
+ * function is probed by the driver, we can get an idea of the total
+ * count of desired actags for the device, and assign the actags to
+ * the AFUs, by pro-rating if needed.
+ */
+
+static int find_dvsec_from_pos(struct pci_dev *dev, int dvsec_id, int pos)
+{
+   int vsec = pos;
+   u16 vendor, id;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+

[PATCH v2 06/13] ocxl: Driver code for 'generic' opencapi devices

2018-01-23 Thread Frederic Barrat
Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver; any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, having translation faults handled
or allocating AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/.

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c| 712 ++
 drivers/misc/ocxl/context.c   | 230 
 drivers/misc/ocxl/file.c  | 398 +
 drivers/misc/ocxl/link.c  | 603 
 drivers/misc/ocxl/main.c  |  33 ++
 drivers/misc/ocxl/ocxl_internal.h | 193 +++
 drivers/misc/ocxl/pasid.c | 107 ++
 drivers/misc/ocxl/pci.c   | 585 +++
 drivers/misc/ocxl/sysfs.c | 142 
 include/uapi/misc/ocxl.h  |  40 +++
 10 files changed, 3043 insertions(+)
 create mode 100644 drivers/misc/ocxl/config.c
 create mode 100644 drivers/misc/ocxl/context.c
 create mode 100644 drivers/misc/ocxl/file.c
 create mode 100644 drivers/misc/ocxl/link.c
 create mode 100644 drivers/misc/ocxl/main.c
 create mode 100644 drivers/misc/ocxl/ocxl_internal.h
 create mode 100644 drivers/misc/ocxl/pasid.c
 create mode 100644 drivers/misc/ocxl/pci.c
 create mode 100644 drivers/misc/ocxl/sysfs.c
 create mode 100644 include/uapi/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
new file mode 100644
index ..ea8cca50ea06
--- /dev/null
+++ b/drivers/misc/ocxl/config.c
@@ -0,0 +1,712 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include 
+#include 
+#include "ocxl_internal.h"
+
+#define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
+#define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
+
+#define OCXL_DVSEC_AFU_IDX_MASK  GENMASK(5, 0)
+#define OCXL_DVSEC_ACTAG_MASKGENMASK(11, 0)
+#define OCXL_DVSEC_PASID_MASKGENMASK(19, 0)
+#define OCXL_DVSEC_PASID_LOG_MASKGENMASK(4, 0)
+
+#define OCXL_DVSEC_TEMPL_VERSION 0x0
+#define OCXL_DVSEC_TEMPL_NAME0x4
+#define OCXL_DVSEC_TEMPL_AFU_VERSION 0x1C
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL 0x20
+#define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ  0x28
+#define OCXL_DVSEC_TEMPL_MMIO_PP 0x30
+#define OCXL_DVSEC_TEMPL_MMIO_PP_SZ  0x38
+#define OCXL_DVSEC_TEMPL_MEM_SZ  0x3C
+#define OCXL_DVSEC_TEMPL_WWID0x40
+
+#define OCXL_MAX_AFU_PER_FUNCTION 64
+#define OCXL_TEMPL_LEN0x58
+#define OCXL_TEMPL_NAME_LEN   24
+#define OCXL_CFG_TIMEOUT 3
+
+static int find_dvsec(struct pci_dev *dev, int dvsec_id)
+{
+   int vsec = 0;
+   u16 vendor, id;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+   OCXL_EXT_CAP_ID_DVSEC))) {
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+   &vendor);
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+   if (vendor == PCI_VENDOR_ID_IBM && id == dvsec_id)
+   return vsec;
+   }
+   return 0;
+}
+
+static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 afu_idx)
+{
+   int vsec = 0;
+   u16 vendor, id;
+   u8 idx;
+
+   while ((vsec = pci_find_next_ext_capability(dev, vsec,
+   OCXL_EXT_CAP_ID_DVSEC))) {
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_VENDOR_OFFSET,
+   &vendor);
+   pci_read_config_word(dev, vsec + OCXL_DVSEC_ID_OFFSET, &id);
+
+   if (vendor == PCI_VENDOR_ID_IBM &&
+   id == OCXL_DVSEC_AFU_CTRL_ID) {
+   pci_read_config_byte(dev,
+   vsec + OCXL_DVSEC_AFU_CTRL_AFU_IDX,
+   &idx);
+   if (idx == afu_idx)
+   return vsec;
+   }
+   }
+   return 0;
+}
+
+static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
+{
+   u16 val;
+   int pos;
+
+   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_PASID);
+   if (!pos) {
+   /*
+* PASID capability is not mandatory, but there
+* shouldn't be any AFU
+*/
+   dev_dbg(&dev->dev, "Function doesn't require any PASID\n");
+   fn->max_pasid_log = -1;
+   got

[PATCH v2 07/13] ocxl: Add AFU interrupt support

2018-01-23 Thread Frederic Barrat
Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.
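
As an illustration of the "be notified" part: the handler in this patch
calls eventfd_signal(irq->ev_ctx, 1), so userspace sees each interrupt as
an increment of an eventfd counter. Below is a minimal userspace sketch
of that mechanism only. The ioctl that hands the eventfd to the driver is
not shown here, and fake_irq_fire() and wait_for_irq() are hypothetical
helpers, not part of the driver.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical stand-in for the kernel side. The real handler calls
 * eventfd_signal(ctx, 1); here we add 1 to the counter from userspace.
 */
static int fake_irq_fire(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: block until at least one notification is pending, then
 * return how many accumulated since the last read (the read resets the
 * counter to zero). Returns -1 on error.
 */
static int64_t wait_for_irq(int efd)
{
	uint64_t count;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

An application would create the descriptor with eventfd(0, 0), pass it to
the driver, and then call wait_for_irq() (or poll the fd) from its event
loop.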

For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process; a write to that page triggers an interrupt.
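
To make the page mapping concrete: afu_irq.c gives each interrupt one
page of mmap offset space, starting at the AFU's irq_base_offset, and
converts between interrupt id and offset with a shift. The following is
a standalone sketch of that arithmetic. SKETCH_PAGE_SHIFT is an assumed
value (64K pages), and unlike the patch the id is widened before the
shift so a large id cannot overflow an int.

```c
#include <stdint.h>

/* Assumption for the sketch: 64K pages. The kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 16

/* One page per interrupt, laid out from irq_base_offset upward. */
static uint64_t irq_id_to_offset(uint64_t irq_base_offset, int id)
{
	return irq_base_offset + ((uint64_t)id << SKETCH_PAGE_SHIFT);
}

/* Inverse mapping: which interrupt does this mmap offset belong to? */
static int irq_offset_to_id(uint64_t irq_base_offset, uint64_t offset)
{
	return (int)((offset - irq_base_offset) >> SKETCH_PAGE_SHIFT);
}
```

The two helpers are inverses for any offset produced by
irq_id_to_offset(), which is what lets the fault path find the interrupt
from the faulting offset.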

Signed-off-by: Frederic Barrat 
---
 arch/powerpc/include/asm/pnv-ocxl.h   |   3 +
 arch/powerpc/platforms/powernv/ocxl.c |  30 ++
 drivers/misc/ocxl/afu_irq.c   | 197 ++
 drivers/misc/ocxl/context.c   |  51 -
 drivers/misc/ocxl/file.c  |  34 ++
 drivers/misc/ocxl/link.c  |  28 +
 drivers/misc/ocxl/ocxl_internal.h |   7 ++
 include/uapi/misc/ocxl.h  |   9 ++
 8 files changed, 357 insertions(+), 2 deletions(-)
 create mode 100644 drivers/misc/ocxl/afu_irq.c

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index 398d05b30600..f6945d3bc971 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -30,4 +30,7 @@ extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void 
*spa_mem, int PE_mask,
 extern void pnv_ocxl_spa_release(void *platform_data);
 extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
 
+extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
+extern void pnv_ocxl_free_xive_irq(u32 irq);
+
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index 1faaa4ef6903..fa9b53af3c7b 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -2,6 +2,7 @@
 // Copyright 2017 IBM Corp.
 #include 
 #include 
+#include 
 #include 
 #include "pci.h"
 
@@ -483,3 +484,32 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int 
pe_handle)
return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
+
+int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
+{
+   __be64 flags, trigger_page;
+   s64 rc;
+   u32 hwirq;
+
+   hwirq = xive_native_alloc_irq();
+   if (!hwirq)
+   return -ENOENT;
+
+   rc = opal_xive_get_irq_info(hwirq, &flags, NULL, &trigger_page, NULL,
+   NULL);
+   if (rc || !trigger_page) {
+   xive_native_free_irq(hwirq);
+   return -ENOENT;
+   }
+   *irq = hwirq;
+   *trigger_addr = be64_to_cpu(trigger_page);
+   return 0;
+
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_alloc_xive_irq);
+
+void pnv_ocxl_free_xive_irq(u32 irq)
+{
+   xive_native_free_irq(irq);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_free_xive_irq);
diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
new file mode 100644
index ..f40d853de401
--- /dev/null
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include 
+#include 
+#include "ocxl_internal.h"
+
+struct afu_irq {
+   int id;
+   int hw_irq;
+   unsigned int virq;
+   char *name;
+   u64 trigger_page;
+   struct eventfd_ctx *ev_ctx;
+};
+
+static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
+{
+   return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
+}
+
+static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
+{
+   return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
+}
+
+static irqreturn_t afu_irq_handler(int virq, void *data)
+{
+   struct afu_irq *irq = (struct afu_irq *) data;
+
+   if (irq->ev_ctx)
+   eventfd_signal(irq->ev_ctx, 1);
+   return IRQ_HANDLED;
+}
+
+static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
+{
+   int rc;
+
+   irq->virq = irq_create_mapping(NULL, irq->hw_irq);
+   if (!irq->virq) {
+   pr_err("irq_create_mapping failed\n");
+   return -ENOMEM;
+   }
+   pr_debug("hw_irq %d mapped to virq %u\n", irq->hw_irq, irq->virq);
+
+   irq->name = kasprintf(GFP_KERNEL, "ocxl-afu-%u", irq->virq);
+   if (!irq->name) {
+   irq_dispose_mapping(irq->virq);
+   return -ENOMEM;
+   }
+
+   rc = request_irq(irq->virq, afu_irq_handler, 0, irq->name, irq);
+   if (rc) {
+   kfree(irq->name);
+   irq->name = NULL;
+   irq_dispose_mapping(irq->virq);
+   pr_err("request_irq failed: %d\n", rc);
+   return rc;
+   }
+   return 0;
+}
+
+static void release_afu_irq(struct afu_irq *irq)
+{
+   free_irq(irq->virq, irq);
+   irq_dispose_mapping(irq->virq);
+   kfree(irq->name);
+}
+
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
+{
+   struct afu_irq *irq;
+   int rc;
+
+   irq = kzalloc(sizeof(struct afu_irq), GFP_KERNEL);
+   if (!irq)
+   return -ENOMEM;
+
+   /*
+* We limit the numb

[PATCH v2 08/13] ocxl: Add a kernel API for other opencapi drivers

2018-01-23 Thread Frederic Barrat
Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...

So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling an in-kernel library.

It is still a bit theoretical, for lack of real hardware, and will
likely need adjustments down the road. But we used the cxlflash
driver as a guinea pig.

Signed-off-by: Frederic Barrat 
---
 drivers/misc/ocxl/config.c|  13 ++-
 drivers/misc/ocxl/link.c  |   7 ++
 drivers/misc/ocxl/ocxl_internal.h |  71 +
 include/misc/ocxl.h   | 214 ++
 4 files changed, 234 insertions(+), 71 deletions(-)
 create mode 100644 include/misc/ocxl.h

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index ea8cca50ea06..2e30de9c694a 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -2,8 +2,8 @@
 // Copyright 2017 IBM Corp.
 #include 
 #include 
+#include 
 #include 
-#include "ocxl_internal.h"
 
 #define EXTRACT_BIT(val, bit) (!!(val & BIT(bit)))
 #define EXTRACT_BITS(val, s, e) ((val & GENMASK(e, s)) >> s)
@@ -243,6 +243,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct 
ocxl_fn_config *fn)
rc = validate_function(dev, fn);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_read_function);
 
 static int read_afu_info(struct pci_dev *dev, struct ocxl_fn_config *fn,
int offset, u32 *data)
@@ -301,6 +302,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
}
return 1;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_check_afu_index);
 
 static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
struct ocxl_afu_config *afu)
@@ -498,6 +500,7 @@ int ocxl_config_read_afu(struct pci_dev *dev, struct 
ocxl_fn_config *fn,
rc = validate_afu(dev, afu);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_read_afu);
 
 int ocxl_config_get_actag_info(struct pci_dev *dev, u16 *base, u16 *enabled,
u16 *supported)
@@ -516,6 +519,7 @@ int ocxl_config_get_actag_info(struct pci_dev *dev, u16 
*base, u16 *enabled,
}
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_get_actag_info);
 
 void ocxl_config_set_afu_actag(struct pci_dev *dev, int pos, int actag_base,
int actag_count)
@@ -528,11 +532,13 @@ void ocxl_config_set_afu_actag(struct pci_dev *dev, int 
pos, int actag_base,
val = actag_base & OCXL_DVSEC_ACTAG_MASK;
pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_ACTAG_BASE, val);
 }
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_actag);
 
 int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count)
 {
return pnv_ocxl_get_pasid_count(dev, count);
 }
+EXPORT_SYMBOL_GPL(ocxl_config_get_pasid_info);
 
 void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
u32 pasid_count_log)
@@ -550,6 +556,7 @@ void ocxl_config_set_afu_pasid(struct pci_dev *dev, int 
pos, int pasid_base,
pci_write_config_dword(dev, pos + OCXL_DVSEC_AFU_CTRL_PASID_BASE,
val32);
 }
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_pasid);
 
 void ocxl_config_set_afu_state(struct pci_dev *dev, int pos, int enable)
 {
@@ -562,6 +569,7 @@ void ocxl_config_set_afu_state(struct pci_dev *dev, int 
pos, int enable)
val &= 0xFE;
pci_write_config_byte(dev, pos + OCXL_DVSEC_AFU_CTRL_ENABLE, val);
 }
+EXPORT_SYMBOL_GPL(ocxl_config_set_afu_state);
 
 int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
 {
@@ -660,6 +668,7 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
kfree(recv_rate);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_set_TL);
 
 int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int 
pasid)
 {
@@ -699,6 +708,7 @@ int ocxl_config_terminate_pasid(struct pci_dev *dev, int 
afu_control, int pasid)
}
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_config_terminate_pasid);
 
 void ocxl_config_set_actag(struct pci_dev *dev, int func_dvsec, u32 tag_first,
u32 tag_count)
@@ -710,3 +720,4 @@ void ocxl_config_set_actag(struct pci_dev *dev, int 
func_dvsec, u32 tag_first,
pci_write_config_dword(dev, func_dvsec + OCXL_DVSEC_FUNC_OFF_ACTAG,
val);
 }
+EXPORT_SYMBOL_GPL(ocxl_config_set_actag);
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 8bdcef9c3cba..fbca3feec592 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ocxl_internal.h"
 
 
@@ -420,6 +421,7 @@ int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void 
**link_handle)
mutex_unlock(&links_list_lock);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_link_setup);
 
 static void release_xsl(struct kref *ref)

[PATCH v2 09/13] ocxl: Add trace points

2018-01-23 Thread Frederic Barrat
Define a few trace points so that we can use the standard tracing
mechanism for debug and/or monitoring.

Signed-off-by: Frederic Barrat 
---
 drivers/misc/ocxl/afu_irq.c |   5 ++
 drivers/misc/ocxl/context.c |   2 +
 drivers/misc/ocxl/link.c|  11 ++-
 drivers/misc/ocxl/trace.c   |   6 ++
 drivers/misc/ocxl/trace.h   | 182 
 5 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/ocxl/trace.c
 create mode 100644 drivers/misc/ocxl/trace.h

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index f40d853de401..e70cfa24577f 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include "ocxl_internal.h"
+#include "trace.h"
 
 struct afu_irq {
int id;
@@ -28,6 +29,7 @@ static irqreturn_t afu_irq_handler(int virq, void *data)
 {
struct afu_irq *irq = (struct afu_irq *) data;
 
+   trace_ocxl_afu_irq_receive(virq);
if (irq->ev_ctx)
eventfd_signal(irq->ev_ctx, 1);
return IRQ_HANDLED;
@@ -102,6 +104,8 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 
*irq_offset)
 
*irq_offset = irq_id_to_offset(ctx, irq->id);
 
+   trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq,
+   *irq_offset);
mutex_unlock(&ctx->irq_lock);
return 0;
 
@@ -117,6 +121,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 
*irq_offset)
 
 static void afu_irq_free(struct afu_irq *irq, struct ocxl_context *ctx)
 {
+   trace_ocxl_afu_irq_free(ctx->pasid, irq->id);
if (ctx->mapping)
unmap_mapping_range(ctx->mapping,
irq_id_to_offset(ctx, irq->id),
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 269149490063..909e8807824a 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 // Copyright 2017 IBM Corp.
 #include 
+#include "trace.h"
 #include "ocxl_internal.h"
 
 struct ocxl_context *ocxl_context_alloc(void)
@@ -214,6 +215,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
mutex_lock(&ctx->afu->afu_control_lock);
rc = ocxl_config_terminate_pasid(dev, afu_control_pos, ctx->pasid);
mutex_unlock(&ctx->afu->afu_control_lock);
+   trace_ocxl_terminate_pasid(ctx->pasid, rc);
if (rc) {
/*
 * If we timeout waiting for the AFU to terminate the
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fbca3feec592..f30790582dc0 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include "ocxl_internal.h"
+#include "trace.h"
 
 
 #define SPA_PASID_BITS 15
@@ -116,8 +117,11 @@ static void ack_irq(struct spa *spa, enum xsl_response r)
else
WARN(1, "Invalid irq response %d\n", r);
 
-   if (reg)
+   if (reg) {
+   trace_ocxl_fault_ack(spa->spa_mem, spa->xsl_fault.pe,
+   spa->xsl_fault.dsisr, spa->xsl_fault.dar, reg);
out_be64(spa->reg_tfc, reg);
+   }
 }
 
 static void xsl_fault_handler_bh(struct work_struct *fault_work)
@@ -182,6 +186,7 @@ static irqreturn_t xsl_fault_handler(int irq, void *data)
int lpid, pid, tid;
 
read_irq(spa, &dsisr, &dar, &pe_handle);
+   trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);
 
WARN_ON(pe_handle > SPA_PE_MASK);
pe = spa->spa_mem + pe_handle;
@@ -532,6 +537,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 * the problem.
 */
mmgrab(mm);
+   trace_ocxl_context_add(current->pid, spa->spa_mem, pasid, pidr, tidr);
 unlock:
mutex_unlock(&spa->spa_lock);
return rc;
@@ -577,6 +583,9 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
goto unlock;
}
 
+   trace_ocxl_context_remove(current->pid, spa->spa_mem, pasid,
+   be32_to_cpu(pe->pid), be32_to_cpu(pe->tid));
+
memset(pe, 0, sizeof(struct ocxl_process_element));
/*
 * The barrier makes sure the PE is removed from the SPA
diff --git a/drivers/misc/ocxl/trace.c b/drivers/misc/ocxl/trace.c
new file mode 100644
index ..1e6947049697
--- /dev/null
+++ b/drivers/misc/ocxl/trace.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#ifndef __CHECKER__
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+#endif
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
new file mode 100644
index ..bcb7ff330c1e
--- /dev/null
+++ b/drivers/misc/ocxl/trace.h
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ocxl
+
+#if !defined(_TRACE_OCXL_H) || defined(TRACE_HEADER_MULT

[PATCH v2 10/13] ocxl: Add Makefile and Kconfig

2018-01-23 Thread Frederic Barrat
OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
---
 drivers/misc/Kconfig   |  1 +
 drivers/misc/Makefile  |  1 +
 drivers/misc/ocxl/Kconfig  | 31 +++
 drivers/misc/ocxl/Makefile | 11 +++
 4 files changed, 44 insertions(+)
 create mode 100644 drivers/misc/ocxl/Kconfig
 create mode 100644 drivers/misc/ocxl/Makefile

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index f1a5c2357b14..0534f338c84a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -508,4 +508,5 @@ source "drivers/misc/mic/Kconfig"
 source "drivers/misc/genwqe/Kconfig"
 source "drivers/misc/echo/Kconfig"
 source "drivers/misc/cxl/Kconfig"
+source "drivers/misc/ocxl/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 5ca5f64df478..73326d54e246 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_CXL_BASE)+= cxl/
 obj-$(CONFIG_ASPEED_LPC_CTRL)  += aspeed-lpc-ctrl.o
 obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
 obj-$(CONFIG_PCI_ENDPOINT_TEST)+= pci_endpoint_test.o
+obj-$(CONFIG_OCXL) += ocxl/
 
 lkdtm-$(CONFIG_LKDTM)  += lkdtm_core.o
 lkdtm-$(CONFIG_LKDTM)  += lkdtm_bugs.o
diff --git a/drivers/misc/ocxl/Kconfig b/drivers/misc/ocxl/Kconfig
new file mode 100644
index ..4bbdb0d3c8ee
--- /dev/null
+++ b/drivers/misc/ocxl/Kconfig
@@ -0,0 +1,31 @@
+#
+# Open Coherent Accelerator (OCXL) compatible devices
+#
+
+config OCXL_BASE
+   bool
+   default n
+   select PPC_COPRO_BASE
+
+config OCXL
+   tristate "OpenCAPI coherent accelerator support"
+   depends on PPC_POWERNV && PCI && EEH
+   select OCXL_BASE
+   default m
+   help
+ Select this option to enable the ocxl driver for Open
+ Coherent Accelerator Processor Interface (OpenCAPI) devices.
+
+ OpenCAPI allows FPGA and ASIC accelerators to be coherently
+ attached to a CPU over an OpenCAPI link.
+
+ The ocxl driver enables userspace programs to access these
+ accelerators through devices in /dev/ocxl/.
+
+ For more information, see http://opencapi.org.
+
+ This is not to be confused with the support for IBM CAPI
+ accelerators (CONFIG_CXL), which are PCI-based instead of a
+ dedicated OpenCAPI link, and don't follow the same protocol.
+
+ If unsure, say N.
diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
new file mode 100644
index ..5229dcda8297
--- /dev/null
+++ b/drivers/misc/ocxl/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0+
+ccflags-$(CONFIG_PPC_WERROR)   += -Werror
+
+ocxl-y += main.o pci.o config.o file.o pasid.o
+ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+obj-$(CONFIG_OCXL) += ocxl.o
+
+# For tracepoints to include our trace.h from tracepoint infrastructure:
+CFLAGS_trace.o := -I$(src)
+
+# ccflags-y += -DDEBUG
-- 
2.14.1



[PATCH v2 11/13] cxl: Remove support for "Processing accelerators" class

2018-01-23 Thread Frederic Barrat
The cxl driver currently declares in its table of supported PCI
devices the class "Processing accelerators". Therefore it may be
called to probe for opencapi devices, which generates errors, as the
config space of a cxl device is not compatible with opencapi.

So remove support for the generic class, as we now have (at least) two
drivers for devices of the same class. Most cxl devices are FPGAs with
a PSL which will show a known device ID of 0x477. Other devices are
really supported by the cxlflash driver and are already listed in the
table. So removing the class is expected to go unnoticed.

Signed-off-by: Frederic Barrat 
Acked-by: Andrew Donnellan 
---
 drivers/misc/cxl/pci.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 19969ee86d6f..758842f65a1b 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -125,8 +125,6 @@ static const struct pci_device_id cxl_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0601), },
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0623), },
{ PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0628), },
-   { PCI_DEVICE_CLASS(0x12, ~0), },
-
{ }
 };
 MODULE_DEVICE_TABLE(pci, cxl_pci_tbl);
-- 
2.14.1



[PATCH v2 12/13] ocxl: Documentation

2018-01-23 Thread Frederic Barrat
ocxl.rst gives a quick, high-level view of opencapi.

Update ioctl-number.txt to reflect the ioctl numbers used by the
ocxl driver.

Signed-off-by: Frederic Barrat 
---
 Documentation/ABI/testing/sysfs-class-ocxl |  35 +++
 Documentation/accelerators/ocxl.rst| 160 +
 Documentation/ioctl/ioctl-number.txt   |   1 +
 3 files changed, 196 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-ocxl
 create mode 100644 Documentation/accelerators/ocxl.rst

diff --git a/Documentation/ABI/testing/sysfs-class-ocxl 
b/Documentation/ABI/testing/sysfs-class-ocxl
new file mode 100644
index ..ac11deb71235
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-ocxl
@@ -0,0 +1,35 @@
+What:   /sys/class/ocxl//afu_version
+Date:   January 2018
+Contact:linuxppc-dev@lists.ozlabs.org
+Description:read only
+Version of the AFU, in the format :
+   Reflects what is read in the configuration space of the AFU
+
+What:   /sys/class/ocxl//contexts
+Date:   January 2018
+Contact:linuxppc-dev@lists.ozlabs.org
+Description:read only
+   Number of contexts for the AFU, in the format /
+   where:
+   n: number of currently active contexts, for debug
+   max: maximum number of contexts supported by the AFU
+
+What:   /sys/class/ocxl//pp_mmio_size
+Date:   January 2018
+Contact:linuxppc-dev@lists.ozlabs.org
+Description:read only
+   Size of the per-process mmio area, as defined in the
+   configuration space of the AFU
+
+What:   /sys/class/ocxl//global_mmio_size
+Date:   January 2018
+Contact:linuxppc-dev@lists.ozlabs.org
+Description:read only
+   Size of the global mmio area, as defined in the
+   configuration space of the AFU
+
+What:   /sys/class/ocxl//global_mmio_area
+Date:   January 2018
+Contact:linuxppc-dev@lists.ozlabs.org
+Description:read/write
+   Give access to the global mmio area for the AFU
diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
new file mode 100644
index ..4f7af841d935
--- /dev/null
+++ b/Documentation/accelerators/ocxl.rst
@@ -0,0 +1,160 @@
+
+OpenCAPI (Open Coherent Accelerator Processor Interface)
+
+
+OpenCAPI is an interface between processors and accelerators. It aims
+at being low-latency and high-bandwidth. The specification is
+developed by the `OpenCAPI Consortium `_.
+
+It allows an accelerator (which could be an FPGA, an ASIC, ...) to access
+the host memory coherently, using virtual addresses. An OpenCAPI
+device can also host its own memory, that can be accessed from the
+host.
+
+OpenCAPI is known in linux as 'ocxl', as the open, processor-agnostic
+evolution of 'cxl' (the driver for the IBM CAPI interface for
+powerpc), which was named that way to avoid confusion with the ISDN
+CAPI subsystem.
+
+
+High-level view
+===
+
+OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL), to
+be implemented on top of a physical link. Any processor or device
+implementing the DL and TL can start sharing memory.
+
+::
+
+  +---+ +-+
+  |   | | |
+  |   | | Accelerated |
+  | Processor | |  Function   |
+  |   |  ++ |Unit |  ++
+  |   |--| Memory | |(AFU)|--| Memory |
+  |   |  ++ | |  ++
+  +---+ +-+
+   |   |
+  +---+ +-+
+  |TL | |TLX  |
+  +---+ +-+
+   |   |
+  +---+ +-+
+  |DL | |DLX  |
+  +---+ +-+
+   |   |
+   |   PHY |
+   +---+
+
+
+
+Device discovery
+
+
+OpenCAPI relies on a PCI-like configuration space, implemented on the
+device. So the host can discover AFUs by querying the config space.
+
+OpenCAPI devices in Linux are treated like PCI devices (with a few
+caveats). The firmware is expected to abstract the hardware as if it
+was a PCI link. A lot of the existing PCI infrastructure is reused:
+devices are scanned and BARs are assigned during the standard PCI
+enumeration

[PATCH v2 13/13] ocxl: add MAINTAINERS entry

2018-01-23 Thread Frederic Barrat
Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
---
 MAINTAINERS | 12 
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a6e86e20761e..8a0357f3b7bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9820,6 +9820,18 @@ M:   Josh Poimboeuf 
 S: Supported
 F: tools/objtool/
 
+OCXL (Open Coherent Accelerator Processor Interface OpenCAPI) DRIVER
+M: Frederic Barrat 
+M: Andrew Donnellan 
+L: linuxppc-dev@lists.ozlabs.org
+S: Supported
+F: arch/powerpc/platforms/powernv/ocxl.c
+F: arch/powerpc/include/asm/pnv-ocxl.h
+F: drivers/misc/ocxl/
+F: include/misc/ocxl*
+F: include/uapi/misc/ocxl.h
+F: Documentation/accelerators/ocxl.rst
+
 OMAP AUDIO SUPPORT
 M: Peter Ujfalusi 
 M: Jarkko Nikula 
-- 
2.14.1



[PATCH v2 01/13] powerpc/powernv: Introduce new PHB type for opencapi links

2018-01-23 Thread Frederic Barrat
The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between an nvlink and an
opencapi PHB, as it's not completely transparent to linux. In
particular, PE assignment differs and we'll also need the information
in later patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.
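
For readers skimming the diff, here is a standalone sketch of what the
split looks like to callers. The name table matches the one in
pci-ioda.c below. The enum ordering is an assumption inferred from that
table, since the pci.h hunk is not reproduced here, and phb_type_name()
and phb_has_peltv() are hypothetical helpers that only mirror the checks
in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed ordering, inferred from the pnv_phb_names[] table. */
enum pnv_phb_type {
	PNV_PHB_IODA1,
	PNV_PHB_IODA2,
	PNV_PHB_NPU_NVLINK,
	PNV_PHB_NPU_OCAPI,
};

static const char * const pnv_phb_names[] = {
	"IODA1", "IODA2", "NPU_NVLINK", "NPU_OCAPI",
};

/* Bounds-checked lookup into the name table. */
static const char *phb_type_name(enum pnv_phb_type type)
{
	size_t n = sizeof(pnv_phb_names) / sizeof(pnv_phb_names[0]);

	return (size_t)type < n ? pnv_phb_names[type] : "unknown";
}

/*
 * Mirrors the test in pnv_ioda_configure_pe(): NPUs of either kind
 * have no PELTV table, so PELTV setup is skipped for both.
 */
static bool phb_has_peltv(enum pnv_phb_type type)
{
	return type != PNV_PHB_NPU_NVLINK && type != PNV_PHB_NPU_OCAPI;
}
```

Checks that are nvlink-specific (DMA bypass, the iommu API setup) test
for PNV_PHB_NPU_NVLINK only, while checks that apply to any NPU need to
test both types, as phb_has_peltv() does.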

Signed-off-by: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
---
 arch/powerpc/platforms/powernv/npu-dma.c  |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c | 41 +--
 arch/powerpc/platforms/powernv/pci.c  |  4 +++
 arch/powerpc/platforms/powernv/pci.h  |  8 +++---
 4 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index f6cbc1a71472..c5899c107d59 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -277,7 +277,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
int64_t rc = 0;
phys_addr_t top = memblock_end_of_DRAM();
 
-   if (phb->type != PNV_PHB_NPU || !npe->pdev)
+   if (phb->type != PNV_PHB_NPU_NVLINK || !npe->pdev)
return -EINVAL;
 
rc = pnv_npu_unset_window(npe, 0);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 74903064..e780263a14ee 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -54,7 +54,8 @@
 #define POWERNV_IOMMU_DEFAULT_LEVELS   1
 #define POWERNV_IOMMU_MAX_LEVELS   5
 
-static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU" };
+static const char * const pnv_phb_names[] = { "IODA1", "IODA2", "NPU_NVLINK",
+ "NPU_OCAPI" };
 static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
 
 void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
@@ -924,7 +925,7 @@ static int pnv_ioda_configure_pe(struct pnv_phb *phb, 
struct pnv_ioda_pe *pe)
 * Configure PELTV. NPUs don't have a PELTV table so skip
 * configuration on them.
 */
-   if (phb->type != PNV_PHB_NPU)
+   if (phb->type != PNV_PHB_NPU_NVLINK && phb->type != PNV_PHB_NPU_OCAPI)
pnv_ioda_set_peltv(phb, pe, true);
 
/* Setup reverse map */
@@ -1272,16 +1273,23 @@ static void pnv_pci_ioda_setup_PEs(void)
 {
struct pci_controller *hose, *tmp;
struct pnv_phb *phb;
+   struct pci_bus *bus;
+   struct pci_dev *pdev;
 
list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;
-   if (phb->type == PNV_PHB_NPU) {
+   if (phb->type == PNV_PHB_NPU_NVLINK) {
/* PE#0 is needed for error reporting */
pnv_ioda_reserve_pe(phb, 0);
pnv_ioda_setup_npu_PEs(hose->bus);
if (phb->model == PNV_PHB_MODEL_NPU2)
pnv_npu2_init(phb);
}
+   if (phb->type == PNV_PHB_NPU_OCAPI) {
+   bus = hose->bus;
+   list_for_each_entry(pdev, &bus->devices, bus_list)
+   pnv_ioda_setup_dev_PE(pdev);
+   }
}
 }
 
@@ -2640,7 +2648,7 @@ static int gpe_table_group_to_npe_cb(struct device *dev, 
void *opaque)
 
hose = pci_bus_to_host(pdev->bus);
phb = hose->private_data;
-   if (phb->type != PNV_PHB_NPU)
+   if (phb->type != PNV_PHB_NPU_NVLINK)
return 0;
 
*ptmppe = &phb->ioda.pe_array[pdn->pe_number];
@@ -2724,7 +2732,7 @@ static void pnv_pci_ioda_setup_iommu_api(void)
list_for_each_entry_safe(hose, tmp, &hose_list, list_node) {
phb = hose->private_data;
 
-   if (phb->type != PNV_PHB_NPU)
+   if (phb->type != PNV_PHB_NPU_NVLINK)
continue;
 
list_for_each_entry(pe, &phb->ioda.pe_list, list) {
@@ -3774,6 +3782,13 @@ static const struct pci_controller_ops 
pnv_npu_ioda_controller_ops = {
.shutdown   = pnv_pci_ioda_shutdown,
 };
 
+static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
+   .enable_device_hook = pnv_pci_enable_device_hook,
+   .window_alignment   = pnv_pci_window_alignment,
+   .reset_secondary_bus= pnv_pci_reset_secondary_bus,
+   .shutdown   = pnv_pci_ioda_shutdown,
+};
+
 #ifdef CONFIG_CXL_BASE
 const struct pci_controller_ops pnv_cxl_cx4_ioda_controller_ops = {
.dma_dev_setup  = pnv_pci_dma_dev_setup,
@@ -4007,9 +4022,14 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
 */
ppc_md.pcibios_fixup = pnv_pci_ioda_fixup;
 
-   if (phb->type == PNV_PHB_NPU) {
+   switch (phb->type) {
+   

Re: [PATCH v8 1/2] powerpc/powernv: Enable tunneled operations

2018-01-23 Thread Frederic Barrat



On 22/01/2018 at 14:33, Philippe Bergheaud wrote:

P9 supports PCI tunneled operations (atomics and as_notify). This
patch adds support for tunneled operations on powernv, with a new
API, to be called by device drivers:

pnv_pci_get_tunnel_ind()
Tell driver the 16-bit ASN indication used by kernel.

pnv_pci_set_tunnel_bar()
Tell kernel the Tunnel BAR Response address used by driver.
This function uses two new OPAL calls, as the PBCQ Tunnel BAR
register is configured by skiboot.

pnv_pci_get_as_notify_info()
Return the ASN info of the thread to be woken up.

Signed-off-by: Philippe Bergheaud 
---


still
Acked-by: Frederic Barrat 



Changelog:

v2: Do not set the ASN indication. Get it from the device tree.

v3: Make pnv_pci_get_phb_node() available when compiling without cxl.

v4: Add pnv_pci_get_as_notify_info().
 Rebase opal call numbers on skiboot 5.9.6.

v5: pnv_pci_get_tunnel_ind():
   - fix node reference count
 pnv_pci_get_as_notify_info():
   - fail if task == NULL
   - read pid from mm->context.id
   - explain that thread.tidr requires CONFIG_PPC64

v6: pnv_pci_get_tunnel_ind():
   - check if radix is enabled, or else return an error
 pnv_pci_get_as_notify_info():
   - remove a capi-specific comment, irrelevant for pci

v7: pnv_pci_set_tunnel_bar():
   - setting the tunnel bar more than once with the same value
 is not an error

v8: No change

This patch depends on the following skiboot patches:
   https://patchwork.ozlabs.org/patch/858324/
   https://patchwork.ozlabs.org/patch/858325/
---
  arch/powerpc/include/asm/opal-api.h|   4 +-
  arch/powerpc/include/asm/opal.h|   2 +
  arch/powerpc/include/asm/pnv-pci.h |   5 ++
  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
  arch/powerpc/platforms/powernv/pci-cxl.c   |   8 --
  arch/powerpc/platforms/powernv/pci.c   | 107 +
  6 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 233c7504b1f2..b901f4d9f009 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -201,7 +201,9 @@
  #define OPAL_SET_POWER_SHIFT_RATIO155
  #define OPAL_SENSOR_GROUP_CLEAR   156
  #define OPAL_PCI_SET_P2P  157
-#define OPAL_LAST  157
+#define OPAL_PCI_GET_PBCQ_TUNNEL_BAR   159
+#define OPAL_PCI_SET_PBCQ_TUNNEL_BAR   160
+#define OPAL_LAST  160

  /* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0c545f7fc77b..8705e422b893 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -198,6 +198,8 @@ int64_t opal_unregister_dump_region(uint32_t id);
  int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
  int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
  int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t 
pe_number);
+int64_t opal_pci_get_pbcq_tunnel_bar(uint64_t phb_id, uint64_t *addr);
+int64_t opal_pci_set_pbcq_tunnel_bar(uint64_t phb_id, uint64_t addr);
  int64_t opal_ipmi_send(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t msg_len);
  int64_t opal_ipmi_recv(uint64_t interface, struct opal_ipmi_msg *msg,
diff --git a/arch/powerpc/include/asm/pnv-pci.h 
b/arch/powerpc/include/asm/pnv-pci.h
index 3e5cf251ad9a..c69de3276b5e 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -29,6 +29,11 @@ extern int pnv_pci_set_power_state(uint64_t id, uint8_t 
state,
  extern int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
   u64 desc);

+extern int pnv_pci_get_tunnel_ind(struct pci_dev *dev, uint64_t *ind);
+extern int pnv_pci_set_tunnel_bar(struct pci_dev *dev, uint64_t addr,
+ int enable);
+extern int pnv_pci_get_as_notify_info(struct task_struct *task, u32 *lpid,
+ u32 *pid, u32 *tid);
  int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
  int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 6f4b00a2ac46..5da790fb7fef 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -320,3 +320,5 @@ OPAL_CALL(opal_set_powercap,
OPAL_SET_POWERCAP);
  OPAL_CALL(opal_get_power_shift_ratio, OPAL_GET_POWER_SHIFT_RATIO);
  OPAL_CALL(opal_set_power_shift_ratio, OPAL_SET_POWER_SHIFT_RATIO);
  OPAL_CALL(opal_sensor_group_clear,OPAL_SENSOR_GROUP_CLEAR);
+OPAL_CALL(opal_pci_get_pbcq_tunne

Re: [PATCH v8 2/2] cxl: read PHB indications from the device tree

2018-01-23 Thread Frederic Barrat



Le 22/01/2018 à 14:33, Philippe Bergheaud a écrit :

Configure the P9 XSL_DSNCTL register with PHB indications found
in the device tree, or else use legacy hard-coded values.

Signed-off-by: Philippe Bergheaud 
---


simpler, safer
Acked-by: Frederic Barrat 



Changelog:

v2: New patch. Use the new device tree property "ibm,phb-indications".

v3: No change.

v4: No functional change.
 Drop cosmetic fix in comment.

v5: get_phb_indications():
   - make static variables local to function.
   - return static variable values by arguments.

v6: get_phb_indications():
   - acquire a mutex before setting the phb indications.

v7: get_phb_indications():
 cxl_get_xsl9_dsnctl():
   - return -ENODEV instead of -1.

v8: get_phb_indications():
   - stay on the safe side: acquire the mutex unconditionally

This patch depends on the following skiboot patch:
   https://patchwork.ozlabs.org/patch/858324/
---
  drivers/misc/cxl/cxl.h|  2 +-
  drivers/misc/cxl/cxllib.c |  2 +-
  drivers/misc/cxl/pci.c| 48 ++-
  3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index e46a4062904a..5a6e9a921c2b 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -1062,7 +1062,7 @@ int cxl_psl_purge(struct cxl_afu *afu);
  int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid,
  u32 *phb_index, u64 *capp_unit_id);
  int cxl_slot_is_switched(struct pci_dev *dev);
-int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg);
+int cxl_get_xsl9_dsnctl(struct pci_dev *dev, u64 capp_unit_id, u64 *reg);
  u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9);

  void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx);
diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c
index dc9bc1807fdf..61f80d586279 100644
--- a/drivers/misc/cxl/cxllib.c
+++ b/drivers/misc/cxl/cxllib.c
@@ -99,7 +99,7 @@ int cxllib_get_xsl_config(struct pci_dev *dev, struct 
cxllib_xsl_config *cfg)
if (rc)
return rc;

-   rc = cxl_get_xsl9_dsnctl(capp_unit_id, &cfg->dsnctl);
+   rc = cxl_get_xsl9_dsnctl(dev, capp_unit_id, &cfg->dsnctl);
if (rc)
return rc;
if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 19969ee86d6f..12e5cae6d452 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -409,21 +409,59 @@ int cxl_calc_capp_routing(struct pci_dev *dev, u64 
*chipid,
return 0;
  }

-int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg)
+static DEFINE_MUTEX(indications_mutex);
+
+static int get_phb_indications(struct pci_dev *dev, u64* capiind, u64 *asnind,
+  u64 *nbwind)
+{
+   static u64 nbw, asn, capi = 0;
+   struct device_node *np;
+   const __be32 *prop;
+
+   mutex_lock(&indications_mutex);
+   if (!capi) {
+   if (!(np = pnv_pci_get_phb_node(dev))) {
+   mutex_unlock(&indications_mutex);
+   return -ENODEV;
+   }
+
+   prop = of_get_property(np, "ibm,phb-indications", NULL);
+   if (!prop) {
+   nbw = 0x0300UL; /* legacy values */
+   asn = 0x0400UL;
+   capi = 0x0200UL;
+   } else {
+   nbw = (u64)be32_to_cpu(prop[2]);
+   asn = (u64)be32_to_cpu(prop[1]);
+   capi = (u64)be32_to_cpu(prop[0]);
+   }
+   of_node_put(np);
+   }
+   *capiind = capi;
+   *asnind = asn;
+   *nbwind = nbw;
+   mutex_unlock(&indications_mutex);
+   return 0;
+}
+
+int cxl_get_xsl9_dsnctl(struct pci_dev *dev, u64 capp_unit_id, u64 *reg)
  {
u64 xsl_dsnctl;
+   u64 capiind, asnind, nbwind;

/*
 * CAPI Identifier bits [0:7]
 * bit 61:60 MSI bits --> 0
 * bit 59 TVT selector --> 0
 */
+   if (get_phb_indications(dev, &capiind, &asnind, &nbwind))
+   return -ENODEV;

/*
 * Tell XSL where to route data to.
 * The field chipid should match the PHB CAPI_CMPM register
 */
-   xsl_dsnctl = ((u64)0x2 << (63-7)); /* Bit 57 */
+   xsl_dsnctl = (capiind << (63-15)); /* Bit 57 */
xsl_dsnctl |= (capp_unit_id << (63-15));

/* nMMU_ID Defaults to: b’01001’*/
@@ -437,14 +475,14 @@ int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg)
 * nbwind=0x03, bits [57:58], must include capi indicator.
 * Not supported on P9 DD1.
 */
-   xsl_dsnctl |= ((u64)0x03 << (63-47));
+   xsl_dsnctl |= (nbwind << (63-55));

/*
 * Upper 16b address bits of ASB_Notify messages sent to the
 * system. Need to match the PHB’s ASN Compare/Ma

Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues

2018-01-23 Thread Michael Ellerman
Wolfram Sang  writes:

> Hi Frank,
>
>> Please go back and read the thread for version 1.  Simply resubmitting a
>> forward port is ignoring that whole conversation.
>> 
>> There is a lot of good info in that thread.  I certainly learned stuff in it.
>
> Yes, I did that and learned stuff, too. My summary of the discussion was:
>
> - you mentioned some drawbacks you saw (like the mixture of trace output
>   and printk output)
> - most of them look like addressed to me? (e.g. Steven showed a way to 
> redirect
>   printk to trace)
> - you posted your version (which was, however, marked as "not user friendly"
>   even by yourself)
> - The discussion stalled over having two approaches
>
> So, I thought reposting would be a good way of finding out if your
> concerns were addressed in the discussion or not. If I overlooked
> something, I am sorry for that. Still, my intention is to continue the
> discussion, not to ignore it. Because as it stands, we don't have such a
> debugging mechanism in place currently, and with people working with DT
> overlays, I'd think it would be nice to have.

Yeah I agree with all of that, I didn't think there were really any
concerns left outstanding. These trace points are very useful, I've
twice added them to a kernel to debug something, so it would be great
for them to be in mainline.

cheers


Re: [PATCH v2 1/6] resource: Extend the PPC32 reserved memory hack

2018-01-23 Thread Michael Ellerman
Jonathan Neuschäfer  writes:

> On the Nintendo Wii, there are two ranges of physical memory, and MMIO
> in between, but Linux on ppc32 doesn't support discontiguous memory.
> Therefore a hack was introduced in commit c5df7f775148 ("powerpc: allow
> ioremap within reserved memory regions") and commit de32400dd26e ("wii:
> use both mem1 and mem2 as ram"):
>
>  - Treat the area from the start of the first memory area (MEM1) to the
>end of the second (MEM2) as one big memory area, but mark the part
>that doesn't belong to MEM1 or MEM2 as reserved.
>  - Only on the Wii, allow ioremap to be used on reserved memory.
>
> This hack, however, doesn't account for the "resource"-based API in
> kernel/resource.c, because __request_region performs its own checks.
>
> Extend the hack to kernel/resource.c, to allow more drivers to allocate
> their MMIO regions on the Wii.

Hi Jonathan,

Sorry but I can't merge a hack like this in generic code.

Has anyone looked at adding proper discontig mem support to PPC32?

Or can we punch a hole in the resource in the right place? Maybe from
add_system_ram_resources() ?

cheers


[PATCH] powerpc/mm/nohash: do not flush the entire mm when range is a single page

2018-01-23 Thread Christophe Leroy
Most of the time, flush_tlb_range() is called on single pages.
Currently, flush_tlb_range() unconditionally calls
flush_tlb_mm(), which flushes at least all pages of the PID and, on
older CPUs like 4xx or 8xx, flushes the entire TLB.

This patch calls flush_tlb_page() instead of flush_tlb_mm() when
the range is a single page.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/tlb_nohash.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index bfc4a0869609..15fe5f0c8665 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -388,7 +388,10 @@ void flush_tlb_range(struct vm_area_struct *vma, unsigned 
long start,
 unsigned long end)
 
 {
-   flush_tlb_mm(vma->vm_mm);
+   if (end - start == PAGE_SIZE && !(start & ~PAGE_MASK))
+   flush_tlb_page(vma, start);
+   else
+   flush_tlb_mm(vma->vm_mm);
 }
 EXPORT_SYMBOL(flush_tlb_range);
 
-- 
2.13.3



Re: [PATCH] powerpc: pseries: use irq_of_parse_and_map helper

2018-01-23 Thread Rob Herring
On Tue, Jan 23, 2018 at 12:53 AM, Michael Ellerman  wrote:
> Rob Herring  writes:
>
>> Instead of calling both of_irq_parse_one and irq_create_of_mapping, call
>> of_irq_parse_and_map instead which does the same thing. This gets us closer
>> to making the former 2 functions static.
>>
>> Cc: Benjamin Herrenschmidt 
>> Cc: Paul Mackerras 
>> Cc: Michael Ellerman 
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Rob Herring 
>> ---
>>  arch/powerpc/platforms/pseries/event_sources.c | 11 ---
>>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> Sorry NAK, this doesn't boot.
>
>> diff --git a/arch/powerpc/platforms/pseries/event_sources.c 
>> b/arch/powerpc/platforms/pseries/event_sources.c
>> index 6eeb0d4bab61..b0d8c146fe7b 100644
>> --- a/arch/powerpc/platforms/pseries/event_sources.c
>> +++ b/arch/powerpc/platforms/pseries/event_sources.c
>> @@ -16,7 +16,8 @@
>>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
>>   */
>>
>> -#include 
>> +#include 
>> +#include 
>>
>>  #include "pseries.h"
>>
>> @@ -25,15 +26,11 @@ void request_event_sources_irqs(struct device_node *np,
>>   const char *name)
>>  {
>>   int i, index, count = 0;
>> - struct of_phandle_args oirq;
>>   unsigned int virqs[16];
>>
>>   /* First try to do a proper OF tree parsing */
>> - for (index = 0; of_irq_parse_one(np, index, &oirq) == 0;
>> -  index++) {
>> - if (count > 15)
>> - break;
>> - virqs[count] = irq_create_of_mapping(&oirq);
>> + for (index = 0; count < 16; index++) {
>> + virqs[count] = irq_of_parse_and_map(np, index);
>>   if (!virqs[count]) {
>>   pr_err("event-sources: Unable to allocate "
>>  "interrupt number for %pOF\n",
>
>np);
> WARN_ON(1);
> } else {
> count++;
> }
> }
>
>
> Which is an infinite loop if we have less than 16 irqs, and spews the
> warning continuously.
>
> Are you trying to remove the low-level routines or is this just a
> cleanup?

The former, but I'm not sure that will happen. There's a handful of
others left, but they aren't simply a call to of_irq_parse_one and
then irq_create_of_mapping.

> The patch below works, it loses the error handling if the interrupts
> property is corrupt/empty, but that's probably overly paranoid anyway.

Not quite. Previously, it was silent if parsing failed. Only the
mapping would give an error which would mean the interrupt parent had
some error.

Actually, we could use of_irq_get here to preserve the error handling.
It will return error codes from parsing, 0 on mapping failure, or the
Linux irq number. It adds an irq_find_host call for deferred probe,
but that should be harmless. I'll respin it.

Rob
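
To illustrate why a negative return value lets the loop terminate,
here is a small user-space sketch. It is not the respun patch:
mock_of_irq_get() is a hypothetical stand-in that only mimics the
return convention described above (negative error code from parsing,
0 on mapping failure, otherwise an IRQ number), and -EINVAL and the
IRQ numbers are assumed values.

```c
#include <assert.h>
#include <errno.h>

#define MAX_VIRQS 16

/*
 * Hypothetical stand-in for of_irq_get(): returns a positive IRQ number
 * for a valid index and a negative errno once parsing fails. The real
 * function can also return 0 on a mapping failure; this mock never does.
 */
static int mock_of_irq_get(int nr_irqs, int index)
{
	if (index < 0 || index >= nr_irqs)
		return -EINVAL;
	return 100 + index;
}

/*
 * Collect up to MAX_VIRQS interrupts. The loop ends at the first parse
 * error, so a node with fewer than MAX_VIRQS interrupts still terminates.
 */
static int collect_virqs(int nr_irqs, unsigned int *virqs)
{
	int index, count = 0;

	for (index = 0; count < MAX_VIRQS; index++) {
		int virq = mock_of_irq_get(nr_irqs, index);

		if (virq < 0)
			break;		/* parsing failed: no more entries */
		if (virq == 0)
			continue;	/* mapping failed: the driver would log here */
		virqs[count++] = (unsigned int)virq;
	}
	return count;
}
```

With irq_of_parse_and_map() both failure kinds collapse into 0, which
is why the posted loop could not tell "end of list" from "mapping
error" and kept spinning.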


[PATCH 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() (v7)

2018-01-23 Thread Mathieu Desnoyers
Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Threads targeting the same VM but belonging to different thread
groups are a tricky case. This has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
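
The ordering described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: struct
mm_sketch, the bit values, and the stubbed grace-period wait are
hypothetical, and only the flag names and the order of the steps come
from this commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag names follow the commit message; the bit values are assumptions. */
#define MEMBARRIER_STATE_PRIVATE_EXPEDITED        (1U << 0)
#define MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY  (1U << 1)

/* Hypothetical stand-in for the parts of mm_struct used here. */
struct mm_sketch {
	atomic_uint membarrier_state;
	int mm_users;      /* stand-in for atomic_read(&mm->mm_users) */
	int nr_threads;    /* stand-in for get_nr_threads(p) */
	int grace_periods; /* counts calls to the stubbed grace-period wait */
};

static void mm_sketch_init(struct mm_sketch *mm, int mm_users, int nr_threads)
{
	atomic_init(&mm->membarrier_state, 0);
	mm->mm_users = mm_users;
	mm->nr_threads = nr_threads;
	mm->grace_periods = 0;
}

/* Placeholder for synchronize_sched(); here it only counts invocations. */
static void synchronize_sched_stub(struct mm_sketch *mm)
{
	mm->grace_periods++;
}

static void register_private_expedited(struct mm_sketch *mm)
{
	/* Step 1: make switch_mm() start issuing the barrier. */
	atomic_fetch_or(&mm->membarrier_state,
			MEMBARRIER_STATE_PRIVATE_EXPEDITED);

	/* Step 2: wait, unless the VM has one user with one thread. */
	if (!(mm->mm_users == 1 && mm->nr_threads == 1))
		synchronize_sched_stub(mm);

	/* Step 3: only now allow private expedited commands to succeed. */
	atomic_fetch_or(&mm->membarrier_state,
			MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY);
}

/* What membarrier_arch_switch_mm() tests, per the commit message. */
static bool switch_mm_needs_barrier(struct mm_sketch *mm)
{
	return atomic_load(&mm->membarrier_state) &
	       MEMBARRIER_STATE_PRIVATE_EXPEDITED;
}

static bool private_expedited_allowed(struct mm_sketch *mm)
{
	return atomic_load(&mm->membarrier_state) &
	       MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
}
```

Setting the READY bit last means a command can only succeed once every
CPU is already honouring the first flag.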

Signed-off-by: Mathieu Desnoyers 
CC: Peter Zijlstra 
CC: Paul E. McKenney 
CC: Boqun Feng 
CC: Andrew Hunter 
CC: Maged Michael 
CC: Avi Kivity 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Dave Watson 
CC: Alan Stern 
CC: Will Deacon 
CC: Andy Lutomirski 
CC: Ingo Molnar 
CC: Alexander Viro 
CC: Nicholas Piggin 
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-a...@vger.kernel.org
---
Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Changes since v5:
- Rebase on v4.14-rc6.
- Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on
  powerpc (v2)"

Changes since v6:
- Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED.
---
 MAINTAINERS   |  1 +
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/membarrier.h | 26 ++
 arch/powerpc/mm/mmu_context.c |  7 +++
 include/linux/sched/mm.h  | 13 -
 init/Kconfig  |  3 +++
 kernel/sched/core.c   | 10 --
 kernel/sched/membarrier.c |  8 
 8 files changed, 58 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index e3581413420c..11ff47c28b12 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8931,6 +8931,7 @@ L:linux-ker...@vger.kernel.org
 S: Supported
 F: kernel/sched/membarrier.c
 F: include/uapi/linux/membarrier.h
+F: arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L: linux...@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ed525a44734..09b02180b8a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_PMEM_APIif PPC64
+   select ARCH_HAS_MEMBARRIER_HOOKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h 
b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index ..98ff4f1fcf2b
-

Re: [PATCH v2 1/6] resource: Extend the PPC32 reserved memory hack

2018-01-23 Thread Jonathan Neuschäfer
On Tue, Jan 23, 2018 at 11:58:06PM +1100, Michael Ellerman wrote:
> Jonathan Neuschäfer  writes:
> 
> > On the Nintendo Wii, there are two ranges of physical memory, and MMIO
> > in between, but Linux on ppc32 doesn't support discontiguous memory.
> > Therefore a hack was introduced in commit c5df7f775148 ("powerpc: allow
> > ioremap within reserved memory regions") and commit de32400dd26e ("wii:
> > use both mem1 and mem2 as ram"):
> >
> >  - Treat the area from the start of the first memory area (MEM1) to the
> >end of the second (MEM2) as one big memory area, but mark the part
> >that doesn't belong to MEM1 or MEM2 as reserved.
> >  - Only on the Wii, allow ioremap to be used on reserved memory.
> >
> > This hack, however, doesn't account for the "resource"-based API in
> > kernel/resource.c, because __request_region performs its own checks.
> >
> > Extend the hack to kernel/resource.c, to allow more drivers to allocate
> > their MMIO regions on the Wii.
> 
> Hi Jonathan,
> 
> Sorry but I can't merge a hack like this in generic code.

Makes sense.

> Has anyone looked at adding proper discontig mem support to PPC32?

I'm not aware of any such effort.

Do you have any pointer on how to implement discontiguous memory
support? CONFIG_ARCH_SPARSEMEM_ENABLE seems relevant.

> Or can we punch a hole in the resource in the right place? Maybe from
> add_system_ram_resources() ?

Not sure. add_system_ram_resources would need the original memblock
table, which is overwritten in wii_memory_fixups, if I read the code
correctly.

If a proper solution doesn't take an overwhelming amount of work, I'd
prefer a proper solution.


Thanks,
Jonathan Neuschäfer




Re: [PATCH 07/13] ocxl: Add AFU interrupt support

2018-01-23 Thread Cédric Le Goater
On 12/19/2017 04:05 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2017-12-18 at 16:21 +0100, Frederic Barrat wrote:
>> Add user APIs through ioctl to allocate, free, and be notified of an
>> AFU interrupt.
>>
>> For opencapi, an AFU can trigger an interrupt on the host by sending a
>> specific command targeting a 64-bit object handle. On POWER9, this is
>> implemented by mapping a special page in the address space of a
>> process and a write to that page will trigger an interrupt.
> 
> We need to figure out how that plays with KVM. +Cedric..
> 
> For all those "generic xive" interrupts, whether they are used for
> OpenCAPI, plain guest IPIs, NX interrupts etc... but also for actual
> pass-through ones, we'll need a mechanism to map the trigger and ESB
> pages into qemu.
It seems feasible to use a common driver, at least for QEMU/KVM 
and OCXL, to expose the ESB pages of a range of IRQ numbers. Fred 
has already defined a user API, a set of ioctl which allocate, free 
one IRQ and also associate an IRQ with an eventfd for handling. 
The VMA is populated on demand. 

This XIVE IRQ "device", that I don't know how to name, defines 
generic IRQ sources and handlers for a given range. We would need 
a couple of properties to describe it in a device tree, 

  - "ibm,xive-lisn-ranges" for the range.
  
Any more?

The current code needs some changes to distinguish the XIVE IRQ 
driver from the OCXL one, range support should be added, using a 
bitmap to track allocation I guess.

From an OCXL perspective, the XIVE IRQ device driver would be 
instantiated from the OCXL one using an ioctl returning an fd,
like KVM does with KVM devices. User space would then alloc, free, 
associate IRQs and mmap the ESB pages to configure the OpenCAPI 
device. As for QEMU, I think we could add an extra KVM device, 
QEMU does not need the 'associate' feature though.

Such devices could theoretically be defined by the firmware for 
general purpose also, and be used through a char device. This is 
a possibility.


> We can't have a bazillion VMAs and KVM memory regions either, so we'll
> need some kind of mechanism/driver which allows for a single fairly
> large mmap'ed VMA which can then be "populated" with interrupt control
> pages.

Yes. The full address range should be mmapped for the IRQ range defined
for the device. Access to pages that are not populated would return EFAULT.
 
> The issue of course is that we can't really do a "generic" system that
> allows to map any interrupt, it's a security issue. So we need the
> interrupt "owner" to be the one allowing this. VFIO for PCI for
> example, possibly a specific VFIO variant for OpenCAPI, something else
> for guest IPIs ?
If we have defined ranges per device, that should be enough, no?

Thanks,

C. 
 
> Food for thoughts...
> 
> Ben.
> 
>>
>> Signed-off-by: Frederic Barrat 
>> ---
>>  arch/powerpc/include/asm/pnv-ocxl.h   |   3 +
>>  arch/powerpc/platforms/powernv/ocxl.c |  30 +
>>  drivers/misc/ocxl/afu_irq.c   | 204 
>> ++
>>  drivers/misc/ocxl/context.c   |  40 ++-
>>  drivers/misc/ocxl/file.c  |  33 ++
>>  drivers/misc/ocxl/link.c  |  28 +
>>  drivers/misc/ocxl/ocxl_internal.h |   7 ++
>>  include/uapi/misc/ocxl.h  |   9 ++
>>  8 files changed, 352 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/misc/ocxl/afu_irq.c
>>
>> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
>> b/arch/powerpc/include/asm/pnv-ocxl.h
>> index 5a7ae7f28209..1e26f0a39500 100644
>> --- a/arch/powerpc/include/asm/pnv-ocxl.h
>> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
>> @@ -37,4 +37,7 @@ extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void 
>> *spa_mem, int PE_mask,
>>  extern void pnv_ocxl_spa_release(void *platform_data);
>>  extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
>>  
>> +extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
>> +extern void pnv_ocxl_free_xive_irq(u32 irq);
>> +
>>  #endif /* _ASM_PVN_OCXL_H */
>> diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
>> b/arch/powerpc/platforms/powernv/ocxl.c
>> index 6c79924b95c8..96cafba6aef1 100644
>> --- a/arch/powerpc/platforms/powernv/ocxl.c
>> +++ b/arch/powerpc/platforms/powernv/ocxl.c
>> @@ -9,6 +9,7 @@
>>  
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include "pci.h"
>>  
>> @@ -487,3 +488,32 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int 
>> pe_handle)
>>  return rc;
>>  }
>>  EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
>> +
>> +int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
>> +{
>> +__be64 flags, trigger_page;
>> +s64 rc;
>> +u32 hwirq;
>> +
>> +hwirq = xive_native_alloc_irq();
>> +if (!hwirq)
>> +return -ENOENT;
>> +
>> +rc = opal_xive_get_irq_info(hwirq, &flags, NULL, &trigger_page, NULL,
>> +NULL);
>> +if (rc || !trigger_page) {
>> +xive_native_free_irq(hwirq);
>> +  

[PATCH] powerpc/ps3: remove an unneeded NULL check

2018-01-23 Thread Dan Carpenter
Static checkers don't like the inconsistent NULL checking on "ops".
This function is only called once and "ops" isn't NULL so the check can
be removed.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/ps3/sys-manager-core.c b/drivers/ps3/sys-manager-core.c
index c429ffca1ab7..a5a6def77bb9 100644
--- a/drivers/ps3/sys-manager-core.c
+++ b/drivers/ps3/sys-manager-core.c
@@ -43,7 +43,7 @@ void ps3_sys_manager_register_ops(const struct 
ps3_sys_manager_ops *ops)
 {
BUG_ON(!ops);
BUG_ON(!ops->dev);
-   ps3_sys_manager_ops = ops ? *ops : ps3_sys_manager_ops;
+   ps3_sys_manager_ops = *ops;
 }
 EXPORT_SYMBOL_GPL(ps3_sys_manager_register_ops);
 


Re: [PATCH] powerpc/ps3: remove an unneeded NULL check

2018-01-23 Thread Geoff Levand
Hi Dan,

On 01/23/2018 12:33 AM, Dan Carpenter wrote:
> @@ -43,7 +43,7 @@ void ps3_sys_manager_register_ops(const struct 
> ps3_sys_manager_ops *ops)
>  {
>   BUG_ON(!ops);
>   BUG_ON(!ops->dev);
> - ps3_sys_manager_ops = ops ? *ops : ps3_sys_manager_ops;
> + ps3_sys_manager_ops = *ops;

This seems to be a leftover from when I was adding the modular
system-manager support.  It doesn't really make sense if you
look at how the ps3_sys_manager_ops variable is used.

I added your patch to my ps3-queue branch.  Thanks for the
contribution.

-Geoff


Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues

2018-01-23 Thread Frank Rowand
On 01/23/18 04:11, Michael Ellerman wrote:
> Wolfram Sang  writes:
> 
>> Hi Frank,
>>
>>> Please go back and read the thread for version 1.  Simply resubmitting a
>>> forward port is ignoring that whole conversation.
>>>
>>> There is a lot of good info in that thread.  I certainly learned stuff in 
>>> it.
>>
>> Yes, I did that and learned stuff, too. My summary of the discussion was:
>>
>> - you mentioned some drawbacks you saw (like the mixture of trace output
>>   and printk output)
>> - most of them look like addressed to me? (e.g. Steven showed a way to 
>> redirect
>>   printk to trace)
>> - you posted your version (which was, however, marked as "not user friendly"
>>   even by yourself)
>> - The discussion stalled over having two approaches
>>
>> So, I thought reposting would be a good way of finding out if your
>> concerns were addressed in the discussion or not. If I overlooked
>> something, I am sorry for that. Still, my intention is to continue the
>> discussion, not to ignore it. Because as it stands, we don't have such a
>> debugging mechanism in place currently, and with people working with DT
>> overlays, I'd think it would be nice to have.
> 
> Yeah I agree with all of that, I didn't think there were really any
> concerns left outstanding. These trace points are very useful, I've
> twice added them to a kernel to debug something, so it would be great
> for them to be in mainline.
> 
> cheers
> 

Yes, I believe there are concerns outstanding.  I'll try to read through
the whole thread today to make sure I'm not missing anything.

-Frank


[PATCH 2/2] KVM: PPC: Book3S HV: Fix trailing semicolon

2018-01-23 Thread Luis de Bethencourt
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt 
---
 arch/powerpc/kvm/book3s_xive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index f0f5cd4d2fe7..f9818d7d3381 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -188,7 +188,7 @@ static int xive_provision_queue(struct kvm_vcpu *vcpu, u8 
prio)
if (!qpage) {
pr_err("Failed to allocate queue %d for VCPU %d\n",
   prio, xc->server_num);
-   return -ENOMEM;;
+   return -ENOMEM;
}
memset(qpage, 0, 1 << xive->q_order);
 
-- 
2.15.1



[PATCH 1/2] powerpc/powernv: Fix trailing semicolon

2018-01-23 Thread Luis de Bethencourt
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt 
---

Hi,

After fixing the same thing in drivers/staging/rtl8723bs/, Joe Perches
suggested I fix it treewide [0].

Best regards 
Luis


[0] 
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115410.html
[1] 
http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-January/115390.html

 arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f6f0c5717e08..101981cc75ac 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1840,7 +1840,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pci_dev 
*pdev, u64 dma_mask)
s64 rc;
 
if (WARN_ON(!pdn || pdn->pe_number == IODA_INVALID_PE))
-   return -ENODEV;;
+   return -ENODEV;
 
pe = &phb->ioda.pe_array[pdn->pe_number];
if (pe->tce_bypass_enabled) {
-- 
2.15.1



[PATCH] drivers/macintosh: Use true for boolean value

2018-01-23 Thread Gustavo A. R. Silva
Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/macintosh/windfarm_pm72.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/windfarm_pm72.c 
b/drivers/macintosh/windfarm_pm72.c
index e88cfb3..8330215 100644
--- a/drivers/macintosh/windfarm_pm72.c
+++ b/drivers/macintosh/windfarm_pm72.c
@@ -611,7 +611,7 @@ static void pm72_tick(void)
int i, last_failure;
 
if (!started) {
-   started = 1;
+   started = true;
printk(KERN_INFO "windfarm: CPUs control loops started.\n");
for (i = 0; i < nr_chips; ++i) {
if (cpu_setup_pid(i) < 0) {
-- 
2.7.4



[PATCH-next] powerpc/fsl_pci: Use PTR_ERR_OR_ZERO

2018-01-23 Thread Christopher Díaz Riveros
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

This issue was detected by using the Coccinelle software.

Signed-off-by: Christopher Díaz Riveros 
---
 arch/powerpc/sysdev/fsl_pci.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index cc20d2255d7f..142184635c81 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1304,10 +1304,7 @@ static int add_err_dev(struct platform_device *pdev)
   pdev->resource,
   pdev->num_resources,
   &pd, sizeof(pd));
-   if (IS_ERR(errdev))
-   return PTR_ERR(errdev);
-
-   return 0;
+   return PTR_ERR_OR_ZERO(errdev);
 }
 
 static int fsl_pci_probe(struct platform_device *pdev)
-- 
2.16.0
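
For readers unfamiliar with the helper, the equivalence the patch
relies on can be shown with simplified user-space stand-ins for the
linux/err.h helpers. These definitions are an approximation written
for this sketch (the kernel versions carry extra annotations), and
open_coded() is a hypothetical name for the pattern being removed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified user-space stand-ins for the helpers in linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the last MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* The open-coded form that the patch removes from add_err_dev(). */
static int open_coded(const void *ptr)
{
	if (IS_ERR(ptr))
		return (int)PTR_ERR(ptr);
	return 0;
}
```

Both forms return the negative errno for an error pointer and 0 for a
valid pointer, so the change is purely a simplification.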



Re: [PATCH v2 1/5] powerpc/mm: Enhance 'slice' for supporting PPC32

2018-01-23 Thread Segher Boessenkool
On Mon, Jan 22, 2018 at 08:52:53AM +0100, Christophe LEROY wrote:
> >Just make sure to declare all functions, or define it to some empty
> >thing, or #ifdeffery if you have to.  There are many options, it is
> >not hard, and if it means you have to pull code further apart that is
> >not so bad: you get cleaner, clearer code.
> 
> Ok, if I understand well, your comment applies to the following indeed, 
> so you confirm the #ifdef is necessary.

As I said, not necessary, but it might be the easiest or even the
cleanest here.  Something for you and the maintainers to fight about,
I'll stay out of it :-)

> However, my question was related to another part of the current 
> patchset, where the functions are always refined:
> 
> 
> On PPC32 we set:
> 
> +#define SLICE_LOW_SHIFT  28
> +#define SLICE_HIGH_SHIFT 0
> 
> On PPC64 we set:
> 
>  #define SLICE_LOW_SHIFT  28
>  #define SLICE_HIGH_SHIFT 40
> 
> We define:
> 
> +#define slice_bitmap_zero(dst, nbits) \
> + do { if (nbits) bitmap_zero(dst, nbits); } while (0)
> 
> 
> We have a function with:
> {
>   slice_bitmap_zero(ret->low_slices, SLICE_NUM_LOW);
>   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
> }

SLICE_NUM_xx is not the same as SLICE_xx_SHIFT; I don't see how any of
those shift values give nbits == 0.

> So the question is to find the better approach. Is the above approach 
> correct, including performance-wise?

If slice_bitmap_zero is inlined (or partially inlined) it is fine.  Is it?


Segher
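
A minimal userspace sketch of the guard under discussion, for illustration
only. When nbits is a compile-time constant, a compiler can resolve the
"if (nbits)" test in the macro at build time, so a zero count need not cost
anything at run time as long as the call site is inlined. demo_bitmap_zero,
DEMO_NUM_LOW and DEMO_NUM_HIGH are hypothetical stand-ins; in particular the
zero high count is an assumption made for the example, not a value taken
from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n)	\
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Counts how often the helper below is actually called. */
static int demo_calls;

/* Stand-in for bitmap_zero(): clears whole longs covering nbits. */
static void demo_bitmap_zero(unsigned long *dst, unsigned int nbits)
{
	demo_calls++;
	memset(dst, 0, DEMO_BITS_TO_LONGS(nbits) * sizeof(unsigned long));
}

/* Same shape as the macro quoted above, using the stand-in helper. */
#define slice_bitmap_zero(dst, nbits) \
	do { if (nbits) demo_bitmap_zero(dst, nbits); } while (0)

/* Hypothetical counts: a non-empty low map and an empty high map. */
#define DEMO_NUM_LOW	16
#define DEMO_NUM_HIGH	0

struct demo_mask {
	unsigned long low[1];
	unsigned long high[1];
};

static void demo_mask_clear(struct demo_mask *m)
{
	slice_bitmap_zero(m->low, DEMO_NUM_LOW);
	/* Expands to "if (0) ...": the helper is never called here. */
	slice_bitmap_zero(m->high, DEMO_NUM_HIGH);
}

/* Usage: only the low map is touched when the high count is zero. */
static void demo_slice_check(void)
{
	struct demo_mask m = { { ~0UL }, { ~0UL } };

	demo_calls = 0;
	demo_mask_clear(&m);
	assert(demo_calls == 1);
	assert(m.low[0] == 0);
	assert(m.high[0] == ~0UL);
}
```

Whether the real call sites are inlined is exactly the open question in
this message; the sketch only shows what the guard does once they are.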


Re: [PATCH] drivers/macintosh: Use true for boolean value

2018-01-23 Thread Michael Ellerman
"Gustavo A. R. Silva"  writes:

> Assign true or false to boolean variables instead of an integer value.
>
> This issue was detected with the help of Coccinelle.
>
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/macintosh/windfarm_pm72.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This seems to be common across all those drivers. Can you please send
one patch to fix them all to use bool:

  $ git grep "started = " drivers/macintosh/
  drivers/macintosh/therm_adt746x.c:  int started = 0;
  drivers/macintosh/therm_adt746x.c:  started = 1;
  drivers/macintosh/windfarm_pm112.c: slots_started = 1;
  drivers/macintosh/windfarm_pm112.c: started = 1;
  drivers/macintosh/windfarm_pm121.c: pm121_started = 1;
  drivers/macintosh/windfarm_pm72.c:  started = 1;
  drivers/macintosh/windfarm_pm81.c:  wf_smu_started = 1;
  drivers/macintosh/windfarm_pm91.c:  wf_smu_started = 1;
  drivers/macintosh/windfarm_rm31.c:  started = 1;

cheers


Re: [PATCH v4 3/7] platforms/pseries: Set eeh_pe of EEH_PE_VF type

2018-01-23 Thread Michael Ellerman
"Bryant G. Ly"  writes:

> To correctly use EEH code one has to make
> sure that the EEH_PE_VF is set for dynamic created
> VFs. Therefore this patch allocates an eeh_pe of
> eeh type EEH_PE_VF and associates PE with parent.
>
> Signed-off-by: Bryant G. Ly 
> Signed-off-by: Juan J. Alvarez 
> ---
>  arch/powerpc/include/asm/pci-bridge.h|  5 -
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 17 +
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/pci-bridge.h 
> b/arch/powerpc/include/asm/pci-bridge.h
> index 9f66ddebb799..16d70740a76f 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -211,7 +211,10 @@ struct pci_dn {
>   unsigned int *pe_num_map;   /* PE# for the first VF PE or array */
>   bool m64_single_mode;    /* Use M64 BAR in Single Mode */
>  #define IODA_INVALID_M64    (-1)
> - int (*m64_map)[PCI_SRIOV_NUM_BARS];
> + union {
> + int (*m64_map)[PCI_SRIOV_NUM_BARS]; /*Only used in powernv 
> */
> + int last_allow_rc;  /* Only used in pSeries */
> + };
>  #endif /* CONFIG_PCI_IOV */
>   int mps;    /* Maximum Payload Size */
>   struct list_head child_list;

I don't see the point of using a union to save 4 bytes.

And if you look at the current layout of the struct there's actually a 4
byte hole after mps, so it doesn't actually save any space at all.

I can remove it before applying, unless there's some compelling reason
for it I'm not seeing.

cheers
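
The point about the hole can be illustrated with a small sketch. It assumes
an LP64 ABI (4-byte int, 8-byte pointers with 8-byte alignment); the demo_*
structs are hypothetical cut-downs written for this note, not the real
struct pci_dn, and placing the new field right after mps is an assumption
made only to show the effect.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct list_head: two pointers. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An int followed by a pointer-aligned member leaves padding. */
struct demo_without {
	int mps;
	struct demo_list_head child_list;
};

/* Hypothetical: a second int placed in that padding. */
struct demo_with {
	int mps;
	int last_allow_rc;
	struct demo_list_head child_list;
};

/* Number of padding bytes between mps and child_list. */
static size_t demo_hole_after_mps(void)
{
	return offsetof(struct demo_without, child_list) -
	       (offsetof(struct demo_without, mps) + sizeof(int));
}

/* Usage: on LP64 the extra int fits in the hole at no cost. */
static void demo_layout_check(void)
{
	if (sizeof(void *) == 8 && sizeof(int) == 4) {
		assert(demo_hole_after_mps() == 4);
		assert(sizeof(struct demo_with) ==
		       sizeof(struct demo_without));
	}
}
```

Under those assumptions a plain int member costs no extra space, which is
why the union buys nothing here.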


Re: [PATCH v4 0/7] SR-IOV Enablement on PowerVM

2018-01-23 Thread Russell Currey
On Fri, 2018-01-05 at 10:45 -0600, Bryant G. Ly wrote:
> This patch series will enable SR-IOV on PowerVM. A specific set of
> lids for PFW/PHYP is required. They are planned to release with
> 920 at the moment.
> 
> For IBM internal testers let me know of a system you want to test on
> and we can put on the lids required or we can provide a system to run
> the tests.
> 
> This patch depends on the three patches:
> 988fc3ba5653278a8c14d6ccf687371775930d2b
> dae7253f9f78a731755ca20c66b2d2c40b86baea
> 608c0d8804ef3ca4cda8ec6ad914e47deb283d7b
> 
> v1 - Initial Patch
> v2 - Addressed Alexey and Russell's comments
> v3 - Unify the call of .error_detected()
> v4 - Fixed subject and change log per Bjorn's comments and
>  fixed Alexey's comments

For the whole series:

Acked-by: Russell Currey 


Re: [PATCH] drivers/macintosh: Use true for boolean value

2018-01-23 Thread Gustavo A. R. Silva


Quoting Michael Ellerman :

> "Gustavo A. R. Silva"  writes:
>
>> Assign true or false to boolean variables instead of an integer value.
>>
>> This issue was detected with the help of Coccinelle.
>>
>> Signed-off-by: Gustavo A. R. Silva
>> ---
>>  drivers/macintosh/windfarm_pm72.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> This seems to be common across all those drivers. Can you please send
> one patch to fix them all to use bool:
>
>   $ git grep "started = " drivers/macintosh/
>   drivers/macintosh/therm_adt746x.c:  int started = 0;
>   drivers/macintosh/therm_adt746x.c:  started = 1;
>   drivers/macintosh/windfarm_pm112.c: slots_started = 1;
>   drivers/macintosh/windfarm_pm112.c: started = 1;
>   drivers/macintosh/windfarm_pm121.c: pm121_started = 1;
>   drivers/macintosh/windfarm_pm72.c:  started = 1;
>   drivers/macintosh/windfarm_pm81.c:  wf_smu_started = 1;
>   drivers/macintosh/windfarm_pm91.c:  wf_smu_started = 1;
>   drivers/macintosh/windfarm_rm31.c:  started = 1;
>
> cheers

Sure, no problem.

By the way, I've just found the following similar case:

--- a/drivers/macintosh/ams/ams-input.c
+++ b/drivers/macintosh/ams/ams-input.c
@@ -91,7 +91,7 @@ static int ams_input_enable(void)
return error;
}

-   joystick = 1;
+   joystick = true;

return 0;
 }
@@ -104,7 +104,7 @@ static void ams_input_disable(void)
ams_info.idev = NULL;
}

-   joystick = 0;
+   joystick = false;
 }

Do you want me to include them all in the same patch?

Thanks
--
Gustavo






Re: [PATCH v2 1/6] resource: Extend the PPC32 reserved memory hack

2018-01-23 Thread Michael Ellerman
Jonathan Neuschäfer  writes:

> On Tue, Jan 23, 2018 at 11:58:06PM +1100, Michael Ellerman wrote:
>> Jonathan Neuschäfer  writes:
>> 
>> > On the Nintendo Wii, there are two ranges of physical memory, and MMIO
>> > in between, but Linux on ppc32 doesn't support discontiguous memory.
>> > Therefore a hack was introduced in commit c5df7f775148 ("powerpc: allow
>> > ioremap within reserved memory regions") and commit de32400dd26e ("wii:
>> > use both mem1 and mem2 as ram"):
>> >
>> >  - Treat the area from the start of the first memory area (MEM1) to the
>> >end of the second (MEM2) as one big memory area, but mark the part
>> >that doesn't belong to MEM1 or MEM2 as reserved.
>> >  - Only on the Wii, allow ioremap to be used on reserved memory.
>> >
>> > This hack, however, doesn't account for the "resource"-based API in
>> > kernel/resource.c, because __request_region performs its own checks.
>> >
>> > Extend the hack to kernel/resource.c, to allow more drivers to allocate
>> > their MMIO regions on the Wii.
>> 
>> Hi Jonathan,
>> 
>> Sorry but I can't merge a hack like this in generic code.
>
> Makes sense.
>
>> Has anyone looked at adding proper discontig mem support to PPC32?
>
> I'm not aware of any such effort.
>
> Do you have any pointer on how to implement discontiguous memory
> support? CONFIG_ARCH_SPARSEMEM_ENABLE seems relevant.

I'm not really sure what the key impediment to it working is.

You don't need to go all the way to SPARSEMEM, there is DISCONTIGMEM
which IIUC is quite a bit simpler.

I'd actually be interested to know what happens (ie. breaks) if you just
add the two memblocks and leave the hole in between. Is it the generic
code that breaks or is it something in the powerpc code? If it's the
latter maybe we can do a small fix/hack to work around that.

>> Or can we punch a hole in the resource in the right place? Maybe from
>> add_system_ram_resources() ?
>
> Not sure. add_system_ram_resources would need the original memblock
> table, which is overwritten in wii_memory_fixups, if I read the code
> correctly.

Or it just needs to know where the "wii hole" is, and it can skip that
region, that should be doable, but whether it actually works I'm not
100% sure.

> If a proper solution doesn't take an overwhelming amount of work, I'd
> prefer a proper solution.

Thanks.

cheers


Re: [PATCH] drivers/macintosh: Use true for boolean value

2018-01-23 Thread Gustavo A. R. Silva


Quoting "Gustavo A. R. Silva" :

> Quoting Michael Ellerman :
>
>> "Gustavo A. R. Silva"  writes:
>>
>>> Assign true or false to boolean variables instead of an integer value.
>>>
>>> This issue was detected with the help of Coccinelle.
>>>
>>> Signed-off-by: Gustavo A. R. Silva
>>> ---
>>> drivers/macintosh/windfarm_pm72.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> This seems to be common across all those drivers. Can you please send
>> one patch to fix them all to use bool:
>>
>>   $ git grep "started = " drivers/macintosh/
>>   drivers/macintosh/therm_adt746x.c:  int started = 0;
>>   drivers/macintosh/therm_adt746x.c:  started = 1;
>>   drivers/macintosh/windfarm_pm112.c: slots_started = 1;
>>   drivers/macintosh/windfarm_pm112.c: started = 1;
>>   drivers/macintosh/windfarm_pm121.c: pm121_started = 1;
>>   drivers/macintosh/windfarm_pm72.c:  started = 1;
>>   drivers/macintosh/windfarm_pm81.c:  wf_smu_started = 1;
>>   drivers/macintosh/windfarm_pm91.c:  wf_smu_started = 1;
>>   drivers/macintosh/windfarm_rm31.c:  started = 1;
>>
>> cheers
>
> Sure, no problem.
>
> By the way, I've just found the following similar case:
>
> --- a/drivers/macintosh/ams/ams-input.c
> +++ b/drivers/macintosh/ams/ams-input.c
> @@ -91,7 +91,7 @@ static int ams_input_enable(void)
> return error;
> }
>
> -   joystick = 1;
> +   joystick = true;
>
> return 0;
>  }
> @@ -104,7 +104,7 @@ static void ams_input_disable(void)
> ams_info.idev = NULL;
> }
>
> -   joystick = 0;
> +   joystick = false;
>  }
>
> Do you want me to include them all in the same patch?

I sent separate patches for this.

Thanks
--
Gustavo








[PATCH] macintosh: change some data types from int to bool

2018-01-23 Thread Gustavo A. R. Silva
Change the data type of the following variables from int to bool
across all macintosh drivers:

started
slots_started
pm121_started
wf_smu_started

Some of these issues were detected with the help of Coccinelle.

Suggested-by: Michael Ellerman 
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/macintosh/therm_adt746x.c  | 4 ++--
 drivers/macintosh/windfarm_pm112.c | 8 
 drivers/macintosh/windfarm_pm121.c | 5 +++--
 drivers/macintosh/windfarm_pm72.c  | 2 +-
 drivers/macintosh/windfarm_pm81.c  | 5 +++--
 drivers/macintosh/windfarm_pm91.c  | 5 +++--
 drivers/macintosh/windfarm_rm31.c  | 2 +-
 7 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/macintosh/therm_adt746x.c 
b/drivers/macintosh/therm_adt746x.c
index f433521..d7cd5af 100644
--- a/drivers/macintosh/therm_adt746x.c
+++ b/drivers/macintosh/therm_adt746x.c
@@ -230,7 +230,7 @@ static void update_fans_speed (struct thermostat *th)
 
/* we don't care about local sensor, so we start at sensor 1 */
for (i = 1; i < 3; i++) {
-   int started = 0;
+   bool started = false;
int fan_number = (th->type == ADT7460 && i == 2);
int var = th->temps[i] - th->limits[i];
 
@@ -243,7 +243,7 @@ static void update_fans_speed (struct thermostat *th)
if (abs(var - th->last_var[fan_number]) < 2)
continue;
 
-   started = 1;
+   started = true;
new_speed = fan_speed + ((var-1)*step);
 
if (new_speed < fan_speed)
diff --git a/drivers/macintosh/windfarm_pm112.c 
b/drivers/macintosh/windfarm_pm112.c
index 96d16fc..fec91db 100644
--- a/drivers/macintosh/windfarm_pm112.c
+++ b/drivers/macintosh/windfarm_pm112.c
@@ -96,14 +96,14 @@ static int cpu_last_target;
 static struct wf_pid_state backside_pid;
 static int backside_tick;
 static struct wf_pid_state slots_pid;
-static int slots_started;
+static bool slots_started;
 static struct wf_pid_state drive_bay_pid;
 static int drive_bay_tick;
 
 static int nr_cores;
 static int have_all_controls;
 static int have_all_sensors;
-static int started;
+static bool started;
 
 static int failure_state;
 #define FAILURE_SENSOR 1
@@ -462,7 +462,7 @@ static void slots_fan_tick(void)
/* first time; initialize things */
printk(KERN_INFO "windfarm: Slots control loop started.\n");
wf_pid_init(&slots_pid, &slots_param);
-   slots_started = 1;
+   slots_started = true;
}
 
err = slots_power->ops->get_value(slots_power, &power);
@@ -506,7 +506,7 @@ static void pm112_tick(void)
int i, last_failure;
 
if (!started) {
-   started = 1;
+   started = true;
printk(KERN_INFO "windfarm: CPUs control loops started.\n");
for (i = 0; i < nr_cores; ++i) {
if (create_cpu_loop(i) < 0) {
diff --git a/drivers/macintosh/windfarm_pm121.c 
b/drivers/macintosh/windfarm_pm121.c
index b350fb8..4d72d8f 100644
--- a/drivers/macintosh/windfarm_pm121.c
+++ b/drivers/macintosh/windfarm_pm121.c
@@ -246,7 +246,8 @@ enum {
 static struct wf_control *controls[N_CONTROLS] = {};
 
 /* Set to kick the control loop into life */
-static int pm121_all_controls_ok, pm121_all_sensors_ok, pm121_started;
+static int pm121_all_controls_ok, pm121_all_sensors_ok;
+static bool pm121_started;
 
 enum {
FAILURE_FAN = 1 << 0,
@@ -806,7 +807,7 @@ static void pm121_tick(void)
pm121_create_sys_fans(i);
 
pm121_create_cpu_fans();
-   pm121_started = 1;
+   pm121_started = true;
}
 
/* skipping ticks */
diff --git a/drivers/macintosh/windfarm_pm72.c 
b/drivers/macintosh/windfarm_pm72.c
index e88cfb3..8330215 100644
--- a/drivers/macintosh/windfarm_pm72.c
+++ b/drivers/macintosh/windfarm_pm72.c
@@ -611,7 +611,7 @@ static void pm72_tick(void)
int i, last_failure;
 
if (!started) {
-   started = 1;
+   started = true;
printk(KERN_INFO "windfarm: CPUs control loops started.\n");
for (i = 0; i < nr_chips; ++i) {
if (cpu_setup_pid(i) < 0) {
diff --git a/drivers/macintosh/windfarm_pm81.c 
b/drivers/macintosh/windfarm_pm81.c
index 93faf29..d9ea455 100644
--- a/drivers/macintosh/windfarm_pm81.c
+++ b/drivers/macintosh/windfarm_pm81.c
@@ -140,7 +140,8 @@ static struct wf_control *fan_system;
 static struct wf_control *cpufreq_clamp;
 
 /* Set to kick the control loop into life */
-static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok, wf_smu_started;
+static int wf_smu_all_controls_ok, wf_smu_all_sensors_ok;
+static bool wf_smu_started;
 
 /* Failure handling.. could be nicer */
 #define FAILURE_FAN 0x01
@@ -549,7 +550,7 @@ static void wf_smu_tick(void)
DBG("wf: c

[PATCH] macintosh/ams-input: Use true and false for boolean values

2018-01-23 Thread Gustavo A. R. Silva
Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/macintosh/ams/ams-input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/ams/ams-input.c 
b/drivers/macintosh/ams/ams-input.c
index 2edae7d..fe248f6 100644
--- a/drivers/macintosh/ams/ams-input.c
+++ b/drivers/macintosh/ams/ams-input.c
@@ -91,7 +91,7 @@ static int ams_input_enable(void)
return error;
}
 
-   joystick = 1;
+   joystick = true;
 
return 0;
 }
@@ -104,7 +104,7 @@ static void ams_input_disable(void)
ams_info.idev = NULL;
}
 
-   joystick = 0;
+   joystick = false;
 }
 
 static ssize_t ams_input_show_joystick(struct device *dev,
-- 
2.7.4



[PATCH] ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info

2018-01-23 Thread Tyrel Datwyler
The fcp_rsp_info structure as defined in the FC spec has an initial 3 bytes
reserved field. The ibmvfc driver mistakenly defined this field as 4 bytes
resulting in the rsp_code field being defined in what should be the start of
the second reserved field and thus always being reported as zero by the
driver.

Ideally, we should wire ibmvfc up with libfc for the sake of code
deduplication, and ease of maintaining standardized structures in a single
place. However, for now simply fixup the definition in ibmvfc for
backporting to distros on older kernels. Wiring up with libfc will be done
in a followup patch.

Cc: sta...@vger.kernel.org
Reported-by: Hannes Reinecke 
Signed-off-by: Tyrel Datwyler 
---
 drivers/scsi/ibmvscsi/ibmvfc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 9a0696f..b81a53c 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -367,7 +367,7 @@ enum ibmvfc_fcp_rsp_info_codes {
 };
 
 struct ibmvfc_fcp_rsp_info {
-   __be16 reserved;
+   u8 reserved[3];
u8 rsp_code;
u8 reserved2[4];
 }__attribute__((packed, aligned (2)));
-- 
2.7.4
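
The effect of the one-line change can be sketched in userspace. The demo_*
structs below are hypothetical copies that follow the field types in the
diff as shown (a 2-byte first field before the patch, a 3-byte one after),
with fixed-width integer types standing in for the kernel's; the wire bytes
are made up for the example and assume the response code sits in the fourth
byte, after three reserved bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout before the patch, following the removed line of the diff. */
struct demo_rsp_info_old {
	uint16_t reserved;
	uint8_t rsp_code;
	uint8_t reserved2[4];
} __attribute__((packed, aligned(2)));

/* Layout after the patch, following the added line of the diff. */
struct demo_rsp_info_new {
	uint8_t reserved[3];
	uint8_t rsp_code;
	uint8_t reserved2[4];
} __attribute__((packed, aligned(2)));

/* Read rsp_code from an 8-byte buffer through the old layout. */
static uint8_t demo_read_old(const uint8_t *wire)
{
	struct demo_rsp_info_old info;

	memcpy(&info, wire, sizeof(info));
	return info.rsp_code;
}

/* Read rsp_code from an 8-byte buffer through the new layout. */
static uint8_t demo_read_new(const uint8_t *wire)
{
	struct demo_rsp_info_new info;

	memcpy(&info, wire, sizeof(info));
	return info.rsp_code;
}

/* Usage: a made-up response carrying code 0x05 in the fourth byte. */
static void demo_rsp_check(void)
{
	const uint8_t wire[8] = { 0, 0, 0, 0x05, 0, 0, 0, 0 };

	assert(offsetof(struct demo_rsp_info_old, rsp_code) == 2);
	assert(offsetof(struct demo_rsp_info_new, rsp_code) == 3);
	assert(demo_read_old(wire) == 0);	/* reads a reserved byte */
	assert(demo_read_new(wire) == 0x05);	/* reads the real code */
}
```

With the old layout the driver reads a reserved byte, which matches the
"always being reported as zero" symptom described in the changelog.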



Re: [PATCH-next] powerpc/fsl_pci: Use PTR_ERR_OR_ZERO

2018-01-23 Thread Tyrel Datwyler
On 01/23/2018 12:37 PM, Christopher Díaz Riveros wrote:
> Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Christopher Díaz Riveros 
> ---

Reviewed-by: Tyrel Datwyler 

>  arch/powerpc/sysdev/fsl_pci.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index cc20d2255d7f..142184635c81 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -1304,10 +1304,7 @@ static int add_err_dev(struct platform_device *pdev)
>  pdev->resource,
>  pdev->num_resources,
>  &pd, sizeof(pd));
> - if (IS_ERR(errdev))
> - return PTR_ERR(errdev);
> -
> - return 0;
> + return PTR_ERR_OR_ZERO(errdev);
>  }
>  
>  static int fsl_pci_probe(struct platform_device *pdev)
>