Re: [PATCH v2] fs/dax: deposit pagetable even when installing zero page

2019-03-12 Thread Aneesh Kumar K.V


Hi Dan/Andrew/Jan,

"Aneesh Kumar K.V"  writes:

> Architectures like ppc64 use the deposited page table to store hardware
> page table slot information. Make sure we deposit a page table when
> using the zero page at the pmd level for hash.
>
> Without this we hit
>
> Unable to handle kernel paging request for data at address 0x
> Faulting instruction address: 0xc0082a74
> Oops: Kernel access of bad area, sig: 11 [#1]
> 
>
> NIP [c0082a74] __hash_page_thp+0x224/0x5b0
> LR [c00829a4] __hash_page_thp+0x154/0x5b0
> Call Trace:
>  hash_page_mm+0x43c/0x740
>  do_hash_page+0x2c/0x3c
>  copy_from_iter_flushcache+0xa4/0x4a0
>  pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
>  dax_copy_from_iter+0x40/0x70
>  dax_iomap_actor+0x134/0x360
>  iomap_apply+0xfc/0x1b0
>  dax_iomap_rw+0xac/0x130
>  ext4_file_write_iter+0x254/0x460 [ext4]
>  __vfs_write+0x120/0x1e0
>  vfs_write+0xd8/0x220
>  SyS_write+0x6c/0x110
>  system_call+0x3c/0x130
>
> Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
> Reviewed-by: Jan Kara 
> Signed-off-by: Aneesh Kumar K.V 

Any suggestion on which tree this patch should go to? Also, since this
fixes a kernel crash, we may want to get it into 5.1?

> ---
> Changes from v1:
> * Add reviewed-by:
> * Add Fixes:
>
>  fs/dax.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index 6959837cc465..01bfb2ac34f9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -33,6 +33,7 @@
>  #include <linux/sizes.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/iomap.h>
> +#include <asm/pgalloc.h>
>  #include "internal.h"
>  
>  #define CREATE_TRACE_POINTS
> @@ -1410,7 +1411,9 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state 
> *xas, struct vm_fault *vmf,
>  {
>   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>   unsigned long pmd_addr = vmf->address & PMD_MASK;
> + struct vm_area_struct *vma = vmf->vma;
>   struct inode *inode = mapping->host;
> + pgtable_t pgtable = NULL;
>   struct page *zero_page;
>   spinlock_t *ptl;
>   pmd_t pmd_entry;
> @@ -1425,12 +1428,22 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state 
> *xas, struct vm_fault *vmf,
>   *entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn,
>   DAX_PMD | DAX_ZERO_PAGE, false);
>  
> + if (arch_needs_pgtable_deposit()) {
> + pgtable = pte_alloc_one(vma->vm_mm);
> + if (!pgtable)
> + return VM_FAULT_OOM;
> + }
> +
>   ptl = pmd_lock(vmf->vma->vm_mm, vmf->pmd);
>   if (!pmd_none(*(vmf->pmd))) {
>   spin_unlock(ptl);
>   goto fallback;
>   }
>  
> + if (pgtable) {
> + pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
> + mm_inc_nr_ptes(vma->vm_mm);
> + }
>   pmd_entry = mk_pmd(zero_page, vmf->vma->vm_page_prot);
>   pmd_entry = pmd_mkhuge(pmd_entry);
>   set_pmd_at(vmf->vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);
> @@ -1439,6 +1452,8 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state 
> *xas, struct vm_fault *vmf,
>   return VM_FAULT_NOPAGE;
>  
>  fallback:
> + if (pgtable)
> + pte_free(vma->vm_mm, pgtable);
>   trace_dax_pmd_load_hole_fallback(inode, vmf, zero_page, *entry);
>   return VM_FAULT_FALLBACK;
>  }
> -- 
> 2.20.1

-aneesh



Re: [PATCH v2] powerpc/mm: move warning from resize_hpt_for_hotplug()

2019-03-12 Thread David Gibson
On Fri, Mar 08, 2019 at 11:54:13AM +0100, Laurent Vivier wrote:
> resize_hpt_for_hotplug() reports a warning when it cannot
> resize the hash page table ("Unable to resize hash page
> table to target order") but in some cases it's not a problem
> and can make the user think something has not worked properly.
> 
> This patch moves the warning to arch_remove_memory() to
> only report the problem when it is needed.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/sparsemem.h  |  4 ++--
>  arch/powerpc/mm/hash_utils_64.c   | 17 ++---
>  arch/powerpc/mm/mem.c |  3 ++-
>  arch/powerpc/platforms/pseries/lpar.c |  3 ++-
>  4 files changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/sparsemem.h 
> b/arch/powerpc/include/asm/sparsemem.h
> index 68da49320592..3192d454a733 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -17,9 +17,9 @@ extern int create_section_mapping(unsigned long start, 
> unsigned long end, int ni
>  extern int remove_section_mapping(unsigned long start, unsigned long end);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> -extern void resize_hpt_for_hotplug(unsigned long new_mem_size);
> +extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
>  #else
> -static inline void resize_hpt_for_hotplug(unsigned long new_mem_size) { }
> +static inline int resize_hpt_for_hotplug(unsigned long new_mem_size) { 
> return 0; }
>  #endif
>  
>  #ifdef CONFIG_NUMA
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0cc7fbc3bd1c..40bb2a8326bb 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -755,12 +755,12 @@ static unsigned long __init htab_get_table_size(void)
>  }
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -void resize_hpt_for_hotplug(unsigned long new_mem_size)
> +int resize_hpt_for_hotplug(unsigned long new_mem_size)
>  {
>   unsigned target_hpt_shift;
>  
>   if (!mmu_hash_ops.resize_hpt)
> - return;
> + return 0;
>  
>   target_hpt_shift = htab_shift_for_mem_size(new_mem_size);
>  
> @@ -773,15 +773,10 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
>* current shift
>*/
>   if ((target_hpt_shift > ppc64_pft_size)
> - || (target_hpt_shift < (ppc64_pft_size - 1))) {
> - int rc;
> -
> - rc = mmu_hash_ops.resize_hpt(target_hpt_shift);
> - if (rc && (rc != -ENODEV))
> - printk(KERN_WARNING
> -"Unable to resize hash page table to target 
> order %d: %d\n",
> -target_hpt_shift, rc);
> - }
> + || (target_hpt_shift < (ppc64_pft_size - 1)))
> + return mmu_hash_ops.resize_hpt(target_hpt_shift);
> +
> + return 0;
>  }
>  
>  int hash__create_section_mapping(unsigned long start, unsigned long end, int 
> nid)
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 33cc6f676fa6..0d40d970cf4a 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -169,7 +169,8 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 
> size,
>*/
>   vm_unmap_aliases();
>  
> - resize_hpt_for_hotplug(memblock_phys_mem_size());
> + if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
> + pr_warn("Hash collision while resizing HPT\n");
>  
>   return ret;
>  }
> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
> b/arch/powerpc/platforms/pseries/lpar.c
> index f2a9f0adc2d3..1034ef1fe2b4 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -901,8 +901,10 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>   break;
>  
>   case H_PARAMETER:
> + pr_warn("Invalid argument from H_RESIZE_HPT_PREPARE\n");
>   return -EINVAL;
>   case H_RESOURCE:
> + pr_warn("Operation not permitted from H_RESIZE_HPT_PREPARE\n");
>   return -EPERM;
>   default:
>   pr_warn("Unexpected error %d from H_RESIZE_HPT_PREPARE\n", rc);
> @@ -918,7 +920,6 @@ static int pseries_lpar_resize_hpt(unsigned long shift)
>   if (rc != 0) {
>   switch (state.commit_rc) {
>   case H_PTEG_FULL:
> - pr_warn("Hash collision while resizing HPT\n");
>   return -ENOSPC;
>  
>   default:

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 06/16] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration

2019-03-12 Thread David Gibson
On Tue, Mar 12, 2019 at 06:00:38PM +0100, Cédric Le Goater wrote:
> On 2/25/19 3:39 AM, David Gibson wrote:
> > On Fri, Feb 22, 2019 at 12:28:30PM +0100, Cédric Le Goater wrote:
> >> These controls will be used by the H_INT_SET_QUEUE_CONFIG and
> >> H_INT_GET_QUEUE_CONFIG hcalls from QEMU. They will also be used to
> >> restore the configuration of the XIVE EQs in the KVM device and to
> >> capture the internal runtime state of the EQs. Both 'get' and 'set'
> >> rely on an OPAL call to access the EQ toggle bit and EQ index from
> >> the XIVE interrupt controller, which are updated by the HW when event
> >> notifications are enqueued in the EQ.
> >>
> >> The value of the guest physical address of the event queue is saved in
> >> the XIVE internal xive_q structure for later use. That is when
> >> migration needs to mark the EQ pages dirty to capture a consistent
> >> memory state of the VM.
> >>
> >> Note that H_INT_SET_QUEUE_CONFIG does not require the extra
> >> OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
> >> but restoring the EQ state will.
> 
> I think we need to add some kind of flag to differentiate the hcall
> H_INT_SET_QUEUE_CONFIG from the restore of the EQ. The hcall does
> not need the OPAL support call, and this could help in the code
> transition.

Hrm.  What's the actual difference in semantics between the two
cases?  The guest shouldn't have awareness of whether or not OPAL is
involved.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 03/16] KVM: PPC: Book3S HV: XIVE: introduce a new capability KVM_CAP_PPC_IRQ_XIVE

2019-03-12 Thread David Gibson
On Tue, Mar 12, 2019 at 03:03:25PM +0100, Cédric Le Goater wrote:
> On 2/25/19 1:35 AM, David Gibson wrote:
> > On Fri, Feb 22, 2019 at 12:28:27PM +0100, Cédric Le Goater wrote:
[snip]
> >> +int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> >> +  struct kvm_vcpu *vcpu, u32 cpu)
> >> +{
> >> +  struct kvmppc_xive *xive = dev->private;
> >> +  struct kvmppc_xive_vcpu *xc;
> >> +  int rc;
> >> +
> >> +  pr_devel("native_connect_vcpu(cpu=%d)\n", cpu);
> >> +
> >> +  if (dev->ops != &kvm_xive_native_ops) {
> >> +  pr_devel("Wrong ops !\n");
> >> +  return -EPERM;
> >> +  }
> >> +  if (xive->kvm != vcpu->kvm)
> >> +  return -EPERM;
> >> +  if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> >> +  return -EBUSY;
> >> +  if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
> > 
> > You haven't taken the kvm->lock yet, so couldn't a race mean a
> > duplicate server gets inserted after you make this check?
> > 
> >> +  pr_devel("Duplicate !\n");
> >> +  return -EEXIST;
> >> +  }
> >> +  if (cpu >= KVM_MAX_VCPUS) {
> >> +  pr_devel("Out of bounds !\n");
> >> +  return -EINVAL;
> >> +  }
> >> +  xc = kzalloc(sizeof(*xc), GFP_KERNEL);
> >> +  if (!xc)
> >> +  return -ENOMEM;
> >> +
> >> +  mutex_lock(&vcpu->kvm->lock);
> >> +  vcpu->arch.xive_vcpu = xc;
> > 
> > Similarly you don't verify this is NULL after taking the lock, so
> > couldn't another thread race and make a connect which gets clobbered
> > here?
> 
> Yes, this is not very safe ... AFAICT we need to clean up all the KVM
> device methods doing the connection of the presenter to the vCPU.
> I will fix the XIVE native one for now.
> 
> Also, this CPU parameter is useless. There is no reason to connect
> a vCPU from another vCPU.

Hmm.. I thought the point of the 'cpu' parameter (not a great name) is
that it lets userspace choose the guest-visible irq server ID.  I think
that's preferable to tying it to an existing cpu id, if possible.
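
To make the race concrete, here is a minimal sketch of the pattern under
discussion (find_server()/insert_server() stand in for the real lookup
and insertion; they are hypothetical, not the actual XIVE code):

	/* Racy: both threads can pass the check before either inserts. */
	if (find_server(kvm, cpu))
		return -EEXIST;
	mutex_lock(&kvm->lock);
	insert_server(kvm, cpu);	/* duplicate now possible */
	mutex_unlock(&kvm->lock);

	/* Safe: the check and the insertion happen under the same lock. */
	mutex_lock(&kvm->lock);
	if (find_server(kvm, cpu)) {
		mutex_unlock(&kvm->lock);
		return -EEXIST;
	}
	insert_server(kvm, cpu);
	mutex_unlock(&kvm->lock);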

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[PATCH 7/7] ocxl: move event_fd handling to frontend

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

Event_fd is only used in the driver frontend, so it does not
need to exist in the backend code. Relocate it to the frontend
and provide an opaque mechanism for consumers instead.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/afu_irq.c   | 69 +--
 drivers/misc/ocxl/file.c  | 22 +-
 drivers/misc/ocxl/ocxl_internal.h |  5 ---
 include/misc/ocxl.h   | 44 
 4 files changed, 102 insertions(+), 38 deletions(-)
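
For illustration, an external consumer of the new hook would look roughly
like the sketch below (my_dev, my_handler and my_setup_irq are
hypothetical; the eventfd code moved into file.c is the in-tree user):

#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <misc/ocxl.h>

struct my_dev {
	struct work_struct irq_work;
};

/* Called in hard-irq context, so defer any heavy lifting. */
static irqreturn_t my_handler(void *private)
{
	struct my_dev *dev = private;

	schedule_work(&dev->irq_work);
	return IRQ_HANDLED;
}

static int my_setup_irq(struct ocxl_context *ctx, struct my_dev *dev)
{
	int irq_id;
	int rc;

	rc = ocxl_afu_irq_alloc(ctx, &irq_id);
	if (rc)
		return rc;

	/* No free_private callback: 'dev' outlives the IRQ here. */
	return ocxl_irq_set_handler(ctx, irq_id, my_handler, NULL, dev);
}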

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index 1885c472df58..bccb89085a29 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0+
 // Copyright 2017 IBM Corp.
 #include 
-#include 
+#include 
 #include "ocxl_internal.h"
 #include "trace.h"
 
@@ -11,7 +11,9 @@ struct afu_irq {
unsigned int virq;
char *name;
u64 trigger_page;
-   struct eventfd_ctx *ev_ctx;
+   irqreturn_t (*handler)(void *private);
+   void (*free_private)(void *private);
+   void *private;
 };
 
 int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
@@ -24,14 +26,42 @@ u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int 
irq_id)
return ctx->afu->irq_base_offset + (irq_id << PAGE_SHIFT);
 }
 
+int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
+   irqreturn_t (*handler)(void *private),
+   void (*free_private)(void *private),
+   void *private)
+{
+   struct afu_irq *irq;
+   int rc;
+
+   mutex_lock(&ctx->irq_lock);
+   irq = idr_find(&ctx->irq_idr, irq_id);
+   if (!irq) {
+   rc = -EINVAL;
+   goto unlock;
+   }
+
+   irq->handler = handler;
+   irq->private = private;
+
+   rc = 0;
+   goto unlock;
+
+unlock:
+   mutex_unlock(&ctx->irq_lock);
+   return rc;
+}
+
 static irqreturn_t afu_irq_handler(int virq, void *data)
 {
struct afu_irq *irq = (struct afu_irq *) data;
 
trace_ocxl_afu_irq_receive(virq);
-   if (irq->ev_ctx)
-   eventfd_signal(irq->ev_ctx, 1);
-   return IRQ_HANDLED;
+
+   if (irq->handler)
+   return irq->handler(irq->private);
+
+   return IRQ_HANDLED; // Just drop it on the ground
 }
 
 static int setup_afu_irq(struct ocxl_context *ctx, struct afu_irq *irq)
@@ -123,8 +153,8 @@ static void afu_irq_free(struct afu_irq *irq, struct 
ocxl_context *ctx)
ocxl_irq_id_to_offset(ctx, irq->id),
1 << PAGE_SHIFT, 1);
release_afu_irq(irq);
-   if (irq->ev_ctx)
-   eventfd_ctx_put(irq->ev_ctx);
+   if (irq->free_private)
+   irq->free_private(irq->private);
ocxl_link_free_irq(ctx->afu->fn->link, irq->hw_irq);
kfree(irq);
 }
@@ -157,31 +187,6 @@ void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
mutex_unlock(>irq_lock);
 }
 
-int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, int irq_id, int eventfd)
-{
-   struct afu_irq *irq;
-   struct eventfd_ctx *ev_ctx;
-   int rc = 0;
-
-   mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, irq_id);
-   if (!irq) {
-   rc = -EINVAL;
-   goto unlock;
-   }
-
-   ev_ctx = eventfd_ctx_fdget(eventfd);
-   if (IS_ERR(ev_ctx)) {
-   rc = -EINVAL;
-   goto unlock;
-   }
-
-   irq->ev_ctx = ev_ctx;
-unlock:
-   mutex_unlock(&ctx->irq_lock);
-   return rc;
-}
-
 u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index d28618c161de..16833a5d47f0 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -192,11 +193,27 @@ static long afu_ioctl_get_features(struct ocxl_context 
*ctx,
x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" : \
"UNKNOWN")
 
+static irqreturn_t irq_handler(void *private)
+{
+   struct eventfd_ctx *ev_ctx = private;
+
+   eventfd_signal(ev_ctx, 1);
+   return IRQ_HANDLED;
+}
+
+static void irq_free(void *private)
+{
+   struct eventfd_ctx *ev_ctx = private;
+
+   eventfd_ctx_put(ev_ctx);
+}
+
 static long afu_ioctl(struct file *file, unsigned int cmd,
unsigned long args)
 {
struct ocxl_context *ctx = file->private_data;
struct ocxl_ioctl_irq_fd irq_fd;
+   struct eventfd_ctx *ev_ctx;
int irq_id;
u64 irq_offset;
long rc;
@@ -248,7 +265,10 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
if (irq_fd.reserved)
return -EINVAL;
irq_id = ocxl_irq_offset_to_id(ctx, irq_fd.irq_offset);
-   rc = ocxl_afu_irq_set_fd(ctx, 

[PATCH 6/7] ocxl: afu_irq only deals with IRQ IDs, not offsets

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

The use of offsets is required only in the frontend, so alter
the IRQ API to only work with IRQ IDs in the backend.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/afu_irq.c   | 31 +--
 drivers/misc/ocxl/context.c   |  7 +--
 drivers/misc/ocxl/file.c  | 13 -
 drivers/misc/ocxl/ocxl_internal.h | 10 ++
 drivers/misc/ocxl/trace.h | 12 
 5 files changed, 36 insertions(+), 37 deletions(-)
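
The id <-> offset mapping itself is unchanged: each IRQ occupies one
page-sized slot above irq_base_offset, so the two helpers are exact
inverses. A worked example, assuming 64K pages (PAGE_SHIFT = 16) and an
irq_base_offset of 0x100000:

	id 2     -> offset = 0x100000 + (2 << 16)       = 0x120000
	0x120000 -> id     = (0x120000 - 0x100000) >> 16 = 2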

diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c
index 11ab996657a2..1885c472df58 100644
--- a/drivers/misc/ocxl/afu_irq.c
+++ b/drivers/misc/ocxl/afu_irq.c
@@ -14,14 +14,14 @@ struct afu_irq {
struct eventfd_ctx *ev_ctx;
 };
 
-static int irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
+int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset)
 {
return (offset - ctx->afu->irq_base_offset) >> PAGE_SHIFT;
 }
 
-static u64 irq_id_to_offset(struct ocxl_context *ctx, int id)
+u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id)
 {
-   return ctx->afu->irq_base_offset + (id << PAGE_SHIFT);
+   return ctx->afu->irq_base_offset + (irq_id << PAGE_SHIFT);
 }
 
 static irqreturn_t afu_irq_handler(int virq, void *data)
@@ -69,7 +69,7 @@ static void release_afu_irq(struct afu_irq *irq)
kfree(irq->name);
 }
 
-int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset)
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id)
 {
struct afu_irq *irq;
int rc;
@@ -101,10 +101,7 @@ int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 
*irq_offset)
if (rc)
goto err_alloc;
 
-   *irq_offset = irq_id_to_offset(ctx, irq->id);
-
-   trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq,
-   *irq_offset);
+   trace_ocxl_afu_irq_alloc(ctx->pasid, irq->id, irq->virq, irq->hw_irq);
mutex_unlock(&ctx->irq_lock);
return 0;
 
@@ -123,7 +120,7 @@ static void afu_irq_free(struct afu_irq *irq, struct 
ocxl_context *ctx)
trace_ocxl_afu_irq_free(ctx->pasid, irq->id);
if (ctx->mapping)
unmap_mapping_range(ctx->mapping,
-   irq_id_to_offset(ctx, irq->id),
+   ocxl_irq_id_to_offset(ctx, irq->id),
1 << PAGE_SHIFT, 1);
release_afu_irq(irq);
if (irq->ev_ctx)
@@ -132,14 +129,13 @@ static void afu_irq_free(struct afu_irq *irq, struct 
ocxl_context *ctx)
kfree(irq);
 }
 
-int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset)
+int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
-   int id = irq_offset_to_id(ctx, irq_offset);
 
mutex_lock(&ctx->irq_lock);
 
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (!irq) {
mutex_unlock(&ctx->irq_lock);
return -EINVAL;
@@ -161,14 +157,14 @@ void ocxl_afu_irq_free_all(struct ocxl_context *ctx)
mutex_unlock(&ctx->irq_lock);
 }
 
-int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset, int eventfd)
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, int irq_id, int eventfd)
 {
struct afu_irq *irq;
struct eventfd_ctx *ev_ctx;
-   int rc = 0, id = irq_offset_to_id(ctx, irq_offset);
+   int rc = 0;
 
mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (!irq) {
rc = -EINVAL;
goto unlock;
@@ -186,14 +182,13 @@ int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 
irq_offset, int eventfd)
return rc;
 }
 
-u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset)
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id)
 {
struct afu_irq *irq;
-   int id = irq_offset_to_id(ctx, irq_offset);
u64 addr = 0;
 
mutex_lock(&ctx->irq_lock);
-   irq = idr_find(&ctx->irq_idr, id);
+   irq = idr_find(&ctx->irq_idr, irq_id);
if (irq)
addr = irq->trigger_page;
mutex_unlock(&ctx->irq_lock);
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 9a37e9632cd9..c04887591837 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -93,8 +93,9 @@ static vm_fault_t map_afu_irq(struct vm_area_struct *vma, 
unsigned long address,
u64 offset, struct ocxl_context *ctx)
 {
u64 trigger_addr;
+   int irq_id = ocxl_irq_offset_to_id(ctx, offset);
 
-   trigger_addr = ocxl_afu_irq_get_addr(ctx, offset);
+   trigger_addr = ocxl_afu_irq_get_addr(ctx, irq_id);
if (!trigger_addr)
return VM_FAULT_SIGBUS;
 
@@ -154,12 +155,14 @@ static const struct vm_operations_struct ocxl_vmops = {
 static int check_mmap_afu_irq(struct ocxl_context *ctx,
struct vm_area_struct 

[PATCH 5/7] ocxl: Create a clear delineation between ocxl backend & frontend

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

The OCXL driver contains both frontend code for interacting with userspace
and backend code for interacting with the hardware.

This patch separates the backend code from the frontend so that it can be
used by other device drivers that communicate via OpenCAPI.

Relocate dev, cdev & sysfs files to the frontend code to allow external
drivers to maintain their own devices.

Reference counting on the device in the backend is replaced with kref
counting.

Move file & sysfs layer initialisation from core.c (backend) to
pci.c (frontend).

Create an ocxl_function-oriented interface for initialising devices &
enumerating AFUs.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/context.c   |   2 +-
 drivers/misc/ocxl/core.c  | 205 +++---
 drivers/misc/ocxl/file.c  | 122 +++---
 drivers/misc/ocxl/ocxl_internal.h |  39 +++---
 drivers/misc/ocxl/pci.c   |  61 -
 drivers/misc/ocxl/sysfs.c |  58 +
 include/misc/ocxl.h   | 121 --
 7 files changed, 411 insertions(+), 197 deletions(-)
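
The kref conversion follows the standard kernel pattern; a generic
minimal sketch (struct obj is a placeholder, not the patch itself):

#include <linux/kref.h>
#include <linux/slab.h>

struct obj {
	struct kref kref;
	/* ... payload ... */
};

static void obj_release(struct kref *kref)
{
	struct obj *o = container_of(kref, struct obj, kref);

	kfree(o);	/* runs only when the last reference is dropped */
}

static struct obj *obj_alloc(void)
{
	struct obj *o = kzalloc(sizeof(*o), GFP_KERNEL);

	if (o)
		kref_init(&o->kref);	/* refcount starts at 1 */
	return o;
}

/* Callers pair kref_get(&o->kref) with kref_put(&o->kref, obj_release). */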

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 1534b56f1db1..9a37e9632cd9 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -241,7 +241,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
}
rc = ocxl_link_remove_pe(ctx->afu->fn->link, ctx->pasid);
if (rc) {
-   dev_warn(&ctx->afu->dev,
+   dev_warn(&dev->dev,
"Couldn't remove PE entry cleanly: %d\n", rc);
}
return 0;
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index 2fd0c700e8a0..c632ec372342 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -13,16 +13,6 @@ static void ocxl_fn_put(struct ocxl_fn *fn)
put_device(&fn->dev);
 }
 
-struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
-{
-   return (get_device(&afu->dev) == NULL) ? NULL : afu;
-}
-
-void ocxl_afu_put(struct ocxl_afu *afu)
-{
-   put_device(&afu->dev);
-}
-
 static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
 {
struct ocxl_afu *afu;
@@ -31,6 +21,7 @@ static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
if (!afu)
return NULL;
 
+   kref_init(&afu->kref);
mutex_init(&afu->contexts_lock);
mutex_init(&afu->afu_control_lock);
idr_init(&afu->contexts_idr);
@@ -39,32 +30,26 @@ static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
return afu;
 }
 
-static void free_afu(struct ocxl_afu *afu)
+static void afu_release(struct kref *kref)
 {
+   struct ocxl_afu *afu = container_of(kref, struct ocxl_afu, kref);
+
idr_destroy(&afu->contexts_idr);
ocxl_fn_put(afu->fn);
kfree(afu);
 }
 
-static void free_afu_dev(struct device *dev)
+void ocxl_afu_get(struct ocxl_afu *afu)
 {
-   struct ocxl_afu *afu = to_ocxl_afu(dev);
-
-   ocxl_unregister_afu(afu);
-   free_afu(afu);
+   kref_get(&afu->kref);
 }
+EXPORT_SYMBOL_GPL(ocxl_afu_get);
 
-static int set_afu_device(struct ocxl_afu *afu, const char *location)
+void ocxl_afu_put(struct ocxl_afu *afu)
 {
-   struct ocxl_fn *fn = afu->fn;
-   int rc;
-
-   afu->dev.parent = &fn->dev;
-   afu->dev.release = free_afu_dev;
-   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
-   afu->config.idx);
-   return rc;
+   kref_put(&afu->kref, afu_release);
 }
+EXPORT_SYMBOL_GPL(ocxl_afu_put);
 
 static int assign_afu_actag(struct ocxl_afu *afu)
 {
@@ -233,27 +218,25 @@ static int configure_afu(struct ocxl_afu *afu, u8 
afu_idx, struct pci_dev *dev)
if (rc)
return rc;
 
-   rc = set_afu_device(afu, dev_name(>dev));
-   if (rc)
-   return rc;
-
rc = assign_afu_actag(afu);
if (rc)
return rc;
 
rc = assign_afu_pasid(afu);
-   if (rc) {
-   reclaim_afu_actag(afu);
-   return rc;
-   }
+   if (rc)
+   goto err_free_actag;
 
rc = map_mmio_areas(afu);
-   if (rc) {
-   reclaim_afu_pasid(afu);
-   reclaim_afu_actag(afu);
-   return rc;
-   }
+   if (rc)
+   goto err_free_pasid;
+
return 0;
+
+err_free_pasid:
+   reclaim_afu_pasid(afu);
+err_free_actag:
+   reclaim_afu_actag(afu);
+   return rc;
 }
 
 static void deconfigure_afu(struct ocxl_afu *afu)
@@ -265,16 +248,8 @@ static void deconfigure_afu(struct ocxl_afu *afu)
 
 static int activate_afu(struct pci_dev *dev, struct ocxl_afu *afu)
 {
-   int rc;
-
ocxl_config_set_afu_state(dev, afu->config.dvsec_afu_control_pos, 1);
-   /*
-* Char device creation is the last step, as processes can
-* call our driver immediately, so all our inits must be finished.
-*/
-   rc = ocxl_create_cdev(afu);
-   if (rc)
-   return rc;
+
return 0;
 }
 
@@ 

[PATCH 4/7] ocxl: Don't pass pci_dev around

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

The pci_dev can already be derived from the ocxl_fn struct (via
fn->dev.parent), so there is no need to pass it around separately.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/core.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b47cfda83e46..2fd0c700e8a0 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -66,10 +66,11 @@ static int set_afu_device(struct ocxl_afu *afu, const char 
*location)
return rc;
 }
 
-static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+static int assign_afu_actag(struct ocxl_afu *afu)
 {
struct ocxl_fn *fn = afu->fn;
int actag_count, actag_offset;
+   struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
 
/*
 * if there were not enough actags for the function, each afu
@@ -79,16 +80,16 @@ static int assign_afu_actag(struct ocxl_afu *afu, struct 
pci_dev *dev)
fn->actag_enabled / fn->actag_supported;
actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
if (actag_offset < 0) {
-   dev_err(&dev->dev, "Can't allocate %d actags for AFU: %d\n",
+   dev_err(&pci_dev->dev, "Can't allocate %d actags for AFU: %d\n",
actag_count, actag_offset);
return actag_offset;
}
afu->actag_base = fn->actag_base + actag_offset;
afu->actag_enabled = actag_count;
 
-   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   ocxl_config_set_afu_actag(pci_dev, afu->config.dvsec_afu_control_pos,
afu->actag_base, afu->actag_enabled);
-   dev_dbg(&dev->dev, "actag base=%d enabled=%d\n",
+   dev_dbg(&pci_dev->dev, "actag base=%d enabled=%d\n",
afu->actag_base, afu->actag_enabled);
return 0;
 }
@@ -103,10 +104,11 @@ static void reclaim_afu_actag(struct ocxl_afu *afu)
ocxl_actag_afu_free(afu->fn, start_offset, size);
 }
 
-static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+static int assign_afu_pasid(struct ocxl_afu *afu)
 {
struct ocxl_fn *fn = afu->fn;
int pasid_count, pasid_offset;
+   struct pci_dev *pci_dev = to_pci_dev(fn->dev.parent);
 
/*
 * We only support the case where the function configuration
@@ -115,7 +117,7 @@ static int assign_afu_pasid(struct ocxl_afu *afu, struct 
pci_dev *dev)
pasid_count = 1 << afu->config.pasid_supported_log;
pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
if (pasid_offset < 0) {
-   dev_err(&dev->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   dev_err(&pci_dev->dev, "Can't allocate %d PASIDs for AFU: %d\n",
pasid_count, pasid_offset);
return pasid_offset;
}
@@ -123,10 +125,10 @@ static int assign_afu_pasid(struct ocxl_afu *afu, struct 
pci_dev *dev)
afu->pasid_count = 0;
afu->pasid_max = pasid_count;
 
-   ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
+   ocxl_config_set_afu_pasid(pci_dev, afu->config.dvsec_afu_control_pos,
afu->pasid_base,
afu->config.pasid_supported_log);
-   dev_dbg(&dev->dev, "PASID base=%d, enabled=%d\n",
+   dev_dbg(&pci_dev->dev, "PASID base=%d, enabled=%d\n",
afu->pasid_base, pasid_count);
return 0;
 }
@@ -172,9 +174,10 @@ static void release_fn_bar(struct ocxl_fn *fn, int bar)
WARN_ON(fn->bar_used[idx] < 0);
 }
 
-static int map_mmio_areas(struct ocxl_afu *afu, struct pci_dev *dev)
+static int map_mmio_areas(struct ocxl_afu *afu)
 {
int rc;
+   struct pci_dev *pci_dev = to_pci_dev(afu->fn->dev.parent);
 
rc = reserve_fn_bar(afu->fn, afu->config.global_mmio_bar);
if (rc)
@@ -187,10 +190,10 @@ static int map_mmio_areas(struct ocxl_afu *afu, struct 
pci_dev *dev)
}
 
afu->global_mmio_start =
-   pci_resource_start(dev, afu->config.global_mmio_bar) +
+   pci_resource_start(pci_dev, afu->config.global_mmio_bar) +
afu->config.global_mmio_offset;
afu->pp_mmio_start =
-   pci_resource_start(dev, afu->config.pp_mmio_bar) +
+   pci_resource_start(pci_dev, afu->config.pp_mmio_bar) +
afu->config.pp_mmio_offset;
 
afu->global_mmio_ptr = ioremap(afu->global_mmio_start,
@@ -198,7 +201,7 @@ static int map_mmio_areas(struct ocxl_afu *afu, struct 
pci_dev *dev)
if (!afu->global_mmio_ptr) {
release_fn_bar(afu->fn, afu->config.pp_mmio_bar);
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
-   dev_err(&dev->dev, "Error mapping global mmio area\n");
+   dev_err(&pci_dev->dev, "Error mapping global mmio area\n");
return -ENOMEM;
}
 
@@ -234,17 +237,17 @@ static int configure_afu(struct ocxl_afu *afu, u8 
afu_idx, struct 

[PATCH 3/7] ocxl: Split pci.c

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

In preparation for making the core code available to external drivers,
move it out of pci.c and into core.c.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/Makefile|   1 +
 drivers/misc/ocxl/core.c  | 517 ++
 drivers/misc/ocxl/ocxl_internal.h |   5 +
 drivers/misc/ocxl/pci.c   | 517 --
 4 files changed, 523 insertions(+), 517 deletions(-)
 create mode 100644 drivers/misc/ocxl/core.c

diff --git a/drivers/misc/ocxl/Makefile b/drivers/misc/ocxl/Makefile
index 922e47cd4f0d..d07d1bb8e8d4 100644
--- a/drivers/misc/ocxl/Makefile
+++ b/drivers/misc/ocxl/Makefile
@@ -3,6 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
 
 ocxl-y += main.o pci.o config.o file.o pasid.o mmio.o
 ocxl-y += link.o context.o afu_irq.o sysfs.o trace.o
+ocxl-y += core.o
 obj-$(CONFIG_OCXL) += ocxl.o
 
 # For tracepoints to include our trace.h from tracepoint infrastructure:
diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
new file mode 100644
index ..b47cfda83e46
--- /dev/null
+++ b/drivers/misc/ocxl/core.c
@@ -0,0 +1,517 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright 2017 IBM Corp.
+#include 
+#include "ocxl_internal.h"
+
+static struct ocxl_fn *ocxl_fn_get(struct ocxl_fn *fn)
+{
+   return (get_device(&fn->dev) == NULL) ? NULL : fn;
+}
+
+static void ocxl_fn_put(struct ocxl_fn *fn)
+{
+   put_device(&fn->dev);
+}
+
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu)
+{
+   return (get_device(&afu->dev) == NULL) ? NULL : afu;
+}
+
+void ocxl_afu_put(struct ocxl_afu *afu)
+{
+   put_device(&afu->dev);
+}
+
+static struct ocxl_afu *alloc_afu(struct ocxl_fn *fn)
+{
+   struct ocxl_afu *afu;
+
+   afu = kzalloc(sizeof(struct ocxl_afu), GFP_KERNEL);
+   if (!afu)
+   return NULL;
+
+   mutex_init(&afu->contexts_lock);
+   mutex_init(&afu->afu_control_lock);
+   idr_init(&afu->contexts_idr);
+   afu->fn = fn;
+   ocxl_fn_get(fn);
+   return afu;
+}
+
+static void free_afu(struct ocxl_afu *afu)
+{
+   idr_destroy(&afu->contexts_idr);
+   ocxl_fn_put(afu->fn);
+   kfree(afu);
+}
+
+static void free_afu_dev(struct device *dev)
+{
+   struct ocxl_afu *afu = to_ocxl_afu(dev);
+
+   ocxl_unregister_afu(afu);
+   free_afu(afu);
+}
+
+static int set_afu_device(struct ocxl_afu *afu, const char *location)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int rc;
+
+   afu->dev.parent = &fn->dev;
+   afu->dev.release = free_afu_dev;
+   rc = dev_set_name(&afu->dev, "%s.%s.%hhu", afu->config.name, location,
+   afu->config.idx);
+   return rc;
+}
+
+static int assign_afu_actag(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int actag_count, actag_offset;
+
+   /*
+* if there were not enough actags for the function, each afu
+* reduces its count as well
+*/
+   actag_count = afu->config.actag_supported *
+   fn->actag_enabled / fn->actag_supported;
+   actag_offset = ocxl_actag_afu_alloc(fn, actag_count);
+   if (actag_offset < 0) {
+   dev_err(&dev->dev, "Can't allocate %d actags for AFU: %d\n",
+   actag_count, actag_offset);
+   return actag_offset;
+   }
+   afu->actag_base = fn->actag_base + actag_offset;
+   afu->actag_enabled = actag_count;
+
+   ocxl_config_set_afu_actag(dev, afu->config.dvsec_afu_control_pos,
+   afu->actag_base, afu->actag_enabled);
+   dev_dbg(&dev->dev, "actag base=%d enabled=%d\n",
+   afu->actag_base, afu->actag_enabled);
+   return 0;
+}
+
+static void reclaim_afu_actag(struct ocxl_afu *afu)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int start_offset, size;
+
+   start_offset = afu->actag_base - fn->actag_base;
+   size = afu->actag_enabled;
+   ocxl_actag_afu_free(afu->fn, start_offset, size);
+}
+
+static int assign_afu_pasid(struct ocxl_afu *afu, struct pci_dev *dev)
+{
+   struct ocxl_fn *fn = afu->fn;
+   int pasid_count, pasid_offset;
+
+   /*
+* We only support the case where the function configuration
+* requested enough PASIDs to cover all AFUs.
+*/
+   pasid_count = 1 << afu->config.pasid_supported_log;
+   pasid_offset = ocxl_pasid_afu_alloc(fn, pasid_count);
+   if (pasid_offset < 0) {
+   dev_err(&dev->dev, "Can't allocate %d PASIDs for AFU: %d\n",
+   pasid_count, pasid_offset);
+   return pasid_offset;
+   }
+   afu->pasid_base = fn->pasid_base + pasid_offset;
+   afu->pasid_count = 0;
+   afu->pasid_max = pasid_count;
+
+   ocxl_config_set_afu_pasid(dev, afu->config.dvsec_afu_control_pos,
+   afu->pasid_base,
+   

[PATCH 2/7] ocxl: Allow external drivers to use OpenCAPI contexts

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

Most OpenCAPI operations require a valid context, so
exposing these functions to external drivers is necessary.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/context.c   |  9 +--
 drivers/misc/ocxl/file.c  |  2 +-
 drivers/misc/ocxl/ocxl_internal.h |  6 -
 include/misc/ocxl.h   | 44 +++
 4 files changed, 52 insertions(+), 9 deletions(-)
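
Taken together, the exports give an external kernel driver a context
lifecycle along these lines (a sketch based on the signatures below;
my_use_afu is hypothetical and error handling is abbreviated):

#include <linux/sched.h>
#include <linux/slab.h>
#include <misc/ocxl.h>

static int my_use_afu(struct ocxl_afu *afu, u64 amr)
{
	struct ocxl_context *ctx;
	int rc;

	ctx = ocxl_context_alloc();
	if (!ctx)
		return -ENOMEM;

	rc = ocxl_context_init(ctx, afu, NULL);	/* mapping may be NULL */
	if (rc) {
		kfree(ctx);
		return rc;
	}

	rc = ocxl_context_attach(ctx, amr, current->mm);
	if (rc)
		goto out_free;

	/* ... drive the AFU ... */

	ocxl_context_detach(ctx);
out_free:
	ocxl_context_free(ctx);
	return rc;
}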

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 3498a0199bde..1534b56f1db1 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -8,6 +8,7 @@ struct ocxl_context *ocxl_context_alloc(void)
 {
return kzalloc(sizeof(struct ocxl_context), GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(ocxl_context_alloc);
 
 int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping)
@@ -43,6 +44,7 @@ int ocxl_context_init(struct ocxl_context *ctx, struct 
ocxl_afu *afu,
ocxl_afu_get(afu);
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_init);
 
 /*
  * Callback for when a translation fault triggers an error
@@ -63,7 +65,7 @@ static void xsl_fault_error(void *data, u64 addr, u64 dsisr)
wake_up_all(&ctx->events_wq);
 }
 
-int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct 
*mm)
 {
int rc;
 
@@ -75,7 +77,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
 
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
-   current->mm->context.id, ctx->tidr, amr, current->mm,
+   mm->context.id, ctx->tidr, amr, mm,
xsl_fault_error, ctx);
if (rc)
goto out;
@@ -85,6 +87,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
mutex_unlock(&ctx->status_mutex);
return rc;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_attach);
 
 static vm_fault_t map_afu_irq(struct vm_area_struct *vma, unsigned long 
address,
u64 offset, struct ocxl_context *ctx)
@@ -243,6 +246,7 @@ int ocxl_context_detach(struct ocxl_context *ctx)
}
return 0;
 }
+EXPORT_SYMBOL_GPL(ocxl_context_detach);
 
 void ocxl_context_detach_all(struct ocxl_afu *afu)
 {
@@ -280,3 +284,4 @@ void ocxl_context_free(struct ocxl_context *ctx)
ocxl_afu_put(ctx->afu);
kfree(ctx);
 }
+EXPORT_SYMBOL_GPL(ocxl_context_free);
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 16eb8a60d5c7..865b3d176431 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -100,7 +100,7 @@ static long afu_ioctl_attach(struct ocxl_context *ctx,
return -EINVAL;
 
amr = arg.amr & mfspr(SPRN_UAMOR);
-   rc = ocxl_context_attach(ctx, amr);
+   rc = ocxl_context_attach(ctx, amr, current->mm);
return rc;
 }
 
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 06fd98c989c8..779d15ef60b5 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -130,15 +130,9 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
  */
 int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
 
-struct ocxl_context *ocxl_context_alloc(void);
-int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
-   struct address_space *mapping);
-int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
 int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
-int ocxl_context_detach(struct ocxl_context *ctx);
 void ocxl_context_detach_all(struct ocxl_afu *afu);
-void ocxl_context_free(struct ocxl_context *ctx);
 
 int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
 void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 3b320c39f0af..ebbfe83cd5a6 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -51,6 +51,7 @@ struct ocxl_fn_config {
 
 // These are opaque outside the ocxl driver
 struct ocxl_afu;
+struct ocxl_context;
 
 enum ocxl_endian {
OCXL_BIG_ENDIAN = 0,/**< AFU data is big-endian */
@@ -58,6 +59,49 @@ enum ocxl_endian {
OCXL_HOST_ENDIAN = 2,   /**< AFU data is the same endianness as the 
host */
 };
 
+/**
+ * Allocate space for a new OpenCAPI context
+ *
+ * Returns NULL on failure
+ */
+struct ocxl_context *ocxl_context_alloc(void);
+
+/**
+ * Initialize an OpenCAPI context
+ *
+ * @ctx: The OpenCAPI context to initialize
+ * @afu: The AFU the context belongs to
+ * @mapping: The mapping to unmap when the context is closed (may be NULL)
+ */
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+   struct address_space *mapping);
+
+/**
+ * Free an OpenCAPI context
+ *
+ * @ctx: The OpenCAPI context to free
+ */
+void ocxl_context_free(struct ocxl_context 

[PATCH 0/7] Refactor OCXL driver to allow external drivers to use it

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

This series reworks the OpenCAPI driver to split frontend
(driver interactions) from backend (hardware interactions).

This allows external drivers to utilise the core of the
generic OpenCAPI driver to communicate with specific
OpenCAPI hardware.

Alastair D'Silva (7):
  ocxl: Provide global MMIO accessors for external drivers
  ocxl: Allow external drivers to use OpenCAPI contexts
  ocxl: Split pci.c
  ocxl: Don't pass pci_dev around
  ocxl: Create a clear delineation between ocxl backend & frontend
  ocxl: afu_irq only deals with IRQ IDs, not offsets
  ocxl: move event_fd handling to frontend

 drivers/misc/ocxl/Makefile|   3 +-
 drivers/misc/ocxl/afu_irq.c   |  94 ++---
 drivers/misc/ocxl/context.c   |  18 +-
 drivers/misc/ocxl/core.c  | 578 ++
 drivers/misc/ocxl/file.c  | 157 +---
 drivers/misc/ocxl/mmio.c  | 234 
 drivers/misc/ocxl/ocxl_internal.h |  49 +--
 drivers/misc/ocxl/pci.c   | 562 ++---
 drivers/misc/ocxl/sysfs.c |  58 +--
 drivers/misc/ocxl/trace.h |  12 +-
 include/misc/ocxl.h   | 322 -
 11 files changed, 1382 insertions(+), 705 deletions(-)
 create mode 100644 drivers/misc/ocxl/core.c
 create mode 100644 drivers/misc/ocxl/mmio.c

-- 
2.20.1



[PATCH 4/5] ocxl: Remove superfluous 'extern' from headers

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

The 'extern' keyword adds no value here.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/ocxl_internal.h | 54 +++
 include/misc/ocxl.h   | 36 ++---
 2 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index a32f2151029f..321b29e77f45 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -16,7 +16,6 @@
 
 extern struct pci_driver ocxl_pci_driver;
 
-
 struct ocxl_fn {
struct device dev;
int bar_used[3];
@@ -92,41 +91,40 @@ struct ocxl_process_element {
__be32 software_state;
 };
 
+struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
+void ocxl_afu_put(struct ocxl_afu *afu);
 
-extern struct ocxl_afu *ocxl_afu_get(struct ocxl_afu *afu);
-extern void ocxl_afu_put(struct ocxl_afu *afu);
-
-extern int ocxl_create_cdev(struct ocxl_afu *afu);
-extern void ocxl_destroy_cdev(struct ocxl_afu *afu);
-extern int ocxl_register_afu(struct ocxl_afu *afu);
-extern void ocxl_unregister_afu(struct ocxl_afu *afu);
+int ocxl_create_cdev(struct ocxl_afu *afu);
+void ocxl_destroy_cdev(struct ocxl_afu *afu);
+int ocxl_register_afu(struct ocxl_afu *afu);
+void ocxl_unregister_afu(struct ocxl_afu *afu);
 
-extern int ocxl_file_init(void);
-extern void ocxl_file_exit(void);
+int ocxl_file_init(void);
+void ocxl_file_exit(void);
 
-extern int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
-extern void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
-extern int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
-extern void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
+int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
+void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
 
-extern struct ocxl_context *ocxl_context_alloc(void);
-extern int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
+struct ocxl_context *ocxl_context_alloc(void);
+int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
-extern int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
-extern int ocxl_context_mmap(struct ocxl_context *ctx,
+int ocxl_context_attach(struct ocxl_context *ctx, u64 amr);
+int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
-extern int ocxl_context_detach(struct ocxl_context *ctx);
-extern void ocxl_context_detach_all(struct ocxl_afu *afu);
-extern void ocxl_context_free(struct ocxl_context *ctx);
+int ocxl_context_detach(struct ocxl_context *ctx);
+void ocxl_context_detach_all(struct ocxl_afu *afu);
+void ocxl_context_free(struct ocxl_context *ctx);
 
-extern int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
-extern void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
+int ocxl_sysfs_add_afu(struct ocxl_afu *afu);
+void ocxl_sysfs_remove_afu(struct ocxl_afu *afu);
 
-extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
-extern int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
-extern void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
-extern int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, u64 *irq_offset);
+int ocxl_afu_irq_free(struct ocxl_context *ctx, u64 irq_offset);
+void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
+int ocxl_afu_irq_set_fd(struct ocxl_context *ctx, u64 irq_offset,
int eventfd);
-extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, u64 irq_offset);
 
 #endif /* _OCXL_INTERNAL_H_ */
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 9ff6ddc28e22..4544573cc93c 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -53,7 +53,7 @@ struct ocxl_fn_config {
  * Read the configuration space of a function and fill in a
  * ocxl_fn_config structure with all the function details
  */
-extern int ocxl_config_read_function(struct pci_dev *dev,
+int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
 
 /*
@@ -62,14 +62,14 @@ extern int ocxl_config_read_function(struct pci_dev *dev,
  * AFU indexes can be sparse, so a driver should check all indexes up
  * to the maximum found in the function description
  */
-extern int ocxl_config_check_afu_index(struct pci_dev *dev,
+int ocxl_config_check_afu_index(struct pci_dev *dev,
struct ocxl_fn_config *fn, int afu_idx);
 
 /*
  * Read the configuration space of a function for the AFU specified by
  * the index 'afu_idx'. Fills in a ocxl_afu_config structure
  */
-extern int ocxl_config_read_afu(struct pci_dev *dev,
+int 

[PATCH 5/5] ocxl: Remove some unused exported symbols

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

Remove some unused exported symbols.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c|  2 --
 drivers/misc/ocxl/ocxl_internal.h | 23 +++
 include/misc/ocxl.h   | 23 ---
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 026ac2ac4f9c..c90c2e4875bf 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -299,7 +299,6 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
}
return 1;
 }
-EXPORT_SYMBOL_GPL(ocxl_config_check_afu_index);
 
 static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn,
struct ocxl_afu_config *afu)
@@ -535,7 +534,6 @@ int ocxl_config_get_pasid_info(struct pci_dev *dev, int 
*count)
 {
return pnv_ocxl_get_pasid_count(dev, count);
 }
-EXPORT_SYMBOL_GPL(ocxl_config_get_pasid_info);
 
 void ocxl_config_set_afu_pasid(struct pci_dev *dev, int pos, int pasid_base,
u32 pasid_count_log)
diff --git a/drivers/misc/ocxl/ocxl_internal.h 
b/drivers/misc/ocxl/ocxl_internal.h
index 321b29e77f45..06fd98c989c8 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -107,6 +107,29 @@ void ocxl_pasid_afu_free(struct ocxl_fn *fn, u32 start, 
u32 size);
 int ocxl_actag_afu_alloc(struct ocxl_fn *fn, u32 size);
 void ocxl_actag_afu_free(struct ocxl_fn *fn, u32 start, u32 size);
 
+/*
+ * Get the max PASID value that can be used by the function
+ */
+int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
+
+/*
+ * Check if an AFU index is valid for the given function.
+ *
+ * AFU indexes can be sparse, so a driver should check all indexes up
+ * to the maximum found in the function description
+ */
+int ocxl_config_check_afu_index(struct pci_dev *dev,
+   struct ocxl_fn_config *fn, int afu_idx);
+
+/**
+ * Update values within a Process Element
+ *
+ * link_handle: the link handle associated with the process element
+ * pasid: the PASID for the AFU context
+ * tid: the new thread id for the process element
+ */
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+
 struct ocxl_context *ocxl_context_alloc(void);
 int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
struct address_space *mapping);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 4544573cc93c..9530d3be1b30 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -56,15 +56,6 @@ struct ocxl_fn_config {
 int ocxl_config_read_function(struct pci_dev *dev,
struct ocxl_fn_config *fn);
 
-/*
- * Check if an AFU index is valid for the given function.
- *
- * AFU indexes can be sparse, so a driver should check all indexes up
- * to the maximum found in the function description
- */
-int ocxl_config_check_afu_index(struct pci_dev *dev,
-   struct ocxl_fn_config *fn, int afu_idx);
-
 /*
  * Read the configuration space of a function for the AFU specified by
  * the index 'afu_idx'. Fills in a ocxl_afu_config structure
@@ -74,11 +65,6 @@ int ocxl_config_read_afu(struct pci_dev *dev,
struct ocxl_afu_config *afu,
u8 afu_idx);
 
-/*
- * Get the max PASID value that can be used by the function
- */
-int ocxl_config_get_pasid_info(struct pci_dev *dev, int *count);
-
 /*
  * Tell an AFU, by writing in the configuration space, the PASIDs that
  * it can use. Range starts at 'pasid_base' and its size is a multiple
@@ -188,15 +174,6 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
 
-/**
- * Update values within a Process Element
- *
- * link_handle: the link handle associated with the process element
- * pasid: the PASID for the AFU context
- * tid: the new thread id for the process element
- */
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
-
 /*
  * Remove a Process Element from the Shared Process Area for a link
  */
-- 
2.20.1



[PATCH 3/5] ocxl: read_pasid never returns an error, so make it void

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

No need for a return value in read_pasid as it only returns 0.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/config.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 0ee7856b033d..026ac2ac4f9c 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -68,7 +68,7 @@ static int find_dvsec_afu_ctrl(struct pci_dev *dev, u8 
afu_idx)
return 0;
 }
 
-static int read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
+static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config *fn)
 {
u16 val;
int pos;
@@ -89,7 +89,6 @@ static int read_pasid(struct pci_dev *dev, struct 
ocxl_fn_config *fn)
 out:
dev_dbg(&dev->dev, "PASID capability:\n");
dev_dbg(&dev->dev, "  Max PASID log = %d\n", fn->max_pasid_log);
-   return 0;
 }
 
 static int read_dvsec_tl(struct pci_dev *dev, struct ocxl_fn_config *fn)
@@ -205,11 +204,7 @@ int ocxl_config_read_function(struct pci_dev *dev, struct 
ocxl_fn_config *fn)
 {
int rc;
 
-   rc = read_pasid(dev, fn);
-   if (rc) {
-   dev_err(&dev->dev, "Invalid PASID configuration: %d\n", rc);
-   return -ENODEV;
-   }
+   read_pasid(dev, fn);
 
rc = read_dvsec_tl(dev, fn);
if (rc) {
-- 
2.20.1



[PATCH v2 0/5] ocxl: OpenCAPI Cleanup

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

Some minor cleanups for the OpenCAPI driver as a prerequisite
for an ocxl driver refactoring to allow the driver core to
be utilised by external drivers.

Changelog:
V2:
  - remove intermediate assignment of 'link' var in
'Rename struct link to ocxl_link'
  - Don't shift definition of ocxl_context_attach in
'Remove some unused exported symbols'

Alastair D'Silva (5):
  ocxl: Rename struct link to ocxl_link
  ocxl: Clean up printf formats
  ocxl: read_pasid never returns an error, so make it void
  ocxl: Remove superfluous 'extern' from headers
  ocxl: Remove some unused exported symbols

 drivers/misc/ocxl/config.c| 17 ++
 drivers/misc/ocxl/context.c   |  2 +-
 drivers/misc/ocxl/file.c  |  2 +-
 drivers/misc/ocxl/link.c  | 36 ++---
 drivers/misc/ocxl/ocxl_internal.h | 86 +++
 drivers/misc/ocxl/trace.h | 10 ++--
 include/misc/ocxl.h   | 53 ++-
 7 files changed, 99 insertions(+), 107 deletions(-)

-- 
2.20.1



[PATCH 2/5] ocxl: Clean up printf formats

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

Use the '%#' printf flag instead of a literal '0x' prefix.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/config.c  |  6 +++---
 drivers/misc/ocxl/context.c |  2 +-
 drivers/misc/ocxl/trace.h   | 10 +-
 3 files changed, 9 insertions(+), 9 deletions(-)
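
One subtlety worth noting: the two spellings differ for zero, since the
'#' flag only adds the 0x prefix to non-zero values:

	pr_debug("%#x\n", 0x1f);	/* prints "0x1f" */
	pr_debug("%#x\n", 0);		/* prints "0" -- no prefix */
	pr_debug("0x%x\n", 0);		/* prints "0x0" */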

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 8f2c5d8bd2ee..0ee7856b033d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -178,9 +178,9 @@ static int read_dvsec_vendor(struct pci_dev *dev)
pci_read_config_dword(dev, pos + OCXL_DVSEC_VENDOR_DLX_VERS, &dlx);
 
dev_dbg(&dev->dev, "Vendor specific DVSEC:\n");
-   dev_dbg(&dev->dev, "  CFG version = 0x%x\n", cfg);
-   dev_dbg(&dev->dev, "  TLX version = 0x%x\n", tlx);
-   dev_dbg(&dev->dev, "  DLX version = 0x%x\n", dlx);
+   dev_dbg(&dev->dev, "  CFG version = %#x\n", cfg);
+   dev_dbg(&dev->dev, "  TLX version = %#x\n", tlx);
+   dev_dbg(&dev->dev, "  DLX version = %#x\n", dlx);
return 0;
 }
 
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index c10a940e3b38..3498a0199bde 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -134,7 +134,7 @@ static vm_fault_t ocxl_mmap_fault(struct vm_fault *vmf)
vm_fault_t ret;
 
offset = vmf->pgoff << PAGE_SHIFT;
-   pr_debug("%s: pasid %d address 0x%lx offset 0x%llx\n", __func__,
+   pr_debug("%s: pasid %d address %#lx offset %#llx\n", __func__,
ctx->pasid, vmf->address, offset);
 
if (offset < ctx->afu->irq_base_offset)
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index bcb7ff330c1e..68bf2f173a1a 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -28,7 +28,7 @@ DECLARE_EVENT_CLASS(ocxl_context,
__entry->tidr = tidr;
),
 
-   TP_printk("linux pid=%d spa=0x%p pasid=0x%x pidr=0x%x tidr=0x%x",
+   TP_printk("linux pid=%d spa=%p pasid=%#x pidr=%#x tidr=%#x",
__entry->pid,
__entry->spa,
__entry->pasid,
@@ -61,7 +61,7 @@ TRACE_EVENT(ocxl_terminate_pasid,
__entry->rc = rc;
),
 
-   TP_printk("pasid=0x%x rc=%d",
+   TP_printk("pasid=%#x rc=%d",
__entry->pasid,
__entry->rc
)
@@ -87,7 +87,7 @@ DECLARE_EVENT_CLASS(ocxl_fault_handler,
__entry->tfc = tfc;
),
 
-   TP_printk("spa=%p pe=0x%llx dsisr=0x%llx dar=0x%llx tfc=0x%llx",
+   TP_printk("spa=%p pe=%#llx dsisr=%#llx dar=%#llx tfc=%#llx",
__entry->spa,
__entry->pe,
__entry->dsisr,
@@ -127,7 +127,7 @@ TRACE_EVENT(ocxl_afu_irq_alloc,
__entry->irq_offset = irq_offset;
),
 
-   TP_printk("pasid=0x%x irq_id=%d virq=%u hw_irq=%d irq_offset=0x%llx",
+   TP_printk("pasid=%#x irq_id=%d virq=%u hw_irq=%d irq_offset=%#llx",
__entry->pasid,
__entry->irq_id,
__entry->virq,
@@ -150,7 +150,7 @@ TRACE_EVENT(ocxl_afu_irq_free,
__entry->irq_id = irq_id;
),
 
-   TP_printk("pasid=0x%x irq_id=%d",
+   TP_printk("pasid=%#x irq_id=%d",
__entry->pasid,
__entry->irq_id
)
-- 
2.20.1



[PATCH 1/5] ocxl: Rename struct link to ocxl_link

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

The term 'link' is ambiguous (especially when the struct is used for a
list), so rename it for clarity.

Signed-off-by: Alastair D'Silva 
Reviewed-by: Greg Kurz 
---
 drivers/misc/ocxl/file.c |  5 ++---
 drivers/misc/ocxl/link.c | 36 ++--
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index e6a607488f8a..009e09b7ded5 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -151,10 +151,9 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context 
*ctx,
mutex_unlock(&ctx->status_mutex);
 
if (status == ATTACHED) {
-   int rc;
-   struct link *link = ctx->afu->fn->link;
+   int rc = ocxl_link_update_pe(ctx->afu->fn->link,
+   ctx->pasid, ctx->tidr);
 
-   rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
if (rc)
return rc;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index d50b861d7e57..8d2690a1a9de 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -76,7 +76,7 @@ struct spa {
  * limited number of opencapi slots on a system and lookup is only
  * done when the device is probed
  */
-struct link {
+struct ocxl_link {
struct list_head list;
struct kref ref;
int domain;
@@ -179,7 +179,7 @@ static void xsl_fault_handler_bh(struct work_struct 
*fault_work)
 
 static irqreturn_t xsl_fault_handler(int irq, void *data)
 {
-   struct link *link = (struct link *) data;
+   struct ocxl_link *link = (struct ocxl_link *) data;
struct spa *spa = link->spa;
u64 dsisr, dar, pe_handle;
struct pe_data *pe_data;
@@ -256,7 +256,7 @@ static int map_irq_registers(struct pci_dev *dev, struct 
spa *spa)
&spa->reg_tfc, &spa->reg_pe_handle);
 }
 
-static int setup_xsl_irq(struct pci_dev *dev, struct link *link)
+static int setup_xsl_irq(struct pci_dev *dev, struct ocxl_link *link)
 {
struct spa *spa = link->spa;
int rc;
@@ -311,7 +311,7 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link 
*link)
return rc;
 }
 
-static void release_xsl_irq(struct link *link)
+static void release_xsl_irq(struct ocxl_link *link)
 {
struct spa *spa = link->spa;
 
@@ -323,7 +323,7 @@ static void release_xsl_irq(struct link *link)
unmap_irq_registers(spa);
 }
 
-static int alloc_spa(struct pci_dev *dev, struct link *link)
+static int alloc_spa(struct pci_dev *dev, struct ocxl_link *link)
 {
struct spa *spa;
 
@@ -350,7 +350,7 @@ static int alloc_spa(struct pci_dev *dev, struct link *link)
return 0;
 }
 
-static void free_spa(struct link *link)
+static void free_spa(struct ocxl_link *link)
 {
struct spa *spa = link->spa;
 
@@ -364,12 +364,12 @@ static void free_spa(struct link *link)
}
 }
 
-static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
+static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link 
**out_link)
 {
-   struct link *link;
+   struct ocxl_link *link;
int rc;
 
-   link = kzalloc(sizeof(struct link), GFP_KERNEL);
+   link = kzalloc(sizeof(struct ocxl_link), GFP_KERNEL);
if (!link)
return -ENOMEM;
 
@@ -405,7 +405,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
return rc;
 }
 
-static void free_link(struct link *link)
+static void free_link(struct ocxl_link *link)
 {
release_xsl_irq(link);
free_spa(link);
@@ -415,7 +415,7 @@ static void free_link(struct link *link)
 int ocxl_link_setup(struct pci_dev *dev, int PE_mask, void **link_handle)
 {
int rc = 0;
-   struct link *link;
+   struct ocxl_link *link;
 
	mutex_lock(&links_list_lock);
	list_for_each_entry(link, &links_list, list) {
@@ -442,7 +442,7 @@ EXPORT_SYMBOL_GPL(ocxl_link_setup);
 
 static void release_xsl(struct kref *ref)
 {
-   struct link *link = container_of(ref, struct link, ref);
+   struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
	list_del(&link->list);
/* call platform code before releasing data */
@@ -452,7 +452,7 @@ static void release_xsl(struct kref *ref)
 
 void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 {
-   struct link *link = (struct link *) link_handle;
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
 
	mutex_lock(&links_list_lock);
	kref_put(&link->ref, release_xsl);
@@ -488,7 +488,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
 {
-   struct link *link = (struct link *) link_handle;
+	struct ocxl_link *link = (struct ocxl_link *) link_handle;

Re: [PATCH] powerpc/64s: Mark 'dummy_copy_buffer' as used

2019-03-12 Thread Michael Ellerman
Christophe Leroy  writes:
> On 03/12/2019 08:29 PM, Mathieu Malaterre wrote:
>> In commit 07d2a628bc00 ("powerpc/64s: Avoid cpabort in context switch
>> when possible") a buffer 'dummy_copy_buffer' was introduced. gcc does
>> not see this buffer being used in the inline assembly within function
>> '__switch_to', so explicitly mark this variable as being used.
>> 
>> Prefer using '__aligned' to get past the line-over-80-characters warning
>> in checkpatch.
>
> Powerpc accepts 90 characters, use arch/powerpc/tools/checkpatch.sh
>
>> 
>> This removes the following warning:
>> 
>>arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined 
>> but not used [-Werror=unused-const-variable=]
>
> commit 2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support") has 
> removed the function using 'dummy_copy_buffer' so you should remove it 
> completely.

Yes that would be better, thanks.

cheers


[PATCH 1/1] arch/powerpc: Rework local_paca to avoid LTO warnings

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

When building an LTO kernel, the existing code generates warnings:
./arch/powerpc/include/asm/paca.h:37:30: warning: register of
‘local_paca’ used for multiple global register variables
 register struct paca_struct *local_paca asm("r13");
  ^
./arch/powerpc/include/asm/paca.h:37:30: note: conflicts with
‘local_paca’

This patch reworks local_paca into an inline getter & setter function,
which addresses the warning.

Generated ASM from this patch is broadly similar (addresses have
changed and the compiler uses different GPRs in some places).
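
Since local_paca remains available as a macro over the new getter, existing
callers should compile unchanged. A minimal sketch of a typical caller
(hypothetical, for illustration only):

	/* hypothetical caller: 'local_paca' still reads like a variable,
	 * but now expands to get_paca_no_preempt_check() */
	static inline int this_cpu_index(void)
	{
		return local_paca->paca_index;
	}
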

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/paca.h | 44 +++--
 arch/powerpc/kernel/paca.c  |  2 +-
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e843bc5d1a0f..9c9e2dea0f9b 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -34,19 +34,6 @@
 #include 
 #include 
 
-register struct paca_struct *local_paca asm("r13");
-
-#if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
-extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */
-/*
- * Add standard checks that preemption cannot occur when using get_paca():
- * otherwise the paca_struct it points to may be the wrong one just after.
- */
-#define get_paca() ((void) debug_smp_processor_id(), local_paca)
-#else
-#define get_paca() local_paca
-#endif
-
 #ifdef CONFIG_PPC_PSERIES
 #define get_lppaca()   (get_paca()->lppaca_ptr)
 #endif
@@ -266,6 +253,37 @@ struct paca_struct {
 #endif
 } cacheline_aligned;
 
+#if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
+extern unsigned int debug_smp_processor_id(void); /* from linux/smp.h */
+#endif
+
+static inline struct paca_struct *get_paca_no_preempt_check(void)
+{
+   register struct paca_struct *paca asm("r13");
+   return paca;
+}
+
+static inline struct paca_struct *get_paca(void)
+{
+#if defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_SMP)
+   /*
+	 * Add standard checks that preemption cannot occur when using get_paca():
+	 * otherwise the paca_struct it points to may be the wrong one just after.
+*/
+   debug_smp_processor_id();
+#endif
+   return get_paca_no_preempt_check();
+}
+
+#define local_paca get_paca_no_preempt_check()
+
+static inline void set_paca(struct paca_struct *new)
+{
+   register struct paca_struct *paca asm("r13");
+   paca = new;
+}
+
+
 extern void copy_mm_to_paca(struct mm_struct *mm);
 extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 913bfca09c4f..ae5c243f9d5a 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -172,7 +172,7 @@ void __init initialise_paca(struct paca_struct *new_paca, 
int cpu)
 void setup_paca(struct paca_struct *new_paca)
 {
/* Setup r13 */
-   local_paca = new_paca;
+   set_paca(new_paca);
 
 #ifdef CONFIG_PPC_BOOK3E
/* On Book3E, initialize the TLB miss exception frames */
-- 
2.20.1



[PATCH 1/1] arch/powerpc: Don't assume start_text & head_end align

2019-03-12 Thread Alastair D'Silva
From: Alastair D'Silva 

When building LTO kernels, the start_text symbol is not guaranteed to mark
the end of the head section.

Instead, look explicitly for __head_end.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/tools/head_check.sh | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/tools/head_check.sh b/arch/powerpc/tools/head_check.sh
index ad9e57209aa4..1b0f634038c3 100644
--- a/arch/powerpc/tools/head_check.sh
+++ b/arch/powerpc/tools/head_check.sh
@@ -44,7 +44,7 @@ nm="$1"
 vmlinux="$2"
 
 # gcc-4.6-era toolchain make _stext an A (absolute) symbol rather than T
-$nm "$vmlinux" | grep -e " [TA] _stext$" -e " t start_first_256B$" -e " a 
text_start$" -e " t start_text$" -m4 > .tmp_symbols.txt
+$nm "$vmlinux" | grep -e " [TA] _stext$" -e " t start_first_256B$" -e " a 
text_start$" -e " T __head_end$" -m4 > .tmp_symbols.txt
 
 
 vma=$(cat .tmp_symbols.txt | grep -e " [TA] _stext$" | cut -d' ' -f1)
@@ -63,12 +63,12 @@ fi
 
 top_vma=$(echo $vma | cut -d'0' -f1)
 
-expected_start_text_addr=$(cat .tmp_symbols.txt | grep " a text_start$" | cut -d' ' -f1 | sed "s/^0/$top_vma/")
+expected_head_end_addr=$(cat .tmp_symbols.txt | grep " a text_start$" | cut -d' ' -f1 | sed "s/^0/$top_vma/")
 
-start_text_addr=$(cat .tmp_symbols.txt | grep " t start_text$" | cut -d' ' -f1)
+head_end_addr=$(cat .tmp_symbols.txt | grep " T __head_end$" | cut -d' ' -f1)
 
-if [ "$start_text_addr" != "$expected_start_text_addr" ]; then
-   echo "ERROR: start_text address is $start_text_addr, should be 
$expected_start_text_addr"
+if [ "$head_end_addr" != "$expected_head_end_addr" ]; then
+   echo "ERROR: __head_end address is $head_end_addr, should be 
$expected_head_end_addr"
echo "ERROR: try to enable LD_HEAD_STUB_CATCH config option"
echo "ERROR: see comments in arch/powerpc/tools/head_check.sh"
 
-- 
2.20.1



Re: [PATCH 3/6] x86: clean up _TIF_SYSCALL_EMU handling using ptrace_syscall_enter hook

2019-03-12 Thread Haibo Xu (Arm Technology China)
On 2019/3/12 20:09, Sudeep Holla wrote:
> On Mon, Mar 11, 2019 at 08:04:39PM -0700, Andy Lutomirski wrote:
>> On Mon, Mar 11, 2019 at 6:35 PM Haibo Xu (Arm Technology China)
>>  wrote:
>>>
>
> [...]
>
>>> For the PTRACE_SYSEMU_SINGLESTEP request, ptrace only needs to report (send
>>> SIGTRAP) at the entry of a system call; there is no need to report at the
>>> exit of a system call. That's why the old logic, {step = ((flags &
>>> (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) == _TIF_SINGLESTEP)}, tries to filter
>>> out the special case (PTRACE_SYSEMU_SINGLESTEP).
>>>
>>> Another way to make sure the logic is fine: you can run some tests with
>>> respect to both logics and check whether they have the same behavior.
>>
>> tools/testing/selftests/x86/ptrace_syscall.c has a test intended to
>> exercise this.  Can one of you either confirm that it does exercise it
>> and that it still passes or can you improve the test?
>>
> I did run the tests, which didn't flag anything. I haven't looked at the
> details of the test implementation, but it seems to miss this case. I will
> see what can be improved (if possible). Also, I think single_step_syscall is
> the one I need to look at for this particular case. Both single_step_syscall
> and ptrace_syscall reported no errors.
>
> --
> Regards,
> Sudeep
>

Since the ptrace() system call has so many request types, I'm not sure whether
the test cases cover all of them. But here we'd better make sure the
PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP requests work correctly. Maybe you
can verify them with tests from Bin Lu (bin...@arm.com).

Regards,
Haibo


Re: [PATCH v10 05/18] powerpc/prom_init: don't use string functions from lib/

2019-03-12 Thread Daniel Axtens
Hi Christophe,

In trying to extend my KASAN implementation to Book3S 64bit, I found one
other change needed to prom_init. I don't know if you think it should go
in this patch, the next one, or somewhere else entirely - I will leave
it up to you. Just let me know if you want me to carry it separately.

Thanks again for all your work on this and the integration of my series.

Regards,
Daniel

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 7017156168e8..cebb3fc535ba 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1265,7 +1265,8 @@ static void __init prom_check_platform_support(void)
   "ibm,arch-vec-5-platform-support");
 
/* First copy the architecture vec template */
-   ibm_architecture_vec = ibm_architecture_vec_template;
+	memcpy(&ibm_architecture_vec, &ibm_architecture_vec_template,
+  sizeof(struct ibm_arch_vec));
 
if (prop_len > 1) {
int i;

> When KASAN is active, the string functions in lib/ are doing the
> KASAN checks. This is too early for prom_init.
>
> This patch implements dedicated string functions for prom_init,
> which will be compiled in with KASAN disabled.
>
> Size of prom_init before the patch:
>    text    data     bss     dec     hex filename
>   12060     488    6960   19508    4c34 arch/powerpc/kernel/prom_init.o
>
> Size of prom_init after the patch:
>    text    data     bss     dec     hex filename
>   12460     488    6960   19908    4dc4 arch/powerpc/kernel/prom_init.o
>
> This increases the size of prom_init a bit, but as prom_init is
> in __init section, it is freed after boot anyway.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/prom_init.c| 211 
> ++---
>  arch/powerpc/kernel/prom_init_check.sh |   2 +-
>  2 files changed, 171 insertions(+), 42 deletions(-)
>
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index ecf083c46bdb..7017156168e8 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -224,6 +224,135 @@ static bool  __prombss rtas_has_query_cpu_stopped;
>  #define PHANDLE_VALID(p) ((p) != 0 && (p) != PROM_ERROR)
>  #define IHANDLE_VALID(i) ((i) != 0 && (i) != PROM_ERROR)
>  
> +/* Copied from lib/string.c and lib/kstrtox.c */
> +
> +static int __init prom_strcmp(const char *cs, const char *ct)
> +{
> + unsigned char c1, c2;
> +
> + while (1) {
> + c1 = *cs++;
> + c2 = *ct++;
> + if (c1 != c2)
> + return c1 < c2 ? -1 : 1;
> + if (!c1)
> + break;
> + }
> + return 0;
> +}
> +
> +static char __init *prom_strcpy(char *dest, const char *src)
> +{
> + char *tmp = dest;
> +
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return tmp;
> +}
> +
> +static int __init prom_strncmp(const char *cs, const char *ct, size_t count)
> +{
> + unsigned char c1, c2;
> +
> + while (count) {
> + c1 = *cs++;
> + c2 = *ct++;
> + if (c1 != c2)
> + return c1 < c2 ? -1 : 1;
> + if (!c1)
> + break;
> + count--;
> + }
> + return 0;
> +}
> +
> +static size_t __init prom_strlen(const char *s)
> +{
> + const char *sc;
> +
> + for (sc = s; *sc != '\0'; ++sc)
> + /* nothing */;
> + return sc - s;
> +}
> +
> +static int __init prom_memcmp(const void *cs, const void *ct, size_t count)
> +{
> + const unsigned char *su1, *su2;
> + int res = 0;
> +
> + for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
> + if ((res = *su1 - *su2) != 0)
> + break;
> + return res;
> +}
> +
> +static char __init *prom_strstr(const char *s1, const char *s2)
> +{
> + size_t l1, l2;
> +
> + l2 = prom_strlen(s2);
> + if (!l2)
> + return (char *)s1;
> + l1 = prom_strlen(s1);
> + while (l1 >= l2) {
> + l1--;
> + if (!prom_memcmp(s1, s2, l2))
> + return (char *)s1;
> + s1++;
> + }
> + return NULL;
> +}
> +
> +static size_t __init prom_strlcpy(char *dest, const char *src, size_t size)
> +{
> + size_t ret = prom_strlen(src);
> +
> + if (size) {
> + size_t len = (ret >= size) ? size - 1 : ret;
> + memcpy(dest, src, len);
> + dest[len] = '\0';
> + }
> + return ret;
> +}
> +
> +#ifdef CONFIG_PPC_PSERIES
> +static int __init prom_strtobool(const char *s, bool *res)
> +{
> + if (!s)
> + return -EINVAL;
> +
> + switch (s[0]) {
> + case 'y':
> + case 'Y':
> + case '1':
> + *res = true;
> + return 0;
> + case 'n':
> + case 'N':
> + case '0':
> + *res = false;
> +

[PATCH RFC v3 18/18] powerpc: KASAN for 64bit Book3E

2019-03-12 Thread Christophe Leroy
From: Daniel Axtens 

Wire up KASAN. Only outline instrumentation is supported.

The KASAN shadow area is mapped into vmemmap space:
0x8000 0400 0000 0000 to 0x8000 0600 0000 0000.
To do this we require that vmemmap be disabled. (This is the default
in the kernel config that QorIQ provides for the machine in their
SDK anyway - they use flat memory.)

Only the kernel linear mapping (0xc000...) is checked. The vmalloc and
ioremap areas (also in 0x800...) are all mapped to the zero page. As
with the Book3S hash series, this requires overriding the memory <->
shadow mapping.
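
The arithmetic behind those addresses, worked out from the KASAN_SHADOW_OFFSET
value in the diff below (a sanity check, not new code):

	/* shadow(addr) = (addr >> 3) + KASAN_SHADOW_OFFSET, with the offset
	 * 0x6800040000000000 chosen in this patch:
	 *
	 *   shadow(0xc000000000000000) = 0x1800000000000000 + offset
	 *                              = 0x8000040000000000
	 *
	 * and the shadow of the 2^48-byte kernel region is 2^45 bytes,
	 * ending at 0x8000060000000000, which matches the range above. */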

Also, as with both previous 64-bit series, early instrumentation is not
supported.  It would allow us to drop the check_return_arch_not_ready()
hook in the KASAN core, but it's tricky to get it set up early enough:
we need it set up before the first call to instrumented code like printk().
Perhaps in the future.

Only KASAN_MINIMAL works.

Tested on e6500. KVM, kexec and xmon have not been tested.

The test_kasan module fires warnings as expected, except for the
following tests:

 - Expected/by design:
kasan test: memcg_accounted_kmem_cache allocate memcg accounted object

 - Due to only supporting KASAN_MINIMAL:
kasan test: kasan_stack_oob out-of-bounds on stack
kasan test: kasan_global_oob out-of-bounds global variable
kasan test: kasan_alloca_oob_left out-of-bounds to left on alloca
kasan test: kasan_alloca_oob_right out-of-bounds to right on alloca
kasan test: use_after_scope_test use-after-scope on int
kasan test: use_after_scope_test use-after-scope on array

Thanks to those who have done the heavy lifting over the past several
years:
 - Christophe's 32 bit series: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-February/185379.html
 - Aneesh's Book3S hash series: https://lwn.net/Articles/655642/
 - Balbir's Book3S radix series: https://patchwork.ozlabs.org/patch/795211/

Cc: Christophe Leroy 
Cc: Aneesh Kumar K.V 
Cc: Balbir Singh 
Signed-off-by: Daniel Axtens 
[- Removed EXPORT_SYMBOL of the static key
 - Fixed most checkpatch problems
 - Replaced kasan_zero_page[] by kasan_early_shadow_page[]
 - Reduced casting mess by using intermediate locals
 - Fixed build failure on pmac32_defconfig]
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/Kconfig.debug   |  2 +-
 arch/powerpc/include/asm/kasan.h | 71 
 arch/powerpc/mm/Makefile |  2 +
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/kasan_init_book3e_64.c | 50 
 6 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_book3e_64.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d9364368329b..51ef9fac6c5d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -174,6 +174,7 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
+	select HAVE_ARCH_KASAN			if PPC_BOOK3E_64 && !SPARSEMEM_VMEMMAP
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 61febbbdd02b..fc1f5fa7554e 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -369,5 +369,5 @@ config PPC_FAST_ENDIAN_SWITCH
 
 config KASAN_SHADOW_OFFSET
hex
-   depends on KASAN
+   depends on KASAN && PPC32
	default 0xe0000000
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 296e51c2f066..ae410f0e060d 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -21,12 +21,15 @@
 #define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
 
+#ifdef CONFIG_PPC32
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
 #define KASAN_SHADOW_END   0UL
 
 #define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
 
+#endif /* CONFIG_PPC32 */
+
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
 void kasan_mmu_init(void);
@@ -36,5 +39,73 @@ static inline void kasan_init(void) { }
 static inline void kasan_mmu_init(void) { }
 #endif
 
+#ifdef CONFIG_PPC_BOOK3E_64
+#include 
+#include 
+
+/*
+ * We don't put this in Kconfig as we only support KASAN_MINIMAL, and
+ * that will be disabled if the symbol is available in Kconfig
+ */
+#define KASAN_SHADOW_OFFSET	ASM_CONST(0x6800040000000000)
+
+#define KASAN_SHADOW_SIZE  (KERN_VIRT_SIZE >> KASAN_SHADOW_SCALE_SHIFT)
+
+extern struct static_key_false powerpc_kasan_enabled_key;
+extern unsigned char kasan_early_shadow_page[];
+
+static inline bool kasan_arch_is_ready_book3e(void)
+{
+	if (static_branch_likely(&powerpc_kasan_enabled_key))
+		return true;
+	return false;
+}

[PATCH RFC v3 16/18] kasan: allow architectures to manage the memory-to-shadow mapping

2019-03-12 Thread Christophe Leroy
From: Daniel Axtens 

Currently, shadow addresses are always addr >> shift + offset.
However, for powerpc, the virtual address space is fragmented in
ways that make this simple scheme impractical.

Allow architectures to override:
 - kasan_shadow_to_mem
 - kasan_mem_to_shadow
 - addr_has_shadow

Rename addr_has_shadow to kasan_addr_has_shadow as if it is
overridden it will be available in more places, increasing the
risk of collisions.

If architectures do not #define their own versions, the generic
code will continue to run as usual.
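
The override hook is the usual header-macro idiom: an architecture #defines
the symbol to its own name before the generic header tests it. A minimal
sketch of an arch opting in (hypothetical arch header, not from this patch;
kasan_region_offset() is an invented helper):

	/* in the arch's asm/kasan.h, pulled in before <linux/kasan.h>:
	 * defining the macro suppresses the generic inline, which is now
	 * guarded by #ifndef kasan_mem_to_shadow */
	#define kasan_mem_to_shadow kasan_mem_to_shadow
	static inline void *kasan_mem_to_shadow(const void *addr)
	{
		unsigned long a = (unsigned long)addr;

		/* hypothetical fragmented layout: per-region offsets */
		return (void *)((a >> KASAN_SHADOW_SCALE_SHIFT)
				+ kasan_region_offset(a));
	}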

Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
Signed-off-by: Christophe Leroy 
---
 include/linux/kasan.h | 2 ++
 mm/kasan/generic.c| 2 +-
 mm/kasan/generic_report.c | 2 +-
 mm/kasan/kasan.h  | 6 +-
 mm/kasan/report.c | 6 +++---
 mm/kasan/tags.c   | 2 +-
 6 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b40ea104dd36..f6261840f94c 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -23,11 +23,13 @@ extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 int kasan_populate_early_shadow(const void *shadow_start,
const void *shadow_end);
 
+#ifndef kasan_mem_to_shadow
 static inline void *kasan_mem_to_shadow(const void *addr)
 {
return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
+ KASAN_SHADOW_OFFSET;
 }
+#endif
 
 /* Enable reporting bugs after kasan_disable_current() */
 extern void kasan_enable_current(void);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 9e5c989dab8c..a5b28e3ceacb 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -173,7 +173,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely(!addr_has_shadow((void *)addr))) {
+   if (unlikely(!kasan_addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/generic_report.c b/mm/kasan/generic_report.c
index 36c645939bc9..6caafd61fc3a 100644
--- a/mm/kasan/generic_report.c
+++ b/mm/kasan/generic_report.c
@@ -107,7 +107,7 @@ static const char *get_wild_bug_type(struct kasan_access_info *info)
 
 const char *get_bug_type(struct kasan_access_info *info)
 {
-   if (addr_has_shadow(info->access_addr))
+   if (kasan_addr_has_shadow(info->access_addr))
return get_shadow_bug_type(info);
return get_wild_bug_type(info);
 }
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 3e0c11f7d7a1..958e984d4544 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -110,16 +110,20 @@ struct kasan_alloc_meta *get_alloc_info(struct kmem_cache *cache,
 struct kasan_free_meta *get_free_info(struct kmem_cache *cache,
const void *object);
 
+#ifndef kasan_shadow_to_mem
 static inline const void *kasan_shadow_to_mem(const void *shadow_addr)
 {
return (void *)(((unsigned long)shadow_addr - KASAN_SHADOW_OFFSET)
<< KASAN_SHADOW_SCALE_SHIFT);
 }
+#endif
 
-static inline bool addr_has_shadow(const void *addr)
+#ifndef kasan_addr_has_shadow
+static inline bool kasan_addr_has_shadow(const void *addr)
 {
return (addr >= kasan_shadow_to_mem((void *)KASAN_SHADOW_START));
 }
+#endif
 
 void kasan_poison_shadow(const void *address, size_t size, u8 value);
 
diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index ca9418fe9232..bc3355ee2dd0 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -298,7 +298,7 @@ void kasan_report(unsigned long addr, size_t size,
untagged_addr = reset_tag(tagged_addr);
 
info.access_addr = tagged_addr;
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
info.first_bad_addr = find_first_bad_addr(tagged_addr, size);
else
info.first_bad_addr = untagged_addr;
@@ -309,11 +309,11 @@ void kasan_report(unsigned long addr, size_t size,
start_report();
 
print_error_description();
-   if (addr_has_shadow(untagged_addr))
+   if (kasan_addr_has_shadow(untagged_addr))
print_tags(get_tag(tagged_addr), info.first_bad_addr);
pr_err("\n");
 
-   if (addr_has_shadow(untagged_addr)) {
+   if (kasan_addr_has_shadow(untagged_addr)) {
print_address_description(untagged_addr);
pr_err("\n");
print_shadow_for_address(info.first_bad_addr);
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 87ebee0a6aea..661c23dd5340 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,7 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(!addr_has_shadow(untagged_addr))) {
+	if (unlikely(!kasan_addr_has_shadow(untagged_addr))) {

[PATCH RFC v3 17/18] kasan: allow architectures to provide an outline readiness check

2019-03-12 Thread Christophe Leroy
From: Daniel Axtens 

In powerpc (as I understand it), we spend a lot of time in boot
running in real mode before MMU paging is initialised. During
this time we call a lot of generic code, including printk(). If
we try to access the shadow region during this time, things fail.

My attempts to move early init before the first printk have not
been successful. (Both previous RFCs for ppc64 - by 2 different
people - have needed this trick too!)

So, allow architectures to define a kasan_arch_is_ready()
hook that bails out of check_memory_region_inline() unless the
arch has done all of the init.
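
A sketch of what the arch-side hook can look like, modelled on the Book3E
patch elsewhere in this series (illustrative, not the exact code):

	extern struct static_key_false powerpc_kasan_enabled_key;

	#define kasan_arch_is_ready kasan_arch_is_ready
	static inline bool kasan_arch_is_ready(void)
	{
		/* flipped on once the shadow mappings are installed */
		return static_branch_likely(&powerpc_kasan_enabled_key);
	}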

Link: https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
Link: https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
Originally-by: Balbir Singh 
Cc: Aneesh Kumar K.V 
Signed-off-by: Daniel Axtens 
[check_return_arch_not_ready() ==> static inline kasan_arch_is_ready()]
Signed-off-by: Christophe Leroy 
---
 include/linux/kasan.h | 4 
 mm/kasan/generic.c| 3 +++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index f6261840f94c..a630d53f1a36 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,6 +14,10 @@ struct task_struct;
 #include 
 #include 
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index a5b28e3ceacb..0336f31bbae3 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -170,6 +170,9 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return;
+
if (unlikely(size == 0))
return;
 
-- 
2.13.3



[PATCH RFC v3 15/18] kasan: do not open-code addr_has_shadow

2019-03-12 Thread Christophe Leroy
From: Daniel Axtens 

We have a couple of places checking for the existence of a shadow
mapping for an address by open-coding the inverse of the check in
addr_has_shadow.

Replace the open-coded versions with the helper. This will be
needed in future to allow architectures to override the layout
of the shadow mapping.

Reviewed-by: Andrew Donnellan 
Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
Signed-off-by: Christophe Leroy 
---
 mm/kasan/generic.c | 3 +--
 mm/kasan/tags.c| 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 504c79363a34..9e5c989dab8c 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -173,8 +173,7 @@ static __always_inline void check_memory_region_inline(unsigned long addr,
if (unlikely(size == 0))
return;
 
-   if (unlikely((void *)addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow((void *)addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index 63fca3172659..87ebee0a6aea 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -109,8 +109,7 @@ void check_memory_region(unsigned long addr, size_t size, bool write,
return;
 
untagged_addr = reset_tag((const void *)addr);
-   if (unlikely(untagged_addr <
-   kasan_shadow_to_mem((void *)KASAN_SHADOW_START))) {
+   if (unlikely(!addr_has_shadow(untagged_addr))) {
kasan_report(addr, size, write, ret_ip);
return;
}
-- 
2.13.3



[PATCH v10 14/18] powerpc/32s: map kasan zero shadow with PAGE_READONLY instead of PAGE_KERNEL_RO

2019-03-12 Thread Christophe Leroy
For hash32, the zero shadow page gets mapped with PAGE_READONLY instead
of PAGE_KERNEL_RO, because the PP bits don't provide a RO kernel, so
PAGE_KERNEL_RO is equivalent to PAGE_KERNEL. By using PAGE_READONLY,
the page is RO for both kernel and user, but this is not a security issue
as it contains only zeroes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index ba8361487075..0d62be3cba47 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -39,7 +39,10 @@ static int kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_
 
if (!new)
return -ENOMEM;
-   kasan_populate_pte(new, PAGE_KERNEL_RO);
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   kasan_populate_pte(new, PAGE_READONLY);
+   else
+   kasan_populate_pte(new, PAGE_KERNEL_RO);
	pmd_populate_kernel(&init_mm, pmd, new);
}
return 0;
@@ -84,7 +87,10 @@ static int __ref kasan_init_region(void *start, size_t size)
 
 static void __init kasan_remap_early_shadow_ro(void)
 {
-   kasan_populate_pte(kasan_early_shadow_pte, PAGE_KERNEL_RO);
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   kasan_populate_pte(kasan_early_shadow_pte, PAGE_READONLY);
+   else
+   kasan_populate_pte(kasan_early_shadow_pte, PAGE_KERNEL_RO);
 
flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END);
 }
-- 
2.13.3



[PATCH v10 13/18] powerpc/32s: set up an early static hash table for KASAN.

2019-03-12 Thread Christophe Leroy
KASAN requires early activation of hash table, before memblock()
functions are available.

This patch implements an early hash_table statically defined in
__initdata.

During early boot, a single page table is used.

For hash32, when doing the final init, one page table is allocated
for each PGD entry because of the _PAGE_HASHPTE flag which can't be
common to several virt pages. This is done after memblock get
available but before switching to the final hash table, otherwise
there are issues with TLB flushing due to the shared entries.
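
For reference, the "ori r6, r6, 3" in the early_hash_table code below encodes
the table size in SDR1; a hedged note on the encoding:

	/* On 32-bit hash MMUs, SDR1 = HTABORG | HTABMASK: the table size is
	 * 64kB << n with HTABMASK = 2^n - 1, so the 256kB early table
	 * (64kB << 2) takes HTABMASK = 3. */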

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S | 40 ++-
 arch/powerpc/mm/kasan/kasan_init_32.c | 23 +++-
 arch/powerpc/mm/mmu_decl.h|  1 +
 3 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 0bfaf64e67ee..fd7c394bc77c 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -160,6 +160,10 @@ __after_mmu_off:
bl  flush_tlbs
 
bl  initial_bats
+   bl  load_segment_registers
+#ifdef CONFIG_KASAN
+   bl  early_hash_table
+#endif
 #if defined(CONFIG_BOOTX_TEXT)
bl  setup_disp_bat
 #endif
@@ -205,7 +209,7 @@ __after_mmu_off:
  */
 turn_on_mmu:
mfmsr   r0
-   ori r0,r0,MSR_DR|MSR_IR
+   ori r0,r0,MSR_DR|MSR_IR|MSR_RI
mtspr   SPRN_SRR1,r0
lis r0,start_here@h
ori r0,r0,start_here@l
@@ -884,11 +888,24 @@ _ENTRY(__restore_cpu_setup)
blr
 #endif /* !defined(CONFIG_PPC_BOOK3S_32) */
 
-
 /*
  * Load stuff into the MMU.  Intended to be called with
  * IR=0 and DR=0.
  */
+#ifdef CONFIG_KASAN
+early_hash_table:
+   sync/* Force all PTE updates to finish */
+   isync
+   tlbia   /* Clear all TLB entries */
+   sync/* wait for tlbia/tlbie to finish */
+   TLBSYNC /* ... on all CPUs */
+   /* Load the SDR1 register (hash table base & size) */
+   lis r6, early_hash - PAGE_OFFSET@h
+   ori r6, r6, 3   /* 256kB table */
+   mtspr   SPRN_SDR1, r6
+   blr
+#endif
+
 load_up_mmu:
sync/* Force all PTE updates to finish */
isync
@@ -900,14 +917,6 @@ load_up_mmu:
tophys(r6,r6)
lwz r6,_SDR1@l(r6)
mtspr   SPRN_SDR1,r6
-   li  r0,16   /* load up segment register values */
-   mtctr   r0  /* for context 0 */
-   lis r3,0x2000   /* Ku = 1, VSID = 0 */
-   li  r4,0
-3: mtsrin  r3,r4
-   addir3,r3,0x111 /* increment VSID */
-   addis   r4,r4,0x1000/* address of next segment */
-   bdnz3b
 
 /* Load the BAT registers with the values set up by MMU_init.
MMU_init takes care of whether we're on a 601 or not. */
@@ -929,6 +938,17 @@ BEGIN_MMU_FTR_SECTION
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
blr
 
+load_segment_registers:
+   li  r0, 16  /* load up segment register values */
+   mtctr   r0  /* for context 0 */
+   lis r3, 0x2000  /* Ku = 1, VSID = 0 */
+   li  r4, 0
+3: mtsrin  r3, r4
+   addir3, r3, 0x111   /* increment VSID */
+   addis   r4, r4, 0x1000  /* address of next segment */
+   bdnz3b
+   blr
+
 /*
  * This is where the main kernel code starts.
  */
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 42617fcad828..ba8361487075 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -94,6 +94,13 @@ void __init kasan_mmu_init(void)
int ret;
struct memblock_region *reg;
 
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
+		ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+   if (ret)
+   panic("kasan: kasan_init_shadow_page_tables() failed");
+   }
+
for_each_memblock(memory, reg) {
phys_addr_t base = reg->base;
phys_addr_t top = min(base + reg->size, total_lowmem);
@@ -135,6 +142,20 @@ void *module_alloc(unsigned long size)
 }
 #endif
 
+#ifdef CONFIG_PPC_BOOK3S_32
+u8 __initdata early_hash[256 << 10] __aligned(256 << 10) = {0};
+
+static void __init kasan_early_hash_table(void)
+{
+	modify_instruction_site(&patch__hash_page_A0, 0xffff, __pa(early_hash) >> 16);
+	modify_instruction_site(&patch__flush_hash_A0, 0xffff, __pa(early_hash) >> 16);
+
+   Hash = (struct hash_pte *)early_hash;
+}
+#else
+static void __init kasan_early_hash_table(void) {}
+#endif
+
 void __init kasan_early_init(void)
 {
unsigned long addr = KASAN_SHADOW_START;
@@ -152,5 +173,5 @@ void __init kasan_early_init(void)
} while (pmd++, addr = next, addr != end);
 
if 

[PATCH v10 12/18] powerpc/32s: move hash code patching out of MMU_init_hw()

2019-03-12 Thread Christophe Leroy
For KASAN, hash table handling will be activated early for
accessing to KASAN shadow areas.

In order to avoid any modification of the hash functions while
they are still used with the early hash table, the code patching
is moved out of MMU_init_hw() and put close to the big-bang switch
to the final hash table.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_32.S |  3 +++
 arch/powerpc/mm/mmu_decl.h|  1 +
 arch/powerpc/mm/ppc_mmu_32.c  | 36 ++--
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 3ee42c0ada69..0bfaf64e67ee 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -966,6 +966,9 @@ start_here:
bl  machine_init
bl  __save_cpu_setup
bl  MMU_init
+BEGIN_MMU_FTR_SECTION
+   bl  MMU_init_hw_patch
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_HPTE_TABLE)
 
 /*
  * Go back to running unmapped so we can load up new values
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 74ff61dabcb1..d726ff776054 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -130,6 +130,7 @@ extern void wii_memory_fixups(void);
  */
 #ifdef CONFIG_PPC32
 extern void MMU_init_hw(void);
+void MMU_init_hw_patch(void);
 unsigned long mmu_mapin_ram(unsigned long base, unsigned long top);
 #endif
 
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 2d5b0d50fb31..38c0e28c21e1 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -39,6 +39,7 @@
 struct hash_pte *Hash, *Hash_end;
 unsigned long Hash_size, Hash_mask;
 unsigned long _SDR1;
+static unsigned int hash_mb, hash_mb2;
 
 struct ppc_bat BATS[8][2]; /* 8 pairs of IBAT, DBAT */
 
@@ -308,7 +309,6 @@ void hash_preload(struct mm_struct *mm, unsigned long ea,
  */
 void __init MMU_init_hw(void)
 {
-   unsigned int hmask, mb, mb2;
unsigned int n_hpteg, lg_n_hpteg;
 
if (!mmu_has_feature(MMU_FTR_HPTE_TABLE))
@@ -349,20 +349,30 @@ void __init MMU_init_hw(void)
   (unsigned long long)(total_memory >> 20), Hash_size >> 10, Hash);
 
 
-   /*
-* Patch up the instructions in hashtable.S:create_hpte
-*/
-   if ( ppc_md.progress ) ppc_md.progress("hash:patch", 0x345);
Hash_mask = n_hpteg - 1;
-   hmask = Hash_mask >> (16 - LG_HPTEG_SIZE);
-   mb2 = mb = 32 - LG_HPTEG_SIZE - lg_n_hpteg;
+   hash_mb2 = hash_mb = 32 - LG_HPTEG_SIZE - lg_n_hpteg;
if (lg_n_hpteg > 16)
-   mb2 = 16 - LG_HPTEG_SIZE;
+   hash_mb2 = 16 - LG_HPTEG_SIZE;
+}
+
+void __init MMU_init_hw_patch(void)
+{
+   unsigned int hmask = Hash_mask >> (16 - LG_HPTEG_SIZE);
 
+   if (ppc_md.progress)
+   ppc_md.progress("hash:patch", 0x345);
+   if (ppc_md.progress)
+   ppc_md.progress("hash:done", 0x205);
+
+	/* WARNING: Make sure nothing can trigger a KASAN check past this point */
+
+   /*
+* Patch up the instructions in hashtable.S:create_hpte
+*/
	modify_instruction_site(&patch__hash_page_A0, 0xffff,
				((unsigned int)Hash - PAGE_OFFSET) >> 16);
-	modify_instruction_site(&patch__hash_page_A1, 0x7c0, mb << 6);
-	modify_instruction_site(&patch__hash_page_A2, 0x7c0, mb2 << 6);
+	modify_instruction_site(&patch__hash_page_A1, 0x7c0, hash_mb << 6);
+	modify_instruction_site(&patch__hash_page_A2, 0x7c0, hash_mb2 << 6);
	modify_instruction_site(&patch__hash_page_B, 0xffff, hmask);
	modify_instruction_site(&patch__hash_page_C, 0xffff, hmask);
 
@@ -371,11 +381,9 @@ void __init MMU_init_hw(void)
 */
	modify_instruction_site(&patch__flush_hash_A0, 0xffff,
				((unsigned int)Hash - PAGE_OFFSET) >> 16);
-	modify_instruction_site(&patch__flush_hash_A1, 0x7c0, mb << 6);
-	modify_instruction_site(&patch__flush_hash_A2, 0x7c0, mb2 << 6);
+	modify_instruction_site(&patch__flush_hash_A1, 0x7c0, hash_mb << 6);
+	modify_instruction_site(&patch__flush_hash_A2, 0x7c0, hash_mb2 << 6);
	modify_instruction_site(&patch__flush_hash_B, 0xffff, hmask);
-
-   if ( ppc_md.progress ) ppc_md.progress("hash:done", 0x205);
 }
 
 void setup_initial_memory_limit(phys_addr_t first_memblock_base,
-- 
2.13.3



[PATCH v10 11/18] powerpc/32: Add KASAN support

2019-03-12 Thread Christophe Leroy
This patch adds KASAN support for PPC32. The following patch
will add an early activation of the hash table for book3s. Until
then, a warning will be raised if trying to use KASAN on a
hash 6xx.

To support KASAN, this patch initialises the MMU mappings for
accessing the KASAN shadow area defined in a previous patch.

An early mapping is set up as soon as the kernel code has been
relocated to its final location.

Then the definitive mapping is set once paging is initialised.

For modules, the shadow area is allocated at module_alloc().
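
The resulting bring-up order, summarised from this patch (orientation only):

	/*
	 *   head_*.S:     bl kasan_early_init   -- static early shadow, pre-MMU
	 *   MMU_init():   kasan_mmu_init()      -- real shadow pages via memblock
	 *   setup_arch(): kasan_init()          -- final setup, KASAN fully live
	 */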

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/kasan.h  |   9 ++
 arch/powerpc/kernel/head_32.S |   3 +
 arch/powerpc/kernel/head_40x.S|   3 +
 arch/powerpc/kernel/head_44x.S|   3 +
 arch/powerpc/kernel/head_8xx.S|   3 +
 arch/powerpc/kernel/head_fsl_booke.S  |   3 +
 arch/powerpc/kernel/setup-common.c|   3 +
 arch/powerpc/mm/Makefile  |   1 +
 arch/powerpc/mm/init_32.c |   3 +
 arch/powerpc/mm/kasan/Makefile|   5 ++
 arch/powerpc/mm/kasan/kasan_init_32.c | 156 ++
 12 files changed, 193 insertions(+)
 create mode 100644 arch/powerpc/mm/kasan/Makefile
 create mode 100644 arch/powerpc/mm/kasan/kasan_init_32.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index facaa6ba0d2a..d9364368329b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -173,6 +173,7 @@ config PPC
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
+   select HAVE_ARCH_KASAN  if PPC32
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 05274dea3109..296e51c2f066 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -27,5 +27,14 @@
 
 #define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
 
+#ifdef CONFIG_KASAN
+void kasan_early_init(void);
+void kasan_mmu_init(void);
+void kasan_init(void);
+#else
+static inline void kasan_init(void) { }
+static inline void kasan_mmu_init(void) { }
+#endif
+
 #endif /* __ASSEMBLY */
 #endif
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 48051c8977c5..3ee42c0ada69 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -958,6 +958,9 @@ start_here:
  * Do early platform-specific initialization,
  * and set up the MMU.
  */
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
li  r3,0
mr  r4,r31
bl  machine_init
diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index a9c934f2319b..efa219d2136e 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -848,6 +848,9 @@ start_here:
 /*
  * Decide what sort of machine this is and initialize the MMU.
  */
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
li  r3,0
mr  r4,r31
bl  machine_init
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index 37117ab11584..34a5df827b38 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -203,6 +203,9 @@ _ENTRY(_start);
 /*
  * Decide what sort of machine this is and initialize the MMU.
  */
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
li  r3,0
mr  r4,r31
bl  machine_init
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 03c73b4c6435..d25adb6ef235 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -853,6 +853,9 @@ start_here:
 /*
  * Decide what sort of machine this is and initialize the MMU.
  */
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
li  r3,0
mr  r4,r31
bl  machine_init
diff --git a/arch/powerpc/kernel/head_fsl_booke.S 
b/arch/powerpc/kernel/head_fsl_booke.S
index 1881127682e9..0fc38eb957b7 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -275,6 +275,9 @@ set_ivor:
 /*
  * Decide what sort of machine this is and initialize the MMU.
  */
+#ifdef CONFIG_KASAN
+   bl  kasan_early_init
+#endif
mr  r3,r30
mr  r4,r31
bl  machine_init
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index e7534f306c8e..3c6c5a43901e 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include <asm/kasan.h>
 
 #include "setup.h"
 
@@ -865,6 +866,8 @@ static void smp_setup_pacas(void)
  */
 void __init setup_arch(char **cmdline_p)
 {
+   kasan_init();
+
*cmdline_p = boot_command_line;
 
/* Set a 

[PATCH v10 10/18] powerpc: disable KASAN instrumentation on early/critical files.

2019-03-12 Thread Christophe Leroy
All files containing functions run before kasan_early_init() is called
must have KASAN instrumentation disabled.

For those file, branch profiling also have to be disabled otherwise
each if () generates a call to ftrace_likely_update().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile | 12 
 arch/powerpc/lib/Makefile|  8 
 arch/powerpc/mm/Makefile |  6 ++
 arch/powerpc/platforms/powermac/Makefile |  6 ++
 arch/powerpc/purgatory/Makefile  |  3 +++
 arch/powerpc/xmon/Makefile   |  1 +
 6 files changed, 36 insertions(+)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 45e47752b692..0ea6c4aa3a20 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -31,6 +31,18 @@ CFLAGS_REMOVE_btext.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_prom.o = $(CC_FLAGS_FTRACE)
 endif
 
+KASAN_SANITIZE_early_32.o := n
+KASAN_SANITIZE_cputable.o := n
+KASAN_SANITIZE_prom_init.o := n
+KASAN_SANITIZE_btext.o := n
+
+ifdef CONFIG_KASAN
+CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_cputable.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_prom_init.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
+endif
+
 obj-y  := cputable.o ptrace.o syscalls.o \
   irq.o align.o signal_32.o pmc.o vdso.o \
   process.o systbl.o idle.o \
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 47a4de434c22..c55f9c27bf79 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -8,6 +8,14 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_feature-fixups.o = $(CC_FLAGS_FTRACE)
 
+KASAN_SANITIZE_code-patching.o := n
+KASAN_SANITIZE_feature-fixups.o := n
+
+ifdef CONFIG_KASAN
+CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
+CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING
+endif
+
 obj-y += alloc.o code-patching.o feature-fixups.o
 
 ifndef CONFIG_KASAN
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index d52ec118e09d..240d73dce6bb 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -7,6 +7,12 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
 CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
 
+KASAN_SANITIZE_ppc_mmu_32.o := n
+
+ifdef CONFIG_KASAN
+CFLAGS_ppc_mmu_32.o+= -DDISABLE_BRANCH_PROFILING
+endif
+
 obj-y  := fault.o mem.o pgtable.o mmap.o \
   init_$(BITS).o pgtable_$(BITS).o \
   init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/platforms/powermac/Makefile 
b/arch/powerpc/platforms/powermac/Makefile
index 20ebf35d7913..f4247ade71ca 100644
--- a/arch/powerpc/platforms/powermac/Makefile
+++ b/arch/powerpc/platforms/powermac/Makefile
@@ -2,6 +2,12 @@
 CFLAGS_bootx_init.o+= -fPIC
 CFLAGS_bootx_init.o+= $(call cc-option, -fno-stack-protector)
 
+KASAN_SANITIZE_bootx_init.o := n
+
+ifdef CONFIG_KASAN
+CFLAGS_bootx_init.o+= -DDISABLE_BRANCH_PROFILING
+endif
+
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace early boot code
 CFLAGS_REMOVE_bootx_init.o = $(CC_FLAGS_FTRACE)
diff --git a/arch/powerpc/purgatory/Makefile b/arch/powerpc/purgatory/Makefile
index 4314ba5baf43..7c6d8b14f440 100644
--- a/arch/powerpc/purgatory/Makefile
+++ b/arch/powerpc/purgatory/Makefile
@@ -1,4 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
+
+KASAN_SANITIZE := n
+
 targets += trampoline.o purgatory.ro kexec-purgatory.c
 
 LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index 3050f9323254..f142570ad860 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -7,6 +7,7 @@ subdir-ccflags-y := $(call cc-disable-warning, builtin-requires-header)
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
 UBSAN_SANITIZE := n
+KASAN_SANITIZE := n
 
 # Disable ftrace for the entire directory
 ORIG_CFLAGS := $(KBUILD_CFLAGS)
-- 
2.13.3



[PATCH v10 08/18] powerpc/32: make KVIRT_TOP dependent on FIXMAP_START

2019-03-12 Thread Christophe Leroy
When we add the KASAN shadow area, KVIRT_TOP can no longer be fixed
at 0xfe000000.

This patch uses FIXADDR_START to define KVIRT_TOP.
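
Together with the next patch (09/18), the intended top-of-address-space
layout becomes (sketch, 32-bit without HIGHMEM):

	/*
	 *   0xffffffff   KASAN_SHADOW_END (wraps to 0)
	 *   ...          KASAN shadow area
	 *   KASAN_SHADOW_START, with FIXADDR_TOP one page below it
	 *   ...          fixmap
	 *   FIXADDR_START == KVIRT_TOP (this patch)
	 *   ...          vmalloc/ioremap space
	 */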

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 13 ++---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 13 ++---
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index aa8406b8f7ba..838de59f6754 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -134,15 +134,24 @@ static inline bool pte_user(pte_t pte)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
 #define USER_PTRS_PER_PGD  (TASK_SIZE / PGDIR_SIZE)
+
+#ifndef __ASSEMBLY__
+
+int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
+
+#endif /* !__ASSEMBLY__ */
+
 /*
  * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
  * value (for now) on others, from where we can start layout kernel
  * virtual space that goes below PKMAP and FIXMAP
  */
+#include <asm/fixmap.h>
+
 #ifdef CONFIG_HIGHMEM
 #define KVIRT_TOP  PKMAP_BASE
 #else
-#define KVIRT_TOP	(0xfe000000UL)	/* for now, could be FIXMAP_BASE ? */
+#define KVIRT_TOP  FIXADDR_START
 #endif
 
 /*
@@ -373,8 +382,6 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val << 3 })
 
-int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
-
 /* Generic accessors to PTE bits */
 static inline int pte_write(pte_t pte) { return !!(pte_val(pte) & 
_PAGE_RW);}
 static inline int pte_read(pte_t pte)  { return 1; }
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index bed433358260..0284f8f5305f 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -64,15 +64,24 @@ extern int icache_44x_need_flush;
 #define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
+#ifndef __ASSEMBLY__
+
+int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
+
+#endif /* !__ASSEMBLY__ */
+
+
 /*
  * This is the bottom of the PKMAP area with HIGHMEM or an arbitrary
  * value (for now) on others, from where we can start layout kernel
  * virtual space that goes below PKMAP and FIXMAP
  */
+#include <asm/fixmap.h>
+
 #ifdef CONFIG_HIGHMEM
 #define KVIRT_TOP  PKMAP_BASE
 #else
-#define KVIRT_TOP	(0xfe000000UL)	/* for now, could be FIXMAP_BASE ? */
+#define KVIRT_TOP  FIXADDR_START
 #endif
 
 /*
@@ -379,8 +388,6 @@ static inline int pte_young(pte_t pte)
 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 3 })
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val << 3 })
 
-int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
-
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_POWERPC_NOHASH_32_PGTABLE_H */
-- 
2.13.3



[PATCH v10 09/18] powerpc/32: prepare shadow area for KASAN

2019-03-12 Thread Christophe Leroy
This patch prepares a shadow area for KASAN.

The shadow area will be at the top of the kernel virtual
memory space above the fixmap area and will occupy one
eighth of the total kernel virtual memory space.
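
To make the arithmetic concrete (assuming the usual 32-bit PAGE_OFFSET of
0xc0000000 and the Kconfig default added below):

	/* KASAN_SHADOW_OFFSET = 0xe0000000, KASAN_SHADOW_SCALE_SHIFT = 3:
	 *
	 *   KASAN_SHADOW_START = 0xe0000000 + (0xc0000000 >> 3) = 0xf8000000
	 *   KASAN_SHADOW_END   = 0UL (top of the address space, wrapping)
	 *
	 * 0xf8000000..0xffffffff is 128MB, i.e. one eighth of the 1GB
	 * kernel space at 0xc0000000..0xffffffff. */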

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig.debug|  5 +
 arch/powerpc/include/asm/fixmap.h |  5 +
 arch/powerpc/include/asm/kasan.h  | 16 
 arch/powerpc/mm/mem.c |  4 
 arch/powerpc/mm/ptdump/ptdump.c   |  8 
 5 files changed, 38 insertions(+)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 4e00cb0a5464..61febbbdd02b 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -366,3 +366,8 @@ config PPC_FAST_ENDIAN_SWITCH
 depends on DEBUG_KERNEL && PPC_BOOK3S_64
 help
  If you're unsure what this is, say N.
+
+config KASAN_SHADOW_OFFSET
+   hex
+   depends on KASAN
+	default 0xe0000000
diff --git a/arch/powerpc/include/asm/fixmap.h 
b/arch/powerpc/include/asm/fixmap.h
index b9fbed84ddca..0cfc365d814b 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -22,7 +22,12 @@
 #include 
 #endif
 
+#ifdef CONFIG_KASAN
+#include <asm/kasan.h>
+#define FIXADDR_TOP	(KASAN_SHADOW_START - PAGE_SIZE)
+#else
 #define FIXADDR_TOP	((unsigned long)(-PAGE_SIZE))
+#endif
 
 /*
  * Here we define all the compile-time 'special' virtual
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 2c179a39d4ba..05274dea3109 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -12,4 +12,20 @@
 #define EXPORT_SYMBOL_KASAN(fn)
 #endif
 
+#ifndef __ASSEMBLY__
+
+#include <asm/page.h>
+
+#define KASAN_SHADOW_SCALE_SHIFT   3
+
+#define KASAN_SHADOW_START (KASAN_SHADOW_OFFSET + \
+(PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
+
+#define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
+
+#define KASAN_SHADOW_END   0UL
+
+#define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
+
+#endif /* __ASSEMBLY */
 #endif
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index f6787f90e158..4e7fa4eb2dd3 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -309,6 +309,10 @@ void __init mem_init(void)
mem_init_print_info(NULL);
 #ifdef CONFIG_PPC32
pr_info("Kernel virtual memory layout:\n");
+#ifdef CONFIG_KASAN
+   pr_info("  * 0x%08lx..0x%08lx  : kasan shadow mem\n",
+   KASAN_SHADOW_START, KASAN_SHADOW_END);
+#endif
pr_info("  * 0x%08lx..0x%08lx  : fixmap\n", FIXADDR_START, FIXADDR_TOP);
 #ifdef CONFIG_HIGHMEM
pr_info("  * 0x%08lx..0x%08lx  : highmem PTEs\n",
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 37138428ab55..812ed680024f 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -101,6 +101,10 @@ static struct addr_marker address_markers[] = {
{ 0,"Fixmap start" },
{ 0,"Fixmap end" },
 #endif
+#ifdef CONFIG_KASAN
+   { 0,"kasan shadow mem start" },
+   { 0,"kasan shadow mem end" },
+#endif
{ -1,   NULL },
 };
 
@@ -322,6 +326,10 @@ static void populate_markers(void)
 #endif
address_markers[i++].start_address = FIXADDR_START;
address_markers[i++].start_address = FIXADDR_TOP;
+#ifdef CONFIG_KASAN
+   address_markers[i++].start_address = KASAN_SHADOW_START;
+   address_markers[i++].start_address = KASAN_SHADOW_END;
+#endif
 #endif /* CONFIG_PPC64 */
 }
 
-- 
2.13.3



[PATCH v10 05/18] powerpc/prom_init: don't use string functions from lib/

2019-03-12 Thread Christophe Leroy
When KASAN is active, the string functions in lib/ are doing the
KASAN checks. This is too early for prom_init.

This patch implements dedicated string functions for prom_init,
which will be compiled in with KASAN disabled.

Size of prom_init before the patch:
   text    data     bss     dec     hex filename
  12060     488    6960   19508    4c34 arch/powerpc/kernel/prom_init.o

Size of prom_init after the patch:
   text    data     bss     dec     hex filename
  12460     488    6960   19908    4dc4 arch/powerpc/kernel/prom_init.o

This increases the size of prom_init a bit, but as prom_init is
in __init section, it is freed after boot anyway.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/prom_init.c| 211 ++---
 arch/powerpc/kernel/prom_init_check.sh |   2 +-
 2 files changed, 171 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ecf083c46bdb..7017156168e8 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -224,6 +224,135 @@ static bool  __prombss rtas_has_query_cpu_stopped;
 #define PHANDLE_VALID(p)   ((p) != 0 && (p) != PROM_ERROR)
 #define IHANDLE_VALID(i)   ((i) != 0 && (i) != PROM_ERROR)
 
+/* Copied from lib/string.c and lib/kstrtox.c */
+
+static int __init prom_strcmp(const char *cs, const char *ct)
+{
+   unsigned char c1, c2;
+
+   while (1) {
+   c1 = *cs++;
+   c2 = *ct++;
+   if (c1 != c2)
+   return c1 < c2 ? -1 : 1;
+   if (!c1)
+   break;
+   }
+   return 0;
+}
+
+static char __init *prom_strcpy(char *dest, const char *src)
+{
+   char *tmp = dest;
+
+   while ((*dest++ = *src++) != '\0')
+   /* nothing */;
+   return tmp;
+}
+
+static int __init prom_strncmp(const char *cs, const char *ct, size_t count)
+{
+   unsigned char c1, c2;
+
+   while (count) {
+   c1 = *cs++;
+   c2 = *ct++;
+   if (c1 != c2)
+   return c1 < c2 ? -1 : 1;
+   if (!c1)
+   break;
+   count--;
+   }
+   return 0;
+}
+
+static size_t __init prom_strlen(const char *s)
+{
+   const char *sc;
+
+   for (sc = s; *sc != '\0'; ++sc)
+   /* nothing */;
+   return sc - s;
+}
+
+static int __init prom_memcmp(const void *cs, const void *ct, size_t count)
+{
+   const unsigned char *su1, *su2;
+   int res = 0;
+
+   for (su1 = cs, su2 = ct; 0 < count; ++su1, ++su2, count--)
+   if ((res = *su1 - *su2) != 0)
+   break;
+   return res;
+}
+
+static char __init *prom_strstr(const char *s1, const char *s2)
+{
+   size_t l1, l2;
+
+   l2 = prom_strlen(s2);
+   if (!l2)
+   return (char *)s1;
+   l1 = prom_strlen(s1);
+   while (l1 >= l2) {
+   l1--;
+   if (!prom_memcmp(s1, s2, l2))
+   return (char *)s1;
+   s1++;
+   }
+   return NULL;
+}
+
+static size_t __init prom_strlcpy(char *dest, const char *src, size_t size)
+{
+   size_t ret = prom_strlen(src);
+
+   if (size) {
+   size_t len = (ret >= size) ? size - 1 : ret;
+   memcpy(dest, src, len);
+   dest[len] = '\0';
+   }
+   return ret;
+}
+
+#ifdef CONFIG_PPC_PSERIES
+static int __init prom_strtobool(const char *s, bool *res)
+{
+   if (!s)
+   return -EINVAL;
+
+   switch (s[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   *res = true;
+   return 0;
+   case 'n':
+   case 'N':
+   case '0':
+   *res = false;
+   return 0;
+   case 'o':
+   case 'O':
+   switch (s[1]) {
+   case 'n':
+   case 'N':
+   *res = true;
+   return 0;
+   case 'f':
+   case 'F':
+   *res = false;
+   return 0;
+   default:
+   break;
+   }
+   default:
+   break;
+   }
+
+   return -EINVAL;
+}
+#endif
 
 /* This is the one and *ONLY* place where we actually call open
  * firmware.
@@ -555,7 +684,7 @@ static int __init prom_setprop(phandle node, const char *nodename,
	add_string(&p, tohex((u32)(unsigned long) value));
	add_string(&p, tohex(valuelen));
	add_string(&p, tohex(ADDR(pname)));
-	add_string(&p, tohex(strlen(pname)));
+	add_string(&p, tohex(prom_strlen(pname)));
	add_string(&p, "property");
*p = 0;
return call_prom("interpret", 1, 1, (u32)(unsigned long) cmd);
@@ -638,23 +767,23 @@ static void __init early_cmdline_parse(void)
if ((long)prom.chosen > 0)
	l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);

[PATCH v10 07/18] powerpc/32: use memset() instead of memset_io() to zero BSS

2019-03-12 Thread Christophe Leroy
Since commit 400c47d81ca38 ("powerpc32: memset: only use dcbz once cache is
enabled"), memset() can be used before activation of the cache,
so no need to use memset_io() for zeroing the BSS.

Acked-by: Dmitry Vyukov 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/early_32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/early_32.c b/arch/powerpc/kernel/early_32.c
index cf3cdd81dc47..3482118ffe76 100644
--- a/arch/powerpc/kernel/early_32.c
+++ b/arch/powerpc/kernel/early_32.c
@@ -21,8 +21,8 @@ notrace unsigned long __init early_init(unsigned long dt_ptr)
 {
unsigned long offset = reloc_offset();
 
-   /* First zero the BSS -- use memset_io, some platforms don't have 
caches on yet */
-   memset_io((void __iomem *)PTRRELOC(&__bss_start), 0, __bss_stop - 
__bss_start);
+   /* First zero the BSS */
+   memset(PTRRELOC(&__bss_start), 0, __bss_stop - __bss_start);
 
/*
 * Identify the CPU type and fix up code sections
-- 
2.13.3



[PATCH v10 06/18] powerpc/mm: don't use direct assignment during early boot.

2019-03-12 Thread Christophe Leroy
In kernel/cputable.c, explicitly use memcpy() instead of *y = *x.
This will allow GCC to replace it with __memcpy() when KASAN is
selected.

Acked-by: Dmitry Vyukov 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cputable.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 1eab54bc6ee9..cd12f362b61f 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2147,7 +2147,11 @@ void __init set_cur_cpu_spec(struct cpu_spec *s)
struct cpu_spec *t = &the_cpu_spec;
 
t = PTRRELOC(t);
-   *t = *s;
+   /*
+* use memcpy() instead of *t = *s so that GCC replaces it
+* by __memcpy() when KASAN is active
+*/
+   memcpy(t, s, sizeof(*t));
 
*PTRRELOC(&cur_cpu_spec) = &the_cpu_spec;
 }
@@ -2161,8 +2165,11 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned 
long offset,
t = PTRRELOC(t);
old = *t;
 
-   /* Copy everything, then do fixups */
-   *t = *s;
+   /*
+* Copy everything, then do fixups. Use memcpy() instead of *t = *s
+* so that GCC replaces it by __memcpy() when KASAN is active
+*/
+   memcpy(t, s, sizeof(*t));
 
/*
 * If we are overriding a previous value derived from the real
-- 
2.13.3
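
Background for the change, as a compilable sketch: a sufficiently large
struct assignment may be lowered by GCC to a call to memcpy(), which under
KASAN is the instrumented wrapper. The demo_ names below are stand-ins,
not kernel symbols:

#include <stdio.h>
#include <string.h>

/* stand-in for the kernel's uninstrumented __memcpy() */
static void *demo_uninstrumented_memcpy(void *dst, const void *src, size_t n)
{
        return memcpy(dst, src, n);
}

/* large enough that GCC may lower '*t = *s' to a memcpy() call */
struct demo_cpu_spec {
        char name[64];
        unsigned long features[8];
};

int main(void)
{
        struct demo_cpu_spec s = { .name = "603" }, t;

        /*
         * '*t = *s' can become a call to memcpy(); under KASAN that is
         * the checking wrapper, which must not run this early.  An
         * explicit call to the uninstrumented copy avoids the wrapper:
         */
        demo_uninstrumented_memcpy(&t, &s, sizeof(t));
        printf("%s\n", t.name);
        return 0;
}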



[PATCH v10 00/18] KASAN for powerpc/32 and RFC for 64bit Book3E

2019-03-12 Thread Christophe Leroy
This series adds KASAN support to powerpc/32

32 bits tested on nohash/32 (8xx), book3s/32 (mpc832x, i.e. 603) and qemu mac99
64bit Book3E tested by Daniel on e6500

Changes in v10:
- Prepended the patch which fixes boot on hash32
- Reduced ifdef mess related to CONFIG_CMDLINE in prom_init.c
- Fixed strings preparation macros for ppc64 build (Reported by Daniel)
- Fixed boot failure on hash32 when total amount of memory is above the initial 
amount mapped with BATs.
- Reordered stuff in kasan.h to have a smoother patch when adding 64bit Book3E
- Split the change to PAGE_READONLY out of the hash32 patch.
- Appended Daniel's series for 64bit Book3E (with a build failure fix and a few 
cosmetic changes)

Changes in v9:
- Fixed fixmap IMMR alignment issue on 8xx with KASAN enabled.
- Set up final shadow page tables before switching to the final hash table on 
hash32
- Using PAGE_READONLY instead of PAGE_KERNEL_RO on hash32
- Use flush_tlb_kernel_range() instead of flush_tlb_mm(), which doesn't work for
the kernel on some subarches.
- use __set_pte_at() instead of pte_update() to install final page tables

Changes in v8:
- Fixed circular issue between pgtable.h and fixmap.h
- Added missing includes in ppc64 string files
- Fixed kasan string related macro names for ppc64.
- Fixed most checkpatch messages
- build tested on kisskb 
(http://kisskb.ellerman.id.au/kisskb/head/6e65827de2fe71d21682dafd9084ed2cc6e06d4f/)
- moved CONFIG_KASAN_SHADOW_OFFSET in Kconfig.debug

Changes in v7:
- split in several smaller patches
- prom_init now has its own string functions
- full deactivation of powerpc-optimised string functions when KASAN is active
- shadow area now at a fixed place on very top of kernel virtual space.
- Early static hash table for hash book3s/32.
- Full support of both inline and outline instrumentation for both hash and 
nohash ppc32
- Earlier full activation of kasan.

Changes in v6:
- Fixed oops on module loading (due to access to RO shadow zero area).
- Added support for hash book3s/32, thanks to Daniel's patch to defer KASAN
activation.
- Reworked handling of optimised string functions (dedicated patch for it)
- Reordered some files to ease adding of book3e/64 support.

Changes in v5:
- Added KASAN_SHADOW_OFFSET in Makefile, otherwise we fallback to KASAN_MINIMAL
and some stuff like stack instrumentation is not performed
- Moved calls to kasan_early_init() in head.S because stack instrumentation
in machine_init was performed before the call to kasan_early_init()
- Mapping kasan_early_shadow_page RW in kasan_early_init() and
remaping RO later in kasan_init()
- Allocating a big memblock() for shadow area, falling back to PAGE_SIZE blocks 
in case of failure.

Changes in v4:
- Comments from Andrey (DISABLE_BRANCH_PROFILING, Activation of reports)
- Proper initialisation of shadow area in kasan_init()
- Panic in case Hash table is required.
- Added comments in patch one to explain why *t = *s becomes memcpy(t, s, ...)
- Call of kasan_init_tags()

Changes in v3:
- Removed the printk() in kasan_early_init() to avoid build failure (see 
https://github.com/linuxppc/issues/issues/218)
- Added necessary changes in asm/book3s/32/pgtable.h to get it work on powerpc 
603 family
- Added a few KASAN_SANITIZE_xxx.o := n to successfully boot on powerpc 603 
family

Changes in v2:
- Rebased.
- Using __set_pte_at() to build the early table.
- Worked around and got rid of the patch adding asm/page.h in 
asm/pgtable-types.h
==> might be fixed independently but not needed for this series.

Christophe Leroy (18):
  powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32
  powerpc/32: Move early_init() in a separate file
  powerpc: prepare string/mem functions for KASAN
  powerpc: remove CONFIG_CMDLINE #ifdef mess
  powerpc/prom_init: don't use string functions from lib/
  powerpc/mm: don't use direct assignment during early boot.
  powerpc/32: use memset() instead of memset_io() to zero BSS
  powerpc/32: make KVIRT_TOP dependent on FIXMAP_START
  powerpc/32: prepare shadow area for KASAN
  powerpc: disable KASAN instrumentation on early/critical files.
  powerpc/32: Add KASAN support
  powerpc/32s: move hash code patching out of MMU_init_hw()
  powerpc/32s: set up an early static hash table for KASAN.
  powerpc/32s: map kasan zero shadow with PAGE_READONLY instead of
PAGE_KERNEL_RO
  kasan: do not open-code addr_has_shadow
  kasan: allow architectures to manage the memory-to-shadow mapping
  kasan: allow architectures to provide an outline readiness check
  powerpc: KASAN for 64bit Book3E

 arch/powerpc/Kconfig |   8 +-
 arch/powerpc/Kconfig.debug   |   5 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  13 +-
 arch/powerpc/include/asm/fixmap.h|   5 +
 arch/powerpc/include/asm/kasan.h | 111 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  13 +-
 arch/powerpc/include/asm/string.h|  32 +++-
 arch/powerpc/kernel/Makefile 
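
For readers new to KASAN, the memory-to-shadow mapping that the series sets
up is an affine transform: one shadow byte tracks eight bytes of memory. A
userspace sketch; the offset and sample address are illustrative values,
not the ones this series uses:

#include <stdint.h>
#include <stdio.h>

#define KASAN_SHADOW_SCALE_SHIFT 3              /* 1 shadow byte per 8 bytes */
#define DEMO_SHADOW_OFFSET       0xf8000000UL   /* illustrative value only */

static uintptr_t demo_mem_to_shadow(uintptr_t addr)
{
        return (addr >> KASAN_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

int main(void)
{
        uintptr_t addr = 0xc0000000UL;  /* e.g. a ppc32 kernel virtual address */

        printf("%#lx -> shadow byte at %#lx\n",
               (unsigned long)addr, (unsigned long)demo_mem_to_shadow(addr));
        return 0;
}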

[PATCH v10 01/18] powerpc/6xx: fix setup and use of SPRN_SPRG_PGDIR for hash32

2019-03-12 Thread Christophe Leroy
Not only the 603 but all 6xx need SPRN_SPRG_PGDIR to be initialised at
startup. This patch moves it from __setup_cpu_603() to start_here()
and __secondary_start(), close to the initialisation of SPRN_THREAD.

Previously, the virtual address of the PGDIR was retrieved from the
thread struct. Now that it is the physical address which is stored in
SPRN_SPRG_PGDIR, hash_page() must not convert it to a physical address
anymore. This patch removes the conversion.

Fixes: 93c4a162b014 ("powerpc/6xx: Store PGDIR physical address in a SPRG")
Reported-by: Guenter Roeck 
Tested-by: Guenter Roeck 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/cpu_setup_6xx.S | 3 ---
 arch/powerpc/kernel/head_32.S   | 6 ++
 arch/powerpc/mm/hash_low_32.S   | 8 
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S 
b/arch/powerpc/kernel/cpu_setup_6xx.S
index 6f1c11e0691f..7534ecff5e92 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -24,9 +24,6 @@ BEGIN_MMU_FTR_SECTION
li  r10,0
mtspr   SPRN_SPRG_603_LRU,r10   /* init SW LRU tracking */
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
-   lis r10, (swapper_pg_dir - PAGE_OFFSET)@h
-   ori r10, r10, (swapper_pg_dir - PAGE_OFFSET)@l
-   mtspr   SPRN_SPRG_PGDIR, r10
 
 BEGIN_FTR_SECTION
bl  __init_fpu_registers
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index ce6a972f2584..48051c8977c5 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -855,6 +855,9 @@ __secondary_start:
li  r3,0
stw r3, RTAS_SP(r4) /* 0 => not in RTAS */
 #endif
+   lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
+   ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
+   mtspr   SPRN_SPRG_PGDIR, r4
 
/* enable MMU and jump to start_secondary */
li  r4,MSR_KERNEL
@@ -942,6 +945,9 @@ start_here:
li  r3,0
stw r3, RTAS_SP(r4) /* 0 => not in RTAS */
 #endif
+   lis r4, (swapper_pg_dir - PAGE_OFFSET)@h
+   ori r4, r4, (swapper_pg_dir - PAGE_OFFSET)@l
+   mtspr   SPRN_SPRG_PGDIR, r4
 
/* stack */
lis r1,init_thread_union@ha
diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
index 1f13494efb2b..a6c491f18a04 100644
--- a/arch/powerpc/mm/hash_low_32.S
+++ b/arch/powerpc/mm/hash_low_32.S
@@ -70,12 +70,12 @@ _GLOBAL(hash_page)
lis r0,KERNELBASE@h /* check if kernel address */
cmplw   0,r4,r0
ori r3,r3,_PAGE_USER|_PAGE_PRESENT /* test low addresses as user */
-   mfspr   r5, SPRN_SPRG_PGDIR /* virt page-table root */
+   mfspr   r5, SPRN_SPRG_PGDIR /* phys page-table root */
blt+112f/* assume user more likely */
-   lis r5,swapper_pg_dir@ha/* if kernel address, use */
-   addir5,r5,swapper_pg_dir@l  /* kernel page table */
+   lis r5, (swapper_pg_dir - PAGE_OFFSET)@ha   /* if kernel address, 
use */
+   addir5 ,r5 ,(swapper_pg_dir - PAGE_OFFSET)@l/* kernel page 
table */
rlwimi  r3,r9,32-12,29,29   /* MSR_PR -> _PAGE_USER */
-112:   tophys(r5, r5)
+112:
 #ifndef CONFIG_PTE_64BIT
rlwimi  r5,r4,12,20,29  /* insert top 10 bits of address */
lwz r8,0(r5)/* get pmd entry */
-- 
2.13.3
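
The arithmetic behind the change, sketched in C (the real code is assembly):
the ppc32 kernel linear mapping sits at a fixed offset, so the page-table
root can be converted to a physical address once at boot instead of on
every hash fault. The values below are illustrative:

#include <stdint.h>
#include <stdio.h>

#define DEMO_PAGE_OFFSET 0xc0000000UL   /* typical ppc32 kernel base */

/* what the tophys() asm macro computes for linear-map addresses */
static uintptr_t demo_tophys(uintptr_t va)
{
        return va - DEMO_PAGE_OFFSET;
}

int main(void)
{
        uintptr_t pgdir_va = 0xc0a00000UL;      /* made-up swapper_pg_dir VA */

        /*
         * Before: SPRN_SPRG_PGDIR held the VA and hash_page() applied
         * tophys() on every fault.  After: the PA is computed once at
         * boot and the per-fault conversion goes away.
         */
        printf("PGDIR phys = %#lx\n", (unsigned long)demo_tophys(pgdir_va));
        return 0;
}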



[PATCH v10 02/18] powerpc/32: Move early_init() in a separate file

2019-03-12 Thread Christophe Leroy
In preparation for KASAN, move early_init() into a separate
file in order to allow deactivating KASAN for that function.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/Makefile   |  2 +-
 arch/powerpc/kernel/early_32.c | 36 
 arch/powerpc/kernel/setup_32.c | 28 
 3 files changed, 37 insertions(+), 29 deletions(-)
 create mode 100644 arch/powerpc/kernel/early_32.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index cddadccf551d..45e47752b692 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -93,7 +93,7 @@ extra-y   += vmlinux.lds
 
 obj-$(CONFIG_RELOCATABLE)  += reloc_$(BITS).o
 
-obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o
+obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o early_32.o
 obj-$(CONFIG_PPC64)+= dma-iommu.o iommu.o
 obj-$(CONFIG_KGDB) += kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)   += btext.o
diff --git a/arch/powerpc/kernel/early_32.c b/arch/powerpc/kernel/early_32.c
new file mode 100644
index ..cf3cdd81dc47
--- /dev/null
+++ b/arch/powerpc/kernel/early_32.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Early init before relocation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * We're called here very early in the boot.
+ *
+ * Note that the kernel may be running at an address which is different
+ * from the address that it was linked at, so we must use RELOC/PTRRELOC
+ * to access static data (including strings).  -- paulus
+ */
+notrace unsigned long __init early_init(unsigned long dt_ptr)
+{
+   unsigned long offset = reloc_offset();
+
+   /* First zero the BSS -- use memset_io, some platforms don't have 
caches on yet */
+   memset_io((void __iomem *)PTRRELOC(&__bss_start), 0, __bss_stop - 
__bss_start);
+
+   /*
+* Identify the CPU type and fix up code sections
+* that depend on which cpu we have.
+*/
+   identify_cpu(offset, mfspr(SPRN_PVR));
+
+   apply_feature_fixups();
+
+   return KERNELBASE + offset;
+}
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 4a65e08a6042..3fb9f64f88fd 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -64,34 +64,6 @@ EXPORT_SYMBOL(DMA_MODE_READ);
 EXPORT_SYMBOL(DMA_MODE_WRITE);
 
 /*
- * We're called here very early in the boot.
- *
- * Note that the kernel may be running at an address which is different
- * from the address that it was linked at, so we must use RELOC/PTRRELOC
- * to access static data (including strings).  -- paulus
- */
-notrace unsigned long __init early_init(unsigned long dt_ptr)
-{
-   unsigned long offset = reloc_offset();
-
-   /* First zero the BSS -- use memset_io, some platforms don't have
-* caches on yet */
-   memset_io((void __iomem *)PTRRELOC(&__bss_start), 0,
-   __bss_stop - __bss_start);
-
-   /*
-* Identify the CPU type and fix up code sections
-* that depend on which cpu we have.
-*/
-   identify_cpu(offset, mfspr(SPRN_PVR));
-
-   apply_feature_fixups();
-
-   return KERNELBASE + offset;
-}
-
-
-/*
  * This is run before start_kernel(), the kernel has been relocated
  * and we are running with enough of the MMU enabled to have our
  * proper kernel virtual addresses
-- 
2.13.3



[PATCH v10 04/18] powerpc: remove CONFIG_CMDLINE #ifdef mess

2019-03-12 Thread Christophe Leroy
This patch makes CONFIG_CMDLINE defined at all times. This avoids
having to enclose the related code inside #ifdef CONFIG_CMDLINE.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig| 6 +++---
 arch/powerpc/kernel/prom_init.c | 9 +++--
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b5dce13a6132..facaa6ba0d2a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -833,9 +833,9 @@ config CMDLINE_BOOL
bool "Default bootloader kernel arguments"
 
 config CMDLINE
-   string "Initial kernel command string"
-   depends on CMDLINE_BOOL
-   default "console=ttyS0,9600 console=tty0 root=/dev/sda2"
+   string "Initial kernel command string" if CMDLINE_BOOL
+   default "console=ttyS0,9600 console=tty0 root=/dev/sda2" if CMDLINE_BOOL
+   default ""
help
  On some platforms, there is currently no way for the boot loader to
  pass arguments to the kernel. For these platforms, you can supply
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff4163a51..ecf083c46bdb 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -631,17 +631,14 @@ static void __init early_cmdline_parse(void)
const char *opt;
 
char *p;
-   int l __maybe_unused = 0;
+   int l = 0;
 
prom_cmd_line[0] = 0;
p = prom_cmd_line;
if ((long)prom.chosen > 0)
l = prom_getprop(prom.chosen, "bootargs", p, 
COMMAND_LINE_SIZE-1);
-#ifdef CONFIG_CMDLINE
-   if (l <= 0 || p[0] == '\0') /* dbl check */
-   strlcpy(prom_cmd_line,
-   CONFIG_CMDLINE, sizeof(prom_cmd_line));
-#endif /* CONFIG_CMDLINE */
+   if (IS_ENABLED(CONFIG_CMDLINE_BOOL) && (l <= 0 || p[0] == '\0')) /* dbl 
check */
+   strlcpy(prom_cmd_line, CONFIG_CMDLINE, sizeof(prom_cmd_line));
prom_printf("command line: %s\n", prom_cmd_line);
 
 #ifdef CONFIG_PPC64
-- 
2.13.3
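
The IS_ENABLED() idiom used above keeps both branches visible to the
compiler (so they keep building) while the dead one is optimised away. A
standalone sketch; the DEMO_ macros stub out the real Kconfig machinery:

#include <stdio.h>
#include <string.h>

#define DEMO_CMDLINE_BOOL 1     /* flip to 0 to emulate CMDLINE_BOOL=n */
#define DEMO_CMDLINE "console=ttyS0,9600 root=/dev/sda2"
#define DEMO_IS_ENABLED(x) (x)  /* the kernel's IS_ENABLED() is fancier */

int main(void)
{
        char cmd_line[256] = "";
        int l = 0;              /* pretend the firmware gave no bootargs */

        /* both branches are compiled and type-checked in all configs */
        if (DEMO_IS_ENABLED(DEMO_CMDLINE_BOOL) && (l <= 0 || cmd_line[0] == '\0'))
                strncpy(cmd_line, DEMO_CMDLINE, sizeof(cmd_line) - 1);

        printf("command line: %s\n", cmd_line);
        return 0;
}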



[PATCH v10 03/18] powerpc: prepare string/mem functions for KASAN

2019-03-12 Thread Christophe Leroy
CONFIG_KASAN implements wrappers for memcpy(), memmove() and memset().
Those wrappers do the verification and then call __memcpy(),
__memmove() and __memset() respectively. The arches are therefore
expected to rename their optimised functions that way.

For files on which KASAN is inhibited, #defines are used to allow
them to directly call optimised versions of the functions without
going through the KASAN wrappers.

See commit 393f203f5fd5 ("x86_64: kasan: add interceptors for
memset/memmove/memcpy functions") for details.

Other string / mem functions do not (yet) have KASAN wrappers;
we therefore have to fall back to the generic versions when
KASAN is active, otherwise KASAN checks would be skipped.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kasan.h   | 15 +++
 arch/powerpc/include/asm/string.h  | 32 +---
 arch/powerpc/kernel/prom_init_check.sh | 10 +-
 arch/powerpc/lib/Makefile  | 11 ---
 arch/powerpc/lib/copy_32.S | 12 +---
 arch/powerpc/lib/mem_64.S  |  9 +++--
 arch/powerpc/lib/memcpy_64.S   |  4 +++-
 7 files changed, 80 insertions(+), 13 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kasan.h

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
new file mode 100644
index ..2c179a39d4ba
--- /dev/null
+++ b/arch/powerpc/include/asm/kasan.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KASAN_H
+#define __ASM_KASAN_H
+
+#ifdef CONFIG_KASAN
+#define _GLOBAL_KASAN(fn)  _GLOBAL(__##fn)
+#define _GLOBAL_TOC_KASAN(fn)  _GLOBAL_TOC(__##fn)
+#define EXPORT_SYMBOL_KASAN(fn)EXPORT_SYMBOL(__##fn)
+#else
+#define _GLOBAL_KASAN(fn)  _GLOBAL(fn)
+#define _GLOBAL_TOC_KASAN(fn)  _GLOBAL_TOC(fn)
+#define EXPORT_SYMBOL_KASAN(fn)
+#endif
+
+#endif
diff --git a/arch/powerpc/include/asm/string.h 
b/arch/powerpc/include/asm/string.h
index 1647de15a31e..9bf6dffb4090 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -4,14 +4,17 @@
 
 #ifdef __KERNEL__
 
+#ifndef CONFIG_KASAN
 #define __HAVE_ARCH_STRNCPY
 #define __HAVE_ARCH_STRNCMP
+#define __HAVE_ARCH_MEMCHR
+#define __HAVE_ARCH_MEMCMP
+#define __HAVE_ARCH_MEMSET16
+#endif
+
 #define __HAVE_ARCH_MEMSET
 #define __HAVE_ARCH_MEMCPY
 #define __HAVE_ARCH_MEMMOVE
-#define __HAVE_ARCH_MEMCMP
-#define __HAVE_ARCH_MEMCHR
-#define __HAVE_ARCH_MEMSET16
 #define __HAVE_ARCH_MEMCPY_FLUSHCACHE
 
 extern char * strcpy(char *,const char *);
@@ -27,7 +30,27 @@ extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
 extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
 
+void *__memset(void *s, int c, __kernel_size_t count);
+void *__memcpy(void *to, const void *from, __kernel_size_t n);
+void *__memmove(void *to, const void *from, __kernel_size_t n);
+
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * should use not instrumented version of mem* functions.
+ */
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
 #ifdef CONFIG_PPC64
+#ifndef CONFIG_KASAN
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
 
@@ -49,8 +72,11 @@ static inline void *memset64(uint64_t *p, uint64_t v, 
__kernel_size_t n)
 {
return __memset64(p, v, n * 8);
 }
+#endif
 #else
+#ifndef CONFIG_KASAN
 #define __HAVE_ARCH_STRLEN
+#endif
 
 extern void *memset16(uint16_t *, uint16_t, __kernel_size_t);
 #endif
diff --git a/arch/powerpc/kernel/prom_init_check.sh 
b/arch/powerpc/kernel/prom_init_check.sh
index 667df97d2595..181fd10008ef 100644
--- a/arch/powerpc/kernel/prom_init_check.sh
+++ b/arch/powerpc/kernel/prom_init_check.sh
@@ -16,8 +16,16 @@
 # If you really need to reference something from prom_init.o add
 # it to the list below:
 
+grep "^CONFIG_KASAN=y$" .config >/dev/null
+if [ $? -eq 0 ]
+then
+   MEM_FUNCS="__memcpy __memset"
+else
+   MEM_FUNCS="memcpy memset"
+fi
+
 WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
-_end enter_prom memcpy memset reloc_offset __secondary_hold
+_end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
 __secondary_hold_acknowledge __secondary_hold_spinloop __start
 strcmp strcpy strlcpy strlen strncmp strstr kstrtobool logo_linux_clut224
 reloc_got2 kernstart_addr memstart_addr linux_banner _stext
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 79396e184bca..47a4de434c22 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -8,9 +8,14 @@ ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 CFLAGS_REMOVE_code-patching.o = $(CC_FLAGS_FTRACE)
 

Re: [PATCH] powerpc: sstep: Mark variable `rc` as unused in function 'analyse_instr'

2019-03-12 Thread Christophe Leroy




Le 12/03/2019 à 22:12, Mathieu Malaterre a écrit :

On Tue, Mar 12, 2019 at 9:56 PM Christophe Leroy
 wrote:




Le 12/03/2019 à 21:20, Mathieu Malaterre a écrit :

Add gcc attribute unused for `rc` variable.

Fix warnings treated as errors with W=1:

arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
[-Werror=unused-but-set-variable]

Signed-off-by: Mathieu Malaterre 
---
   arch/powerpc/lib/sstep.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3d33fb509ef4..32d092f62ae0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long v2)
   int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 unsigned int instr)
   {
- unsigned int opcode, ra, rb, rc, rd, spr, u;
+ unsigned int opcode, ra, rb, rc __maybe_unused, rd, spr, u;


I think it would be better to enclose 'rc' inside a #ifdef CONFIG_PPC64


Hum, odd, I would have bet you would have suggested I use
IS_ENABLED with some crazy scheme (I was not able to mix it with the
switch case nicely).


Well I guess yes, you could also get rid of the #ifdef __powerpc64__ and 
instead add the following just after the 'case 4:'


if (!IS_ENABLED(CONFIG_PPC64))
break;

That's less ugly than adding two #ifdef/#endif

Christophe



Anyway I'll try your suggestion and post a v2.


Christophe


   unsigned long int imm;
   unsigned long int val, val2;
   unsigned int mb, me, sh;
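
Spelled out as a compilable sketch (the DEMO_ macros stub the kernel ones),
the suggestion keeps 'rc' visibly used in every configuration, which is
what silences -Wunused-but-set-variable without an #ifdef:

#include <stdio.h>

#define DEMO_CONFIG_PPC64 0     /* flip to 1 to take the 64-bit path */
#define DEMO_IS_ENABLED(x) (x)  /* stand-in for the kernel macro */

static int demo_analyse(unsigned int opcode, unsigned int instr)
{
        unsigned int rc = (instr >> 6) & 0x1f;

        switch (opcode) {
        case 4:
                /* replaces '#ifdef __powerpc64__' around the block */
                if (!DEMO_IS_ENABLED(DEMO_CONFIG_PPC64))
                        break;
                /*
                 * The use below stays visible to the compiler, so no
                 * unused-but-set warning even when it is dead code.
                 */
                printf("64-bit path, rc=%u\n", rc);
                break;
        default:
                break;
        }
        return 0;
}

int main(void)
{
        return demo_analyse(4, 0x10000040);
}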



[PATCH v2] powerpc/32: sstep: Move variable `rc` within CONFIG_PPC64 sentinels

2019-03-12 Thread Mathieu Malaterre
Fix warnings treated as errors with W=1:

  arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
[-Werror=unused-but-set-variable]

Suggested-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
v2: as suggested prefer CONFIG_PPC64 sentinel instead of unused keyword

 arch/powerpc/lib/sstep.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3d33fb509ef4..9996dc7a0b46 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1169,7 +1169,10 @@ static nokprobe_inline int trap_compare(long v1, long v2)
 int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  unsigned int instr)
 {
-   unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int opcode, ra, rb, rd, spr, u;
+#ifdef CONFIG_PPC64
+   unsigned int rc;
+#endif
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
@@ -1292,7 +1295,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
rd = (instr >> 21) & 0x1f;
ra = (instr >> 16) & 0x1f;
rb = (instr >> 11) & 0x1f;
+#ifdef CONFIG_PPC64
rc = (instr >> 6) & 0x1f;
+#endif
 
switch (opcode) {
 #ifdef __powerpc64__
-- 
2.20.1



Re: [PATCH] powerpc: Make some functions static

2019-03-12 Thread Christophe Leroy




Le 12/03/2019 à 21:31, Mathieu Malaterre a écrit :

In commit cb9e4d10c448 ("[POWERPC] Add support for 750CL Holly board")
new functions were added. Since these functions can be made static,
make it so. While doing so, it turns out that holly_power_off and
holly_halt are unused, so remove them.


I would have said 'since these functions are only used in this C file, 
make them static'.


I think this could be split in two patches:
1/ Remove unused functions, ie holly_halt() and holly_power_off().
2/ Make the other ones static.

Christophe



Silence the following warnings triggered using W=1:

   arch/powerpc/platforms/embedded6xx/holly.c:47:5: error: no previous 
prototype for 'holly_exclude_device' [-Werror=missing-prototypes]
   arch/powerpc/platforms/embedded6xx/holly.c:190:6: error: no previous 
prototype for 'holly_show_cpuinfo' [-Werror=missing-prototypes]
   arch/powerpc/platforms/embedded6xx/holly.c:196:17: error: no previous 
prototype for 'holly_restart' [-Werror=missing-prototypes]
   arch/powerpc/platforms/embedded6xx/holly.c:236:6: error: no previous 
prototype for 'holly_power_off' [-Werror=missing-prototypes]
   arch/powerpc/platforms/embedded6xx/holly.c:243:6: error: no previous 
prototype for 'holly_halt' [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/platforms/embedded6xx/holly.c | 19 ---
  1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/holly.c 
b/arch/powerpc/platforms/embedded6xx/holly.c
index 0409714e8070..829bf3697dc9 100644
--- a/arch/powerpc/platforms/embedded6xx/holly.c
+++ b/arch/powerpc/platforms/embedded6xx/holly.c
@@ -44,7 +44,8 @@
  
  #define HOLLY_PCI_CFG_PHYS 0x7c00
  
-int holly_exclude_device(struct pci_controller *hose, u_char bus, u_char devfn)

+static int holly_exclude_device(struct pci_controller *hose, u_char bus,
+   u_char devfn)
  {
if (bus == 0 && PCI_SLOT(devfn) == 0)
return PCIBIOS_DEVICE_NOT_FOUND;
@@ -187,13 +188,13 @@ static void __init holly_init_IRQ(void)
tsi108_write_reg(TSI108_MPIC_OFFSET + 0x30c, 0);
  }
  
-void holly_show_cpuinfo(struct seq_file *m)

+static void holly_show_cpuinfo(struct seq_file *m)
  {
seq_printf(m, "vendor\t\t: IBM\n");
seq_printf(m, "machine\t\t: PPC750 GX/CL\n");
  }
  
-void __noreturn holly_restart(char *cmd)

+static void __noreturn holly_restart(char *cmd)
  {
__be32 __iomem *ocn_bar1 = NULL;
unsigned long bar;
@@ -233,18 +234,6 @@ void __noreturn holly_restart(char *cmd)
for (;;) ;
  }
  
-void holly_power_off(void)

-{
-   local_irq_disable();
-   /* No way to shut power off with software */
-   for (;;) ;
-}
-
-void holly_halt(void)
-{
-   holly_power_off();
-}
-
  /*
   * Called very early, device-tree isn't unflattened
   */



Re: [PATCH] powerpc: sstep: Mark variable `rc` as unused in function 'analyse_instr'

2019-03-12 Thread Mathieu Malaterre
On Tue, Mar 12, 2019 at 9:56 PM Christophe Leroy
 wrote:
>
>
>
> Le 12/03/2019 à 21:20, Mathieu Malaterre a écrit :
> > Add gcc attribute unused for `rc` variable.
> >
> > Fix warnings treated as errors with W=1:
> >
> >arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
> > [-Werror=unused-but-set-variable]
> >
> > Signed-off-by: Mathieu Malaterre 
> > ---
> >   arch/powerpc/lib/sstep.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> > index 3d33fb509ef4..32d092f62ae0 100644
> > --- a/arch/powerpc/lib/sstep.c
> > +++ b/arch/powerpc/lib/sstep.c
> > @@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long 
> > v2)
> >   int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
> > unsigned int instr)
> >   {
> > - unsigned int opcode, ra, rb, rc, rd, spr, u;
> > + unsigned int opcode, ra, rb, rc __maybe_unused, rd, spr, u;
>
> I think it would be better to enclose 'rc' inside a #ifdef CONFIG_PPC64

Hum, odd, I would have bet you would have suggested I use
IS_ENABLED with some crazy scheme (I was not able to mix it with the
switch case nicely).

Anyway I'll try your suggestion and post a v2.

> Christophe
>
> >   unsigned long int imm;
> >   unsigned long int val, val2;
> >   unsigned int mb, me, sh;
> >


RE: [PATCH v2 2/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2019-03-12 Thread Kazuhito Hagio
Hi Bhupesh,

-Original Message-
> Right now user-space tools like 'makedumpfile' and 'crash' need to rely
> on a best-guess method of determining the value of 'MAX_PHYSMEM_BITS'
> supported by the underlying kernel.
> 
> This value is used in user-space code to calculate the bit-space
> required to store a section for SPARSEMEM (similar to the existing
> calculation method used in the kernel implementation):
> 
>   #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
> 
> Now, regressions have been reported in user-space utilities
> like 'makedumpfile' and 'crash' on arm64, with the recently added
> kernel support for 52-bit physical address space, as there is
> no clear method of determining this value in user-space
> (other than reading kernel CONFIG flags).
> 
> As per suggestion from makedumpfile maintainer (Kazu), it makes more
> sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself
> rather than in arch-specific code, so that the user-space code for other
> archs can also benefit from this addition to the vmcoreinfo and use it
> as a standard way of determining 'SECTIONS_SHIFT' value in user-land.
> 
> A reference 'makedumpfile' implementation which reads the
> 'MAX_PHYSMEM_BITS' value from vmcoreinfo in a arch-independent fashion
> is available here:
> 
> [0]. 
> https://github.com/bhupesh-sharma/makedumpfile/blob/remove-max-phys-mem-bit-v1/arch/ppc64.c#L471
> 
> Cc: Boris Petkov 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: James Morse 
> Cc: Will Deacon 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Dave Anderson 
> Cc: Kazuhito Hagio 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Cc: ke...@lists.infradead.org
> Signed-off-by: Bhupesh Sharma 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 093c9f917ed0..44b90368e183 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -467,6 +467,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>  #define PAGE_OFFLINE_MAPCOUNT_VALUE  (~PG_offline)
>   VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
>  #endif
> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);

Some architectures define MAX_PHYSMEM_BITS only with CONFIG_SPARSEMEM,
so we need to move this to the #ifdef section that exports some
mem_section things.

Thanks!
Kazu

> 
>   arch_crash_save_vmcoreinfo();
>   update_vmcoreinfo_note();
> --
> 2.7.4
> 
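
Worked through once in plain C; the SECTION_SIZE_BITS value is an assumed
arm64 figure for illustration, while MAX_PHYSMEM_BITS is what the patch
exports via vmcoreinfo:

#include <stdio.h>

int main(void)
{
        int max_physmem_bits = 52;      /* read from vmcoreinfo per this patch */
        int section_size_bits = 30;     /* assumed arm64 value, for illustration */

        /* the same formula makedumpfile/crash need to evaluate */
        int sections_shift = max_physmem_bits - section_size_bits;

        printf("SECTIONS_SHIFT = %d, NR_MEM_SECTIONS = %lu\n",
               sections_shift, 1UL << sections_shift);
        return 0;
}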




Re: [PATCH 00/14] entry: preempt_schedule_irq() callers scrub

2019-03-12 Thread Valentin Schneider
On 12/03/2019 18:03, Vineet Gupta wrote:
[...]
>> Regarding that loop, archs seem to fall in 3 categories:
>> A) Those that don't have the loop
> 
> Please clarify that this is the right thing to do (since core code already 
> has the
> loop) hence no fixing is required for this "category"
> 

Right, those don't need any change. I had a brief look at them to double
check they had the proper need_resched() gate before calling
preempt_schedule_irq() (with no loop) and they all seem fine. Also...

>> B) Those that have a small need_resched() loop around the
>>preempt_schedule_irq() callsite
>> C) Those that branch to some more generic code further up the entry code
>>and eventually branch back to preempt_schedule_irq()
>>
>> arc, m68k, nios2 fall in A)
> 

I forgot to include parisc in here.

[...]


[PATCH 08/14] powerpc: entry: Remove unneeded need_resched() loop

2019-03-12 Thread Valentin Schneider
Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.

Signed-off-by: Valentin Schneider 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/entry_32.S | 6 +-
 arch/powerpc/kernel/entry_64.S | 8 +---
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0768dfd8a64e..ff3fe3824a4a 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -896,11 +896,7 @@ resume_kernel:
 */
bl  trace_hardirqs_off
 #endif
-1: bl  preempt_schedule_irq
-   CURRENT_THREAD_INFO(r9, r1)
-   lwz r3,TI_FLAGS(r9)
-   andi.   r0,r3,_TIF_NEED_RESCHED
-   bne-1b
+   bl  preempt_schedule_irq
 #ifdef CONFIG_TRACE_IRQFLAGS
/* And now, to properly rebalance the above, we tell lockdep they
 * are being turned back on, which will happen when we return
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 435927f549c4..9c86c6826856 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -857,13 +857,7 @@ resume_kernel:
 * sure we are soft-disabled first and reconcile irq state.
 */
RECONCILE_IRQ_STATE(r3,r4)
-1: bl  preempt_schedule_irq
-
-   /* Re-test flags and eventually loop */
-   CURRENT_THREAD_INFO(r9, r1)
-   ld  r4,TI_FLAGS(r9)
-   andi.   r0,r4,_TIF_NEED_RESCHED
-   bne 1b
+   bl  preempt_schedule_irq
 
/*
 * arch_local_irq_restore() from preempt_schedule_irq above may
-- 
2.20.1
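
For reference, the loop the changelog relies on lives in
preempt_schedule_irq() in kernel/sched/core.c. Trimmed to its shape (not a
verbatim copy):

asmlinkage __visible void __sched preempt_schedule_irq(void)
{
        /* ... entry checks elided ... */
        do {
                preempt_disable();
                local_irq_enable();
                __schedule(true);       /* preemption path */
                local_irq_disable();
                sched_preempt_enable_no_resched();
        } while (need_resched());       /* so callers need no outer loop */
        /* ... */
}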



[PATCH 00/14] entry: preempt_schedule_irq() callers scrub

2019-03-12 Thread Valentin Schneider
Hi,

This is the continuation of [1] where I'm hunting down
preempt_schedule_irq() callers because of [2].

I told myself the best way to get this moving forward wouldn't be to write
doc about it, but to go write some fixes and get some discussions going,
which is what this patch-set is about.

I've looked at users of preempt_schedule_irq(), and made sure they didn't
have one of those useless loops. The list of offenders is:

$ grep -r -I "preempt_schedule_irq" arch/ | cut -d/ -f2 | sort | uniq

  arc
  arm
  arm64
  c6x
  csky
  h8300
  ia64
  m68k
  microblaze
  mips
  nds32
  nios2
  parisc
  powerpc
  riscv
  s390
  sh
  sparc
  x86
  xtensa

Regarding that loop, archs seem to fall in 3 categories:
A) Those that don't have the loop
B) Those that have a small need_resched() loop around the
   preempt_schedule_irq() callsite
C) Those that branch to some more generic code further up the entry code
   and eventually branch back to preempt_schedule_irq()

arc, m68k, nios2 fall in A)
sparc, ia64, s390 fall in C)
all the others fall in B)

I've written patches for B) and C) EXCEPT for ia64 and s390 because I
haven't been able to tell if it's actually fine to kill that "long jump"
(and maybe I'm wrong on sparc). Hopefully folks who understand what goes on
in there might be able to shed some light.

Also, since I sent patches for arm & arm64 in [1] I'm not including them
here.

Boot-tested on:
- x86

Build-tested on:
- h8300
- c6x
- powerpc
- mips
- nds32
- microblaze
- sparc
- xtensa

Thanks,
Valentin

[1]: 
https://lore.kernel.org/lkml/20190131182339.9835-1-valentin.schnei...@arm.com/
[2]: https://lore.kernel.org/lkml/cc989920-a13b-d53b-db83-1584a7f53...@arm.com/

Valentin Schneider (14):
  sched/core: Fix preempt_schedule() interrupt return comment
  c6x: entry: Remove unneeded need_resched() loop
  csky: entry: Remove unneeded need_resched() loop
  h8300: entry: Remove unneeded need_resched() loop
  microblaze: entry: Remove unneeded need_resched() loop
  MIPS: entry: Remove unneeded need_resched() loop
  nds32: ex-exit: Remove unneeded need_resched() loop
  powerpc: entry: Remove unneeded need_resched() loop
  RISC-V: entry: Remove unneeded need_resched() loop
  sh: entry: Remove unneeded need_resched() loop
  sh64: entry: Remove unneeded need_resched() loop
  sparc64: rtrap: Remove unneeded need_resched() loop
  x86/entry: Remove unneeded need_resched() loop
  xtensa: entry: Remove unneeded need_resched() loop

 arch/c6x/kernel/entry.S| 3 +--
 arch/csky/kernel/entry.S   | 4 
 arch/h8300/kernel/entry.S  | 3 +--
 arch/microblaze/kernel/entry.S | 5 -
 arch/mips/kernel/entry.S   | 3 +--
 arch/nds32/kernel/ex-exit.S| 4 ++--
 arch/powerpc/kernel/entry_32.S | 6 +-
 arch/powerpc/kernel/entry_64.S | 8 +---
 arch/riscv/kernel/entry.S  | 3 +--
 arch/sh/kernel/cpu/sh5/entry.S | 5 +
 arch/sh/kernel/entry-common.S  | 4 +---
 arch/sparc/kernel/rtrap_64.S   | 1 -
 arch/x86/entry/entry_32.S  | 3 +--
 arch/x86/entry/entry_64.S  | 3 +--
 arch/xtensa/kernel/entry.S | 2 +-
 kernel/sched/core.c| 7 +++
 16 files changed, 16 insertions(+), 48 deletions(-)

--
2.20.1



Re: [PATCH] powerpc/64s: Include header file to fix a warning

2019-03-12 Thread Christophe Leroy




Le 12/03/2019 à 21:18, Mathieu Malaterre a écrit :

Make sure to include <asm/nmi.h> to provide the following prototype:
hv_nmi_check_nonrecoverable.

Remove the following warning treated as error (W=1):

   arch/powerpc/kernel/traps.c:393:6: error: no previous prototype for 
'hv_nmi_check_nonrecoverable' [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 


Reviewed-by: Christophe Leroy 


---
  arch/powerpc/kernel/traps.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a21200c6aaea..1fd45a8650e1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -71,6 +71,7 @@
  #include 
  #include 
  #include 
+#include <asm/nmi.h>
  
  #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)

  int (*__debugger)(struct pt_regs *regs) __read_mostly;



Re: [PATCH] kmemleak: skip scanning holes in the .bss section

2019-03-12 Thread Andrew Morton
On Tue, 12 Mar 2019 15:14:12 -0400 Qian Cai  wrote:

> The commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
> kvm_tmp[] into the .bss section and then frees the rest of the unused
> space back to the page allocator.
> 
> kernel_init
>   kvm_guest_init
> kvm_free_tmp
>   free_reserved_area
> free_unref_page
>   free_unref_page_prepare
> 
> With DEBUG_PAGEALLOC=y, it will unmap those pages from the kernel. As a
> result, the kmemleak scan will trigger the panic below when it scans the .bss
> section with unmapped pages.
> 
> Since this is done way before the first kmemleak_scan(), just go
> lockless to make the implementation simple and skip those pages when
> scanning the .bss section. Later, those pages could be tracked by
> kmemleak again once allocated by the page allocator. Overall, this is
> such a special case that there is no need to make it generic and let
> kmemleak gain the ability to skip blocks in scan_large_block().
> 
> BUG: Unable to handle kernel data access at 0xc161
> Faulting instruction address: 0xc03cc178
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
> CPU: 3 PID: 130 Comm: kmemleak Kdump: loaded Not tainted 5.0.0+ #9
> REGS: c004b05bf940 TRAP: 0300   Not tainted  (5.0.0+)
> NIP [c03cc178] scan_block+0xa8/0x190
> LR [c03cc170] scan_block+0xa0/0x190
> Call Trace:
> [c004b05bfbd0] [c03cc170] scan_block+0xa0/0x190 (unreliable)
> [c004b05bfc30] [c03cc2c0] scan_large_block+0x60/0xa0
> [c004b05bfc70] [c03ccc64] kmemleak_scan+0x254/0x960
> [c004b05bfd40] [c03cdd50] kmemleak_scan_thread+0xec/0x12c
> [c004b05bfdb0] [c0104388] kthread+0x1b8/0x1c0
> [c004b05bfe20] [c000b364] ret_from_kernel_thread+0x5c/0x78
> Instruction dump:
> 7fa3eb78 4844667d 6000 6000 6000 6000 3bff0008 7fbcf840
> 409d00b8 4bfffeed 2fa3 409e00ac  e93e0128 7fa91840
> 419dffdc
> 

hm, yes, this is super crude.  I guess we can turn it into something
more sophisticated if another caller is identified.

> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -237,6 +237,10 @@ static int kmemleak_skip_disable;
>  /* If there are leaks that can be reported */
>  static bool kmemleak_found_leaks;
>  
> +/* Skip scanning of a range in the .bss section. */
> +static void *bss_hole_start;
> +static void *bss_hole_stop;
> +
>  static bool kmemleak_verbose;
>  module_param_named(verbose, kmemleak_verbose, bool, 0600);
>  
> @@ -1265,6 +1269,18 @@ void __ref kmemleak_ignore_phys(phys_addr_t phys)
>  }
>  EXPORT_SYMBOL(kmemleak_ignore_phys);
>  
> +/**
> + * kmemleak_bss_hole - skip scanning a range in the .bss section
> + *
> + * @start:   start of the range
> + * @stop:end of the range
> + */
> +void kmemleak_bss_hole(void *start, void *stop)
> +{
> + bss_hole_start = start;
> + bss_hole_stop = stop;
> +}

I'll make this __init.

>  /*
>   * Update an object's checksum and return true if it was modified.
>   */
> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>  
>   /* data/bss scanning */
>   scan_large_block(_sdata, _edata);
> - scan_large_block(__bss_start, __bss_stop);
> +
> + if (bss_hole_start) {
> + scan_large_block(__bss_start, bss_hole_start);
> + scan_large_block(bss_hole_stop, __bss_stop);
> + } else {
> + scan_large_block(__bss_start, __bss_stop);
> + }
> +
>   scan_large_block(__start_ro_after_init, __end_ro_after_init);
>  
>  #ifdef CONFIG_SMP
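
With Andrew's __init adjustment folded in, the helper from the hunk above
would read roughly:

/* __init: the only caller runs from an __init function during boot */
void __init kmemleak_bss_hole(void *start, void *stop)
{
        bss_hole_start = start;
        bss_hole_stop = stop;
}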



Re: [PATCH] powerpc/64s: Mark 'dummy_copy_buffer' as used

2019-03-12 Thread Christophe Leroy

On 03/12/2019 08:29 PM, Mathieu Malaterre wrote:

In commit 07d2a628bc00 ("powerpc/64s: Avoid cpabort in context switch
when possible") a buffer 'dummy_copy_buffer' was introduced. gcc does
not see this buffer being used in the inline assembly within the
function '__switch_to', so explicitly mark this variable as used.

Prefer using '__aligned' to get past the line-over-80-characters warning
from checkpatch.


Powerpc accepts 90 characters; use arch/powerpc/tools/checkpatch.sh.



This removes the following warning:

   arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined 
but not used [-Werror=unused-const-variable=]


commit 2bf1071a8d50 ("powerpc/64s: Remove POWER9 DD1 support") has
removed the function using 'dummy_copy_buffer', so you should remove it
completely.


Christophe




Cc: Nicholas Piggin 
Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/kernel/process.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 77e44275d025..5acf63d45802 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1153,7 +1153,7 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
  
  #ifdef CONFIG_PPC_BOOK3S_64

  #define CP_SIZE 128
-static const u8 dummy_copy_buffer[CP_SIZE] __attribute__((aligned(CP_SIZE)));
+static const u8 dummy_copy_buffer[CP_SIZE] __aligned(CP_SIZE) __used;
  #endif
  
  struct task_struct *__switch_to(struct task_struct *prev,




Re: [PATCH] powerpc: sstep: Mark variable `rc` as unused in function 'analyse_instr'

2019-03-12 Thread Christophe Leroy




Le 12/03/2019 à 21:20, Mathieu Malaterre a écrit :

Add gcc attribute unused for `rc` variable.

Fix warnings treated as errors with W=1:

   arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
[-Werror=unused-but-set-variable]

Signed-off-by: Mathieu Malaterre 
---
  arch/powerpc/lib/sstep.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3d33fb509ef4..32d092f62ae0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long v2)
  int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  unsigned int instr)
  {
-   unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int opcode, ra, rb, rc __maybe_unused, rd, spr, u;


I think it would be better to enclose 'rc' inside a #ifdef CONFIG_PPC64

Christophe


unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;



[PATCH] powerpc: Make some functions static

2019-03-12 Thread Mathieu Malaterre
In commit cb9e4d10c448 ("[POWERPC] Add support for 750CL Holly board")
new functions were added. Since these functions can be made static,
make it so. While doing so, it turns out that holly_power_off and
holly_halt are unused, so remove them.

Silence the following warnings triggered using W=1:

  arch/powerpc/platforms/embedded6xx/holly.c:47:5: error: no previous prototype 
for 'holly_exclude_device' [-Werror=missing-prototypes]
  arch/powerpc/platforms/embedded6xx/holly.c:190:6: error: no previous 
prototype for 'holly_show_cpuinfo' [-Werror=missing-prototypes]
  arch/powerpc/platforms/embedded6xx/holly.c:196:17: error: no previous 
prototype for 'holly_restart' [-Werror=missing-prototypes]
  arch/powerpc/platforms/embedded6xx/holly.c:236:6: error: no previous 
prototype for 'holly_power_off' [-Werror=missing-prototypes]
  arch/powerpc/platforms/embedded6xx/holly.c:243:6: error: no previous 
prototype for 'holly_halt' [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/platforms/embedded6xx/holly.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/holly.c 
b/arch/powerpc/platforms/embedded6xx/holly.c
index 0409714e8070..829bf3697dc9 100644
--- a/arch/powerpc/platforms/embedded6xx/holly.c
+++ b/arch/powerpc/platforms/embedded6xx/holly.c
@@ -44,7 +44,8 @@
 
 #define HOLLY_PCI_CFG_PHYS 0x7c00
 
-int holly_exclude_device(struct pci_controller *hose, u_char bus, u_char devfn)
+static int holly_exclude_device(struct pci_controller *hose, u_char bus,
+   u_char devfn)
 {
if (bus == 0 && PCI_SLOT(devfn) == 0)
return PCIBIOS_DEVICE_NOT_FOUND;
@@ -187,13 +188,13 @@ static void __init holly_init_IRQ(void)
tsi108_write_reg(TSI108_MPIC_OFFSET + 0x30c, 0);
 }
 
-void holly_show_cpuinfo(struct seq_file *m)
+static void holly_show_cpuinfo(struct seq_file *m)
 {
seq_printf(m, "vendor\t\t: IBM\n");
seq_printf(m, "machine\t\t: PPC750 GX/CL\n");
 }
 
-void __noreturn holly_restart(char *cmd)
+static void __noreturn holly_restart(char *cmd)
 {
__be32 __iomem *ocn_bar1 = NULL;
unsigned long bar;
@@ -233,18 +234,6 @@ void __noreturn holly_restart(char *cmd)
for (;;) ;
 }
 
-void holly_power_off(void)
-{
-   local_irq_disable();
-   /* No way to shut power off with software */
-   for (;;) ;
-}
-
-void holly_halt(void)
-{
-   holly_power_off();
-}
-
 /*
  * Called very early, device-tree isn't unflattened
  */
-- 
2.20.1



[PATCH] powerpc/64s: Mark 'dummy_copy_buffer' as used

2019-03-12 Thread Mathieu Malaterre
In commit 07d2a628bc00 ("powerpc/64s: Avoid cpabort in context switch
when possible") a buffer 'dummy_copy_buffer' was introduced. gcc does
not see this buffer being used in the inline assembly within the
function '__switch_to', so explicitly mark this variable as used.

Prefer using '__aligned' to get past the line-over-80-characters warning
from checkpatch.

This removes the following warning:

  arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined but 
not used [-Werror=unused-const-variable=]

Cc: Nicholas Piggin 
Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 77e44275d025..5acf63d45802 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1153,7 +1153,7 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #define CP_SIZE 128
-static const u8 dummy_copy_buffer[CP_SIZE] __attribute__((aligned(CP_SIZE)));
+static const u8 dummy_copy_buffer[CP_SIZE] __aligned(CP_SIZE) __used;
 #endif
 
 struct task_struct *__switch_to(struct task_struct *prev,
-- 
2.20.1



[PATCH] powerpc: sstep: Mark variable `rc` as unused in function 'analyse_instr'

2019-03-12 Thread Mathieu Malaterre
Add gcc attribute unused for `rc` variable.

Fix warnings treated as errors with W=1:

  arch/powerpc/lib/sstep.c:1172:31: error: variable 'rc' set but not used 
[-Werror=unused-but-set-variable]

Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/lib/sstep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3d33fb509ef4..32d092f62ae0 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1169,7 +1169,7 @@ static nokprobe_inline int trap_compare(long v1, long v2)
 int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
  unsigned int instr)
 {
-   unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int opcode, ra, rb, rc __maybe_unused, rd, spr, u;
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
-- 
2.20.1



[PATCH] powerpc/64s: Include header file to fix a warning

2019-03-12 Thread Mathieu Malaterre
Make sure to include <asm/nmi.h> to provide the following prototype:
hv_nmi_check_nonrecoverable.

Remove the following warning treated as error (W=1):

  arch/powerpc/kernel/traps.c:393:6: error: no previous prototype for 
'hv_nmi_check_nonrecoverable' [-Werror=missing-prototypes]

Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index a21200c6aaea..1fd45a8650e1 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include <asm/nmi.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
-- 
2.20.1



Re: [PATCH] kmemleak: skip scanning holes in the .bss section

2019-03-12 Thread Qian Cai
Fixing some email addresses.

On Tue, 2019-03-12 at 15:14 -0400, Qian Cai wrote:
> The commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
> kvm_tmp[] into the .bss section and then frees the rest of the unused
> space back to the page allocator.
> 
> kernel_init
>   kvm_guest_init
> kvm_free_tmp
>   free_reserved_area
> free_unref_page
>   free_unref_page_prepare
> 
> With DEBUG_PAGEALLOC=y, it will unmap those pages from the kernel. As a
> result, the kmemleak scan will trigger the panic below when it scans the .bss
> section with unmapped pages.
> 
> Since this is done way before the first kmemleak_scan(), just go
> lockless to make the implementation simple and skip those pages when
> scanning the .bss section. Later, those pages could be tracked by
> kmemleak again once allocated by the page allocator. Overall, this is
> such a special case that there is no need to make it generic and let
> kmemleak gain the ability to skip blocks in scan_large_block().
> 
> BUG: Unable to handle kernel data access at 0xc161
> Faulting instruction address: 0xc03cc178
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
> CPU: 3 PID: 130 Comm: kmemleak Kdump: loaded Not tainted 5.0.0+ #9
> REGS: c004b05bf940 TRAP: 0300   Not tainted  (5.0.0+)
> NIP [c03cc178] scan_block+0xa8/0x190
> LR [c03cc170] scan_block+0xa0/0x190
> Call Trace:
> [c004b05bfbd0] [c03cc170] scan_block+0xa0/0x190 (unreliable)
> [c004b05bfc30] [c03cc2c0] scan_large_block+0x60/0xa0
> [c004b05bfc70] [c03ccc64] kmemleak_scan+0x254/0x960
> [c004b05bfd40] [c03cdd50] kmemleak_scan_thread+0xec/0x12c
> [c004b05bfdb0] [c0104388] kthread+0x1b8/0x1c0
> [c004b05bfe20] [c000b364] ret_from_kernel_thread+0x5c/0x78
> Instruction dump:
> 7fa3eb78 4844667d 6000 6000 6000 6000 3bff0008 7fbcf840
> 409d00b8 4bfffeed 2fa3 409e00ac  e93e0128 7fa91840
> 419dffdc
> 
> Signed-off-by: Qian Cai 
> ---
>  arch/powerpc/kernel/kvm.c |  3 +++
>  include/linux/kmemleak.h  |  4 
>  mm/kmemleak.c | 25 -
>  3 files changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index 683b5b3805bd..5cddc8fc56bb 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -712,6 +713,8 @@ static void kvm_use_magic_page(void)
>  
>  static __init void kvm_free_tmp(void)
>  {
> + kmemleak_bss_hole(&kvm_tmp[kvm_tmp_index],
> +   &kvm_tmp[ARRAY_SIZE(kvm_tmp)]);
>   free_reserved_area(&kvm_tmp[kvm_tmp_index],
>      &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
>  }
> diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
> index 5ac416e2d339..3d8949b9c6f5 100644
> --- a/include/linux/kmemleak.h
> +++ b/include/linux/kmemleak.h
> @@ -46,6 +46,7 @@ extern void kmemleak_alloc_phys(phys_addr_t phys, size_t
> size, int min_count,
>  extern void kmemleak_free_part_phys(phys_addr_t phys, size_t size) __ref;
>  extern void kmemleak_not_leak_phys(phys_addr_t phys) __ref;
>  extern void kmemleak_ignore_phys(phys_addr_t phys) __ref;
> +extern void kmemleak_bss_hole(void *start, void *stop);
>  
>  static inline void kmemleak_alloc_recursive(const void *ptr, size_t size,
>   int min_count, slab_flags_t
> flags,
> @@ -131,6 +132,9 @@ static inline void kmemleak_not_leak_phys(phys_addr_t
> phys)
>  static inline void kmemleak_ignore_phys(phys_addr_t phys)
>  {
>  }
> +static inline void kmemleak_bss_hole(void *start, void *stop)
> +{
> +}
>  
>  #endif   /* CONFIG_DEBUG_KMEMLEAK */
>  
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 707fa5579f66..42349cd9ef7a 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -237,6 +237,10 @@ static int kmemleak_skip_disable;
>  /* If there are leaks that can be reported */
>  static bool kmemleak_found_leaks;
>  
> +/* Skip scanning of a range in the .bss section. */
> +static void *bss_hole_start;
> +static void *bss_hole_stop;
> +
>  static bool kmemleak_verbose;
>  module_param_named(verbose, kmemleak_verbose, bool, 0600);
>  
> @@ -1265,6 +1269,18 @@ void __ref kmemleak_ignore_phys(phys_addr_t phys)
>  }
>  EXPORT_SYMBOL(kmemleak_ignore_phys);
>  
> +/**
> + * kmemleak_bss_hole - skip scanning a range in the .bss section
> + *
> + * @start:   start of the range
> + * @stop:end of the range
> + */
> +void kmemleak_bss_hole(void *start, void *stop)
> +{
> + bss_hole_start = start;
> + bss_hole_stop = stop;
> +}
> +
>  /*
>   * Update an object's checksum and return true if it was modified.
>   */
> @@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
>  
>   /* data/bss scanning */
>   scan_large_block(_sdata, _edata);
> - 

[PATCH] kmemleak: skip scanning holes in the .bss section

2019-03-12 Thread Qian Cai
The commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
kvm_tmp[] into the .bss section and then frees the rest of the unused
space back to the page allocator.

kernel_init
  kvm_guest_init
kvm_free_tmp
  free_reserved_area
free_unref_page
  free_unref_page_prepare

With DEBUG_PAGEALLOC=y, it will unmap those pages from the kernel. As a
result, the kmemleak scan will trigger the panic below when it scans the .bss
section with unmapped pages.

Since this is done way before the first kmemleak_scan(), just go
lockless to make the implementation simple and skip those pages when
scanning the .bss section. Later, those pages could be tracked by
kmemleak again once allocated by the page allocator. Overall, this is
such a special case that there is no need to make it generic and let
kmemleak gain the ability to skip blocks in scan_large_block().

BUG: Unable to handle kernel data access at 0xc161
Faulting instruction address: 0xc03cc178
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
CPU: 3 PID: 130 Comm: kmemleak Kdump: loaded Not tainted 5.0.0+ #9
REGS: c004b05bf940 TRAP: 0300   Not tainted  (5.0.0+)
NIP [c03cc178] scan_block+0xa8/0x190
LR [c03cc170] scan_block+0xa0/0x190
Call Trace:
[c004b05bfbd0] [c03cc170] scan_block+0xa0/0x190 (unreliable)
[c004b05bfc30] [c03cc2c0] scan_large_block+0x60/0xa0
[c004b05bfc70] [c03ccc64] kmemleak_scan+0x254/0x960
[c004b05bfd40] [c03cdd50] kmemleak_scan_thread+0xec/0x12c
[c004b05bfdb0] [c0104388] kthread+0x1b8/0x1c0
[c004b05bfe20] [c000b364] ret_from_kernel_thread+0x5c/0x78
Instruction dump:
7fa3eb78 4844667d 6000 6000 6000 6000 3bff0008 7fbcf840
409d00b8 4bfffeed 2fa3 409e00ac  e93e0128 7fa91840
419dffdc

Signed-off-by: Qian Cai 
---
 arch/powerpc/kernel/kvm.c |  3 +++
 include/linux/kmemleak.h  |  4 
 mm/kmemleak.c | 25 -
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 683b5b3805bd..5cddc8fc56bb 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -712,6 +713,8 @@ static void kvm_use_magic_page(void)
 
 static __init void kvm_free_tmp(void)
 {
+   kmemleak_bss_hole(&kvm_tmp[kvm_tmp_index],
+ &kvm_tmp[ARRAY_SIZE(kvm_tmp)]);
free_reserved_area(&kvm_tmp[kvm_tmp_index],
   &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
 }
diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h
index 5ac416e2d339..3d8949b9c6f5 100644
--- a/include/linux/kmemleak.h
+++ b/include/linux/kmemleak.h
@@ -46,6 +46,7 @@ extern void kmemleak_alloc_phys(phys_addr_t phys, size_t 
size, int min_count,
 extern void kmemleak_free_part_phys(phys_addr_t phys, size_t size) __ref;
 extern void kmemleak_not_leak_phys(phys_addr_t phys) __ref;
 extern void kmemleak_ignore_phys(phys_addr_t phys) __ref;
+extern void kmemleak_bss_hole(void *start, void *stop);
 
 static inline void kmemleak_alloc_recursive(const void *ptr, size_t size,
int min_count, slab_flags_t flags,
@@ -131,6 +132,9 @@ static inline void kmemleak_not_leak_phys(phys_addr_t phys)
 static inline void kmemleak_ignore_phys(phys_addr_t phys)
 {
 }
+static inline void kmemleak_bss_hole(void *start, void *stop)
+{
+}
 
 #endif /* CONFIG_DEBUG_KMEMLEAK */
 
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 707fa5579f66..42349cd9ef7a 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -237,6 +237,10 @@ static int kmemleak_skip_disable;
 /* If there are leaks that can be reported */
 static bool kmemleak_found_leaks;
 
+/* Skip scanning of a range in the .bss section. */
+static void *bss_hole_start;
+static void *bss_hole_stop;
+
 static bool kmemleak_verbose;
 module_param_named(verbose, kmemleak_verbose, bool, 0600);
 
@@ -1265,6 +1269,18 @@ void __ref kmemleak_ignore_phys(phys_addr_t phys)
 }
 EXPORT_SYMBOL(kmemleak_ignore_phys);
 
+/**
+ * kmemleak_bss_hole - skip scanning a range in the .bss section
+ *
+ * @start: start of the range
+ * @stop:  end of the range
+ */
+void kmemleak_bss_hole(void *start, void *stop)
+{
+   bss_hole_start = start;
+   bss_hole_stop = stop;
+}
+
 /*
  * Update an object's checksum and return true if it was modified.
  */
@@ -1531,7 +1547,14 @@ static void kmemleak_scan(void)
 
/* data/bss scanning */
scan_large_block(_sdata, _edata);
-   scan_large_block(__bss_start, __bss_stop);
+
+   if (bss_hole_start) {
+   scan_large_block(__bss_start, bss_hole_start);
+   scan_large_block(bss_hole_stop, __bss_stop);
+   } else {
+   scan_large_block(__bss_start, __bss_stop);
+   }
+
	scan_large_block(__start_ro_after_init, __end_ro_after_init);

Re: [PATCH v2 4/7] dt-bindings: counter: ftm-quaddec

2019-03-12 Thread Rob Herring
On Wed, Mar 06, 2019 at 12:12:05PM +0100, Patrick Havelange wrote:
> FlexTimer quadrature decoder driver.
> 
> Signed-off-by: Patrick Havelange 
> Reviewed-by: Esben Haabendal 
> ---
> Changes v2
>  - None
> ---
>  .../bindings/counter/ftm-quaddec.txt   | 18 ++
>  1 file changed, 18 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/counter/ftm-quaddec.txt
> 
> diff --git a/Documentation/devicetree/bindings/counter/ftm-quaddec.txt 
> b/Documentation/devicetree/bindings/counter/ftm-quaddec.txt
> new file mode 100644
> index ..4d18cd722074
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/counter/ftm-quaddec.txt
> @@ -0,0 +1,18 @@
> +FlexTimer Quadrature decoder counter
> +
> +This driver exposes a simple counter for the quadrature decoder mode.

Seems like this is more a mode of a h/w block than a description of the
h/w block itself. Bindings should do the latter.

> +
> +Required properties:
> +- compatible:	Must be "fsl,ftm-quaddec".
> +- reg:		Must be set to the memory region of the flextimer.
> +
> +Optional property:
> +- big-endian:	Access the device registers in big-endian mode.
> +
> +Example:
> + counter0: counter@29d {
> + compatible = "fsl,ftm-quaddec";
> + reg = <0x0 0x29d 0x0 0x1>;
> + big-endian;
> + status = "disabled";
> + };
> -- 
> 2.19.1
> 


Re: [PATCH v2 01/16] powerpc/xive: add OPAL extensions for the XIVE native exploitation support

2019-03-12 Thread Cédric Le Goater
On 2/26/19 5:21 AM, David Gibson wrote:
> On Mon, Feb 25, 2019 at 11:11:58AM +0100, Cédric Le Goater wrote:
>> On 2/25/19 4:50 AM, Michael Ellerman wrote:
>>> Cédric Le Goater  writes:
>>>
 The support for XIVE native exploitation mode in Linux/KVM needs a
 couple more OPAL calls to configure the sPAPR guest and to get/set the
 state of the XIVE internal structures.

 Signed-off-by: Cédric Le Goater 
 ---
  arch/powerpc/include/asm/opal-api.h   | 11 ++-
  arch/powerpc/include/asm/opal.h   |  7 ++
  arch/powerpc/include/asm/xive.h   | 14 +++
  arch/powerpc/sysdev/xive/native.c | 99 +++
  .../powerpc/platforms/powernv/opal-wrappers.S |  3 +
  5 files changed, 130 insertions(+), 4 deletions(-)

 diff --git a/arch/powerpc/include/asm/opal-api.h 
 b/arch/powerpc/include/asm/opal-api.h
 index 870fb7b239ea..cdfc54f78101 100644
 --- a/arch/powerpc/include/asm/opal-api.h
 +++ b/arch/powerpc/include/asm/opal-api.h
 @@ -186,8 +186,8 @@
  #define OPAL_XIVE_FREE_IRQ		140
  #define OPAL_XIVE_SYNC		141
  #define OPAL_XIVE_DUMP		142
 -#define OPAL_XIVE_RESERVED3		143
 -#define OPAL_XIVE_RESERVED4		144
 +#define OPAL_XIVE_GET_QUEUE_STATE	143
 +#define OPAL_XIVE_SET_QUEUE_STATE	144
  #define OPAL_SIGNAL_SYSTEM_RESET	145
  #define OPAL_NPU_INIT_CONTEXT		146
  #define OPAL_NPU_DESTROY_CONTEXT	147
 @@ -209,8 +209,11 @@
  #define OPAL_SENSOR_GROUP_ENABLE	163
  #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR	164
  #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR	165
 -#define OPAL_NX_COPROC_INIT		167
 -#define OPAL_LAST			167
 +#define OPAL_HANDLE_HMI2		166
 +#define OPAL_NX_COPROC_INIT		167
 +#define OPAL_NPU_SET_RELAXED_ORDER	168
 +#define OPAL_NPU_GET_RELAXED_ORDER	169
 +#define OPAL_XIVE_GET_VP_STATE	170
>>>
>>> You should only be defining the calls you need, leaving gaps for other
>>> things, and you need to retain OPAL_LAST. So it should look more like:
>>>
>>>  -#define OPAL_LAST			167
>>>  +#define OPAL_XIVE_GET_VP_STATE	170
>>>  +#define OPAL_LAST			170
>>>
>>>
>>> Also I can't merge this until it's merged into skiboot.
>>
>> OK. Let's start with skiboot.
> 
> Yeah.. where are we at with skiboot in general.  We can't test this
> downstream until we have a released skiboot with the necessary
> support.

If we add a flag to skip the OPAL call when setting the EQ, you could
test, though without migration.

C. 



Re: [PATCH 00/14] entry: preempt_schedule_irq() callers scrub

2019-03-12 Thread Vineet Gupta
On 3/11/19 3:47 PM, Valentin Schneider wrote:
> Hi,
> 
> This is the continuation of [1] where I'm hunting down
> preempt_schedule_irq() callers because of [2].
> 
> I told myself the best way to get this moving forward wouldn't be to write
> doc about it, but to go write some fixes and get some discussions going,
> which is what this patch-set is about.
> 
> I've looked at users of preempt_schedule_irq(), and made sure they didn't
> have one of those useless loops. The list of offenders is:
> 
> $ grep -r -I "preempt_schedule_irq" arch/ | cut -d/ -f2 | sort | uniq
> 
...

> 
> Regarding that loop, archs seem to fall in 3 categories:
> A) Those that don't have the loop

Please clarify that this is the right thing to do (since the core code
already has the loop) and hence no fixing is required for this "category".

> B) Those that have a small need_resched() loop around the
>preempt_schedule_irq() callsite
> C) Those that branch to some more generic code further up the entry code
>and eventually branch back to preempt_schedule_irq()
> 
> arc, m68k, nios2 fall in A)

> sparc, ia64, s390 fall in C)
> all the others fall in B)
> 
> I've written patches for B) and C) EXCEPT for ia64 and s390 because I
> haven't been able to tell if it's actually fine to kill that "long jump"
> (and maybe I'm wrong on sparc). Hopefully folks who understand what goes on
> in there might be able to shed some light.
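
To make the redundancy concrete, here is a minimal C sketch of the
category B pattern (not lifted from any particular arch; the function
name is made up for illustration). preempt_schedule_irq() in
kernel/sched/core.c already loops internally until need_resched() is
clear, so the outer loop below adds nothing:

	/* hypothetical arch entry helper, sketched in C */
	void arch_irq_exit_kernel_preempt(void)
	{
		/*
		 * Category B: redundant outer loop. The core function only
		 * returns once need_resched() is false, because it re-checks
		 * and re-schedules in a do/while loop of its own.
		 */
		while (need_resched())
			preempt_schedule_irq();
	}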


Re: [PATCH v2 06/16] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration

2019-03-12 Thread Cédric Le Goater
On 2/25/19 3:39 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 12:28:30PM +0100, Cédric Le Goater wrote:
>> These controls will be used by the H_INT_SET_QUEUE_CONFIG and
>> H_INT_GET_QUEUE_CONFIG hcalls from QEMU. They will also be used to
>> restore the configuration of the XIVE EQs in the KVM device and to
>> capture the internal runtime state of the EQs. Both 'get' and 'set'
>> rely on an OPAL call to access from the XIVE interrupt controller the
>> EQ toggle bit and EQ index which are updated by the HW when event
>> notifications are enqueued in the EQ.
>>
>> The value of the guest physical address of the event queue is saved in
>> the XIVE internal xive_q structure for later use. That is when
>> migration needs to mark the EQ pages dirty to capture a consistent
>> memory state of the VM.
>>
>> Note that H_INT_SET_QUEUE_CONFIG does not require the extra OPAL call
>> setting the EQ toggle bit and EQ index to configure the EQ, but
>> restoring the EQ state does.

I think we need to add some kind of flag to differentiate the hcall
H_INT_SET_QUEUE_CONFIG from the restore of the EQ. The hcall does not
need the OPAL support call, and this could help in the code transition.

But without OPAL support, we won't have migration. Would that be of
any use?


>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/xive.h|   2 +
>>  arch/powerpc/include/uapi/asm/kvm.h|  21 +++
>>  arch/powerpc/kvm/book3s_xive.h |   2 +
>>  arch/powerpc/kvm/book3s_xive.c |  15 +-
>>  arch/powerpc/kvm/book3s_xive_native.c  | 207 +
>>  Documentation/virtual/kvm/devices/xive.txt |  29 +++
>>  6 files changed, 270 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h 
>> b/arch/powerpc/include/asm/xive.h
>> index b579a943407b..46891f321606 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -73,6 +73,8 @@ struct xive_q {
>>  u32 esc_irq;
>>  	atomic_t	count;
>>  	atomic_t	pending_count;
>> +u64 guest_qpage;
>> +u32 guest_qsize;
>>  };
>>  
>>  /* Global enable flags for the XIVE support */
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 91899c7f9abd..177e43f3edaf 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -679,6 +679,7 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_DEV_XIVE_GRP_CTRL   1
>>  #define KVM_DEV_XIVE_GRP_SOURCE		2	/* 64-bit source attributes */
>>  #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG	3	/* 64-bit source attributes */
>> +#define KVM_DEV_XIVE_GRP_EQ_CONFIG  4   /* 64-bit eq attributes */
>>  
>>  /* Layout of 64-bit XIVE source attribute values */
>>  #define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
>> @@ -694,4 +695,24 @@ struct kvm_ppc_cpu_char {
>>  #define KVM_XIVE_SOURCE_EISN_SHIFT  33
>>  #define KVM_XIVE_SOURCE_EISN_MASK   0xfffeULL
>>  
>> +/* Layout of 64-bit eq attribute */
>> +#define KVM_XIVE_EQ_PRIORITY_SHIFT  0
>> +#define KVM_XIVE_EQ_PRIORITY_MASK   0x7
>> +#define KVM_XIVE_EQ_SERVER_SHIFT3
>> +#define KVM_XIVE_EQ_SERVER_MASK 0xfff8ULL
>> +
>> +/* Layout of 64-bit eq attribute values */
>> +struct kvm_ppc_xive_eq {
>> +__u32 flags;
>> +__u32 qsize;
>> +__u64 qpage;
>> +__u32 qtoggle;
>> +__u32 qindex;
>> +__u8  pad[40];
>> +};
>> +
>> +#define KVM_XIVE_EQ_FLAG_ENABLED	0x0001
>> +#define KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY  0x0002
>> +#define KVM_XIVE_EQ_FLAG_ESCALATE   0x0004
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>> index ab3ac152980d..6660d138c6b7 100644
>> --- a/arch/powerpc/kvm/book3s_xive.h
>> +++ b/arch/powerpc/kvm/book3s_xive.h
>> @@ -267,6 +267,8 @@ struct kvmppc_xive_src_block 
>> *kvmppc_xive_create_src_block(
>>  struct kvmppc_xive *xive, int irq);
>>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>>  int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
>> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>> +  bool single_escalation);
>>  
>>  #endif /* CONFIG_KVM_XICS */
>>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index 086da91d7c6e..7431e31bc541 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -166,7 +166,8 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
>>  return IRQ_HANDLED;
>>  }
>>  
>> -static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
>> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>> +  bool single_escalation)
>>  {
>>  
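
Since this uapi is what QEMU will drive, a hedged userspace sketch of
capturing one EQ's runtime state through the new group may help. The
attribute packing follows the KVM_XIVE_EQ_*_SHIFT/MASK defines from the
patch; the fd and function names are assumptions:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Capture the state of one EQ (server, priority), e.g. for
	 * migration. 'xive_fd' is the fd of the XIVE KVM device. */
	static int xive_get_eq_state(int xive_fd, uint32_t server, uint8_t prio,
				     struct kvm_ppc_xive_eq *state)
	{
		struct kvm_device_attr attr = {
			.group = KVM_DEV_XIVE_GRP_EQ_CONFIG,
			.attr  = ((uint64_t)server << KVM_XIVE_EQ_SERVER_SHIFT) |
				 (prio & KVM_XIVE_EQ_PRIORITY_MASK),
			.addr  = (uint64_t)(uintptr_t)state,
		};

		/* fills in qtoggle/qindex via the OPAL call described above */
		return ioctl(xive_fd, KVM_GET_DEVICE_ATTR, &attr);
	}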

Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-03-12 Thread Thiago Jung Bauermann


Gautham R Shenoy  writes:

>> Signed-off-by: Thiago Jung Bauermann 
>
> Thanks for this version. I have tested the patch and we no longer see
> the "Querying DEAD? cpu X (Y) shows 2" message.
>
>
> Tested-and-Reviewed-by: Gautham R. Shenoy 

Thanks for reviewing and testing the patch!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v2 04/16] KVM: PPC: Book3S HV: XIVE: add a control to initialize a source

2019-03-12 Thread Cédric Le Goater
On 2/25/19 3:10 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 12:28:28PM +0100, Cédric Le Goater wrote:
>> The associated HW interrupt source is simply allocated at the OPAL/HW
>> level and then MASKED. KVM only needs to know about its type: LSI or
>> MSI.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h|   5 +
>>  arch/powerpc/kvm/book3s_xive.h |  10 ++
>>  arch/powerpc/kvm/book3s_xive.c |   8 +-
>>  arch/powerpc/kvm/book3s_xive_native.c  | 114 +
>>  Documentation/virtual/kvm/devices/xive.txt |  15 +++
>>  5 files changed, 148 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index b002c0c67787..a9ad99f2a11b 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -677,5 +677,10 @@ struct kvm_ppc_cpu_char {
>>  
>>  /* POWER9 XIVE Native Interrupt Controller */
>>  #define KVM_DEV_XIVE_GRP_CTRL   1
>> +#define KVM_DEV_XIVE_GRP_SOURCE		2	/* 64-bit source attributes */
>> +
>> +/* Layout of 64-bit XIVE source attribute values */
>> +#define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
>> +#define KVM_XIVE_LEVEL_ASSERTED	(1ULL << 1)
>>  
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>> index bcb1bbcf0359..f22f2d46d0f0 100644
>> --- a/arch/powerpc/kvm/book3s_xive.h
>> +++ b/arch/powerpc/kvm/book3s_xive.h
>> @@ -12,6 +12,13 @@
>>  #ifdef CONFIG_KVM_XICS
>>  #include "book3s_xics.h"
>>  
>> +/*
>> + * The XIVE IRQ number space is aligned with the XICS IRQ number
>> + * space, CPU IPIs being allocated in the first 4K.
> 
> We do align these in qemu, but I don't see that the kernel part
> cares: as far as it's concerned only one of XICS or XIVE is active at
> a time, and the irq numbers are chosen by userspace.

There is some relation with userspace nevertheless. The KVM device does
not remap the numbers to some other range today, and the limits are
fixed values. The checks are done in has_attr() and set_attr().

>> + */
>> +#define KVMPPC_XIVE_FIRST_IRQ   0
>> +#define KVMPPC_XIVE_NR_IRQS KVMPPC_XICS_NR_IRQS
>> +
>>  /*
>>   * State for one guest irq source.
>>   *
>> @@ -253,6 +260,9 @@ extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, 
>> unsigned long xirr);
>>   */
>>  void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
>>  int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu 
>> *vcpu);
>> +struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
>> +struct kvmppc_xive *xive, int irq);
>> +void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>>  
>>  #endif /* CONFIG_KVM_XICS */
>>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index d1cc18a5b1c4..6f950ecb3592 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
> 
> I wonder if we should rename this book3s_xics_on_xive.c or something
> at some point, I keep getting confused because I forget that this is
> only dealing with host xive, not guest xive.

I am fine with renaming. Any objections ? book3s_xics_p9.c ? 

>> @@ -1485,8 +1485,8 @@ static int xive_get_source(struct kvmppc_xive *xive, 
>> long irq, u64 addr)
>>  return 0;
>>  }
>>  
>> -static struct kvmppc_xive_src_block *xive_create_src_block(struct 
>> kvmppc_xive *xive,
>> -   int irq)
>> +struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
>> +struct kvmppc_xive *xive, int irq)
>>  {
>>  struct kvm *kvm = xive->kvm;
>>  struct kvmppc_xive_src_block *sb;
> 
> It's odd that this function, now used from the xive-on-xive path as
> well as the xics-on-xive path, references KVMPPC_XICS_ICS_SHIFT a few
> lines down from this change.

Yes. This is because of the definition of the struct kvmppc_xive_src_block.

We could introduce new defines for XIVE or a common set of defines for
XICS and XIVE.

>> @@ -1565,7 +1565,7 @@ static int xive_set_source(struct kvmppc_xive *xive, 
>> long irq, u64 addr)
>>  	sb = kvmppc_xive_find_source(xive, irq, &idx);
>>  if (!sb) {
>>  pr_devel("No source, creating source block...\n");
>> -sb = xive_create_src_block(xive, irq);
>> +sb = kvmppc_xive_create_src_block(xive, irq);
>>  if (!sb) {
>>  pr_devel("Failed to create block...\n");
>>  return -ENOMEM;
>> @@ -1789,7 +1789,7 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct 
>> xive_irq_data *xd)
>>  xive_cleanup_irq_data(xd);
>>  }
>>  
>> -static void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>> +void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>>  {
>>  int i;
>>  
>> diff --git 

Re: [PATCH v2 03/16] KVM: PPC: Book3S HV: XIVE: introduce a new capability KVM_CAP_PPC_IRQ_XIVE

2019-03-12 Thread Cédric Le Goater
On 2/25/19 5:59 AM, Paul Mackerras wrote:
> On Mon, Feb 25, 2019 at 11:35:27AM +1100, David Gibson wrote:
>> On Fri, Feb 22, 2019 at 12:28:27PM +0100, Cédric Le Goater wrote:
>>> +   xc->xive = xive;
>>> +   xc->vcpu = vcpu;
>>> +   xc->server_num = cpu;
>>> +   xc->vp_id = xive->vp_base + cpu;
>>
>> Hrm.  This ties the internal VP id to the userspace chosen server
>> number, which isn't ideal.  It puts a constraint on those server
>> numbers that you wouldn't otherwise have.
> 
> We should probably do the same as the xics-on-xive code, which is to
> put the server number through kvmppc_pack_vcpu_id(), which is a
> folding function that maps the QEMU vcpu id (which is the server
> number) down to the range 0..KVM_MAX_VCPUS-1, and works for the
> allocation patterns used in the various vSMT modes.

yes. I will see how it goes.
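
For concreteness, a hedged sketch of what that suggestion could look
like in the connect path; kvmppc_pack_vcpu_id() is the folding helper
the XICS-on-XIVE code uses (it would need to be made shareable), and the
exact placement is an assumption:

	xc->xive = xive;
	xc->vcpu = vcpu;
	xc->server_num = cpu;
	/* fold the QEMU-chosen server number into 0..KVM_MAX_VCPUS-1
	 * before deriving the VP id, as XICS-on-XIVE does */
	xc->vp_id = xive->vp_base + kvmppc_pack_vcpu_id(vcpu->kvm, cpu);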

Thanks,

C.


Re: [PATCH v2 0/2] Append new variables to vmcoreinfo (PTRS_PER_PGD for arm64 and MAX_PHYSMEM_BITS for all archs)

2019-03-12 Thread Bhupesh Sharma

Hi Dave,

On 03/11/2019 02:35 PM, Dave Young wrote:

Hi Bhupesh,
On 03/10/19 at 03:34pm, Bhupesh Sharma wrote:

Changes since v1:

- v1 was sent out as a single patch which can be seen here:
   http://lists.infradead.org/pipermail/kexec/2019-February/022411.html

- v2 breaks the single patch into two independent patches:
   [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
   [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code 
(all archs)

This patchset primarily fixes the regressions reported in user-space
utilities like 'makedumpfile' and 'crash-utility' on the arm64
architecture with the availability of the 52-bit address space feature
in the underlying kernel. These regressions have been reported both on
CPUs which don't support the ARMv8.2 extensions (i.e. LVA, LPA) and are
running newer kernels, and also on prototype platforms (like the ARMv8
FVP simulator model) which support the ARMv8.2 extensions and are
running newer kernels.

The reason for these regressions is that right now user-space tools
have no direct access to these values (since these are not exported
from the kernel) and hence need to rely on a best-guess method of
determining value of 'PTRS_PER_PGD' and 'MAX_PHYSMEM_BITS' supported
by underlying kernel.

Exporting these values via vmcoreinfo will help user-land in such cases.
In addition, as per suggestion from makedumpfile maintainer (Kazu)
during v1 review, it makes more sense to append 'MAX_PHYSMEM_BITS' to
vmcoreinfo in the core code itself rather than in arm64 arch-specific
code, so that the user-space code for other archs can also benefit from
this addition to the vmcoreinfo and use it as a standard way of
determining 'SECTIONS_SHIFT' value in user-land.

Cc: Mark Rutland 
Cc: James Morse 
Cc: Will Deacon 
Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: ke...@lists.infradead.org

Bhupesh Sharma (2):
   arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo
   crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

  arch/arm64/kernel/crash_core.c | 1 +
  kernel/crash_core.c            | 1 +
  2 files changed, 2 insertions(+)
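
Neither one-liner is shown in this cover letter; as a hedged sketch of
what such exports typically look like (VMCOREINFO_NUMBER() is the
standard helper from include/linux/crash_core.h, the exact hunks are
assumptions):

	/* kernel/crash_core.c, in crash_save_vmcoreinfo_init() -- sketch */
	VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);

	/* arch/arm64/kernel/crash_core.c, in arch_crash_save_vmcoreinfo()
	 * -- sketch */
	VMCOREINFO_NUMBER(PTRS_PER_PGD);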



Lianbo's documentation patch has been merged; would you mind adding a
vmcoreinfo doc patch to your series as well?


Thanks for the inputs. Will add it to the v3.
Let's wait for other comments/reviews before I spin a version 3.

Regards,
Bhupesh



Re: [PATCH v2 03/16] KVM: PPC: Book3S HV: XIVE: introduce a new capability KVM_CAP_PPC_IRQ_XIVE

2019-03-12 Thread Cédric Le Goater
On 2/25/19 1:35 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 12:28:27PM +0100, Cédric Le Goater wrote:
>> The user interface exposes a new capability to let QEMU connect the
>> vCPU to the XIVE KVM device if required. The capability is only
>> advertised on a PowerNV Hypervisor as support for nested guests
>> (pseries KVM Hypervisor) is not yet available.
>>
>> Internally, the interface to the new KVM device is protected with a
>> new interrupt mode: KVMPPC_IRQ_XIVE.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/kvm_host.h   |   1 +
>>  arch/powerpc/include/asm/kvm_ppc.h|  13 +++
>>  arch/powerpc/kvm/book3s_xive.h|   6 ++
>>  include/uapi/linux/kvm.h  |   1 +
>>  arch/powerpc/kvm/book3s_xive.c|  67 +++-
>>  arch/powerpc/kvm/book3s_xive_native.c | 144 ++
>>  arch/powerpc/kvm/powerpc.c|  33 ++
>>  Documentation/virtual/kvm/api.txt |   9 ++
>>  8 files changed, 246 insertions(+), 28 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 9f75a75a07f2..eb8581be0ee8 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -448,6 +448,7 @@ struct kvmppc_passthru_irqmap {
>>  #define KVMPPC_IRQ_DEFAULT  0
>>  #define KVMPPC_IRQ_MPIC 1
>>  #define KVMPPC_IRQ_XICS 2 /* Includes a XIVE option */
>> +#define KVMPPC_IRQ_XIVE 3 /* XIVE native exploitation mode */
>>  
>>  #define MMIO_HPTE_CACHE_SIZE	4
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index 4b72ddde7dc1..1e61877fe147 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -594,6 +594,14 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
>> irq_source_id, u32 irq,
>> int level, bool line_status);
>>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>>  
>> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE;
>> +}
>> +
>> +extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>> +   struct kvm_vcpu *vcpu, u32 cpu);
>> +extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
>>  extern void kvmppc_xive_native_init_module(void);
>>  extern void kvmppc_xive_native_exit_module(void);
>>  
>> @@ -621,6 +629,11 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
>> int irq_source_id, u32 ir
>>int level, bool line_status) { return 
>> -ENODEV; }
>>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>>  
>> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>> +{ return 0; }
>> +static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>> +  struct kvm_vcpu *vcpu, u32 cpu) { return -EBUSY; }
>> +static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { 
>> }
>>  static inline void kvmppc_xive_native_init_module(void) { }
>>  static inline void kvmppc_xive_native_exit_module(void) { }
>>  
>> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>> index a08ae6fd4c51..bcb1bbcf0359 100644
>> --- a/arch/powerpc/kvm/book3s_xive.h
>> +++ b/arch/powerpc/kvm/book3s_xive.h
>> @@ -248,5 +248,11 @@ extern int (*__xive_vm_h_ipi)(struct kvm_vcpu *vcpu, 
>> unsigned long server,
>>  extern int (*__xive_vm_h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr);
>>  extern int (*__xive_vm_h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr);
>>  
>> +/*
>> + * Common Xive routines for XICS-over-XIVE and XIVE native
>> + */
>> +void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu);
>> +int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu 
>> *vcpu);
>> +
>>  #endif /* CONFIG_KVM_XICS */
>>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index e6368163d3a0..52bf74a1616e 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_ARM_VM_IPA_SIZE 165
>>  #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
>>  #define KVM_CAP_HYPERV_CPUID 167
>> +#define KVM_CAP_PPC_IRQ_XIVE 168
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>> index f78d002f0fe0..d1cc18a5b1c4 100644
>> --- a/arch/powerpc/kvm/book3s_xive.c
>> +++ b/arch/powerpc/kvm/book3s_xive.c
>> @@ -1049,7 +1049,7 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned 
>> long guest_irq,
>>  }
>>  EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
>>  
>> -static void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
>> +void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
>>  {
>>  struct 

Re: [PATCH 3/6] x86: clean up _TIF_SYSCALL_EMU handling using ptrace_syscall_enter hook

2019-03-12 Thread Sudeep Holla
On Mon, Mar 11, 2019 at 08:04:39PM -0700, Andy Lutomirski wrote:
> On Mon, Mar 11, 2019 at 6:35 PM Haibo Xu (Arm Technology China)
>  wrote:
> >

[...]

> > For the PTRACE_SYSEMU_SINGLESTEP request, ptrace only needs to report
> > (send SIGTRAP) at the entry of a system call; there is no need to report
> > at the exit of a system call. That's why the old logic, {step = ((flags &
> > (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) == _TIF_SINGLESTEP)}, tries to
> > filter out the special case (PTRACE_SYSEMU_SINGLESTEP).
> >
> > Another way to make sure the logic is fine is to run some tests against
> > both versions and check whether they have the same behavior.
>
> tools/testing/selftests/x86/ptrace_syscall.c has a test intended to
> exercise this.  Can one of you either confirm that it does exercise it
> and that it still passes or can you improve the test?
>
I did run the tests, which didn't flag anything. I haven't looked at the
details of the test implementation, but it seems to miss this case. I will
see what can be improved (if possible). Also, I think single_step_syscall
is the one I need to look at for this particular case. Both
single_step_syscall and ptrace_syscall reported no errors.

--
Regards,
Sudeep


Re: [PATCH 3/6] x86: clean up _TIF_SYSCALL_EMU handling using ptrace_syscall_enter hook

2019-03-12 Thread Sudeep Holla
On Tue, Mar 12, 2019 at 01:34:44AM +, Haibo Xu (Arm Technology China) wrote:
> On 2019/3/12 2:34, Sudeep Holla wrote:
> > (I thought I had sent this email, last Tuesday itself, but saw this in my
> > draft today, something went wrong, sorry for the delay)
> > 
> > On Tue, Mar 05, 2019 at 02:14:47AM +, Haibo Xu (Arm Technology China) 
> > wrote:
> >> On 2019/3/4 18:12, Sudeep Holla wrote:
> >>> On Mon, Mar 04, 2019 at 08:25:28AM +, Haibo Xu (Arm Technology China) 
> >>> wrote:
>  On 2019/3/1 2:32, Sudeep Holla wrote:
> > Now that we have a new hook ptrace_syscall_enter that can be called from
> > syscall entry code and it handles PTRACE_SYSEMU in generic code, we
> > can do some cleanup using the same in syscall_trace_enter.
> >
> > Further the extra logic to find single stepping PTRACE_SYSEMU_SINGLESTEP
> > in syscall_slow_exit_work seems unnecessary. Let's remove the same.
> 
>  I think we should not change the logic here. If so, it will double the
>  report of the syscall when PTRACE_SYSEMU_SINGLESTEP is enabled.
> 
> >>>
> >>> I don't think that should happen, but I may be missing something.
> >>> Can you explain how ?
> >>>
> >>
> >> When PTRACE_SYSEMU_SINGLESTEP is enabled, both the _TIF_SYSCALL_EMU and
> >> _TIF_SINGLESTEP flags are set, but ptrace only need to report(send SIGTRAP)
> >> at the entry of a system call, no need to report at the exit of a system
> >> call.
> >>
> > Sorry, but I still not get it, we have:
> > 
> > step = ((flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) == 
> > _TIF_SINGLESTEP);
> > 
> > For me, this is same as:
> > step = ((flags & _TIF_SINGLESTEP) == _TIF_SINGLESTEP)
> > or
> > if (flags & _TIF_SINGLESTEP)
> > step = true;
> > 
> 
> I don't think so! As I mentioned in the last email, when
> PTRACE_SYSEMU_SINGLESTEP is enabled, both the _TIF_SYSCALL_EMU and
> _TIF_SINGLESTEP flags are set, in which case step should be "false" with
> the old logic. But with the new logic, step is "true".
> 

Ah right, sorry I missed that.

> > So when PTRACE_SYSEMU_SINGLESTEP, _TIF_SYSCALL_EMU and _TIF_SINGLESTEP
> > are set and step evaluates to true.
> > 
> > So dropping _TIF_SYSCALL_EMU here should be fine. Am I still missing
> > something ?
> > 
> > --
> > Regards,
> > Sudeep
> > 
> 
> For the PTRACE_SYSEMU_SINGLESTEP request, ptrace only needs to report
> (send SIGTRAP) at the entry of a system call; there is no need to report
> at the exit of a system call. That's why the old logic, {step = ((flags &
> (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)) == _TIF_SINGLESTEP)}, tries to
> filter out the special case (PTRACE_SYSEMU_SINGLESTEP).
> 

Understood
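
To pin the difference down, a small self-contained demo (the flag values
are illustrative, not the real x86 TIF bit positions):

	#include <stdio.h>

	#define _TIF_SINGLESTEP		(1u << 0)	/* illustrative */
	#define _TIF_SYSCALL_EMU	(1u << 1)	/* illustrative */

	int main(void)
	{
		/* PTRACE_SYSEMU_SINGLESTEP sets both flags */
		unsigned int flags = _TIF_SINGLESTEP | _TIF_SYSCALL_EMU;

		/* old logic: 0, so no second SIGTRAP at syscall exit */
		int old = (flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU))
				== _TIF_SINGLESTEP;

		/* proposed logic: 1, which double-reports the syscall */
		int new = (flags & _TIF_SINGLESTEP) != 0;

		printf("old=%d new=%d\n", old, new);	/* old=0 new=1 */
		return 0;
	}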

> Another way to make sure the logic is fine is to run some tests against
> both versions and check whether they have the same behavior.
>

I did run the selftests after Andy Lutomirski pointed them out. Nothing got
flagged; I haven't looked at the tests themselves yet, but they clearly miss
this case.

--
Regards,
Sudeep


Re: [PATCH v2 02/16] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode

2019-03-12 Thread Cédric Le Goater
On 2/25/19 1:08 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 12:28:26PM +0100, Cédric Le Goater wrote:
>> This is the basic framework for the new KVM device supporting the XIVE
>> native exploitation mode. The user interface exposes a new KVM device
>> to be created by QEMU when running on an L0 hypervisor only. Support
>> for nested guests is not available yet.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/kvm_host.h|   1 +
>>  arch/powerpc/include/asm/kvm_ppc.h |   8 +
>>  arch/powerpc/include/uapi/asm/kvm.h|   3 +
>>  include/uapi/linux/kvm.h   |   2 +
>>  arch/powerpc/kvm/book3s.c  |   7 +-
>>  arch/powerpc/kvm/book3s_xive_native.c  | 191 +
>>  Documentation/virtual/kvm/devices/xive.txt |  19 ++
>>  arch/powerpc/kvm/Makefile  |   2 +-
>>  8 files changed, 231 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
>>  create mode 100644 Documentation/virtual/kvm/devices/xive.txt
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 091430339db1..9f75a75a07f2 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -220,6 +220,7 @@ extern struct kvm_device_ops kvm_xics_ops;
>>  struct kvmppc_xive;
>>  struct kvmppc_xive_vcpu;
>>  extern struct kvm_device_ops kvm_xive_ops;
>> +extern struct kvm_device_ops kvm_xive_native_ops;
>>  
>>  struct kvmppc_passthru_irqmap;
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index b3bf4f61b30c..4b72ddde7dc1 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -593,6 +593,10 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, 
>> u64 icpval);
>>  extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
>> int level, bool line_status);
>>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>> +
>> +extern void kvmppc_xive_native_init_module(void);
>> +extern void kvmppc_xive_native_exit_module(void);
>> +
>>  #else
>>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
>> u32 priority) { return -1; }
>> @@ -616,6 +620,10 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu 
>> *vcpu, u64 icpval) { retur
>>  static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, 
>> u32 irq,
>>int level, bool line_status) { return 
>> -ENODEV; }
>>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>> +
>> +static inline void kvmppc_xive_native_init_module(void) { }
>> +static inline void kvmppc_xive_native_exit_module(void) { }
>> +
>>  #endif /* CONFIG_KVM_XIVE */
>>  
>>  #ifdef CONFIG_PPC_POWERNV
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 8c876c166ef2..b002c0c67787 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -675,4 +675,7 @@ struct kvm_ppc_cpu_char {
>>  #define  KVM_XICS_PRESENTED (1ULL << 43)
>>  #define  KVM_XICS_QUEUED(1ULL << 44)
>>  
>> +/* POWER9 XIVE Native Interrupt Controller */
>> +#define KVM_DEV_XIVE_GRP_CTRL   1
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6d4ea4b6c922..e6368163d3a0 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1211,6 +1211,8 @@ enum kvm_device_type {
>>  #define KVM_DEV_TYPE_ARM_VGIC_V3KVM_DEV_TYPE_ARM_VGIC_V3
>>  KVM_DEV_TYPE_ARM_VGIC_ITS,
>>  #define KVM_DEV_TYPE_ARM_VGIC_ITS   KVM_DEV_TYPE_ARM_VGIC_ITS
>> +KVM_DEV_TYPE_XIVE,
>> +#define KVM_DEV_TYPE_XIVE   KVM_DEV_TYPE_XIVE
>>  KVM_DEV_TYPE_MAX,
>>  };
>>  
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index 601c094f15ab..96d43f091255 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -1040,6 +1040,9 @@ static int kvmppc_book3s_init(void)
>>  	if (xics_on_xive()) {
>>  		kvmppc_xive_init_module();
>>  		kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
>> +		kvmppc_xive_native_init_module();
>> +		kvm_register_device_ops(&kvm_xive_native_ops,
>> +					KVM_DEV_TYPE_XIVE);
>>  	} else
>>  #endif
>>  		kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
>> @@ -1050,8 +1053,10 @@ static int kvmppc_book3s_init(void)
>>  static void kvmppc_book3s_exit(void)
>>  {
>>  #ifdef CONFIG_KVM_XICS
>> -if (xics_on_xive())
>> +if (xics_on_xive()) {
>>  kvmppc_xive_exit_module();
>> +kvmppc_xive_native_exit_module();
>> +}
>>  #endif
>>  #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
>>  
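
For orientation, a hedged sketch of how userspace would instantiate the
new device type; KVM_CREATE_DEVICE and struct kvm_create_device are the
standard KVM device API, and 'vm_fd' is an assumed VM file descriptor:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Returns the fd of the new XIVE native device, or -1 on error. */
	static int xive_native_create(int vm_fd)
	{
		struct kvm_create_device cd = {
			.type = KVM_DEV_TYPE_XIVE,
		};

		if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
			return -1;

		/* subsequent KVM_DEV_XIVE_GRP_* attributes go to cd.fd */
		return cd.fd;
	}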

Re: [PATCH v3] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

2019-03-12 Thread Gautham R Shenoy
Hello Thiago,

On Mon, Mar 11, 2019 at 04:35:17PM -0300, Thiago Jung Bauermann wrote:
> When testing DLPAR CPU add/remove on a system under stress,
> pseries_cpu_die() doesn't wait long enough for a CPU to die:
> 
> [  446.983944] cpu 148 (hwid 148) Ready to die...
> [  446.984062] cpu 149 (hwid 149) Ready to die...
> [  446.993518] cpu 150 (hwid 150) Ready to die...
> [  446.993543] Querying DEAD? cpu 150 (150) shows 2
> [  446.994098] cpu 151 (hwid 151) Ready to die...
> [  447.133726] cpu 136 (hwid 136) Ready to die...
> [  447.403532] cpu 137 (hwid 137) Ready to die...
> [  447.403772] cpu 138 (hwid 138) Ready to die...
> [  447.403839] cpu 139 (hwid 139) Ready to die...
> [  447.403887] cpu 140 (hwid 140) Ready to die...
> [  447.403937] cpu 141 (hwid 141) Ready to die...
> [  447.403979] cpu 142 (hwid 142) Ready to die...
> [  447.404038] cpu 143 (hwid 143) Ready to die...
> [  447.513546] cpu 128 (hwid 128) Ready to die...
> [  447.693533] cpu 129 (hwid 129) Ready to die...
> [  447.693999] cpu 130 (hwid 130) Ready to die...
> [  447.703530] cpu 131 (hwid 131) Ready to die...
> [  447.704087] Querying DEAD? cpu 132 (132) shows 2
> [  447.704102] cpu 132 (hwid 132) Ready to die...
> [  447.713534] cpu 133 (hwid 133) Ready to die...
> [  447.714064] Querying DEAD? cpu 134 (134) shows 2
> 
> This is a race between one CPU stopping and another one calling
> pseries_cpu_die() to wait for it to stop. That function does a short busy
> loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
> that it is stopped, but I think there's a lot for the stopping CPU to do
> which may take longer than this loop allows.
> 
> As can be seen in the dmesg right before or after the "Querying DEAD?"
> messages, if pseries_cpu_die() waited a little longer it would have seen
> the CPU in the stopped state.
> 
> What I think is going on is that CPU 134 was inactive at the time it was
> unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
> immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU activates
> and starts the process of stopping itself. The busy loop is not long enough
> to allow for the CPU to wake up and complete the stopping process.
> 
> This can be a problem because if the busy loop finishes too early, then the
> kernel may offline another CPU before the previous one finished dying,
> which would lead to two concurrent calls to rtas-stop-self, which is
> prohibited by the PAPR.
> 
> We can make the race a lot more even if we only start querying if the CPU
> is stopped when the stopping CPU is close to call rtas_stop_self(). Since
> pseries_mach_cpu_die() sets the CPU current state to offline almost
> immediately before calling rtas_stop_self(), we use that as a signal that
> it is either already stopped or very close to that point, and we can start
> the busy loop.
> 
> As suggested by Michael Ellerman, this patch also changes the busy loop to
> wait for a fixed amount of wall time. Based on the measurements that
> Gautham did on a POWER9 system, in successful cases of
> smp_query_cpu_stopped(cpu) returning affirmative, the maximum time spent
> inside the loop was 10 ms. This patch loops for up to 20 ms just to be sure.
> 
> Signed-off-by: Thiago Jung Bauermann 
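
Putting the two ideas together, a hedged sketch of the resulting wait;
get_cpu_current_state() and smp_query_cpu_stopped() exist in the pseries
code, while the loop shape and function name are assumptions based on
the commit message:

	/* assumed helper called from pseries_cpu_die() */
	static void wait_for_cpu_stopped(unsigned int cpu)
	{
		unsigned int pcpu = get_hard_smp_processor_id(cpu);
		unsigned long timeout;

		/* the dying CPU marks itself offline just before calling
		 * rtas_stop_self(), so only then start querying */
		while (get_cpu_current_state(cpu) != CPU_STATE_OFFLINE)
			cpu_relax();

		/* poll for a fixed 20 ms of wall time, not N iterations */
		timeout = jiffies + msecs_to_jiffies(20);
		do {
			if (smp_query_cpu_stopped(pcpu) == QCSS_STOPPED)
				return;
			cpu_relax();
		} while (time_before(jiffies, timeout));

		pr_warn("Querying DEAD? cpu %u timed out\n", cpu);
	}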

Thanks for this version. I have tested the patch and we no longer see
the "Querying DEAD? cpu X (Y) shows 2" message.


Tested-and-Reviewed-by: Gautham R. Shenoy 


--
Thanks and Regards
gautham.



Re: [RFCv2 PATCH 4/4] powerpc: KASAN for 64bit Book3E

2019-03-12 Thread Christophe Leroy

Hi,

Build failure with pmac32_defconfig.

  CC  arch/powerpc/kernel/asm-offsets.s
In file included from ./arch/powerpc/include/asm/book3s/32/pgtable.h:149:0,
 from ./arch/powerpc/include/asm/book3s/pgtable.h:8,
 from ./arch/powerpc/include/asm/pgtable.h:18,
 from ./arch/powerpc/include/asm/kasan.h:18,
 from ./include/linux/kasan.h:14,
 from ./include/linux/slab.h:129,
 from ./include/linux/crypto.h:24,
 from ./include/crypto/hash.h:16,
 from ./include/linux/uio.h:14,
 from ./include/linux/socket.h:8,
 from ./include/linux/compat.h:15,
 from arch/powerpc/kernel/asm-offsets.c:18:
./include/asm-generic/fixmap.h: In function ‘fix_to_virt’:
./arch/powerpc/include/asm/fixmap.h:27:22: error: ‘KASAN_SHADOW_START’ 
undeclared (first use in this function)

 #define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE)
  ^
./include/asm-generic/fixmap.h:21:27: note: in expansion of macro 
‘FIXADDR_TOP’

 #define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
   ^
./include/asm-generic/fixmap.h:33:9: note: in expansion of macro 
‘__fix_to_virt’

  return __fix_to_virt(idx);
 ^
./arch/powerpc/include/asm/fixmap.h:27:22: note: each undeclared 
identifier is reported only once for each function it appears in

 #define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE)
  ^
./include/asm-generic/fixmap.h:21:27: note: in expansion of macro 
‘FIXADDR_TOP’

 #define __fix_to_virt(x) (FIXADDR_TOP - ((x) << PAGE_SHIFT))
   ^
./include/asm-generic/fixmap.h:33:9: note: in expansion of macro 
‘__fix_to_virt’

  return __fix_to_virt(idx);
 ^
In file included from ./include/linux/bug.h:5:0,
 from ./include/linux/thread_info.h:12,
 from ./include/asm-generic/preempt.h:5,
 from ./arch/powerpc/include/generated/asm/preempt.h:1,
 from ./include/linux/preempt.h:78,
 from ./include/linux/spinlock.h:51,
 from ./include/linux/seqlock.h:36,
 from ./include/linux/time.h:6,
 from ./include/linux/compat.h:10,
 from arch/powerpc/kernel/asm-offsets.c:18:
./include/asm-generic/fixmap.h: In function ‘virt_to_fix’:
./arch/powerpc/include/asm/fixmap.h:27:22: error: ‘KASAN_SHADOW_START’ 
undeclared (first use in this function)

 #define FIXADDR_TOP (KASAN_SHADOW_START - PAGE_SIZE)
  ^
./arch/powerpc/include/asm/bug.h:76:27: note: in definition of macro 
‘BUG_ON’

  if (__builtin_constant_p(x)) {\
   ^
./include/asm-generic/fixmap.h:38:18: note: in expansion of macro 
‘FIXADDR_TOP’

  BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
  ^
make[1]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2
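
A hedged guess at a fix (untested): make the KASAN-based FIXADDR_TOP in
arch/powerpc/include/asm/fixmap.h conditional, falling back to the
pre-KASAN definition:

	#ifdef CONFIG_KASAN
	#define FIXADDR_TOP	(KASAN_SHADOW_START - PAGE_SIZE)
	#else
	#define FIXADDR_TOP	((unsigned long)(-PAGE_SIZE))
	#endif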

Christophe

On 03/12/2019 01:23 AM, Daniel Axtens wrote:

Wire up KASAN. Only outline instrumentation is supported.

The KASAN shadow area is mapped into vmemmap space:
0x8000 0400   to 0x8000 0600  .
To do this we require that vmemmap be disabled. (This is the default
in the kernel config that QorIQ provides for the machine in their
SDK anyway - they use flat memory.)

Only the kernel linear mapping (0xc000...) is checked. The vmalloc and
ioremap areas (also in 0x800...) are all mapped to the zero page. As
with the Book3S hash series, this requires overriding the memory <->
shadow mapping.

Also, as with both previous 64-bit series, early instrumentation is not
supported.  It would allow us to drop the check_return_arch_not_ready()
hook in the KASAN core, but it's tricky to get it set up early enough:
we need it setup before the first call to instrumented code like printk().
Perhaps in the future.

Only KASAN_MINIMAL works.

Tested on e6500. KVM, kexec and xmon have not been tested.

The test_kasan module fires warnings as expected, except for the
following tests:

  - Expected/by design:
kasan test: memcg_accounted_kmem_cache allocate memcg accounted object

  - Due to only supporting KASAN_MINIMAL:
kasan test: kasan_stack_oob out-of-bounds on stack
kasan test: kasan_global_oob out-of-bounds global variable
kasan test: kasan_alloca_oob_left out-of-bounds to left on alloca
kasan test: kasan_alloca_oob_right out-of-bounds to right on alloca
kasan test: use_after_scope_test use-after-scope on int
kasan test: use_after_scope_test use-after-scope on array

Thanks to those who have done the heavy lifting over the past several years:
  - Christophe's 32 bit series: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-February/185379.html
  - Aneesh's Book3S hash series: https://lwn.net/Articles/655642/
  - Balbir's Book3S radix series: https://patchwork.ozlabs.org/patch/795211/

Cc: Christophe Leroy 
Cc: Aneesh Kumar K.V 
Cc: Balbir Singh