Re: [PATCH v7 7/7] powerpc/32: use set_memory_attr()

2020-03-31 Thread Christophe Leroy




Le 01/04/2020 à 04:27, Russell Currey a écrit :

On Tue, 2020-03-31 at 11:56 +0200, Christophe Leroy wrote:


Le 31/03/2020 à 06:48, Russell Currey a écrit :

From: Christophe Leroy 

Use set_memory_attr() instead of the PPC32 specific
change_page_attr()

change_page_attr() was checking that the address was not mapped by
blocks and was handling highmem, but that's unneeded because the
affected pages can't be in highmem and block mapping verification
is already done by the callers.

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/mm/pgtable_32.c | 95 ---
-
   1 file changed, 10 insertions(+), 85 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c
b/arch/powerpc/mm/pgtable_32.c
index 5fb90edd865e..3d92eaf3ee2f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -23,6 +23,7 @@
   #include 
   #include 
   #include 
+#include 
   
   #include 

   #include 
@@ -121,99 +122,20 @@ void __init mapin_ram(void)
}
   }
   
-/* Scan the real Linux page tables and return a PTE pointer for

- * a virtual address in a context.
- * Returns true (1) if PTE was found, zero otherwise.  The pointer
to
- * the PTE pointer is unmodified if PTE is not found.
- */
-static int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
pmd_t **pmdp)


This will conflict, get_pteptr() is gone now, see
https://github.com/linuxppc/linux/commit/2efc7c085f05870eda6f29ac71eeb83f3bd54415

Christophe


OK cool, so I can just drop that hunk?  Will try that and make sure it
rebases on powerpc/next


Well, you still have to remove __change_page_attr_noflush() and 
change_page_attr(), everything else in this patch should still be the same.


Christophe



- Russell





-{
-pgd_t  *pgd;
-   pud_t   *pud;
-pmd_t  *pmd;
-pte_t  *pte;
-int retval = 0;
-
-pgd = pgd_offset(mm, addr & PAGE_MASK);
-if (pgd) {
-   pud = pud_offset(pgd, addr & PAGE_MASK);
-   if (pud && pud_present(*pud)) {
-   pmd = pmd_offset(pud, addr & PAGE_MASK);
-   if (pmd_present(*pmd)) {
-   pte = pte_offset_map(pmd, addr &
PAGE_MASK);
-   if (pte) {
-   retval = 1;
-   *ptep = pte;
-   if (pmdp)
-   *pmdp = pmd;
-   /* XXX caller needs to do
pte_unmap, yuck */
-   }
-   }
-   }
-}
-return(retval);
-}
-
-static int __change_page_attr_noflush(struct page *page, pgprot_t
prot)
-{
-   pte_t *kpte;
-   pmd_t *kpmd;
-   unsigned long address;
-
-   BUG_ON(PageHighMem(page));
-   address = (unsigned long)page_address(page);
-
-   if (v_block_mapped(address))
-   return 0;
-   if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
-   return -EINVAL;
-   __set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
-   pte_unmap(kpte);
-
-   return 0;
-}
-
-/*
- * Change the page attributes of an page in the linear mapping.
- *
- * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
- */
-static int change_page_attr(struct page *page, int numpages,
pgprot_t prot)
-{
-   int i, err = 0;
-   unsigned long flags;
-   struct page *start = page;
-
-   local_irq_save(flags);
-   for (i = 0; i < numpages; i++, page++) {
-   err = __change_page_attr_noflush(page, prot);
-   if (err)
-   break;
-   }
-   wmb();
-   local_irq_restore(flags);
-   flush_tlb_kernel_range((unsigned long)page_address(start),
-  (unsigned long)page_address(page));
-   return err;
-}
-
   void mark_initmem_nx(void)
   {
-   struct page *page = virt_to_page(_sinittext);
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 PFN_DOWN((unsigned long)_sinittext);
   
   	if (v_block_mapped((unsigned long)_stext + 1))

mmu_mark_initmem_nx();
else
-   change_page_attr(page, numpages, PAGE_KERNEL);
+   set_memory_attr((unsigned long)_sinittext, numpages,
PAGE_KERNEL);
   }
   
   #ifdef CONFIG_STRICT_KERNEL_RWX

   void mark_rodata_ro(void)
   {
-   struct page *page;
unsigned long numpages;
   
   	if (v_block_mapped((unsigned long)_sinittext)) {

@@ -222,20 +144,18 @@ void mark_rodata_ro(void)
return;
}
   
-	page = virt_to_page(_stext);

numpages = PFN_UP((unsigned long)_etext) -
   PFN_DOWN((unsigned long)_stext);
   
-	change_page_attr(page, numpages, PAGE_KERNEL_ROX);

+   set_memory_attr((unsigned long)_stext, numpages,
PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. 
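As an aside for readers following along: the numpages computation in the hunks above, PFN_UP(end) - PFN_DOWN(start), counts every page a byte range touches. A stand-alone sketch of that arithmetic, assuming a 4 KiB page size rather than the kernel's real <asm/page.h> values:

```c
#include <assert.h>

/* Illustration only: assume 4 KiB pages; the real values come from the
 * kernel's <asm/page.h> and <linux/pfn.h>. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Number of pages touched by the byte range [start, end), as computed in
 * mark_initmem_nx() and mark_rodata_ro() above. */
static inline unsigned long pages_spanned(unsigned long start, unsigned long end)
{
	return PFN_UP(end) - PFN_DOWN(start);
}
```

Rounding the end up and the start down is what makes a range that only partially covers its first or last page still count those pages.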

Re: [PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP (was: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA)

2020-03-31 Thread Baoquan He
On 04/01/20 at 12:56am, Mike Rapoport wrote:
> On Mon, Mar 30, 2020 at 11:58:43AM +0200, Michal Hocko wrote:
> > 
> > What would it take to make ia64 use HAVE_MEMBLOCK_NODE_MAP? I would
> > really love to see that thing go away. It is causing problems when
> > people try to use memblock api.
> 
> Well, it's a small patch in the end :)
> 
> Currently all NUMA architectures currently enable
> CONFIG_HAVE_MEMBLOCK_NODE_MAP and use free_area_init_nodes() to initialize
> nodes and zones structures.

I did some investigation: nine architectures have a NUMA config option,
and among them alpha doesn't have HAVE_MEMBLOCK_NODE_MAP support. The
interesting thing is that two architectures, microblaze and riscv, select
HAVE_MEMBLOCK_NODE_MAP but don't have a NUMA config at all; adding
HAVE_MEMBLOCK_NODE_MAP to riscv and microblaze was evidently not
carefully considered.

arch/alpha/Kconfig:config NUMA
arch/arm64/Kconfig:config NUMA
arch/ia64/Kconfig:config NUMA
arch/mips/Kconfig:config NUMA
arch/powerpc/Kconfig:config NUMA
arch/s390/Kconfig:config NUMA
arch/sh/mm/Kconfig:config NUMA
arch/sparc/Kconfig:config NUMA
arch/x86/Kconfig:config NUMA

From the above information, we can remove HAVE_MEMBLOCK_NODE_MAP and
replace it with CONFIG_NUMA. It seems more sensible to store the nid in
memblock only when NUMA support is enabled.


> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 079d17d96410..9de81112447e 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -50,9 +50,7 @@ struct memblock_region {
>   phys_addr_t base;
>   phys_addr_t size;
>   enum memblock_flags flags;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>   int nid;
> -#endif

I didn't look into the other changes very carefully, but enabling the
memblock node map for all architectures feels a little radical. After all,
many architectures don't have NUMA support at all.

>  };
>  
>  /**
> @@ -215,7 +213,6 @@ static inline bool memblock_is_nomap(struct 
> memblock_region *m)
>   return m->flags & MEMBLOCK_NOMAP;
>  }
>  
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
>   unsigned long  *end_pfn);
>  void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> @@ -234,7 +231,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned 
> long *out_start_pfn,
>  #define for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid)
> \
>   for (i = -1, __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid); \
>i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>  
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
> @@ -310,7 +306,6 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
> *zone,
>   for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
>  nid, flags, p_start, p_end, p_nid)
>  
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  int memblock_set_node(phys_addr_t base, phys_addr_t size,
> struct memblock_type *type, int nid);
>  
> @@ -323,16 +318,6 @@ static inline int memblock_get_region_node(const struct 
> memblock_region *r)
>  {
>   return r->nid;
>  }
> -#else
> -static inline void memblock_set_region_node(struct memblock_region *r, int 
> nid)
> -{
> -}
> -
> -static inline int memblock_get_region_node(const struct memblock_region *r)
> -{
> - return 0;
> -}
> -#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>  
>  /* Flags for memblock allocation APIs */
>  #define MEMBLOCK_ALLOC_ANYWHERE  (~(phys_addr_t)0)
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c54fb96cb1e6..368a45d4696a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2125,9 +2125,8 @@ static inline unsigned long get_num_physpages(void)
>   return phys_pages;
>  }
>  
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>  /*
> - * With CONFIG_HAVE_MEMBLOCK_NODE_MAP set, an architecture may initialise its
> + * Using memblock node mappings, an architecture may initialise its
>   * zones, allocate the backing mem_map and account for memory holes in a more
>   * architecture independent manner. This is a substitute for creating the
>   * zone_sizes[] and zholes_size[] arrays and passing them to
> @@ -2148,9 +2147,6 @@ static inline unsigned long get_num_physpages(void)
>   * registered physical page range.  Similarly
>   * sparse_memory_present_with_active_regions() calls memory_present() for
>   * each range when SPARSEMEM is enabled.
> - *
> - * See mm/page_alloc.c for more information on each function exposed by
> - * CONFIG_HAVE_MEMBLOCK_NODE_MAP.
>   */
>  extern void free_area_init_nodes(unsigned long *max_zone_pfn);
>  unsigned long node_map_pfn_alignment(void);
> @@ -2165,22 +2161,12 @@ extern void free_bootmem_with_active_regions(int nid,
>   unsigned long max_low_pfn);

Re: [PATCH v5 4/4] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

2020-03-31 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> This patch implements support for papr_scm command
> 'DSM_PAPR_SCM_HEALTH' that returns a newly introduced 'struct
> nd_papr_scm_dimm_health_stat' instance containing dimm health
> information back to user space in response to ND_CMD_CALL. This
> functionality is implemented in newly introduced papr_scm_get_health()
> that queries the scm-dimm health information and then copies these bitmaps
> to the package payload whose layout is defined by 'struct
> papr_scm_ndctl_health'.
>
> The patch also introduces a new member 'struct papr_scm_priv.health',
> an instance of 'struct nd_papr_scm_dimm_health_stat', to cache the
> health information of an scm-dimm. As a result, the functions
> drc_pmem_query_health() and papr_flags_show() are updated to populate
> and use this new struct instead of the two be64 integers used earlier.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
>
> v4..v5: None
>
> v3..v4: Call the DSM_PAPR_SCM_HEALTH service function from
>   papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>
> v2..v3: Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx'
>   types as its exported to the userspace [Aneesh]
>   Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm
>   health from enum to #defines [Aneesh]
>
> v1..v2: New patch in the series
> ---
>  arch/powerpc/include/uapi/asm/papr_scm_dsm.h |  40 +++
>  arch/powerpc/platforms/pseries/papr_scm.c| 109 ---
>  2 files changed, 132 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h 
> b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
> index c039a49b41b4..8265125304ca 100644
> --- a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
> +++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
> @@ -132,6 +132,7 @@ struct nd_papr_scm_cmd_pkg {
>   */
>  enum dsm_papr_scm {
>   DSM_PAPR_SCM_MIN =  0x1,
> + DSM_PAPR_SCM_HEALTH,
>   DSM_PAPR_SCM_MAX,
>  };
>  
> @@ -158,4 +159,43 @@ static void *papr_scm_pcmd_to_payload(struct 
> nd_papr_scm_cmd_pkg *pcmd)
>   else
>   return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>  }
> +
> +/* Various scm-dimm health indicators */
> +#define DSM_PAPR_SCM_DIMM_HEALTHY   0
> +#define DSM_PAPR_SCM_DIMM_UNHEALTHY 1
> +#define DSM_PAPR_SCM_DIMM_CRITICAL  2
> +#define DSM_PAPR_SCM_DIMM_FATAL 3
> +
> +/*
> + * Struct exchanged between kernel & ndctl in for PAPR_DSM_PAPR_SMART_HEALTH
> + * Various bitflags indicate the health status of the dimm.
> + *
> + * dimm_unarmed  : Dimm not armed, so contents won't persist.
> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
> + * dimm_bad_restore  : Contents from previous shutdown weren't restored.
> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
> + * dimm_locked   : Contents of the dimm can't be modified until 
> CEC reboot
> + * dimm_encrypted: Contents of dimm are encrypted.
> + * dimm_health   : Dimm health indicator.
> + */
> +struct nd_papr_scm_dimm_health_stat_v1 {
> + __u8 dimm_unarmed;
> + __u8 dimm_bad_shutdown;
> + __u8 dimm_bad_restore;
> + __u8 dimm_scrubbed;
> + __u8 dimm_locked;
> + __u8 dimm_encrypted;
> + __u16 dimm_health;
> +};
> +
> +/*
> + * Typedef the current struct for dimm_health so that any application
> + * or kernel recompiled after introducing a new version automatically
> + * supports the new version.
> + */
> +#define nd_papr_scm_dimm_health_stat nd_papr_scm_dimm_health_stat_v1
> +
> +/* Current version number for the dimm health struct */
> +#define ND_PAPR_SCM_DIMM_HEALTH_VERSION 1
> +
>  #endif /* _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index e8ce96d2249e..ce94762954e0 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -47,8 +47,7 @@ struct papr_scm_priv {
>   struct mutex dimm_mutex;
>  
>   /* Health information for the dimm */
> - __be64 health_bitmap;
> - __be64 health_bitmap_valid;
> + struct nd_papr_scm_dimm_health_stat health;
>  };
>  
>  static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -158,6 +157,7 @@ static int drc_pmem_query_health(struct papr_scm_priv *p)
>  {
>   unsigned long ret[PLPAR_HCALL_BUFSIZE];
>   int64_t rc;
> + __be64 health;
>  
>   rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
>   if (rc != H_SUCCESS) {
> @@ -172,13 +172,41 @@ static int drc_pmem_query_health(struct papr_scm_priv 
> *p)
>   return rc;
>  
>   /* Store the retrieved health information in dimm platform data */
> - p->health_bitmap = ret[0];
> - p->health_bitmap_valid = ret[1];
> + health = ret[0] & ret[1];
>  
>   dev_dbg(&p->pdev->dev,
>   "Queried dimm health info. Bitmap:0x%016llx 
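As an illustration of the masking in the hunk above (health = ret[0] & ret[1]): the second hcall return value acts as a valid-bits mask, so only bits the hypervisor vouches for survive. A user-space sketch using PPC_BIT-style big-endian bit numbering — the constant names are shortened stand-ins, not the real defines:

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian bit numbering as in the kernel's PPC_BIT(): bit 0 is the MSB. */
#define PPC_BIT(i) (1ULL << (63 - (i)))

/* Shortened stand-ins for the PAPR_SCM_DIMM_* health bits. */
#define SCM_DIMM_UNARMED        PPC_BIT(0)
#define SCM_DIMM_SHUTDOWN_DIRTY PPC_BIT(1)
#define SCM_DIMM_ENCRYPTED      PPC_BIT(8)

/* Combine the health bitmap with its valid-bits mask, as
 * drc_pmem_query_health() does with the H_SCM_HEALTH return pair. */
static inline uint64_t effective_health(uint64_t bitmap, uint64_t valid)
{
	return bitmap & valid;
}
```

A bit that is set in the bitmap but not marked valid by the hypervisor is simply dropped, which is why a single cached field suffices where two were kept before.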

Re: [PATCH v5 3/4] powerpc/papr_scm,uapi: Add support for handling PAPR DSM commands

2020-03-31 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> Implement support for handling PAPR DSM commands in papr_scm
> module. We advertise support for ND_CMD_CALL for the dimm command mask
> and implement necessary scaffolding in the module to handle ND_CMD_CALL
> ioctl and DSM commands that we receive.
>
> The layout of the DSM commands as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_scm_dsm.h' which
> defines a new 'struct nd_papr_scm_cmd_pkg' header. This header is used
> to communicate the DSM command via 'nd_pkg_papr_scm->nd_command' and the
> size of the payload that needs to be sent/received for servicing the DSM.
>
> The PAPR DSM commands are assigned indexes starting from 0x1 to
> prevent them from overlapping ND_CMD_* values, which also simplifies
> handling of dimm commands in papr_scm_ndctl(). A new function cmd_to_func() is
> implemented that reads the args to papr_scm_ndctl() and performs
> sanity tests on them. In case of a DSM command being sent via
> ND_CMD_CALL a newly introduced function papr_scm_service_dsm() is
> called to handle the request.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Vaibhav Jain 
>
> ---
> Changelog:
>
> v4..v5: Fixed a bug in new implementation of papr_scm_ndctl().
>
> v3..v4: Updated papr_scm_ndctl() to delegate DSM command handling to a
>   different function papr_scm_service_dsm(). [Aneesh]
>
> v2..v3: Updated the nd_papr_scm_cmd_pkg to use __xx types as its
>   exported to the userspace [Aneesh]
>
> v1..v2: New patch in the series.
> ---
>  arch/powerpc/include/uapi/asm/papr_scm_dsm.h | 161 +++
>  arch/powerpc/platforms/pseries/papr_scm.c|  97 ++-
>  2 files changed, 252 insertions(+), 6 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_dsm.h
>
> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h 
> b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
> new file mode 100644
> index ..c039a49b41b4
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
> @@ -0,0 +1,161 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR SCM Device specific methods and struct for libndctl and ndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
> +
> +#include 
> +
> +#ifdef __KERNEL__
> +#include 
> +#else
> +#include 
> +#endif
> +
> +/*
> + * DSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * 'envelopes', which consist of a header and a user-defined payload section.
> + * The header is described by 'struct nd_papr_scm_cmd_pkg', which expects a
> + * payload to follow it; the payload's offset relative to the struct is
> + * provided by 'nd_papr_scm_cmd_pkg.payload_offset'.
> + *
> + *  +-+-+---+
> + *  |   64-Bytes  |   8-Bytes   |   Max 184-Bytes   |
> + *  +-+-+---+
> + *  |   nd_papr_scm_cmd_pkg |   |
> + *  |-+ |   |
> + *  |  nd_cmd_pkg | |   |
> + *  +-+-+---+
> + *  | nd_family   |  |   |
> + *  | nd_size_out | cmd_status  ||
> + *  | nd_size_in  | payload_version |  PAYLOAD   |
> + *  | nd_command  | payload_offset ->|
> + *  | nd_fw_size  | ||
> + *  +-+-+---+
> + *
> + * DSM Header:
> + *
> + * The header is defined as 'struct nd_papr_scm_cmd_pkg' which embeds a
> + * 'struct nd_cmd_pkg' instance. The DSM command is assigned to member
> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope,
> + * which is contained in 'struct nd_cmd_pkg', the header also has the
> + * following members:
> + *
> + * 'cmd_status'  : (Out) Errors if any encountered while 
> servicing DSM.
> + * 'payload_version' : (In/Out) Version number associated with the payload.
> + * 'payload_offset'  : (In)Relative offset of payload from start of envelope.
> + *
> + * DSM Payload:
> + *
> + * The layout of the DSM Payload is 
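The envelope layout described above — a fixed header whose payload_offset member locates a trailing payload — can be sketched in user space as follows. This is a simplified, hypothetical mirror of the real structs, not the uapi header itself:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for 'struct nd_papr_scm_cmd_pkg' (fields illustrative). */
struct cmd_pkg_hdr {
	uint64_t nd_command;
	uint32_t cmd_status;
	uint32_t payload_version;
	uint64_t payload_offset;	/* offset of payload from start of envelope */
};

/* Resolve the payload pointer from the header, in the spirit of
 * papr_scm_pcmd_to_payload() shown earlier in the series. */
static inline void *pkg_to_payload(struct cmd_pkg_hdr *pkg)
{
	return (uint8_t *)pkg + pkg->payload_offset;
}
```

Because the offset is carried in the header, the kernel and user space can agree on where the payload starts even if the header grows in later versions.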

Re: [PATCH v2] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-03-31 Thread Ram Pai
On Tue, Mar 31, 2020 at 08:53:07PM -0300, Thiago Jung Bauermann wrote:
> 
> Hi Ram,
> 
> Ram Pai  writes:
> 
> > diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> > b/arch/powerpc/sysdev/xive/spapr.c
> > index 55dc61c..608b52f 100644
> > --- a/arch/powerpc/sysdev/xive/spapr.c
> > +++ b/arch/powerpc/sysdev/xive/spapr.c
> > @@ -26,6 +26,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  #include "xive-internal.h"
> >  
> > @@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, 
> > struct xive_q *q, u8 prio,
> > rc = -EIO;
> > } else {
> > q->qpage = qpage;
> > +   if (is_secure_guest())
> > +   uv_share_page(PHYS_PFN(qpage_phys),
> > +   1 << xive_alloc_order(order));
> 
> If I understand this correctly, you're passing the number of bytes of
> the queue to uv_share_page(), but that ultracall expects the number of
> pages to be shared.


static inline u32 xive_alloc_order(u32 queue_shift)
{
return (queue_shift > PAGE_SHIFT) ? (queue_shift - PAGE_SHIFT) : 0;
}

xive_alloc_order(order) returns the order in units of PAGE_SIZE pages.
Hence the value passed to uv_share_page() is the number of pages,
and not the number of bytes.

BTW: I did verify through testing that it was indeed passing 1 page to the
uv_share_page().  

RP
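Ram's arithmetic above can be checked with a stand-alone sketch; the PAGE_SHIFT of 16 (64 KiB pages, typical for pseries) is an assumption here, not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 16	/* assumed 64 KiB pages; illustrative only */

/* Mirror of xive_alloc_order(): order, in PAGE_SIZE pages, of a queue
 * that is 2^queue_shift bytes long. */
static inline uint32_t xive_alloc_order(uint32_t queue_shift)
{
	return (queue_shift > PAGE_SHIFT) ? (queue_shift - PAGE_SHIFT) : 0;
}

/* Number of pages shared with the ultravisor, as in the hunk above:
 * 1 << xive_alloc_order(order). */
static inline uint32_t pages_for_queue(uint32_t queue_shift)
{
	return 1u << xive_alloc_order(queue_shift);
}
```

So a queue no larger than one page always yields order 0, i.e. exactly one page shared — matching the testing result mentioned above.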



Re: [PATCH v5 2/4] ndctl/uapi: Introduce NVDIMM_FAMILY_PAPR_SCM as a new NVDIMM DSM family

2020-03-31 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> Add PAPR-scm family of DSM command-set to the white list of NVDIMM
> command sets.
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
>
> v4..v5 : None
>
> v3..v4 : None
>
> v2..v3 : Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>
> v1..v2 : None
> ---
>  include/uapi/linux/ndctl.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> index de5d90212409..99fb60600ef8 100644
> --- a/include/uapi/linux/ndctl.h
> +++ b/include/uapi/linux/ndctl.h
> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>  #define NVDIMM_FAMILY_HPE2 2
>  #define NVDIMM_FAMILY_MSFT 3
>  #define NVDIMM_FAMILY_HYPERV 4
> +#define NVDIMM_FAMILY_PAPR_SCM 5
>  
>  #define ND_IOCTL_CALL_IOWR(ND_IOCTL, ND_CMD_CALL,\
>   struct nd_cmd_pkg)
> -- 
> 2.25.1
> ___
> Linux-nvdimm mailing list -- linux-nvd...@lists.01.org
> To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v5 1/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-03-31 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit big-endian integers which are then stored in 'struct
> papr_scm_priv' and subsequently partially exposed to user-space via
> newly introduced dimm-specific attribute 'papr_flags'. Also a new asm
> header named 'papr_scm.h' is added that describes the interface
> between PHYP and the guest kernel.
>
> Following flags are reported via 'papr_flags' sysfs attribute contents
> of which are space separated string flags indicating various nvdimm
> states:
>
>  * "not_armed": Indicating that nvdimm contents won't survive a power
>  cycle.
>  * "save_fail": Indicating that nvdimm contents couldn't be flushed
>  during last shutdown event.
>  * "restore_fail": Indicating that nvdimm contents couldn't be restored
>  during dimm initialization.
>  * "encrypted": Dimm contents are encrypted.
>  * "smart_notify": There is health event for the nvdimm.
>  * "scrubbed" : Indicating that contents of the nvdimm have been
>  scrubbed.
>  * "locked"   : Indicating that nvdimm contents can't be modified
>  until next power cycle.
>
> [1]: commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
>

Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Vaibhav Jain 
> ---
> Changelog:
>
> v4..v5 : None
>
> v3..v4 : None
>
> v2..v3 : Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>NVDIMM unarmed [Aneesh]
>
> v1..v2 : New patch in the series.
> ---
>  arch/powerpc/include/asm/papr_scm.h   |  48 ++
>  arch/powerpc/platforms/pseries/papr_scm.c | 105 +-
>  2 files changed, 151 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/papr_scm.h
>
> diff --git a/arch/powerpc/include/asm/papr_scm.h 
> b/arch/powerpc/include/asm/papr_scm.h
> new file mode 100644
> index ..868d3360f56a
> --- /dev/null
> +++ b/arch/powerpc/include/asm/papr_scm.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Structures and defines needed to manage nvdimms for spapr guests.
> + */
> +#ifndef _ASM_POWERPC_PAPR_SCM_H_
> +#define _ASM_POWERPC_PAPR_SCM_H_
> +
> +#include 
> +#include 
> +
> +/* DIMM health bitmap bitmap indicators */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_SCM_DIMM_UNARMEDPPC_BIT(0)
> +/* SCM device failed to persist memory contents */
> +#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY PPC_BIT(1)
> +/* SCM device contents are persisted from previous IPL */
> +#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN PPC_BIT(2)
> +/* SCM device contents are not persisted from previous IPL */
> +#define PAPR_SCM_DIMM_EMPTY  PPC_BIT(3)
> +/* SCM device memory life remaining is critically low */
> +#define PAPR_SCM_DIMM_HEALTH_CRITICALPPC_BIT(4)
> +/* SCM device will be garded off next IPL due to failure */
> +#define PAPR_SCM_DIMM_HEALTH_FATAL   PPC_BIT(5)
> +/* SCM contents cannot persist due to current platform health status */
> +#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY   PPC_BIT(6)
> +/* SCM device is unable to persist memory contents in certain conditions */
> +#define PAPR_SCM_DIMM_HEALTH_NON_CRITICALPPC_BIT(7)
> +/* SCM device is encrypted */
> +#define PAPR_SCM_DIMM_ENCRYPTED  PPC_BIT(8)
> +/* SCM device has been scrubbed and locked */
> +#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKEDPPC_BIT(9)
> +
> +/* Bits status indicators for health bitmap indicating unarmed dimm */
> +#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED |  \
> + PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +/* Bits status indicators for health bitmap indicating unflushed dimm */
> +#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
> +
> +/* Bits status indicators for health bitmap indicating unrestored dimm */
> +#define PAPR_SCM_DIMM_BAD_RESTORE_MASK  (PAPR_SCM_DIMM_EMPTY)
> +
> +/* Bit status indicators for smart event notification */
> +#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
> +PAPR_SCM_DIMM_HEALTH_FATAL | \
> +PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +#endif
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0b4467e378e5..aaf2e4ab1f75 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -14,6 +14,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #define BIND_ANY_ADDR (~0ul)
>  
> @@ -39,6 +40,13 @@ struct papr_scm_priv {
>   struct resource res;
>   struct nd_region *region;
>   struct nd_interleave_set nd_set;
> +
> + /* Protect dimm 

Re: Emulate ppc64le builds on x86/x64 machine

2020-03-31 Thread Michael Ellerman
shivakanth k  writes:

> Hi ,
> Could you please help me to set up ppc64le 0n liunux machine

There's some instructions here:

https://github.com/linuxppc/wiki/wiki/Building-powerpc-kernels

cheers


Re: [PATCH 0/2] selftests: vm: Build fixes for powerpc64

2020-03-31 Thread Michael Ellerman
Shuah Khan  writes:
> On 1/30/20 12:01 AM, Sandipan Das wrote:
>> The second patch was already posted independently but because
>> of the changes in the first patch, the second one now depends
>> on it. Hence posting it now as a part of this series.
>> 
>> The last version (v2) of the second patch can be found at:
>> https://patchwork.ozlabs.org/patch/1225969/
>> 
>> Sandipan Das (2):
>>selftests: vm: Do not override definition of ARCH
>>selftests: vm: Fix 64-bit test builds for powerpc64le
>> 
>>   tools/testing/selftests/vm/Makefile| 4 ++--
>>   tools/testing/selftests/vm/run_vmtests | 2 +-
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>> 
>
> Michael,
>
> I see your tested-by on these two patches. I will take these
> through kselftest fixes.

Thanks.

cheers


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-03-31 Thread kbuild test robot
Hi Leonardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on paulus-powerpc/kvm-ppc-next linus/master linux/master 
v5.6 next-20200331]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Leonardo-Bras/ppc-crash-Reset-spinlocks-during-crash/20200401-091600
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-rhel-kconfig (attached as .config)
compiler: powerpc64le-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   powerpc64le-linux-ld: warning: orphan section `.gnu.hash' from `linker 
stubs' being placed in section `.gnu.hash'
>> powerpc64le-linux-ld: arch/powerpc/kexec/crash.o:(.toc+0x0): undefined 
>> reference to `rtas'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[PATCH] powerpc/64/tm: Don't let userspace set regs->trap via sigreturn

2020-03-31 Thread Michael Ellerman
In restore_tm_sigcontexts() we take the trap value directly from the
user sigcontext with no checking:

err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);

This means we can be in the kernel with an arbitrary regs->trap value.

Although that's not immediately problematic, there is a risk we could
trigger one of the uses of CHECK_FULL_REGS():

#define CHECK_FULL_REGS(regs)   BUG_ON(regs->trap & 1)

It can also cause us to unnecessarily save non-volatile GPRs again in
save_nvgprs(), which shouldn't be problematic but is still wrong.

It's also possible it could trick the syscall restart machinery, which
relies on regs->trap not being == 0xc00 (see 9a81c16b5275 ("powerpc:
fix double syscall restarts")), though I haven't been able to make
that happen.

Finally it doesn't match the behaviour of the non-TM case, in
restore_sigcontext() which zeroes regs->trap.

So change restore_tm_sigcontexts() to zero regs->trap.

This was discovered while testing Nick's upcoming rewrite of the
syscall entry path. In that series the call to save_nvgprs() prior to
signal handling (do_notify_resume()) is removed, which leaves the
low bit of regs->trap uncleared and can then trigger the FULL_REGS()
WARNs in setup_tm_sigcontexts().

Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal 
context")
Cc: sta...@vger.kernel.org # v3.9+
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/signal_64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 84ed2e77ef9c..adfde59cf4ba 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -473,8 +473,10 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
err |= __get_user(tsk->thread.ckpt_regs.ccr,
  &sc->gp_regs[PT_CCR]);
 
+   /* Don't allow userspace to set the trap value */
+   regs->trap = 0;
+
/* These regs are not checkpointed; they can go in 'regs'. */
-   err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);
	err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]);
	err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]);
	err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]);

base-commit: 7074695ac6fb965d478f373b95bc5c636e9f21b0
-- 
2.25.1
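A toy user-space model of the hazard the patch above closes: the low bit of regs->trap flags a partial register save, so an attacker-supplied odd value from the sigframe could trip CHECK_FULL_REGS(), while zeroing the value is always safe. This is a model only, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of pt_regs: the low bit of trap means "only a partial
 * register set was saved", as checked by CHECK_FULL_REGS(). */
struct toy_regs {
	uint64_t trap;
};

static int full_regs(const struct toy_regs *regs)
{
	return (regs->trap & 1) == 0;	/* CHECK_FULL_REGS() BUG_ONs when false */
}

/* What restore_tm_sigcontexts() does after the patch: ignore the
 * user-supplied trap value entirely and store zero. */
static void restore_trap(struct toy_regs *regs, uint64_t user_trap)
{
	(void)user_trap;	/* deliberately discarded */
	regs->trap = 0;
}
```

Zeroing also matches the non-TM restore_sigcontext() behaviour and rules out spoofing the 0xc00 syscall-restart value.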



Re: [PATCH v7 7/7] powerpc/32: use set_memory_attr()

2020-03-31 Thread Russell Currey
On Tue, 2020-03-31 at 11:56 +0200, Christophe Leroy wrote:
> 
> Le 31/03/2020 à 06:48, Russell Currey a écrit :
> > From: Christophe Leroy 
> > 
> > Use set_memory_attr() instead of the PPC32 specific
> > change_page_attr()
> > 
> > change_page_attr() was checking that the address was not mapped by
> > blocks and was handling highmem, but that's unneeded because the
> > affected pages can't be in highmem and block mapping verification
> > is already done by the callers.
> > 
> > Signed-off-by: Christophe Leroy 
> > ---
> >   arch/powerpc/mm/pgtable_32.c | 95 ---
> > -
> >   1 file changed, 10 insertions(+), 85 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/pgtable_32.c
> > b/arch/powerpc/mm/pgtable_32.c
> > index 5fb90edd865e..3d92eaf3ee2f 100644
> > --- a/arch/powerpc/mm/pgtable_32.c
> > +++ b/arch/powerpc/mm/pgtable_32.c
> > @@ -23,6 +23,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   
> >   #include 
> >   #include 
> > @@ -121,99 +122,20 @@ void __init mapin_ram(void)
> > }
> >   }
> >   
> > -/* Scan the real Linux page tables and return a PTE pointer for
> > - * a virtual address in a context.
> > - * Returns true (1) if PTE was found, zero otherwise.  The pointer
> > to
> > - * the PTE pointer is unmodified if PTE is not found.
> > - */
> > -static int
> > -get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep,
> > pmd_t **pmdp)
> 
> This will conflict, get_pteptr() is gone now, see 
> https://github.com/linuxppc/linux/commit/2efc7c085f05870eda6f29ac71eeb83f3bd54415
> 
> Christophe

OK cool, so I can just drop that hunk?  Will try that and make sure it
rebases on powerpc/next

- Russell

> 
> 
> > -{
> > -pgd_t  *pgd;
> > -   pud_t   *pud;
> > -pmd_t  *pmd;
> > -pte_t  *pte;
> > -int retval = 0;
> > -
> > -pgd = pgd_offset(mm, addr & PAGE_MASK);
> > -if (pgd) {
> > -   pud = pud_offset(pgd, addr & PAGE_MASK);
> > -   if (pud && pud_present(*pud)) {
> > -   pmd = pmd_offset(pud, addr & PAGE_MASK);
> > -   if (pmd_present(*pmd)) {
> > -   pte = pte_offset_map(pmd, addr &
> > PAGE_MASK);
> > -   if (pte) {
> > -   retval = 1;
> > -   *ptep = pte;
> > -   if (pmdp)
> > -   *pmdp = pmd;
> > -   /* XXX caller needs to do
> > pte_unmap, yuck */
> > -   }
> > -   }
> > -   }
> > -}
> > -return(retval);
> > -}
> > -
> > -static int __change_page_attr_noflush(struct page *page, pgprot_t
> > prot)
> > -{
> > -   pte_t *kpte;
> > -   pmd_t *kpmd;
> > -   unsigned long address;
> > -
> > -   BUG_ON(PageHighMem(page));
> > -   address = (unsigned long)page_address(page);
> > -
> > -   if (v_block_mapped(address))
> > -   return 0;
> > -   if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
> > -   return -EINVAL;
> > -   __set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
> > -   pte_unmap(kpte);
> > -
> > -   return 0;
> > -}
> > -
> > -/*
> > - * Change the page attributes of an page in the linear mapping.
> > - *
> > - * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
> > - */
> > -static int change_page_attr(struct page *page, int numpages,
> > pgprot_t prot)
> > -{
> > -   int i, err = 0;
> > -   unsigned long flags;
> > -   struct page *start = page;
> > -
> > -   local_irq_save(flags);
> > -   for (i = 0; i < numpages; i++, page++) {
> > -   err = __change_page_attr_noflush(page, prot);
> > -   if (err)
> > -   break;
> > -   }
> > -   wmb();
> > -   local_irq_restore(flags);
> > -   flush_tlb_kernel_range((unsigned long)page_address(start),
> > -  (unsigned long)page_address(page));
> > -   return err;
> > -}
> > -
> >   void mark_initmem_nx(void)
> >   {
> > -   struct page *page = virt_to_page(_sinittext);
> > unsigned long numpages = PFN_UP((unsigned long)_einittext) -
> >  PFN_DOWN((unsigned long)_sinittext);
> >   
> > if (v_block_mapped((unsigned long)_stext + 1))
> > mmu_mark_initmem_nx();
> > else
> > -   change_page_attr(page, numpages, PAGE_KERNEL);
> > +   set_memory_attr((unsigned long)_sinittext, numpages,
> > PAGE_KERNEL);
> >   }
> >   
> >   #ifdef CONFIG_STRICT_KERNEL_RWX
> >   void mark_rodata_ro(void)
> >   {
> > -   struct page *page;
> > unsigned long numpages;
> >   
> > if (v_block_mapped((unsigned long)_sinittext)) {
> > @@ -222,20 +144,18 @@ void mark_rodata_ro(void)
> > return;
> > }
> >   
> > -   page = virt_to_page(_stext);
> > numpages = PFN_UP((unsigned long)_etext) -
> >PFN_DOWN((unsigned long)_stext);
> >   
> > -   change_page_attr(page, 

Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 31/03/2020 à 18:04, Arnd Bergmann a écrit :
>> On Tue, Mar 31, 2020 at 5:26 PM Christophe Leroy
>>  wrote:
>>> Le 31/03/2020 à 17:14, Arnd Bergmann a écrit :
 On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
  wrote:
>
> Only 44x uses 4xx now, so only keep one directory.
>
> Signed-off-by: Christophe Leroy 
> ---
>arch/powerpc/platforms/44x/Makefile   |  9 +++-
>arch/powerpc/platforms/{4xx => 44x}/cpm.c |  0

 No objections to moving everything into one place, but I wonder if the
 combined name should be 4xx instead of 44x, given that 44x currently
 include 46x and 47x. OTOH your approach has the advantage of
 moving fewer files.

>>>
>>> In that case, should we also rename CONFIG_44x to CONFIG_4xx ?
>> 
>> That has the risk of breaking user's defconfig files, but given the
>> small number of users, it may be nicer for consistency. In either
>> case, the two symbols should probably hang around as synonyms,
>> the question is just which one is user visible.
>> 
>
> Not sure it is a good idea to keep two synonyms. In the past we made our 
> best to remove synonyms (We had CONFIG_8xx and CONFIG_PPC_8xx being 
> synonyms, we had CONFIG_6xx and CONFIG_BOOK3S_32 and 
> CONFIG_PPC_STD_MMU_32 being synonyms).
> I think it is a lot cleaner when we can avoid synonyms.
>
> By the way I already dropped CONFIG_4xx in previous patch (8/11). It was 
> not many 4xx changed to 44x. It would be a lot more in the other way 
> round I'm afraid.
>
> But I agree with you it might be more natural to change to 4xx.
>
> Michael, any preference ?

I'd say just use 44x, we've had the inconsistency of 476 living in
platforms/44x, and it hasn't really led to much confusion.

I think for most folks they see 4xx/44x and just think "some 32-bit
embedded thing", so the precise distinction between 4xx, 44x, 476 etc.
is not important enough to justify renaming the symbol everywhere.

cheers


Re: [PATCH v2 07/11] powerpc/xmon: Remove PPC403 and PPC405

2020-03-31 Thread Michael Ellerman
Arnd Bergmann  writes:
> On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
>  wrote:
>>
>> xmon has special support for PPC403 and PPC405 which were part
>> of 40x platforms.
>>
>> 40x platforms are gone, remove support of PPC403 and PPC405 in xmon.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>  arch/powerpc/xmon/ppc-opc.c | 277 +++-
>>  arch/powerpc/xmon/ppc.h |   6 -
>
> These files are from binutils, and may get synchronized with changes there
> in the future. I'd suggest leaving the code in here for now and instead 
> removing
> it from the binutils version first, if they are ready to drop it, too.

Yes those files are almost direct copies of the binutils versions, and
we'd like to keep it that way to ease future synchronisation of changes.

cheers


Re: [PATCH v2 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Michael Ellerman
Michal Simek  writes:
> Hi,
>
> recently we wanted to update xilinx intc driver and we found that function
> which we wanted to remove is still wired by ancient Xilinx PowerPC
> platforms. Here is the thread about it.
> https://lore.kernel.org/linux-next/48d3232d-0f1d-42ea-3109-f44bbabfa...@xilinx.com/
>
> I have been talking about it internally and there is no interest in these
> platforms and it has also been orphaned for quite a long time. Nobody is
> really running/testing these platforms regularly, that's why I think it makes
> sense
> to remove them also with drivers which are specific to this platform.
>
> U-Boot support was removed in 2017 without anybody complaining about it
> https://github.com/Xilinx/u-boot-xlnx/commit/98f705c9cefdfdba62c069821bbba10273a0a8ed
>
> Based on current ppc/next.
>
> If anyone has any objection about it, please let me know.

Thanks for taking the time to find all this code and remove it.

I'm not going to take this series for v5.7, it was posted too close to
the merge window, and doing so wouldn't give people much time to object,
especially given people are distracted at the moment.

I'm happy to take it for v5.8, assuming there's no major objections.

cheers


[PATCH] Fix "[v3, 12/32] powerpc/64s/exception: move KVM test to common code"

2020-03-31 Thread Nicholas Piggin
Moving KVM test to the common entry code missed the case of HMI and MCE,
which do not do __GEN_COMMON_ENTRY (because they don't want to switch
to virt mode).

This means a MCE or HMI exception that is taken while KVM is running a
guest context will not be switched out of that context, and KVM won't
be notified. Found by running sigfuz in guest with patched host on
POWER9 DD2.3, which causes some TM related HMI interrupts (which are
expected and supposed to be handled by KVM).

This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
the KVM test. This makes them look a little more like other handlers
that all use __GEN_COMMON_ENTRY.

Conflicts with later patches in series:
- powerpc/64s/exception: remove confusing IEARLY option
  Fix: Remove mfspr (H)SRR, keep __GEN_REALMODE_COMMON_ENTRY

- powerpc/64s/exception: trim unused arguments from KVMTEST macro
  Fix: Trim IHSRR IVEC args from the KVMTEST in __GEN_REALMODE_COMMON_ENTRY

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 74809f1b521d..1bc73acceb9a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -451,7 +451,9 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 
 /*
  * __GEN_COMMON_ENTRY is required to receive the branch from interrupt
- * entry, except in the case of the IEARLY handlers.
+ * entry, except in the case of the real-mode handlers which require
+ * __GEN_REALMODE_COMMON_ENTRY.
+ *
  * This switches to virtual mode and sets MSR[RI].
  */
 .macro __GEN_COMMON_ENTRY name
@@ -487,6 +489,18 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
.endif /* IVIRT */
 .endm
 
+/*
+ * Don't switch to virt mode. Used for early MCE and HMI handlers that
+ * want to run in real mode.
+ */
+.macro __GEN_REALMODE_COMMON_ENTRY name
+DEFINE_FIXED_SYMBOL(\name\()_common_real)
+\name\()_common_real:
+   .if IKVM_REAL
+   KVMTEST \name IHSRR IVEC
+   .endif
+.endm
+
 .macro __GEN_COMMON_BODY name
.if IMASK
lbz r10,PACAIRQSOFTMASK(r13)
@@ -976,6 +990,8 @@ EXC_COMMON_BEGIN(machine_check_early_common)
mfspr   r11,SPRN_SRR0
mfspr   r12,SPRN_SRR1
 
+   __GEN_REALMODE_COMMON_ENTRY machine_check_early
+
/*
 * Switch to mc_emergency stack and handle re-entrancy (we limit
 * the nested MCE upto level 4 to avoid stack overflow).
@@ -1831,6 +1847,9 @@ EXC_VIRT_NONE(0x4e60, 0x20)
 EXC_COMMON_BEGIN(hmi_exception_early_common)
mfspr   r11,SPRN_HSRR0  /* Save HSRR0 */
mfspr   r12,SPRN_HSRR1  /* Save HSRR1 */
+
+   __GEN_REALMODE_COMMON_ENTRY hmi_exception_early
+
mr  r10,r1  /* Save r1 */
ld  r1,PACAEMERGSP(r13) /* Use emergency stack for realmode */
subir1,r1,INT_FRAME_SIZE/* alloc stack frame*/
-- 
2.23.0



[PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-03-31 Thread Leonardo Bras
During a crash, there is a chance that the cpus that handle the NMI IPI
are holding a spin_lock. If this spin_lock is needed by crashing_cpu it
will cause a deadlock. (rtas.lock and printk logbuf_lock as of today)

This is a problem if the system has kdump set up, given that if it
crashes for any reason the dump may not be saved for crash analysis.

After NMI IPI is sent to all other cpus, force unlock all spinlocks
needed for finishing crash routine.

Signed-off-by: Leonardo Bras 

---
Changes from v2:
- Instead of skipping spinlocks, unlock the needed ones.

Changes from v1:
- Exported variable
---
 arch/powerpc/kexec/crash.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index d488311efab1..8d63fca3242c 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * The primary CPU waits a while for all secondary CPUs to enter. This is to
@@ -49,6 +50,8 @@ static int time_to_dump;
  */
 int crash_wake_offline;
 
+extern raw_spinlock_t logbuf_lock;
+
 #define CRASH_HANDLER_MAX 3
 /* List of shutdown handles */
 static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX];
@@ -129,6 +132,13 @@ static void crash_kexec_prepare_cpus(int cpu)
/* Would it be better to replace the trap vector here? */
 
if (atomic_read(&cpus_in_crash) >= ncpus) {
+   /*
+* At this point no other CPU is running, and some of them may
+* have been interrupted while holding one of the locks needed
+* to complete crashing. Free them so there is no deadlock.
+*/
+   arch_spin_unlock(&logbuf_lock.raw_lock);
+   arch_spin_unlock(&rtas.lock);
printk(KERN_EMERG "IPI complete\n");
return;
}
-- 
2.25.1



Re: [PATCH v2] powerpc/XIVE: SVM: share the event-queue page with the Hypervisor.

2020-03-31 Thread Thiago Jung Bauermann


Hi Ram,

Ram Pai  writes:

> diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> b/arch/powerpc/sysdev/xive/spapr.c
> index 55dc61c..608b52f 100644
> --- a/arch/powerpc/sysdev/xive/spapr.c
> +++ b/arch/powerpc/sysdev/xive/spapr.c
> @@ -26,6 +26,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "xive-internal.h"
>  
> @@ -501,6 +503,9 @@ static int xive_spapr_configure_queue(u32 target, struct 
> xive_q *q, u8 prio,
>   rc = -EIO;
>   } else {
>   q->qpage = qpage;
> + if (is_secure_guest())
> + uv_share_page(PHYS_PFN(qpage_phys),
> + 1 << xive_alloc_order(order));

If I understand this correctly, you're passing the number of bytes of
the queue to uv_share_page(), but that ultracall expects the number of
pages to be shared.

>   }
>  fail:
>   return rc;
> @@ -534,6 +539,8 @@ static void xive_spapr_cleanup_queue(unsigned int cpu, 
> struct xive_cpu *xc,
>  hw_cpu, prio);
>  
>   alloc_order = xive_alloc_order(xive_queue_shift);
> + if (is_secure_guest())
> + uv_unshare_page(PHYS_PFN(__pa(q->qpage)), 1 << alloc_order);
>   free_pages((unsigned long)q->qpage, alloc_order);
>   q->qpage = NULL;
>  }

Same problem here.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


[PATCH v2] powerpc: Add new HWCAP bits

2020-03-31 Thread Alistair Popple
Two new future architectural features requiring HWCAP bits are being
developed. Once allocated in the kernel, firmware can enable these via
device tree cpu features.

Signed-off-by: Alistair Popple 

---
v2: ISA v3.10 -> ISA v3.1
---
 arch/powerpc/include/uapi/asm/cputable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/cputable.h 
b/arch/powerpc/include/uapi/asm/cputable.h
index 540592034740..2692a56bf20b 100644
--- a/arch/powerpc/include/uapi/asm/cputable.h
+++ b/arch/powerpc/include/uapi/asm/cputable.h
@@ -50,6 +50,8 @@
 #define PPC_FEATURE2_DARN		0x00200000 /* darn random number insn */
 #define PPC_FEATURE2_SCV		0x00100000 /* scv syscall */
 #define PPC_FEATURE2_HTM_NO_SUSPEND	0x00080000 /* TM w/out suspended state */
+#define PPC_FEATURE2_ARCH_3_1		0x00040000 /* ISA 3.1 */
+#define PPC_FEATURE2_MMA		0x00020000 /* Matrix Multiply Accumulate */

 /*
  * IMPORTANT!
--
2.20.1


Re: [PATCH] powerpc: Add new HWCAP bits

2020-03-31 Thread Alistair Popple
On Wednesday, 1 April 2020 9:47:03 AM AEDT Michael Neuling wrote:
> On Tue, 2020-03-31 at 12:12 -0300, Tulio Magno Quites Machado Filho wrote:
> > Alistair Popple  writes:
> > > diff --git a/arch/powerpc/include/uapi/asm/cputable.h
> > > b/arch/powerpc/include/uapi/asm/cputable.h
> > > index 540592034740..c6fe10b2 100644
> > > --- a/arch/powerpc/include/uapi/asm/cputable.h
> > > +++ b/arch/powerpc/include/uapi/asm/cputable.h
> > > @@ -50,6 +50,8 @@
> > > 
> > >  #define PPC_FEATURE2_DARN		0x00200000 /* darn random number insn */
> > >  #define PPC_FEATURE2_SCV		0x00100000 /* scv syscall */
> > >  #define PPC_FEATURE2_HTM_NO_SUSPEND	0x00080000 /* TM w/out suspended state */
> > > +#define PPC_FEATURE2_ARCH_3_10	0x00040000 /* ISA 3.10 */
> > 
> > I think this should have been:
> > 
> > #define PPC_FEATURE2_ARCH_3_1	0x00040000 /* ISA 3.1 */
> 
> Agreed. That's the new name.
> 
> Sorry Al I should have caught that earlier.

No worries, I missed it too. Will send v2.

- Alistair

> Mikey


Re: [PATCH] powerpc: Add new HWCAP bits

2020-03-31 Thread Michael Neuling
On Tue, 2020-03-31 at 12:12 -0300, Tulio Magno Quites Machado Filho wrote:
> Alistair Popple  writes:
> 
> > diff --git a/arch/powerpc/include/uapi/asm/cputable.h
> > b/arch/powerpc/include/uapi/asm/cputable.h
> > index 540592034740..c6fe10b2 100644
> > --- a/arch/powerpc/include/uapi/asm/cputable.h
> > +++ b/arch/powerpc/include/uapi/asm/cputable.h
> > @@ -50,6 +50,8 @@
> >  #define PPC_FEATURE2_DARN		0x00200000 /* darn random number insn */
> >  #define PPC_FEATURE2_SCV		0x00100000 /* scv syscall */
> >  #define PPC_FEATURE2_HTM_NO_SUSPEND	0x00080000 /* TM w/out suspended state */
> > +#define PPC_FEATURE2_ARCH_3_10	0x00040000 /* ISA 3.10 */
> 
> I think this should have been:
> 
> #define PPC_FEATURE2_ARCH_3_1	0x00040000 /* ISA 3.1 */

Agreed. That's the new name.

Sorry Al I should have caught that earlier.

Mikey


[PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP (was: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA)

2020-03-31 Thread Mike Rapoport
On Mon, Mar 30, 2020 at 11:58:43AM +0200, Michal Hocko wrote:
> 
> What would it take to make ia64 use HAVE_MEMBLOCK_NODE_MAP? I would
> really love to see that thing go away. It is causing problems when
> people try to use memblock api.

Well, it's a small patch in the end :)

Currently, all NUMA architectures enable
CONFIG_HAVE_MEMBLOCK_NODE_MAP and use free_area_init_nodes() to initialize
nodes and zones structures.

On the other hand, the systems that don't have
CONFIG_HAVE_MEMBLOCK_NODE_MAP use free_area_init() or free_area_init_node()
for this purpose.

With these assumptions, it's possible to select the functions that
calculate spanned and absent pages at runtime.

This patch builds for arm and x86-64 and boots on qemu-system for both.

From f907df987db4d6735c4940b30cfb4764fc0007d4 Mon Sep 17 00:00:00 2001
From: Mike Rapoport 
Date: Wed, 1 Apr 2020 00:27:17 +0300
Subject: [PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option

The CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate initialization
of nodes and zones structures between the systems that have region to node
mapping in memblock and those that don't.

Currently all the NUMA architectures enable this option and for the
non-NUMA systems we can presume that all the memory belongs to node 0 and
therefore the compile time configuration option is not required.

Still, free_area_init_node() must have a backward compatible version
because its semantics with and without CONFIG_HAVE_MEMBLOCK_NODE_MAP is
different. Once all the architectures are converted from
free_area_init() and free_area_init_node() to free_area_init_nodes(), the
entire compatibility layer can be dropped.

Signed-off-by: Mike Rapoport 
---
 include/linux/memblock.h |  15 --
 include/linux/mm.h   |  16 +-
 include/linux/mmzone.h   |  13 -
 mm/memblock.c|   9 +---
 mm/memory_hotplug.c  |   4 --
 mm/page_alloc.c  | 103 ++-
 6 files changed, 61 insertions(+), 99 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 079d17d96410..9de81112447e 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -50,9 +50,7 @@ struct memblock_region {
phys_addr_t base;
phys_addr_t size;
enum memblock_flags flags;
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
int nid;
-#endif
 };
 
 /**
@@ -215,7 +213,6 @@ static inline bool memblock_is_nomap(struct memblock_region 
*m)
return m->flags & MEMBLOCK_NOMAP;
 }
 
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
unsigned long  *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
@@ -234,7 +231,6 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long 
*out_start_pfn,
 #define for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid)  \
	for (i = -1, __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid); \
	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
@@ -310,7 +306,6 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone 
*zone,
	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved, \
   nid, flags, p_start, p_end, p_nid)
 
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_set_node(phys_addr_t base, phys_addr_t size,
  struct memblock_type *type, int nid);
 
@@ -323,16 +318,6 @@ static inline int memblock_get_region_node(const struct 
memblock_region *r)
 {
return r->nid;
 }
-#else
-static inline void memblock_set_region_node(struct memblock_region *r, int nid)
-{
-}
-
-static inline int memblock_get_region_node(const struct memblock_region *r)
-{
-   return 0;
-}
-#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /* Flags for memblock allocation APIs */
 #define MEMBLOCK_ALLOC_ANYWHERE(~(phys_addr_t)0)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c54fb96cb1e6..368a45d4696a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2125,9 +2125,8 @@ static inline unsigned long get_num_physpages(void)
return phys_pages;
 }
 
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 /*
- * With CONFIG_HAVE_MEMBLOCK_NODE_MAP set, an architecture may initialise its
+ * Using memblock node mappings, an architecture may initialise its
  * zones, allocate the backing mem_map and account for memory holes in a more
  * architecture independent manner. This is a substitute for creating the
  * zone_sizes[] and zholes_size[] arrays and passing them to
@@ -2148,9 +2147,6 @@ static inline unsigned long get_num_physpages(void)
  * registered physical page range.  Similarly
  * sparse_memory_present_with_active_regions() calls memory_present() for
  * each range when SPARSEMEM is enabled.
- *
- * See mm/page_alloc.c 

Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Arnd Bergmann
On Tue, Mar 31, 2020 at 6:19 PM Christophe Leroy
 wrote:
> Le 31/03/2020 à 18:04, Arnd Bergmann a écrit :
> > That has the risk of breaking user's defconfig files, but given the
> > small number of users, it may be nicer for consistency. In either
> > case, the two symbols should probably hang around as synonyms,
> > the question is just which one is user visible.
> >
>
> Not sure it is a good idea to keep two synonyms. In the past we made our
> best to remove synonyms (We had CONFIG_8xx and CONFIG_PPC_8xx being
> synonyms, we had CONFIG_6xx and CONFIG_BOOK3S_32 and
> CONFIG_PPC_STD_MMU_32 being synonyms).
> I think it is a lot cleaner when we can avoid synonyms.

Ok, fair enough.

> By the way I already dropped CONFIG_4xx in previous patch (8/11). It was
> not many 4xx changed to 44x. It would be a lot more in the other way
> round I'm afraid.

Right. Maybe stay with 44x for both then (as in your current patches), as it
means changing less in a part of the code that has few users anyway.

  Arnd


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Segher Boessenkool
On Tue, Mar 31, 2020 at 08:56:23AM +0200, Christophe Leroy wrote:
> While we are at it, can we also remove the 601 ? This one is also full 
> of workarounds and diverges a bit from other 6xx.
> 
> I'm unable to find its end of life date, but it was on the market in 
> 1994, so I guess it must be outdated by more than 10-15 yr old now ?

There probably are still some people running Linux on 601 powermacs.


Segher


Re: [PATCH v2 10/12] powerpc/entry32: Blacklist exception entry points for kprobe.

2020-03-31 Thread Naveen N. Rao

Christophe Leroy wrote:

kprobe does not handle events happening in real mode.

As exception entry points are running with MMU disabled,
blacklist them.

The handling of TLF_NAPPING and TLF_SLEEPING is moved before the
CONFIG_TRACE_IRQFLAGS which contains 'reenable_mmu' because from there
kprobe will be possible as the kernel will run with MMU enabled.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
v2: Moved TLF_NAPPING and TLF_SLEEPING handling
---
 arch/powerpc/kernel/entry_32.S | 37 --
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 94f78c03cb79..215aa3a6d4f7 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -51,6 +51,7 @@ mcheck_transfer_to_handler:
mfspr   r0,SPRN_DSRR1
stw r0,_DSRR1(r11)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(mcheck_transfer_to_handler)

.globl  debug_transfer_to_handler
 debug_transfer_to_handler:
@@ -59,6 +60,7 @@ debug_transfer_to_handler:
mfspr   r0,SPRN_CSRR1
stw r0,_CSRR1(r11)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(debug_transfer_to_handler)

.globl  crit_transfer_to_handler
 crit_transfer_to_handler:
@@ -94,6 +96,7 @@ crit_transfer_to_handler:
rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(crit_transfer_to_handler)
 #endif

 #ifdef CONFIG_40x
@@ -115,6 +118,7 @@ crit_transfer_to_handler:
rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(crit_transfer_to_handler)
 #endif

 /*
@@ -127,6 +131,7 @@ crit_transfer_to_handler:
.globl  transfer_to_handler_full
 transfer_to_handler_full:
SAVE_NVGPRS(r11)
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler_full)
/* fall through */

.globl  transfer_to_handler
@@ -227,6 +232,23 @@ transfer_to_handler_cont:
SYNC
RFI /* jump to handler, enable MMU */

+#if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
+4: rlwinm  r12,r12,0,~_TLF_NAPPING
+   stw r12,TI_LOCAL_FLAGS(r2)
+   b   power_save_ppc32_restore
+
+7: rlwinm  r12,r12,0,~_TLF_SLEEPING
+   stw r12,TI_LOCAL_FLAGS(r2)
+   lwz r9,_MSR(r11)/* if sleeping, clear MSR.EE */
+   rlwinm  r9,r9,0,~MSR_EE
+   lwz r12,_LINK(r11)  /* and return to address in LR */
+   kuap_restore r11, r2, r3, r4, r5
+   lwz r2, GPR2(r11)
+   b   fast_exception_return
+#endif
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler)
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont)
+


A very minor nit is that the above NOKPROBE annotation actually covers 
the block of code below between the label '1:' till 'reenable_mmu', but 
isn't obvious from the code. Splitting off 'reenable_mmu' would have 
made that clear.


You don't have to fix that though -- a kprobe still won't be allowed 
there and anyone interested should be able to look up this mail chain.



- Naveen



Re: [PATCH] powerpc/44x: Make AKEBONO depends on NET

2020-03-31 Thread Christoph Hellwig
Why would a board select a network driver?  That is what defconfig
files are for!  I think the select should just go away.


Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Christophe Leroy

Le 31/03/2020 à 18:04, Arnd Bergmann a écrit :

On Tue, Mar 31, 2020 at 5:26 PM Christophe Leroy
 wrote:

Le 31/03/2020 à 17:14, Arnd Bergmann a écrit :

On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
 wrote:


Only 44x uses 4xx now, so only keep one directory.

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/platforms/44x/Makefile   |  9 +++-
   arch/powerpc/platforms/{4xx => 44x}/cpm.c |  0


No objections to moving everything into one place, but I wonder if the
combined name should be 4xx instead of 44x, given that 44x currently
include 46x and 47x. OTOH your approach has the advantage of
moving fewer files.



In that case, should we also rename CONFIG_44x to CONFIG_4xx ?


That has the risk of breaking user's defconfig files, but given the
small number of users, it may be nicer for consistency. In either
case, the two symbols should probably hang around as synonyms,
the question is just which one is user visible.



Not sure it is a good idea to keep two synonyms. In the past we made our 
best to remove synonyms (We had CONFIG_8xx and CONFIG_PPC_8xx being 
synonyms, we had CONFIG_6xx and CONFIG_BOOK3S_32 and 
CONFIG_PPC_STD_MMU_32 being synonyms).

I think it is a lot cleaner when we can avoid synonyms.

By the way I already dropped CONFIG_4xx in previous patch (8/11). It was 
not many 4xx changed to 44x. It would be a lot more in the other way 
round I'm afraid.


But I agree with you it might be more natural to change to 4xx.

Michael, any preference ?

Christophe


Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Arnd Bergmann
On Tue, Mar 31, 2020 at 5:26 PM Christophe Leroy
 wrote:
> Le 31/03/2020 à 17:14, Arnd Bergmann a écrit :
> > On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
> >  wrote:
> >>
> >> Only 44x uses 4xx now, so only keep one directory.
> >>
> >> Signed-off-by: Christophe Leroy 
> >> ---
> >>   arch/powerpc/platforms/44x/Makefile   |  9 +++-
> >>   arch/powerpc/platforms/{4xx => 44x}/cpm.c |  0
> >
> > No objections to moving everything into one place, but I wonder if the
> > combined name should be 4xx instead of 44x, given that 44x currently
> > include 46x and 47x. OTOH your approach has the advantage of
> > moving fewer files.
> >
>
> In that case, should we also rename CONFIG_44x to CONFIG_4xx ?

That has the risk of breaking user's defconfig files, but given the
small number of users, it may be nicer for consistency. In either
case, the two symbols should probably hang around as synonyms,
the question is just which one is user visible.

   Arnd


[PATCH v2 12/12] powerpc/entry32: Blacklist exception exit points for kprobe.

2020-03-31 Thread Christophe Leroy
kprobe does not handle events happening in real mode.

The very last part of exception exits cannot support a trap.
Blacklist them from kprobe.

While we are at it, remove exc_exit_start symbol which is not
used to avoid having to blacklist it.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/kernel/entry_32.S | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 577d17fe0d94..02c81192ba52 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -810,6 +810,7 @@ fast_exception_return:
lwz r11,GPR11(r11)
SYNC
RFI
+_ASM_NOKPROBE_SYMBOL(fast_exception_return)
 
 #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
 /* check if the exception happened in a restartable section */
@@ -1049,6 +1050,8 @@ exc_exit_restart:
 exc_exit_restart_end:
SYNC
RFI
+_ASM_NOKPROBE_SYMBOL(exc_exit_restart)
+_ASM_NOKPROBE_SYMBOL(exc_exit_restart_end)
 
 #else /* !(CONFIG_4xx || CONFIG_BOOKE) */
/*
@@ -1070,7 +1073,6 @@ exc_exit_restart_end:
 exc_exit_restart:
lwz r11,_NIP(r1)
lwz r12,_MSR(r1)
-exc_exit_start:
mtspr   SPRN_SRR0,r11
mtspr   SPRN_SRR1,r12
REST_2GPRS(11, r1)
@@ -1080,6 +1082,7 @@ exc_exit_restart_end:
PPC405_ERR77_SYNC
rfi
b   .   /* prevent prefetch past rfi */
+_ASM_NOKPROBE_SYMBOL(exc_exit_restart)
 
 /*
  * Returning from a critical interrupt in user mode doesn't need
@@ -1193,6 +1196,7 @@ ret_from_crit_exc:
mtspr   SPRN_SRR0,r9;
mtspr   SPRN_SRR1,r10;
RET_FROM_EXC_LEVEL(SPRN_CSRR0, SPRN_CSRR1, PPC_RFCI)
+_ASM_NOKPROBE_SYMBOL(ret_from_crit_exc)
 #endif /* CONFIG_40x */
 
 #ifdef CONFIG_BOOKE
@@ -1204,6 +1208,7 @@ ret_from_crit_exc:
RESTORE_xSRR(SRR0,SRR1);
RESTORE_MMU_REGS;
RET_FROM_EXC_LEVEL(SPRN_CSRR0, SPRN_CSRR1, PPC_RFCI)
+_ASM_NOKPROBE_SYMBOL(ret_from_crit_exc)
 
.globl  ret_from_debug_exc
 ret_from_debug_exc:
@@ -1214,6 +1219,7 @@ ret_from_debug_exc:
RESTORE_xSRR(CSRR0,CSRR1);
RESTORE_MMU_REGS;
RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
+_ASM_NOKPROBE_SYMBOL(ret_from_debug_exc)
 
.globl  ret_from_mcheck_exc
 ret_from_mcheck_exc:
@@ -1225,6 +1231,7 @@ ret_from_mcheck_exc:
RESTORE_xSRR(DSRR0,DSRR1);
RESTORE_MMU_REGS;
RET_FROM_EXC_LEVEL(SPRN_MCSRR0, SPRN_MCSRR1, PPC_RFMCI)
+_ASM_NOKPROBE_SYMBOL(ret_from_mcheck_exc)
 #endif /* CONFIG_BOOKE */
 
 /*
-- 
2.25.0



[PATCH v2 11/12] powerpc/entry32: Blacklist syscall exit points for kprobe.

2020-03-31 Thread Christophe Leroy
kprobe does not handle events happening in real mode.

The very last part of syscall cannot support a trap.
Add a symbol syscall_exit_finish to identify that part and
blacklist it from kprobe.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/kernel/entry_32.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 215aa3a6d4f7..577d17fe0d94 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -463,6 +463,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
lwz r7,_NIP(r1)
lwz r2,GPR2(r1)
lwz r1,GPR1(r1)
+syscall_exit_finish:
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PERF_EVENTS)
mtspr   SPRN_NRI, r0
 #endif
@@ -470,6 +471,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_NEED_PAIRED_STWCX)
mtspr   SPRN_SRR1,r8
SYNC
RFI
+_ASM_NOKPROBE_SYMBOL(syscall_exit_finish)
 #ifdef CONFIG_44x
 2: li  r7,0
iccci   r0,r0
@@ -604,6 +606,7 @@ ret_from_kernel_syscall:
mtspr   SPRN_SRR1, r10
SYNC
RFI
+_ASM_NOKPROBE_SYMBOL(ret_from_kernel_syscall)
 
 /*
  * The fork/clone functions need to copy the full register set into
-- 
2.25.0



[PATCH v2 10/12] powerpc/entry32: Blacklist exception entry points for kprobe.

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode.

As exception entry points run with the MMU disabled,
blacklist them.

The handling of TLF_NAPPING and TLF_SLEEPING is moved before the
CONFIG_TRACE_IRQFLAGS section, which contains 'reenable_mmu', because
from there kprobes become possible as the kernel runs with the MMU
enabled.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
v2: Moved TLF_NAPPING and TLF_SLEEPING handling
---
 arch/powerpc/kernel/entry_32.S | 37 --
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 94f78c03cb79..215aa3a6d4f7 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -51,6 +51,7 @@ mcheck_transfer_to_handler:
mfspr   r0,SPRN_DSRR1
stw r0,_DSRR1(r11)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(mcheck_transfer_to_handler)
 
.globl  debug_transfer_to_handler
 debug_transfer_to_handler:
@@ -59,6 +60,7 @@ debug_transfer_to_handler:
mfspr   r0,SPRN_CSRR1
stw r0,_CSRR1(r11)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(debug_transfer_to_handler)
 
.globl  crit_transfer_to_handler
 crit_transfer_to_handler:
@@ -94,6 +96,7 @@ crit_transfer_to_handler:
rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(crit_transfer_to_handler)
 #endif
 
 #ifdef CONFIG_40x
@@ -115,6 +118,7 @@ crit_transfer_to_handler:
rlwinm  r0,r1,0,0,(31 - THREAD_SHIFT)
stw r0,KSP_LIMIT(r8)
/* fall through */
+_ASM_NOKPROBE_SYMBOL(crit_transfer_to_handler)
 #endif
 
 /*
@@ -127,6 +131,7 @@ crit_transfer_to_handler:
.globl  transfer_to_handler_full
 transfer_to_handler_full:
SAVE_NVGPRS(r11)
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler_full)
/* fall through */
 
.globl  transfer_to_handler
@@ -227,6 +232,23 @@ transfer_to_handler_cont:
SYNC
RFI /* jump to handler, enable MMU */
 
+#if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
+4: rlwinm  r12,r12,0,~_TLF_NAPPING
+   stw r12,TI_LOCAL_FLAGS(r2)
+   b   power_save_ppc32_restore
+
+7: rlwinm  r12,r12,0,~_TLF_SLEEPING
+   stw r12,TI_LOCAL_FLAGS(r2)
+   lwz r9,_MSR(r11)/* if sleeping, clear MSR.EE */
+   rlwinm  r9,r9,0,~MSR_EE
+   lwz r12,_LINK(r11)  /* and return to address in LR */
+   kuap_restore r11, r2, r3, r4, r5
+   lwz r2, GPR2(r11)
+   b   fast_exception_return
+#endif
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler)
+_ASM_NOKPROBE_SYMBOL(transfer_to_handler_cont)
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 1: /* MSR is changing, re-enable MMU so we can notify lockdep. We need to
 * keep interrupts disabled at this point otherwise we might risk
@@ -272,21 +294,6 @@ reenable_mmu:
bctr/* jump to handler */
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
-#if defined (CONFIG_PPC_BOOK3S_32) || defined(CONFIG_E500)
-4: rlwinm  r12,r12,0,~_TLF_NAPPING
-   stw r12,TI_LOCAL_FLAGS(r2)
-   b   power_save_ppc32_restore
-
-7: rlwinm  r12,r12,0,~_TLF_SLEEPING
-   stw r12,TI_LOCAL_FLAGS(r2)
-   lwz r9,_MSR(r11)/* if sleeping, clear MSR.EE */
-   rlwinm  r9,r9,0,~MSR_EE
-   lwz r12,_LINK(r11)  /* and return to address in LR */
-   kuap_restore r11, r2, r3, r4, r5
-   lwz r2, GPR2(r11)
-   b   fast_exception_return
-#endif
-
 #ifndef CONFIG_VMAP_STACK
 /*
  * On kernel stack overflow, load up an initial stack pointer
-- 
2.25.0



[PATCH v2 09/12] powerpc/32: Blacklist functions running with MMU disabled for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/kernel/cpu_setup_6xx.S | 2 ++
 arch/powerpc/kernel/entry_32.S  | 3 +++
 arch/powerpc/kernel/fpu.S   | 1 +
 arch/powerpc/kernel/idle_6xx.S  | 1 +
 arch/powerpc/kernel/idle_e500.S | 1 +
 arch/powerpc/kernel/l2cr_6xx.S  | 1 +
 arch/powerpc/kernel/misc.S  | 2 ++
 arch/powerpc/kernel/misc_32.S   | 2 ++
 arch/powerpc/kernel/swsusp_32.S | 2 ++
 arch/powerpc/kernel/vector.S| 1 +
 10 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S
index f6517f67265a..f8b5ff64b604 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -288,6 +288,7 @@ _GLOBAL(__init_fpu_registers)
mtmsr   r10
isync
blr
+_ASM_NOKPROBE_SYMBOL(__init_fpu_registers)
 
 
 /* Definitions for the table use to save CPU states */
@@ -483,4 +484,5 @@ _GLOBAL(__restore_cpu_setup)
 1:
mtcrr7
blr
+_ASM_NOKPROBE_SYMBOL(__restore_cpu_setup)
 
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index e652f6506888..94f78c03cb79 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -313,6 +313,7 @@ stack_ovf:
mtspr   SPRN_SRR1,r10
SYNC
RFI
+_ASM_NOKPROBE_SYMBOL(stack_ovf)
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -1337,6 +1338,7 @@ nonrecoverable:
bl  unrecoverable_exception
/* shouldn't return */
b   4b
+_ASM_NOKPROBE_SYMBOL(nonrecoverable)
 
.section .bss
.align  2
@@ -1391,4 +1393,5 @@ _GLOBAL(enter_rtas)
mtspr   SPRN_SRR0,r8
mtspr   SPRN_SRR1,r9
RFI /* return to caller */
+_ASM_NOKPROBE_SYMBOL(enter_rtas)
 #endif /* CONFIG_PPC_RTAS */
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 3235a8da6af7..1dfccf58fbb1 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -119,6 +119,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
/* restore registers and return */
/* we haven't used ctr or xer or lr */
blr
+_ASM_NOKPROBE_SYMBOL(load_up_fpu)
 
 /*
  * save_fpu(tsk)
diff --git a/arch/powerpc/kernel/idle_6xx.S b/arch/powerpc/kernel/idle_6xx.S
index 433d97bea1f3..69df840f7253 100644
--- a/arch/powerpc/kernel/idle_6xx.S
+++ b/arch/powerpc/kernel/idle_6xx.S
@@ -187,6 +187,7 @@ BEGIN_FTR_SECTION
mtspr   SPRN_HID1, r9
 END_FTR_SECTION_IFSET(CPU_FTR_DUAL_PLL_750FX)
b   transfer_to_handler_cont
+_ASM_NOKPROBE_SYMBOL(power_save_ppc32_restore)
 
.data
 
diff --git a/arch/powerpc/kernel/idle_e500.S b/arch/powerpc/kernel/idle_e500.S
index 308f499e146c..72c85b6f3898 100644
--- a/arch/powerpc/kernel/idle_e500.S
+++ b/arch/powerpc/kernel/idle_e500.S
@@ -90,3 +90,4 @@ _GLOBAL(power_save_ppc32_restore)
 #endif
 
b   transfer_to_handler_cont
+_ASM_NOKPROBE_SYMBOL(power_save_ppc32_restore)
diff --git a/arch/powerpc/kernel/l2cr_6xx.S b/arch/powerpc/kernel/l2cr_6xx.S
index 2020d255585f..5f07aa5e9851 100644
--- a/arch/powerpc/kernel/l2cr_6xx.S
+++ b/arch/powerpc/kernel/l2cr_6xx.S
@@ -455,5 +455,6 @@ _GLOBAL(__inval_enable_L1)
sync
 
blr
+_ASM_NOKPROBE_SYMBOL(__inval_enable_L1)
 
 
diff --git a/arch/powerpc/kernel/misc.S b/arch/powerpc/kernel/misc.S
index 65f9f731c229..5be96feccb55 100644
--- a/arch/powerpc/kernel/misc.S
+++ b/arch/powerpc/kernel/misc.S
@@ -36,6 +36,8 @@ _GLOBAL(add_reloc_offset)
add r3,r3,r5
mtlrr0
blr
+_ASM_NOKPROBE_SYMBOL(reloc_offset)
+_ASM_NOKPROBE_SYMBOL(add_reloc_offset)
 
.align  3
 2: PPC_LONG 1b
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index d80212be8698..1edcc41e15fc 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -246,6 +246,7 @@ _GLOBAL(real_readb)
sync
isync
blr
+_ASM_NOKPROBE_SYMBOL(real_readb)
 
/*
  * Do an IO access in real mode
@@ -263,6 +264,7 @@ _GLOBAL(real_writeb)
sync
isync
blr
+_ASM_NOKPROBE_SYMBOL(real_writeb)
 
 #endif /* CONFIG_40x */
 
diff --git a/arch/powerpc/kernel/swsusp_32.S b/arch/powerpc/kernel/swsusp_32.S
index cbdf86228eaa..f73f4d72fea4 100644
--- a/arch/powerpc/kernel/swsusp_32.S
+++ b/arch/powerpc/kernel/swsusp_32.S
@@ -395,6 +395,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
 
li  r3,0
blr
+_ASM_NOKPROBE_SYMBOL(swsusp_arch_resume)
 
 /* FIXME:This construct is actually not useful since we don't shut
  * down the instruction MMU, we could just flip back MSR-DR on.
@@ -406,4 +407,5 @@ turn_on_mmu:
sync
isync
rfi
+_ASM_NOKPROBE_SYMBOL(turn_on_mmu)
 
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 

[PATCH v2 08/12] powerpc/rtas: Remove machine_check_in_rtas()

2020-03-31 Thread Christophe Leroy
machine_check_in_rtas() is just a trap.

Do the trap directly in the machine check exception handler.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/kernel/entry_32.S | 6 --
 arch/powerpc/kernel/head_32.S  | 2 +-
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index a6371fb8f761..e652f6506888 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -1391,10 +1391,4 @@ _GLOBAL(enter_rtas)
mtspr   SPRN_SRR0,r8
mtspr   SPRN_SRR1,r9
RFI /* return to caller */
-
-   .globl  machine_check_in_rtas
-machine_check_in_rtas:
-   twi 31,0,0
-   /* XXX load up BATs and panic */
-
 #endif /* CONFIG_PPC_RTAS */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index daaa153950c2..cbd30cac2496 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -297,7 +297,7 @@ MachineCheck:
cmpwi   cr1, r4, 0
 #endif
beq cr1, machine_check_tramp
-   b   machine_check_in_rtas
+   twi 31, 0, 0
 #else
b   machine_check_tramp
 #endif
-- 
2.25.0



[PATCH v2 07/12] powerpc/32s: Blacklist functions running with MMU disabled for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/mm/book3s32/hash_low.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 2afa3fa2012d..f5f836477009 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -163,6 +163,7 @@ _GLOBAL(hash_page)
stw r0, (mmu_hash_lock - PAGE_OFFSET)@l(r8)
blr
 #endif /* CONFIG_SMP */
+_ASM_NOKPROBE_SYMBOL(hash_page)
 
 /*
  * Add an entry for a particular page to the hash table.
@@ -267,6 +268,7 @@ _GLOBAL(add_hash_page)
lwz r0,4(r1)
mtlrr0
blr
+_ASM_NOKPROBE_SYMBOL(add_hash_page)
 
 /*
  * This routine adds a hardware PTE to the hash table.
@@ -472,6 +474,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 
sync/* make sure pte updates get to memory */
blr
+_ASM_NOKPROBE_SYMBOL(create_hpte)
 
.section .bss
.align  2
@@ -628,6 +631,7 @@ _GLOBAL(flush_hash_pages)
isync
blr
 EXPORT_SYMBOL(flush_hash_pages)
+_ASM_NOKPROBE_SYMBOL(flush_hash_pages)
 
 /*
  * Flush an entry from the TLB
@@ -665,6 +669,7 @@ _GLOBAL(_tlbie)
sync
 #endif /* CONFIG_SMP */
blr
+_ASM_NOKPROBE_SYMBOL(_tlbie)
 
 /*
  * Flush the entire TLB. 603/603e only
@@ -706,3 +711,4 @@ _GLOBAL(_tlbia)
isync
 #endif /* CONFIG_SMP */
blr
+_ASM_NOKPROBE_SYMBOL(_tlbia)
-- 
2.25.0



[PATCH v2 06/12] powerpc/32s: Make local symbols non visible in hash_low.

2020-03-31 Thread Christophe Leroy
In hash_low.S, a lot of named local symbols are used instead of
numbers to ease code readability. However, they don't need to be
visible.

In order to ease blacklisting of functions running with the MMU
disabled for kprobes, rename the symbols to .Lsymbols so that they
are hidden as if they were numbered labels.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
v2: lisibility ==> readability
---
 arch/powerpc/mm/book3s32/hash_low.S | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S
index 6d236080cb1a..2afa3fa2012d 100644
--- a/arch/powerpc/mm/book3s32/hash_low.S
+++ b/arch/powerpc/mm/book3s32/hash_low.S
@@ -81,7 +81,7 @@ _GLOBAL(hash_page)
rlwinm. r8,r8,0,0,20/* extract pt base address */
 #endif
 #ifdef CONFIG_SMP
-   beq-hash_page_out   /* return if no mapping */
+   beq-.Lhash_page_out /* return if no mapping */
 #else
/* XXX it seems like the 601 will give a machine fault on the
   rfi if its alignment is wrong (bottom 4 bits of address are
@@ -109,11 +109,11 @@ _GLOBAL(hash_page)
 #if (PTE_FLAGS_OFFSET != 0)
addir8,r8,PTE_FLAGS_OFFSET
 #endif
-retry:
+.Lretry:
lwarx   r6,0,r8 /* get linux-style pte, flag word */
andc.   r5,r3,r6/* check access & ~permission */
 #ifdef CONFIG_SMP
-   bne-hash_page_out   /* return if access not permitted */
+   bne-.Lhash_page_out /* return if access not permitted */
 #else
bnelr-
 #endif
@@ -128,7 +128,7 @@ retry:
 #endif /* CONFIG_SMP */
 #endif /* CONFIG_PTE_64BIT */
stwcx.  r5,0,r8 /* attempt to update PTE */
-   bne-retry   /* retry if someone got there first */
+   bne-.Lretry /* retry if someone got there first */
 
mfsrin  r3,r4   /* get segment reg for segment */
 #ifndef CONFIG_VMAP_STACK
@@ -156,7 +156,7 @@ retry:
 #endif
 
 #ifdef CONFIG_SMP
-hash_page_out:
+.Lhash_page_out:
eieio
lis r8, (mmu_hash_lock - PAGE_OFFSET)@ha
li  r0,0
@@ -358,7 +358,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 1: LDPTEu  r6,HPTE_SIZE(r4)/* get next PTE */
CMPPTE  0,r6,r5
bdnzf   2,1b/* loop while ctr != 0 && !cr0.eq */
-   beq+found_slot
+   beq+.Lfound_slot
 
patch_site  0f, patch__hash_page_B
/* Search the secondary PTEG for a matching PTE */
@@ -370,7 +370,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 2: LDPTEu  r6,HPTE_SIZE(r4)
CMPPTE  0,r6,r5
bdnzf   2,2b
-   beq+found_slot
+   beq+.Lfound_slot
xorir5,r5,PTE_H /* clear H bit again */
 
/* Search the primary PTEG for an empty slot */
@@ -379,7 +379,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 1: LDPTEu  r6,HPTE_SIZE(r4)/* get next PTE */
TST_V(r6)   /* test valid bit */
bdnzf   2,1b/* loop while ctr != 0 && !cr0.eq */
-   beq+found_empty
+   beq+.Lfound_empty
 
/* update counter of times that the primary PTEG is full */
lis r4, (primary_pteg_full - PAGE_OFFSET)@ha
@@ -397,7 +397,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 2: LDPTEu  r6,HPTE_SIZE(r4)
TST_V(r6)
bdnzf   2,2b
-   beq+found_empty
+   beq+.Lfound_empty
xorir5,r5,PTE_H /* clear H bit again */
 
/*
@@ -435,9 +435,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)
 
 #ifndef CONFIG_SMP
/* Store PTE in PTEG */
-found_empty:
+.Lfound_empty:
STPTE   r5,0(r4)
-found_slot:
+.Lfound_slot:
STPTE   r8,HPTE_SIZE/2(r4)
 
 #else /* CONFIG_SMP */
@@ -458,8 +458,8 @@ found_slot:
  * We do however have to make sure that the PTE is never in an invalid
  * state with the V bit set.
  */
-found_empty:
-found_slot:
+.Lfound_empty:
+.Lfound_slot:
CLR_V(r5,r0)/* clear V (valid) bit in PTE */
STPTE   r5,0(r4)
sync
-- 
2.25.0



[PATCH v2 05/12] powerpc/mem: Blacklist flush_dcache_icache_phys() for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/mm/mem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9b4f5fb719e0..bcb6af6ba29a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <linux/kprobes.h>
 
 #include 
 #include 
@@ -466,6 +467,7 @@ static void flush_dcache_icache_phys(unsigned long physaddr)
: "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
: "ctr", "memory");
 }
+NOKPROBE_SYMBOL(flush_dcache_icache_phys)
 #endif // !defined(CONFIG_PPC_8xx) && !defined(CONFIG_PPC64)
 
 /*
-- 
2.25.0



[PATCH v2 04/12] powerpc/powermac: Blacklist functions running with MMU disabled for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/platforms/powermac/cache.S | 2 ++
 arch/powerpc/platforms/powermac/sleep.S | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/cache.S b/arch/powerpc/platforms/powermac/cache.S
index da69e0fcb4f1..ced225415486 100644
--- a/arch/powerpc/platforms/powermac/cache.S
+++ b/arch/powerpc/platforms/powermac/cache.S
@@ -184,6 +184,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 
mtlrr10
blr
+_ASM_NOKPROBE_SYMBOL(flush_disable_75x)
 
 /* This code is for 745x processors */
 flush_disable_745x:
@@ -351,4 +352,5 @@ END_FTR_SECTION_IFSET(CPU_FTR_L3CR)
mtmsr   r11 /* restore DR and EE */
isync
blr
+_ASM_NOKPROBE_SYMBOL(flush_disable_745x)
 #endif /* CONFIG_PPC_BOOK3S_32 */
diff --git a/arch/powerpc/platforms/powermac/sleep.S b/arch/powerpc/platforms/powermac/sleep.S
index bd6085b470b7..f9a680fdd9c4 100644
--- a/arch/powerpc/platforms/powermac/sleep.S
+++ b/arch/powerpc/platforms/powermac/sleep.S
@@ -244,7 +244,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_SPEC7450)
mtmsr   r2
isync
b   1b
-
+_ASM_NOKPROBE_SYMBOL(low_cpu_die)
 /*
  * Here is the resume code.
  */
@@ -282,6 +282,7 @@ _GLOBAL(core99_wake_up)
lwz r1,0(r3)
 
/* Pass thru to older resume code ... */
+_ASM_NOKPROBE_SYMBOL(core99_wake_up)
 /*
  * Here is the resume code for older machines.
  * r1 has the physical address of SL_PC(sp).
@@ -429,6 +430,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
lwz r0,4(r1)
mtlrr0
blr
+_ASM_NOKPROBE_SYMBOL(grackle_wake_up)
 
 turn_on_mmu:
mflrr4
@@ -438,6 +440,7 @@ turn_on_mmu:
sync
isync
rfi
+_ASM_NOKPROBE_SYMBOL(turn_on_mmu)
 
 #endif /* defined(CONFIG_PM) || defined(CONFIG_CPU_FREQ) */
 
-- 
2.25.0



[PATCH v2 03/12] powerpc/83xx: Blacklist mpc83xx_deep_resume() for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/platforms/83xx/suspend-asm.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/83xx/suspend-asm.S b/arch/powerpc/platforms/83xx/suspend-asm.S
index 3acd7470dc5e..bc6bd4d0ae96 100644
--- a/arch/powerpc/platforms/83xx/suspend-asm.S
+++ b/arch/powerpc/platforms/83xx/suspend-asm.S
@@ -548,3 +548,4 @@ mpc83xx_deep_resume:
mtdec   r0
 
rfi
+_ASM_NOKPROBE_SYMBOL(mpc83xx_deep_resume)
-- 
2.25.0



[PATCH v2 02/12] powerpc/82xx: Blacklist pq2_restart() for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/platforms/82xx/pq2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/82xx/pq2.c b/arch/powerpc/platforms/82xx/pq2.c
index 1cdd5ed9d896..3b5cb39a564c 100644
--- a/arch/powerpc/platforms/82xx/pq2.c
+++ b/arch/powerpc/platforms/82xx/pq2.c
@@ -10,6 +10,8 @@
  * Copyright (c) 2006 MontaVista Software, Inc.
  */
 
+#include <linux/kprobes.h>
+
 #include 
 #include 
 #include 
@@ -29,6 +31,7 @@ void __noreturn pq2_restart(char *cmd)
 
panic("Restart failed\n");
 }
+NOKPROBE_SYMBOL(pq2_restart)
 
 #ifdef CONFIG_PCI
 static int pq2_pci_exclude_device(struct pci_controller *hose,
-- 
2.25.0



[PATCH v2 01/12] powerpc/52xx: Blacklist functions running with MMU disabled for kprobe

2020-03-31 Thread Christophe Leroy
kprobes do not handle events happening in real mode, so all
functions running with the MMU disabled have to be blacklisted.

Signed-off-by: Christophe Leroy 
Acked-by: Naveen N. Rao 
---
 arch/powerpc/platforms/52xx/lite5200_sleep.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/52xx/lite5200_sleep.S b/arch/powerpc/platforms/52xx/lite5200_sleep.S
index 3a9969c429b3..70083649c9ea 100644
--- a/arch/powerpc/platforms/52xx/lite5200_sleep.S
+++ b/arch/powerpc/platforms/52xx/lite5200_sleep.S
@@ -248,6 +248,7 @@ mmu_on:
 
 
blr
+_ASM_NOKPROBE_SYMBOL(lite5200_wakeup)
 
 
 /* -- */
@@ -391,6 +392,7 @@ restore_regs:
LOAD_SPRN(TBWU,  0x5b);
 
blr
+_ASM_NOKPROBE_SYMBOL(restore_regs)
 
 
 
-- 
2.25.0



Re: [PATCH v2] macintosh: convert to i2c_new_scanned_device

2020-03-31 Thread Wolfram Sang
On Thu, Mar 26, 2020 at 12:38:19PM +0100, Wolfram Sang wrote:
> Move from the deprecated i2c_new_probed_device() to the new
> i2c_new_scanned_device(). No functional change for this driver because
> it doesn't check the return code anyhow.
> 
> Signed-off-by: Wolfram Sang 
> Acked-by: Michael Ellerman 

Applied to for-next, thanks!





Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Christophe Leroy




On 31/03/2020 at 17:14, Arnd Bergmann wrote:

On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
 wrote:


Only 44x uses 4xx now, so only keep one directory.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/platforms/44x/Makefile   |  9 +++-
  arch/powerpc/platforms/{4xx => 44x}/cpm.c |  0


No objections to moving everything into one place, but I wonder if the
combined name should be 4xx instead of 44x, given that 44x currently
include 46x and 47x. OTOH your approach has the advantage of
moving fewer files.



In that case, should we also rename CONFIG_44x to CONFIG_4xx?

Christophe


[RFC WIP PATCH] powerpc/32: system call implement entry/exit logic in C

2020-03-31 Thread Christophe Leroy
This is a first try at porting the PPC64 syscall entry/exit logic in C
to PPC32. I've done the minimum to get it working. I have not reworked
calls to sys_fork() and friends, for instance.

For the time being, it seems to work more or less, but:
- ping reports EINVAL on recvfrom
- strace shows NULL instead of strings in calls like open(), for instance.
- the performance is definitely bad

On an 8xx, the null_syscall test is about 30% slower after this patch:
- Without the patch: 284 cycles
- With the patch: 371 cycles

@nick and others, any suggestions to fix and improve this?

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/kup.h  |  21 ++
 .../powerpc/include/asm/book3s/64/kup-radix.h |  12 +-
 arch/powerpc/include/asm/hw_irq.h |  15 +
 arch/powerpc/include/asm/kup.h|   2 +
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |  13 +
 arch/powerpc/kernel/Makefile  |   5 +-
 arch/powerpc/kernel/entry_32.S| 259 ++
 arch/powerpc/kernel/head_32.h |   3 +-
 .../kernel/{syscall_64.c => syscall.c}|  25 +-
 9 files changed, 102 insertions(+), 253 deletions(-)
 rename arch/powerpc/kernel/{syscall_64.c => syscall.c} (97%)

diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
index 3c0ba22dc360..c85bc5b56366 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -102,6 +102,27 @@ static inline void kuap_update_sr(u32 sr, u32 addr, u32 end)
isync();/* Context sync required after mtsrin() */
 }
 
+static inline void kuap_restore(struct pt_regs *regs)
+{
+   u32 kuap = current->thread.kuap;
+   u32 addr = kuap & 0xf0000000;
+   u32 end = kuap << 28;
+
+   if (unlikely(!kuap))
+   return;
+
+   current->thread.kuap = 0;
+   kuap_update_sr(mfsrin(addr) & ~SR_KS, addr, end);   /* Clear Ks */
+}
+
+static inline void kuap_check(void)
+{
+   if (!IS_ENABLED(CONFIG_PPC_KUAP_DEBUG))
+   return;
+
+   WARN_ON_ONCE(current->thread.kuap != 0);
+}
+
 static __always_inline void allow_user_access(void __user *to, const void __user *from,
  u32 size, unsigned long dir)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 3bcef989a35d..1f2716a0dcd8 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -60,13 +60,13 @@
 #include 
 #include 
 
-static inline void kuap_restore_amr(struct pt_regs *regs)
+static inline void kuap_restore(struct pt_regs *regs)
 {
if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
mtspr(SPRN_AMR, regs->kuap);
 }
 
-static inline void kuap_check_amr(void)
+static inline void kuap_check(void)
 {
	if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP))
WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
@@ -141,14 +141,6 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
		 (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
		 "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
 }
-#else /* CONFIG_PPC_KUAP */
-static inline void kuap_restore_amr(struct pt_regs *regs)
-{
-}
-
-static inline void kuap_check_amr(void)
-{
-}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index e0e71777961f..6ccf07de6665 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -321,6 +321,16 @@ static inline void arch_local_irq_disable(void)
mtmsr(mfmsr() & ~MSR_EE);
 }
 
+static inline void arch_local_recovery_disable(void)
+{
+   if (IS_ENABLED(CONFIG_BOOKE))
+   wrtee(0);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   wrtspr(SPRN_NRI);
+   else
+   mtmsr(mfmsr() & ~(MSR_EE | MSR_RI));
+}
+
 static inline void arch_local_irq_enable(void)
 {
if (IS_ENABLED(CONFIG_BOOKE))
@@ -343,6 +353,11 @@ static inline bool arch_irqs_disabled(void)
 
 #define hard_irq_disable() arch_local_irq_disable()
 
+#define __hard_irq_enable()arch_local_irq_enable()
+#define __hard_irq_disable()   arch_local_irq_disable()
+#define __hard_EE_RI_disable() arch_local_recovery_disable()
+#define __hard_RI_enable() arch_local_irq_disable()
+
 static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 {
return !(regs->msr & MSR_EE);
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 92bcd1a26d73..1100c13b6d9e 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -62,6 +62,8 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 {

Re: [PATCH] powerpc: Add new HWCAP bits

2020-03-31 Thread Tulio Magno Quites Machado Filho
Alistair Popple  writes:

> diff --git a/arch/powerpc/include/uapi/asm/cputable.h b/arch/powerpc/include/uapi/asm/cputable.h
> index 540592034740..c6fe10b2 100644
> --- a/arch/powerpc/include/uapi/asm/cputable.h
> +++ b/arch/powerpc/include/uapi/asm/cputable.h
> @@ -50,6 +50,8 @@
>  #define PPC_FEATURE2_DARN		0x00200000 /* darn random number insn */
>  #define PPC_FEATURE2_SCV		0x00100000 /* scv syscall */
>  #define PPC_FEATURE2_HTM_NO_SUSPEND	0x00080000 /* TM w/out suspended state */
> +#define PPC_FEATURE2_ARCH_3_10	0x00040000 /* ISA 3.10 */

I think this should have been:

#define PPC_FEATURE2_ARCH_3_1	0x00040000 /* ISA 3.1 */

-- 
Tulio Magno


Re: [PATCH v2 09/11] powerpc/platforms: Move files from 4xx to 44x

2020-03-31 Thread Arnd Bergmann
On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
 wrote:
>
> Only 44x uses 4xx now, so only keep one directory.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/platforms/44x/Makefile   |  9 +++-
>  arch/powerpc/platforms/{4xx => 44x}/cpm.c |  0

No objections to moving everything into one place, but I wonder if the
combined name should be 4xx instead of 44x, given that 44x currently
include 46x and 47x. OTOH your approach has the advantage of
moving fewer files.

   Arnd


Re: [PATCH v2 07/11] powerpc/xmon: Remove PPC403 and PPC405

2020-03-31 Thread Arnd Bergmann
On Tue, Mar 31, 2020 at 9:49 AM Christophe Leroy
 wrote:
>
> xmon has special support for PPC403 and PPC405 which were part
> of 40x platforms.
>
> 40x platforms are gone, remove support of PPC403 and PPC405 in xmon.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/xmon/ppc-opc.c | 277 +++-
>  arch/powerpc/xmon/ppc.h |   6 -

These files are from binutils, and may get synchronized with changes there
in the future. I'd suggest leaving the code in here for now and instead removing
it from the binutils version first, if they are ready to drop it, too.

 Arnd


Re: [PATCH 0/2] selftests: vm: Build fixes for powerpc64

2020-03-31 Thread Shuah Khan

On 1/30/20 12:01 AM, Sandipan Das wrote:

The second patch was already posted independently but because
of the changes in the first patch, the second one now depends
on it. Hence posting it now as a part of this series.

The last version (v2) of the second patch can be found at:
https://patchwork.ozlabs.org/patch/1225969/

Sandipan Das (2):
   selftests: vm: Do not override definition of ARCH
   selftests: vm: Fix 64-bit test builds for powerpc64le

  tools/testing/selftests/vm/Makefile| 4 ++--
  tools/testing/selftests/vm/run_vmtests | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)



Michael,

I see your tested-by on these two patches. I will take these
through kselftest fixes.

Sorry for the delay. I assumed these will go through ppc64 or
vm.

thanks,
-- Shuah


[PATCH v5 3/4] powerpc/papr_scm, uapi: Add support for handling PAPR DSM commands

2020-03-31 Thread Vaibhav Jain
Implement support for handling PAPR DSM commands in the papr_scm
module. We advertise support for ND_CMD_CALL in the dimm command mask
and implement the necessary scaffolding in the module to handle the
ND_CMD_CALL ioctl and the DSM commands that we receive.

The layout of the DSM commands, as we expect them from
libnvdimm/libndctl, is described in the newly introduced uapi header
'papr_scm_dsm.h', which defines a new 'struct nd_papr_scm_cmd_pkg'
header. This header is used to communicate the DSM command via
'nd_pkg_papr_scm->nd_command' and the size of the payload that needs
to be sent/received when servicing the DSM.

The PAPR DSM commands are assigned indexes starting from 0x10000 to
prevent them from overlapping ND_CMD_* values, which also simplifies
handling of dimm commands in papr_scm_ndctl(). A new function
cmd_to_func() is implemented that reads the args to papr_scm_ndctl()
and performs sanity tests on them. When a DSM command is sent via
ND_CMD_CALL, the newly introduced function papr_scm_service_dsm() is
called to handle the request.

Signed-off-by: Vaibhav Jain 

---
Changelog:

v4..v5: Fixed a bug in new implementation of papr_scm_ndctl().

v3..v4: Updated papr_scm_ndctl() to delegate DSM command handling to a
different function papr_scm_service_dsm(). [Aneesh]

v2..v3: Updated the nd_papr_scm_cmd_pkg to use __xx types as its
exported to the userspace [Aneesh]

v1..v2: New patch in the series.
---
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h | 161 +++
 arch/powerpc/platforms/pseries/papr_scm.c|  97 ++-
 2 files changed, 252 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_dsm.h

diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
new file mode 100644
index ..c039a49b41b4
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR SCM Device specific methods and struct for libndctl and ndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
+
+#include 
+
+#ifdef __KERNEL__
+#include 
+#else
+#include 
+#endif
+
+/*
+ * DSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * 'envelopes' which consist of a header and a user-defined payload section.
+ * The header is described by 'struct nd_papr_scm_cmd_pkg', which expects a
+ * payload following it; the offset of the payload relative to the struct is
+ * provided by 'nd_papr_scm_cmd_pkg.payload_offset'.
+ *
+ *  +-+-+---+
+ *  |   64-Bytes  |   8-Bytes   |   Max 184-Bytes   |
+ *  +-+-+---+
+ *  |   nd_papr_scm_cmd_pkg |   |
+ *  |-+ |   |
+ *  |  nd_cmd_pkg | |   |
+ *  +-+-+---+
+ *  | nd_family   ||   |
+ *  | nd_size_out | cmd_status  |  |
+ *  | nd_size_in  | payload_version |  PAYLOAD |
+ *  | nd_command  | payload_offset ->  |
+ *  | nd_fw_size  | |  |
+ *  +-+-+---+
+ *
+ * DSM Header:
+ *
+ * The header is defined as 'struct nd_papr_scm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The DSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from the size information of the envelope,
+ * which is contained in 'struct nd_cmd_pkg', the header also has the
+ * following members:
+ *
+ * 'cmd_status': (Out) Errors, if any, encountered while servicing the DSM.
+ * 'payload_version'   : (In/Out) Version number associated with the payload.
+ * 'payload_offset': (In) Relative offset of payload from start of envelope.
+ *
+ * DSM Payload:
+ *
+ * The layout of the DSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a DSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer 

[PATCH v5 4/4] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

2020-03-31 Thread Vaibhav Jain
This patch implements support for papr_scm command
'DSM_PAPR_SCM_HEALTH' that returns a newly introduced 'struct
nd_papr_scm_dimm_health_stat' instance containing dimm health
information back to user space in response to ND_CMD_CALL. This
functionality is implemented in the newly introduced papr_scm_get_health()
that queries the scm-dimm health information and then copies these bitmaps
to the package payload whose layout is defined by 'struct
papr_scm_ndctl_health'.

The patch also introduces a new member 'struct
papr_scm_priv.health' that's an instance of 'struct
nd_papr_scm_dimm_health_stat' to cache the health information of a
scm-dimm. As a result, the functions drc_pmem_query_health() and
papr_flags_show() are updated to populate and use this new struct
instead of the two be64 integers used earlier.

Signed-off-by: Vaibhav Jain 
---
Changelog:

v4..v5: None

v3..v4: Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]

v2..v3: Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx'
types as its exported to the userspace [Aneesh]
Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm
health from enum to #defines [Aneesh]

v1..v2: New patch in the series
---
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h |  40 +++
 arch/powerpc/platforms/pseries/papr_scm.c| 109 ---
 2 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
index c039a49b41b4..8265125304ca 100644
--- a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
@@ -132,6 +132,7 @@ struct nd_papr_scm_cmd_pkg {
  */
 enum dsm_papr_scm {
DSM_PAPR_SCM_MIN =  0x1,
+   DSM_PAPR_SCM_HEALTH,
DSM_PAPR_SCM_MAX,
 };
 
@@ -158,4 +159,43 @@ static void *papr_scm_pcmd_to_payload(struct nd_papr_scm_cmd_pkg *pcmd)
else
return (void *)((__u8 *) pcmd + pcmd->payload_offset);
 }
+
+/* Various scm-dimm health indicators */
+#define DSM_PAPR_SCM_DIMM_HEALTHY   0
+#define DSM_PAPR_SCM_DIMM_UNHEALTHY 1
+#define DSM_PAPR_SCM_DIMM_CRITICAL  2
+#define DSM_PAPR_SCM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl for DSM_PAPR_SCM_HEALTH.
+ * Various bitflags indicate the health status of the dimm.
+ *
+ * dimm_unarmed: Dimm not armed, so contents won't persist.
+ * dimm_bad_shutdown   : Previous shutdown did not persist contents.
+ * dimm_bad_restore: Contents from previous shutdown weren't restored.
+ * dimm_scrubbed   : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm can't be modified until CEC reboot.
+ * dimm_encrypted  : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator.
+ */
+struct nd_papr_scm_dimm_health_stat_v1 {
+   __u8 dimm_unarmed;
+   __u8 dimm_bad_shutdown;
+   __u8 dimm_bad_restore;
+   __u8 dimm_scrubbed;
+   __u8 dimm_locked;
+   __u8 dimm_encrypted;
+   __u16 dimm_health;
+};
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_scm_dimm_health_stat nd_papr_scm_dimm_health_stat_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_SCM_DIMM_HEALTH_VERSION 1
+
 #endif /* _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index e8ce96d2249e..ce94762954e0 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -47,8 +47,7 @@ struct papr_scm_priv {
struct mutex dimm_mutex;
 
/* Health information for the dimm */
-   __be64 health_bitmap;
-   __be64 health_bitmap_valid;
+   struct nd_papr_scm_dimm_health_stat health;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -158,6 +157,7 @@ static int drc_pmem_query_health(struct papr_scm_priv *p)
 {
unsigned long ret[PLPAR_HCALL_BUFSIZE];
int64_t rc;
+   __be64 health;
 
rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
if (rc != H_SUCCESS) {
@@ -172,13 +172,41 @@ static int drc_pmem_query_health(struct papr_scm_priv *p)
return rc;
 
/* Store the retrieved health information in dimm platform data */
-   p->health_bitmap = ret[0];
-   p->health_bitmap_valid = ret[1];
+   health = ret[0] & ret[1];
 
dev_dbg(&p->pdev->dev,
"Queried dimm health info. Bitmap:0x%016llx Mask:0x%016llx\n",
-   be64_to_cpu(p->health_bitmap),
-   be64_to_cpu(p->health_bitmap_valid));
+   be64_to_cpu(ret[0]),
+   be64_to_cpu(ret[1]));
+
+   memset(&p->health, 0, sizeof(p->health));
+

[PATCH v5 2/4] ndctl/uapi: Introduce NVDIMM_FAMILY_PAPR_SCM as a new NVDIMM DSM family

2020-03-31 Thread Vaibhav Jain
Add PAPR-scm family of DSM command-set to the white list of NVDIMM
command sets.

Signed-off-by: Vaibhav Jain 
---
Changelog:

v4..v5 : None

v3..v4 : None

v2..v3 : Updated the patch prefix to 'ndctl/uapi' [Aneesh]

v1..v2 : None
---
 include/uapi/linux/ndctl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..99fb60600ef8 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
 #define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR_SCM 5
 
 #define ND_IOCTL_CALL  _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
-- 
2.25.1



[PATCH v5 0/4] powerpc/papr_scm: Add support for reporting nvdimm health

2020-03-31 Thread Vaibhav Jain
The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in Ref[2].
Until now these stats were never available nor exposed to the user-space
tools like 'ndctl'. This is partly due to the PAPR platform not having
support for ACPI and NFIT. Hence 'ndctl' is unable to query and report the
dimm health status, and a user has no way to determine the current health
status of an NVDIMM.

To overcome this limitation, this patch-set updates papr_scm kernel module
to query and fetch nvdimm health stats using hcalls described in Ref[2].
These health and performance stats are then exposed to userspace via sysfs
and Dimm-Specific-Methods(DSM) issued by libndctl.

These changes, coupled with the proposed ndctl changes located at Ref[4],
should provide a way for the user to retrieve NVDIMM health status using
ndctl.

Below is a sample output using the proposed kernel + ndctl for a PAPR NVDIMM
in an emulation environment:

 # ndctl list -DH
[
  {
"dev":"nmem0",
"health":{
  "health_state":"fatal",
  "shutdown_state":"dirty"
}
  }
]

Dimm health report output on a pseries guest lpar with vPMEM or HMS
based nvdimms that are in perfectly healthy conditions:

 # ndctl list -d nmem0 -H
[
  {
"dev":"nmem0",
"health":{
  "health_state":"ok",
  "shutdown_state":"clean"
}
  }
]

PAPR Dimm-Specific-Methods(DSM)


As the name suggests, DSMs are used by vendor-specific code in libndctl to
execute certain operations or fetch certain information for NVDIMMS. DSMs
can be sent to papr_scm module via libndctl (userspace) and libnvdimm
(kernel) using the ND_CMD_CALL ioctl which can be handled in the dimm
control function papr_scm_ndctl(). For PAPR this patchset proposes a single
DSM to retrieve DIMM health, defined in the newly introduced uapi header
named 'papr_scm_dsm.h'. Support for more DSMs will be added in future.

Structure of the patch-set
==

The patchset starts with implementing support for fetching nvdimm health
information from PHYP and partially exposing it to user-space via nvdimm
flags.

The second & third patches deal with implementing support for servicing DSM
commands in papr_scm.

Finally, the fourth patch implements support for servicing the DSM
'DSM_PAPR_SCM_HEALTH' that returns the nvdimm health information to
libndctl.

Changelog:
==

v4..v5:

* Fixed a bug in new implementation of papr_scm_ndctl() that was triggering
  a false error condition.

v3..v4:

* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
  function named papr_scm_service_dsm() to service DSM requests. [Aneesh]

v2..v3:

* Updated the papr_scm_dsm.h header to be more conformant to general kernel
  guidelines for UAPI headers. [Aneesh]

* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
  include the case when the nvdimm is unarmed because it's a vPMEM
  nvdimm. [Aneesh]

v1..v2:

* Restructured the patch-set based on review comments on V1 patch-set to
simplify the patch review. Multiple small patches have been combined into
single patches to reduce cross referencing that was needed in earlier
patch-set. Hence most of the patches in this patch-set are now new. [Aneesh]

* Removed the initial work done for fetching nvdimm performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]

* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.

References:
[1]: "Power Architecture Platform Reference"
  https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2]: commit 58b278f568f0
 ("powerpc: Provide initial documentation for PAPR hcalls")
[3]: "Linux on Power Architecture Platform Reference"
 https://members.openpowerfoundation.org/document/dl/469
[4]: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=244625

Vaibhav Jain (4):
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  ndctl/uapi: Introduce NVDIMM_FAMILY_PAPR_SCM as a new NVDIMM DSM
family
  powerpc/papr_scm,uapi: Add support for handling PAPR DSM commands
  powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

 arch/powerpc/include/asm/papr_scm.h  |  48 
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h | 201 ++
 arch/powerpc/platforms/pseries/papr_scm.c| 277 ++-
 include/uapi/linux/ndctl.h   |   1 +
 4 files changed, 519 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/papr_scm.h
 create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_dsm.h

-- 
2.25.1



[PATCH v5 1/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-03-31 Thread Vaibhav Jain
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit big-endian integers which are then stored in 'struct
papr_scm_priv' and subsequently partially exposed to user-space via
newly introduced dimm specific attribute 'papr_flags'. Also a new asm
header named 'papr_scm.h' is added that describes the interface
between PHYP and guest kernel.

Following flags are reported via 'papr_flags' sysfs attribute contents
of which are space separated string flags indicating various nvdimm
states:

 * "not_armed"  : Indicating that nvdimm contents won't survive a power
   cycle.
 * "save_fail"  : Indicating that nvdimm contents couldn't be flushed
   during last shutdown event.
 * "restore_fail": Indicating that nvdimm contents couldn't be restored
   during dimm initialization.
 * "encrypted"  : Dimm contents are encrypted.
 * "smart_notify": There is health event for the nvdimm.
 * "scrubbed"   : Indicating that contents of the nvdimm have been
   scrubbed.
 * "locked" : Indicating that nvdimm contents can't be modified
   until next power cycle.

[1]: commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")

Signed-off-by: Vaibhav Jain 
---
Changelog:

v4..v5 : None

v3..v4 : None

v2..v3 : Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
 NVDIMM unarmed [Aneesh]

v1..v2 : New patch in the series.
---
 arch/powerpc/include/asm/papr_scm.h   |  48 ++
 arch/powerpc/platforms/pseries/papr_scm.c | 105 +-
 2 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/papr_scm.h

diff --git a/arch/powerpc/include/asm/papr_scm.h b/arch/powerpc/include/asm/papr_scm.h
new file mode 100644
index ..868d3360f56a
--- /dev/null
+++ b/arch/powerpc/include/asm/papr_scm.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Structures and defines needed to manage nvdimms for spapr guests.
+ */
+#ifndef _ASM_POWERPC_PAPR_SCM_H_
+#define _ASM_POWERPC_PAPR_SCM_H_
+
+#include 
+#include 
+
+/* DIMM health bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_SCM_DIMM_UNARMED  PPC_BIT(0)
+/* SCM device failed to persist memory contents */
+#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY   PPC_BIT(1)
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN   PPC_BIT(2)
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_SCM_DIMM_EMPTYPPC_BIT(3)
+/* SCM device memory life remaining is critically low */
+#define PAPR_SCM_DIMM_HEALTH_CRITICAL  PPC_BIT(4)
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_SCM_DIMM_HEALTH_FATAL PPC_BIT(5)
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY PPC_BIT(6)
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL  PPC_BIT(7)
+/* SCM device is encrypted */
+#define PAPR_SCM_DIMM_ENCRYPTEDPPC_BIT(8)
+/* SCM device has been scrubbed and locked */
+#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED  PPC_BIT(9)
+
+/* Bit status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED |\
+   PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+/* Bit status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
+
+/* Bit status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_SCM_DIMM_BAD_RESTORE_MASK  (PAPR_SCM_DIMM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
+  PAPR_SCM_DIMM_HEALTH_FATAL | \
+  PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+#endif
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 0b4467e378e5..aaf2e4ab1f75 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -14,6 +14,7 @@
 #include 
 
 #include 
+#include 
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -39,6 +40,13 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+   /* Protect dimm data from concurrent access */
+   struct mutex dimm_mutex;
+
+   /* Health information for the dimm */
+   __be64 health_bitmap;
+   __be64 health_bitmap_valid;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +152,35 @@ static int 

Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2020-03-31 Thread Baoquan He
On 03/31/20 at 04:21pm, Michal Hocko wrote:
> On Tue 31-03-20 22:03:32, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 03/31/20 at 10:55am, Michal Hocko wrote:
> > > On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > > > Maybe I mis-read the code, but I don't see how this could happen. In the
> > > > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > > > calculate_node_totalpages() that ensures that node->node_zones are 
> > > > entirely
> > > > within the node because this is checked in zone_spanned_pages_in_node().
> > > 
> > > zone_spanned_pages_in_node does check the zone boundaries are within the
> > > node boundaries. But that doesn't really tell anything about other
> > > potential zones interleaving with the physical memory range.
> > > zone->spanned_pages simply gives the physical range for the zone
> > > including holes. Interleaving nodes are essentially a hole
> > > (__absent_pages_in_range is going to skip those).
> > > 
> > > That means that when free_area_init_core simply goes over the whole
> > > physical zone range including holes and that is why we need to check
> > > both for physical and logical holes (aka other nodes).
> > > 
> > > The life would be so much easier if the whole thing would simply iterate
> > > over memblocks...
> > 
> > The memblock iterating sounds a great idea. I tried with putting the
> > memblock iterating in the upper layer, memmap_init(), which is used for
> > boot mem only anyway. Do you think it's doable and OK? If yes, I can
> > work out a formal patch to make this simpler as you said. The draft code
> > is as below. Like this it uses the existing code and involves little change.
> 
> Doing this would be a step in the right direction! I haven't checked the
> code very closely though. The below sounds way too simple to be true, I
> am afraid. First for_each_mem_pfn_range is available only for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
> saying that I really hate that being conditional). Also I haven't really
> checked the deferred initialization path - I have a very vague
> recollection that it has been converted to the memblock api but I have
> happily dropped all that memory.

Thanks for your quick response and for pointing out the remaining suspect
aspects. I will investigate what you mentioned and see if they have an impact.

>  
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 138a56c0f48f..558d421f294b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, 
> > int nid, unsigned long zone,
> >  * function.  They do not exist on hotplugged memory.
> >  */
> > if (context == MEMMAP_EARLY) {
> > -   if (!early_pfn_valid(pfn)) {
> > -   pfn = next_pfn(pfn);
> > -   continue;
> > -   }
> > -   if (!early_pfn_in_nid(pfn, nid)) {
> > -   pfn++;
> > -   continue;
> > -   }
> > if (overlap_memmap_init(zone, &pfn))
> > continue;
> > if (defer_init(nid, pfn, end_pfn))
> > @@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct 
> > zone *zone)
> >  }
> >  
> >  void __meminit __weak memmap_init(unsigned long size, int nid,
> > - unsigned long zone, unsigned long start_pfn)
> > + unsigned long zone, unsigned long 
> > range_start_pfn)
> >  {
> > -   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > +   unsigned long start_pfn, end_pfn;
> > +   unsigned long range_end_pfn = range_start_pfn + size;
> > +   int i;
> > +   for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> > +   start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> > +   end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> > +   if (end_pfn > start_pfn)
> > +   memmap_init_zone(size, nid, zone, start_pfn, 
> > MEMMAP_EARLY, NULL);
> > +   }
> >  }
> >  
> >  static int zone_batchsize(struct zone *zone)
> 
> -- 
> Michal Hocko
> SUSE Labs
> 



Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2020-03-31 Thread Michal Hocko
On Tue 31-03-20 22:03:32, Baoquan He wrote:
> Hi Michal,
> 
> On 03/31/20 at 10:55am, Michal Hocko wrote:
> > On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > > Maybe I mis-read the code, but I don't see how this could happen. In the
> > > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > > calculate_node_totalpages() that ensures that node->node_zones are 
> > > entirely
> > > within the node because this is checked in zone_spanned_pages_in_node().
> > 
> > zone_spanned_pages_in_node does check the zone boundaries are within the
> > node boundaries. But that doesn't really tell anything about other
> > potential zones interleaving with the physical memory range.
> > zone->spanned_pages simply gives the physical range for the zone
> > including holes. Interleaving nodes are essentially a hole
> > (__absent_pages_in_range is going to skip those).
> > 
> > That means that when free_area_init_core simply goes over the whole
> > physical zone range including holes and that is why we need to check
> > both for physical and logical holes (aka other nodes).
> > 
> > The life would be so much easier if the whole thing would simply iterate
> > over memblocks...
> 
> The memblock iterating sounds a great idea. I tried with putting the
> memblock iterating in the upper layer, memmap_init(), which is used for
> boot mem only anyway. Do you think it's doable and OK? If yes, I can
> work out a formal patch to make this simpler as you said. The draft code
> is as below. Like this it uses the existing code and involves little change.

Doing this would be a step in the right direction! I haven't checked the
code very closely though. The below sounds way too simple to be true, I
am afraid. First for_each_mem_pfn_range is available only for
CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
saying that I really hate that being conditional). Also I haven't really
checked the deferred initialization path - I have a very vague
recollection that it has been converted to the memblock api but I have
happily dropped all that memory.
 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 138a56c0f48f..558d421f294b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, 
> int nid, unsigned long zone,
>* function.  They do not exist on hotplugged memory.
>*/
>   if (context == MEMMAP_EARLY) {
> - if (!early_pfn_valid(pfn)) {
> - pfn = next_pfn(pfn);
> - continue;
> - }
> - if (!early_pfn_in_nid(pfn, nid)) {
> - pfn++;
> - continue;
> - }
>   if (overlap_memmap_init(zone, &pfn))
>   continue;
>   if (defer_init(nid, pfn, end_pfn))
> @@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone 
> *zone)
>  }
>  
>  void __meminit __weak memmap_init(unsigned long size, int nid,
> -   unsigned long zone, unsigned long start_pfn)
> +   unsigned long zone, unsigned long 
> range_start_pfn)
>  {
> - memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> + unsigned long start_pfn, end_pfn;
> + unsigned long range_end_pfn = range_start_pfn + size;
> + int i;
> + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> + start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> + end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> + if (end_pfn > start_pfn)
> + memmap_init_zone(size, nid, zone, start_pfn, 
> MEMMAP_EARLY, NULL);
> + }
>  }
>  
>  static int zone_batchsize(struct zone *zone)

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2020-03-31 Thread Baoquan He
Hi Michal,

On 03/31/20 at 10:55am, Michal Hocko wrote:
> On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > Maybe I mis-read the code, but I don't see how this could happen. In the
> > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > calculate_node_totalpages() that ensures that node->node_zones are entirely
> > within the node because this is checked in zone_spanned_pages_in_node().
> 
> zone_spanned_pages_in_node does check the zone boundaries are within the
> node boundaries. But that doesn't really tell anything about other
> potential zones interleaving with the physical memory range.
> zone->spanned_pages simply gives the physical range for the zone
> including holes. Interleaving nodes are essentially a hole
> (__absent_pages_in_range is going to skip those).
> 
> That means that when free_area_init_core simply goes over the whole
> physical zone range including holes and that is why we need to check
> both for physical and logical holes (aka other nodes).
> 
> The life would be so much easier if the whole thing would simply iterate
> over memblocks...

The memblock iterating sounds a great idea. I tried with putting the
memblock iterating in the upper layer, memmap_init(), which is used for
boot mem only anyway. Do you think it's doable and OK? If yes, I can
work out a formal patch to make this simpler as you said. The draft code
is as below. Like this it uses the existing code and involves little change.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 138a56c0f48f..558d421f294b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 * function.  They do not exist on hotplugged memory.
 */
if (context == MEMMAP_EARLY) {
-   if (!early_pfn_valid(pfn)) {
-   pfn = next_pfn(pfn);
-   continue;
-   }
-   if (!early_pfn_in_nid(pfn, nid)) {
-   pfn++;
-   continue;
-   }
if (overlap_memmap_init(zone, &pfn))
continue;
if (defer_init(nid, pfn, end_pfn))
@@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 }
 
 void __meminit __weak memmap_init(unsigned long size, int nid,
- unsigned long zone, unsigned long start_pfn)
+ unsigned long zone, unsigned long range_start_pfn)
 {
-   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
+   unsigned long start_pfn, end_pfn;
+   unsigned long range_end_pfn = range_start_pfn + size;
+   int i;
+   for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
+   start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
+   end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
+   if (end_pfn > start_pfn)
+   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
+   }
 }
 
 static int zone_batchsize(struct zone *zone)



[PATCH v2] powerpc/perf: Add documentation around use of "ppc_set_pmu_inuse" in PMU core-book3s

2020-03-31 Thread Athira Rajeev
From: Madhavan Srinivasan 

The "pmcregs_in_use" flag is part of the lppaca (Virtual Processor Area),
which is used to indicate whether the Performance Monitoring Unit (PMU) and
PMU sprs are in-use and whether they should be saved/restored by the
hypervisor. ppc_set_pmu_inuse() is used to set/unset the VPA
flag "pmcregs_in_use". "pmcregs_in_use" flag is set in
"power_pmu_enable" via ppc_set_pmu_inuse(1) and it is unset
when there are no active events (n_events == 0 condition).

Patch here adds documentation on the ppc_set_pmu_inuse() usage.

Signed-off-by: Madhavan Srinivasan 
Signed-off-by: Athira Rajeev 
---
Changes in v2:
- Corrected the patch author information

 arch/powerpc/perf/core-book3s.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3086055..48bfdc9 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1285,6 +1285,11 @@ static void power_pmu_enable(struct pmu *pmu)
goto out;
 
if (cpuhw->n_events == 0) {
+   /*
+* Indicate PMU not in-use to Hypervisor.
+* We end-up here via "ctx_sched_out()" from common code and
+* "power_pmu_del()".
+*/
ppc_set_pmu_inuse(0);
goto out;
}
@@ -1341,6 +1346,11 @@ static void power_pmu_enable(struct pmu *pmu)
 * Write the new configuration to MMCR* with the freeze
 * bit set and set the hardware events to their initial values.
 * Then unfreeze the events.
+* ppc_set_pmu_inuse(1): "power_pmu_enable" will unset the
+* "pmcregs_in_use" flag when a previous profiling/sampling session
+* is completed, and un-setting the flag will notify the Hypervisor to
+* drop save/restore of PMU sprs. Now that the PMU needs to be enabled, first
+* set the "pmcregs_in_use" flag in VPA.
 */
ppc_set_pmu_inuse(1);
mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
-- 
1.8.3.1



Re: [PATCH] powerpc/perf: Add documentation around use of "ppc_set_pmu_inuse" in PMU core-book3s

2020-03-31 Thread Athira Rajeev
Hi, 

Please ignore this version as I messed up the author information. I am
sending a V2 with the proper author name.

Thanks
Athira

> On 30-Mar-2020, at 5:08 PM, Athira Rajeev  wrote:
> 
> The "pmcregs_in_use" flag is part of the lppaca (Virtual Processor Area),
> which is used to indicate whether the Performance Monitoring Unit (PMU) and
> PMU sprs are in-use and whether they should be saved/restored by the
> hypervisor. ppc_set_pmu_inuse() is used to set/unset the VPA
> flag "pmcregs_in_use". "pmcregs_in_use" flag is set in
> "power_pmu_enable" via ppc_set_pmu_inuse(1) and it is unset
> when there are no active events (n_events == 0 condition).
> 
> Patch here adds documentation on the ppc_set_pmu_inuse() usage.
> 
> Signed-off-by: Madhavan Srinivasan 
> Signed-off-by: Athira Rajeev 
> ---
> arch/powerpc/perf/core-book3s.c | 10 ++
> 1 file changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 3086055..48bfdc9 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -1285,6 +1285,11 @@ static void power_pmu_enable(struct pmu *pmu)
>   goto out;
> 
>   if (cpuhw->n_events == 0) {
> + /*
> +  * Indicate PMU not in-use to Hypervisor.
> +  * We end-up here via "ctx_sched_out()" from common code and
> +  * "power_pmu_del()".
> +  */
>   ppc_set_pmu_inuse(0);
>   goto out;
>   }
> @@ -1341,6 +1346,11 @@ static void power_pmu_enable(struct pmu *pmu)
>* Write the new configuration to MMCR* with the freeze
>* bit set and set the hardware events to their initial values.
>* Then unfreeze the events.
> +  * ppc_set_pmu_inuse(1): "power_pmu_enable" will unset the
> +  * "pmcregs_in_use" flag when a previous profiling/sampling session
> +  * is completed and un-setting of flag will notify the Hypervisor to
> +  * drop save/restore of PMU sprs. Now that PMU need to be enabled, first
> +  * set the "pmcregs_in_use" flag in VPA.
>*/
>   ppc_set_pmu_inuse(1);
>   mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> -- 
> 1.8.3.1
> 



Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-03-31 Thread Gerald Schaefer
On Tue, 24 Mar 2020 10:52:52 +0530
Anshuman Khandual  wrote:

> This series adds more arch page table helper tests. The new tests here are
> either related to core memory functions or advanced arch pgtable helpers.
> This also creates a documentation file enlisting all expected semantics as
> suggested by Mike Rapoport (https://lkml.org/lkml/2020/1/30/40).
> 
> This series has been tested on arm64 and x86 platforms. There is just one
> expected failure on arm64 that will be fixed when we enable THP migration.
> 
> [   21.741634] WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:782
> 
> which corresponds to
> 
> WARN_ON(!pmd_present(pmd_mknotpresent(pmd_mkhuge(pmd))));
> 
> There are many TRANSPARENT_HUGEPAGE and ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD
> ifdefs scattered across the test. But consolidating all the fallback stubs
> is not very straight forward because ARCH_HAS_TRANSPARENT_HUGEPAGE_PUD is
> not explicitly dependent on ARCH_HAS_TRANSPARENT_HUGEPAGE.
> 
> This series has been build tested on many platforms including the ones that
> subscribe the test through ARCH_HAS_DEBUG_VM_PGTABLE.
> 

Hi Anshuman,

thanks for the update. There are a couple of issues on s390, some might
also affect other archs.

1) The pxd_huge_tests are using pxd_set/clear_huge, which defaults to
returning 0 if !CONFIG_HAVE_ARCH_HUGE_VMAP. As a result, the checks for
!pxd_test/clear_huge in the pxd_huge_tests will always trigger the
warning. This should affect all archs w/o CONFIG_HAVE_ARCH_HUGE_VMAP.
Could be fixed like this:

@@ -923,8 +923,10 @@ void __init debug_vm_pgtable(void)
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
 
-   pmd_huge_tests(pmdp, pmd_aligned, prot);
-   pud_huge_tests(pudp, pud_aligned, prot);
+   if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) {
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+   }
 
pte_savedwrite_tests(pte_aligned, prot);
pmd_savedwrite_tests(pmd_aligned, prot);

BTW, please add some comments to the various #ifdef/#else stuff, especially
when the different parts are far away and/or nested.

2) The hugetlb_advanced_test will fail because it directly de-references
huge *ptep pointers instead of using huge_ptep_get() for this. We have
very different pagetable entry layout for pte and (large) pmd on s390,
and unfortunately the whole hugetlbfs code is using pte_t instead of pmd_t
like THP. For this reason, huge_ptep_get() was introduced, which will
return a "converted" pte, because directly reading from a *ptep (pointing
to a large pmd) will not return a proper pte. ARM is the only other
arch with an implementation of huge_ptep_get(), so it could also be
affected, depending on what exactly it needs it for.

Could be fixed like this (the first de-reference is a bit special,
because at that point *ptep does not really point to a large (pmd) entry
yet, it is initially an invalid pte entry, which breaks our huge_ptep_get()
conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
because we do have some special bits there in our large pmds. It seems
to also work w/o that alignment, but it feels a bit wrong):

@@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
  unsigned long vaddr, pgprot_t prot)
 {
struct page *page = pfn_to_page(pfn);
-   pte_t pte = READ_ONCE(*ptep);
+   pte_t pte;

-   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+   pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));
set_huge_pte_at(mm, vaddr, ptep, pte);
barrier();
WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
huge_pte_clear(mm, vaddr, ptep, PMD_SIZE);
-   pte = READ_ONCE(*ptep);
+   pte = huge_ptep_get(ptep);
WARN_ON(!huge_pte_none(pte));
 
pte = mk_huge_pte(page, prot);
set_huge_pte_at(mm, vaddr, ptep, pte);
huge_ptep_set_wrprotect(mm, vaddr, ptep);
-   pte = READ_ONCE(*ptep);
+   pte = huge_ptep_get(ptep);
WARN_ON(huge_pte_write(pte));
 
pte = mk_huge_pte(page, prot);
set_huge_pte_at(mm, vaddr, ptep, pte);
huge_ptep_get_and_clear(mm, vaddr, ptep);
-   pte = READ_ONCE(*ptep);
+   pte = huge_ptep_get(ptep);
WARN_ON(!huge_pte_none(pte));
 
pte = mk_huge_pte(page, prot);
@@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test
pte = huge_pte_mkwrite(pte);
pte = huge_pte_mkdirty(pte);
huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
-   pte = READ_ONCE(*ptep);
+   pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
 }
 #else

3) The pmd_protnone_tests() has an issue, because it passes a pmd to
pmd_protnone() which has not been marked as large. We check for large
pmd in the s390 implementation of pmd_protnone(), and will fail if a
pmd is not large. We had similar issues before, in other helpers, where
I 

Re: [RFC/PATCH 0/3] Add support for stop instruction inside KVM guest

2020-03-31 Thread Gautham R Shenoy
On Tue, Mar 31, 2020 at 05:40:55PM +0530, Gautham R. Shenoy wrote:
> From: "Gautham R. Shenoy" 
> 
> 
>  *** RFC Only. Not intended for inclusion 
> 
> Motivation
> ~~~~~~~~~~
> 
> The POWER ISA v3.0 allows the stop instruction to be executed from a Guest
> Kernel (HV=0,PR=0) context. If the hypervisor has cleared
> PSSCR[ESL|EC] bits, then the stop instruction thus executed will cause
> the vCPU thread to "pause", thereby donating its cycles to the other
> threads in the core until the paused thread is woken up by an
> interrupt. If the hypervisor has set the PSSCR[ESL|EC] bits, then
> execution of the "stop" instruction will raise a Hypervisor Facility
> Unavailable exception.
> 
> The stop idle state in the guest (henceforth referred to as stop0lite)
> when enabled
> 
> * has a very small wakeup latency (1-3us) comparable to that of
>   snooze and considerably better compared to the Shared CEDE state
>   (25-30us).  Results are provided below for wakeup latency measured
>   by waking up an idle CPU in a given state using a timer as well as
>   using an IPI.
> 
>   ==
>   Wakeup Latency measured using a timer (in ns) [Lower is better]
>   ==
>   Idle state |  Nr samples |  Min| Max| Median | Avg   | Stddev|
>   ==
>   snooze |   60|  787| 1059   |  938   | 937.4 | 42.27 |
>   ==
>   stop0lite  |   60|  770| 1182   |  948   | 946.4 | 67.41 |
>   ==
>   Shared CEDE|   60| 9550| 36694  | 29219  |28564.1|3545.9 |
>   ==
>

Posted two copies of Wakeup latency measured by timer. Here is the
wakeup latency measured using an IPI.


==
Wakeup latency measured using an IPI (in ns) [Lower is better]
==
Idle state  | Nr samples |   Min |   Max | Median |      Avg |  Stddev |
------------------------------------------------------------------------
snooze      |         60 |  2089 |  4228 |   2259 |  2342.31 |  316.56 |
stop0lite   |         60 |  1947 |  3674 |   2653 |  2610.57 |  266.73 |
Shared CEDE |         60 | 20147 | 36305 |  21827 | 26762.65 | 6875.01 |
==

--
Thanks and Regards
gautham.


[RFC/PATCH 3/3] cpuidle/pseries: Add stop0lite state

2020-03-31 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

The POWER ISA v3.0 allows the stop instruction to be executed from a
HV=0,PR=0 context. If the PSSCR[ESL|EC] bits are cleared, then the
stop instruction thus executed will cause the thread to pause, thereby
donating its cycles to the other threads in the core until the paused
thread is woken up by an interrupt.

In this patch we define a cpuidle state for pseries guests named
stop0lite. This has latency and residency intermediate between those of
snooze and CEDE. While snooze has non-existent latency, it consumes
the CPU cycles without contributing to anything useful. CEDE on the
other hand requires a full VM exit, which can result in some other
vCPU being scheduled on this physical CPU thereby delaying the
scheduling of the CEDEd vCPU back. In such cases, when the expected
idle duration is small (1-20us), the vCPU can go to this stop0lite
state which provides a nice intermediate state between snooze and
CEDE.

Signed-off-by: Gautham R. Shenoy 
---
 drivers/cpuidle/cpuidle-pseries.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index 74c2479..9c8c18d 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct cpuidle_driver pseries_idle_driver = {
.name = "pseries_idle",
@@ -170,6 +171,26 @@ static int shared_cede_loop(struct cpuidle_device *dev,
	.enter = &shared_cede_loop },
 };
 
+
+
+static int stop_loop(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
+{
+   unsigned long srr1 = 0;
+
+   if (!prep_irq_for_idle_irqsoff())
+   return index;
+
+   __ppc64_runlatch_off();
+   asm volatile("stop");
+   __ppc64_runlatch_on();
+   fini_irq_for_idle_irqsoff();
+   irq_set_pending_from_srr1(srr1);
+
+   return index;
+}
+
 /*
  * States for shared partition case.
  */
@@ -180,6 +201,12 @@ static int shared_cede_loop(struct cpuidle_device *dev,
.exit_latency = 0,
.target_residency = 0,
	.enter = &snooze_loop },
+   { /* stop0_lite */
+   .name = "stop0lite",
+   .desc = "Pauses the CPU",
+   .exit_latency = 2,
+   .target_residency = 20,
+   .enter = &stop_loop },
{ /* Shared Cede */
.name = "Shared Cede",
.desc = "Shared Cede",
-- 
1.9.4



[RFC/PATCH 0/3] Add support for stop instruction inside KVM guest

2020-03-31 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 


 *** RFC Only. Not intended for inclusion 
 
Motivation
~~~~~~~~~~

The POWER ISA v3.0 allows the stop instruction to be executed from a Guest
Kernel (HV=0,PR=0) context. If the hypervisor has cleared
PSSCR[ESL|EC] bits, then the stop instruction thus executed will cause
the vCPU thread to "pause", thereby donating its cycles to the other
threads in the core until the paused thread is woken up by an
interrupt. If the hypervisor has set the PSSCR[ESL|EC] bits, then
execution of the "stop" instruction will raise a Hypervisor Facility
Unavailable exception.

The stop idle state in the guest (henceforth referred to as stop0lite)
when enabled

* has a very small wakeup latency (1-3us) comparable to that of
  snooze and considerably better compared to the Shared CEDE state
  (25-30us).  Results are provided below for wakeup latency measured
  by waking up an idle CPU in a given state using a timer as well as
  using an IPI.

  ==
  Wakeup Latency measured using a timer (in ns) [Lower is better]
  ==
  Idle state |  Nr samples |  Min| Max| Median | Avg   | Stddev|
  ==
  snooze |   60|  787| 1059   |  938   | 937.4 | 42.27 |
  ==
  stop0lite  |   60|  770| 1182   |  948   | 946.4 | 67.41 |
  ==
  Shared CEDE|   60| 9550| 36694  | 29219  |28564.1|3545.9 |
  ==


* provides an improved single threaded performance compared to snooze
  since the idle state completely relinquishes the core cycles. The
  single threaded performance is observed to be better even when
  compared to "Shared CEDE", since in the latter case something else
  can be scheduled on the ceded CPU, while "stop0lite" doesn't give up
  the CPU.

  On a KVM guest with smp 8,sockets=1,cores=2,threads=4 with vCPUs of
  a vCore bound to a physical core, we run a single-threaded ebizzy
  pinned to one of the guest vCPUs while the sibling vCPUs in the core
  are idling. We enable only one guest idle state at a time to measure
  the single-threaded performance benefit that the idle state provides
  by giving up the core resources to the non-idle thread. We obtain
  ~13% improvement in the throughput compared to that with "snooze"
  and ~8% improvement in the throughput compared to "Shared CEDE".

   ==============================================================================
   | ebizzy records/s : [Higher the better]                                     |
   ==============================================================================
   | Idle state  | Nr samples |     Min |     Max |  Median |       Avg | Stddev |
   ==============================================================================
   | snooze      |     10     | 1378988 | 1379358 | 1379032 | 1379067.3 | 113.47 |
   | stop0lite   |     10     | 1561836 | 1562058 | 1561906 | 1561927.5 |  81.87 |
   | Shared CEDE |     10     | 1446584 | 1447383 | 1447037 | 1447009.0 | 244.16 |
   ==============================================================================

Is stop0lite a replacement for snooze ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Not yet. snooze is a polling state, and can respond much faster to a
need_resched() compared to stop0lite, which needs an IPI to wake up from
the idle state. This can be seen in the results below:

With the context_switch2 pipe test, we can see that with stop0lite,
the number of context switches is 32.47% lower than with
snooze. This is due to the fact that snooze is a polling state which
polls for TIF_NEED_RESCHED. Thus it does not require an interrupt to
exit the state and start executing the scheduler code. However,
stop0lite 

[RFC/PATCH 2/3] pseries/kvm: Clear PSSCR[ESL|EC] bits before guest entry

2020-03-31 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

ISA v3.0 allows the guest to execute a stop instruction. For this, the
PSSCR[ESL|EC] bits need to be cleared by the hypervisor before
scheduling in the guest vCPU.

Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits
set. This patch changes the behaviour to enter the guest with
PSSCR[ESL|EC] bits cleared. This is a RFC patch where we
unconditionally clear these bits. Ideally this should be done
conditionally on platforms where the guest stop instruction has no
Bugs (starting POWER9 DD2.3).

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/kvm/book3s_hv.c|  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cdb7224..36d059a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
mtspr(SPRN_IC, vcpu->arch.ic);
mtspr(SPRN_PID, vcpu->arch.pid);
 
-   mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
+   mtspr(SPRN_PSSCR, (vcpu->arch.psscr  & ~(PSSCR_EC | PSSCR_ESL)) |
  (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
 
mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index dbc2fec..c2daec3 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
mtspr   SPRN_PID, r7
mtspr   SPRN_WORT, r8
 BEGIN_FTR_SECTION
+   /* POWER9-only registers */
+   ld  r5, VCPU_TID(r4)
+   ld  r6, VCPU_PSSCR(r4)
+   lbz r8, HSTATE_FAKE_SUSPEND(r13)
+   lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */
+   andcr6, r6, r7
+   rldimi  r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
+   ld  r7, VCPU_HFSCR(r4)
+   mtspr   SPRN_TIDR, r5
+   mtspr   SPRN_PSSCR, r6
+   mtspr   SPRN_HFSCR, r7
+FTR_SECTION_ELSE
/* POWER8-only registers */
ld  r5, VCPU_TCSCR(r4)
ld  r6, VCPU_ACOP(r4)
@@ -833,18 +845,7 @@ BEGIN_FTR_SECTION
mtspr   SPRN_CSIGR, r7
mtspr   SPRN_TACR, r8
nop
-FTR_SECTION_ELSE
-   /* POWER9-only registers */
-   ld  r5, VCPU_TID(r4)
-   ld  r6, VCPU_PSSCR(r4)
-   lbz r8, HSTATE_FAKE_SUSPEND(r13)
-   orisr6, r6, PSSCR_EC@h  /* This makes stop trap to HV */
-   rldimi  r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
-   ld  r7, VCPU_HFSCR(r4)
-   mtspr   SPRN_TIDR, r5
-   mtspr   SPRN_PSSCR, r6
-   mtspr   SPRN_HFSCR, r7
-ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
 8:
 
ld  r5, VCPU_SPRG0(r4)
-- 
1.9.4



[RFC/PATCH 1/3] powerpc/kvm: Handle H_FAC_UNAVAIL when guest executes stop.

2020-03-31 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

If a guest executes a stop instruction when the hypervisor has set the
PSSCR[ESL|EC] bits, the processor will throw an Hypervisor Facility
Unavailable exception. Currently when we receive this exception, we
only check if the exception is generated due to a doorbell
instruction, in which case we emulate it. For all other cases,
including the case when the guest executes a stop-instruction, the
hypervisor sends a PROGILL to the guest program, which results in a
guest crash.

This patch adds code to handle the case when the hypervisor receives a
H_FAC_UNAVAIL exception due to guest executing the stop
instruction. The hypervisor increments the pc to the next instruction
and resumes the guest as expected by the semantics of the
PSSCR[ESL|EC] = 0 stop instruction.

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/include/asm/reg.h | 1 +
 arch/powerpc/kvm/book3s_hv.c   | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index da5cab0..2568c18 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -399,6 +399,7 @@
 /* HFSCR and FSCR bit numbers are the same */
 #define FSCR_SCV_LG12  /* Enable System Call Vectored */
 #define FSCR_MSGP_LG   10  /* Enable MSGP */
+#define FSCR_STOP_LG9   /* Enable stop states */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
 #define FSCR_EBB_LG7   /* Enable Event Based Branching */
 #define FSCR_TM_LG 5   /* Enable Transactional Memory */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 33be4d9..cdb7224 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1419,7 +1419,11 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
if (((vcpu->arch.hfscr >> 56) == FSCR_MSGP_LG) &&
cpu_has_feature(CPU_FTR_ARCH_300))
r = kvmppc_emulate_doorbell_instr(vcpu);
-   if (r == EMULATE_FAIL) {
+   else if (((vcpu->arch.hfscr >> 56) == FSCR_STOP_LG) &&
+   cpu_has_feature(CPU_FTR_ARCH_300)) {
+   kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
+   r = RESUME_GUEST;
+   } else if (r == EMULATE_FAIL) {
kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
r = RESUME_GUEST;
}
-- 
1.9.4



Re: [PATCH v2 2/2] powerpc: Remove Xilinx PPC405/PPC440 support

2020-03-31 Thread Bartlomiej Zolnierkiewicz


On 3/30/20 3:32 PM, Michal Simek wrote:
> The latest Xilinx design tools, called ISE and EDK, were released in
> October 2013. Newer tools don't support any new PPC405/PPC440 designs.
> These platforms are no longer supported and tested.
> 
> The PowerPC 405/440 port has been orphaned since 2013, by
> commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and
> commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and maintainership"),
> which is why it is time to remove the support for these platforms.
> 
> Signed-off-by: Michal Simek 
> Acked-by: Arnd Bergmann 

Acked-by: Bartlomiej Zolnierkiewicz  # for fbdev

> ---
> 
> Changes in v2:
> - Based on my chat with Arnd I removed arch/powerpc/xmon/ changes done in
>   v1 to keep them the same as before. (kbuild reported some issues with it
>   too)
> 
>  Documentation/devicetree/bindings/xilinx.txt | 143 --
>  Documentation/powerpc/bootwrapper.rst|  28 +-
>  MAINTAINERS  |   6 -
>  arch/powerpc/Kconfig.debug   |   2 +-
>  arch/powerpc/boot/Makefile   |   7 +-
>  arch/powerpc/boot/dts/Makefile   |   1 -
>  arch/powerpc/boot/dts/virtex440-ml507.dts| 406 
>  arch/powerpc/boot/dts/virtex440-ml510.dts| 466 ---
>  arch/powerpc/boot/ops.h  |   1 -
>  arch/powerpc/boot/serial.c   |   5 -
>  arch/powerpc/boot/uartlite.c |  79 
>  arch/powerpc/boot/virtex.c   |  97 
>  arch/powerpc/boot/virtex405-head.S   |  31 --
>  arch/powerpc/boot/wrapper|   8 -
>  arch/powerpc/configs/40x/virtex_defconfig|  75 ---
>  arch/powerpc/configs/44x/virtex5_defconfig   |  74 ---
>  arch/powerpc/configs/ppc40x_defconfig|   8 -
>  arch/powerpc/configs/ppc44x_defconfig|   8 -
>  arch/powerpc/include/asm/xilinx_intc.h   |  16 -
>  arch/powerpc/include/asm/xilinx_pci.h|  21 -
>  arch/powerpc/kernel/cputable.c   |  39 --
>  arch/powerpc/platforms/40x/Kconfig   |  31 --
>  arch/powerpc/platforms/40x/Makefile  |   1 -
>  arch/powerpc/platforms/40x/virtex.c  |  54 ---
>  arch/powerpc/platforms/44x/Kconfig   |  37 --
>  arch/powerpc/platforms/44x/Makefile  |   2 -
>  arch/powerpc/platforms/44x/virtex.c  |  60 ---
>  arch/powerpc/platforms/44x/virtex_ml510.c|  30 --
>  arch/powerpc/platforms/Kconfig   |   4 -
>  arch/powerpc/sysdev/Makefile |   2 -
>  arch/powerpc/sysdev/xilinx_intc.c|  88 
>  arch/powerpc/sysdev/xilinx_pci.c | 132 --
>  drivers/char/Kconfig |   2 +-
>  drivers/video/fbdev/Kconfig  |   2 +-
>  34 files changed, 7 insertions(+), 1959 deletions(-)
>  delete mode 100644 arch/powerpc/boot/dts/virtex440-ml507.dts
>  delete mode 100644 arch/powerpc/boot/dts/virtex440-ml510.dts
>  delete mode 100644 arch/powerpc/boot/uartlite.c
>  delete mode 100644 arch/powerpc/boot/virtex.c
>  delete mode 100644 arch/powerpc/boot/virtex405-head.S
>  delete mode 100644 arch/powerpc/configs/40x/virtex_defconfig
>  delete mode 100644 arch/powerpc/configs/44x/virtex5_defconfig
>  delete mode 100644 arch/powerpc/include/asm/xilinx_intc.h
>  delete mode 100644 arch/powerpc/include/asm/xilinx_pci.h
>  delete mode 100644 arch/powerpc/platforms/40x/virtex.c
>  delete mode 100644 arch/powerpc/platforms/44x/virtex.c
>  delete mode 100644 arch/powerpc/platforms/44x/virtex_ml510.c
>  delete mode 100644 arch/powerpc/sysdev/xilinx_intc.c
>  delete mode 100644 arch/powerpc/sysdev/xilinx_pci.c
Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


[PATCH v4 4/4] powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

2020-03-31 Thread Vaibhav Jain
This patch implements support for papr_scm command
'DSM_PAPR_SCM_HEALTH' that returns a newly introduced 'struct
nd_papr_scm_dimm_health_stat' instance containing dimm health
information back to user space in response to ND_CMD_CALL. This
functionality is implemented in newly introduced papr_scm_get_health()
that queries the scm-dimm health information and then copies these bitmaps
to the package payload whose layout is defined by 'struct
papr_scm_ndctl_health'.

The patch also introduces a new member, 'struct papr_scm_priv.health',
that is an instance of 'struct nd_papr_scm_dimm_health_stat', used to
cache the health information of an scm-dimm. As a result, the functions
drc_pmem_query_health() and
papr_flags_show() are updated to populate and use this new struct
instead of two be64 integers that we earlier used.

Signed-off-by: Vaibhav Jain 
---
Changelog:

v3..v4: Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]

v2..v3: Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx'
types as its exported to the userspace [Aneesh]
Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm
health from enum to #defines [Aneesh]

v1..v2: New patch in the series
---
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h |  40 +++
 arch/powerpc/platforms/pseries/papr_scm.c| 109 ---
 2 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
index c039a49b41b4..8265125304ca 100644
--- a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
@@ -132,6 +132,7 @@ struct nd_papr_scm_cmd_pkg {
  */
 enum dsm_papr_scm {
DSM_PAPR_SCM_MIN =  0x1,
+   DSM_PAPR_SCM_HEALTH,
DSM_PAPR_SCM_MAX,
 };
 
@@ -158,4 +159,43 @@ static void *papr_scm_pcmd_to_payload(struct nd_papr_scm_cmd_pkg *pcmd)
else
return (void *)((__u8 *) pcmd + pcmd->payload_offset);
 }
+
+/* Various scm-dimm health indicators */
+#define DSM_PAPR_SCM_DIMM_HEALTHY   0
+#define DSM_PAPR_SCM_DIMM_UNHEALTHY 1
+#define DSM_PAPR_SCM_DIMM_CRITICAL  2
+#define DSM_PAPR_SCM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_DSM_PAPR_SMART_HEALTH
+ * Various bitflags indicate the health status of the dimm.
+ *
+ * dimm_unarmed: Dimm not armed. So contents won't persist.
+ * dimm_bad_shutdown   : Previous shutdown did not persist contents.
+ * dimm_bad_restore: Contents from previous shutdown weren't restored.
+ * dimm_scrubbed   : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm can't be modified until CEC reboot.
+ * dimm_encrypted  : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator.
+ */
+struct nd_papr_scm_dimm_health_stat_v1 {
+   __u8 dimm_unarmed;
+   __u8 dimm_bad_shutdown;
+   __u8 dimm_bad_restore;
+   __u8 dimm_scrubbed;
+   __u8 dimm_locked;
+   __u8 dimm_encrypted;
+   __u16 dimm_health;
+};
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_scm_dimm_health_stat nd_papr_scm_dimm_health_stat_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_SCM_DIMM_HEALTH_VERSION 1
+
 #endif /* _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index e415a3c0d89e..62b20ef66b33 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -47,8 +47,7 @@ struct papr_scm_priv {
struct mutex dimm_mutex;
 
/* Health information for the dimm */
-   __be64 health_bitmap;
-   __be64 health_bitmap_valid;
+   struct nd_papr_scm_dimm_health_stat health;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -158,6 +157,7 @@ static int drc_pmem_query_health(struct papr_scm_priv *p)
 {
unsigned long ret[PLPAR_HCALL_BUFSIZE];
int64_t rc;
+   __be64 health;
 
rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
if (rc != H_SUCCESS) {
@@ -172,13 +172,41 @@ static int drc_pmem_query_health(struct papr_scm_priv *p)
return rc;
 
/* Store the retrieved health information in dimm platform data */
-   p->health_bitmap = ret[0];
-   p->health_bitmap_valid = ret[1];
+   health = ret[0] & ret[1];
 
	dev_dbg(&p->pdev->dev,
"Queried dimm health info. Bitmap:0x%016llx Mask:0x%016llx\n",
-   be64_to_cpu(p->health_bitmap),
-   be64_to_cpu(p->health_bitmap_valid));
+   be64_to_cpu(ret[0]),
+   be64_to_cpu(ret[1]));
+
+   memset(&p->health, 0, sizeof(p->health));
+
+   /* 

[PATCH v4 3/4] powerpc/papr_scm, uapi: Add support for handling PAPR DSM commands

2020-03-31 Thread Vaibhav Jain
Implement support for handling PAPR DSM commands in papr_scm
module. We advertise support for ND_CMD_CALL for the dimm command mask
and implement necessary scaffolding in the module to handle ND_CMD_CALL
ioctl and DSM commands that we receive.

The layout of the DSM commands as we expect from libnvdimm/libndctl is
described in newly introduced uapi header 'papr_scm_dsm.h' which
defines a new 'struct nd_papr_scm_cmd_pkg' header. This header is used
to communicate the DSM command via 'nd_pkg_papr_scm->nd_command' and the
size of the payload that needs to be sent/received for servicing the DSM.

The PAPR DSM commands are assigned indexes starting from 0x1 to
prevent them from overlapping ND_CMD_* values; this also simplifies
handling of dimm commands in papr_scm_ndctl(). A new function cmd_to_func() is
implemented that reads the args to papr_scm_ndctl() and performs
sanity tests on them. In case of a DSM command being sent via
ND_CMD_CALL a newly introduced function papr_scm_service_dsm() is
called to handle the request.

Signed-off-by: Vaibhav Jain 

---
Changelog:

v3..v4: Updated papr_scm_ndctl() to delegate DSM command handling to a
different function papr_scm_service_dsm(). [Aneesh]

v2..v3: Updated the nd_papr_scm_cmd_pkg to use __xx types as its
exported to the userspace [Aneesh]

v1..v2: New patch in the series.
---
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h | 161 +++
 arch/powerpc/platforms/pseries/papr_scm.c|  97 ++-
 2 files changed, 252 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_dsm.h

diff --git a/arch/powerpc/include/uapi/asm/papr_scm_dsm.h b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
new file mode 100644
index ..c039a49b41b4
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_scm_dsm.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR SCM Device specific methods and struct for libndctl and ndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_SCM_DSM_H_
+
+#include 
+
+#ifdef __KERNEL__
+#include 
+#else
+#include 
+#endif
+
+/*
+ * DSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * 'envelopes' which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_papr_scm_cmd_pkg', which expects a
+ * payload following it; the offset of the payload relative to the struct is
+ * provided by 'nd_papr_scm_cmd_pkg.payload_offset'.
+ *
+ *  +-+-+---+
+ *  |   64-Bytes  |   8-Bytes   |   Max 184-Bytes   |
+ *  +-+-+---+
+ *  |   nd_papr_scm_cmd_pkg |   |
+ *  |-+ |   |
+ *  |  nd_cmd_pkg | |   |
+ *  +-+-+---+
+ *  | nd_family   ||   |
+ *  | nd_size_out | cmd_status  |  |
+ *  | nd_size_in  | payload_version |  PAYLOAD |
+ *  | nd_command  | payload_offset ->  |
+ *  | nd_fw_size  | |  |
+ *  +-+-+---+
+ *
+ * DSM Header:
+ *
+ * The header is defined as 'struct nd_papr_scm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The DSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope, which
+ * is contained in 'struct nd_cmd_pkg', the header also has the following
+ * members:
+ *
+ * 'cmd_status': (Out) Errors, if any, encountered while servicing DSM.
+ * 'payload_version'   : (In/Out) Version number associated with the payload.
+ * 'payload_offset': (In) Relative offset of payload from start of envelope.
+ *
+ * DSM Payload:
+ *
+ * The layout of the DSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a DSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * DSM command. Similarly the output of servicing 

[PATCH v4 2/4] ndctl/uapi: Introduce NVDIMM_FAMILY_PAPR_SCM as a new NVDIMM DSM family

2020-03-31 Thread Vaibhav Jain
Add PAPR-scm family of DSM command-set to the white list of NVDIMM
command sets.

Signed-off-by: Vaibhav Jain 
---
Changelog:

v3..v4 : None

v2..v3 : Updated the patch prefix to 'ndctl/uapi' [Aneesh]

v1..v2 : None
---
 include/uapi/linux/ndctl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..99fb60600ef8 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
 #define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR_SCM 5
 
 #define ND_IOCTL_CALL  _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
-- 
2.24.1



[PATCH v4 1/4] powerpc/papr_scm: Fetch nvdimm health information from PHYP

2020-03-31 Thread Vaibhav Jain
Implement support for fetching nvdimm health information via
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit big-endian integers which are then stored in 'struct
papr_scm_priv' and subsequently partially exposed to user-space via
newly introduced dimm specific attribute 'papr_flags'. Also a new asm
header named 'papr_scm.h' is added that describes the interface
between PHYP and the guest kernel.

The following flags are reported via the 'papr_flags' sysfs attribute,
whose contents are space-separated string flags indicating various
nvdimm states:

 * "not_armed"  : Indicating that nvdimm contents won't survive a power
   cycle.
 * "save_fail"  : Indicating that nvdimm contents couldn't be flushed
   during last shutdown event.
 * "restore_fail": Indicating that nvdimm contents couldn't be restored
   during dimm initialization.
 * "encrypted"  : Dimm contents are encrypted.
 * "smart_notify": There is health event for the nvdimm.
 * "scrubbed"   : Indicating that contents of the nvdimm have been
   scrubbed.
 * "locked" : Indicating that nvdimm contents can't be modified
   until next power cycle.

[1]: commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")

Signed-off-by: Vaibhav Jain 
---
Changelog:

v3..v4 : None

v2..v3 : Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
 NVDIMM unarmed [Aneesh]

v1..v2 : New patch in the series.
---
 arch/powerpc/include/asm/papr_scm.h   |  48 ++
 arch/powerpc/platforms/pseries/papr_scm.c | 105 +-
 2 files changed, 151 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/papr_scm.h

diff --git a/arch/powerpc/include/asm/papr_scm.h 
b/arch/powerpc/include/asm/papr_scm.h
new file mode 100644
index ..868d3360f56a
--- /dev/null
+++ b/arch/powerpc/include/asm/papr_scm.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Structures and defines needed to manage nvdimms for spapr guests.
+ */
+#ifndef _ASM_POWERPC_PAPR_SCM_H_
+#define _ASM_POWERPC_PAPR_SCM_H_
+
+#include 
+#include 
+
+/* DIMM health bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_SCM_DIMM_UNARMED  PPC_BIT(0)
+/* SCM device failed to persist memory contents */
+#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY   PPC_BIT(1)
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN   PPC_BIT(2)
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_SCM_DIMM_EMPTYPPC_BIT(3)
+/* SCM device memory life remaining is critically low */
+#define PAPR_SCM_DIMM_HEALTH_CRITICAL  PPC_BIT(4)
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_SCM_DIMM_HEALTH_FATAL PPC_BIT(5)
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY PPC_BIT(6)
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL  PPC_BIT(7)
+/* SCM device is encrypted */
+#define PAPR_SCM_DIMM_ENCRYPTEDPPC_BIT(8)
+/* SCM device has been scrubbed and locked */
+#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED  PPC_BIT(9)
+
+/* Bit status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED |\
+   PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+/* Bit status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
+
+/* Bit status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_SCM_DIMM_BAD_RESTORE_MASK  (PAPR_SCM_DIMM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
+  PAPR_SCM_DIMM_HEALTH_FATAL | \
+  PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+#endif
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 0b4467e378e5..aaf2e4ab1f75 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -14,6 +14,7 @@
 #include 
 
 #include 
+#include 
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -39,6 +40,13 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+   /* Protect dimm data from concurrent access */
+   struct mutex dimm_mutex;
+
+   /* Health information for the dimm */
+   __be64 health_bitmap;
+   __be64 health_bitmap_valid;
 };
 
 static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +152,35 @@ static int drc_pmem_query_n_bind(struct 

[PATCH v4 0/4] powerpc/papr_scm: Add support for reporting nvdimm health

2020-03-31 Thread Vaibhav Jain
The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in Ref[2].
Until now these stats were never available nor exposed to user-space
tools like 'ndctl'. This is partly due to the PAPR platform not having
support for ACPI and NFIT. Hence 'ndctl' is unable to query and report
the dimm health status and a user had no way to determine the current
health status of an NVDIMM.

To overcome this limitation, this patch-set updates the papr_scm kernel
module to query and fetch nvdimm health stats using the hcalls described
in Ref[2]. These health and performance stats are then exposed to
userspace via sysfs and Dimm-Specific-Methods (DSM) issued by libndctl.

These changes, coupled with the proposed ndctl changes located at Ref[4],
should provide a way for the user to retrieve NVDIMM health status using
ndctl.

Below is a sample output using the proposed kernel + ndctl for a PAPR
NVDIMM in an emulation environment:

 # ndctl list -DH
[
  {
"dev":"nmem0",
"health":{
  "health_state":"fatal",
  "shutdown_state":"dirty"
}
  }
]

Dimm health report output on a pseries guest lpar with vPMEM or HMS
based nvdimms that are in perfectly healthy condition:

 # ndctl list -d nmem0 -H
[
  {
"dev":"nmem0",
"health":{
  "health_state":"ok",
  "shutdown_state":"clean"
}
  }
]

PAPR Dimm-Specific-Methods (DSM)
================================

As the name suggests, DSMs are used by vendor-specific code in libndctl
to execute certain operations or fetch certain information for NVDIMMs.
DSMs can be sent to the papr_scm module via libndctl (userspace) and
libnvdimm (kernel) using the ND_CMD_CALL ioctl, which can be handled in
the dimm control function papr_scm_ndctl(). For PAPR, this patchset
proposes a single DSM to retrieve DIMM health, defined in the newly
introduced uapi header named 'papr_scm_dsm.h'. Support for more DSMs
will be added in the future.

Structure of the patch-set
==========================

The patchset starts with implementing support for fetching nvdimm health
information from PHYP and partially exposing it to user-space via nvdimm
flags.

The second and third patches implement support for servicing DSM
commands in papr_scm.

Finally, the fourth patch implements support for servicing the DSM
'DSM_PAPR_SCM_HEALTH' that returns the nvdimm health information to
libndctl.

Changelog:
==========

v3..v4:

* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
  function named papr_scm_service_dsm() to service DSM requests. [Aneesh]

v2..v3:

* Updated the papr_scm_dsm.h header to be more conformant to general
  kernel guidelines for UAPI headers. [Aneesh]

* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
  include the case when the nvdimm is unarmed because it's a vPMEM
  nvdimm. [Aneesh]

v1..v2:

* Restructured the patch-set based on review comments on the V1 patch-set
to simplify patch review. Multiple small patches have been combined into
single patches to reduce the cross-referencing that was needed in the
earlier patch-set. Hence most of the patches in this patch-set are now
new. [Aneesh]

* Removed the initial work done for fetching nvdimm performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]

* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.

References:
[1]: "Power Architecture Platform Reference"
  https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2]: commit 58b278f568f0
 ("powerpc: Provide initial documentation for PAPR hcalls")
[3]: "Linux on Power Architecture Platform Reference"
 https://members.openpowerfoundation.org/document/dl/469
[4]: https://patchwork.kernel.org/project/linux-nvdimm/list/?series=244625

Vaibhav Jain (4):
  powerpc/papr_scm: Fetch nvdimm health information from PHYP
  ndctl/uapi: Introduce NVDIMM_FAMILY_PAPR_SCM as a new NVDIMM DSM
family
  powerpc/papr_scm,uapi: Add support for handling PAPR DSM commands
  powerpc/papr_scm: Implement support for DSM_PAPR_SCM_HEALTH

 arch/powerpc/include/asm/papr_scm.h  |  48 
 arch/powerpc/include/uapi/asm/papr_scm_dsm.h | 201 ++
 arch/powerpc/platforms/pseries/papr_scm.c| 277 ++-
 include/uapi/linux/ndctl.h   |   1 +
 4 files changed, 519 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/papr_scm.h
 create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_dsm.h

-- 
2.24.1



Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Christophe Leroy




Le 31/03/2020 à 12:04, Michal Simek a écrit :

On 31. 03. 20 11:49, Christophe Leroy wrote:



Le 31/03/2020 à 09:19, Christophe Leroy a écrit :



Le 31/03/2020 à 08:59, Michal Simek a écrit :

On 31. 03. 20 8:56, Christophe Leroy wrote:



Le 31/03/2020 à 07:30, Michael Ellerman a écrit :

Christophe Leroy  writes:

Le 27/03/2020 à 15:14, Andy Shevchenko a écrit :

On Fri, Mar 27, 2020 at 02:22:55PM +0100, Arnd Bergmann wrote:

On Fri, Mar 27, 2020 at 2:15 PM Andy Shevchenko
 wrote:

On Fri, Mar 27, 2020 at 03:10:26PM +0200, Andy Shevchenko wrote:

On Fri, Mar 27, 2020 at 01:54:33PM +0100, Arnd Bergmann wrote:

On Fri, Mar 27, 2020 at 1:12 PM Michal Simek
 wrote:

...


It does raise a follow-up question about ppc40x though: is it
time to
retire all of it?


Who knows?

I have in my possession a nice WD My Book Live, based on this
architecture, and I don't want it gone from modern kernel support.
OTOH I understand that the number of real users is not too big.


+Cc: Christian Lamparter, whom I owe for that WD box.


According to https://openwrt.org/toh/wd/mybooklive, that one is
based on
APM82181/ppc464, so it is about several generations newer than
what I
asked about (ppc40x).


Ah, and I have Amiga board, but that one is being used only for
testing, so,
I don't care much.


I think there are a couple of ppc440 based Amiga boards, but again,
not 405
to my knowledge.


Ah, you are right. No objections from ppc40x removal!


Removing 40x would help cleaning things a bit. For instance 40x is
the
last platform still having PTE_ATOMIC_UPDATES. So if we can remove
40x
we can get rid of PTE_ATOMIC_UPDATES completely.

If no one objects, I can prepare a series to drop support for 40x
completely.

Michael, any thought ?


I have no attachment to 40x, and I'd certainly be happy to have less
code in the tree, we struggle to keep even the modern platforms well
maintained.

At the same time I don't want to render anyone's hardware obsolete
unnecessarily. But if there's really no one using 40x then we should
remove it, it could well be broken already.

So I guess post a series to do the removal and we'll see if anyone
speaks up.



Ok, series sent out, see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757


ok. I see you have done it completely independently of my patchset. It
would be better if you could base it on top of my 2 patches because
they are in conflict now, and I also need to remove the virtex 44x
platform along with the alsa driver.



I can't see your first patch, only the second one shows up in the
series, see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757




Ok, I found your first patch on another patchwork, it doesn't touch any
file in arch/powerpc/


There was just a driver dependency on a symbol which is removed by 2/2.
Let's see what you get from kbuild if a symbol is removed but still
used in the drivers folder.


Nothing bad apparently, see build test at 
http://kisskb.ellerman.id.au/kisskb/head/a4890e3fb046950e9a62dc3eff5b37469551e823/


Christophe


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Michal Simek
On 31. 03. 20 11:49, Christophe Leroy wrote:
> 
> 
> Le 31/03/2020 à 09:19, Christophe Leroy a écrit :
>>
>>
>> Le 31/03/2020 à 08:59, Michal Simek a écrit :
>>> On 31. 03. 20 8:56, Christophe Leroy wrote:


 Le 31/03/2020 à 07:30, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> Le 27/03/2020 à 15:14, Andy Shevchenko a écrit :
>>> On Fri, Mar 27, 2020 at 02:22:55PM +0100, Arnd Bergmann wrote:
 On Fri, Mar 27, 2020 at 2:15 PM Andy Shevchenko
  wrote:
> On Fri, Mar 27, 2020 at 03:10:26PM +0200, Andy Shevchenko wrote:
>> On Fri, Mar 27, 2020 at 01:54:33PM +0100, Arnd Bergmann wrote:
>>> On Fri, Mar 27, 2020 at 1:12 PM Michal Simek
>>>  wrote:
>>> ...
>>>
>>> It does raise a follow-up question about ppc40x though: is it
>>> time to
>>> retire all of it?
>>
>> Who knows?
>>
>> I have in my possession a nice WD My Book Live, based on this
>> architecture, and I don't want it gone from modern kernel support.
>> OTOH I understand that the number of real users is not too big.
>
> +Cc: Christian Lamparter, whom I owe for that WD box.

 According to https://openwrt.org/toh/wd/mybooklive, that one is
 based on
 APM82181/ppc464, so it is about several generations newer than
 what I
 asked about (ppc40x).

>> Ah, and I have Amiga board, but that one is being used only for
>> testing, so,
>> I don't care much.

 I think there are a couple of ppc440 based Amiga boards, but again,
 not 405
 to my knowledge.
>>>
>>> Ah, you are right. No objections from ppc40x removal!
>>
>> Removing 40x would help cleaning things a bit. For instance 40x is
>> the
>> last platform still having PTE_ATOMIC_UPDATES. So if we can remove
>> 40x
>> we can get rid of PTE_ATOMIC_UPDATES completely.
>>
>> If no one objects, I can prepare a series to drop support for 40x
>> completely.
>>
>> Michael, any thought ?
>
> I have no attachment to 40x, and I'd certainly be happy to have less
> code in the tree, we struggle to keep even the modern platforms well
> maintained.
>
> At the same time I don't want to render anyone's hardware obsolete
> unnecessarily. But if there's really no one using 40x then we should
> remove it, it could well be broken already.
>
> So I guess post a series to do the removal and we'll see if anyone
> speaks up.
>

 Ok, series sent out, see
 https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757
>>>
>>> ok. I see you have done it completely independently of my patchset.
>>> It would be better if you could base it on top of my 2 patches
>>> because they are in conflict now, and I also need to remove the
>>> virtex 44x platform along with the alsa driver.
>>>
>>
>> I can't see your first patch, only the second one shows up in the
>> series, see
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757
>>
> 
> 
> Ok, I found your first patch on another patchwork, it doesn't touch any
> file in arch/powerpc/

There was just a driver dependency on a symbol which is removed by 2/2.
Let's see what you get from kbuild if a symbol is removed but still
used in the drivers folder.

> 
> I sent a v2 series with your powerpc patch as patch 2/11
> 
> See https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167766

Thanks,
Michal




[PATCH v19 24/24] selftests: vm: pkeys: Fix multilib builds for x86

2020-03-31 Thread Sandipan Das
This ensures that both 32-bit and 64-bit binaries are generated
when this is built on an x86_64 system. Most of the changes have
been borrowed from tools/testing/selftests/x86/Makefile.

Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
Tested-by: Dave Hansen 
---
 tools/testing/selftests/vm/Makefile | 72 +
 1 file changed, 72 insertions(+)

diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 4e9c741be6af2..82031f84af212 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,7 +18,30 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+
+ifeq ($(ARCH),x86_64)
+CAN_BUILD_I386 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_32bit_program.c -m32)
+CAN_BUILD_X86_64 := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_64bit_program.c)
+CAN_BUILD_WITH_NOPIE := $(shell ./../x86/check_cc.sh $(CC) 
../x86/trivial_program.c -no-pie)
+
+TARGETS := protection_keys
+BINARIES_32 := $(TARGETS:%=%_32)
+BINARIES_64 := $(TARGETS:%=%_64)
+
+ifeq ($(CAN_BUILD_WITH_NOPIE),1)
+CFLAGS += -no-pie
+endif
+
+ifeq ($(CAN_BUILD_I386),1)
+TEST_GEN_FILES += $(BINARIES_32)
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+TEST_GEN_FILES += $(BINARIES_64)
+endif
+else
 TEST_GEN_FILES += protection_keys
+endif
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
@@ -32,6 +55,55 @@ TEST_FILES := test_vmalloc.sh
 KSFT_KHDR_INSTALL := 1
 include ../lib.mk
 
+ifeq ($(ARCH),x86_64)
+BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
+BINARIES_64 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_64))
+
+define gen-target-rule-32
+$(1) $(1)_32: $(OUTPUT)/$(1)_32
+.PHONY: $(1) $(1)_32
+endef
+
+define gen-target-rule-64
+$(1) $(1)_64: $(OUTPUT)/$(1)_64
+.PHONY: $(1) $(1)_64
+endef
+
+ifeq ($(CAN_BUILD_I386),1)
+$(BINARIES_32): CFLAGS += -m32
+$(BINARIES_32): LDLIBS += -lrt -ldl -lm
+$(BINARIES_32): %_32: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-32,$(t))))
+endif
+
+ifeq ($(CAN_BUILD_X86_64),1)
+$(BINARIES_64): CFLAGS += -m64
+$(BINARIES_64): LDLIBS += -lrt -ldl
+$(BINARIES_64): %_64: %.c
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+$(foreach t,$(TARGETS),$(eval $(call gen-target-rule-64,$(t))))
+endif
+
+# x86_64 users should be encouraged to install 32-bit libraries
+ifeq ($(CAN_BUILD_I386)$(CAN_BUILD_X86_64),01)
+all: warn_32bit_failure
+
+warn_32bit_failure:
+   @echo "Warning: you seem to have a broken 32-bit build" 2>&1;   
\
+   echo  "environment. This will reduce test coverage of 64-bit" 2>&1; 
\
+   echo  "kernels. If you are using a Debian-like distribution," 2>&1; 
\
+   echo  "try:"; 2>&1; 
\
+   echo  "";   
\
+   echo  "  apt-get install gcc-multilib libc6-i386 libc6-dev-i386";   
\
+   echo  "";   
\
+   echo  "If you are using a Fedora-like distribution, try:";  
\
+   echo  "";   
\
+   echo  "  yum install glibc-devel.*i686";
\
+   exit 0;
+endif
+endif
+
 $(OUTPUT)/userfaultfd: LDLIBS += -lpthread
 
 $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
-- 
2.17.1



[PATCH v19 23/24] selftests: vm: pkeys: Use the correct page size on powerpc

2020-03-31 Thread Sandipan Das
Both 4K and 64K pages are supported on powerpc. Parts of
the selftest code perform alignment computations based on
the PAGE_SIZE macro which is currently hardcoded to 64K
for powerpc. This causes some test failures on kernels
configured with 4K page size.

In some cases, we need to enforce function alignment on
page size. Since this can only be done at build time,
64K is used as the alignment factor as that also ensures
4K alignment.

Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-powerpc.h| 2 +-
 tools/testing/selftests/vm/protection_keys.c | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 02bd4dd7d467a..3a761e51a5878 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -36,7 +36,7 @@
 pkey-31 and exec-only key */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL << 24)
-#define PAGE_SIZE  (1UL << 16)
+#define PAGE_SIZE  sysconf(_SC_PAGESIZE)
 
 static inline u32 pkey_bit_position(int pkey)
 {
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index a1cb9a71e77ce..fc19addcb5c82 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -146,7 +146,12 @@ void abort_hooks(void)
  * will then fault, which makes sure that the fault code handles
  * execute-only memory properly.
  */
+#ifdef __powerpc64__
+/* This way, both 4K and 64K alignment are maintained */
+__attribute__((__aligned__(65536)))
+#else
 __attribute__((__aligned__(PAGE_SIZE)))
+#endif
 void lots_o_noops_around_write(int *write_to_me)
 {
dprintf3("running %s()\n", __func__);
-- 
2.17.1



[PATCH v19 22/24] selftests/vm/pkeys: Override access right definitions on powerpc

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS
and PKEY_DISABLE_WRITE such as those in:
 /usr/include/bits/mman-shared.h.

This overrides the definitions with correct values for powerpc.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-powerpc.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index d31665c48f5ed..02bd4dd7d467a 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -16,11 +16,13 @@
 #define fpregs fp_regs
 #define si_pkey_offset 0x20
 
-#ifndef PKEY_DISABLE_ACCESS
+#ifdef PKEY_DISABLE_ACCESS
+#undef PKEY_DISABLE_ACCESS
 # define PKEY_DISABLE_ACCESS   0x3  /* disable read and write */
 #endif
 
-#ifndef PKEY_DISABLE_WRITE
+#ifdef PKEY_DISABLE_WRITE
+#undef PKEY_DISABLE_WRITE
 # define PKEY_DISABLE_WRITE0x2
 #endif
 
-- 
2.17.1



[PATCH v19 21/24] selftests/vm/pkeys: Test correct behaviour of pkey-0

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Ensure that pkey-0 is allocated on start and that it can
be attached dynamically in various modes, without failures.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/protection_keys.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index d4952b57cc909..a1cb9a71e77ce 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -964,6 +964,58 @@ __attribute__((noinline)) int read_ptr(int *ptr)
return *ptr;
 }
 
+void test_pkey_alloc_free_attach_pkey0(int *ptr, u16 pkey)
+{
+   int i, err;
+   int max_nr_pkey_allocs;
+   int alloced_pkeys[NR_PKEYS];
+   int nr_alloced = 0;
+   long size;
+
+   pkey_assert(pkey_last_malloc_record);
+   size = pkey_last_malloc_record->size;
+   /*
+* This is a bit of a hack.  But mprotect() requires
+* huge-page-aligned sizes when operating on hugetlbfs.
+* So, make sure that we use something that's a multiple
+* of a huge page when we can.
+*/
+   if (size >= HPAGE_SIZE)
+   size = HPAGE_SIZE;
+
+   /* allocate every possible key and make sure key-0 never got allocated 
*/
+   max_nr_pkey_allocs = NR_PKEYS;
+   for (i = 0; i < max_nr_pkey_allocs; i++) {
+   int new_pkey = alloc_pkey();
+   pkey_assert(new_pkey != 0);
+
+   if (new_pkey < 0)
+   break;
+   alloced_pkeys[nr_alloced++] = new_pkey;
+   }
+   /* free all the allocated keys */
+   for (i = 0; i < nr_alloced; i++) {
+   int free_ret;
+
+   if (!alloced_pkeys[i])
+   continue;
+   free_ret = sys_pkey_free(alloced_pkeys[i]);
+   pkey_assert(!free_ret);
+   }
+
+   /* attach key-0 in various modes */
+   err = sys_mprotect_pkey(ptr, size, PROT_READ, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_WRITE, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_EXEC, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE, 0);
+   pkey_assert(!err);
+   err = sys_mprotect_pkey(ptr, size, PROT_READ|PROT_WRITE|PROT_EXEC, 0);
+   pkey_assert(!err);
+}
+
 void test_read_of_write_disabled_region(int *ptr, u16 pkey)
 {
int ptr_contents;
@@ -1448,6 +1500,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_pkey_syscalls_on_non_allocated_pkey,
test_pkey_syscalls_bad_args,
test_pkey_alloc_exhaust,
+   test_pkey_alloc_free_attach_pkey0,
 };
 
 void run_tests_once(void)
-- 
2.17.1



[PATCH v19 20/24] selftests/vm/pkeys: Introduce a sub-page allocator

2020-03-31 Thread Sandipan Das
From: Ram Pai 

This introduces a new allocator that allocates 4K hardware
pages to back 64K linux pages. This allocator is available
only on powerpc.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h|  6 +
 tools/testing/selftests/vm/pkey-powerpc.h| 25 
 tools/testing/selftests/vm/pkey-x86.h|  5 
 tools/testing/selftests/vm/protection_keys.c |  1 +
 4 files changed, 37 insertions(+)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 59ccdff18214f..622a85848f61b 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -28,6 +28,9 @@
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
+extern int test_nr;
+extern int iteration_nr;
+
 #ifdef __GNUC__
 __attribute__((format(printf, 1, 2)))
 #endif
@@ -78,6 +81,9 @@ __attribute__((noinline)) int read_ptr(int *ptr);
 void expected_pkey_fault(int pkey);
 int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
 int sys_pkey_free(unsigned long pkey);
+int mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot,
+   unsigned long pkey);
+void record_pkey_malloc(void *ptr, long size, int prot);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 7d7c3ffafdd99..d31665c48f5ed 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -106,4 +106,29 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey)
 /* 4-byte instructions * 16384 = 64K page */
 #define __page_o_noops() asm(".rept 16384 ; nop; .endr")
 
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+   void *ptr;
+   int ret;
+
+   dprintf1("doing %s(size=%ld, prot=0x%x, pkey=%d)\n", __func__,
+   size, prot, pkey);
+   pkey_assert(pkey < NR_PKEYS);
+   ptr = mmap(NULL, size, prot, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+   pkey_assert(ptr != (void *)-1);
+
+   ret = syscall(__NR_subpage_prot, ptr, size, NULL);
+   if (ret) {
+   perror("subpage_perm");
+   return PTR_ERR_ENOTSUP;
+   }
+
+   ret = mprotect_pkey((void *)ptr, PAGE_SIZE, prot, pkey);
+   pkey_assert(!ret);
+   record_pkey_malloc(ptr, size, prot);
+
+   dprintf1("%s() for pkey %d @ %p\n", __func__, pkey, ptr);
+   return ptr;
+}
+
 #endif /* _PKEYS_POWERPC_H */
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index 6421b846aa169..3be20f5d52752 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -173,4 +173,9 @@ void expect_fault_on_read_execonly_key(void *p1, int pkey)
expected_pkey_fault(pkey);
 }
 
+void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
+{
+   return PTR_ERR_ENOTSUP;
+}
+
 #endif /* _PKEYS_X86_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 8bb4de1038743..d4952b57cc909 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -845,6 +845,7 @@ void *malloc_pkey_mmap_dax(long size, int prot, u16 pkey)
 void *(*pkey_malloc[])(long size, int prot, u16 pkey) = {
 
malloc_pkey_with_mprotect,
+   malloc_pkey_with_mprotect_subpage,
malloc_pkey_anon_huge,
malloc_pkey_hugetlb
 /* can not do direct with the pkey_mprotect() API:
-- 
2.17.1



[PATCH v19 19/24] selftests/vm/pkeys: Detect write violation on a mapped access-denied-key page

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Detect a write violation on a page with which an access-disabled
key is associated long after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index cb31a5cdf6d90..8bb4de1038743 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1027,6 +1027,18 @@ void test_write_of_access_disabled_region(int *ptr, u16 
pkey)
*ptr = __LINE__;
expected_pkey_fault(pkey);
 }
+
+void test_write_of_access_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   *ptr = __LINE__;
+   dprintf1("disabling access; after accessing the page, "
+   " to PKEY[%02d], doing write\n", pkey);
+   pkey_access_deny(pkey);
+   *ptr = __LINE__;
+   expected_pkey_fault(pkey);
+}
+
 void test_kernel_write_of_access_disabled_region(int *ptr, u16 pkey)
 {
int ret;
@@ -1423,6 +1435,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_write_of_write_disabled_region,
test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
+   test_write_of_access_disabled_region_with_page_already_mapped,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
test_kernel_gup_of_access_disabled_region,
-- 
2.17.1



[PATCH v19 18/24] selftests/vm/pkeys: Associate key on a mapped page and detect write violation

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Detect a write violation on a page with which a write-disabled
key is associated long after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index f65d384ef6a0d..cb31a5cdf6d90 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1002,6 +1002,17 @@ void 
test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
expected_pkey_fault(pkey);
 }
 
+void test_write_of_write_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   *ptr = __LINE__;
+   dprintf1("disabling write access; after accessing the page, "
+   "to PKEY[%02d], doing write\n", pkey);
+   pkey_write_deny(pkey);
+   *ptr = __LINE__;
+   expected_pkey_fault(pkey);
+}
+
 void test_write_of_write_disabled_region(int *ptr, u16 pkey)
 {
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1410,6 +1421,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_read_of_access_disabled_region,
test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
+   test_write_of_write_disabled_region_with_page_already_mapped,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
test_kernel_write_of_write_disabled_region,
-- 
2.17.1



[PATCH v19 17/24] selftests/vm/pkeys: Associate key on a mapped page and detect access violation

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Detect an access violation on a page with which an access-disabled
key is associated long after the page is mapped.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 95f173049f43f..f65d384ef6a0d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -984,6 +984,24 @@ void test_read_of_access_disabled_region(int *ptr, u16 
pkey)
dprintf1("*ptr: %d\n", ptr_contents);
expected_pkey_fault(pkey);
 }
+
+void test_read_of_access_disabled_region_with_page_already_mapped(int *ptr,
+   u16 pkey)
+{
+   int ptr_contents;
+
+   dprintf1("disabling access to PKEY[%02d], doing read @ %p\n",
+   pkey, ptr);
+   ptr_contents = read_ptr(ptr);
+   dprintf1("reading ptr before disabling the read : %d\n",
+   ptr_contents);
+   read_pkey_reg();
+   pkey_access_deny(pkey);
+   ptr_contents = read_ptr(ptr);
+   dprintf1("*ptr: %d\n", ptr_contents);
+   expected_pkey_fault(pkey);
+}
+
 void test_write_of_write_disabled_region(int *ptr, u16 pkey)
 {
dprintf1("disabling write access to PKEY[%02d], doing write\n", pkey);
@@ -1390,6 +1408,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 
pkey)
 void (*pkey_tests[])(int *ptr, u16 pkey) = {
test_read_of_write_disabled_region,
test_read_of_access_disabled_region,
+   test_read_of_access_disabled_region_with_page_already_mapped,
test_write_of_write_disabled_region,
test_write_of_access_disabled_region,
test_kernel_write_of_access_disabled_region,
-- 
2.17.1



[PATCH v19 16/24] selftests/vm/pkeys: Improve checks to determine pkey support

2020-03-31 Thread Sandipan Das
From: Ram Pai 

For the pkeys subsystem to work, both the CPU and the
kernel need to have support. So, in addition to the
CPU feature checks, also check if the kernel supports
pkeys.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h| 30 
 tools/testing/selftests/vm/pkey-powerpc.h|  3 +-
 tools/testing/selftests/vm/pkey-x86.h|  2 +-
 tools/testing/selftests/vm/protection_keys.c |  7 +++--
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 2f4b1eb3a680a..59ccdff18214f 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -76,6 +76,8 @@ extern void abort_hooks(void);
 
 __attribute__((noinline)) int read_ptr(int *ptr);
 void expected_pkey_fault(int pkey);
+int sys_pkey_alloc(unsigned long flags, unsigned long init_val);
+int sys_pkey_free(unsigned long pkey);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
@@ -186,4 +188,32 @@ static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
 #endif
 }
 
+static inline int kernel_has_pkeys(void)
+{
+   /* try allocating a key and see if it succeeds */
+   int ret = sys_pkey_alloc(0, 0);
+   if (ret <= 0) {
+   return 0;
+   }
+   sys_pkey_free(ret);
+   return 1;
+}
+
+static inline int is_pkeys_supported(void)
+{
+   /* check if the cpu supports pkeys */
+   if (!cpu_has_pkeys()) {
+   dprintf1("SKIP: %s: no CPU support\n", __func__);
+   return 0;
+   }
+
+   /* check if the kernel supports pkeys */
+   if (!kernel_has_pkeys()) {
+   dprintf1("SKIP: %s: no kernel support\n", __func__);
+   return 0;
+   }
+
+   return 1;
+}
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index 319673bbab0b3..7d7c3ffafdd99 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -63,8 +63,9 @@ static inline void __write_pkey_reg(u64 pkey_reg)
__func__, __read_pkey_reg(), pkey_reg);
 }
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
+   /* No simple way to determine this */
return 1;
 }
 
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index a0c59d4f7af2e..6421b846aa169 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -97,7 +97,7 @@ static inline void __cpuid(unsigned int *eax, unsigned int 
*ebx,
 #define X86_FEATURE_PKU	(1<<3) /* Protection Keys for Userspace */
 #define X86_FEATURE_OSPKE  (1<<4) /* OS Protection Keys Enable */
 
-static inline int cpu_has_pku(void)
+static inline int cpu_has_pkeys(void)
 {
unsigned int eax;
unsigned int ebx;
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 5fcbbc525364e..95f173049f43f 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1378,7 +1378,7 @@ void test_mprotect_pkey_on_unsupported_cpu(int *ptr, u16 
pkey)
int size = PAGE_SIZE;
int sret;
 
-   if (cpu_has_pku()) {
+   if (cpu_has_pkeys()) {
dprintf1("SKIP: %s: no CPU support\n", __func__);
return;
}
@@ -1447,12 +1447,13 @@ void pkey_setup_shadow(void)
 int main(void)
 {
int nr_iterations = 22;
+   int pkeys_supported = is_pkeys_supported();
 
setup_handlers();
 
-   printf("has pku: %d\n", cpu_has_pku());
+   printf("has pkeys: %d\n", pkeys_supported);
 
-   if (!cpu_has_pku()) {
+   if (!pkeys_supported) {
int size = PAGE_SIZE;
int *ptr;
 
-- 
2.17.1



[PATCH v19 15/24] selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Some pkeys which are valid on the hardware are reserved
and not available for application use. These keys cannot
be allocated.

test_pkey_alloc_exhaust() tries to account for these and
has an assertion which validates whether all available
pkeys have been exhaustively allocated. However, the
expression that is currently used is only valid for x86.
On powerpc, a pkey is additionally reserved as compared
to x86. Hence, the assertion is made to use an
arch-specific helper to get the correct count of
reserved pkeys.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/protection_keys.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index e6de078a9196f..5fcbbc525364e 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1153,6 +1153,7 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
dprintf3("%s()::%d\n", __func__, __LINE__);
 
/*
+* On x86:
 * There are 16 pkeys supported in hardware.  Three are
 * allocated by the time we get here:
 *   1. The default key (0)
@@ -1160,8 +1161,16 @@ void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
 *   3. One allocated by the test code and passed in via
 *  'pkey' to this function.
 * Ensure that we can allocate at least another 13 (16-3).
+*
+* On powerpc:
+* There are either 5, 28, 29 or 32 pkeys supported in
+* hardware depending on the page size (4K or 64K) and
+* platform (powernv or powervm). Four are allocated by
+* the time we get here. These include pkey-0, pkey-1,
+* exec-only pkey and the one allocated by the test code.
+* Ensure that we can allocate the remaining.
 */
-   pkey_assert(i >= NR_PKEYS-3);
+   pkey_assert(i >= (NR_PKEYS - get_arch_reserved_keys() - 1));
 
for (i = 0; i < nr_allocated_pkeys; i++) {
err = sys_pkey_free(allocated_pkeys[i]);
-- 
2.17.1



[PATCH v19 14/24] selftests/vm/pkeys: Fix number of reserved powerpc pkeys

2020-03-31 Thread Sandipan Das
From: "Desnes A. Nunes do Rosario" 

The number of reserved pkeys in a PowerNV environment is
different from that on PowerVM or KVM.

Tested on PowerVM and PowerNV environments.

Signed-off-by: "Desnes A. Nunes do Rosario" 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-powerpc.h | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
index c79f4160a6a08..319673bbab0b3 100644
--- a/tools/testing/selftests/vm/pkey-powerpc.h
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -28,7 +28,10 @@
 #define NR_RESERVED_PKEYS_4K   27 /* pkey-0, pkey-1, exec-only-pkey
  and 24 other keys that cannot be
  represented in the PTE */
-#define NR_RESERVED_PKEYS_64K  3  /* pkey-0, pkey-1 and exec-only-pkey */
+#define NR_RESERVED_PKEYS_64K_3KEYS3 /* PowerNV and KVM: pkey-0,
+pkey-1 and exec-only key */
+#define NR_RESERVED_PKEYS_64K_4KEYS4 /* PowerVM: pkey-0, pkey-1,
+pkey-31 and exec-only key */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL << 24)
 #define PAGE_SIZE  (1UL << 16)
@@ -65,12 +68,27 @@ static inline int cpu_has_pku(void)
return 1;
 }
 
+static inline bool arch_is_powervm()
+{
+   struct stat buf;
+
+   if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) &&
+   (stat("/sys/firmware/devicetree/base/hmc-managed?", &buf) == 0) &&
+   (stat("/sys/firmware/devicetree/base/chosen/qemu,graphic-width", &buf) == -1))
+   return true;
+
+   return false;
+}
+
 static inline int get_arch_reserved_keys(void)
 {
if (sysconf(_SC_PAGESIZE) == 4096)
return NR_RESERVED_PKEYS_4K;
else
-   return NR_RESERVED_PKEYS_64K;
+   if (arch_is_powervm())
+   return NR_RESERVED_PKEYS_64K_4KEYS;
+   else
+   return NR_RESERVED_PKEYS_64K_3KEYS;
 }
 
 void expect_fault_on_read_execonly_key(void *p1, int pkey)
-- 
2.17.1



[PATCH v19 13/24] selftests/vm/pkeys: Introduce powerpc support

2020-03-31 Thread Sandipan Das
From: Ram Pai 

This makes use of the abstractions added earlier and
introduces support for powerpc.

For powerpc, after receiving the SIGSEGV, the signal
handler must explicitly restore access permissions
for the faulting pkey to allow the test to continue.
As this makes use of pkey_access_allow(), all of its
dependencies and other similar functions have been
moved ahead of the signal handler.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h|   2 +
 tools/testing/selftests/vm/pkey-powerpc.h|  90 +++
 tools/testing/selftests/vm/protection_keys.c | 269 ++-
 3 files changed, 233 insertions(+), 128 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-powerpc.h

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 621fb2a0a5efe..2f4b1eb3a680a 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -79,6 +79,8 @@ void expected_pkey_fault(int pkey);
 
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
+#elif defined(__powerpc64__) /* arch */
+#include "pkey-powerpc.h"
 #else /* arch */
 #error Architecture not supported
 #endif /* arch */
diff --git a/tools/testing/selftests/vm/pkey-powerpc.h 
b/tools/testing/selftests/vm/pkey-powerpc.h
new file mode 100644
index 0..c79f4160a6a08
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-powerpc.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _PKEYS_POWERPC_H
+#define _PKEYS_POWERPC_H
+
+#ifndef SYS_mprotect_key
+# define SYS_mprotect_key  386
+#endif
+#ifndef SYS_pkey_alloc
+# define SYS_pkey_alloc	384
+# define SYS_pkey_free 385
+#endif
+#define REG_IP_IDX PT_NIP
+#define REG_TRAPNO PT_TRAP
+#define gregs  gp_regs
+#define fpregs fp_regs
+#define si_pkey_offset 0x20
+
+#ifndef PKEY_DISABLE_ACCESS
+# define PKEY_DISABLE_ACCESS   0x3  /* disable read and write */
+#endif
+
+#ifndef PKEY_DISABLE_WRITE
+# define PKEY_DISABLE_WRITE0x2
+#endif
+
+#define NR_PKEYS   32
+#define NR_RESERVED_PKEYS_4K   27 /* pkey-0, pkey-1, exec-only-pkey
+ and 24 other keys that cannot be
+ represented in the PTE */
+#define NR_RESERVED_PKEYS_64K  3  /* pkey-0, pkey-1 and exec-only-pkey */
+#define PKEY_BITS_PER_PKEY 2
+#define HPAGE_SIZE (1UL << 24)
+#define PAGE_SIZE  (1UL << 16)
+
+static inline u32 pkey_bit_position(int pkey)
+{
+   return (NR_PKEYS - pkey - 1) * PKEY_BITS_PER_PKEY;
+}
+
+static inline u64 __read_pkey_reg(void)
+{
+   u64 pkey_reg;
+
+   asm volatile("mfspr %0, 0xd" : "=r" (pkey_reg));
+
+   return pkey_reg;
+}
+
+static inline void __write_pkey_reg(u64 pkey_reg)
+{
+   u64 amr = pkey_reg;
+
+   dprintf4("%s() changing %016llx to %016llx\n",
+__func__, __read_pkey_reg(), pkey_reg);
+
+   asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(amr)) : "memory");
+
+   dprintf4("%s() pkey register after changing %016llx to %016llx\n",
+   __func__, __read_pkey_reg(), pkey_reg);
+}
+
+static inline int cpu_has_pku(void)
+{
+   return 1;
+}
+
+static inline int get_arch_reserved_keys(void)
+{
+   if (sysconf(_SC_PAGESIZE) == 4096)
+   return NR_RESERVED_PKEYS_4K;
+   else
+   return NR_RESERVED_PKEYS_64K;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+   /*
+* powerpc does not allow userspace to change permissions of exec-only
+* keys since those keys are not allocated by userspace. The signal
+* handler wont be able to reset the permissions, which means the code
+* will infinitely continue to segfault here.
+*/
+   return;
+}
+
+/* 4-byte instructions * 16384 = 64K page */
+#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+
+#endif /* _PKEYS_POWERPC_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 57c71056c93d8..e6de078a9196f 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -169,6 +169,125 @@ void dump_mem(void *dumpme, int len_bytes)
}
 }
 
+static u32 hw_pkey_get(int pkey, unsigned long flags)
+{
+   u64 pkey_reg = __read_pkey_reg();
+
+   dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
+   __func__, pkey, flags, 0, 0);
+   dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg);
+
+   return (u32) get_pkey_bits(pkey_reg, pkey);
+}
+
+static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags)
+{
+   u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
+   u64 old_pkey_reg 

[PATCH v19 12/24] selftests/vm/pkeys: Introduce generic pkey abstractions

2020-03-31 Thread Sandipan Das
From: Ram Pai 

This introduces some generic abstractions and provides
the corresponding architecture-specific implementations
for these abstractions.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h| 12 
 tools/testing/selftests/vm/pkey-x86.h| 15 +++
 tools/testing/selftests/vm/protection_keys.c |  8 ++--
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 0e3da7c8d6282..621fb2a0a5efe 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -74,6 +74,9 @@ extern void abort_hooks(void);
}   \
 } while (0)
 
+__attribute__((noinline)) int read_ptr(int *ptr);
+void expected_pkey_fault(int pkey);
+
 #if defined(__i386__) || defined(__x86_64__) /* arch */
 #include "pkey-x86.h"
 #else /* arch */
@@ -172,4 +175,13 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
 #define __stringify_1(x...) #x
 #define __stringify(x...)   __stringify_1(x)
 
+static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
+{
+#ifdef si_pkey
+   return &si->si_pkey;
+#else
+   return (u32 *)(((u8 *)si) + si_pkey_offset);
+#endif
+}
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index def2a1bcf6a5d..a0c59d4f7af2e 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -42,6 +42,7 @@
 #endif
 
 #define NR_PKEYS   16
+#define NR_RESERVED_PKEYS  2 /* pkey-0 and exec-only-pkey */
 #define PKEY_BITS_PER_PKEY 2
 #define HPAGE_SIZE (1UL<<21)
 #define PAGE_SIZE  4096
@@ -158,4 +159,18 @@ int pkey_reg_xstate_offset(void)
return xstate_offset;
 }
 
+static inline int get_arch_reserved_keys(void)
+{
+   return NR_RESERVED_PKEYS;
+}
+
+void expect_fault_on_read_execonly_key(void *p1, int pkey)
+{
+   int ptr_contents;
+
+   ptr_contents = read_ptr(p1);
+   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
+   expected_pkey_fault(pkey);
+}
+
 #endif /* _PKEYS_X86_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 535e464e27e9d..57c71056c93d8 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -1307,9 +1307,7 @@ void test_executing_on_unreadable_memory(int *ptr, u16 
pkey)
madvise(p1, PAGE_SIZE, MADV_DONTNEED);
lots_o_noops_around_write();
do_not_expect_pkey_fault("executing on PROT_EXEC memory");
-   ptr_contents = read_ptr(p1);
-   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
-   expected_pkey_fault(pkey);
+   expect_fault_on_read_execonly_key(p1, pkey);
 }
 
 void test_implicit_mprotect_exec_only_memory(int *ptr, u16 pkey)
@@ -1336,9 +1334,7 @@ void test_implicit_mprotect_exec_only_memory(int *ptr, 
u16 pkey)
madvise(p1, PAGE_SIZE, MADV_DONTNEED);
lots_o_noops_around_write();
do_not_expect_pkey_fault("executing on PROT_EXEC memory");
-   ptr_contents = read_ptr(p1);
-   dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
-   expected_pkey_fault(UNKNOWN_PKEY);
+   expect_fault_on_read_execonly_key(p1, UNKNOWN_PKEY);
 
/*
 * Put the memory back to non-PROT_EXEC.  Should clear the
-- 
2.17.1



[PATCH v19 10/24] selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random

2020-03-31 Thread Sandipan Das
From: Ram Pai 

alloc_random_pkey() was allocating the same pkey every
time. Not all pkeys were getting tested. This fixes it.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 7fd52d5c4bfdd..9cc82b65f8281 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -25,6 +25,7 @@
 #define __SANE_USERSPACE_TYPES__
 #include 
 #include 
+#include <time.h>
 #include 
 #include 
 #include 
@@ -546,10 +547,10 @@ int alloc_random_pkey(void)
int nr_alloced = 0;
int random_index;
memset(alloced_pkeys, 0, sizeof(alloced_pkeys));
+   srand((unsigned int)time(NULL));
 
/* allocate every possible key and make a note of which ones we got */
max_nr_pkey_allocs = NR_PKEYS;
-   max_nr_pkey_allocs = 1;
for (i = 0; i < max_nr_pkey_allocs; i++) {
int new_pkey = alloc_pkey();
if (new_pkey < 0)
-- 
2.17.1



[PATCH v19 11/24] selftests: vm: pkeys: Use the correct huge page size

2020-03-31 Thread Sandipan Das
The huge page size can vary across architectures. This will
ensure that the correct huge page size is used when accessing
the hugetlb controls under sysfs. Instead of using a hardcoded
page size (i.e. 2MB), this now uses the HPAGE_SIZE macro which
is arch-specific.

Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/protection_keys.c | 23 ++--
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 9cc82b65f8281..535e464e27e9d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -739,12 +739,15 @@ void *malloc_pkey_anon_huge(long size, int prot, u16 pkey)
 }
 
 int hugetlb_setup_ok;
+#define SYSFS_FMT_NR_HUGE_PAGES "/sys/kernel/mm/hugepages/hugepages-%ldkB/nr_hugepages"
 #define GET_NR_HUGE_PAGES 10
 void setup_hugetlbfs(void)
 {
int err;
int fd;
-   char buf[] = "123";
+   char buf[256];
+   long hpagesz_kb;
+   long hpagesz_mb;
 
if (geteuid() != 0) {
fprintf(stderr, "WARNING: not run as root, can not do hugetlb test\n");
@@ -755,11 +758,16 @@ void setup_hugetlbfs(void)
 
/*
 * Now go make sure that we got the pages and that they
-* are 2M pages.  Someone might have made 1G the default.
+* are PMD-level pages. Someone might have made PUD-level
+* pages the default.
 */
-   fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
+   hpagesz_kb = HPAGE_SIZE / 1024;
+   hpagesz_mb = hpagesz_kb / 1024;
+   sprintf(buf, SYSFS_FMT_NR_HUGE_PAGES, hpagesz_kb);
+   fd = open(buf, O_RDONLY);
if (fd < 0) {
-   perror("opening sysfs 2M hugetlb config");
+   fprintf(stderr, "opening sysfs %ldM hugetlb config: %s\n",
+   hpagesz_mb, strerror(errno));
return;
}
 
@@ -767,13 +775,14 @@ void setup_hugetlbfs(void)
err = read(fd, buf, sizeof(buf)-1);
close(fd);
if (err <= 0) {
-   perror("reading sysfs 2M hugetlb config");
+   fprintf(stderr, "reading sysfs %ldM hugetlb config: %s\n",
+   hpagesz_mb, strerror(errno));
return;
}
 
if (atoi(buf) != GET_NR_HUGE_PAGES) {
-   fprintf(stderr, "could not confirm 2M pages, got: '%s' expected %d\n",
-   buf, GET_NR_HUGE_PAGES);
+   fprintf(stderr, "could not confirm %ldM pages, got: '%s' expected %d\n",
+   hpagesz_mb, buf, GET_NR_HUGE_PAGES);
return;
}
 
-- 
2.17.1



[PATCH v19 09/24] selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()

2020-03-31 Thread Sandipan Das
From: Ram Pai 

In some cases, a pkey's bits need not necessarily change
in a way that the value of the pkey register increases
when performing a pkey_disable_set() or decreases when
performing a pkey_disable_clear().

For example, on powerpc, if a pkey's current state is
PKEY_DISABLE_ACCESS and we perform a pkey_write_disable()
on it, the bits still remain the same. We will observe
something similar when the pkey's current state is 0 and
a pkey_access_enable() is performed on it.

Either case would cause some assertions to fail. This
fixes the problem.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 4b1ddb526228d..7fd52d5c4bfdd 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -400,7 +400,7 @@ void pkey_disable_set(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n",
__func__, pkey, read_pkey_reg());
if (flags)
-   pkey_assert(read_pkey_reg() > orig_pkey_reg);
+   pkey_assert(read_pkey_reg() >= orig_pkey_reg);
dprintf1("END<---%s(%d, 0x%x)\n", __func__,
pkey, flags);
 }
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
pkey, read_pkey_reg());
if (flags)
-   assert(read_pkey_reg() < orig_pkey_reg);
+   assert(read_pkey_reg() <= orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1



[PATCH v19 08/24] selftests/vm/pkeys: Fix pkey_disable_clear()

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Currently, pkey_disable_clear() sets the specified bits
instead of clearing them. This has been dead code up to
now because its only callers, i.e. pkey_access_allow()
and pkey_write_allow(), are also unused.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/protection_keys.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index bed9d4de12b48..4b1ddb526228d 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -418,7 +418,7 @@ void pkey_disable_clear(int pkey, int flags)
pkey, pkey, pkey_rights);
pkey_assert(pkey_rights >= 0);
 
-   pkey_rights |= flags;
+   pkey_rights &= ~flags;
 
ret = hw_pkey_set(pkey, pkey_rights, 0);
shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
@@ -431,7 +431,7 @@ void pkey_disable_clear(int pkey, int flags)
dprintf1("%s(%d) pkey_reg: 0x%016llx\n", __func__,
pkey, read_pkey_reg());
if (flags)
-   assert(read_pkey_reg() > orig_pkey_reg);
+   assert(read_pkey_reg() < orig_pkey_reg);
 }
 
 void pkey_write_allow(int pkey)
-- 
2.17.1



[PATCH v19 07/24] selftests: vm: pkeys: Add helpers for pkey bits

2020-03-31 Thread Sandipan Das
This introduces some functions that help with setting
or clearing bits of a particular pkey. This also adds
an abstraction for getting a pkey's bit position in
the pkey register as this may vary across architectures.

Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h| 22 ++
 tools/testing/selftests/vm/pkey-x86.h|  5 +++
 tools/testing/selftests/vm/protection_keys.c | 32 ++--
 3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index dfbce49269ce2..0e3da7c8d6282 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -80,6 +80,28 @@ extern void abort_hooks(void);
 #error Architecture not supported
 #endif /* arch */
 
+#define PKEY_MASK  (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
+
+static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags)
+{
+   u32 shift = pkey_bit_position(pkey);
+   /* mask out bits from pkey in old value */
+   reg &= ~((u64)PKEY_MASK << shift);
+   /* OR in new bits for pkey */
+   reg |= (flags & PKEY_MASK) << shift;
+   return reg;
+}
+
+static inline u64 get_pkey_bits(u64 reg, int pkey)
+{
+   u32 shift = pkey_bit_position(pkey);
+   /*
+* shift down the relevant bits to the lowest two, then
+* mask off all the other higher bits
+*/
+   return ((reg >> shift) & PKEY_MASK);
+}
+
 extern u64 shadow_pkey_reg;
 
 static inline u64 _read_pkey_reg(int line)
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
index 6ffea27e2d2d6..def2a1bcf6a5d 100644
--- a/tools/testing/selftests/vm/pkey-x86.h
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -118,6 +118,11 @@ static inline int cpu_has_pku(void)
return 1;
 }
 
+static inline u32 pkey_bit_position(int pkey)
+{
+   return pkey * PKEY_BITS_PER_PKEY;
+}
+
 #define XSTATE_PKEY_BIT(9)
 #define XSTATE_PKEY0x200
 
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index efa35cc6f6b9e..bed9d4de12b48 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -334,25 +334,13 @@ pid_t fork_lazy_child(void)
 
 static u32 hw_pkey_get(int pkey, unsigned long flags)
 {
-   u32 mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
u64 pkey_reg = __read_pkey_reg();
-   u64 shifted_pkey_reg;
-   u32 masked_pkey_reg;
 
dprintf1("%s(pkey=%d, flags=%lx) = %x / %d\n",
__func__, pkey, flags, 0, 0);
dprintf2("%s() raw pkey_reg: %016llx\n", __func__, pkey_reg);
 
-   shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
-   dprintf2("%s() shifted_pkey_reg: %016llx\n", __func__,
-   shifted_pkey_reg);
-   masked_pkey_reg = shifted_pkey_reg & mask;
-   dprintf2("%s() masked  pkey_reg: %x\n", __func__, masked_pkey_reg);
-   /*
-* shift down the relevant bits to the lowest two, then
-* mask off all the other high bits.
-*/
-   return masked_pkey_reg;
+   return (u32) get_pkey_bits(pkey_reg, pkey);
 }
 
 static int hw_pkey_set(int pkey, unsigned long rights, unsigned long flags)
@@ -364,12 +352,8 @@ static int hw_pkey_set(int pkey, unsigned long rights, 
unsigned long flags)
/* make sure that 'rights' only contains the bits we expect: */
assert(!(rights & ~mask));
 
-   /* copy old pkey_reg */
-   new_pkey_reg = old_pkey_reg;
-   /* mask out bits from pkey in old value: */
-   new_pkey_reg &= ~(mask << (pkey * PKEY_BITS_PER_PKEY));
-   /* OR in new bits for pkey: */
-   new_pkey_reg |= (rights << (pkey * PKEY_BITS_PER_PKEY));
+   /* modify bits accordingly in old pkey_reg and assign it */
+   new_pkey_reg = set_pkey_bits(old_pkey_reg, pkey, rights);
 
__write_pkey_reg(new_pkey_reg);
 
@@ -403,7 +387,7 @@ void pkey_disable_set(int pkey, int flags)
ret = hw_pkey_set(pkey, pkey_rights, syscall_flags);
assert(!ret);
/* pkey_reg and flags have the same format */
-   shadow_pkey_reg |= flags << (pkey * 2);
+   shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
dprintf1("%s(%d) shadow: 0x%016llx\n",
__func__, pkey, shadow_pkey_reg);
 
@@ -437,7 +421,7 @@ void pkey_disable_clear(int pkey, int flags)
pkey_rights |= flags;
 
ret = hw_pkey_set(pkey, pkey_rights, 0);
-   shadow_pkey_reg &= ~(flags << (pkey * 2));
+   shadow_pkey_reg = set_pkey_bits(shadow_pkey_reg, pkey, pkey_rights);
pkey_assert(ret >= 0);
 
pkey_rights = hw_pkey_get(pkey, syscall_flags);
@@ -513,7 +497,8 @@ int alloc_pkey(void)
shadow_pkey_reg);
if (ret) {
/* clear both the bits: */
-   

[PATCH v19 06/24] selftests: vm: pkeys: Use sane types for pkey register

2020-03-31 Thread Sandipan Das
The size of the pkey register can vary across architectures.
This converts the data type of all its references to u64 in
preparation for multi-arch support.

To keep the definition of the u64 type consistent and remove
format specifier related warnings, __SANE_USERSPACE_TYPES__
is defined as suggested by Michael Ellerman.

Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h| 31 +++
 tools/testing/selftests/vm/pkey-x86.h|  8 +-
 tools/testing/selftests/vm/protection_keys.c | 86 
 3 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 7f18a82e54fc8..dfbce49269ce2 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -14,10 +14,10 @@
 #include 
 
 /* Define some kernel-like types */
-#define  u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
+#define  u8 __u8
+#define u16 __u16
+#define u32 __u32
+#define u64 __u64
 
 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
@@ -80,13 +80,14 @@ extern void abort_hooks(void);
 #error Architecture not supported
 #endif /* arch */
 
-extern unsigned int shadow_pkey_reg;
+extern u64 shadow_pkey_reg;
 
-static inline unsigned int _read_pkey_reg(int line)
+static inline u64 _read_pkey_reg(int line)
 {
-   unsigned int pkey_reg = __read_pkey_reg();
+   u64 pkey_reg = __read_pkey_reg();
 
-   dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+   dprintf4("read_pkey_reg(line=%d) pkey_reg: %016llx"
+   " shadow: %016llx\n",
line, pkey_reg, shadow_pkey_reg);
assert(pkey_reg == shadow_pkey_reg);
 
@@ -95,15 +96,15 @@ static inline unsigned int _read_pkey_reg(int line)
 
 #define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void write_pkey_reg(unsigned int pkey_reg)
+static inline void write_pkey_reg(u64 pkey_reg)
 {
-   dprintf4("%s() changing %08x to %08x\n", __func__,
+   dprintf4("%s() changing %016llx to %016llx\n", __func__,
__read_pkey_reg(), pkey_reg);
/* will do the shadow check for us: */
read_pkey_reg();
__write_pkey_reg(pkey_reg);
shadow_pkey_reg = pkey_reg;
-   dprintf4("%s(%08x) pkey_reg: %08x\n", __func__,
+   dprintf4("%s(%016llx) pkey_reg: %016llx\n", __func__,
pkey_reg, __read_pkey_reg());
 }
 
@@ -113,7 +114,7 @@ static inline void write_pkey_reg(unsigned int pkey_reg)
  */
 static inline void __pkey_access_allow(int pkey, int do_allow)
 {
-   unsigned int pkey_reg = read_pkey_reg();
+   u64 pkey_reg = read_pkey_reg();
int bit = pkey * 2;
 
if (do_allow)
@@ -121,13 +122,13 @@ static inline void __pkey_access_allow(int pkey, int 
do_allow)
else
pkey_reg |= (1<<bit);
 #include 
 #include 
@@ -48,7 +49,7 @@
 int iteration_nr = 1;
 int test_nr;
 
-unsigned int shadow_pkey_reg;
+u64 shadow_pkey_reg;
 int dprint_in_signal;
 char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
@@ -163,7 +164,7 @@ void dump_mem(void *dumpme, int len_bytes)
 
for (i = 0; i < len_bytes; i += sizeof(u64)) {
u64 *ptr = (u64 *)(c + i);
-   dprintf1("dump[%03d][@%p]: %016jx\n", i, ptr, *ptr);
+   dprintf1("dump[%03d][@%p]: %016llx\n", i, ptr, *ptr);
}
 }
 
@@ -205,7 +206,8 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 
dprint_in_signal = 1;
dprintf1("===SIGSEGV\n");
-   dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+   dprintf1("%s()::%d, pkey_reg: 0x%016llx shadow: %016llx\n",
+   __func__, __LINE__,
__read_pkey_reg(), shadow_pkey_reg);
 
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
@@ -213,8 +215,9 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
fpregset = uctxt->uc_mcontext.fpregs;
fpregs = (void *)fpregset;
 
-   dprintf2("%s() trapno: %d ip: 0x%lx info->si_code: %s/%d\n", __func__,
-   trapno, ip, si_code_str(si->si_code), si->si_code);
+   dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
+   __func__, trapno, ip, si_code_str(si->si_code),
+   si->si_code);
 #ifdef __i386__
/*
 * 32-bit has some extra padding so that userspace can tell whether
@@ -256,8 +259,9 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 * need __read_pkey_reg() version so we do not do shadow_pkey_reg
 * checking
 */
-   dprintf1("signal pkey_reg from  pkey_reg: %08x\n", __read_pkey_reg());
-   dprintf1("pkey from siginfo: %jx\n", siginfo_pkey);
+   dprintf1("signal pkey_reg from  pkey_reg: %016llx\n",
+  

[PATCH v19 05/24] selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()

2020-03-31 Thread Sandipan Das
From: Thiago Jung Bauermann 

This will help us ensure we print pkey_reg_t values correctly in
different architectures.

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Sandipan Das 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/vm/pkey-helpers.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 3ed2f021bf7a0..7f18a82e54fc8 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -27,6 +27,10 @@
 #define DPRINT_IN_SIGNAL_BUF_SIZE 4096
 extern int dprint_in_signal;
 extern char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
+
+#ifdef __GNUC__
+__attribute__((format(printf, 1, 2)))
+#endif
 static inline void sigsafe_printf(const char *format, ...)
 {
va_list ap;
-- 
2.17.1



[PATCH v19 03/24] selftests/vm/pkeys: Move generic definitions to header file

2020-03-31 Thread Sandipan Das
From: Ram Pai 

Moved all the generic definitions and helper functions to the
header file.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 35 +---
 tools/testing/selftests/vm/protection_keys.c | 27 ---
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index d5779be4793f8..6ad1bd54ef946 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -13,6 +13,14 @@
 #include 
 #include 
 
+/* Define some kernel-like types */
+#define  u8 uint8_t
+#define u16 uint16_t
+#define u32 uint32_t
+#define u64 uint64_t
+
+#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
+
 #define NR_PKEYS 16
 #define PKEY_BITS_PER_PKEY 2
 
@@ -53,6 +61,18 @@ static inline void sigsafe_printf(const char *format, ...)
 #define dprintf3(args...) dprintf_level(3, args)
 #define dprintf4(args...) dprintf_level(4, args)
 
+extern void abort_hooks(void);
+#define pkey_assert(condition) do {\
+   if (!(condition)) { \
+   dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
+   __FILE__, __LINE__, \
+   test_nr, iteration_nr); \
+   dprintf0("errno at assert: %d", errno); \
+   abort_hooks();  \
+   exit(__LINE__); \
+   }   \
+} while (0)
+
 extern unsigned int shadow_pkey_reg;
 static inline unsigned int __read_pkey_reg(void)
 {
@@ -137,11 +157,6 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
dprintf4("pkey_reg now: %08x\n", read_pkey_reg());
 }
 
-#define PROT_PKEY0 0x10/* protection key value (bit 0) */
-#define PROT_PKEY1 0x20/* protection key value (bit 1) */
-#define PROT_PKEY2 0x40/* protection key value (bit 2) */
-#define PROT_PKEY3 0x80/* protection key value (bit 3) */
-
 #define PAGE_SIZE 4096
 #define MB (1<<20)
 
@@ -219,4 +234,14 @@ int pkey_reg_xstate_offset(void)
return xstate_offset;
 }
 
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+#define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
+#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
+#define ALIGN_PTR_UP(p, ptr_align_to)  \
+   ((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
+#define ALIGN_PTR_DOWN(p, ptr_align_to)\
+   ((typeof(p))ALIGN_DOWN((unsigned long)(p), ptr_align_to))
+#define __stringify_1(x...) #x
+#define __stringify(x...)   __stringify_1(x)
+
 #endif /* _PKEYS_HELPER_H */
diff --git a/tools/testing/selftests/vm/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
index 2f4ab81c570db..42ffb58810f29 100644
--- a/tools/testing/selftests/vm/protection_keys.c
+++ b/tools/testing/selftests/vm/protection_keys.c
@@ -51,31 +51,10 @@ int test_nr;
 unsigned int shadow_pkey_reg;
 
 #define HPAGE_SIZE (1UL<<21)
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
-#define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
-#define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
-#define ALIGN_PTR_UP(p, ptr_align_to)  ((typeof(p))ALIGN_UP((unsigned 
long)(p),ptr_align_to))
-#define ALIGN_PTR_DOWN(p, ptr_align_to)
((typeof(p))ALIGN_DOWN((unsigned long)(p),  ptr_align_to))
-#define __stringify_1(x...) #x
-#define __stringify(x...)   __stringify_1(x)
-
-#define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
 int dprint_in_signal;
 char dprint_in_signal_buffer[DPRINT_IN_SIGNAL_BUF_SIZE];
 
-extern void abort_hooks(void);
-#define pkey_assert(condition) do {\
-   if (!(condition)) { \
-   dprintf0("assert() at %s::%d test_nr: %d iteration: %d\n", \
-   __FILE__, __LINE__, \
-   test_nr, iteration_nr); \
-   dprintf0("errno at assert: %d", errno); \
-   abort_hooks();  \
-   exit(__LINE__); \
-   }   \
-} while (0)
-
 void cat_into_file(char *str, char *file)
 {
int fd = open(file, O_RDWR);
@@ -186,12 +165,6 @@ void lots_o_noops_around_write(int *write_to_me)
dprintf3("%s() done\n", __func__);
 }
 
-/* Define some kernel-like types */
-#define  u8 uint8_t
-#define u16 uint16_t
-#define u32 uint32_t
-#define u64 uint64_t
-
 #ifdef __i386__
 
 #ifndef SYS_mprotect_key
-- 
2.17.1



[PATCH v19 04/24] selftests/vm/pkeys: Move some definitions to arch-specific header

2020-03-31 Thread Sandipan Das
From: Thiago Jung Bauermann 

In preparation for multi-arch support, move definitions which
have arch-specific values to an x86-specific header.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h| 111 +
 tools/testing/selftests/vm/pkey-x86.h| 156 +++
 tools/testing/selftests/vm/protection_keys.c |  47 --
 3 files changed, 162 insertions(+), 152 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pkey-x86.h

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 6ad1bd54ef946..3ed2f021bf7a0 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -21,9 +21,6 @@
 
 #define PTR_ERR_ENOTSUP ((void *)-ENOTSUP)
 
-#define NR_PKEYS 16
-#define PKEY_BITS_PER_PKEY 2
-
 #ifndef DEBUG_LEVEL
 #define DEBUG_LEVEL 0
 #endif
@@ -73,19 +70,13 @@ extern void abort_hooks(void);
}   \
 } while (0)
 
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+#include "pkey-x86.h"
+#else /* arch */
+#error Architecture not supported
+#endif /* arch */
+
 extern unsigned int shadow_pkey_reg;
-static inline unsigned int __read_pkey_reg(void)
-{
-   unsigned int eax, edx;
-   unsigned int ecx = 0;
-   unsigned int pkey_reg;
-
-   asm volatile(".byte 0x0f,0x01,0xee\n\t"
-: "=a" (eax), "=d" (edx)
-: "c" (ecx));
-   pkey_reg = eax;
-   return pkey_reg;
-}
 
 static inline unsigned int _read_pkey_reg(int line)
 {
@@ -100,19 +91,6 @@ static inline unsigned int _read_pkey_reg(int line)
 
 #define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void __write_pkey_reg(unsigned int pkey_reg)
-{
-   unsigned int eax = pkey_reg;
-   unsigned int ecx = 0;
-   unsigned int edx = 0;
-
-   dprintf4("%s() changing %08x to %08x\n", __func__,
-   __read_pkey_reg(), pkey_reg);
-   asm volatile(".byte 0x0f,0x01,0xef\n\t"
-: : "a" (eax), "c" (ecx), "d" (edx));
-   assert(pkey_reg == __read_pkey_reg());
-}
-
 static inline void write_pkey_reg(unsigned int pkey_reg)
 {
dprintf4("%s() changing %08x to %08x\n", __func__,
@@ -157,83 +135,6 @@ static inline void __pkey_write_allow(int pkey, int 
do_allow_write)
dprintf4("pkey_reg now: %08x\n", read_pkey_reg());
 }
 
-#define PAGE_SIZE 4096
-#define MB (1<<20)
-
-static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
-   unsigned int *ecx, unsigned int *edx)
-{
-   /* ecx is often an input as well as an output. */
-   asm volatile(
-   "cpuid;"
-   : "=a" (*eax),
- "=b" (*ebx),
- "=c" (*ecx),
- "=d" (*edx)
-   : "0" (*eax), "2" (*ecx));
-}
-
-/* Intel-defined CPU features, CPUID level 0x0007:0 (ecx) */
-#define X86_FEATURE_PKU(1<<3) /* Protection Keys for Userspace */
-#define X86_FEATURE_OSPKE  (1<<4) /* OS Protection Keys Enable */
-
-static inline int cpu_has_pku(void)
-{
-   unsigned int eax;
-   unsigned int ebx;
-   unsigned int ecx;
-   unsigned int edx;
-
-   eax = 0x7;
-   ecx = 0x0;
-   __cpuid(&eax, &ebx, &ecx, &edx);
-
-   if (!(ecx & X86_FEATURE_PKU)) {
-   dprintf2("cpu does not have PKU\n");
-   return 0;
-   }
-   if (!(ecx & X86_FEATURE_OSPKE)) {
-   dprintf2("cpu does not have OSPKE\n");
-   return 0;
-   }
-   return 1;
-}
-
-#define XSTATE_PKEY_BIT(9)
-#define XSTATE_PKEY0x200
-
-int pkey_reg_xstate_offset(void)
-{
-   unsigned int eax;
-   unsigned int ebx;
-   unsigned int ecx;
-   unsigned int edx;
-   int xstate_offset;
-   int xstate_size;
-   unsigned long XSTATE_CPUID = 0xd;
-   int leaf;
-
-   /* assume that XSTATE_PKEY is set in XCR0 */
-   leaf = XSTATE_PKEY_BIT;
-   {
-   eax = XSTATE_CPUID;
-   ecx = leaf;
-   __cpuid(&eax, &ebx, &ecx, &edx);
-
-   if (leaf == XSTATE_PKEY_BIT) {
-   xstate_offset = ebx;
-   xstate_size = eax;
-   }
-   }
-
-   if (xstate_size == 0) {
-   printf("could not find size/offset of PKEY in xsave state\n");
-   return 0;
-   }
-
-   return xstate_offset;
-}
-
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
 #define ALIGN_UP(x, align_to)  (((x) + ((align_to)-1)) & ~((align_to)-1))
 #define ALIGN_DOWN(x, align_to) ((x) & ~((align_to)-1))
diff --git a/tools/testing/selftests/vm/pkey-x86.h 
b/tools/testing/selftests/vm/pkey-x86.h
new file mode 100644
index 0..2f04ade8ca9c4
--- /dev/null
+++ b/tools/testing/selftests/vm/pkey-x86.h
@@ -0,0 +1,156 @@
+/* 

[PATCH v19 01/24] selftests/x86/pkeys: Move selftests to arch-neutral directory

2020-03-31 Thread Sandipan Das
From: Ram Pai 

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Acked-by: Ingo Molnar 
Acked-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/.gitignore | 1 +
 tools/testing/selftests/vm/Makefile   | 1 +
 tools/testing/selftests/{x86 => vm}/pkey-helpers.h| 0
 tools/testing/selftests/{x86 => vm}/protection_keys.c | 0
 tools/testing/selftests/x86/.gitignore| 1 -
 tools/testing/selftests/x86/Makefile  | 2 +-
 6 files changed, 3 insertions(+), 2 deletions(-)
 rename tools/testing/selftests/{x86 => vm}/pkey-helpers.h (100%)
 rename tools/testing/selftests/{x86 => vm}/protection_keys.c (100%)

diff --git a/tools/testing/selftests/vm/.gitignore 
b/tools/testing/selftests/vm/.gitignore
index 31b3c98b6d34d..c55837bf39fa4 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -14,3 +14,4 @@ virtual_address_range
 gup_benchmark
 va_128TBswitch
 map_fixed_noreplace
+protection_keys
diff --git a/tools/testing/selftests/vm/Makefile 
b/tools/testing/selftests/vm/Makefile
index 7f9a8a8c31da9..4e9c741be6af2 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,6 +18,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_FILES += protection_keys
 
 ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 ppc64 riscv64 s390x sh64 
sparc64 x86_64))
 TEST_GEN_FILES += va_128TBswitch
diff --git a/tools/testing/selftests/x86/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
similarity index 100%
rename from tools/testing/selftests/x86/pkey-helpers.h
rename to tools/testing/selftests/vm/pkey-helpers.h
diff --git a/tools/testing/selftests/x86/protection_keys.c 
b/tools/testing/selftests/vm/protection_keys.c
similarity index 100%
rename from tools/testing/selftests/x86/protection_keys.c
rename to tools/testing/selftests/vm/protection_keys.c
diff --git a/tools/testing/selftests/x86/.gitignore 
b/tools/testing/selftests/x86/.gitignore
index 7757f73ff9a32..eb30ffd838768 100644
--- a/tools/testing/selftests/x86/.gitignore
+++ b/tools/testing/selftests/x86/.gitignore
@@ -11,5 +11,4 @@ ldt_gdt
 iopl
 mpx-mini-test
 ioperm
-protection_keys
 test_vdso
diff --git a/tools/testing/selftests/x86/Makefile 
b/tools/testing/selftests/x86/Makefile
index 5d49bfec1e9ae..5f16821c7f63a 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) 
trivial_program.c -no-pie)
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt 
test_mremap_vdso \
check_initial_reg_state sigreturn iopl ioperm \
-   protection_keys test_vdso test_vsyscall mov_ss_trap \
+   test_vdso test_vsyscall mov_ss_trap \
syscall_arg_fault
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
test_FCMOV test_FCOMI test_FISTTP \
-- 
2.17.1



[PATCH v19 02/24] selftests/vm/pkeys: Rename all references to pkru to a generic name

2020-03-31 Thread Sandipan Das
From: Ram Pai 

This renames PKRU references to "pkey_reg" or "pkey" based on
the usage.

cc: Dave Hansen 
cc: Florian Weimer 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
Reviewed-by: Dave Hansen 
Signed-off-by: Sandipan Das 
---
 tools/testing/selftests/vm/pkey-helpers.h|  85 +++
 tools/testing/selftests/vm/protection_keys.c | 240 ++-
 2 files changed, 170 insertions(+), 155 deletions(-)

diff --git a/tools/testing/selftests/vm/pkey-helpers.h 
b/tools/testing/selftests/vm/pkey-helpers.h
index 254e5436bdd99..d5779be4793f8 100644
--- a/tools/testing/selftests/vm/pkey-helpers.h
+++ b/tools/testing/selftests/vm/pkey-helpers.h
@@ -14,7 +14,7 @@
 #include 
 
 #define NR_PKEYS 16
-#define PKRU_BITS_PER_PKEY 2
+#define PKEY_BITS_PER_PKEY 2
 
 #ifndef DEBUG_LEVEL
 #define DEBUG_LEVEL 0
@@ -53,85 +53,88 @@ static inline void sigsafe_printf(const char *format, ...)
 #define dprintf3(args...) dprintf_level(3, args)
 #define dprintf4(args...) dprintf_level(4, args)
 
-extern unsigned int shadow_pkru;
-static inline unsigned int __rdpkru(void)
+extern unsigned int shadow_pkey_reg;
+static inline unsigned int __read_pkey_reg(void)
 {
unsigned int eax, edx;
unsigned int ecx = 0;
-   unsigned int pkru;
+   unsigned int pkey_reg;
 
asm volatile(".byte 0x0f,0x01,0xee\n\t"
 : "=a" (eax), "=d" (edx)
 : "c" (ecx));
-   pkru = eax;
-   return pkru;
+   pkey_reg = eax;
+   return pkey_reg;
 }
 
-static inline unsigned int _rdpkru(int line)
+static inline unsigned int _read_pkey_reg(int line)
 {
-   unsigned int pkru = __rdpkru();
+   unsigned int pkey_reg = __read_pkey_reg();
 
-   dprintf4("rdpkru(line=%d) pkru: %x shadow: %x\n",
-   line, pkru, shadow_pkru);
-   assert(pkru == shadow_pkru);
+   dprintf4("read_pkey_reg(line=%d) pkey_reg: %x shadow: %x\n",
+   line, pkey_reg, shadow_pkey_reg);
+   assert(pkey_reg == shadow_pkey_reg);
 
-   return pkru;
+   return pkey_reg;
 }
 
-#define rdpkru() _rdpkru(__LINE__)
+#define read_pkey_reg() _read_pkey_reg(__LINE__)
 
-static inline void __wrpkru(unsigned int pkru)
+static inline void __write_pkey_reg(unsigned int pkey_reg)
 {
-   unsigned int eax = pkru;
+   unsigned int eax = pkey_reg;
unsigned int ecx = 0;
unsigned int edx = 0;
 
-   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   dprintf4("%s() changing %08x to %08x\n", __func__,
+   __read_pkey_reg(), pkey_reg);
asm volatile(".byte 0x0f,0x01,0xef\n\t"
 : : "a" (eax), "c" (ecx), "d" (edx));
-   assert(pkru == __rdpkru());
+   assert(pkey_reg == __read_pkey_reg());
 }
 
-static inline void wrpkru(unsigned int pkru)
+static inline void write_pkey_reg(unsigned int pkey_reg)
 {
-   dprintf4("%s() changing %08x to %08x\n", __func__, __rdpkru(), pkru);
+   dprintf4("%s() changing %08x to %08x\n", __func__,
+   __read_pkey_reg(), pkey_reg);
/* will do the shadow check for us: */
-   rdpkru();
-   __wrpkru(pkru);
-   shadow_pkru = pkru;
-   dprintf4("%s(%08x) pkru: %08x\n", __func__, pkru, __rdpkru());
+   read_pkey_reg();
+   __write_pkey_reg(pkey_reg);
+   shadow_pkey_reg = pkey_reg;
+   dprintf4("%s(%08x) pkey_reg: %08x\n", __func__,
+   pkey_reg, __read_pkey_reg());
 }
 
 /*
  * These are technically racy. since something could
- * change PKRU between the read and the write.
+ * change PKEY register between the read and the write.
  */
 static inline void __pkey_access_allow(int pkey, int do_allow)
 {
-   unsigned int pkru = rdpkru();
+   unsigned int pkey_reg = read_pkey_reg();
int bit = pkey * 2;
 
if (do_allow)
-   pkru &= ~(1<<bit);
[...]
-   dprintf1(">>>===SIGSEGV\n");
-   dprintf1("%s()::%d, pkru: 0x%x shadow: %x\n", __func__, __LINE__,
-   __rdpkru(), shadow_pkru);
+   dprintf1("%s()::%d, pkey_reg: 0x%x shadow: %x\n", __func__, __LINE__,
+   __read_pkey_reg(), shadow_pkey_reg);
 
trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
@@ -289,19 +289,19 @@ void signal_handler(int signum, siginfo_t *si, void 
*vucontext)
 */
fpregs += 0x70;
 #endif
-   pkru_offset = pkru_xstate_offset();
-   pkru_ptr = (void *)(&fpregs[pkru_offset]);
+   pkey_reg_offset = pkey_reg_xstate_offset();
+   pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
 
dprintf1("siginfo: %p\n", si);
dprintf1(" fpregs: %p\n", fpregs);
/*
-* If we got a PKRU fault, we *HAVE* to have at least one bit set in
+* If we got a PKEY fault, we *HAVE* to have at least one bit set in
 * here.
 */
-   dprintf1("pkru_xstate_offset: %d\n", 

[PATCH v19 00/24] selftests, powerpc, x86: Memory Protection Keys

2020-03-31 Thread Sandipan Das
Memory protection keys enable an application to protect its address
space from inadvertent access by its own code.

This feature is now enabled on powerpc and has been available since
4.16-rc1. The patches move the selftests to an arch-neutral directory
and enhance their test coverage.

Tested on powerpc64 and x86_64 (Skylake-SP).

Link to development branch:
https://github.com/sandip4n/linux/tree/pkey-selftests

Resending this based on feedback from maintainers who felt this
can go in via the -mm tree. This has no other changes from the
last version (v18) apart from being rebased.

Changelog
-
Link to previous version (v18):
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=155970

v19:
(1) Rebased on top of latest master.

v18:
(1) Fixed issues with x86 multilib builds based on
feedback from Dave.
(2) Moved patch 2 to the end of the series.

v17:
(1) Fixed issues with i386 builds when running on x86_64
based on feedback from Dave.
(2) Replaced patch 6 from previous version with patch 7.
This addresses u64 format specifier related concerns
that Michael had raised in v15.

v16:
(1) Rebased on top of latest master.
(2) Switched to u64 instead of using an arch-dependent
pkey_reg_t type for references to the pkey register
based on suggestions from Dave, Michal and Michael.
(3) Removed build time determination of page size based
on suggestion from Michael.
(4) Fixed comment before the definition of __page_o_noops()
from patch 13 ("selftests/vm/pkeys: Introduce powerpc
support").

v15:
(1) Rebased on top of latest master.
(2) Addressed review comments from Dave Hansen.
(3) Moved code for getting or setting pkey bits to new
helpers. These changes replace patch 7 of v14.
(4) Added a fix which ensures that the correct count of
reserved keys is used across different platforms.
(5) Added a fix which ensures that the correct page size
is used as powerpc supports both 4K and 64K pages.

v14:
(1) Incorporated another round of comments from Dave Hansen.

v13:
(1) Incorporated comments for Dave Hansen.
(2) Added one more test for correct pkey-0 behavior.

v12:
(1) Fixed the offset of pkey field in the siginfo structure for
x86_64 and powerpc. And tries to use the actual field
if the headers have it defined.

v11:
(1) Fixed a deadlock in the ptrace testcase.

v10 and prior:
(1) Moved the testcase to arch neutral directory.
(2) Split the changes into incremental patches.

Desnes A. Nunes do Rosario (1):
  selftests/vm/pkeys: Fix number of reserved powerpc pkeys

Ram Pai (16):
  selftests/x86/pkeys: Move selftests to arch-neutral directory
  selftests/vm/pkeys: Rename all references to pkru to a generic name
  selftests/vm/pkeys: Move generic definitions to header file
  selftests/vm/pkeys: Fix pkey_disable_clear()
  selftests/vm/pkeys: Fix assertion in pkey_disable_set/clear()
  selftests/vm/pkeys: Fix alloc_random_pkey() to make it really random
  selftests/vm/pkeys: Introduce generic pkey abstractions
  selftests/vm/pkeys: Introduce powerpc support
  selftests/vm/pkeys: Fix assertion in test_pkey_alloc_exhaust()
  selftests/vm/pkeys: Improve checks to determine pkey support
  selftests/vm/pkeys: Associate key on a mapped page and detect access
violation
  selftests/vm/pkeys: Associate key on a mapped page and detect write
violation
  selftests/vm/pkeys: Detect write violation on a mapped
access-denied-key page
  selftests/vm/pkeys: Introduce a sub-page allocator
  selftests/vm/pkeys: Test correct behaviour of pkey-0
  selftests/vm/pkeys: Override access right definitions on powerpc

Sandipan Das (5):
  selftests: vm: pkeys: Use sane types for pkey register
  selftests: vm: pkeys: Add helpers for pkey bits
  selftests: vm: pkeys: Use the correct huge page size
  selftests: vm: pkeys: Use the correct page size on powerpc
  selftests: vm: pkeys: Fix multilib builds for x86

Thiago Jung Bauermann (2):
  selftests/vm/pkeys: Move some definitions to arch-specific header
  selftests/vm/pkeys: Make gcc check arguments of sigsafe_printf()

 tools/testing/selftests/vm/.gitignore |   1 +
 tools/testing/selftests/vm/Makefile   |  73 ++
 tools/testing/selftests/vm/pkey-helpers.h | 225 ++
 tools/testing/selftests/vm/pkey-powerpc.h | 136 
 tools/testing/selftests/vm/pkey-x86.h | 181 +
 .../selftests/{x86 => vm}/protection_keys.c   | 696 ++
 tools/testing/selftests/x86/.gitignore|   1 -
 tools/testing/selftests/x86/Makefile  |   2 +-
 tools/testing/selftests/x86/pkey-helpers.h| 219 --
 9 files changed, 1002 insertions(+), 532 deletions(-)
 create mode 100644 

Re: [PATCH v7 7/7] powerpc/32: use set_memory_attr()

2020-03-31 Thread Christophe Leroy




Le 31/03/2020 à 06:48, Russell Currey a écrit :

From: Christophe Leroy 

Use set_memory_attr() instead of the PPC32 specific change_page_attr()

change_page_attr() was checking that the address was not mapped by
blocks and was handling highmem, but that's unneeded because the
affected pages can't be in highmem and block mapping verification
is already done by the callers.

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/mm/pgtable_32.c | 95 
  1 file changed, 10 insertions(+), 85 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5fb90edd865e..3d92eaf3ee2f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -121,99 +122,20 @@ void __init mapin_ram(void)
}
  }
  
-/* Scan the real Linux page tables and return a PTE pointer for

- * a virtual address in a context.
- * Returns true (1) if PTE was found, zero otherwise.  The pointer to
- * the PTE pointer is unmodified if PTE is not found.
- */
-static int
-get_pteptr(struct mm_struct *mm, unsigned long addr, pte_t **ptep, pmd_t 
**pmdp)



This will conflict, get_pteptr() is gone now, see 
https://github.com/linuxppc/linux/commit/2efc7c085f05870eda6f29ac71eeb83f3bd54415


Christophe




-{
-pgd_t  *pgd;
-   pud_t   *pud;
-pmd_t  *pmd;
-pte_t  *pte;
-int retval = 0;
-
-pgd = pgd_offset(mm, addr & PAGE_MASK);
-if (pgd) {
-   pud = pud_offset(pgd, addr & PAGE_MASK);
-   if (pud && pud_present(*pud)) {
-   pmd = pmd_offset(pud, addr & PAGE_MASK);
-   if (pmd_present(*pmd)) {
-   pte = pte_offset_map(pmd, addr & PAGE_MASK);
-   if (pte) {
-   retval = 1;
-   *ptep = pte;
-   if (pmdp)
-   *pmdp = pmd;
-   /* XXX caller needs to do pte_unmap, 
yuck */
-   }
-   }
-   }
-}
-return(retval);
-}
-
-static int __change_page_attr_noflush(struct page *page, pgprot_t prot)
-{
-   pte_t *kpte;
-   pmd_t *kpmd;
-   unsigned long address;
-
-   BUG_ON(PageHighMem(page));
-   address = (unsigned long)page_address(page);
-
-   if (v_block_mapped(address))
-   return 0;
-   if (!get_pteptr(&init_mm, address, &kpte, &kpmd))
-   return -EINVAL;
-   __set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
-   pte_unmap(kpte);
-
-   return 0;
-}
-
-/*
- * Change the page attributes of an page in the linear mapping.
- *
- * THIS DOES NOTHING WITH BAT MAPPINGS, DEBUG USE ONLY
- */
-static int change_page_attr(struct page *page, int numpages, pgprot_t prot)
-{
-   int i, err = 0;
-   unsigned long flags;
-   struct page *start = page;
-
-   local_irq_save(flags);
-   for (i = 0; i < numpages; i++, page++) {
-   err = __change_page_attr_noflush(page, prot);
-   if (err)
-   break;
-   }
-   wmb();
-   local_irq_restore(flags);
-   flush_tlb_kernel_range((unsigned long)page_address(start),
-  (unsigned long)page_address(page));
-   return err;
-}
-
  void mark_initmem_nx(void)
  {
-   struct page *page = virt_to_page(_sinittext);
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 PFN_DOWN((unsigned long)_sinittext);
  
  	if (v_block_mapped((unsigned long)_stext + 1))

mmu_mark_initmem_nx();
else
-   change_page_attr(page, numpages, PAGE_KERNEL);
+   set_memory_attr((unsigned long)_sinittext, numpages, 
PAGE_KERNEL);
  }
  
  #ifdef CONFIG_STRICT_KERNEL_RWX

  void mark_rodata_ro(void)
  {
-   struct page *page;
unsigned long numpages;
  
  	if (v_block_mapped((unsigned long)_sinittext)) {

@@ -222,20 +144,18 @@ void mark_rodata_ro(void)
return;
}
  
-	page = virt_to_page(_stext);

numpages = PFN_UP((unsigned long)_etext) -
   PFN_DOWN((unsigned long)_stext);
  
-	change_page_attr(page, numpages, PAGE_KERNEL_ROX);

+   set_memory_attr((unsigned long)_stext, numpages, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
 */
-   page = virt_to_page(__start_rodata);
numpages = PFN_UP((unsigned long)__init_begin) -
   PFN_DOWN((unsigned long)__start_rodata);
  
-	change_page_attr(page, numpages, PAGE_KERNEL_RO);

+   set_memory_attr((unsigned long)__start_rodata, numpages, PAGE_KERNEL_RO);

Emulate ppc64le builds on x86/x64 machine

2020-03-31 Thread shivakanth k
Hi,
Could you please help me set up ppc64le on a Linux machine?


Re: [PATCH v5 1/7] ASoC: dt-bindings: fsl_asrc: Add new property fsl,asrc-format

2020-03-31 Thread Nicolin Chen
On Tue, Mar 31, 2020 at 10:28:25AM +0800, Shengjiu Wang wrote:
> Hi
> 
> On Tue, Mar 24, 2020 at 5:22 AM Nicolin Chen  wrote:
> >
> > On Fri, Mar 20, 2020 at 11:32:13AM -0600, Rob Herring wrote:
> > > On Mon, Mar 09, 2020 at 02:19:44PM -0700, Nicolin Chen wrote:
> > > > On Mon, Mar 09, 2020 at 11:58:28AM +0800, Shengjiu Wang wrote:
> > > > > In order to support new EASRC and simplify the code structure,
> > > > > We decided to share the common structure between them. This brings
> > > > > a problem that EASRC accepts a format directly from the devicetree, but
> > > > > ASRC accepts a width from the devicetree.
> > > > >
> > > > > In order to align with the new EASRC, we add a new property, fsl,asrc-format.
> > > > > The fsl,asrc-format can replace the fsl,asrc-width, so the driver
> > > > > can accept the format from the devicetree and doesn't need to
> > > > > convert it to a format through the width.
> > > > >
> > > > > Signed-off-by: Shengjiu Wang 
> > > > > ---
> > > > >  Documentation/devicetree/bindings/sound/fsl,asrc.txt | 5 +
> > > > >  1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/devicetree/bindings/sound/fsl,asrc.txt 
> > > > > b/Documentation/devicetree/bindings/sound/fsl,asrc.txt
> > > > > index cb9a25165503..780455cf7f71 100644
> > > > > --- a/Documentation/devicetree/bindings/sound/fsl,asrc.txt
> > > > > +++ b/Documentation/devicetree/bindings/sound/fsl,asrc.txt
> > > > > @@ -51,6 +51,11 @@ Optional properties:
> > > > > will be in use as default. Otherwise, the big 
> > > > > endian
> > > > > mode will be in use for all the device registers.
> > > > >
> > > > > +   - fsl,asrc-format : Defines a mutual sample format used by 
> > > > > DPCM Back
> > > > > +   Ends, which can replace the fsl,asrc-width.
> > > > > +   The value is SNDRV_PCM_FORMAT_S16_LE, or
> > > > > +   SNDRV_PCM_FORMAT_S24_LE
> > > >
> > > > I am still holding the concern at the DT binding of this format,
> > > > as it uses values from ASoC header file instead of a dt-binding
> > > > header file -- not sure if we can do this. Let's wait for Rob's
> > > > comments.
> > >
> > > I assume those are an ABI as well, so it's okay to copy them unless we
> >
> > They are defined under include/uapi. So I think we can use them?
> >
> > > already have some format definitions for DT. But it does need to be copy
> > > in a header under include/dt-bindings/.
> >
> > Shengjiu is actually quoting those integral values, rather than
> > those macros, so there's actually no need to copy to include/dt-bindings,
> > yet whoever adds this format property to a new DT would need to
> > look up the value in a header file under include/uapi. It's just
> > wondering if that's okay.
> >
> > Thanks
> Shall I keep this change or drop this change?

This version of the patch defines the format using those two macros.
So what Rob suggested is to copy those defines from the uapi header
file to the dt-bindings folder. But you don't intend to do that?

My follow-up mail was to find out if using integral values is doable.
Yet, I haven't seen any further reply. I think you can make a choice
and send it -- we will see how Rob acks it eventually, or not.


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-03-31 Thread Christophe Leroy




Le 31/03/2020 à 09:19, Christophe Leroy a écrit :



Le 31/03/2020 à 08:59, Michal Simek a écrit :

On 31. 03. 20 8:56, Christophe Leroy wrote:



Le 31/03/2020 à 07:30, Michael Ellerman a écrit :

Christophe Leroy  writes:

Le 27/03/2020 à 15:14, Andy Shevchenko a écrit :

On Fri, Mar 27, 2020 at 02:22:55PM +0100, Arnd Bergmann wrote:

On Fri, Mar 27, 2020 at 2:15 PM Andy Shevchenko
 wrote:

On Fri, Mar 27, 2020 at 03:10:26PM +0200, Andy Shevchenko wrote:

On Fri, Mar 27, 2020 at 01:54:33PM +0100, Arnd Bergmann wrote:

On Fri, Mar 27, 2020 at 1:12 PM Michal Simek
 wrote:

...


It does raise a follow-up question about ppc40x though: is it
time to
retire all of it?


Who knows?

I have in my possession a nice WD My Book Live, based on this
architecture, and I
don't want it gone from modern kernel support. OTOH I understand that
the number of real
users is not too big.


+Cc: Christian Lamparter, whom I owe for that WD box.


According to https://openwrt.org/toh/wd/mybooklive, that one is
based on
APM82181/ppc464, so it is about several generations newer than 
what I

asked about (ppc40x).


Ah, and I have Amiga board, but that one is being used only for
testing, so,
I don't care much.


I think there are a couple of ppc440 based Amiga boards, but again,
not 405
to my knowledge.


Ah, you are right. No objections from ppc40x removal!


Removing 40x would help cleaning things a bit. For instance 40x is the
last platform still having PTE_ATOMIC_UPDATES. So if we can remove 40x
we can get rid of PTE_ATOMIC_UPDATES completely.

If no one objects, I can prepare a series to drop support for 40x
completely.

Michael, any thought ?


I have no attachment to 40x, and I'd certainly be happy to have less
code in the tree, we struggle to keep even the modern platforms well
maintained.

At the same time I don't want to render anyone's hardware obsolete
unnecessarily. But if there's really no one using 40x then we should
remove it, it could well be broken already.

So I guess post a series to do the removal and we'll see if anyone
speaks up.



Ok, series sent out, see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757


ok. I see you have done it completely independently of my patchset.
It would be better if you could base it on top of my 2 patches because
they are in conflict now, and I also need to remove the virtex 44x
platform along with the ALSA driver.



I can't see your first patch, only the second one shows up in the 
series, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167757





Ok, I found your first patch on another patchwork, it doesn't touch any 
file in arch/powerpc/


I sent a v2 series with your powerpc patch as patch 2/11

See https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167766

Christophe

