Re: [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

2021-08-05 Thread Srikar Dronamraju
* Parth Shah  [2021-07-28 23:26:06]:

> From: "Gautham R. Shenoy" 
> 
> The helper function get_shared_cpu_map() was added in
> 
> 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct
> shared_cpu_map on big-cores")'
> 
> and subsequently expanded upon in
> 
> 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling
> map/list for L2 cache")'
> 
> in order to help report the correct groups of threads sharing these caches
> on big-core systems where groups of threads within a core can share
> different sets of caches.
> 
> Now that powerpc/cacheinfo is aware of the "ibm,thread-groups"
> property, cache->shared_cpu_map contains the correct set of
> thread-siblings sharing the cache. Hence we no longer need the
> function get_shared_cpu_map(). This patch removes it, along with the
> helper function index_dir_to_cpu(), which was only called by
> get_shared_cpu_map().
> 
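For reference, with the cache-sibling information already encoded in
cache->shared_cpu_map, the sysfs show path only has to print that mask. A
minimal sketch, reusing the existing cacheinfo helpers; the body below is
illustrative, not quoted from the patch:

  static ssize_t
  show_shared_cpu_map_func(struct kobject *k, struct kobj_attribute *attr,
                           char *buf, bool list)
  {
          struct cache_index_dir *index = kobj_to_cache_index_dir(k);
          struct cache *cache = index->cache;

          /* shared_cpu_map is already thread-group aware */
          return cpumap_print_to_pagebuf(list, buf, &cache->shared_cpu_map);
  }
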
> With these functions removed, we can still see the correct
> cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
> caches distributed among groups of threads in a core.
> 
> With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
> are split between the two groups of threads in a core, for CPUs 8,9,
> the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
> follows:
> 
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
> 
> $ ppc64_cpu --smt=4
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
> 
> $ ppc64_cpu --smt=2
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
> 
> $ ppc64_cpu --smt=1
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
> 
> Signed-off-by: Gautham R. Shenoy 

Looks good to me.

Reviewed-by: Srikar Dronamraju 

> ---
>  arch/powerpc/kernel/cacheinfo.c | 41 +
>  1 file changed, 1 insertion(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 5a6925d87424..20d91693eac1 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct 
> kobj_attribute *attr, char *
>  static struct kobj_attribute cache_level_attr =
>   __ATTR(level, 0444, level_show, NULL);
> 
> -static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
> -{
> - struct kobject *index_dir_kobj = &index->kobj;
> - struct kobject *cache_dir_kobj = index_dir_kobj->parent;
> - struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
> - struct device *dev = kobj_to_dev(cpu_dev_kobj);
> -
> - return dev->id;
> -}
> -
> -/*
> - * On big-core systems, each core has two groups of CPUs each of which
> - * has its own L1-cache. The thread-siblings which share l1-cache with
> - * @cpu can be obtained via cpu_smallcore_mask().
> - *
> - * On some big-core systems, the L2 cache is shared only between some
> - * groups of siblings. This is already parsed and encoded in
> - * cpu_l2_cache_mask().
> - *
> - * TODO: cache_lookup_or_instantiate() needs to be made aware of the
> - *   

Re: [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id

2021-08-05 Thread Srikar Dronamraju
* Parth Shah  [2021-07-28 23:26:05]:

> From: "Gautham R. Shenoy" 
> 
> Currently the cacheinfo code on powerpc indexes the "cache" objects
> (modelling the L1/L2/L3 caches) where the key is device-tree node
> corresponding to that cache. On some of the POWER server platforms
> thread-groups within the core share different sets of caches (Eg: On
> SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
> threads 1,3,5,7 of the same core share another L1 cache). On such
> platforms, there is a single device-tree node corresponding to that
> cache and the cache-configuration within the threads of the core is
> indicated via "ibm,thread-groups" device-tree property.
> 
> Since the current code is not aware of the "ibm,thread-groups"
> property, on the aforementioned systems the cacheinfo code still
> treats all the threads in the core as sharing the cache because of the
> single device-tree node (in the earlier example, the cacheinfo code
> would say CPUs 0-7 share the L1 cache).
> 
> In this patch, we make the powerpc cacheinfo code aware of the
> "ibm,thread-groups" property. We index the "cache" objects by the
> key-pair (device-tree node, thread-group id). For any CPUX, for a
> given level of cache, the thread-group id is defined to be the first
> CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
> of cache which are not represented in the "ibm,thread-groups"
> property, the thread-group id is -1.
> 
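The thread-group id described above boils down to a small lookup per cache
level. A rough sketch, assuming the has_big_cores/thread_group_shares_l2
flags from arch/powerpc/kernel/smp.c and the per-cpu masks declared in this
patch (illustrative, not the exact hunk):

  static int get_group_id(unsigned int cpu_id, int level)
  {
          if (has_big_cores && level == 1)
                  return cpumask_first(per_cpu(thread_group_l1_cache_map, cpu_id));
          else if (thread_group_shares_l2 && level == 2)
                  return cpumask_first(per_cpu(thread_group_l2_cache_map, cpu_id));

          /* level not described by "ibm,thread-groups" */
          return -1;
  }
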
> Signed-off-by: Gautham R. Shenoy 
> [parth: Remove "static" keyword for the definition of 
> "thread_group_l1_cache_map"
> and "thread_group_l2_cache_map" to get rid of the compile error.]
> Signed-off-by: Parth Shah 


Looks good to me.

Reviewed-by: Srikar Dronamraju 

> ---
>  arch/powerpc/include/asm/smp.h  |  3 ++
>  arch/powerpc/kernel/cacheinfo.c | 80 -
>  arch/powerpc/kernel/smp.c   |  4 +-
>  3 files changed, 63 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 03b3d010cbab..1259040cc3a4 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -33,6 +33,9 @@ extern bool coregroup_enabled;
>  extern int cpu_to_chip_id(int cpu);
>  extern int *chip_id_lookup_table;
> 
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +
>  #ifdef CONFIG_SMP
> 
>  struct smp_ops_t {
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 6f903e9aa20b..5a6925d87424 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -120,6 +120,7 @@ struct cache {
>   struct cpumask shared_cpu_map; /* online CPUs using this cache */
>   int type;  /* split cache disambiguation */
>   int level; /* level not explicit in device tree */
> + int group_id;  /* id of the group of threads that share 
> this cache */
>   struct list_head list; /* global list of cache objects */
>   struct cache *next_local;  /* next cache of >= level */
>  };
> @@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache 
> *cache)
>  }
> 
>  static void cache_init(struct cache *cache, int type, int level,
> -struct device_node *ofnode)
> +struct device_node *ofnode, int group_id)
>  {
>   cache->type = type;
>   cache->level = level;
>   cache->ofnode = of_node_get(ofnode);
> + cache->group_id = group_id;
>   INIT_LIST_HEAD(&cache->list);
>   list_add(&cache->list, &cache_list);
>  }
> 
> -static struct cache *new_cache(int type, int level, struct device_node 
> *ofnode)
> +static struct cache *new_cache(int type, int level,
> +struct device_node *ofnode, int group_id)
>  {
>   struct cache *cache;
> 
>   cache = kzalloc(sizeof(*cache), GFP_KERNEL);
>   if (cache)
> - cache_init(cache, type, level, ofnode);
> + cache_init(cache, type, level, ofnode, group_id);
> 
>   return cache;
>  }
> @@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct 
> cache *cache)
>   return cache;
> 
>   list_for_each_entry(iter, &cache_list, list)
> - if (iter->ofnode == cache->ofnode && iter->next_local == cache)
> + if (iter->ofnode == cache->ofnode &&
> + iter->group_id == cache->group_id &&
> + iter->next_local == cache)
>   return iter;
> 
>   return cache;
>  }
> 
> -/* return the first cache on a local list matching node */
> -static struct cache *cache_lookup_by_node(const struct device_node *node)
> +/* return the first cache on a local list matching node and thread-group id 
> */
> +static struct cache *cache_lookup_by_node_group(const struct device_node 
> *node,
> + int group_id)
>  {
>   

Re: [PATCH v4] soc: fsl: qe: convert QE interrupt controller to platform_device

2021-08-05 Thread Christophe Leroy




On 06/08/2021 at 06:39, Saravana Kannan wrote:

On Thu, Aug 5, 2021 at 9:35 PM Maxim Kochetkov  wrote:


03.08.2021 20:51, Saravana Kannan wrote:

So let's convert this driver to a simple platform_device with probe().
Also use platform_get_ and devm_ family function to get/allocate
resources and drop unused .compatible = "qeic".

Yes, please!


Should I totally drop { .type = "qeic"}, or keep?


Sorry for the confusion. My "Yes, please"!" was a show of support for
switching this to a proper platform driver. Not a response to that
specific question.

I didn't look at the code/DT close enough to know/care about the "type" part.



As far as I understand, Leo said it needs to remain, based on his answer below:

"From the original code, this should be type = "qeic".  It is not
defined in current binding but probably needed for backward
compatibility."


Christophe


Re: [RFC PATCH 3/4] powerpc: Optimize register usage for dear register

2021-08-05 Thread Xiongwei Song
On Thu, Aug 5, 2021 at 6:09 PM Christophe Leroy
 wrote:
>
>
>
> On 26/07/2021 at 16:30, sxwj...@me.com wrote:
> > From: Xiongwei Song 
> >
> > Create an anonymous union for the dar and dear registers, so we can
> > reference dear to get the effective address when CONFIG_4xx=y or
> > CONFIG_BOOKE=y, and reference dar otherwise. This makes the code clearer.
>
> Same comment here as for patch 1.

Same reply for the patch 1.
Thank you.

>
>
> >
> > Signed-off-by: Xiongwei Song 
> > ---
> >   arch/powerpc/include/asm/ptrace.h  | 5 -
> >   arch/powerpc/include/uapi/asm/ptrace.h | 5 -
> >   arch/powerpc/kernel/process.c  | 2 +-
> >   arch/powerpc/kernel/ptrace/ptrace.c| 2 ++
> >   arch/powerpc/kernel/traps.c| 5 -
> >   arch/powerpc/mm/fault.c| 2 +-
> >   6 files changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/ptrace.h 
> > b/arch/powerpc/include/asm/ptrace.h
> > index c252d04b1206..fa725e3238c2 100644
> > --- a/arch/powerpc/include/asm/ptrace.h
> > +++ b/arch/powerpc/include/asm/ptrace.h
> > @@ -43,7 +43,10 @@ struct pt_regs
> >   unsigned long mq;
> >   #endif
> >   unsigned long trap;
> > - unsigned long dar;
> > + union {
> > + unsigned long dar;
> > + unsigned long dear;
> > + };
> >   union {
> >   unsigned long dsisr;
> >   unsigned long esr;
> > diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
> > b/arch/powerpc/include/uapi/asm/ptrace.h
> > index e357288b5f34..9ae150fb4c4b 100644
> > --- a/arch/powerpc/include/uapi/asm/ptrace.h
> > +++ b/arch/powerpc/include/uapi/asm/ptrace.h
> > @@ -52,7 +52,10 @@ struct pt_regs
> >   unsigned long trap; /* Reason for being here */
> >   /* N.B. for critical exceptions on 4xx, the dar and dsisr
> >  fields are overloaded to hold srr0 and srr1. */
> > - unsigned long dar;  /* Fault registers */
> > + union {
> > + unsigned long dar;  /* Fault registers */
> > + unsigned long dear;
> > + };
> >   union {
> >   unsigned long dsisr;/* on Book-S used for DSISR */
> >   unsigned long esr;  /* on 4xx/Book-E used for ESR 
> > */
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index f74af8f9133c..50436b52c213 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
> >   trap == INTERRUPT_DATA_STORAGE ||
> >   trap == INTERRUPT_ALIGNMENT) {
> >   if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> > - pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
> > regs->esr);
> > + pr_cont("DEAR: "REG" ESR: "REG" ", regs->dear, 
> > regs->esr);
> >   else
> >   pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
> > regs->dsisr);
> >   }
> > diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
> > b/arch/powerpc/kernel/ptrace/ptrace.c
> > index 00789ad2c4a3..969dca8b0718 100644
> > --- a/arch/powerpc/kernel/ptrace/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> > @@ -373,6 +373,8 @@ void __init pt_regs_check(void)
> >offsetof(struct user_pt_regs, trap));
> >   BUILD_BUG_ON(offsetof(struct pt_regs, dar) !=
> >offsetof(struct user_pt_regs, dar));
> > + BUILD_BUG_ON(offsetof(struct pt_regs, dear) !=
> > +  offsetof(struct user_pt_regs, dear));
> >   BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
> >offsetof(struct user_pt_regs, dsisr));
> >   BUILD_BUG_ON(offsetof(struct pt_regs, esr) !=
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 2164f5705a0b..0796630d3d23 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -1609,7 +1609,10 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
> >   }
> >   bad:
> >   if (user_mode(regs))
> > - _exception(sig, regs, code, regs->dar);
> > + if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> > + _exception(sig, regs, code, regs->dear);
> > + else
> > + _exception(sig, regs, code, regs->dar);
> >   else
> >   bad_page_fault(regs, sig);
> >   }
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 62953d4e7c93..3db6b39a1178 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -542,7 +542,7 @@ static __always_inline void __do_page_fault(struct 
> > pt_regs *regs)
> >   long err;
> >
> >   if (IS_ENABLED(CONFIG_4xx) || 

Re: [RFC PATCH 1/4] powerpc: Optimize register usage for esr register

2021-08-05 Thread Xiongwei Song
On Thu, Aug 5, 2021 at 6:06 PM Christophe Leroy
 wrote:
>
>
>
> On 26/07/2021 at 16:30, sxwj...@me.com wrote:
> > From: Xiongwei Song 
> >
> > Create an anonymous union for the dsisr and esr registers, so we can
> > reference esr to get the exception detail when CONFIG_4xx=y or
> > CONFIG_BOOKE=y, and reference dsisr otherwise. This makes the code clearer.
>
> I'm not sure it is worth doing that.
Why don't we use "esr" as the reference manuals mention?

>
> What is the point in doing the following when you know that regs->esr and 
> regs->dsisr are exactly
> the same:
>
>  > -err = ___do_page_fault(regs, regs->dar, regs->dsisr);
>  > +if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
>  > +err = ___do_page_fault(regs, regs->dar, regs->esr);
>  > +else
>  > +err = ___do_page_fault(regs, regs->dar, regs->dsisr);
>  > +
Yes, we can drop this. But it's a bit vague.

> Or even
>
>  > -int is_write = page_fault_is_write(regs->dsisr);
>  > +unsigned long err_reg;
>  > +int is_write;
>  > +
>  > +if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
>  > +err_reg = regs->esr;
>  > +else
>  > +err_reg = regs->dsisr;
>  > +
>  > +is_write = page_fault_is_write(err_reg);
>
>
> Artificially growing the code for that makes no sense to me.

We can drop this too.
>
>
> To avoid ambiguity, maybe the best would be to rename regs->dsisr to
> something like regs->sr, so that we know it represents the status
> register, which is DSISR or ESR depending on the platform.

If so, this would make other people more confused. My intention is to
follow what the reference manuals say.

Thank you so much for your comments.

Regards,
Xiongwei
>
>
> >
> > Signed-off-by: Xiongwei Song 
> > ---
> >   arch/powerpc/include/asm/ptrace.h  |  5 -
> >   arch/powerpc/include/uapi/asm/ptrace.h |  5 -
> >   arch/powerpc/kernel/process.c  |  2 +-
> >   arch/powerpc/kernel/ptrace/ptrace.c|  2 ++
> >   arch/powerpc/kernel/traps.c|  2 +-
> >   arch/powerpc/mm/fault.c| 16 ++--
> >   arch/powerpc/platforms/44x/machine_check.c |  4 ++--
> >   arch/powerpc/platforms/4xx/machine_check.c |  2 +-
> >   8 files changed, 29 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/ptrace.h 
> > b/arch/powerpc/include/asm/ptrace.h
> > index 3e5d470a6155..c252d04b1206 100644
> > --- a/arch/powerpc/include/asm/ptrace.h
> > +++ b/arch/powerpc/include/asm/ptrace.h
> > @@ -44,7 +44,10 @@ struct pt_regs
> >   #endif
> >   unsigned long trap;
> >   unsigned long dar;
> > - unsigned long dsisr;
> > + union {
> > + unsigned long dsisr;
> > + unsigned long esr;
> > + };
> >   unsigned long result;
> >   };
> >   };
> > diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
> > b/arch/powerpc/include/uapi/asm/ptrace.h
> > index 7004cfea3f5f..e357288b5f34 100644
> > --- a/arch/powerpc/include/uapi/asm/ptrace.h
> > +++ b/arch/powerpc/include/uapi/asm/ptrace.h
> > @@ -53,7 +53,10 @@ struct pt_regs
> >   /* N.B. for critical exceptions on 4xx, the dar and dsisr
> >  fields are overloaded to hold srr0 and srr1. */
> >   unsigned long dar;  /* Fault registers */
> > - unsigned long dsisr;/* on 4xx/Book-E used for ESR */
> > + union {
> > + unsigned long dsisr;/* on Book-S used for DSISR */
> > + unsigned long esr;  /* on 4xx/Book-E used for ESR 
> > */
> > + };
> >   unsigned long result;   /* Result of a system call */
> >   };
> >
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 185beb290580..f74af8f9133c 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
> >   trap == INTERRUPT_DATA_STORAGE ||
> >   trap == INTERRUPT_ALIGNMENT) {
> >   if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> > - pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
> > regs->dsisr);
> > + pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
> > regs->esr);
> >   else
> >   pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
> > regs->dsisr);
> >   }
> > diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
> > b/arch/powerpc/kernel/ptrace/ptrace.c
> > index 0a0a33eb0d28..00789ad2c4a3 100644
> > --- a/arch/powerpc/kernel/ptrace/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> > @@ -375,6 +375,8 @@ void __init pt_regs_check(void)
> >offsetof(struct user_pt_regs, dar));
> >   BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
> >

[PATCH v4 2/2] virtio-console: remove unnecessary kmemdup()

2021-08-05 Thread Xianting Tian
The hvc framework will never pass stack memory to the put_chars() function,
so the call to kmemdup() is unnecessary and we can remove it.

This reverts commit c4baad5029 ("virtio-console: avoid DMA from stack").

Signed-off-by: Xianting Tian 
---
 drivers/char/virtio_console.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 7eaf303a7..4ed3ffb1d 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1117,8 +1117,6 @@ static int put_chars(u32 vtermno, const char *buf, int 
count)
 {
struct port *port;
struct scatterlist sg[1];
-   void *data;
-   int ret;
 
if (unlikely(early_put_chars))
return early_put_chars(vtermno, buf, count);
@@ -1127,14 +1125,8 @@ static int put_chars(u32 vtermno, const char *buf, int 
count)
if (!port)
return -EPIPE;
 
-   data = kmemdup(buf, count, GFP_ATOMIC);
-   if (!data)
-   return -ENOMEM;
-
-   sg_init_one(sg, data, count);
-   ret = __send_to_port(port, sg, 1, count, data, false);
-   kfree(data);
-   return ret;
+   sg_init_one(sg, buf, count);
+   return __send_to_port(port, sg, 1, count, (void *)buf, false);
 }
 
 /*
-- 
2.17.1



[PATCH v4 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Xianting Tian
As is well known, an hvc backend can register its operations with the hvc
framework; the operations contain put_chars(), get_chars() and so on.

Some hvc backends may do DMA in their operations, e.g. put_chars() of
virtio-console. But the hvc framework may pass DMA-incapable memory to
put_chars() under a specific configuration, which is explained in
commit c4baad5029 ("virtio-console: avoid DMA from stack"):
1, c[] is on stack,
   hvc_console_print():
char c[N_OUTBUF] __ALIGNED__;
cons_ops[index]->put_chars(vtermnos[index], c, i);
2, ch is on stack,
   static void hvc_poll_put_char(,,char ch)
   {
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
int n;

do {
n = hp->ops->put_chars(hp->vtermno, &ch, 1);
} while (n <= 0);
   }

Commit c4baad5029 is just the fix to avoid DMA from stack memory, which
is passed to virtio-console by the hvc framework in the code above. But
the fix is aggressive: it uses kmemdup() to allocate a new buffer from
the kmalloc area and does a memcpy no matter whether the memory is in
the kmalloc area or not. More importantly, this is better fixed in the
hvc framework itself, by changing it to never pass stack memory to the
put_chars() function in the first place. Otherwise, we face the same
issue again if a new hvc backend using DMA is added in the future.

We make 'char c[N_OUTBUF]' part of 'struct hvc_struct', so hp->c is no
longer stack memory and can be used in the two cases above.

Another cleanup is to make 'hp->outbuf' aligned and to use struct_size()
to calculate the size of hvc_struct.

With the patch, we can remove the fix c4baad5029.
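
Since the hvc_console.h hunk is not quoted below, the structure change being
described looks roughly like this (a sketch; member placement and the
alignment attribute are assumptions, not the exact diff):

  struct hvc_struct {
          struct tty_port port;
          spinlock_t lock;
          int index;
          /* ... other members unchanged ... */
          char c[N_OUTBUF] __attribute__((__aligned__(sizeof(long))));
          spinlock_t c_lock;      /* serializes users of c[] */
          int outbuf_size;
          char outbuf[] __attribute__((__aligned__(sizeof(long))));
  };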

Signed-off-by: Xianting Tian 
Tested-by: Xianting Tian 
---
 drivers/tty/hvc/hvc_console.c | 33 ++---
 drivers/tty/hvc/hvc_console.h | 16 ++--
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 5bb8c4e44..3afdb169c 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -41,16 +41,6 @@
  */
 #define HVC_CLOSE_WAIT (HZ/100) /* 1/10 of a second */
 
-/*
- * These sizes are most efficient for vio, because they are the
- * native transfer size. We could make them selectable in the
- * future to better deal with backends that want other buffer sizes.
- */
-#define N_OUTBUF   16
-#define N_INBUF16
-
-#define __ALIGNED__ __attribute__((__aligned__(sizeof(long
-
 static struct tty_driver *hvc_driver;
 static struct task_struct *hvc_task;
 
@@ -151,9 +141,11 @@ static uint32_t vtermnos[MAX_NR_HVC_CONSOLES] =
 static void hvc_console_print(struct console *co, const char *b,
  unsigned count)
 {
-   char c[N_OUTBUF] __ALIGNED__;
+   char *c;
unsigned i = 0, n = 0;
int r, donecr = 0, index = co->index;
+   unsigned long flags;
+   struct hvc_struct *hp;
 
/* Console access attempt outside of acceptable console range. */
if (index >= MAX_NR_HVC_CONSOLES)
@@ -163,6 +155,13 @@ static void hvc_console_print(struct console *co, const 
char *b,
if (vtermnos[index] == -1)
return;
 
+   list_for_each_entry(hp, &hvc_structs, next)
+   if (hp->vtermno == vtermnos[index])
+   break;
+
+   c = hp->c;
+
+   spin_lock_irqsave(&hp->c_lock, flags);
while (count > 0 || i > 0) {
if (count > 0 && i < sizeof(c)) {
if (b[n] == '\n' && !donecr) {
@@ -191,6 +190,7 @@ static void hvc_console_print(struct console *co, const 
char *b,
}
}
}
+   spin_unlock_irqrestore(&hp->c_lock, flags);
hvc_console_flush(cons_ops[index], vtermnos[index]);
 }
 
@@ -878,9 +878,13 @@ static void hvc_poll_put_char(struct tty_driver *driver, 
int line, char ch)
struct tty_struct *tty = driver->ttys[0];
struct hvc_struct *hp = tty->driver_data;
int n;
+   unsigned long flags;
 
do {
-   n = hp->ops->put_chars(hp->vtermno, &ch, 1);
+   spin_lock_irqsave(&hp->c_lock, flags);
+   hp->c[0] = ch;
+   n = hp->ops->put_chars(hp->vtermno, hp->c, 1);
+   spin_unlock_irqrestore(&hp->c_lock, flags);
} while (n <= 0);
 }
 #endif
@@ -922,8 +926,7 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
return ERR_PTR(err);
}
 
-   hp = kzalloc(ALIGN(sizeof(*hp), sizeof(long)) + outbuf_size,
-   GFP_KERNEL);
+   hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);
if (!hp)
return ERR_PTR(-ENOMEM);
 
@@ -931,13 +934,13 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
hp->data = data;
hp->ops = ops;
hp->outbuf_size = outbuf_size;
-   hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
 

[PATCH v4 0/2] make hvc pass dma capable memory to its backend

2021-08-05 Thread Xianting Tian
Dear all,

This patch series makes the hvc framework pass DMA-capable memory to
put_chars() of the hvc backend (e.g. virtio-console), and reverts commit
c4baad5029 ("virtio-console: avoid DMA from stack").

V1
virtio-console: avoid DMA from vmalloc area
https://lkml.org/lkml/2021/7/27/494

For v1 patch, Arnd Bergmann suggests to fix the issue in the first
place:
Make hvc pass DMA capable memory to put_chars()
The fix suggestion is included in v2.

V2
[PATCH 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/1/8
[PATCH 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/1/9

For v2 patch, Arnd Bergmann suggests to make new buf part of the
hvc_struct structure, and fix the compile issue.
The fix suggestion is included in v3.

V3
[PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()
https://lkml.org/lkml/2021/8/3/1347
[PATCH v3 2/2] virtio-console: remove unnecessary kmemdup()
https://lkml.org/lkml/2021/8/3/1348

For v3 patch, Jiri Slaby suggests to make 'char c[N_OUTBUF]' part of
hvc_struct, and make 'hp->outbuf' aligned and use struct_size() to
calculate the size of hvc_struct. The fix suggestion is included in
v4.

drivers/char/virtio_console.c | 12 ++--
drivers/tty/hvc/hvc_console.c | 33 ++---
drivers/tty/hvc/hvc_console.h | 16 ++--
3 files changed


Re: [PATCH v1 02/55] KVM: PPC: Book3S HV P9: Fixes for TM softpatch interrupt

2021-08-05 Thread Michael Ellerman
Nicholas Piggin  writes:
> The softpatch interrupt sets HSRR0 to the faulting instruction +4, so
> it should subtract 4 for the faulting instruction address. Also have it
> emulate and deliver HFAC interrupts correctly, which is important for
> nested HV and facility demand-faulting in future.

The nip being off by 4 sounds bad. But I guess it's not that big a deal
because it's only used for reporting the instruction address?

Would also be good to have some more explanation of why it's OK to
change from illegal to HFAC, which is a guest visible change.

> diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
> index cc90b8b82329..e4fd4a9dee08 100644
> --- a/arch/powerpc/kvm/book3s_hv_tm.c
> +++ b/arch/powerpc/kvm/book3s_hv_tm.c
> @@ -74,19 +74,23 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
>   case PPC_INST_RFEBB:
>   if ((msr & MSR_PR) && (vcpu->arch.vcore->pcr & PCR_ARCH_206)) {
>   /* generate an illegal instruction interrupt */
> + vcpu->arch.regs.nip -= 4;
>   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
>   return RESUME_GUEST;
>   }
>   /* check EBB facility is available */
>   if (!(vcpu->arch.hfscr & HFSCR_EBB)) {
> - /* generate an illegal instruction interrupt */
> - kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> - return RESUME_GUEST;
> + vcpu->arch.regs.nip -= 4;
> + vcpu->arch.hfscr &= ~HFSCR_INTR_CAUSE;
> + vcpu->arch.hfscr |= (u64)FSCR_EBB_LG << 56;
> + vcpu->arch.trap = BOOK3S_INTERRUPT_H_FAC_UNAVAIL;
> + return -1; /* rerun host interrupt handler */

This is EBB not TM. Probably OK to leave it in this patch as long as
it's mentioned in the change log?

>   }
>   if ((msr & MSR_PR) && !(vcpu->arch.fscr & FSCR_EBB)) {
>   /* generate a facility unavailable interrupt */
> - vcpu->arch.fscr = (vcpu->arch.fscr & ~(0xffull << 56)) |
> - ((u64)FSCR_EBB_LG << 56);
> + vcpu->arch.regs.nip -= 4;
> + vcpu->arch.fscr &= ~FSCR_INTR_CAUSE;
> + vcpu->arch.fscr |= (u64)FSCR_EBB_LG << 56;

Same.


cheers


Re: Debian SID kernel doesn't boot on PowerBook 3400c

2021-08-05 Thread Finn Thain
(Christophe, you've seen some of this before, however there are new 
results added at the end. I've Cc'd the mailing lists this time.)

On Wed, 4 Aug 2021, Stan Johnson wrote:

> On 8/4/21 8:41 PM, Finn Thain wrote:
> 
> >
> > $ curl https://lore.kernel.org/lkml/9b64dde3-6ebd-b446-41d9-61e8cb0d8...@csgroup.eu/raw > ../message.mbox
> ok
> 
> $ sha1 ../message.mbox
> SHA1 (../message.mbox) = 436ce0adf893c46c84c54607f73c838897caeeea
> 
> >
> > On Wed, 4 Aug 2021, Christophe Leroy wrote:
> >
> >> Can you check if they happen at commit c16728835
> >>
> 
> $ git checkout c16728835eec
> Checking out files: 100% (20728/20728), done.
> Note: checking out 'c16728835eec'.
> 
> You are in 'detached HEAD' state. You can look around, make experimental
> changes and commit them, and you can discard any commits you make in this
> state without impacting any branches by performing another checkout.
> 
> If you want to create a new branch to retain commits you create, you may
> do so (now or later) by using -b with the checkout command again. Example:
> 
>   git checkout -b 
> 
> HEAD is now at c16728835eec powerpc/32: Manage KUAP in C
> $ git am ../message.mbox
> warning: Patch sent with format=flowed; space at the end of lines might be 
> lost.
> Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
> $ cp ../dot-config-powermac-5.13 .config
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig 
> vmlinux
> $ strings vmlinux | fgrep 'Linux version'
> Linux version 5.12.0-rc3-pmac-00078-geb51c431b81 (johnson@ThinkPad) 
> (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for 
> Debian) 2.31.1) #1 SMP Wed Aug 4 21:50:47 MDT 2021
> 
> 1) PB 3400c
> Hangs at boot (Mac OS screen), no serial console output
> 
> 2) Wallstreet
> X fails, errors ("Kernel attempted to write user page", "BUG: Unable to
> handle kernel instruction fetch"), see Wallstreet_console-1.txt.
> 

The log shows that the error "Kernel attempted to write user page 
(b3399774) - exploit attempt?" happens after commit c16728835eec 
("powerpc/32: Manage KUAP in C").

> >>
> >> Can you check if they DO NOT happen at preceding commit c16728835~
> >>
> 
> $ git checkout c16728835~
> Previous HEAD position was c16728835eec powerpc/32: Manage KUAP in C
> HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap 
> save/restore/check helpers
> $ git am ../message.mbox
> warning: Patch sent with format=flowed; space at the end of lines might be 
> lost.
> Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
> $ cp ../dot-config-powermac-5.13 .config
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig 
> vmlinux
> 
> Linux version 5.12.0-rc3-pmac-00077-gc9f6e8dd045
> 
> 3) PB 3400c
> Hangs at boot (Mac OS screen)
> 
> 4) Wallstreet
> X fails, errors in console log (different than test 2), see
> Wallstreet_console-2.txt.
> 

This log shows that the errors "xfce4-session[1775]: bus error (7)" and 
"kernel BUG at arch/powerpc/kernel/interrupt.c:49!" happen prior to commit 
c16728835eec ("powerpc/32: Manage KUAP in C").

> 
> $ git checkout 0b45359aa2df
> ...
> HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap 
> save/restore/check helpers
> $ git am ../message.mbox
> warning: Patch sent with format=flowed; space at the end of lines might be 
> lost.
> Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
> $ cp ../dot-config-powermac-5.13 .config
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig 
> vmlinux
> 
> Linux version 5.12.0-rc3-pmac-00077-ge06b29ce146
> 
> 5) PB 3400c
> Hangs at boot (Mac OS screen)
> 
> 6) Wallstreet
> X failed (X login succeeded, but setting up desktop failed), errors in
> console log, see Wallstreet_console-3.txt.
> 

(No need for those two tests: it's exactly the same code and almost the 
same failure modes: "kernel BUG at arch/powerpc/kernel/interrupt.c:50".)

On Thu, 5 Aug 2021, Stan Johnson wrote:

> On 8/5/21 12:47 AM, Finn Thain wrote:
> 
> > On Wed, 4 Aug 2021, Christophe Leroy wrote:
> >
> >> Could you test without CONFIG_PPC_KUAP
> ...
> 
> $ git checkout c16728835eec
> ...
> HEAD is now at c16728835eec powerpc/32: Manage KUAP in C
> $ git am ../message.mbox
> warning: Patch sent with format=flowed; space at the end of lines might be 
> lost.
> Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
> $ cp ../dot-config-powermac-5.13 .config
> $ scripts/config -d CONFIG_PPC_KUAP
> $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig 
> vmlinux
> $ grep CONFIG_PPC_KUAP .config
> # CONFIG_PPC_KUAP is not set
> 
> Linux version 5.12.0-rc3-pmac-00078-g5cac2bc3752
> 
> 7) PB 3400c
> Hangs at boot (Mac OS screen)
> 
> 8) Wallstreet
> Everything works, no errors (see Wallstreet_console-4.txt).
> 

That would seem to implicate CONFIG_PPC_KUAP itself. (Note that all builds 
up until this one have CONFIG_PPC_KUAP=y.)

> 
> >
> >> Could you test with CONFIG_PPC_KUAP and 

Re: [PATCH v2 0/6] PCI: Drop duplicated tracking of a pci_dev's bound driver

2021-08-05 Thread Bjorn Helgaas
On Tue, Aug 03, 2021 at 12:01:44PM +0200, Uwe Kleine-König wrote:
> Hello,
> 
> changes since v1 
> (https://lore.kernel.org/linux-pci/20210729203740.1377045-1-u.kleine-koe...@pengutronix.de):
> 
> - New patch to simplify drivers/pci/xen-pcifront.c, spotted and
>   suggested by Boris Ostrovsky
> - Fix a possible NULL pointer dereference I introduced in xen-pcifront.c
> - A few whitespace improvements
> - Add a commit log to patch #6 (formerly #5)
> 
> I also expanded the audience for patches #4 and #6 to allow affected
> people to actually see the changes to their drivers.
> 
> Interdiff can be found below.
> 
> The idea is still the same: After a few cleanups (#1 - #3) a new macro
> is introduced abstracting access to struct pci_dev->driver. All users
> are then converted to use this and in the last patch the macro is
> changed to make use of struct pci_dev::dev->driver to get rid of the
> duplicated tracking.

I love the idea of this series!

I looked at all the bus_type.probe() methods, and it looks like pci_dev
is not the only offender here.  At least the following also have a driver
pointer in the device struct:

  parisc_device.driver
  acpi_device.driver
  dio_dev.driver
  hid_device.driver
  pci_dev.driver
  pnp_dev.driver
  rio_dev.driver
  zorro_dev.driver

Do you plan to do the same for all of them, or is there some reason
why they need the pointer and PCI doesn't?

In almost all cases, other buses define a "to_<bus>_driver()"
interface.  In fact, PCI already has a to_pci_driver().

This series adds pci_driver_of_dev(), which basically just means we
can do this:

  pdrv = pci_driver_of_dev(pdev);

instead of this:

  pdrv = to_pci_driver(pdev->dev.driver);

I don't see any other "<bus>_driver_of_dev()" interfaces, so I assume
other buses just live with the latter style?  I'd rather not be
different and have two ways to get the "struct pci_driver *" unless
there's a good reason.

Looking through the places that care about pci_dev.driver (the ones
updated by patch 5/6), many of them are ... a little dubious to begin
with.  A few need the "struct pci_error_handlers *err_handler"
pointer, so that's probably legitimate.  But many just need a name,
and should probably be using dev_driver_string() instead.

Bjorn


Re: [PATCH v2 5/6] PCI: Adapt all code locations to not use struct pci_dev::driver directly

2021-08-05 Thread Andrew Donnellan

On 3/8/21 8:01 pm, Uwe Kleine-König wrote:

This prepares removing the driver member of struct pci_dev which holds the
same information as struct pci_dev::dev->driver.

Signed-off-by: Uwe Kleine-König 


cxl hunks look alright.

Acked-by: Andrew Donnellan  # cxl

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited


Re: [PATCH v4 1/3] audit: replace magic audit syscall class numbers with macros

2021-08-05 Thread Paul Moore
On Wed, May 19, 2021 at 4:01 PM Richard Guy Briggs  wrote:
>
> Replace audit syscall class magic numbers with macros.
>
> This required putting the macros into new header file
> include/linux/auditsc_classmacros.h since the syscall macros were
> included for both 64 bit and 32 bit in any compat code, causing
> redefinition warnings.
>
> Signed-off-by: Richard Guy Briggs 
> Link: 
> https://lore.kernel.org/r/2300b1083a32aade7ae7efb95826e8f3f260b1df.1621363275.git@redhat.com
> ---
>  MAINTAINERS |  1 +
>  arch/alpha/kernel/audit.c   |  8 
>  arch/ia64/kernel/audit.c|  8 
>  arch/parisc/kernel/audit.c  |  8 
>  arch/parisc/kernel/compat_audit.c   |  9 +
>  arch/powerpc/kernel/audit.c | 10 +-
>  arch/powerpc/kernel/compat_audit.c  | 11 ++-
>  arch/s390/kernel/audit.c| 10 +-
>  arch/s390/kernel/compat_audit.c | 11 ++-
>  arch/sparc/kernel/audit.c   | 10 +-
>  arch/sparc/kernel/compat_audit.c| 11 ++-
>  arch/x86/ia32/audit.c   | 11 ++-
>  arch/x86/kernel/audit_64.c  |  8 
>  include/linux/audit.h   |  1 +
>  include/linux/auditsc_classmacros.h | 23 +++
>  kernel/auditsc.c| 12 ++--
>  lib/audit.c | 10 +-
>  lib/compat_audit.c  | 11 ++-
>  18 files changed, 102 insertions(+), 71 deletions(-)
>  create mode 100644 include/linux/auditsc_classmacros.h

...

> diff --git a/include/linux/auditsc_classmacros.h 
> b/include/linux/auditsc_classmacros.h
> new file mode 100644
> index 000000000000..18757d270961
> --- /dev/null
> +++ b/include/linux/auditsc_classmacros.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* auditsc_classmacros.h -- Auditing support syscall macros
> + *
> + * Copyright 2021 Red Hat Inc., Durham, North Carolina.
> + * All Rights Reserved.
> + *
> + * Author: Richard Guy Briggs 
> + */
> +#ifndef _LINUX_AUDITSCM_H_
> +#define _LINUX_AUDITSCM_H_
> +
> +enum auditsc_class_t {
> +   AUDITSC_NATIVE = 0,
> +   AUDITSC_COMPAT,
> +   AUDITSC_OPEN,
> +   AUDITSC_OPENAT,
> +   AUDITSC_SOCKETCALL,
> +   AUDITSC_EXECVE,
> +
> +   AUDITSC_NVALS /* count */
> +};
> +
> +#endif

My apologies Richard, for some reason I had it in my mind that this
series was waiting on you to answer a question and/or respin; however,
now that I'm clearing my patch queues looking for any stragglers I see
that isn't the case.  Looking over the patchset I think it looks okay
to me, my only concern is that "auditsc_classmacros.h" is an awfully
specific header file name and could prove to be annoying if we want to
add to it in the future.  What do you think about something like
"audit_arch.h" instead?

If that change is okay with you I can go ahead and do the rename while
I'm merging the patches, I'll consider it penance for letting this
patchset sit for so long :/

-- 
paul moore
www.paul-moore.com


[PATCH v2 1/3] KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines

2021-08-05 Thread Fabiano Rosas
The __kvmhv_copy_tofrom_guest_radix function was introduced along with
nested HV guest support. It uses the platform's Radix MMU quadrants to
provide a nested hypervisor with fast access to its nested guests
memory (H_COPY_TOFROM_GUEST hypercall). It has also since been added
as a fast path for the kvmppc_ld/st routines which are used during
instruction emulation.

The commit def0bfdbd603 ("powerpc: use probe_user_read() and
probe_user_write()") changed the low level copy function from
raw_copy_from_user to probe_user_read, which adds a check to
access_ok. In powerpc that is:

 static inline bool __access_ok(unsigned long addr, unsigned long size)
 {
 return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr;
 }

and TASK_SIZE_MAX is 0x0010000000000000UL for 64-bit, which means that
setting the two MSBs of the effective address (which correspond to the
quadrant) now causes access_ok to reject the access.
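
Concretely, the quadrant is selected via the two most significant bits of
the effective address used for the access, so any quadrant 1 or 2 address is
necessarily >= TASK_SIZE_MAX and fails the check above. A small illustration
(the helper below is only for illustration, it is not kernel code):

 #include <stdbool.h>

 #define TASK_SIZE_MAX 0x0010000000000000UL

 static bool quadrant_access_ok(unsigned long eaddr, int quadrant,
                                unsigned long size)
 {
         unsigned long addr = eaddr | ((unsigned long)quadrant << 62);

         /* same test as __access_ok(): false whenever quadrant is 1 or 2 */
         return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr;
 }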

This was not caught earlier because the most common code path via
kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to
succeed for L1 guests. For nested guests there is no fallback.

Another issue is that probe_user_read (now __copy_from_user_nofault)
does not return the number of bytes not copied in case of failure, so
the destination memory is not being cleared anymore in
kvmhv_copy_from_guest_radix:

 ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n);
 if (ret > 0)<-- always false!
 memset(to + (n - ret), 0, ret);

This patch fixes both issues by skipping access_ok and open-coding the
low level __copy_to/from_user_inatomic.

Fixes: def0bfdbd603 ("powerpc: use probe_user_read() and probe_user_write()")
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..44eb7b1ef289 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -65,10 +65,12 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
}
isync();
 
+   pagefault_disable();
if (is_load)
-   ret = copy_from_user_nofault(to, (const void __user *)from, n);
+   ret = __copy_from_user_inatomic(to, (const void __user *)from, 
n);
else
-   ret = copy_to_user_nofault((void __user *)to, from, n);
+   ret = __copy_to_user_inatomic((void __user *)to, from, n);
+   pagefault_enable();
 
/* switch the pid first to avoid running host with unallocated pid */
if (quadrant == 1 && pid != old_pid)
-- 
2.29.2



[PATCH v2 3/3] KVM: PPC: Book3S HV: Stop exporting symbols from book3s_64_mmu_radix

2021-08-05 Thread Fabiano Rosas
The book3s_64_mmu_radix.o object is not part of the KVM builtins and
all the callers of the exported symbols are in the same kvm-hv.ko
module so we should not need to export any symbols.

Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 1b1c9e9e539b..16359525a40f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -86,7 +86,6 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
 
return ret;
 }
-EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix);
 
 static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
  void *to, void *from, unsigned long n)
@@ -122,14 +121,12 @@ long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, 
gva_t eaddr, void *to,
 
return ret;
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_from_guest_radix);
 
 long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *from,
   unsigned long n)
 {
return kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, NULL, from, n);
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_to_guest_radix);
 
 int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
   struct kvmppc_pte *gpte, u64 root,
-- 
2.29.2



[PATCH v2 2/3] KVM: PPC: Book3S HV: Add sanity check to copy_tofrom_guest

2021-08-05 Thread Fabiano Rosas
Both paths into __kvmhv_copy_tofrom_guest_radix ensure that we arrive
with an effective address that is smaller than our total addressable
space and addresses quadrant 0.

- The H_COPY_TOFROM_GUEST hypercall path rejects the call with
H_PARAMETER if the effective address has any of the twelve most
significant bits set.

- The kvmhv_copy_tofrom_guest_radix path clears the top twelve bits
before calling the internal function.

Although the callers make sure that the effective address is sane, any
future use of the function is exposed to a programming error, so add a
sanity check.

Suggested-by: Nicholas Piggin 
Signed-off-by: Fabiano Rosas 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 44eb7b1ef289..1b1c9e9e539b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -44,6 +44,9 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
  (to != NULL) ? __pa(to): 0,
  (from != NULL) ? __pa(from): 0, n);
 
+   if (eaddr & (0xFFFUL << 52))
+   return ret;
+
quadrant = 1;
if (!pid)
quadrant = 2;
-- 
2.29.2



[PATCH v2 0/3] KVM: PPC: Book3S HV: kvmhv_copy_tofrom_guest_radix changes

2021-08-05 Thread Fabiano Rosas
This series contains the fix for __kvmhv_copy_tofrom_guest_radix plus
a couple of changes.

- Patch 1: The fix itself. I thought a smaller fix upfront would be
   better to facilitate any backports.

- Patch 2: Adds a sanity check to the low level function. Since the
   hcall API already enforces that the effective address must
   be on quadrant 0, moving the checks from the high level
   function would only add overhead to the hcall path, so I
   opted for a lightweight check at the low level.

- Patch 3: Cleans up the EXPORT_SYMBOL_GPL tags. I don't see how they
   would be needed since the functions are contained within
   kvm-hv.ko.

v1:

https://lkml.kernel.org/r/20210802234941.2568493-1-faro...@linux.ibm.com

Fabiano Rosas (3):
  KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines
  KVM: PPC: Book3S HV: Add sanity check to copy_tofrom_guest
  KVM: PPC: Book3S HV: Stop exporting symbols from book3s_64_mmu_radix

 arch/powerpc/kvm/book3s_64_mmu_radix.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

-- 
2.29.2



Re: [PATCH printk v1 00/10] printk: introduce atomic consoles and sync mode

2021-08-05 Thread Petr Mladek
On Tue 2021-08-03 15:18:51, John Ogness wrote:
> Hi,
> 
> This is the next part of our printk-rework effort (points 3 and
> 4 of the LPC 2019 summary [0]).
> 
> Here the concept of "atomic consoles" is introduced through  a
> new (optional) write_atomic() callback for console drivers. This
> callback must be implemented as an NMI-safe variant of the
> write() callback, meaning that it can function from any context
> without relying on questionable tactics such as ignoring locking
> and also without relying on the synchronization of console
> semaphore.
> 
> As an example of what such an atomic console can look like, this
> series implements write_atomic() for the 8250 UART driver.
> 
> This series also introduces a new console printing mode called
> "sync mode" that is only activated when the kernel is about to
> end (such as panic, oops, shutdown, reboot). Sync mode can only
> be activated if atomic consoles are available. A system without
> registered atomic consoles will be unaffected by this series.
>
> When in sync mode, the console printing behavior becomes:
> 
> - only consoles implementing write_atomic() will be called
> 
> - printing occurs within vprintk_store() instead of
>   console_unlock(), since the console semaphore is irrelevant
>   for atomic consoles

I am fine with the new behavior at this stage. It is a quite clear
win when (only) the atomic console is used. And it does not make any
difference when atomic consoles are disabled.

But I am not sure about the proposed terms and implementation.
I want to be sure that we are on the right path for introducing
console kthreads.

Let me try to compare the behavior:

1. before this patchset():

/* printk: store immediately; try all consoles immediately */
int printk(...)
{
vprintk_store();
if (console_try_lock()) {
/* flush pending messages to the consoles */
console_unlock();
}
}

/* panic: try hard to flush messages to the consoles and avoid deadlock 
*/
void panic()
{
/* Ignore locks in console drivers */
bust_spinlocks(1);

printk("Kernel panic ...);
dump_stack();

smp_send_stop();
/* ignore console lock */
console_flush_on_panic();
}


2. after this patchset():

   + same as before in normal mode or when there is no atomic console

   + in panic with atomic console; it modifies the behavior:

/*
 * printk: store immediately; immediately flush atomic consoles;
 * unsafe consoles are not used anymore;
 */
int printk(...)
{
vprintk_store();
flush_atomic_consoles();
}

/* panic: no hacks; only atomic consoles are used */
void panic()
{
printk("Kernel panic ...);
dump_stack();
}


3. After introducing console kthread(s):

int printk(...)
{
vprintk_store();
wake_consoles_via_irqwork();
}

+ in panic:

+ with atomic console like after this patchset?
+ without atomic consoles?

+ during early boot?


I guess that we will need another sync mode for the early boot,
panic, suspend, kexec, etc. It must be possible to debug these states
even without an atomic console and working kthreads.

Best Regards,
Petr


[RFC PATCH 02/15] arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER

2021-08-05 Thread Zi Yan
From: Zi Yan 

This Kconfig option is used by individual arch to set its desired
MAX_ORDER. Rename it to reflect its actual use.

Signed-off-by: Zi Yan 
Cc: Vineet Gupta 
Cc: Shawn Guo 
Cc: Catalin Marinas 
Cc: Guo Ren 
Cc: Geert Uytterhoeven 
Cc: Thomas Bogendoerfer 
Cc: Ley Foon Tan 
Cc: Michael Ellerman 
Cc: Yoshinori Sato 
Cc: "David S. Miller" 
Cc: Chris Zankel 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ox...@groups.io
Cc: linux-c...@vger.kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
---
 arch/arc/Kconfig | 2 +-
 arch/arm/Kconfig | 2 +-
 arch/arm/configs/imx_v6_v7_defconfig | 2 +-
 arch/arm/configs/milbeaut_m10v_defconfig | 2 +-
 arch/arm/configs/oxnas_v6_defconfig  | 2 +-
 arch/arm/configs/sama7_defconfig | 2 +-
 arch/arm64/Kconfig   | 2 +-
 arch/csky/Kconfig| 2 +-
 arch/ia64/Kconfig| 2 +-
 arch/ia64/include/asm/sparsemem.h| 6 +++---
 arch/m68k/Kconfig.cpu| 2 +-
 arch/mips/Kconfig| 2 +-
 arch/nios2/Kconfig   | 2 +-
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/configs/85xx/ge_imp3a_defconfig | 2 +-
 arch/powerpc/configs/fsl-emb-nonhw.config| 2 +-
 arch/sh/configs/ecovec24_defconfig   | 2 +-
 arch/sh/mm/Kconfig   | 2 +-
 arch/sparc/Kconfig   | 2 +-
 arch/xtensa/Kconfig  | 2 +-
 include/linux/mmzone.h   | 4 ++--
 21 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index b5bf68e74732..923ea4c31e59 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -583,7 +583,7 @@ config ARC_BUILTIN_DTB_NAME
 
 endmenu # "ARC Architecture Configuration"
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if ARC_HUGEPAGE_16M
default "11"
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2fb7012c3246..286854318fe5 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1523,7 +1523,7 @@ config ARM_MODULE_PLTS
  Disabling this is usually safe for small single-platform
  configurations. If unsure, say y.
 
-config FORCE_MAX_ZONEORDER
+config ARCH_FORCE_MAX_ORDER
int "Maximum zone order"
default "12" if SOC_AM33XX
default "9" if SA
diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index 079fcd8d1d11..802310d3ebf5 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -34,7 +34,7 @@ CONFIG_PCI_IMX6=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_FORCE_MAX_ZONEORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=14
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
diff --git a/arch/arm/configs/milbeaut_m10v_defconfig 
b/arch/arm/configs/milbeaut_m10v_defconfig
index 7c07f9893a0f..06967243f74d 100644
--- a/arch/arm/configs/milbeaut_m10v_defconfig
+++ b/arch/arm/configs/milbeaut_m10v_defconfig
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
diff --git a/arch/arm/configs/oxnas_v6_defconfig 
b/arch/arm/configs/oxnas_v6_defconfig
index cae0db6b4eaf..df8462272446 100644
--- a/arch/arm/configs/oxnas_v6_defconfig
+++ b/arch/arm/configs/oxnas_v6_defconfig
@@ -17,7 +17,7 @@ CONFIG_MACH_OX820=y
 CONFIG_SMP=y
 CONFIG_NR_CPUS=16
 CONFIG_CMA=y
-CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=12
 CONFIG_SECCOMP=y
 CONFIG_ARM_APPENDED_DTB=y
 CONFIG_ARM_ATAG_DTB_COMPAT=y
diff --git a/arch/arm/configs/sama7_defconfig b/arch/arm/configs/sama7_defconfig
index 938aae4bd80b..f8683b87cb27 100644
--- a/arch/arm/configs/sama7_defconfig
+++ b/arch/arm/configs/sama7_defconfig
@@ -22,7 +22,7 @@ CONFIG_ATMEL_CLOCKSOURCE_TCB=y
 # CONFIG_CACHE_L2X0 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 # CONFIG_CPU_SW_DOMAIN_PAN is not set
-CONFIG_FORCE_MAX_ZONEORDER=15
+CONFIG_ARCH_FORCE_MAX_ORDER=15
 CONFIG_UACCESS_WITH_MEMCPY=y
 # CONFIG_ATAGS is not set
 CONFIG_CMDLINE="console=ttyS0,115200 earlyprintk ignore_loglevel"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b5b13a932561..972d81f6bb2c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1152,7 +1152,7 @@ config XEN
help
  Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
 

Re: [PATCH 1/3] arch: Export machine_restart() instances so they can be called from modules

2021-08-05 Thread Lee Jones
On Thu, 05 Aug 2021, Greg Kroah-Hartman wrote:

> On Thu, Aug 05, 2021 at 06:36:25PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 05, 2021 at 08:50:30AM +0100, Lee Jones wrote:
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index b4bb67f17a2ca..cf89ce91d7145 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -212,6 +212,7 @@ void machine_restart(char *cmd)
> > >   printk("Reboot failed -- System halted\n");
> > >   while (1);
> > >  }
> > > +EXPORT_SYMBOL(machine_restart);
> > 
> > Should we make this EXPORT_SYMBOL_GPL? I suppose it's not for general
> > use by out of tree drivers and it matches the other pm_power_off symbol
> > we export in this file.
> 
> Yes please.

Sure.

Thanks for the feedback.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 1/3] arch: Export machine_restart() instances so they can be called from modules

2021-08-05 Thread Greg Kroah-Hartman
On Thu, Aug 05, 2021 at 06:36:25PM +0100, Catalin Marinas wrote:
> On Thu, Aug 05, 2021 at 08:50:30AM +0100, Lee Jones wrote:
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index b4bb67f17a2ca..cf89ce91d7145 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -212,6 +212,7 @@ void machine_restart(char *cmd)
> > printk("Reboot failed -- System halted\n");
> > while (1);
> >  }
> > +EXPORT_SYMBOL(machine_restart);
> 
> Should we make this EXPORT_SYMBOL_GPL? I suppose it's not for general
> use by out of tree drivers and it matches the other pm_power_off symbol
> we export in this file.

Yes please.


Re: [PATCH 1/3] arch: Export machine_restart() instances so they can be called from modules

2021-08-05 Thread Catalin Marinas
On Thu, Aug 05, 2021 at 08:50:30AM +0100, Lee Jones wrote:
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index b4bb67f17a2ca..cf89ce91d7145 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -212,6 +212,7 @@ void machine_restart(char *cmd)
>   printk("Reboot failed -- System halted\n");
>   while (1);
>  }
> +EXPORT_SYMBOL(machine_restart);

Should we make this EXPORT_SYMBOL_GPL? I suppose it's not for general
use by out of tree drivers and it matches the other pm_power_off symbol
we export in this file.

Either way:

Acked-by: Catalin Marinas 


[PATCH] powerpc/pseries: Fix update of LPAR security flavor after LPM

2021-08-05 Thread Laurent Dufour
After LPM, when migrating from a system with security mitigation enabled to
a system with mitigation disabled, the security flavor exposed in /proc is
not correctly set back to 0.

Do not assume the value of the security flavor is set to 0 when entering
init_cpu_char_feature_flags(), so when called after an LPM, the value is set
correctly even if the mitigations are not turned off.

Fixes: 6ce56e1ac380 ("powerpc/pseries: export LPAR security flavor in
lparcfg")

Cc: sta...@vger.kernel.org # 5.13.x
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/setup.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 6b0886668465..0dfaa6ab44cc 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -539,9 +539,10 @@ static void init_cpu_char_feature_flags(struct 
h_cpu_char_result *result)
 * H_CPU_BEHAV_FAVOUR_SECURITY_H could be set only if
 * H_CPU_BEHAV_FAVOUR_SECURITY is.
 */
-   if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))
+   if (!(result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)) {
security_ftr_clear(SEC_FTR_FAVOUR_SECURITY);
-   else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H)
+   pseries_security_flavor = 0;
+   } else if (result->behaviour & H_CPU_BEHAV_FAVOUR_SECURITY_H)
pseries_security_flavor = 1;
else
pseries_security_flavor = 2;
-- 
2.32.0



[Bug 213961] Oops while loading radeon driver

2021-08-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213961

Elimar Riesebieter (riese...@lxtec.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------
          Component|PPC-32                      |Video(Other)
            Product|Platform Specific/Hardware  |Drivers
         Regression|No                          |Yes


Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Xianting Tian



On 2021/8/5 at 4:09 PM, Jiri Slaby wrote:

On 05. 08. 21, 9:58, Jiri Slaby wrote:

Hi,

On 04. 08. 21, 4:54, Xianting Tian wrote:
@@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, 
int data,

  hp->outbuf_size = outbuf_size;
  hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];


This deserves cleanup too. Why is "outbuf" not "char outbuf[0] 
__ALIGNED__" at the end of the structure? The allocation would be 
easier (using struct_size()) and this line would be gone completely.
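
For illustration only, a rough sketch of that shape (struct and helper names here are assumed,
not the actual hvc code):

#include <linux/overflow.h>
#include <linux/slab.h>

struct hvc_struct_sketch {
	int outbuf_size;
	char outbuf[] __aligned(sizeof(long));	/* flexible array at the end */
};

static struct hvc_struct_sketch *hvc_alloc_sketch(int outbuf_size)
{
	struct hvc_struct_sketch *hp;

	/* one allocation covers both the struct and its outbuf */
	hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);
	if (hp)
		hp->outbuf_size = outbuf_size;
	return hp;
}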

I will make the cleanup in v4.



+    /*
+ * hvc_con_outbuf is guaranteed to be aligned at least to the
+ * size(N_OUTBUF) by kmalloc().
+ */
+    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+    if (!hp->hvc_con_outbuf)
+    return ERR_PTR(-ENOMEM);


This leaks hp, right?
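
(i.e. the error path would presumably need to free hp as well - an untested sketch of that hunk:)

+    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+    if (!hp->hvc_con_outbuf) {
+        kfree(hp);	/* hp was allocated earlier in hvc_alloc() */
+        return ERR_PTR(-ENOMEM);
+    }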


Actually, why don't you make
char c[N_OUTBUF] __ALIGNED__;

part of struct hvc_struct directly?

thanks, it's a good idea, I will change it in v4.



BTW your 2 patches are still not threaded, that is hard to follow.


+
+    spin_lock_init(&hp->hvc_con_lock);
+
  tty_port_init(&hp->port);
  hp->port.ops = &hvc_port_ops;


thanks,


[Bug 213961] Oops while loading radeon driver

2021-08-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213961

--- Comment #10 from Christophe Leroy (christophe.le...@csgroup.eu) ---
(In reply to Elimar Riesebieter from comment #9)
> Well, 5.13.8 just runs fine, though.

Yes, recent changes to Radeon appear for the first time in v5.14-rc1


[Bug 213961] Oops while loading radeon driver

2021-08-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213961

--- Comment #9 from Elimar Riesebieter (riese...@lxtec.de) ---
Well, 5.13.8 just runs fine, though.


Re: [RFC PATCH 3/4] powerpc: Optimize register usage for dear register

2021-08-05 Thread Christophe Leroy




On 26/07/2021 at 16:30, sxwj...@me.com wrote:

From: Xiongwei Song 

Create an anonymous union for the dar and dear registers, so we can reference
dear to get the effective address when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dar. This makes the code clearer.


Same comment here as for patch 1.




Signed-off-by: Xiongwei Song 
---
  arch/powerpc/include/asm/ptrace.h  | 5 -
  arch/powerpc/include/uapi/asm/ptrace.h | 5 -
  arch/powerpc/kernel/process.c  | 2 +-
  arch/powerpc/kernel/ptrace/ptrace.c| 2 ++
  arch/powerpc/kernel/traps.c| 5 -
  arch/powerpc/mm/fault.c| 2 +-
  6 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index c252d04b1206..fa725e3238c2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -43,7 +43,10 @@ struct pt_regs
unsigned long mq;
  #endif
unsigned long trap;
-   unsigned long dar;
+   union {
+   unsigned long dar;
+   unsigned long dear;
+   };
union {
unsigned long dsisr;
unsigned long esr;
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index e357288b5f34..9ae150fb4c4b 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -52,7 +52,10 @@ struct pt_regs
unsigned long trap; /* Reason for being here */
/* N.B. for critical exceptions on 4xx, the dar and dsisr
   fields are overloaded to hold srr0 and srr1. */
-   unsigned long dar;  /* Fault registers */
+   union {
+   unsigned long dar;  /* Fault registers */
+   unsigned long dear;
+   };
union {
unsigned long dsisr;/* on Book-S used for DSISR */
unsigned long esr;  /* on 4xx/Book-E used for ESR */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index f74af8f9133c..50436b52c213 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
trap == INTERRUPT_DATA_STORAGE ||
trap == INTERRUPT_ALIGNMENT) {
if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
-   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->esr);
+   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dear, 
regs->esr);
else
pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
regs->dsisr);
}
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index 00789ad2c4a3..969dca8b0718 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -373,6 +373,8 @@ void __init pt_regs_check(void)
 offsetof(struct user_pt_regs, trap));
BUILD_BUG_ON(offsetof(struct pt_regs, dar) !=
 offsetof(struct user_pt_regs, dar));
+   BUILD_BUG_ON(offsetof(struct pt_regs, dear) !=
+offsetof(struct user_pt_regs, dear));
BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
 offsetof(struct user_pt_regs, dsisr));
BUILD_BUG_ON(offsetof(struct pt_regs, esr) !=
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2164f5705a0b..0796630d3d23 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1609,7 +1609,10 @@ DEFINE_INTERRUPT_HANDLER(alignment_exception)
}
  bad:
if (user_mode(regs))
-   _exception(sig, regs, code, regs->dar);
+   if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
+   _exception(sig, regs, code, regs->dear);
+   else
+   _exception(sig, regs, code, regs->dar);
else
bad_page_fault(regs, sig);
  }
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 62953d4e7c93..3db6b39a1178 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -542,7 +542,7 @@ static __always_inline void __do_page_fault(struct pt_regs 
*regs)
long err;
  
  	if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))

-   err = ___do_page_fault(regs, regs->dar, regs->esr);
+   err = ___do_page_fault(regs, regs->dear, regs->esr);
else
err = ___do_page_fault(regs, regs->dar, regs->dsisr);
  



Re: [RFC PATCH 1/4] powerpc: Optimize register usage for esr register

2021-08-05 Thread Christophe Leroy




On 26/07/2021 at 16:30, sxwj...@me.com wrote:

From: Xiongwei Song 

Create an anonymous union for the dsisr and esr registers, so we can reference
esr to get the exception detail when CONFIG_4xx=y or CONFIG_BOOKE=y.
Otherwise, reference dsisr. This makes the code clearer.


I'm not sure it is worth doing that.

What is the point in doing the following when you know that regs->esr and regs->dsisr are exactly 
the same:


> -  err = ___do_page_fault(regs, regs->dar, regs->dsisr);
> +  if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> +  err = ___do_page_fault(regs, regs->dar, regs->esr);
> +  else
> +  err = ___do_page_fault(regs, regs->dar, regs->dsisr);
> +

Or even

> -  int is_write = page_fault_is_write(regs->dsisr);
> +  unsigned long err_reg;
> +  int is_write;
> +
> +  if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
> +  err_reg = regs->esr;
> +  else
> +  err_reg = regs->dsisr;
> +
> +  is_write = page_fault_is_write(err_reg);


Artificially growing the code for that makes no sense to me.


To avoid ambiguity, maybe the best would be to rename regs->dsisr to something like regs->sr, so 
that we know it represents the status register, which is DSISR or ESR depending on the platform.
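
(Purely illustrative - a tiny sketch of what that single status field could look like; this is not
the actual pt_regs layout:)

struct pt_regs_excerpt {
	unsigned long dar;	/* fault address (DAR, or DEAR on 4xx/Book-E) */
	unsigned long sr;	/* status: DSISR on Book3S, ESR on 4xx/Book-E */
};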





Signed-off-by: Xiongwei Song 
---
  arch/powerpc/include/asm/ptrace.h  |  5 -
  arch/powerpc/include/uapi/asm/ptrace.h |  5 -
  arch/powerpc/kernel/process.c  |  2 +-
  arch/powerpc/kernel/ptrace/ptrace.c|  2 ++
  arch/powerpc/kernel/traps.c|  2 +-
  arch/powerpc/mm/fault.c| 16 ++--
  arch/powerpc/platforms/44x/machine_check.c |  4 ++--
  arch/powerpc/platforms/4xx/machine_check.c |  2 +-
  8 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 3e5d470a6155..c252d04b1206 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -44,7 +44,10 @@ struct pt_regs
  #endif
unsigned long trap;
unsigned long dar;
-   unsigned long dsisr;
+   union {
+   unsigned long dsisr;
+   unsigned long esr;
+   };
unsigned long result;
};
};
diff --git a/arch/powerpc/include/uapi/asm/ptrace.h 
b/arch/powerpc/include/uapi/asm/ptrace.h
index 7004cfea3f5f..e357288b5f34 100644
--- a/arch/powerpc/include/uapi/asm/ptrace.h
+++ b/arch/powerpc/include/uapi/asm/ptrace.h
@@ -53,7 +53,10 @@ struct pt_regs
/* N.B. for critical exceptions on 4xx, the dar and dsisr
   fields are overloaded to hold srr0 and srr1. */
unsigned long dar;  /* Fault registers */
-   unsigned long dsisr;/* on 4xx/Book-E used for ESR */
+   union {
+   unsigned long dsisr;/* on Book-S used for DSISR */
+   unsigned long esr;  /* on 4xx/Book-E used for ESR */
+   };
unsigned long result;   /* Result of a system call */
  };
  
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c

index 185beb290580..f74af8f9133c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1499,7 +1499,7 @@ static void __show_regs(struct pt_regs *regs)
trap == INTERRUPT_DATA_STORAGE ||
trap == INTERRUPT_ALIGNMENT) {
if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE))
-   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->dsisr);
+   pr_cont("DEAR: "REG" ESR: "REG" ", regs->dar, 
regs->esr);
else
pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, 
regs->dsisr);
}
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index 0a0a33eb0d28..00789ad2c4a3 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -375,6 +375,8 @@ void __init pt_regs_check(void)
 offsetof(struct user_pt_regs, dar));
BUILD_BUG_ON(offsetof(struct pt_regs, dsisr) !=
 offsetof(struct user_pt_regs, dsisr));
+   BUILD_BUG_ON(offsetof(struct pt_regs, esr) !=
+offsetof(struct user_pt_regs, esr));
BUILD_BUG_ON(offsetof(struct pt_regs, result) !=
 offsetof(struct user_pt_regs, result));
  
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c

index dfbce527c98e..2164f5705a0b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -562,7 +562,7 @@ static inline int check_io_access(struct pt_regs *regs)
  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
  /* On 4xx, the reason for the machine check or program exception
 is in the ESR. */
-#define get_reason(regs)   ((regs)->dsisr)

Re: [PATCH] powerpc/kprobes: Fix kprobe Oops happens in booke

2021-08-05 Thread Christophe Leroy




On 04/08/2021 at 16:37, Pu Lehui wrote:

When using kprobes on a powerpc BookE series processor, an Oops happens
as shown below:

[   35.861352] Oops: Exception in kernel mode, sig: 5 [#1]
[   35.861676] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
[   35.861905] Modules linked in:
[   35.862144] CPU: 0 PID: 76 Comm: sh Not tainted 
5.14.0-rc3-00060-g7e96bf476270 #18
[   35.862610] NIP:  c0b96470 LR: c00107b4 CTR: c0161c80
[   35.862805] REGS: c387fe70 TRAP: 0700   Not tainted 
(5.14.0-rc3-00060-g7e96bf476270)
[   35.863198] MSR:  00029002   CR: 24022824  XER: 2000
[   35.863577]
[   35.863577] GPR00: c0015218 c387ff20 c313e300 c387ff50 0004 4002 
4000 0a1a2cce
[   35.863577] GPR08:  0004  59764000 24022422 102490c2 
 
[   35.863577] GPR16:   0040 1024 1024 1024 
1024 1022
[   35.863577] GPR24:  1024   bfc655e8 0800 
c387ff50 
[   35.865367] NIP [c0b96470] schedule+0x0/0x130
[   35.865606] LR [c00107b4] interrupt_exit_user_prepare_main+0xf4/0x100
[   35.865974] Call Trace:
[   35.866142] [c387ff20] [c0053224] irq_exit+0x114/0x120 (unreliable)
[   35.866472] [c387ff40] [c0015218] interrupt_return+0x14/0x13c
[   35.866728] --- interrupt: 900 at 0x100af3dc
[   35.866963] NIP:  100af3dc LR: 100de020 CTR: 
[   35.867177] REGS: c387ff50 TRAP: 0900   Not tainted 
(5.14.0-rc3-00060-g7e96bf476270)
[   35.867488] MSR:  0002f902   CR: 20022422  XER: 2000
[   35.867808]
[   35.867808] GPR00: c001509c bfc65570 1024b4d0  100de020 20022422 
bfc655a8 100af3dc
[   35.867808] GPR08: 0002f902    72656773 102490c2 
 
[   35.867808] GPR16:   0040 1024 1024 1024 
1024 1022
[   35.867808] GPR24:  1024   bfc655e8 10245910 
 0001
[   35.869406] NIP [100af3dc] 0x100af3dc
[   35.869578] LR [100de020] 0x100de020
[   35.869751] --- interrupt: 900
[   35.870001] Instruction dump:
[   35.870283] 40c20010 815e0518 714a0100 41e2fd04 3920 913e00c0 3b1e0450 
4bfffd80
[   35.870666] 0fe0 92a10024 4bfff1a9 6000 <7fe8> 7c0802a6 93e1001c 
7c5f1378
[   35.871339] ---[ end trace 23ff848139efa9b9 ]---

There is no real mode on the BookE arch; MMU translation is always on.
The MSR_IS/MSR_DS bits on BookE are used to switch the address space,
not to indicate real mode.


Can you explain more about the link between that explanation and the Oops itself?



Fixes: 21f8b2fa3ca5 ("powerpc/kprobes: Ignore traps that happened in real mode")
Signed-off-by: Pu Lehui 
---
  arch/powerpc/include/asm/ptrace.h | 6 ++
  arch/powerpc/kernel/kprobes.c | 5 +
  2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 3e5d470a6155..4aec1a97024b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -187,6 +187,12 @@ static inline unsigned long frame_pointer(struct pt_regs 
*regs)
  #define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
  #endif
  
+#ifdef CONFIG_BOOKE

+#define real_mode(regs)0
+#else
+#define real_mode(regs)(!((regs)->msr & MSR_IR) || !((regs)->msr & 
MSR_DR))
+#endif
+


You don't need the #ifdef stuff here, you can base your test on
IS_ENABLED(CONFIG_BOOKE).
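
Something along these lines would keep a single definition (untested sketch, same semantics as
the patch: BookE is never in real mode, the others check MSR_IR/MSR_DR):

#define real_mode(regs)	(!IS_ENABLED(CONFIG_BOOKE) && \
			 (!((regs)->msr & MSR_IR) || !((regs)->msr & MSR_DR)))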


  #define force_successful_syscall_return()   \
do { \
set_thread_flag(TIF_NOERROR); \
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index cbc28d1a2e1b..fac9a5974718 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -289,10 +289,7 @@ int kprobe_handler(struct pt_regs *regs)
unsigned int *addr = (unsigned int *)regs->nip;
struct kprobe_ctlblk *kcb;
  
-	if (user_mode(regs))

-   return 0;
-
-   if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
+   if (user_mode(regs) || real_mode(regs))
return 0;
  
  	/*




Re: [PATCH v5 7/8] powerpc/64s: Initialize and use a temporary mm for patching on Radix

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped as writeable. Currently, a
per-cpu vmalloc patch area is used for this purpose. While the patch
area is per-cpu, the temporary page mapping is inserted into the kernel
page tables for the duration of patching. The mapping is exposed to CPUs
other than the patching CPU - this is undesirable from a hardening
perspective. Use a temporary mm instead which keeps the mapping local to
the CPU doing the patching.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
space. The patching address is randomized between PAGE_SIZE and
DEFAULT_MAP_WINDOW-PAGE_SIZE.

Bits of entropy with 64K page size on BOOK3S_64:

 bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

 PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
 bits of entropy = log2(128TB / 64K)
bits of entropy = 31

The upper limit is DEFAULT_MAP_WINDOW due to how the Book3s64 Hash MMU
operates - by default the space above DEFAULT_MAP_WINDOW is not
available. Currently the Hash MMU does not use a temporary mm so
technically this upper limit isn't necessary; however, a larger
randomization range does not further "harden" this overall approach and
future work may introduce patching with a temporary mm on Hash as well.

Randomization occurs only once during initialization at boot for each
possible CPU in the system.

Introduce two new functions, map_patch() and unmap_patch(), to
respectively create and remove the temporary mapping with write
permissions at patching_addr. Map the page with PAGE_KERNEL to set
EAA[0] for the PTE which ignores the AMR (so no need to unlock/lock
KUAP) according to PowerISA v3.0b Figure 35 on Radix.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

and:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 

---

v5:  * Only support Book3s64 Radix MMU for now.
  * Use a per-cpu datastructure to hold the patching_addr and
patching_mm to avoid the need for a synchronization lock/mutex.

v4:  * In the previous series this was two separate patches: one to init
the temporary mm in poking_init() (unused in powerpc at the time)
and the other to use it for patching (which removed all the
per-cpu vmalloc code). Now that we use poking_init() in the
existing per-cpu vmalloc approach, that separation doesn't work
as nicely anymore so I just merged the two patches into one.
  * Preload the SLB entry and hash the page for the patching_addr
when using Hash on book3s64 to avoid taking an SLB and Hash fault
during patching. The previous implementation was a hack which
changed current->mm to allow the SLB and Hash fault handlers to
work with the temporary mm since both of those code-paths always
assume mm == current->mm.
  * Also (hmm - seeing a trend here) with the book3s64 Hash MMU we
have to manage the mm->context.active_cpus counter and mm cpumask
since they determine (via mm_is_thread_local()) if the TLB flush
in pte_clear() is local or not - it should always be local when
we're using the temporary mm. On book3s64's Radix MMU we can
just call local_flush_tlb_mm().
  * Use HPTE_USE_KERNEL_KEY on Hash to avoid costly lock/unlock of
KUAP.
---
  arch/powerpc/lib/code-patching.c | 132 +--
  1 file changed, 125 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 9f2eba9b70ee4..027dabd42b8dd 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -103,6 +104,7 @@ static inline void unuse_temporary_mm(struct temp_mm 
*temp_mm)
  
  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

  static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
+static DEFINE_PER_CPU(struct mm_struct *, cpu_patching_mm);
  
  #if IS_BUILTIN(CONFIG_LKDTM)

  unsigned long read_cpu_patching_addr(unsigned int cpu)
@@ -133,6 +135,51 @@ static int text_area_cpu_down(unsigned int cpu)
return 0;
  }
  
+static __always_inline void __poking_init_temp_mm(void)

+{
+   int cpu;
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+   struct mm_struct *patching_mm;
+   unsigned long patching_addr;
+
+   for_each_possible_cpu(cpu) {
+   /*
+* Some parts of the kernel (static keys for example) depend on
+* successful code patching. Code patching 

Re: [PATCH v5 6/8] powerpc: Rework and improve STRICT_KERNEL_RWX patching

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

Rework code-patching with STRICT_KERNEL_RWX to prepare for the next
patch which uses a temporary mm for patching under the Book3s64 Radix
MMU. Make improvements by adding a WARN_ON when the patchsite doesn't
match after patching and return the error from __patch_instruction()
properly.

Signed-off-by: Christopher M. Riedl 

---

v5:  * New to series.
---
  arch/powerpc/lib/code-patching.c | 51 +---
  1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 3122d8e4cc013..9f2eba9b70ee4 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -102,11 +102,12 @@ static inline void unuse_temporary_mm(struct temp_mm 
*temp_mm)
  }
  
  static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

+static DEFINE_PER_CPU(unsigned long, cpu_patching_addr);
  
  #if IS_BUILTIN(CONFIG_LKDTM)

  unsigned long read_cpu_patching_addr(unsigned int cpu)
  {
-   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+   return per_cpu(cpu_patching_addr, cpu);
  }
  #endif
  
@@ -121,6 +122,7 @@ static int text_area_cpu_up(unsigned int cpu)

return -1;
}
this_cpu_write(text_poke_area, area);
+   this_cpu_write(cpu_patching_addr, (unsigned long)area->addr);
  
  	return 0;

  }
@@ -146,7 +148,7 @@ void __init poking_init(void)
  /*
   * This can be called for kernel text or a module.
   */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch_area(void *addr)
  {
unsigned long pfn;
int err;
@@ -156,17 +158,20 @@ static int map_patch_area(void *addr, unsigned long 
text_poke_addr)
else
pfn = __pa_symbol(addr) >> PAGE_SHIFT;
  
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);

+   err = map_kernel_page(__this_cpu_read(cpu_patching_addr),
+ (pfn << PAGE_SHIFT), PAGE_KERNEL);
  
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);

+   pr_devel("Mapped addr %lx with pfn %lx:%d\n",
+__this_cpu_read(cpu_patching_addr), pfn, err);
if (err)
return -1;
  
  	return 0;

  }
  
-static inline int unmap_patch_area(unsigned long addr)

+static inline int unmap_patch_area(void)
  {
+   unsigned long addr = __this_cpu_read(cpu_patching_addr);
pte_t *ptep;
pmd_t *pmdp;
pud_t *pudp;
@@ -175,23 +180,23 @@ static inline int unmap_patch_area(unsigned long addr)
  
  	pgdp = pgd_offset_k(addr);

if (unlikely(!pgdp))
-   return -EINVAL;
+   goto out_err;
  
  	p4dp = p4d_offset(pgdp, addr);

if (unlikely(!p4dp))
-   return -EINVAL;
+   goto out_err;
  
  	pudp = pud_offset(p4dp, addr);

if (unlikely(!pudp))
-   return -EINVAL;
+   goto out_err;
  
  	pmdp = pmd_offset(pudp, addr);

if (unlikely(!pmdp))
-   return -EINVAL;
+   goto out_err;
  
  	ptep = pte_offset_kernel(pmdp, addr);

if (unlikely(!ptep))
-   return -EINVAL;
+   goto out_err;
  
  	pr_devel("clearing mm %p, pte %p, addr %lx\n", _mm, ptep, addr);
  
@@ -202,15 +207,17 @@ static inline int unmap_patch_area(unsigned long addr)

flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
  
  	return 0;

+
+out_err:
+   pr_warn("failed to unmap %lx\n", addr);
+   return -EINVAL;


Can you keep that in the caller of unmap_patch_area() instead of all that goto
stuff?
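
(i.e. keep the plain -EINVAL returns in unmap_patch_area() and let the caller warn - rough sketch:)

	err = unmap_patch_area();
	if (err)
		pr_warn("failed to unmap %lx\n", __this_cpu_read(cpu_patching_addr));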


  }
  
  static int do_patch_instruction(u32 *addr, struct ppc_inst instr)

  {
-   int err;
+   int err, rc = 0;
u32 *patch_addr = NULL;
unsigned long flags;
-   unsigned long text_poke_addr;
-   unsigned long kaddr = (unsigned long)addr;
  
  	/*

 * During early early boot patch_instruction is called
@@ -222,24 +229,20 @@ static int do_patch_instruction(u32 *addr, struct 
ppc_inst instr)
  
  	local_irq_save(flags);
  
-	text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr;

-   if (map_patch_area(addr, text_poke_addr)) {
-   err = -1;
+   err = map_patch_area(addr);
+   if (err)
goto out;
-   }
-
-   patch_addr = (u32 *)(text_poke_addr + (kaddr & ~PAGE_MASK));
  
-	__patch_instruction(addr, instr, patch_addr);

+   patch_addr = (u32 *)(__this_cpu_read(cpu_patching_addr) | 
offset_in_page(addr));
+   rc = __patch_instruction(addr, instr, patch_addr);
  
-	err = unmap_patch_area(text_poke_addr);

-   if (err)
-   pr_warn("failed to unmap %lx\n", text_poke_addr);
+   err = unmap_patch_area();
  
  out:

local_irq_restore(flags);
+   WARN_ON(!ppc_inst_equal(ppc_inst_read(addr), instr));


Why add that WARN_ON()? What could make that happen that 

Re: [PATCH v15 7/9] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX

2021-08-05 Thread Laurent Vivier
Hi,

On 09/06/2021 03:34, Jordan Niethe wrote:
> From: Russell Currey 
> 
> To enable strict module RWX on powerpc, set:
> 
> CONFIG_STRICT_MODULE_RWX=y
> 
> You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
> security benefit.
> 
> ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
> This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
> makes STRICT_MODULE_RWX *on by default* in configurations where
> STRICT_KERNEL_RWX is *unavailable*.
> 
> Since this doesn't make much sense, and module RWX without kernel RWX
> doesn't make much sense, having the same dependencies as kernel RWX
> works around this problem.
> 
> Book3s/32 603 and 604 core processors are not able to write protect
> kernel pages so do not set ARCH_HAS_STRICT_MODULE_RWX for Book3s/32.
> 
> Reviewed-by: Christophe Leroy 
> Signed-off-by: Russell Currey 
> [jpn: - predicate on !PPC_BOOK3S_604
>   - make module_alloc() use PAGE_KERNEL protection]
> Signed-off-by: Jordan Niethe 
> ---
> v10: - Predicate on !PPC_BOOK3S_604
>  - Make module_alloc() use PAGE_KERNEL protection
> v11: - Neaten up
> v13: Use strict_kernel_rwx_enabled()
> v14: Make changes to module_alloc() its own commit
> v15: - Force STRICT_KERNEL_RWX if STRICT_MODULE_RWX is selected
>  - Predicate on !PPC_BOOK3S_32 instead
> ---
>  arch/powerpc/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index abfe2e9225fa..72f307f1796b 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -142,6 +142,7 @@ config PPC
>   select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
> && PPC_BOOK3S_64
>   select ARCH_HAS_SET_MEMORY
>   select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
> !HIBERNATION)
> + select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX 
> && !PPC_BOOK3S_32
>   select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
>   select ARCH_HAS_UACCESS_FLUSHCACHE
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
> @@ -267,6 +268,7 @@ config PPC
>   select PPC_DAWR if PPC64
>   select RTC_LIB
>   select SPARSE_IRQ
> + select STRICT_KERNEL_RWX if STRICT_MODULE_RWX
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
>   select VIRT_TO_BUS  if !PPC64
> 

Since this patch was merged, my VM has been experiencing a crash at boot (20% of the 
time):

[8.496850] kernel tried to execute exec-protected page (c00804073278) - 
exploit
attempt? (uid: 0)
[8.496921] BUG: Unable to handle kernel instruction fetch
[8.496954] Faulting instruction address: 0xc00804073278
[8.496994] Oops: Kernel access of bad area, sig: 11 [#1]
[8.497028] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[8.497071] Modules linked in: drm virtio_console fuse 
drm_panel_orientation_quirks xfs
libcrc32c virtio_net net_failover virtio_blk vmx_crypto failover dm_mirror 
dm_region_hash
dm_log dm_mod
[8.497186] CPU: 3 PID: 44 Comm: kworker/3:1 Not tainted 5.14.0-rc4+ #12
[8.497228] Workqueue: events control_work_handler [virtio_console]
[8.497272] NIP:  c00804073278 LR: c00804073278 CTR: c01e9de0
[8.497320] REGS: c0002e4ef7e0 TRAP: 0400   Not tainted  (5.14.0-rc4+)
[8.497361] MSR:  80004280b033   CR: 
24002822
XER: 200400cf
[8.497426] CFAR: c01e9e44 IRQMASK: 1
[8.497426] GPR00: c00804073278 c0002e4efa80 c2a26b00 
c00042c39520
[8.497426] GPR04: 0001   
00ff
[8.497426] GPR08: 0001 c00042c39520 0001 
c00804076008
[8.497426] GPR12: c01e9de0 c001fffccb00 c018ba88 
c0002c91d400
[8.497426] GPR16:    

[8.497426] GPR20:    
c00804080340
[8.497426] GPR24: c008040a01e8   
c0002e0975c0
[8.497426] GPR28: c0002ce72940 c00042c39520 0048 
0038
[8.497891] NIP [c00804073278] fill_queue+0xf0/0x210 [virtio_console]
[8.497934] LR [c00804073278] fill_queue+0xf0/0x210 [virtio_console]
[8.497976] Call Trace:
[8.497993] [c0002e4efa80] [c0080407323c] fill_queue+0xb4/0x210
[virtio_console] (unreliable)
[8.498052] [c0002e4efae0] [c00804073a90] add_port+0x1a8/0x470 
[virtio_console]
[8.498102] [c0002e4efbb0] [c008040750f4] 
control_work_handler+0xbc/0x1e8
[virtio_console]
[8.498160] [c0002e4efc60] [c017f4f0] 
process_one_work+0x290/0x590
[8.498212] [c0002e4efd00] [c017f878] worker_thread+0x88/0x620
[8.498256] [c0002e4efda0] [c018bc14] kthread+0x194/0x1a0
[8.498299] [c0002e4efe10] 

Re: [PATCH v5 5/8] powerpc/64s: Introduce temporary mm for Radix MMU

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. Another benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.


Can you explain more about that breakpoint stuff? Why is it a special case here at all? Isn't it 
the same when you switch from one user task to another? The x86 commit doesn't say anything about 
breakpoints.




Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 

---

v5:  * Drop support for using a temporary mm on Book3s64 Hash MMU.

v4:  * Pass the prev mm instead of NULL to switch_mm_irqs_off() when
using/unusing the temp mm as suggested by Jann Horn to keep
the context.active counter in-sync on mm/nohash.
  * Disable SLB preload in the temporary mm when initializing the
temp_mm struct.
  * Include asm/debug.h header to fix build issue with
ppc44x_defconfig.
---
  arch/powerpc/include/asm/debug.h |  1 +
  arch/powerpc/kernel/process.c|  5 +++
  arch/powerpc/lib/code-patching.c | 56 
  3 files changed, 62 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 86a14736c76c3..dfd82635ea8b3 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -46,6 +46,7 @@ static inline int debugger_fault_handler(struct pt_regs 
*regs) { return 0; }
  #endif
  
  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk);

+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk);
  bool ppc_breakpoint_available(void);
  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
  extern void do_send_trap(struct pt_regs *regs, unsigned long address,
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 185beb2905801..a0776200772e8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -865,6 +865,11 @@ static inline int set_breakpoint_8xx(struct 
arch_hw_breakpoint *brk)
return 0;
  }
  
+void __get_breakpoint(int nr, struct arch_hw_breakpoint *brk)

+{
+   memcpy(brk, this_cpu_ptr(&current_brk[nr]), sizeof(*brk));
+}
+
  void __set_breakpoint(int nr, struct arch_hw_breakpoint *brk)
  {
memcpy(this_cpu_ptr(&current_brk[nr]), brk, sizeof(*brk));
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 54b6157d44e95..3122d8e4cc013 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -17,6 +17,9 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
  
  static int __patch_instruction(u32 *exec_addr, struct ppc_inst instr, u32 *patch_addr)

  {
@@ -45,6 +48,59 @@ int raw_patch_instruction(u32 *addr, struct ppc_inst instr)
  }
  
  #ifdef CONFIG_STRICT_KERNEL_RWX

+
+struct temp_mm {
+   struct mm_struct *temp;
+   struct mm_struct *prev;
+   struct arch_hw_breakpoint brk[HBP_NUM_MAX];
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+   /* We currently only support temporary mm on the Book3s64 Radix MMU */
+   WARN_ON(!radix_enabled());
+
+   temp_mm->temp = mm;
+   temp_mm->prev = NULL;
+   memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   temp_mm->prev = current->active_mm;
+   switch_mm_irqs_off(temp_mm->prev, temp_mm->temp, current);
+
+   WARN_ON(!mm_is_thread_local(temp_mm->temp));
+
+   if (ppc_breakpoint_available()) {
+   struct arch_hw_breakpoint null_brk = {0};
+   int i = 0;
+
+   for (; i < nr_wp_slots(); ++i) {
+   __get_breakpoint(i, &temp_mm->brk[i]);
+   if (temp_mm->brk[i].type != 0)
+   __set_breakpoint(i, &null_brk);
+   }
+   }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)


not sure about the naming.

Maybe start_using_temp_mm() and stop_using_temp_mm() would be more explicit.



+{
+   lockdep_assert_irqs_disabled();
+
+   switch_mm_irqs_off(temp_mm->temp, temp_mm->prev, current);
+
+   if 

[PATCH v2 3/3] powerpc/mce: Modify the real address error logging messages

2021-08-05 Thread Ganesh Goudar
To avoid ambiguity, modify the strings in the real address error
logging messages from "foreign" to "foreign/control memory",
since the error descriptions in the P9 and P10 user manuals
differ for the same type of errors.

P9 User Manual for MCE:
DSISR:59 Host real address to foreign space during translation.
DSISR:60 Host real address to foreign space on a load or store
 access.

P10 User Manual for MCE:
DSISR:59 D-side tablewalk used a host real address in the
 control memory address range.
DSISR:60 D-side operand access to control memory address space.

Signed-off-by: Ganesh Goudar 
---
v2: No changes in this patch.
---
 arch/powerpc/kernel/mce.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 47a683cd00d2..f3ef480bb739 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -388,14 +388,14 @@ void machine_check_print_event_info(struct 
machine_check_event *evt,
static const char *mc_ra_types[] = {
"Indeterminate",
"Instruction fetch (bad)",
-   "Instruction fetch (foreign)",
+   "Instruction fetch (foreign/control memory)",
"Page table walk ifetch (bad)",
-   "Page table walk ifetch (foreign)",
+   "Page table walk ifetch (foreign/control memory)",
"Load (bad)",
"Store (bad)",
"Page table walk Load/Store (bad)",
-   "Page table walk Load/Store (foreign)",
-   "Load/Store (foreign)",
+   "Page table walk Load/Store (foreign/control memory)",
+   "Load/Store (foreign/control memory)",
};
static const char *mc_link_types[] = {
"Indeterminate",
-- 
2.31.1



[PATCH v2 2/3] selftests/powerpc: Add test for real address error handling

2021-08-05 Thread Ganesh Goudar
Add a test for real address or control memory address access
error handling, using the NX-GZIP engine.

The error is injected by accessing the control memory address with
an illegal instruction. On successful handling, the process that
attempted the access receives SIGBUS.

Signed-off-by: Ganesh Goudar 
---
v2: Fix build error.
---
 tools/testing/selftests/powerpc/Makefile  |  3 +-
 tools/testing/selftests/powerpc/mce/Makefile  |  6 +++
 .../selftests/powerpc/mce/inject-ra-err.c | 42 +++
 .../selftests/powerpc/mce/inject-ra-err.sh| 18 
 tools/testing/selftests/powerpc/mce/vas-api.h |  1 +
 5 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/mce/Makefile
 create mode 100644 tools/testing/selftests/powerpc/mce/inject-ra-err.c
 create mode 100755 tools/testing/selftests/powerpc/mce/inject-ra-err.sh
 create mode 120000 tools/testing/selftests/powerpc/mce/vas-api.h

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 0830e63818c1..4830372d7416 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -31,7 +31,8 @@ SUB_DIRS = alignment  \
   vphn \
   math \
   ptrace   \
-  security
+  security \
+  mce
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/mce/Makefile 
b/tools/testing/selftests/powerpc/mce/Makefile
new file mode 100644
index ..0f537ce86370
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/Makefile
@@ -0,0 +1,6 @@
+#SPDX-License-Identifier: GPL-2.0-or-later
+
+TEST_PROGS := inject-ra-err.sh
+TEST_GEN_FILES := inject-ra-err
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/mce/inject-ra-err.c 
b/tools/testing/selftests/powerpc/mce/inject-ra-err.c
new file mode 100644
index ..05ab11cec3da
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/inject-ra-err.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "vas-api.h"
+
+int main(void)
+{
+   int fd, ret;
+   int *paste_addr;
+   struct vas_tx_win_open_attr attr;
+   char *devname = "/dev/crypto/nx-gzip";
+
+   memset(&attr, 0, sizeof(attr));
+   attr.version = 1;
+   attr.vas_id = 0;
+
+   fd = open(devname, O_RDWR);
+   if (fd < 0) {
+   fprintf(stderr, "Failed to open device %s\n", devname);
+   return -errno;
+   }
+   ret = ioctl(fd, VAS_TX_WIN_OPEN, &attr);
+   if (ret < 0) {
+   fprintf(stderr, "ioctl() n %d, error %d\n", ret, errno);
+   ret = -errno;
+   goto out;
+   }
+   paste_addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 
0ULL);
+   /* The following assignment triggers exception */
+   *paste_addr = 1;
+   ret = 0;
+out:
+   close(fd);
+   return ret;
+}
diff --git a/tools/testing/selftests/powerpc/mce/inject-ra-err.sh 
b/tools/testing/selftests/powerpc/mce/inject-ra-err.sh
new file mode 100755
index ..3633cdc651a1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/inject-ra-err.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+if [[ ! -w /dev/crypto/nx-gzip ]]; then
+   echo "WARN: Can't access /dev/crypto/nx-gzip, skipping"
+   exit 0
+fi
+
+timeout 5 ./inject-ra-err
+
+# 128 + 7 (SIGBUS) = 135; 128 is an exit code with special meaning.
+if [ $? -ne 135 ]; then
+   echo "FAILED: Real address or Control memory access error not handled"
+   exit $?
+fi
+
+echo "OK: Real address or Control memory access error is handled"
+exit 0
diff --git a/tools/testing/selftests/powerpc/mce/vas-api.h 
b/tools/testing/selftests/powerpc/mce/vas-api.h
new file mode 120000
index ..1455c1bcd351
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mce/vas-api.h
@@ -0,0 +1 @@
+../../../../../arch/powerpc/include/uapi/asm/vas-api.h
\ No newline at end of file
-- 
2.31.1



[PATCH v2 1/3] powerpc/pseries: Parse control memory access error

2021-08-05 Thread Ganesh Goudar
Add support to parse and log control memory access
error for pseries.

Signed-off-by: Ganesh Goudar 
---
v2: No changes in this patch.
---
 arch/powerpc/platforms/pseries/ras.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index 167f2e1b8d39..608c35cad0c3 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -80,6 +80,7 @@ struct pseries_mc_errorlog {
 #define MC_ERROR_TYPE_TLB  0x04
 #define MC_ERROR_TYPE_D_CACHE  0x05
 #define MC_ERROR_TYPE_I_CACHE  0x07
+#define MC_ERROR_TYPE_CTRL_MEM_ACCESS  0x08
 
 /* RTAS pseries MCE error sub types */
 #define MC_ERROR_UE_INDETERMINATE  0
@@ -103,6 +104,9 @@ struct pseries_mc_errorlog {
 #define MC_ERROR_TLB_MULTIHIT  2
 #define MC_ERROR_TLB_INDETERMINATE 3
 
+#define MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK   0
+#define MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS 1
+
 static inline u8 rtas_mc_error_sub_type(const struct pseries_mc_errorlog *mlog)
 {
switch (mlog->error_type) {
@@ -112,6 +116,8 @@ static inline u8 rtas_mc_error_sub_type(const struct 
pseries_mc_errorlog *mlog)
case MC_ERROR_TYPE_ERAT:
case MC_ERROR_TYPE_TLB:
return (mlog->sub_err_type & 0x03);
+   case MC_ERROR_TYPE_CTRL_MEM_ACCESS:
+   return (mlog->sub_err_type & 0x70) >> 4;
default:
return 0;
}
@@ -699,6 +705,21 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
case MC_ERROR_TYPE_I_CACHE:
mce_err.error_type = MCE_ERROR_TYPE_ICACHE;
break;
+   case MC_ERROR_TYPE_CTRL_MEM_ACCESS:
+   mce_err.error_type = MCE_ERROR_TYPE_RA;
+   if (mce_log->sub_err_type & 0x80)
+   eaddr = be64_to_cpu(mce_log->effective_address);
+   switch (err_sub_type) {
+   case MC_ERROR_CTRL_MEM_ACCESS_PTABLE_WALK:
+   mce_err.u.ra_error_type =
+   MCE_RA_ERROR_PAGE_TABLE_WALK_LOAD_STORE_FOREIGN;
+   break;
+   case MC_ERROR_CTRL_MEM_ACCESS_OP_ACCESS:
+   mce_err.u.ra_error_type =
+   MCE_RA_ERROR_LOAD_STORE_FOREIGN;
+   break;
+   }
+   break;
case MC_ERROR_TYPE_UNKNOWN:
default:
mce_err.error_type = MCE_ERROR_TYPE_UNKNOWN;
-- 
2.31.1



Re: [PATCH v5 8/8] lkdtm/powerpc: Fix code patching hijack test

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

Code patching on powerpc with a STRICT_KERNEL_RWX uses a userspace
address in a temporary mm on Radix now. Use __put_user() to avoid write
failures due to KUAP when attempting a "hijack" on the patching address.
__put_user() also works with the non-userspace, vmalloc-based patching
address on non-Radix MMUs.


It is not really clean to use __put_user() on a non-user address, although it 
works by chance.

I think it would be better to do something like

if (is_kernel_addr(addr))
copy_to_kernel_nofault(...);
else
copy_to_user_nofault(...);
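
A minimal sketch of that idea, keeping the lkdtm_try_write() shape used in the patch (this assumes
powerpc's is_kernel_addr() helper and linux/uaccess.h are available):

static inline bool lkdtm_try_write(u32 data, u32 *addr)
{
	if (is_kernel_addr((unsigned long)addr))
		return !copy_to_kernel_nofault(addr, &data, sizeof(data));

	return !copy_to_user_nofault((void __user *)addr, &data, sizeof(data));
}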





Signed-off-by: Christopher M. Riedl 
---
  drivers/misc/lkdtm/perms.c | 9 -
  1 file changed, 9 deletions(-)

diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 41e87e5f9cc86..da6a34a0a49fb 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -262,16 +262,7 @@ static inline u32 lkdtm_read_patch_site(void)
  /* Returns True if the write succeeds */
  static inline bool lkdtm_try_write(u32 data, u32 *addr)
  {
-#ifdef CONFIG_PPC
-   __put_kernel_nofault(addr, &data, u32, err);
-   return true;
-
-err:
-   return false;
-#endif
-#ifdef CONFIG_X86_64
return !__put_user(data, addr);
-#endif
  }
  
  static int lkdtm_patching_cpu(void *data)




Re: [PATCH 1/3] arch: Export machine_restart() instances so they can be called from modules

2021-08-05 Thread Thomas Bogendoerfer
On Thu, Aug 05, 2021 at 08:50:30AM +0100, Lee Jones wrote:
> A recent attempt to convert the Power Reset Restart driver to tristate
> failed because of the following compile error (reported once merged by
> Stephen Rothwell via Linux Next):
> 
>   ERROR: "machine_restart" [drivers/power/reset/restart-poweroff.ko] 
> undefined!
> 
> This error occurs since some of the machine_restart() instances are
> not currently exported for use in modules.  This patch aims to rectify
> that.
> 
> Cc: Vineet Gupta 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Guo Ren 
> Cc: Yoshinori Sato 
> Cc: Brian Cain 
> Cc: Geert Uytterhoeven 
> Cc: Michal Simek 
> Cc: Thomas Bogendoerfer 
> Cc: John Crispin 
> Cc: Ley Foon Tan 
> Cc: Jonas Bonn 
> Cc: Stefan Kristiansson 
> Cc: Stafford Horne 
> Cc: James E.J. Bottomley 
> Cc: Helge Deller 
> Cc: Michael Ellerman 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Cc: Heiko Carstens 
> Cc: Vasily Gorbik 
> Cc: Christian Borntraeger 
> Cc: Rich Felker 
> Cc: David S. Miller 
> Cc: Jeff Dike 
> Cc: Richard Weinberger 
> Cc: Anton Ivanov 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Chris Zankel 
> Cc: Max Filippov 
> Cc: Greg Kroah-Hartman 
> Cc: Sebastian Reichel 
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-c...@vger.kernel.org
> Cc: uclinux-h8-de...@lists.sourceforge.jp
> Cc: linux-hexa...@vger.kernel.org
> Cc: linux-m...@lists.linux-m68k.org
> Cc: linux-m...@vger.kernel.org
> Cc: openr...@lists.librecores.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@lists.infradead.org
> Cc: linux-xte...@linux-xtensa.org
> Signed-off-by: Lee Jones 
> ---
> 
> The 2 patches this change supports have the required Acks already.
> 
> NB: If it's safe to omit some of these, let me know and I'll revise the patch.
> 
>  [...]
>  arch/mips/kernel/reset.c   | 1 +
>  arch/mips/lantiq/falcon/reset.c| 1 +
>  arch/mips/sgi-ip27/ip27-reset.c| 1 +

Acked-by: Thomas Bogendoerfer 

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]


Re: [PATCH v2] arch: vdso: remove if-conditionals of $(c-gettimeofday-y)

2021-08-05 Thread Thomas Bogendoerfer
On Sat, Jul 31, 2021 at 03:00:20PM +0900, Masahiro Yamada wrote:
> arm, arm64, csky, mips, powerpc always select GENERIC_GETTIMEOFDAY,
> hence $(gettimeofday-y) never becomes empty.
> 
> riscv conditionally selects GENERIC_GETTIMEOFDAY when MMU=y && 64BIT=y,
> but arch/riscv/kernel/vdso/vgettimeofday.o is built only under that
> condition. So, you can always define CFLAGS_vgettimeofday.o
> 
> Remove all the meaningless conditionals.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
> Changes in v2:
>   - Fix csky as well
> 
>  [..]
>  arch/mips/vdso/Makefile |  2 --

Acked-by: Thomas Bogendoerfer 

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]


Re: [PATCH v5 2/8] lkdtm/powerpc: Add test to hijack a patch mapping

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

When live patching with STRICT_KERNEL_RWX the CPU doing the patching
must temporarily remap the page(s) containing the patch site with +W
permissions. While this temporary mapping is in use, another CPU could
write to the same mapping and maliciously alter kernel text. Implement a
LKDTM test to attempt to exploit such an opening during code patching.
The test is implemented on powerpc and requires LKDTM built into the
kernel (building LKDTM as a module is insufficient).

The LKDTM "hijack" test works as follows:

   1. A CPU executes an infinite loop to patch an instruction. This is
  the "patching" CPU.
   2. Another CPU attempts to write to the address of the temporary
  mapping used by the "patching" CPU. This other CPU is the
  "hijacker" CPU. The hijack either fails with a fault/error or
  succeeds, in which case some kernel text is now overwritten.

The virtual address of the temporary patch mapping is provided via an
LKDTM-specific accessor to the hijacker CPU. This test assumes a
hypothetical situation where this address was leaked previously.

How to run the test:

mount -t debugfs none /sys/kernel/debug
(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)

A passing test indicates that it is not possible to overwrite kernel
text from another CPU by using the temporary mapping established by
a CPU for patching.

Signed-off-by: Christopher M. Riedl 

---

v5:  * Use `u32*` instead of `struct ppc_inst*` based on new series in
upstream.

v4:  * Separate the powerpc and x86_64 bits into individual patches.
  * Use __put_kernel_nofault() when attempting to hijack the mapping
  * Use raw_smp_processor_id() to avoid triggering the BUG() when
calling smp_processor_id() in preemptible code - the only thing
that matters is that one of the threads is bound to a different
CPU - we are not using smp_processor_id() to access any per-cpu
data or similar where preemption should be disabled.
  * Rework the patching_cpu() kthread stop condition to avoid:
https://lwn.net/Articles/628628/
---
  drivers/misc/lkdtm/core.c  |   1 +
  drivers/misc/lkdtm/lkdtm.h |   1 +
  drivers/misc/lkdtm/perms.c | 134 +
  3 files changed, 136 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 8024b6a5cc7fc..fbcb95eda337b 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -147,6 +147,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+   CRASHTYPE(HIJACK_PATCH),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 99f90d3e5e9cb..87e7e6136d962 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void);
  void lkdtm_EXEC_NULL(void);
  void lkdtm_ACCESS_USERSPACE(void);
  void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
  
  /* refcount.c */

  void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 2dede2ef658f3..39e7456852229 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  /* Whether or not to fill the target memory area with do_nothing(). */

@@ -222,6 +223,139 @@ void lkdtm_ACCESS_NULL(void)
pr_err("FAIL: survived bad write\n");
  }
  
+#if (IS_BUILTIN(CONFIG_LKDTM) && defined(CONFIG_STRICT_KERNEL_RWX) && \

+   defined(CONFIG_PPC))



I think this test shouldn't be limited to CONFIG_PPC and shouldn't be limited to 
CONFIG_STRICT_KERNEL_RWX. It should be there all the time.


Also, why limit it to IS_BUILTIN(CONFIG_LKDTM)?


+/*
+ * This is just a dummy location to patch-over.
+ */
+static void patching_target(void)
+{
+   return;
+}
+
+#include 
+const u32 *patch_site = (const u32 *)&patching_target;
+
+static inline int lkdtm_do_patch(u32 data)
+{
+   return patch_instruction((u32 *)patch_site, ppc_inst(data));
+}
+
+static inline u32 lkdtm_read_patch_site(void)
+{
+   return READ_ONCE(*patch_site);
+}
+
+/* Returns True if the write succeeds */
+static inline bool lkdtm_try_write(u32 data, u32 *addr)
+{
+   __put_kernel_nofault(addr, &data, u32, err);
+   return true;
+
+err:
+   return false;
+}
+
+static int lkdtm_patching_cpu(void *data)
+{
+   int err = 0;
+   u32 val = 0xdeadbeef;
+
+   pr_info("starting patching_cpu=%d\n", raw_smp_processor_id());
+
+   do {
+   err = lkdtm_do_patch(val);
+   } while (lkdtm_read_patch_site() == val && !err && 
!kthread_should_stop());
+
+   if (err)
+   pr_warn("XFAIL: 

Re: [PATCH v5 4/8] lkdtm/x86_64: Add test to hijack a patch mapping

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

A previous commit implemented an LKDTM test on powerpc to exploit the
temporary mapping established when patching code with STRICT_KERNEL_RWX
enabled. Extend the test to work on x86_64 as well.

Signed-off-by: Christopher M. Riedl 
---
  drivers/misc/lkdtm/perms.c | 26 ++
  1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 39e7456852229..41e87e5f9cc86 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -224,7 +224,7 @@ void lkdtm_ACCESS_NULL(void)
  }
  
  #if (IS_BUILTIN(CONFIG_LKDTM) && defined(CONFIG_STRICT_KERNEL_RWX) && \

-   defined(CONFIG_PPC))
+   (defined(CONFIG_PPC) || defined(CONFIG_X86_64)))
  /*
   * This is just a dummy location to patch-over.
   */
@@ -233,12 +233,25 @@ static void patching_target(void)
return;
  }
  
-#include 

const u32 *patch_site = (const u32 *)&patching_target;
  
+#ifdef CONFIG_PPC

+#include 
+#endif
+
+#ifdef CONFIG_X86_64
+#include 
+#endif
+
  static inline int lkdtm_do_patch(u32 data)
  {
+#ifdef CONFIG_PPC
return patch_instruction((u32 *)patch_site, ppc_inst(data));
+#endif
+#ifdef CONFIG_X86_64
+   text_poke((void *)patch_site, &data, sizeof(u32));
+   return 0;
+#endif
  }
  
  static inline u32 lkdtm_read_patch_site(void)

@@ -249,11 +262,16 @@ static inline u32 lkdtm_read_patch_site(void)
  /* Returns True if the write succeeds */
  static inline bool lkdtm_try_write(u32 data, u32 *addr)
  {
+#ifdef CONFIG_PPC
__put_kernel_nofault(addr, &data, u32, err);
return true;
  
  err:

return false;
+#endif
+#ifdef CONFIG_X86_64
+   return !__put_user(data, addr);
+#endif
  }
  
  static int lkdtm_patching_cpu(void *data)

@@ -346,8 +364,8 @@ void lkdtm_HIJACK_PATCH(void)
  
  void lkdtm_HIJACK_PATCH(void)

  {
-   if (!IS_ENABLED(CONFIG_PPC))
-   pr_err("XFAIL: this test only runs on powerpc\n");
+   if (!IS_ENABLED(CONFIG_PPC) && !IS_ENABLED(CONFIG_X86_64))
+   pr_err("XFAIL: this test only runs on powerpc and x86_64\n");
if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
pr_err("XFAIL: this test requires CONFIG_STRICT_KERNEL_RWX\n");
if (!IS_BUILTIN(CONFIG_LKDTM))



Instead of spreading arch-specific stuff into LKDTM, wouldn't it make sense to define a common 
API? Because the day another arch like arm64 implements its own approach, do we keep adding 
specific functions into LKDTM again and again?
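
(One possible shape of such a hook, purely illustrative - the name and the weak-default split are
made up here, not an existing LKDTM interface:)

/* LKDTM core provides a default; an arch that supports the test overrides it */
bool __weak lkdtm_arch_try_hijack_write(u32 data, u32 *addr)
{
	return false;	/* no writable patching window to attack on this arch */
}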


Also, I find it odd to define tests only when they can succeed. Other tests like 
ACCESS_USERSPACE are there all the time, regardless of whether we have selected 
CONFIG_PPC_KUAP or not. I think it should be the same here: have it there all the time; if 
CONFIG_STRICT_KERNEL_RWX is selected the test succeeds, otherwise it fails, but it is always there.


Christophe


Re: [PATCH v5 0/8] Use per-CPU temporary mappings for patching on Radix MMU

2021-08-05 Thread Christophe Leroy




On 13/07/2021 at 07:31, Christopher M. Riedl wrote:

When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

1. Map page in per-CPU VM area w/ PAGE_KERNEL protection
2. Patch text
3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either malicously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching [0].

x86 introduced "temporary mm" structs which allow the creation of mappings
local to a particular CPU [1]. This series intends to bring the notion of a
temporary mm to powerpc's Book3s64 Radix MMU and harden it by using such a
mapping for patching a kernel with strict RWX permissions.

The first four patches implement an LKDTM test "proof-of-concept" which
exploits the potential vulnerability (ie. the temporary mapping during patching
is exposed in the kernel page tables and accessible by other CPUs) using a
simple brute-force approach. This test is implemented for both powerpc and
x86_64. The test passes on powerpc Radix with this new series, fails on
upstream powerpc, passes on upstream x86_64, and fails on an older (ancient)
x86_64 tree without the x86_64 temporary mm patches. The remaining patches add
support for and use a temporary mm for code patching on powerpc with the Radix
MMU.


I think the first four patches (together with the last one) are quite independent from the heart of the 
series itself, which is patches 5, 6, 7. Maybe you should split that series into two series? After all, 
those selftests are nice to have but are not absolutely necessary; that would help getting this forward, I 
think.




Tested boot, ftrace, and repeated LKDTM "hijack":
- QEMU+KVM (host: POWER9 Blackbird): Radix MMU w/ KUAP
- QEMU+KVM (host: POWER9 Blackbird): Hash MMU

Tested repeated LKDTM "hijack":
- QEMU+KVM (host: AMD desktop): x86_64 upstream
- QEMU+KVM (host: AMD desktop): x86_64 w/o percpu temp mm to
  verify the LKDTM "hijack" test fails

Tested boot and ftrace:
- QEMU+TCG: ppc44x (bamboo)
- QEMU+TCG: g5 (mac99)

I also tested with various extra config options enabled as suggested in
section 12) in Documentation/process/submit-checklist.rst.

v5: * Only support Book3s64 Radix MMU for now. There are some issues with
  the previous implementation on the Hash MMU as pointed out by Nick
  Piggin. Fixing these is not trivial so we only support the Radix MMU
  for now. I tried using realmode (no data translation) to patch with
  Hash to at least avoid exposing the patch mapping to other CPUs but
  this doesn't reliably work either since we cannot access vmalloc'd
  space in realmode.


So you now accept having two different modes depending on the platform?
As far as I remember, I commented some time ago that non-SMP didn't need that feature and you were 
reluctant to have two different implementations. What made you change your mind? (Just curious.)




* Use percpu variables for the patching_mm and patching_addr. This
  avoids the need for synchronization mechanisms entirely. Thanks to
  Peter Zijlstra for pointing out text_mutex which unfortunately didn't
  work out without larger code changes in powerpc. Also thanks to Daniel
  Axtens for comments about using percpu variables for the *percpu* temp
  mm things off list.
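
i.e., roughly the following (a sketch; the actual names in the series may differ):

/* One temporary mm and one patching address per CPU, so no cross-CPU
 * synchronization is needed beyond running with interrupts disabled. */
static DEFINE_PER_CPU(struct mm_struct *, patching_mm);
static DEFINE_PER_CPU(unsigned long, patching_addr);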

v4: * It's time to revisit this series again since @jpn and @mpe fixed
  our known STRICT_*_RWX bugs on powerpc/64s.
* Rebase on linuxppc/next:
   commit ee1bc694fbaec ("powerpc/kvm: Fix build error when 
PPC_MEM_KEYS/PPC_PSERIES=n")
* Completely rework how map_patch() works on book3s64 Hash MMU
* Split the LKDTM x86_64 and powerpc bits into separate patches
* Annotate commit messages with changes from v3 instead of
  listing them here completely out of context...

v3: * Rebase on linuxppc/next: commit 9123e3a74ec7 ("Linux 5.9-rc1")
* Move temporary mm implementation into code-patching.c where it
  belongs
* Implement LKDTM hijacker test on x86_64 (on IBM time oof)
* Do not use address zero for the patching address in the
  temporary mm (thanks @dja for pointing this out!)
* Wrap the LKDTM test w/ CONFIG_SMP as suggested by Christophe
  Leroy
* Comments to clarify PTE pre-allocation and patching addr
  selection

v2: * Rebase on linuxppc/next:
  commit 105fb38124a4 ("powerpc/8xx: Modify ptep_get()")

Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Xianting Tian



在 2021/8/5 下午4:18, Greg KH 写道:

On Thu, Aug 05, 2021 at 04:08:46PM +0800, Xianting Tian wrote:

在 2021/8/5 下午3:58, Jiri Slaby 写道:

Hi,

On 04. 08. 21, 4:54, Xianting Tian wrote:

@@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno,
int data,
   hp->outbuf_size = outbuf_size;
   hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
   +    /*
+ * hvc_con_outbuf is guaranteed to be aligned at least to the
+ * size(N_OUTBUF) by kmalloc().
+ */
+    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+    if (!hp->hvc_con_outbuf)
+    return ERR_PTR(-ENOMEM);

This leaks hp, right?

BTW your 2 patches are still not threaded, that is hard to follow.

yes, thanks, I found the bug, I am preparing to do this in v4.

This is the first time I am sending a patch series (more than one patch). I checked the method
for sending a series on LKML.org; I should send a '0/2' cover letter with the history info
for the series.

Please use 'git send-email' to send the full series all at once,
otherwise it is hard to make the emails threaded "by hand" if you do not
do so.

I got it, thanks for your guide:)


thanks,

greg k-h


Re: [PATCH 3/3] powerpc/bpf: Reallocate BPF registers to volatile registers when possible on PPC64

2021-08-05 Thread Christophe Leroy




Le 27/07/2021 à 08:55, Jordan Niethe a écrit :

Implement commit 40272035e1d0 ("powerpc/bpf: Reallocate BPF registers to
volatile registers when possible on PPC32") for PPC64.

When the BPF routine doesn't call any function, the non-volatile
registers can be reallocated to volatile registers in order to avoid
having to save/restore them on the stack. To keep track of which
registers can be reallocated, make sure registers are marked as seen
when they are used.


Maybe you could do as on PPC32 and try to use r0 as much as possible instead of the TMP regs.
r0 needs to be used carefully because for some instructions (e.g. addi, lwz, etc.) r0 means 0 instead 
of register 0, but it would help free one more register in several cases.
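
For illustration, assuming hypothetical JIT register numbers dst_reg/src_reg (not from the patch):

/* D-form instructions: an RA field of 0 encodes the literal value 0, not GPR0 */
EMIT(PPC_RAW_ADDI(dst_reg, 0, 66));	/* assembles as "li dst_reg,66" */
EMIT(PPC_RAW_LWZ(dst_reg, 0, 16));	/* loads from absolute address 16, not 16(r0) */

/* X-form instructions, by contrast, really do read GPR0 */
EMIT(PPC_RAW_ADD(dst_reg, 0, src_reg));	/* dst_reg = r0 + src_reg */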




Before this patch, the test #359 ADD default X is:
0:   nop
4:   nop
8:   std r27,-40(r1)
c:   std r28,-32(r1)
   10:   xor r8,r8,r8
   14:   rotlwi  r8,r8,0
   18:   xor r28,r28,r28
   1c:   rotlwi  r28,r28,0
   20:   mr  r27,r3
   24:   li  r8,66
   28:   add r8,r8,r28
   2c:   rotlwi  r8,r8,0
   30:   ld  r27,-40(r1)
   34:   ld  r28,-32(r1)
   38:   mr  r3,r8
   3c:   blr

After this patch, the same test has become:
0:   nop
4:   nop
8:   xor r8,r8,r8
c:   rotlwi  r8,r8,0
   10:   xor r5,r5,r5
   14:   rotlwi  r5,r5,0
   18:   mr  r4,r3
   1c:   li  r8,66
   20:   add r8,r8,r5
   24:   rotlwi  r8,r8,0
   28:   mr  r3,r8
   2c:   blr

Signed-off-by: Jordan Niethe 
---
  arch/powerpc/net/bpf_jit64.h  |  2 ++
  arch/powerpc/net/bpf_jit_comp64.c | 60 +--
  2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 89b625d9342b..e20521bf77bf 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -70,6 +70,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
   */
  #define PPC_BPF_LL(ctx, r, base, i) do {				      \
	if ((i) % 4) {							      \
+		bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));      \
		EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));	      \
		EMIT(PPC_RAW_LDX(r, base, bpf_to_ppc(ctx, TMP_REG_2)));      \
@@ -78,6 +79,7 @@ const int b2p[MAX_BPF_JIT_REG + 2] = {
	} while(0)
  #define PPC_BPF_STL(ctx, r, base, i) do {				      \
	if ((i) % 4) {							      \
+		bpf_set_seen_register(ctx, bpf_to_ppc(ctx, TMP_REG_2));      \
		EMIT(PPC_RAW_LI(bpf_to_ppc(ctx, TMP_REG_2), (i)));	      \
		EMIT(PPC_RAW_STDX(r, base, bpf_to_ppc(ctx, TMP_REG_2)));     \
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index f7a668c1e364..287e0322bbf3 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -66,6 +66,24 @@ static int bpf_jit_stack_offsetof(struct codegen_context 
*ctx, int reg)
  
  void bpf_jit_realloc_regs(struct codegen_context *ctx)

  {
+   if (ctx->seen & SEEN_FUNC)
+   return;
+
+   while (ctx->seen & SEEN_NVREG_MASK &&
+  (ctx->seen & SEEN_VREG_MASK) != SEEN_VREG_MASK) {
+   int old = 32 - fls(ctx->seen & SEEN_NVREG_MASK);
+   int new = 32 - fls(~ctx->seen & SEEN_VREG_MASK);
+   int i;
+
+   for (i = BPF_REG_0; i <= TMP_REG_2; i++) {
+   if (ctx->b2p[i] != old)
+   continue;
+   ctx->b2p[i] = new;
+   bpf_set_seen_register(ctx, new);
+   bpf_clear_seen_register(ctx, old);
+   break;
+   }
+   }


This function is not very different from the one for PPC32. Maybe we could cook 
a common function.
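
Something along these lines, with the register masks passed in by each arch, could perhaps serve
both (names invented, untested):

static void bpf_jit_realloc_regs_common(struct codegen_context *ctx,
					u32 nvreg_mask, u32 vreg_mask)
{
	if (ctx->seen & SEEN_FUNC)
		return;

	while (ctx->seen & nvreg_mask &&
	       (ctx->seen & vreg_mask) != vreg_mask) {
		int old = 32 - fls(ctx->seen & nvreg_mask);
		int new = 32 - fls(~ctx->seen & vreg_mask);
		int i;

		for (i = BPF_REG_0; i <= TMP_REG_2; i++) {
			if (ctx->b2p[i] != old)
				continue;
			ctx->b2p[i] = new;
			bpf_set_seen_register(ctx, new);
			bpf_clear_seen_register(ctx, old);
			break;
		}
	}
}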



  }
  
  void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)

@@ -106,10 +124,9 @@ void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
 * If we haven't created our own stack frame, we save these
 * in the protected zone below the previous stack frame
 */
-   for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-   if (bpf_is_seen_register(ctx, bpf_to_ppc(ctx, i)))
-   PPC_BPF_STL(ctx, bpf_to_ppc(ctx, i), 1,
-   bpf_jit_stack_offsetof(ctx, bpf_to_ppc(ctx, 
i)));
+   for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
+   if (bpf_is_seen_register(ctx, i))
+   

Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Xianting Tian


在 2021/8/5 下午3:58, Jiri Slaby 写道:

Hi,

On 04. 08. 21, 4:54, Xianting Tian wrote:
@@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, 
int data,

  hp->outbuf_size = outbuf_size;
  hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
  +    /*
+ * hvc_con_outbuf is guaranteed to be aligned at least to the
+ * size(N_OUTBUF) by kmalloc().
+ */
+    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+    if (!hp->hvc_con_outbuf)
+    return ERR_PTR(-ENOMEM);


This leaks hp, right?

BTW your 2 patches are still not threaded, that is hard to follow.


yes, thanks, I found the bug, I am preparing to do this in v4.

This is the first time I am sending a patch series (more than one patch). I 
checked the method for sending a series on LKML.org; I should send a '0/2' 
cover letter with the history info for the series.


I will add 0/2 in v4, sorry again for this:(

Besides the above, is the solution in this patch OK for you? Thanks




+
+    spin_lock_init(&hp->hvc_con_lock);
+
  tty_port_init(&hp->port);
  hp->port.ops = &hvc_port_ops;


thanks,


[PATCH linux-next] powerpc/tm: remove duplicate include in tm-poison.c

2021-08-05 Thread cgel . zte
From: yong yiran 

'inttypes.h' included in 'tm-poison.c' is duplicated.
Remove all but the first include of inttypes.h from tm-poison.c.

Reported-by: Zeal Robot 
Signed-off-by: yong yiran 
---
 tools/testing/selftests/powerpc/tm/tm-poison.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c 
b/tools/testing/selftests/powerpc/tm/tm-poison.c
index 29e5f26af7b9..27c083a03d1f 100644
--- a/tools/testing/selftests/powerpc/tm/tm-poison.c
+++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include <inttypes.h>
 
 #include "tm.h"
 
-- 
2.25.1



Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Greg KH
On Thu, Aug 05, 2021 at 04:08:46PM +0800, Xianting Tian wrote:
> 
> 在 2021/8/5 下午3:58, Jiri Slaby 写道:
> > Hi,
> > 
> > On 04. 08. 21, 4:54, Xianting Tian wrote:
> > > @@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno,
> > > int data,
> > >   hp->outbuf_size = outbuf_size;
> > >   hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
> > >   +    /*
> > > + * hvc_con_outbuf is guaranteed to be aligned at least to the
> > > + * size(N_OUTBUF) by kmalloc().
> > > + */
> > > +    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
> > > +    if (!hp->hvc_con_outbuf)
> > > +    return ERR_PTR(-ENOMEM);
> > 
> > This leaks hp, right?
> > 
> > BTW your 2 patches are still not threaded, that is hard to follow.
> 
> yes, thanks, I found the bug, I am preparing to do this in v4.
> 
> This is the first time I am sending a patch series (more than one patch). I
> checked the method for sending a series on LKML.org; I should send a '0/2'
> cover letter with the history info for the series.

Please use 'git send-email' to send the full series all at once,
otherwise it is hard to make the emails threaded "by hand" if you do not
do so.
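
For example (the patch count, version and list address below are only placeholders):

$ git format-patch -v4 -2 --cover-letter HEAD
$ git send-email --to=linuxppc-dev@lists.ozlabs.org v4-*.patch

git format-patch writes the whole series (including the v4-0000-cover-letter.patch to edit) into
the current directory, and git send-email then threads all of the mails automatically.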

thanks,

greg k-h


Re: [PATCH] powerpc/kprobes: Fix kprobe Oops happens in booke

2021-08-05 Thread Pu Lehui




On 2021/8/5 14:13, Michael Ellerman wrote:

Pu Lehui  writes:

When using kprobes on a powerpc BookE series processor, an Oops happens
as shown below:

[   35.861352] Oops: Exception in kernel mode, sig: 5 [#1]
[   35.861676] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
[   35.861905] Modules linked in:
[   35.862144] CPU: 0 PID: 76 Comm: sh Not tainted 
5.14.0-rc3-00060-g7e96bf476270 #18
[   35.862610] NIP:  c0b96470 LR: c00107b4 CTR: c0161c80
[   35.862805] REGS: c387fe70 TRAP: 0700   Not tainted 
(5.14.0-rc3-00060-g7e96bf476270)
[   35.863198] MSR:  00029002   CR: 24022824  XER: 2000
[   35.863577]
[   35.863577] GPR00: c0015218 c387ff20 c313e300 c387ff50 0004 4002 
4000 0a1a2cce
[   35.863577] GPR08:  0004  59764000 24022422 102490c2 
 
[   35.863577] GPR16:   0040 1024 1024 1024 
1024 1022
[   35.863577] GPR24:  1024   bfc655e8 0800 
c387ff50 
[   35.865367] NIP [c0b96470] schedule+0x0/0x130
[   35.865606] LR [c00107b4] interrupt_exit_user_prepare_main+0xf4/0x100
[   35.865974] Call Trace:
[   35.866142] [c387ff20] [c0053224] irq_exit+0x114/0x120 (unreliable)
[   35.866472] [c387ff40] [c0015218] interrupt_return+0x14/0x13c
[   35.866728] --- interrupt: 900 at 0x100af3dc
[   35.866963] NIP:  100af3dc LR: 100de020 CTR: 
[   35.867177] REGS: c387ff50 TRAP: 0900   Not tainted 
(5.14.0-rc3-00060-g7e96bf476270)
[   35.867488] MSR:  0002f902   CR: 20022422  XER: 2000
[   35.867808]
[   35.867808] GPR00: c001509c bfc65570 1024b4d0  100de020 20022422 
bfc655a8 100af3dc
[   35.867808] GPR08: 0002f902    72656773 102490c2 
 
[   35.867808] GPR16:   0040 1024 1024 1024 
1024 1022
[   35.867808] GPR24:  1024   bfc655e8 10245910 
 0001
[   35.869406] NIP [100af3dc] 0x100af3dc
[   35.869578] LR [100de020] 0x100de020
[   35.869751] --- interrupt: 900
[   35.870001] Instruction dump:
[   35.870283] 40c20010 815e0518 714a0100 41e2fd04 3920 913e00c0 3b1e0450 
4bfffd80
[   35.870666] 0fe0 92a10024 4bfff1a9 6000 <7fe8> 7c0802a6 93e1001c 
7c5f1378
[   35.871339] ---[ end trace 23ff848139efa9b9 ]---

There is no real mode on the BookE arch and MMU translation is
always on. The corresponding MSR_IS/MSR_DS bits on BookE are used
to switch the address space, not to indicate real mode.

Fixes: 21f8b2fa3ca5 ("powerpc/kprobes: Ignore traps that happened in real mode")
Signed-off-by: Pu Lehui 
---
  arch/powerpc/include/asm/ptrace.h | 6 ++
  arch/powerpc/kernel/kprobes.c | 5 +
  2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 3e5d470a6155..4aec1a97024b 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -187,6 +187,12 @@ static inline unsigned long frame_pointer(struct pt_regs 
*regs)
  #define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
  #endif
  
+#ifdef CONFIG_BOOKE

+#define real_mode(regs)	0
+#else
+#define real_mode(regs)	(!((regs)->msr & MSR_IR) || !((regs)->msr & MSR_DR))
+#endif


I'm not sure about this helper.

Arguably it should only return true if both MSR_IR and MSR_DR are clear.
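
i.e. something like (sketch only):

#define real_mode(regs)	(((regs)->msr & (MSR_IR | MSR_DR)) == 0)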



diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index cbc28d1a2e1b..fac9a5974718 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -289,10 +289,7 @@ int kprobe_handler(struct pt_regs *regs)
unsigned int *addr = (unsigned int *)regs->nip;
struct kprobe_ctlblk *kcb;
  
-	if (user_mode(regs))

-   return 0;
-
-   if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
+   if (user_mode(regs) || real_mode(regs))
return 0;


I think just adding an IS_ENABLED(CONFIG_BOOKE) here might be better.
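
For instance (untested sketch of that suggestion, keeping the check local to kprobe_handler()):

	if (user_mode(regs))
		return 0;

	if (!IS_ENABLED(CONFIG_BOOKE) &&
	    (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR)))
		return 0;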

cheers
.


Thanks for your suggestion, I will fix it in v2.

Best regards
Lehui


Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Jiri Slaby

On 05. 08. 21, 9:58, Jiri Slaby wrote:

Hi,

On 04. 08. 21, 4:54, Xianting Tian wrote:
@@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, 
int data,

  hp->outbuf_size = outbuf_size;
  hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];


This deserves cleanup too. Why is "outbuf" not "char outbuf[0] 
__ALIGNED__" at the end of the structure? The allocation would be easier 
(using struct_size()) and this line would be gone completely.



+    /*
+ * hvc_con_outbuf is guaranteed to be aligned at least to the
+ * size(N_OUTBUF) by kmalloc().
+ */
+    hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+    if (!hp->hvc_con_outbuf)
+    return ERR_PTR(-ENOMEM);


This leaks hp, right?


Actually, why don't you make
char c[N_OUTBUF] __ALIGNED__;

part of struct hvc_struct directly?
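
A rough sketch of that suggestion (not a tested patch):

struct hvc_struct {
	/* ... existing members ... */
	char c[N_OUTBUF] __ALIGNED__;	/* console-path buffer */
	int outbuf_size;
	char outbuf[] __ALIGNED__;	/* flexible array member, must stay last */
};

	/* the allocation in hvc_alloc() then becomes: */
	hp = kzalloc(struct_size(hp, outbuf, outbuf_size), GFP_KERNEL);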


BTW your 2 patches are still not threaded, that is hard to follow.


+
+    spin_lock_init(&hp->hvc_con_lock);
+
  tty_port_init(&hp->port);
  hp->port.ops = &hvc_port_ops;


thanks,

--
js
suse labs


[Bug 213961] Oops while loading radeon driver

2021-08-05 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213961

--- Comment #8 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Great:

[   15.246367] NIP [bea42a80] radeon_agp_head_init+0x1c/0xf8 [radeon]
[   15.246969] LR [bea39860] radeon_driver_load_kms+0x1bc/0x1f4 [radeon]
[   15.247160] Call Trace:
[   15.247168] [f2b75c30] [c0c1ec60] 0xc0c1ec60 (unreliable)
[   15.247180] [f2b75c50] [bea39860] radeon_driver_load_kms+0x1bc/0x1f4
[radeon]
[   15.247343] [f2b75c80] [be8cc74c] drm_dev_register+0x10c/0x268 [drm]
[   15.247718] [f2b75cb0] [bea36484] radeon_pci_probe+0x108/0x190 [radeon]
[   15.248001] [f2b75cd0] [c03832fc] pci_device_probe+0xf4/0x1a4

So we now know we have a NULL pointer dereference in radeon_agp_head_init().

Looks like all this code is quite recent, or at least there are recent
modifications, so I think you should address it with the RADEON people; I'm not sure
the problem is a PPC32-specific issue.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v3 1/2] tty: hvc: pass DMA capable memory to put_chars()

2021-08-05 Thread Jiri Slaby

Hi,

On 04. 08. 21, 4:54, Xianting Tian wrote:

@@ -933,6 +949,16 @@ struct hvc_struct *hvc_alloc(uint32_t vtermno, int data,
hp->outbuf_size = outbuf_size;
hp->outbuf = &((char *)hp)[ALIGN(sizeof(*hp), sizeof(long))];
  
+	/*

+* hvc_con_outbuf is guaranteed to be aligned at least to the
+* size(N_OUTBUF) by kmalloc().
+*/
+   hp->hvc_con_outbuf = kzalloc(N_OUTBUF, GFP_KERNEL);
+   if (!hp->hvc_con_outbuf)
+   return ERR_PTR(-ENOMEM);


This leaks hp, right?

BTW your 2 patches are still not threaded, that is hard to follow.


+
+   spin_lock_init(&hp->hvc_con_lock);
+
	tty_port_init(&hp->port);
	hp->port.ops = &hvc_port_ops;
  


thanks,
--
js
suse labs


[PATCH kernel v2] KVM: PPC: Use arch_get_random_seed_long instead of powernv variant

2021-08-05 Thread Alexey Kardashevskiy
powernv_get_random_long() does not work in nested KVM (which is
pseries) and produces a crash when it accesses in_be64(rng->regs).

This replaces powernv_get_random_long with the ppc_md machine hook
wrapper.

Signed-off-by: Alexey Kardashevskiy 
---

Changes:
v2:
* replaces [PATCH kernel] powerpc/powernv: Check if powernv_rng is initialized

---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index be0cde26f156..ecfd133e0ca8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1165,7 +1165,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
 #endif
case H_RANDOM:
-   if (!powernv_get_random_long(&vcpu->arch.regs.gpr[4]))
+   if (!arch_get_random_seed_long(&vcpu->arch.regs.gpr[4]))
ret = H_HARDWARE;
break;
case H_RPT_INVALIDATE:
-- 
2.30.2



[PATCH 1/3] arch: Export machine_restart() instances so they can be called from modules

2021-08-05 Thread Lee Jones
A recent attempt to convert the Power Reset Restart driver to tristate
failed because of the following compile error (reported once merged by
Stephen Rothwell via Linux Next):

  ERROR: "machine_restart" [drivers/power/reset/restart-poweroff.ko] undefined!

This error occurs since some of the machine_restart() instances are
not currently exported for use in modules.  This patch aims to rectify
that.

Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Guo Ren 
Cc: Yoshinori Sato 
Cc: Brian Cain 
Cc: Geert Uytterhoeven 
Cc: Michal Simek 
Cc: Thomas Bogendoerfer 
Cc: John Crispin 
Cc: Ley Foon Tan 
Cc: Jonas Bonn 
Cc: Stefan Kristiansson 
Cc: Stafford Horne 
Cc: James E.J. Bottomley 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Albert Ou 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Rich Felker 
Cc: David S. Miller 
Cc: Jeff Dike 
Cc: Richard Weinberger 
Cc: Anton Ivanov 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: Greg Kroah-Hartman 
Cc: Sebastian Reichel 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: linux-hexa...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: openr...@lists.librecores.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Signed-off-by: Lee Jones 
---

The 2 patches this change supports have the required Acks already.

NB: If it's safe to omit some of these, let me know and I'll revise the patch.

 arch/arc/kernel/reset.c| 1 +
 arch/arm/kernel/reboot.c   | 1 +
 arch/arm64/kernel/process.c| 1 +
 arch/csky/kernel/power.c   | 1 +
 arch/h8300/kernel/process.c| 1 +
 arch/hexagon/kernel/reset.c| 1 +
 arch/m68k/kernel/process.c | 1 +
 arch/microblaze/kernel/reset.c | 1 +
 arch/mips/kernel/reset.c   | 1 +
 arch/mips/lantiq/falcon/reset.c| 1 +
 arch/mips/sgi-ip27/ip27-reset.c| 1 +
 arch/nios2/kernel/process.c| 1 +
 arch/openrisc/kernel/process.c | 1 +
 arch/parisc/kernel/process.c   | 1 +
 arch/powerpc/kernel/setup-common.c | 1 +
 arch/riscv/kernel/reset.c  | 1 +
 arch/s390/kernel/setup.c   | 1 +
 arch/sh/kernel/reboot.c| 1 +
 arch/sparc/kernel/process_32.c | 1 +
 arch/sparc/kernel/reboot.c | 1 +
 arch/um/kernel/reboot.c| 1 +
 arch/x86/kernel/reboot.c   | 1 +
 arch/xtensa/kernel/setup.c | 1 +
 23 files changed, 23 insertions(+)

diff --git a/arch/arc/kernel/reset.c b/arch/arc/kernel/reset.c
index fd6c3eb930bad..ae4f8a43b0af4 100644
--- a/arch/arc/kernel/reset.c
+++ b/arch/arc/kernel/reset.c
@@ -20,6 +20,7 @@ void machine_restart(char *__unused)
pr_info("Put your restart handler here\n");
machine_halt();
 }
+EXPORT_SYMBOL(machine_restart);
 
 void machine_power_off(void)
 {
diff --git a/arch/arm/kernel/reboot.c b/arch/arm/kernel/reboot.c
index 0ce388f154226..2878260efd130 100644
--- a/arch/arm/kernel/reboot.c
+++ b/arch/arm/kernel/reboot.c
@@ -150,3 +150,4 @@ void machine_restart(char *cmd)
printk("Reboot failed -- System halted\n");
while (1);
 }
+EXPORT_SYMBOL(machine_restart);
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index b4bb67f17a2ca..cf89ce91d7145 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -212,6 +212,7 @@ void machine_restart(char *cmd)
printk("Reboot failed -- System halted\n");
while (1);
 }
+EXPORT_SYMBOL(machine_restart);
 
 #define bstr(suffix, str) [PSR_BTYPE_ ## suffix >> PSR_BTYPE_SHIFT] = str
 static const char *const btypes[] = {
diff --git a/arch/csky/kernel/power.c b/arch/csky/kernel/power.c
index 923ee4e381b81..b466c825cbb3c 100644
--- a/arch/csky/kernel/power.c
+++ b/arch/csky/kernel/power.c
@@ -28,3 +28,4 @@ void machine_restart(char *cmd)
do_kernel_restart(cmd);
asm volatile ("bkpt");
 }
+EXPORT_SYMBOL(machine_restart);
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 46b1342ce515b..8203ac5cd33ec 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -66,6 +66,7 @@ void machine_restart(char *__unused)
local_irq_disable();
__asm__("jmp @@0");
 }
+EXPORT_SYMBOL(machine_restart);
 
 void machine_halt(void)
 {
diff --git a/arch/hexagon/kernel/reset.c b/arch/hexagon/kernel/reset.c
index da36114d928f0..433378d52063c 100644
--- a/arch/hexagon/kernel/reset.c
+++ b/arch/hexagon/kernel/reset.c
@@ -19,6 +19,7 @@ void machine_halt(void)
 void machine_restart(char *cmd)
 {
 }
+EXPORT_SYMBOL(machine_restart);
 
 void 

[PATCH 0/3] power: reset: Convert Power-Off driver to tristate

2021-08-05 Thread Lee Jones
Provide support to compile the Power-Off driver as a module.

Elliot Berman (2):
  reboot: Export reboot_mode
  power: reset: Enable tristate on restart power-off driver

Lee Jones (1):
  arch: Export machine_restart() instances so they can be called from
modules

 arch/arc/kernel/reset.c| 1 +
 arch/arm/kernel/reboot.c   | 1 +
 arch/arm64/kernel/process.c| 1 +
 arch/csky/kernel/power.c   | 1 +
 arch/h8300/kernel/process.c| 1 +
 arch/hexagon/kernel/reset.c| 1 +
 arch/m68k/kernel/process.c | 1 +
 arch/microblaze/kernel/reset.c | 1 +
 arch/mips/kernel/reset.c   | 1 +
 arch/mips/lantiq/falcon/reset.c| 1 +
 arch/mips/sgi-ip27/ip27-reset.c| 1 +
 arch/nios2/kernel/process.c| 1 +
 arch/openrisc/kernel/process.c | 1 +
 arch/parisc/kernel/process.c   | 1 +
 arch/powerpc/kernel/setup-common.c | 1 +
 arch/riscv/kernel/reset.c  | 1 +
 arch/s390/kernel/setup.c   | 1 +
 arch/sh/kernel/reboot.c| 1 +
 arch/sparc/kernel/process_32.c | 1 +
 arch/sparc/kernel/reboot.c | 1 +
 arch/um/kernel/reboot.c| 1 +
 arch/x86/kernel/reboot.c   | 1 +
 arch/xtensa/kernel/setup.c | 1 +
 drivers/power/reset/Kconfig| 2 +-
 kernel/reboot.c| 2 ++
 25 files changed, 26 insertions(+), 1 deletion(-)

Cc: Albert Ou 
Cc: Anton Ivanov 
Cc: Borislav Petkov 
Cc: Brian Cain 
Cc: Catalin Marinas 
Cc: Christian Borntraeger 
Cc: Chris Zankel 
Cc: David S. Miller 
Cc: Geert Uytterhoeven 
Cc: Guo Ren 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Ingo Molnar 
Cc: James E.J. Bottomley 
Cc: Jeff Dike 
Cc: John Crispin 
Cc: Jonas Bonn 
Cc: Ley Foon Tan 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-hexa...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-snps-...@lists.infradead.org
Cc: linux...@lists.infradead.org
Cc: linux-xte...@linux-xtensa.org
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Michal Simek 
Cc: openr...@lists.librecores.org
Cc: Palmer Dabbelt 
Cc: Paul Walmsley 
Cc: Richard Weinberger 
Cc: Rich Felker 
Cc: sparcli...@vger.kernel.org
Cc: Stafford Horne 
Cc: Stefan Kristiansson 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: uclinux-h8-de...@lists.sourceforge.jp
Cc: Vasily Gorbik 
Cc: Vineet Gupta 
Cc: Will Deacon 
Cc: Yoshinori Sato 
-- 
2.32.0.605.g8dce9f2422-goog



Re: [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault

2021-08-05 Thread Bharata B Rao
On Thu, Aug 05, 2021 at 12:54:34PM +0530, Bharata B Rao wrote:
> Hi,
> 
> This series adds asynchronous page fault support for pseries guests
> and enables the support for the same in powerpc KVM. This is an
> early RFC with details and multiple TODOs listed in patch descriptions.
> 
> This patch needs supporting enablement in QEMU too which will be
> posted separately.

QEMU part is posted here:
https://lore.kernel.org/qemu-devel/20210805073228.502292-2-bhar...@linux.ibm.com/T/#u

Regards,
Bharata.


[RFC PATCH v0 5/5] pseries: Asynchronous page fault support

2021-08-05 Thread Bharata B Rao
Add asynchronous page fault support for pseries guests.

1. Setup the guest to handle async-pf
   - Issue H_REG_SNS hcall to register the SNS region.
   - Setup the subvention interrupt irq.
   - Enable async-pf by updating the byte_b9 of VPA for each
 CPU.
2. Check if the page fault is an expropriation notification
   (SRR1_PROGTRAP set in SRR1) and if so put the task on
   wait queue based on the expropriation correlation number
   read from the VPA.
3. Handle subvention interrupt to wake any waiting tasks.
   The wait and wakeup mechanism from x86 async-pf implementation
   is being reused here.

TODO:
- Check how to keep this feature together with other CMO features.
- The async-pf check in the page fault handler path is limited to
  guests only by an #ifdef. This isn't sufficient and hence needs to
  be replaced by an appropriate check.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/async-pf.h   |  12 ++
 arch/powerpc/mm/fault.c   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++
 4 files changed, 238 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

diff --git a/arch/powerpc/include/asm/async-pf.h 
b/arch/powerpc/include/asm/async-pf.h
new file mode 100644
index ..95d6c3da9f50
--- /dev/null
+++ b/arch/powerpc/include/asm/async-pf.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option(ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. 
+ */
+
+#ifndef _ASM_POWERPC_ASYNC_PF_H
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr);
+#define _ASM_POWERPC_ASYNC_PF_H
+#endif
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..bbdc61605885 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -44,7 +44,7 @@
 #include 
 #include 
 #include 
-
+#include 
 
 /*
  * do_page_fault error handling helpers
@@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned 
long address,
vm_fault_t fault, major = 0;
bool kprobe_fault = kprobe_page_fault(regs, 11);
 
+#ifdef CONFIG_PPC_PSERIES
+   if (handle_async_page_fault(regs, address))
+   return 0;
+#endif
+
if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
return 0;
 
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..e0ada605ef20 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,7 @@ obj-y   := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobility.o rng.o \
-  pci.o pci_dlpar.o eeh_pseries.o msi.o
+  pci.o pci_dlpar.o eeh_pseries.o msi.o async-pf.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_SCANLOG)  += scanlog.o
 obj-$(CONFIG_KEXEC_CORE)   += kexec.o
diff --git a/arch/powerpc/platforms/pseries/async-pf.c 
b/arch/powerpc/platforms/pseries/async-pf.c
new file mode 100644
index ..c2f3bbc0d674
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/async-pf.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option(ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static char sns_buffer[PAGE_SIZE] __aligned(4096);
+static uint16_t *esn_q = (uint16_t *)sns_buffer + 1;
+static unsigned long next_eq_entry, nr_eq_entries;
+
+#define ASYNC_PF_SLEEP_HASHBITS 8
+#define ASYNC_PF_SLEEP_HASHSIZE (1token == token)
+   return n;
+   }
+
+   return NULL;
+}
+static int async_pf_queue_task(u64 token, struct async_pf_sleep_node *n)
+{
+   u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+   struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+   struct async_pf_sleep_node *e;
+
+   raw_spin_lock(&b->lock);
+   e = _find_apf_task(b, token);
+   if (e) {
+   /* dummy entry exist -> wake up was delivered ahead of PF */
+   hlist_del(&e->link);
+   raw_spin_unlock(&b->lock);
+   kfree(e);
+   return false;
+   }
+
+   n->token = token;
+   n->cpu = smp_processor_id();
+   init_swait_queue_head(>wq);
+   hlist_add_head(&n->link, &b->list);
+   raw_spin_unlock(&b->lock);
+   return true;
+}
+
+/*
+ * Handle Expropriation notification.
+ */
+int 

[RFC PATCH v0 4/5] KVM: PPC: BOOK3S HV: Async PF support

2021-08-05 Thread Bharata B Rao
Add asynchronous page fault support for PowerKVM by making
use of the Expropriation/Subvention Notification Option
defined by PAPR specifications.

1. When guest accessed page isn't immediately available in the
host, update the vcpu's VPA with a unique expropriation correlation
number and inject a DSI to the guest with SRR1_PROGTRAP bit set in
SRR1. This informs the guest vcpu to put the process to wait and
schedule a different process.
   - Async PF is supported for data pages in this implementation
 though PAPR allows it for code pages too.
   - Async PF is supported only for user pages here.
   - The feature is currently limited only to radix guests.

2. When the page becomes available, update the Subvention Notification
Structure with the corresponding expropriation correlation number and
inform the guest via a subvention interrupt.
   - Subvention Notification Structure (SNS) is a region of memory
 shared between host and guest via which the communication related
 to expropriated and subvened pages happens between guest and host.
   - SNS region is registered by the guest via H_REG_SNS hcall which
 is implemented in QEMU.
   - H_REG_SNS implementation in QEMU needs a new ioctl KVM_PPC_SET_SNS.
 This ioctl is used to map and pin the guest page containing SNS
 in the host.
   - Subvention notification interrupt is raised to the guest by
 QEMU in response to the guest exit via KVM_REQ_ESN_EXIT. This
 interrupt informs the guest about the availability of the
 pages.

TODO:
- H_REG_SNS is implemented in QEMU because this hcall needs to return
  the interrupt source number associated with the subvention interrupt.
  Claiming of IRQ line and raising an external interrupt seem to be
  straightforward from QEMU. Figure out the in-kernel equivalents for
  these two so that, we can save on guest exit for each expropriated
  page and move the entire hcall implementation into the host kernel.
- The code is pretty much experimental and is barely able to boot a
  guest. I do see some requests for expropriated pages not getting
  fulfilled by the host, leading to long delays in the guest. This
  needs some debugging.
- A few other aspects recommended by PAPR around this feature(like
  setting of page state flags) need to be evaluated and incorporated
  into the implementation if found appropriate.

Signed-off-by: Bharata B Rao 
---
 Documentation/virt/kvm/api.rst|  15 ++
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h   |  21 +++
 arch/powerpc/include/asm/kvm_ppc.h|   1 +
 arch/powerpc/include/asm/lppaca.h |  12 +-
 arch/powerpc/include/uapi/asm/kvm.h   |   6 +
 arch/powerpc/kvm/Kconfig  |   2 +
 arch/powerpc/kvm/Makefile |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   3 +
 arch/powerpc/kvm/book3s_hv.c  |  25 +++
 arch/powerpc/kvm/book3s_hv_esn.c  | 189 ++
 include/uapi/linux/kvm.h  |   1 +
 tools/include/uapi/linux/kvm.h|   1 +
 14 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..512f078b9d02 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5293,6 +5293,21 @@ the trailing ``'\0'``, is indicated by ``name_size`` in 
the header.
 The Stats Data block contains an array of 64-bit values in the same order
 as the descriptors in Descriptors block.
 
+4.134 KVM_PPC_SET_SNS
+-
+
+:Capability: basic
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: none
+:Returns: 0 on successful completion,
+
+As part of H_REG_SNS hypercall, this ioctl is used to map and pin
+the guest provided SNS structure in the host.
+
+This is used for providing asynchronous page fault support for
+powerpc pseries KVM guests.
+
 5. The kvm_run structure
 
 
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 9bcf345cb208..9e33500c1723 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -321,6 +321,7 @@
 #define H_SCM_UNBIND_ALL0x3FC
 #define H_SCM_HEALTH0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
+#define H_REG_SNS  0x41C
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
 #define MAX_HCALL_OPCODE   H_SCM_FLUSH
diff --git a/arch/powerpc/include/asm/kvm_book3s_esn.h 
b/arch/powerpc/include/asm/kvm_book3s_esn.h
new file mode 100644
index ..d79a441ea31d
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_esn.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_BOOK3S_ESN_H__
+#define __ASM_KVM_BOOK3S_ESN_H__
+
+/* SNS 

[RFC PATCH v0 3/5] KVM: PPC: Book3S: Enable setting SRR1 flags for DSI

2021-08-05 Thread Bharata B Rao
kvmppc_core_queue_data_storage() doesn't provide an option to
set SRR1 flags when raising DSI. Since kvmppc_inject_interrupt()
allows for such a provision, add an argument to allow the same.

This will be used to raise DSI with SRR1_PROGTRAP set when
expropriation interrupt needs to be injected to the guest.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/kvm_ppc.h | 3 ++-
 arch/powerpc/kvm/book3s.c  | 6 +++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +++---
 arch/powerpc/kvm/book3s_hv.c   | 4 ++--
 arch/powerpc/kvm/book3s_hv_nested.c| 4 ++--
 arch/powerpc/kvm/book3s_pr.c   | 4 ++--
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 2d88944f9f34..09235bdfd4ac 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -143,7 +143,8 @@ extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu 
*vcpu, ulong dear_flags,
ulong esr_flags);
 extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
   ulong dear_flags,
-  ulong esr_flags);
+  ulong esr_flags,
+  ulong srr1_flags);
 extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
   ulong esr_flags);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..f7f6641a788d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -284,11 +284,11 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 }
 
 void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
-   ulong flags)
+   ulong dsisr, ulong srr1)
 {
kvmppc_set_dar(vcpu, dar);
-   kvmppc_set_dsisr(vcpu, flags);
-   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
+   kvmppc_set_dsisr(vcpu, dsisr);
+   kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, srr1);
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..618206a504b0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -946,7 +946,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
if (dsisr & DSISR_BADACCESS) {
/* Reflect to the guest as DSI */
pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
-   kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+   kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
return RESUME_GUEST;
}
 
@@ -971,7 +971,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 * Bad address in guest page table tree, or other
 * unusual error - reflect it to the guest as DSI.
 */
-   kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+   kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
return RESUME_GUEST;
}
return kvmppc_hv_emulate_mmio(vcpu, gpa, ea, writing);
@@ -981,7 +981,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
if (writing) {
/* give the guest a DSI */
kvmppc_core_queue_data_storage(vcpu, ea, DSISR_ISSTORE |
-  DSISR_PROTFAULT);
+  DSISR_PROTFAULT, 0);
return RESUME_GUEST;
}
kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47ccd4a2df54..d07e9065f7c1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1592,7 +1592,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 
if (!(vcpu->arch.fault_dsisr & (DSISR_NOHPTE | 
DSISR_PROTFAULT))) {
kvmppc_core_queue_data_storage(vcpu,
-   vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+   vcpu->arch.fault_dar, vcpu->arch.fault_dsisr, 
0);
r = RESUME_GUEST;
break;
}
@@ -1610,7 +1610,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
r = RESUME_PAGE_FAULT;
} else {
kvmppc_core_queue_data_storage(vcpu,
-   vcpu->arch.fault_dar, err);
+   vcpu->arch.fault_dar, err, 0);
   

[RFC PATCH v0 2/5] KVM: PPC: Add support for KVM_REQ_ESN_EXIT

2021-08-05 Thread Bharata B Rao
Add a new KVM exit request KVM_REQ_ESN_EXIT that will be used
to exit to userspace (QEMU) whenever subvention notification
needs to be sent to the guest.

The userspace (QEMU) issues the subvention notification by
injecting an interrupt into the guest.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kvm/book3s_hv.c| 8 
 include/uapi/linux/kvm.h| 1 +
 3 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 9f52f282b1aa..204dc2d91388 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -52,6 +52,7 @@
 #define KVM_REQ_WATCHDOG   KVM_ARCH_REQ(0)
 #define KVM_REQ_EPR_EXIT   KVM_ARCH_REQ(1)
 #define KVM_REQ_PENDING_TIMER  KVM_ARCH_REQ(2)
+#define KVM_REQ_ESN_EXIT   KVM_ARCH_REQ(3)
 
 #include 
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..47ccd4a2df54 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2820,6 +2820,14 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu 
*vcpu)
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
 {
+   /*
+* If subvention interrupt needs to be injected to the guest
+* exit to user space.
+*/
+   if (kvm_check_request(KVM_REQ_ESN_EXIT, vcpu)) {
+   vcpu->run->exit_reason = KVM_EXIT_ESN;
+   return 0;
+   }
/* Indicate we want to get back into the guest */
return 1;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..47be532ed14b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD32
 #define KVM_EXIT_X86_BUS_LOCK 33
 #define KVM_EXIT_XEN  34
+#define KVM_EXIT_ESN 35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
-- 
2.31.1



[RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault

2021-08-05 Thread Bharata B Rao
Hi,

This series adds asynchronous page fault support for pseries guests
and enables the support for the same in powerpc KVM. This is an
early RFC with details and multiple TODOs listed in patch descriptions.

This patch needs supporting enablement in QEMU too which will be
posted separately.

Bharata B Rao (5):
  powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9
  KVM: PPC: Add support for KVM_REQ_ESN_EXIT
  KVM: PPC: Book3S: Enable setting SRR1 flags for DSI
  KVM: PPC: BOOK3S HV: Async PF support
  pseries: Asynchronous page fault support

 Documentation/virt/kvm/api.rst|  15 ++
 arch/powerpc/include/asm/async-pf.h   |  12 ++
 arch/powerpc/include/asm/hvcall.h |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h   |  22 +++
 arch/powerpc/include/asm/kvm_ppc.h|   4 +-
 arch/powerpc/include/asm/lppaca.h |  20 +-
 arch/powerpc/include/uapi/asm/kvm.h   |   6 +
 arch/powerpc/kvm/Kconfig  |   2 +
 arch/powerpc/kvm/Makefile |   5 +-
 arch/powerpc/kvm/book3s.c |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c|   9 +-
 arch/powerpc/kvm/book3s_hv.c  |  37 +++-
 arch/powerpc/kvm/book3s_hv_esn.c  | 189 +++
 arch/powerpc/kvm/book3s_hv_nested.c   |   4 +-
 arch/powerpc/kvm/book3s_pr.c  |   4 +-
 arch/powerpc/mm/fault.c   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++
 drivers/cpuidle/cpuidle-pseries.c |   4 +-
 include/uapi/linux/kvm.h  |   2 +
 tools/include/uapi/linux/kvm.h|   1 +
 22 files changed, 574 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

-- 
2.31.1



[RFC PATCH v0 1/5] powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9

2021-08-05 Thread Bharata B Rao
VPA byte offset 0xB9 was named as donate_dedicated_cpu as that
was the only used bit. The Expropriation/Subvention support defines
a bit in byte offset 0xB9. Define this bit and rename the field
in VPA to a generic name.

Signed-off-by: Bharata B Rao 
---
 arch/powerpc/include/asm/lppaca.h | 8 +++-
 drivers/cpuidle/cpuidle-pseries.c | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h 
b/arch/powerpc/include/asm/lppaca.h
index c390ec377bae..57e432766f3e 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -80,7 +80,7 @@ struct lppaca {
u8  ebb_regs_in_use;
u8  reserved7[6];
u8  dtl_enable_mask;/* Dispatch Trace Log mask */
-   u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
+   u8  byte_b9; /* Donate dedicated CPU cycles & Expropriation int */
u8  fpregs_in_use;
u8  pmcregs_in_use;
u8  reserved8[28];
@@ -116,6 +116,12 @@ struct lppaca {
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
+/*
+ * Flags for Byte offset 0xB9
+ */
+#define LPPACA_DONATE_DED_CPU_CYCLES   0x1
+#define LPPACA_EXP_INT_ENABLED 0x2
+
 /*
  * We are using a non architected field to determine if a partition is
  * shared or dedicated. This currently works on both KVM and PHYP, but
diff --git a/drivers/cpuidle/cpuidle-pseries.c 
b/drivers/cpuidle/cpuidle-pseries.c
index a2b5c6f60cf0..b9d0f41c3f19 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -221,7 +221,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
u8 old_latency_hint;
 
pseries_idle_prolog();
-   get_lppaca()->donate_dedicated_cpu = 1;
+   get_lppaca()->byte_b9 |= LPPACA_DONATE_DED_CPU_CYCLES;
old_latency_hint = get_lppaca()->cede_latency_hint;
get_lppaca()->cede_latency_hint = cede_latency_hint[index];
 
@@ -229,7 +229,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
check_and_cede_processor();
 
local_irq_disable();
-   get_lppaca()->donate_dedicated_cpu = 0;
+   get_lppaca()->byte_b9 &= ~LPPACA_DONATE_DED_CPU_CYCLES;
get_lppaca()->cede_latency_hint = old_latency_hint;
 
pseries_idle_epilog();
-- 
2.31.1



Re: [PATCH v1 11/55] powerpc/time: add API for KVM to re-arm the host timer/decrementer

2021-08-05 Thread Christophe Leroy




Le 26/07/2021 à 05:49, Nicholas Piggin a écrit :

Rather than have KVM look up the host timer and fiddle with the
irq-work internal details, have the powerpc/time.c code provide a
function for KVM to re-arm the Linux timer code when exiting a
guest.

This implementation improves on the existing code by marking a
decrementer interrupt as soft-pending if a timer has expired, rather
than setting DEC to a negative value, which tended to cause host
timers to take two interrupts (first the hdec to exit the guest, then
the immediate dec).

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/time.h | 16 +++---
  arch/powerpc/kernel/time.c  | 52 +++--
  arch/powerpc/kvm/book3s_hv.c|  7 ++---
  3 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 69b6be617772..924b2157882f 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -99,18 +99,6 @@ extern void div128_by_32(u64 dividend_high, u64 dividend_low,
  extern void secondary_cpu_time_init(void);
  extern void __init time_init(void);
  
-#ifdef CONFIG_PPC64

-static inline unsigned long test_irq_work_pending(void)
-{
-   unsigned long x;
-
-   asm volatile("lbz %0,%1(13)"
-   : "=r" (x)
-   : "i" (offsetof(struct paca_struct, irq_work_pending)));
-   return x;
-}
-#endif
-
  DECLARE_PER_CPU(u64, decrementers_next_tb);
  
  static inline u64 timer_get_next_tb(void)

@@ -118,6 +106,10 @@ static inline u64 timer_get_next_tb(void)
return __this_cpu_read(decrementers_next_tb);
  }
  
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE

+void timer_rearm_host_dec(u64 now);
+#endif
+
  /* Convert timebase ticks to nanoseconds */
  unsigned long long tb_to_ns(unsigned long long tb_ticks);
  
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c

index 72d872b49167..016828b7401b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -499,6 +499,16 @@ EXPORT_SYMBOL(profile_pc);
   * 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
   */
  #ifdef CONFIG_PPC64
+static inline unsigned long test_irq_work_pending(void)
+{
+   unsigned long x;
+
+   asm volatile("lbz %0,%1(13)"
+   : "=r" (x)
+   : "i" (offsetof(struct paca_struct, irq_work_pending)));


Can we just use READ_ONCE() instead of hard coding the read ?
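
i.e. something like (untested):

static inline unsigned long test_irq_work_pending(void)
{
	return READ_ONCE(local_paca->irq_work_pending);
}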



+   return x;
+}
+
  static inline void set_irq_work_pending_flag(void)
  {
asm volatile("stb %0,%1(13)" : :
@@ -542,13 +552,44 @@ void arch_irq_work_raise(void)
preempt_enable();
  }
  
+static void set_dec_or_work(u64 val)

+{
+   set_dec(val);
+   /* We may have raced with new irq work */
+   if (unlikely(test_irq_work_pending()))
+   set_dec(1);
+}
+
  #else  /* CONFIG_IRQ_WORK */
  
  #define test_irq_work_pending()	0

  #define clear_irq_work_pending()
  
+static void set_dec_or_work(u64 val)

+{
+   set_dec(val);
+}
  #endif /* CONFIG_IRQ_WORK */
  
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE

+void timer_rearm_host_dec(u64 now)
+{
+   u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
+
+   WARN_ON_ONCE(!arch_irqs_disabled());
+   WARN_ON_ONCE(mfmsr() & MSR_EE);
+
+   if (now >= *next_tb) {
+   local_paca->irq_happened |= PACA_IRQ_DEC;
+   } else {
+   now = *next_tb - now;
+   if (now <= decrementer_max)
+   set_dec_or_work(now);
+   }
+}
+EXPORT_SYMBOL_GPL(timer_rearm_host_dec);
+#endif
+
  /*
   * timer_interrupt - gets called when the decrementer overflows,
   * with interrupts disabled.
@@ -609,10 +650,7 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
} else {
now = *next_tb - now;
if (now <= decrementer_max)
-   set_dec(now);
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
+   set_dec_or_work(now);
__this_cpu_inc(irq_stat.timer_irqs_others);
}
  
@@ -854,11 +892,7 @@ static int decrementer_set_next_event(unsigned long evt,

  struct clock_event_device *dev)
  {
__this_cpu_write(decrementers_next_tb, get_tb() + evt);
-   set_dec(evt);
-
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
+   set_dec_or_work(evt);
  
  	return 0;

  }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6e6cfb10e9bb..0cef578930f9 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4018,11 +4018,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
  
-	next_timer = timer_get_next_tb();

-   set_dec(next_timer - tb);
-   /* We may 

Re: [PATCH] powerpc/kprobes: Fix kprobe Oops happens in booke

2021-08-05 Thread Michael Ellerman
Pu Lehui  writes:
> When using kprobes on a powerpc BookE series processor, an Oops happens
> as shown below:
>
> [   35.861352] Oops: Exception in kernel mode, sig: 5 [#1]
> [   35.861676] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
> [   35.861905] Modules linked in:
> [   35.862144] CPU: 0 PID: 76 Comm: sh Not tainted 
> 5.14.0-rc3-00060-g7e96bf476270 #18
> [   35.862610] NIP:  c0b96470 LR: c00107b4 CTR: c0161c80
> [   35.862805] REGS: c387fe70 TRAP: 0700   Not tainted 
> (5.14.0-rc3-00060-g7e96bf476270)
> [   35.863198] MSR:  00029002   CR: 24022824  XER: 2000
> [   35.863577]
> [   35.863577] GPR00: c0015218 c387ff20 c313e300 c387ff50 0004 4002 
> 4000 0a1a2cce
> [   35.863577] GPR08:  0004  59764000 24022422 102490c2 
>  
> [   35.863577] GPR16:   0040 1024 1024 1024 
> 1024 1022
> [   35.863577] GPR24:  1024   bfc655e8 0800 
> c387ff50 
> [   35.865367] NIP [c0b96470] schedule+0x0/0x130
> [   35.865606] LR [c00107b4] interrupt_exit_user_prepare_main+0xf4/0x100
> [   35.865974] Call Trace:
> [   35.866142] [c387ff20] [c0053224] irq_exit+0x114/0x120 (unreliable)
> [   35.866472] [c387ff40] [c0015218] interrupt_return+0x14/0x13c
> [   35.866728] --- interrupt: 900 at 0x100af3dc
> [   35.866963] NIP:  100af3dc LR: 100de020 CTR: 
> [   35.867177] REGS: c387ff50 TRAP: 0900   Not tainted 
> (5.14.0-rc3-00060-g7e96bf476270)
> [   35.867488] MSR:  0002f902   CR: 20022422  XER: 2000
> [   35.867808]
> [   35.867808] GPR00: c001509c bfc65570 1024b4d0  100de020 20022422 
> bfc655a8 100af3dc
> [   35.867808] GPR08: 0002f902    72656773 102490c2 
>  
> [   35.867808] GPR16:   0040 1024 1024 1024 
> 1024 1022
> [   35.867808] GPR24:  1024   bfc655e8 10245910 
>  0001
> [   35.869406] NIP [100af3dc] 0x100af3dc
> [   35.869578] LR [100de020] 0x100de020
> [   35.869751] --- interrupt: 900
> [   35.870001] Instruction dump:
> [   35.870283] 40c20010 815e0518 714a0100 41e2fd04 3920 913e00c0 3b1e0450 
> 4bfffd80
> [   35.870666] 0fe0 92a10024 4bfff1a9 6000 <7fe8> 7c0802a6 
> 93e1001c 7c5f1378
> [   35.871339] ---[ end trace 23ff848139efa9b9 ]---
>
> There is no real mode on the BookE arch and MMU translation is
> always on. The corresponding MSR_IS/MSR_DS bits on BookE are used
> to switch the address space, not to indicate real mode.
>
> Fixes: 21f8b2fa3ca5 ("powerpc/kprobes: Ignore traps that happened in real 
> mode")
> Signed-off-by: Pu Lehui 
> ---
>  arch/powerpc/include/asm/ptrace.h | 6 ++
>  arch/powerpc/kernel/kprobes.c | 5 +
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h 
> b/arch/powerpc/include/asm/ptrace.h
> index 3e5d470a6155..4aec1a97024b 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -187,6 +187,12 @@ static inline unsigned long frame_pointer(struct pt_regs 
> *regs)
>  #define user_mode(regs) (((regs)->msr & MSR_PR) != 0)
>  #endif
>  
> +#ifdef CONFIG_BOOKE
> +#define real_mode(regs)  0
> +#else
> +#define real_mode(regs)  (!((regs)->msr & MSR_IR) || !((regs)->msr & MSR_DR))
> +#endif

I'm not sure about this helper.

Arguably it should only return true if both MSR_IR and MSR_DR are clear.


> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index cbc28d1a2e1b..fac9a5974718 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -289,10 +289,7 @@ int kprobe_handler(struct pt_regs *regs)
>   unsigned int *addr = (unsigned int *)regs->nip;
>   struct kprobe_ctlblk *kcb;
>  
> - if (user_mode(regs))
> - return 0;
> -
> - if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
> + if (user_mode(regs) || real_mode(regs))
>   return 0;

I think just adding an IS_ENABLED(CONFIG_BOOKE) here might be better.

cheers