Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers

2020-02-25 Thread Nicolin Chen
On Mon, Feb 24, 2020 at 08:53:25AM +, S.j. Wang wrote:
> Hi
> 
> > >
> > > Signed-off-by: Shengjiu Wang 
> > > ---
> > >  sound/soc/fsl/Kconfig   |   10 +
> > >  sound/soc/fsl/Makefile  |2 +
> > >  sound/soc/fsl/fsl_asrc_common.h |1 +
> > >  sound/soc/fsl/fsl_easrc.c   | 2265 +++
> > >  sound/soc/fsl/fsl_easrc.h   |  668 +
> > >  sound/soc/fsl/fsl_easrc_dma.c   |  440 ++
> > 
> > I see a 90% similarity between fsl_asrc_dma and fsl_easrc_dma files.
> > Would it be possible to reuse the existing code? They could share
> > structures, from my point of view, just as it already reuses
> > "enum asrc_pair_index"; I know differentiating "pair" and "context"
> > is a big point here, though.
> > 
> > A possible quick solution for that, off the top of my head, could be:
> > 
> > 1) in fsl_asrc_common.h
> > 
> > struct fsl_asrc {
> > 
> > };
> > 
> > struct fsl_asrc_pair {
> > 
> > };
> > 
> > 2) in fsl_easrc.h
> > 
> > /* Renaming shared structures */
> > #define fsl_easrc fsl_asrc
> > #define fsl_easrc_context fsl_asrc_pair
> > 
> > It may be a good idea to see if others have an opinion, too.
> > 
> 
> We need to modify fsl_asrc and fsl_asrc_pair so that they can be
> used by both drivers, and we also need to move the module-specific
> definitions into the same struct, right?

Yea. A merged structure, if that doesn't look too bad. I see most
of the fields in struct fsl_asrc being reused in fsl_easrc.
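
Something like this could be the shared definition; just a sketch,
with illustrative field names (the real split would follow whatever
both drivers actually need):

/* fsl_asrc_common.h -- shared by both drivers; a sketch only */
struct fsl_asrc_pair {
	struct fsl_asrc *asrc;
	unsigned int channels;
	struct dma_chan *dma_chan[2];
	enum asrc_pair_index index;
	/* module-specific state: "pair" data for ASRC, "context" for EASRC */
	void *private;
};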

> > 
> > > +static const struct regmap_config fsl_easrc_regmap_config = {
> > > + .readable_reg = fsl_easrc_readable_reg,
> > > + .volatile_reg = fsl_easrc_volatile_reg,
> > > + .writeable_reg = fsl_easrc_writeable_reg,
> > 
> > Can we use regmap_range and regmap_access_table?
> > 
> 
> Can regmap_range support discontinuous registers? The
> reg_stride is 4.

I think it does. Giving an example here:
https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c
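
For instance, something along these lines should work even with
reg_stride = 4, since a discontinuous register map just becomes
multiple entries in the range list (the register names below are
made up, not the real EASRC ones):

static const struct regmap_range fsl_easrc_readable_ranges[] = {
	regmap_reg_range(REG_EASRC_CTRL, REG_EASRC_CTRL + 0x1c),
	regmap_reg_range(REG_EASRC_FIFO, REG_EASRC_FIFO + 0x0c),
};

static const struct regmap_access_table fsl_easrc_readable_table = {
	.yes_ranges = fsl_easrc_readable_ranges,
	.n_yes_ranges = ARRAY_SIZE(fsl_easrc_readable_ranges),
};

static const struct regmap_config fsl_easrc_regmap_config = {
	.reg_bits = 32,
	.reg_stride = 4,
	.val_bits = 32,
	.rd_table = &fsl_easrc_readable_table,
	/* wr_table / volatile_table can be built the same way */
};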


[PATCH 9/8] powerpc: Switch 8xx MAINTAINERS entry to Christophe

2020-02-25 Thread Michael Ellerman
It's over 10 years since the last commit from Vitaly, so I suspect
he's moved on to other things.

Christophe has been the primary contributor to 8xx in the last several
years, so anoint him as the maintainer.

Remove the dead penguinppc.org link.

Cc: Vitaly Bordug 
Signed-off-by: Michael Ellerman 
Acked-by: Christophe Leroy 
---
 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2e917116ef6a..0c1266afb52a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9658,8 +9658,7 @@ F:	arch/powerpc/platforms/85xx/
 F: Documentation/devicetree/bindings/powerpc/fsl/
 
 LINUX FOR POWERPC EMBEDDED PPC8XX
-M: Vitaly Bordug 
-W: http://www.penguinppc.org/
+M: Christophe Leroy 
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
 F: arch/powerpc/platforms/8xx/
-- 
2.21.1



Re: [PATCH] evh_bytechan: fix out of bounds accesses

2020-02-25 Thread Laurentiu Tudor




On 21.02.2020 01:57, Stephen Rothwell wrote:

Hi all,

On Thu, 16 Jan 2020 11:37:14 +1100 Stephen Rothwell wrote:


On Wed, 15 Jan 2020 14:01:35 -0600 Scott Wood  wrote:


On Thu, 2020-01-16 at 06:42 +1100, Stephen Rothwell wrote:

Hi Timur,

On Wed, 15 Jan 2020 07:25:45 -0600 Timur Tabi  wrote:

On 1/14/20 12:31 AM, Stephen Rothwell wrote:

+/**
+ * ev_byte_channel_send - send characters to a byte stream
+ * @handle: byte stream handle
+ * @count: (input) num of chars to send, (output) num chars sent
+ * @bp: pointer to chars to send
+ *
+ * Returns 0 for success, or an error code.
+ */
+static unsigned int ev_byte_channel_send(unsigned int handle,
+   unsigned int *count, const char *bp)


Well, now you've moved this into the .c file and it is no longer
available to other callers.  Anything wrong with keeping it in the .h
file?


There are currently no other callers - are there likely to be in the
future?  Even if there are, is it time critical enough that it needs to
be inlined everywhere?


It's not performance critical and there aren't likely to be other users --
just a matter of what's cleaner.  FWIW I'd rather see the original patch,
that keeps the raw asm hcall stuff as simple wrappers in one place.


And I don't mind either way :-)

I just want to get rid of the warnings.


Any progress with this?



I think the consensus was to pick up the original patch, that is,
this one: https://patchwork.ozlabs.org/patch/1220186/


I've tested it too, so please feel free to add a:

Tested-by: Laurentiu Tudor 

---
Best Regards, Laurentiu
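
For reference, the approach in that patch is roughly the following
sketch (reconstructed from the discussion, not a verbatim quote): pad
short sends into a full-size local buffer so the hypercall's fixed-size
register loads never read past the caller's data.

static unsigned int local_ev_byte_channel_send(unsigned int handle,
					       unsigned int *count,
					       const char *p)
{
	char buffer[EV_BYTE_CHANNEL_MAX_BYTES];
	unsigned int c = *count;

	/* zero-pad short sends so the fixed-size reads stay in bounds */
	if (c < sizeof(buffer)) {
		memcpy(buffer, p, c);
		memset(&buffer[c], 0, sizeof(buffer) - c);
		p = buffer;
	}
	return ev_byte_channel_send(handle, count, p);
}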


Re: [PATCH v7 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability

2020-02-25 Thread Alexey Budankov


Hi,

Is there anything else I could do to move the changes forward,
or is something still missing from this patch set?
Could you please share your thoughts?

Thanks,
Alexey

On 17.02.2020 11:02, Alexey Budankov wrote:
> 
> Currently access to perf_events, i915_perf and other performance
> monitoring and observability subsystems of the kernel is open only for
> a privileged process [1] with CAP_SYS_ADMIN capability enabled in the
> process effective set [2].
> 
> This patch set introduces CAP_PERFMON capability designed to secure
> system performance monitoring and observability operations so that
> CAP_PERFMON would assist CAP_SYS_ADMIN capability in its governing role
> for performance monitoring and observability subsystems of the kernel.
> 
> CAP_PERFMON intends to harden system security and integrity during
> performance monitoring and observability operations by decreasing attack
> surface that is available to a CAP_SYS_ADMIN privileged process [2].
> Providing the access to performance monitoring and observability
> operations under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
> and makes the operation more secure. Thus, CAP_PERFMON implements the
> principle of least privilege for performance monitoring and
> observability operations (POSIX IEEE 1003.1e: 2.2.2.39 principle of
> least privilege: A security design principle that states that a process
> or program be granted only those privileges (e.g., capabilities)
> necessary to accomplish its legitimate function, and only for the time
> that such privileges are actually required)
> 
> CAP_PERFMON intends to meet the demand to secure system performance
> monitoring and observability operations for adoption in security
> sensitive, restricted, multiuser production environments (e.g. HPC
> clusters, cloud and virtual compute environments), where root or
> CAP_SYS_ADMIN credentials are not available to mass users of a system,
> and securely unblock accessibility of system performance monitoring and
> observability operations beyond root and CAP_SYS_ADMIN use cases.
> 
> CAP_PERFMON intends to take over CAP_SYS_ADMIN credentials related to
> system performance monitoring and observability operations and balance
> amount of CAP_SYS_ADMIN credentials following the recommendations in
> the capabilities man page [2] for CAP_SYS_ADMIN: "Note: this capability
> is overloaded; see Notes to kernel developers, below." For backward
> compatibility reasons access to system performance monitoring and
> observability subsystems of the kernel remains open for CAP_SYS_ADMIN
> privileged processes but CAP_SYS_ADMIN capability usage for secure
> system performance monitoring and observability operations is
> discouraged with respect to the designed CAP_PERFMON capability.
> 
> Possible alternative solution to this system security hardening,
> capabilities balancing task of making performance monitoring and
> observability operations more secure and accessible could be to use
> the existing CAP_SYS_PTRACE capability to govern system performance
> monitoring and observability subsystems. However CAP_SYS_PTRACE
> capability still provides users with more credentials than are
> required for secure performance monitoring and observability
> operations and this excess is avoided by the designed CAP_PERFMON.
> 
> Although software running under CAP_PERFMON cannot ensure avoidance of
> related hardware issues, the software can still mitigate those issues
> following the official hardware issues mitigation procedure [3]. The
> bugs in the software itself can be fixed following the standard kernel
> development process [4] to maintain and harden security of system
> performance monitoring and observability operations. Finally, the patch
> set is shaped in the way that simplifies backtracking procedure of
> possible induced issues [5] as much as possible.
> 
> The patch set is for tip perf/core repository:
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core
> sha1: fdb64822443ec9fb8c3a74b598a74790ae8d2e22
> 
> ---
> Changes in v7:
> - updated and extended kernel.rst and perf-security.rst documentation 
>   files with the information about CAP_PERFMON capability and its use cases
> - documented the case of double audit logging of CAP_PERFMON and CAP_SYS_ADMIN
>   capabilities on a SELinux enabled system
> Changes in v6:
> - avoided noaudit checks in perfmon_capable() to explicitly advertise
>   CAP_PERFMON usage thru audit logs to secure system performance
>   monitoring and observability
> Changes in v5:
> - renamed CAP_SYS_PERFMON to CAP_PERFMON
> - extended perfmon_capable() with noaudit checks
> Changes in v4:
> - converted perfmon_capable() into an inline function
> - made perf_events kprobes, uprobes, hw breakpoints and namespaces data
>   available to CAP_SYS_PERFMON privileged processes
> - applied perfmon_capable() to drivers/perf and drivers/oprofile
> - extended __cmd_ftrace
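
The helper at the core of the series boils down to something like this
(a sketch consistent with the changelog notes above, not the exact
patch text):

/* include/linux/capability.h */
static inline bool perfmon_capable(void)
{
	/* CAP_PERFMON preferred; CAP_SYS_ADMIN kept for backward compat */
	return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
}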

Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory

2020-02-25 Thread Frederic Barrat




On 21/02/2020 at 04:26, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch adds platform support to map & release LPC memory.

Signed-off-by: Alastair D'Silva 
---
  arch/powerpc/include/asm/pnv-ocxl.h   |  4 +++
  arch/powerpc/platforms/powernv/ocxl.c | 43 +++
  2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 7de82647e761..0b2a6707e555 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -32,5 +32,9 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
  
  extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);

  extern void pnv_ocxl_free_xive_irq(u32 irq);
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
+#endif



This breaks the compilation of the ocxl driver if CONFIG_MEMORY_HOTPLUG=n

Those functions still make sense even without memory hotplug, for
example in the context of the implementation you had, accessing
OpenCAPI LPC memory through mmap(). The #ifdef is really needed only
around the check_hotplug_memory_addressable() call.
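
Concretely, something like this (sketch only):

u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
{
	/* ... opal_npu_mem_alloc() and be64_to_cpu() as in the patch ... */

#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
	rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
					      size >> PAGE_SHIFT);
	if (rc)
		return 0;
#endif

	return base_addr;
}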


  Fred



  #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 8c65aacda9c8..f2edbcc67361 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
  }
  EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
  
+#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE

+u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   u32 bdfn = pci_dev_id(pdev);
+   __be64 base_addr_be64;
+   u64 base_addr;
+   int rc;
+
+   rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr_be64);
+   if (rc) {
+   dev_warn(&pdev->dev,
+"OPAL could not allocate LPC memory, rc=%d\n", rc);
+   return 0;
+   }
+
+   base_addr = be64_to_cpu(base_addr_be64);
+
+   rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
+ size >> PAGE_SHIFT);
+   if (rc)
+   return 0;
+
+   return base_addr;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
+
+void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
+{
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
+   u32 bdfn = pci_dev_id(pdev);
+   int rc;
+
+   rc = opal_npu_mem_release(phb->opal_id, bdfn);
+   if (rc)
+   dev_warn(&pdev->dev,
+"OPAL reported rc=%d when releasing LPC memory\n", rc);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
+#endif
+
  int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
  {
struct spa_data *data = (struct spa_data *) platform_data;





Re: [PATCH v2 4/5] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2020-02-25 Thread Naveen N. Rao

Gautham R Shenoy wrote:

On Fri, Feb 21, 2020 at 10:50:12AM -0600, Nathan Lynch wrote:

"Gautham R. Shenoy"  writes:
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 80a676d..5b4b450 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include "cacheinfo.h"

> @@ -733,6 +734,42 @@ static void create_svm_file(void)
>  }
>  #endif /* CONFIG_PPC_SVM */
>  
> +static void read_idle_purr(void *val)

> +{
> +  u64 *ret = (u64 *)val;

No cast from void* needed.


Will fix this. Thanks.




> +
> +  *ret = read_this_idle_purr();
> +}
> +
> +static ssize_t idle_purr_show(struct device *dev,
> +struct device_attribute *attr, char *buf)
> +{
> +  struct cpu *cpu = container_of(dev, struct cpu, dev);
> +  u64 val;
> +
> +  smp_call_function_single(cpu->dev.id, read_idle_purr, &val, 1);
> +  return sprintf(buf, "%llx\n", val);
> +}
> +static DEVICE_ATTR(idle_purr, 0400, idle_purr_show, NULL);
> +
> +static void read_idle_spurr(void *val)
> +{
> +  u64 *ret = (u64 *)val;
> +
> +  *ret = read_this_idle_spurr();
> +}
> +
> +static ssize_t idle_spurr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> +  struct cpu *cpu = container_of(dev, struct cpu, dev);
> +  u64 val;
> +
> +  smp_call_function_single(cpu->dev.id, read_idle_spurr, &val, 1);
> +  return sprintf(buf, "%llx\n", val);
> +}
> +static DEVICE_ATTR(idle_spurr, 0400, idle_spurr_show, NULL);

It's regrettable that we have to wake up potentially idle CPUs in order
to derive correct idle statistics for them, but I suppose the main user
(lparstat) of these interfaces already is causing this to happen by
polling the existing per-cpu purr and spurr attributes.

So now lparstat will incur at minimum four syscalls and four IPIs per
CPU per polling interval -- one for each of purr, spurr, idle_purr and
idle_spurr. Correct?


Yes, it is unfortunate that we will end up making four syscalls and
generating IPI noise, and this is something that I discussed with
Naveen and Kamalesh. We have the following two constraints:

1) These values of PURR and SPURR required are per-cpu. Hence putting
them in lparcfg is not an option.

2) sysfs semantics encourages a single value per key, the key being
the sysfs-file. Something like the following would have made far more
sense.

cat /sys/devices/system/cpu/cpuX/purr_spurr_accounting
purr:A
idle_purr:B
spurr:C
idle_spurr:D

There are some sysfs files which allow something like this, e.g.
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state


Thoughts on any other alternatives?


Umm... procfs?
/me ducks






At some point it's going to make sense to batch sampling of remote CPUs'
SPRs.


How did you mean this? It looks like we first need to provide a separate 
user interface, since with the existing sysfs interface providing 
separate files, I am not sure if we can batch such reads.
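
As a rough sketch of the direction (names hypothetical), even the two
idle SPRs could at least be folded into a single IPI:

struct idle_sprs {
	u64 purr;
	u64 spurr;
};

static void read_idle_sprs(void *val)
{
	struct idle_sprs *sprs = val;

	/* one remote call samples both registers */
	sprs->purr = read_this_idle_purr();
	sprs->spurr = read_this_idle_spurr();
}

/* caller side, e.g. in a combined _show() routine: */
struct idle_sprs sprs;

smp_call_function_single(cpu->dev.id, read_idle_sprs, &sprs, 1);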



- Naveen



Re: [linux-next/mainline][bisected 3acac06][ppc] Oops when unloading mpt3sas driver

2020-02-25 Thread Sreekanth Reddy
On Tue, Feb 25, 2020 at 11:51 AM Abdul Haleem
 wrote:
>
> On Fri, 2020-01-17 at 18:21 +0530, Abdul Haleem wrote:
> > On Thu, 2020-01-16 at 09:44 -0800, Christoph Hellwig wrote:
> > > Hi Abdul,
> > >
> > > I think the problem is that mpt3sas has some convoluted logic to do
> > > some DMA allocations with a 32-bit coherent mask, and then switches
> > > to a 63 or 64 bit mask, which is not supported by the DMA API.
> > >
> > > Can you try the patch below?
> >
> > Thank you Christoph, with the given patch applied the bug is not seen.
> >
> > rmmod of mpt3sas driver is successful, no kernel Oops
> >
> > Reported-and-tested-by: Abdul Haleem 
>
> Hi Christoph,
>
> I see the patch is under discussion; will this be merged upstream any
> time soon? Boot is broken on our machines without your patch.
>

Hi Abdul,

We have posted a new set of patches to fix this issue. This patch set
won't change the DMA mask on the fly, and also won't hardcode the DMA
mask to 32 bits.
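
For context, the usual idiom the DMA API expects (a generic sketch,
not the actual mpt3sas change) is to settle on one mask at probe time
and never change it while allocations are live:

/* try 64-bit first, fall back to 32-bit, then leave it alone */
if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)) &&
    dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
	return -ENODEV;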

[PATCH 0/5] mpt3sas: Fix changing coherent mask after allocation.

This patchset contains the patches below; please review and try with
this patch set.

Suganath Prabu S (5):
  mpt3sas: Don't change the dma coherent mask after  allocations
  mpt3sas: Rename function name is_MSB_are_same
  mpt3sas: Code Refactoring.
  mpt3sas: Handle RDPQ DMA allocation in same 4g region
  mpt3sas: Update version to 33.101.00.00

Regards,
Sreekanth

> --
> Regard's
>
> Abdul Haleem
> IBM Linux Technology Centre
>
>
>


Re: [PATCH v3 04/27] ocxl: Remove unnecessary externs

2020-02-25 Thread Frederic Barrat




On 21/02/2020 at 04:26, Alastair D'Silva wrote:

From: Alastair D'Silva 

Function declarations don't need 'extern'; remove the existing ones
so they are consistent with newer code.

Signed-off-by: Alastair D'Silva 
---


Thanks for the cleanup!
Acked-by: Frederic Barrat 





  arch/powerpc/include/asm/pnv-ocxl.h | 32 ++---
  include/misc/ocxl.h |  6 +++---
  2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 0b2a6707e555..b23c99bc0c84 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -9,29 +9,27 @@
  #define PNV_OCXL_TL_BITS_PER_RATE   4
#define PNV_OCXL_TL_RATE_BUF_SIZE   ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
  
-extern int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled,
-			u16 *supported);
-extern int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
+int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
+int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
  
-extern int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
+int pnv_ocxl_get_tl_cap(struct pci_dev *dev, long *cap,
char *rate_buf, int rate_buf_size);
-extern int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
+int pnv_ocxl_set_tl_conf(struct pci_dev *dev, long cap,
uint64_t rate_buf_phys, int rate_buf_size);
  
-extern int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
-extern void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
-   void __iomem *tfc, void __iomem *pe_handle);
-extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
-   void __iomem **dar, void __iomem **tfc,
-   void __iomem **pe_handle);
+int pnv_ocxl_get_xsl_irq(struct pci_dev *dev, int *hwirq);
+void pnv_ocxl_unmap_xsl_regs(void __iomem *dsisr, void __iomem *dar,
+void __iomem *tfc, void __iomem *pe_handle);
+int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
+ void __iomem **dar, void __iomem **tfc,
+ void __iomem **pe_handle);
  
-extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
-			void **platform_data);
-extern void pnv_ocxl_spa_release(void *platform_data);
-extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
+int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **platform_data);
+void pnv_ocxl_spa_release(void *platform_data);
+int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
  
-extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
-extern void pnv_ocxl_free_xive_irq(u32 irq);
+int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
+void pnv_ocxl_free_xive_irq(u32 irq);
  #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
  u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
  void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 06dd5839e438..0a762e387418 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -173,7 +173,7 @@ int ocxl_context_detach(struct ocxl_context *ctx);
   *
   * Returns 0 on success, negative on failure
   */
-extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
+int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
  
  /**

   * Frees an IRQ associated with an AFU context
@@ -182,7 +182,7 @@ extern int ocxl_afu_irq_alloc(struct ocxl_context *ctx, int *irq_id);
   *
   * Returns 0 on success, negative on failure
   */
-extern int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
+int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
  
  /**

   * Gets the address of the trigger page for an IRQ
@@ -193,7 +193,7 @@ extern int ocxl_afu_irq_free(struct ocxl_context *ctx, int irq_id);
   *
   * returns the trigger page address, or 0 if the IRQ is not valid
   */
-extern u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id);
+u64 ocxl_afu_irq_get_addr(struct ocxl_context *ctx, int irq_id);
  
  /**

   * Provide a callback to be called when an IRQ is triggered





Re: [PATCH] crypto: Replace zero-length array with flexible-array member

2020-02-25 Thread Horia Geanta
On 2/24/2020 6:18 PM, Gustavo A. R. Silva wrote:
> The current codebase makes use of the zero-length array language
> extension to the C90 standard, but the preferred mechanism to declare
> variable-length types such as these ones is a flexible array member[1][2],
> introduced in C99:
> 
> struct foo {
> int stuff;
> struct boo array[];
> };
> 
> By making use of the mechanism above, we will get a compiler warning
> in case the flexible array does not occur last in the structure, which
> will help us prevent some kind of undefined behavior bugs from being
> inadvertently introduced[3] to the codebase from now on.
> 
> Also, notice that, dynamic memory allocations won't be affected by
> this change:
> 
> "Flexible array members have incomplete type, and so the sizeof operator
> may not be applied. As a quirk of the original implementation of
> zero-length arrays, sizeof evaluates to zero."[1]
> 
> This issue was found with the help of Coccinelle.
> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> [2] https://github.com/KSPP/linux/issues/21
> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
> 
> Signed-off-by: Gustavo A. R. Silva 
Reviewed-by: Horia Geantă 

for caam driver:

>  drivers/crypto/caam/caamalg.c  | 2 +-
>  drivers/crypto/caam/caamalg_qi.c   | 4 ++--
>  drivers/crypto/caam/caamalg_qi2.h  | 6 +++---
>  drivers/crypto/caam/caamhash.c | 2 +-

Thanks,
Horia
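
As a tiny illustration of the pattern being converted (hypothetical
struct; struct_size() is how such objects are typically sized):

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member, must be last */
};

/* allocation is unchanged; struct_size() also guards the math */
struct foo *p = kmalloc(struct_size(p, array, n), GFP_KERNEL);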


Re: [PATCH] crypto: Replace zero-length array with flexible-array member

2020-02-25 Thread Gustavo A. R. Silva



On 2/25/20 07:44, Horia Geanta wrote:
> On 2/24/2020 6:18 PM, Gustavo A. R. Silva wrote:
>> The current codebase makes use of the zero-length array language
>> extension to the C90 standard, but the preferred mechanism to declare
>> variable-length types such as these ones is a flexible array member[1][2],
>> introduced in C99:
>>
>> struct foo {
>> int stuff;
>> struct boo array[];
>> };
>>
>> By making use of the mechanism above, we will get a compiler warning
>> in case the flexible array does not occur last in the structure, which
>> will help us prevent some kind of undefined behavior bugs from being
>> inadvertently introduced[3] to the codebase from now on.
>>
>> Also, notice that, dynamic memory allocations won't be affected by
>> this change:
>>
>> "Flexible array members have incomplete type, and so the sizeof operator
>> may not be applied. As a quirk of the original implementation of
>> zero-length arrays, sizeof evaluates to zero."[1]
>>
>> This issue was found with the help of Coccinelle.
>>
>> [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>> [2] https://github.com/KSPP/linux/issues/21
>> [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
>>
>> Signed-off-by: Gustavo A. R. Silva 
> Reviewed-by: Horia Geantă 
> 

Thank you, Horia.
--
Gustavo

> for caam driver:
> 
>>  drivers/crypto/caam/caamalg.c  | 2 +-
>>  drivers/crypto/caam/caamalg_qi.c   | 4 ++--
>>  drivers/crypto/caam/caamalg_qi2.h  | 6 +++---
>>  drivers/crypto/caam/caamhash.c | 2 +-
> 
> Thanks,
> Horia
> 


Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices

2020-02-25 Thread John Paul Adrian Glaubitz
Hello!

On 2/25/20 3:12 PM, Wolfram Sang wrote:
> Adding the Debian-PPC List to reach further people maybe willing to
> test.

This might be related [1].

Adrian

> [1] https://lists.debian.org/debian-powerpc/2020/01/msg00062.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


[Bug 201723] [Bisected][Regression] THERM_WINDTUNNEL not working any longer in kernel 4.19.x (PowerMac G4 MDD)

2020-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=201723

Wolfram Sang (w...@the-dreams.de) changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #6 from Wolfram Sang (w...@the-dreams.de) ---
Patch which works for Erhard is sent out:

http://patchwork.ozlabs.org/patch/1244322/

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load

2020-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206669

Bug ID: 206669
   Summary: Little-endian kernel crashing on POWER8 on heavy
big-endian PowerKVM load
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.4.x
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: glaub...@physik.fu-berlin.de
CC: mator...@gmail.com
Regression: No

Created attachment 287605
  --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
Backtrace of host system crashing with little-endian kernel

We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
unstable hosting a big-endian ppc64 virtual machine running the same kernel in
big-endian mode.

When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host*
system which is little-endian with the following kernel backtrace. The problem
reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host
running 5.4.x.

Backtrace attached.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 199471] windfarm_pm72 no longer gets automatically loaded when CONFIG_I2C_POWERMAC=y is set (regression)

2020-02-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199471

Wolfram Sang (w...@the-dreams.de) changed:

   What|Removed |Added

 CC||w...@the-dreams.de

--- Comment #8 from Wolfram Sang (w...@the-dreams.de) ---
"This has been quite nice since 4.?.x up to 4.16.x as you only need
CONFIG_I2C_POWERMAC=y which selects the proper windfarm_pmXX at boot time."

I can't find that in the code. Are you sure i2c-powermac requested that module?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-25 Thread Frederic Barrat




On 21/02/2020 at 04:26, Alastair D'Silva wrote:

From: Alastair D'Silva 

Tally up the LPC memory on an OpenCAPI link & allow it to be mapped

Signed-off-by: Alastair D'Silva 
---
  drivers/misc/ocxl/core.c  | 10 ++
  drivers/misc/ocxl/link.c  | 53 +++
  drivers/misc/ocxl/ocxl_internal.h | 33 +++
  3 files changed, 96 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index b7a09b21ab36..2531c6cf19a0 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
if (rc)
goto err_free_pasid;
  
+	if (afu->config.lpc_mem_size || afu->config.special_purpose_mem_size) {

+		rc = ocxl_link_add_lpc_mem(afu->fn->link, afu->config.lpc_mem_offset,
+					   afu->config.lpc_mem_size +
+					   afu->config.special_purpose_mem_size);
+   if (rc)
+   goto err_free_mmio;
+   }
+
return 0;
  
+err_free_mmio:

+   unmap_mmio_areas(afu);
  err_free_pasid:
reclaim_afu_pasid(afu);
  err_free_actag:
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..1e039cc5ebe5 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -84,6 +84,11 @@ struct ocxl_link {
int dev;
atomic_t irq_available;
struct spa *spa;
+   struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */
+   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the link */
+   u64 lpc_mem;
+   int lpc_consumers;
+
void *platform_data;
  };
  static struct list_head links_list = LIST_HEAD_INIT(links_list);
@@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
if (rc)
goto err_spa;
  
+	mutex_init(&link->lpc_mem_lock);

+
/* platform specific hook */
rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
&link->platform_data);
@@ -711,3 +718,49 @@ void ocxl_link_free_irq(void *link_handle, int hw_irq)
atomic_inc(&link->irq_available);
  }
  EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
+
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   // Check for overflow
+   if (offset > (offset + size))
+   return -EINVAL;
+
+   mutex_lock(&link->lpc_mem_lock);
+   link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
+
+   mutex_unlock(&link->lpc_mem_lock);
+
+   return 0;
+}
+
+u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   mutex_lock(&link->lpc_mem_lock);
+
+   if(!link->lpc_mem)
+		link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link->lpc_mem_sz);
+
+   if(link->lpc_mem)
+   link->lpc_consumers++;
+   mutex_unlock(&link->lpc_mem_lock);
+
+   return link->lpc_mem;
+}
+
+void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
+{
+   struct ocxl_link *link = (struct ocxl_link *) link_handle;
+
+   mutex_lock(&link->lpc_mem_lock);
+   WARN_ON(--link->lpc_consumers < 0);



Here, we always decrement the lpc_consumers count. However, it was only 
incremented if the mapping was setup correctly in opal.


We could arguably claim that ocxl_link_lpc_release() should only be 
called if ocxl_link_lpc_map() succeeded, but it would make error path 
handling easier if we only decrement the lpc_consumers count if 
link->lpc_mem is set. So that we can just call ocxl_link_lpc_release() 
in error paths without having to worry about triggering the WARN_ON message.
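
That would look something like this (sketch):

void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
{
	struct ocxl_link *link = (struct ocxl_link *) link_handle;

	mutex_lock(&link->lpc_mem_lock);
	if (link->lpc_mem) {	/* only count consumers that actually mapped */
		WARN_ON(--link->lpc_consumers < 0);
		if (link->lpc_consumers == 0) {
			pnv_ocxl_platform_lpc_release(pdev);
			link->lpc_mem = 0;
		}
	}
	mutex_unlock(&link->lpc_mem_lock);
}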


  Fred




+   if (link->lpc_consumers == 0) {
+   pnv_ocxl_platform_lpc_release(pdev);
+   link->lpc_mem = 0;
+   }
+
+   mutex_unlock(&link->lpc_mem_lock);
+}
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 198e4e4bc51d..d0c8c4838f42 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context *ctx, u64 offset);
  u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
  void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
  
+/**

+ * ocxl_link_add_lpc_mem() - Increment the amount of memory required by an OpenCAPI link
+ *
+ * @link_handle: The OpenCAPI link handle
+ * @offset: The offset of the memory to add
+ * @size: The amount of memory to increment by
+ *
+ * Returns 0 on success, negative on overflow
+ */
+int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size);
+
+/**
+ * ocxl_link_lpc_map() - Map the LPC memory for an OpenCAPI device
+ * Since LPC memory 

Re: [PATCH v3 07/27] ocxl: Add functions to map/unmap LPC memory

2020-02-25 Thread Frederic Barrat




On 21/02/2020 at 04:27, Alastair D'Silva wrote:

From: Alastair D'Silva 

Add functions to map/unmap LPC memory

Signed-off-by: Alastair D'Silva 
---



It looks ok to me.
Acked-by: Frederic Barrat 




  drivers/misc/ocxl/core.c  | 51 +++
  drivers/misc/ocxl/ocxl_internal.h |  3 ++
  include/misc/ocxl.h   | 21 +
  3 files changed, 75 insertions(+)

diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
index 2531c6cf19a0..75ff14e3882a 100644
--- a/drivers/misc/ocxl/core.c
+++ b/drivers/misc/ocxl/core.c
@@ -210,6 +210,56 @@ static void unmap_mmio_areas(struct ocxl_afu *afu)
release_fn_bar(afu->fn, afu->config.global_mmio_bar);
  }
  
+int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)

+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+	if ((afu->config.lpc_mem_size + afu->config.special_purpose_mem_size) == 0)
+   return 0;
+
+   afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
+   if (afu->lpc_base_addr == 0)
+   return -EINVAL;
+
+   if (afu->config.lpc_mem_size > 0) {
+		afu->lpc_res.start = afu->lpc_base_addr + afu->config.lpc_mem_offset;
+		afu->lpc_res.end = afu->lpc_res.start + afu->config.lpc_mem_size - 1;
+   }
+
+   if (afu->config.special_purpose_mem_size > 0) {
+		afu->special_purpose_res.start = afu->lpc_base_addr +
+			afu->config.special_purpose_mem_offset;
+		afu->special_purpose_res.end = afu->special_purpose_res.start +
+			afu->config.special_purpose_mem_size - 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem);
+
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
+{
+   return &afu->lpc_res;
+}
+EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem);
+
+static void unmap_lpc_mem(struct ocxl_afu *afu)
+{
+   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
+
+   if (afu->lpc_res.start || afu->special_purpose_res.start) {
+   void *link = afu->fn->link;
+
+		// only release the link when the last consumer calls release
+   ocxl_link_lpc_release(link, dev);
+
+   afu->lpc_res.start = 0;
+   afu->lpc_res.end = 0;
+   afu->special_purpose_res.start = 0;
+   afu->special_purpose_res.end = 0;
+   }
+}
+
  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
  {
int rc;
@@ -251,6 +301,7 @@ static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct pci_dev *dev)
  
  static void deconfigure_afu(struct ocxl_afu *afu)

  {
+   unmap_lpc_mem(afu);
unmap_mmio_areas(afu);
reclaim_afu_pasid(afu);
reclaim_afu_actag(afu);
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index d0c8c4838f42..ce0cac1da416 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -52,6 +52,9 @@ struct ocxl_afu {
void __iomem *global_mmio_ptr;
u64 pp_mmio_start;
void *private;
+   u64 lpc_base_addr; /* Covers both LPC & special purpose memory */
+   struct resource lpc_res;
+   struct resource special_purpose_res;
  };
  
  enum ocxl_context_status {

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 357ef1aadbc0..d8b0b4d46bfb 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -203,6 +203,27 @@ int ocxl_irq_set_handler(struct ocxl_context *ctx, int irq_id,
  
  // AFU Metadata
  
+/**

+ * ocxl_afu_map_lpc_mem() - Map the LPC system & special purpose memory for an AFU
+ * Do not call this during device discovery, as there may be multiple
+ * devices on a link, and the memory is mapped for the whole link, not
+ * just one device. It should only be called after all devices have
+ * registered their memory on the link.
+ *
+ * @afu: The AFU that has the LPC memory to map
+ *
+ * Returns 0 on success, negative on failure
+ */
+int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu);
+
+/**
+ * ocxl_afu_lpc_mem() - Get the physical address range of LPC memory for an AFU
+ * @afu: The AFU associated with the LPC memory
+ *
+ * Returns a pointer to the resource struct for the physical address range
+ */
+struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu);
+
  /**
   * ocxl_afu_config() - Get a pointer to the config for an AFU
   * @afu: a pointer to the AFU to get the config for
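
A consumer of this API might use it roughly like this (sketch; error
handling elided, names as in the patch):

/* after every AFU on the link has registered its range */
rc = ocxl_afu_map_lpc_mem(afu);
if (rc)
	return rc;

lpc_res = ocxl_afu_lpc_mem(afu);
/* lpc_res->start .. lpc_res->end is the physical range to use */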





Re: [PATCH v3 08/27] ocxl: Emit a log message showing how much LPC memory was detected

2020-02-25 Thread Frederic Barrat




On 21/02/2020 at 04:27, Alastair D'Silva wrote:

From: Alastair D'Silva 

This patch emits a message showing how much LPC memory & special purpose
memory was detected on an OCXL device.

Signed-off-by: Alastair D'Silva 
---



Acked-by: Frederic Barrat 




  drivers/misc/ocxl/config.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index a62e3d7db2bf..701ae6216abf 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct pci_dev *dev,
afu->special_purpose_mem_size =
total_mem_size - lpc_mem_size;
}
+
+	dev_info(&dev->dev, "Probed LPC memory of %#llx bytes and special purpose memory of %#llx bytes\n",
+   afu->lpc_mem_size, afu->special_purpose_mem_size);
+
return 0;
  }
  





[PATCH v3 00/32] powerpc/64: interrupts and syscalls series

2020-02-25 Thread Nicholas Piggin
This is a long overdue update of the series, with fixes from me, Michal,
and Michael. It does not include Michal's syscall compat series.

Patches 1-22 are changes to the low level 64s interrupt entry assembly
which have been posted before, with no change except adding patch 21 and
fixing patch 22 to reconcile irq state in the soft-nmi handler to
avoid preempt warnings.

Patches 23-26 turn the system call entry/exit code into C. A bunch
of irq, preempt, and TM warnings and bugs caught by selftests etc. are
fixed, plus a few peripheral patches are added (sstep and zeroing regs).

Patches 27-29 are to turn interrupt exit code into C. This had a bit
more change, most significantly a change to how interrupt exit soft
irq replay works.

Patches 30-32 are for scv system call support. Lots of changes here
to turn it into something a bit better than RFC quality. Discussion
about the ABI seems to be settling and is not very controversial.

Thanks,
Nick

Nicholas Piggin (32):
  powerpc/64s/exception: Introduce INT_DEFINE parameter block for code
generation
  powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE
parameters
  powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE
parameters
  powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
  powerpc/64s/exception: Move all interrupt handlers to new style code
gen macros
  powerpc/64s/exception: Remove old INT_ENTRY macro
  powerpc/64s/exception: Remove old INT_COMMON macro
  powerpc/64s/exception: Remove old INT_KVM_HANDLER
  powerpc/64s/exception: Add ISIDE option
  powerpc/64s/exception: move real->virt switch into the common handler
  powerpc/64s/exception: move soft-mask test to common code
  powerpc/64s/exception: move KVM test to common code
  powerpc/64s/exception: remove confusing IEARLY option
  powerpc/64s/exception: remove the SPR saving patch code macros
  powerpc/64s/exception: trim unused arguments from KVMTEST macro
  powerpc/64s/exception: hdecrementer avoid touching the stack
  powerpc/64s/exception: re-inline some handlers
  powerpc/64s/exception: Clean up SRR specifiers
  powerpc/64s/exception: add more comments for interrupt handlers
  powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is
supported
  powerpc/64s/exception: sreset interrupts reconcile fix
  powerpc/64s/exception: soft nmi interrupt should not use
ret_from_except
  powerpc/64: system call remove non-volatile GPR save optimisation
  powerpc/64: sstep ifdef the deprecated fast endian switch syscall
  powerpc/64: system call implement entry/exit logic in C
  powerpc/64: system call zero volatile registers when returning
  powerpc/64: implement soft interrupt replay in C
  powerpc/64s: interrupt implement exit logic in C
  powerpc/64s/exception: remove lite interrupt return
  powerpc/64: system call reconcile interrupts
  powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked
  powerpc/64s: system call support for scv/rfscv instructions

 Documentation/powerpc/syscall64-abi.rst   |   42 +-
 arch/powerpc/include/asm/asm-prototypes.h |   17 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h |   24 +-
 arch/powerpc/include/asm/cputime.h|   29 +
 arch/powerpc/include/asm/exception-64s.h  |   10 +-
 arch/powerpc/include/asm/head-64.h|2 +-
 arch/powerpc/include/asm/hw_irq.h |6 +-
 arch/powerpc/include/asm/ppc_asm.h|2 +
 arch/powerpc/include/asm/processor.h  |2 +-
 arch/powerpc/include/asm/ptrace.h |3 +
 arch/powerpc/include/asm/setup.h  |4 +-
 arch/powerpc/include/asm/signal.h |3 +
 arch/powerpc/include/asm/switch_to.h  |   11 +
 arch/powerpc/include/asm/time.h   |4 +-
 arch/powerpc/kernel/Makefile  |3 +-
 arch/powerpc/kernel/cpu_setup_power.S |2 +-
 arch/powerpc/kernel/cputable.c|3 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |1 +
 arch/powerpc/kernel/entry_64.S| 1017 +++-
 arch/powerpc/kernel/exceptions-64e.S  |  287 ++-
 arch/powerpc/kernel/exceptions-64s.S  | 2168 -
 arch/powerpc/kernel/irq.c |  183 +-
 arch/powerpc/kernel/process.c |   89 +-
 arch/powerpc/kernel/setup_64.c|5 +-
 arch/powerpc/kernel/signal.h  |2 -
 arch/powerpc/kernel/syscall_64.c  |  379 +++
 arch/powerpc/kernel/syscalls/syscall.tbl  |   22 +-
 arch/powerpc/kernel/systbl.S  |9 +-
 arch/powerpc/kernel/time.c|9 -
 arch/powerpc/kernel/vector.S  |2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   11 -
 arch/powerpc/kvm/book3s_segment.S |7 -
 arch/powerpc/lib/sstep.c  |5 +-
 arch/powerpc/platforms/pseries/setup.c|8 +-
 34 files changed, 2769 insertions(+), 1602 deletions(-)
 create mode 100

[PATCH v3 01/32] powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation

2020-02-25 Thread Nicholas Piggin
The code generation macro arguments are difficult to read, and
defaults can't easily be used.

This introduces a block where parameters can be set for interrupt
handler code generation by the subsequent macros, and adds the first
generation macro for interrupt entry.

One interrupt handler is converted to the new macros to demonstrate
the change; the rest will be converted all at once.

No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 77 ++--
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ffc15f4f079d..1b942c98bc05 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -193,6 +193,61 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtctr   reg;\
bctr
 
+/*
+ * Interrupt code generation macros
+ */
+#define IVEC   .L_IVEC_\name\()
+#define IHSRR  .L_IHSRR_\name\()
+#define IAREA  .L_IAREA_\name\()
+#define IDAR   .L_IDAR_\name\()
+#define IDSISR .L_IDSISR_\name\()
+#define ISET_RI.L_ISET_RI_\name\()
+#define IEARLY .L_IEARLY_\name\()
+#define IMASK  .L_IMASK_\name\()
+#define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define IKVM_VIRT  .L_IKVM_VIRT_\name\()
+
+#define INT_DEFINE_BEGIN(n)						\
+.macro int_define_ ## n name
+
+#define INT_DEFINE_END(n)						\
+.endm ;								\
+int_define_ ## n n ;						\
+do_define_int n
+
+.macro do_define_int name
+   .ifndef IVEC
+   .error "IVEC not defined"
+   .endif
+   .ifndef IHSRR
+   IHSRR=EXC_STD
+   .endif
+   .ifndef IAREA
+   IAREA=PACA_EXGEN
+   .endif
+   .ifndef IDAR
+   IDAR=0
+   .endif
+   .ifndef IDSISR
+   IDSISR=0
+   .endif
+   .ifndef ISET_RI
+   ISET_RI=1
+   .endif
+   .ifndef IEARLY
+   IEARLY=0
+   .endif
+   .ifndef IMASK
+   IMASK=0
+   .endif
+   .ifndef IKVM_REAL
+   IKVM_REAL=0
+   .endif
+   .ifndef IKVM_VIRT
+   IKVM_VIRT=0
+   .endif
+.endm
+
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
TRAMP_KVM_BEGIN(\name\()_kvm)
KVM_HANDLER \vec, \hsrr, \area, \skip
@@ -474,7 +529,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 */
GET_SCRATCH0(r10)
std r10,\area\()+EX_R13(r13)
-   .if \dar
+   .if \dar == 1
.if \hsrr
mfspr   r10,SPRN_HDAR
.else
@@ -482,7 +537,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,\area\()+EX_DAR(r13)
.endif
-   .if \dsisr
+   .if \dsisr == 1
.if \hsrr
mfspr   r10,SPRN_HDSISR
.else
@@ -506,6 +561,14 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
 .endm
 
+.macro GEN_INT_ENTRY name, virt, ool=0
+   .if ! \virt
+		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_REAL
+   .else
+		INT_HANDLER \name, IVEC, \ool, IEARLY, \virt, IHSRR, IAREA, ISET_RI, IDAR, IDSISR, IMASK, IKVM_VIRT
+   .endif
+.endm
+
 /*
  * On entry r13 points to the paca, r9-r13 are saved in the paca,
  * r9 contains the saved CR, r11 and r12 contain the saved SRR0 and
@@ -1143,12 +1206,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
bl  unrecoverable_exception
b   .
 
+INT_DEFINE_BEGIN(data_access)
+   IVEC=0x300
+   IDAR=1
+   IDSISR=1
+   IKVM_REAL=1
+INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-   INT_HANDLER data_access, 0x300, ool=1, dar=1, dsisr=1, kvm=1
+   GEN_INT_ENTRY data_access, virt=0, ool=1
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
-   INT_HANDLER data_access, 0x300, virt=1, dar=1, dsisr=1
+   GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-- 
2.23.0



[PATCH v3 02/32] powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters

2020-02-25 Thread Nicholas Piggin
No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1b942c98bc05..f3f2ec88b3d8 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,6 +206,9 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
+#define ISTACK .L_ISTACK_\name\()
+#define IRECONCILE .L_IRECONCILE_\name\()
+#define IKUAP  .L_IKUAP_\name\()
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -246,6 +249,15 @@ do_define_int n
.ifndef IKVM_VIRT
IKVM_VIRT=0
.endif
+   .ifndef ISTACK
+   ISTACK=1
+   .endif
+   .ifndef IRECONCILE
+   IRECONCILE=1
+   .endif
+   .ifndef IKUAP
+   IKUAP=1
+   .endif
 .endm
 
 .macro INT_KVM_HANDLER name, vec, hsrr, area, skip
@@ -670,6 +682,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
.endif
 .endm
 
+.macro GEN_COMMON name
+   INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
+.endm
+
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -1221,13 +1237,7 @@ EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
 EXC_VIRT_END(data_access, 0x4300, 0x80)
 INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
 EXC_COMMON_BEGIN(data_access_common)
-   /*
-* Here r13 points to the paca, r9 contains the saved CR,
-* SRR0 and SRR1 are saved in r11 and r12,
-* r9 - r13 are saved in paca->exgen.
-* EX_DAR and EX_DSISR have saved DAR/DSISR
-*/
-   INT_COMMON 0x300, PACA_EXGEN, 1, 1, 1, 1, 1
+   GEN_COMMON data_access
ld  r4,_DAR(r1)
ld  r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
-- 
2.23.0



[PATCH v3 03/32] powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters

2020-02-25 Thread Nicholas Piggin
No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index f3f2ec88b3d8..da3c22eea72d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -204,6 +204,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define ISET_RI.L_ISET_RI_\name\()
 #define IEARLY .L_IEARLY_\name\()
 #define IMASK  .L_IMASK_\name\()
+#define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
 #define ISTACK .L_ISTACK_\name\()
@@ -243,6 +244,9 @@ do_define_int n
.ifndef IMASK
IMASK=0
.endif
+   .ifndef IKVM_SKIP
+   IKVM_SKIP=0
+   .endif
.ifndef IKVM_REAL
IKVM_REAL=0
.endif
@@ -265,6 +269,10 @@ do_define_int n
KVM_HANDLER \vec, \hsrr, \area, \skip
 .endm
 
+.macro GEN_KVM name
+   KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
+.endm
+
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -1226,6 +1234,7 @@ INT_DEFINE_BEGIN(data_access)
IVEC=0x300
IDAR=1
IDSISR=1
+   IKVM_SKIP=1
IKVM_REAL=1
 INT_DEFINE_END(data_access)
 
@@ -1235,7 +1244,8 @@ EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
GEN_INT_ENTRY data_access, virt=1
 EXC_VIRT_END(data_access, 0x4300, 0x80)
-INT_KVM_HANDLER data_access, 0x300, EXC_STD, PACA_EXGEN, 1
+TRAMP_KVM_BEGIN(data_access_kvm)
+   GEN_KVM data_access
 EXC_COMMON_BEGIN(data_access_common)
GEN_COMMON data_access
ld  r4,_DAR(r1)
-- 
2.23.0



[PATCH v3 04/32] powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros

2020-02-25 Thread Nicholas Piggin
These don't provide a large amount of code sharing. Removing them
makes code easier to shuffle around. For example, some of the common
instructions will be moved into the common code gen macro.

No generated code change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 160 ---
 1 file changed, 117 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index da3c22eea72d..0f1da3099c28 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -757,28 +757,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
 #define FINISH_NAP
 #endif
 
-#define EXC_COMMON(name, realvec, hdlr)				\
-   EXC_COMMON_BEGIN(name); \
-   INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \
-   bl  save_nvgprs;\
-	addi	r3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except
-
-/*
- * Like EXC_COMMON, but for exceptions that can occur in the idle task and
- * therefore need the special idle handling (finish nap and runlatch)
- */
-#define EXC_COMMON_ASYNC(name, realvec, hdlr)  \
-   EXC_COMMON_BEGIN(name); \
-   INT_COMMON realvec, PACA_EXGEN, 1, 1, 1, 0, 0 ; \
-   FINISH_NAP; \
-   RUNLATCH_ON;\
-	addi	r3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except_lite
-
-
 /*
  * There are a few constraints to be concerned with.
  * - Real mode exceptions code/data must be located at their physical location.
@@ -1349,7 +1327,13 @@ EXC_VIRT_BEGIN(hardware_interrupt, 0x4500, 0x100)
	INT_HANDLER hardware_interrupt, 0x500, virt=1, hsrr=EXC_HV_OR_STD, bitmask=IRQS_DISABLED, kvm=1
 EXC_VIRT_END(hardware_interrupt, 0x4500, 0x100)
 INT_KVM_HANDLER hardware_interrupt, 0x500, EXC_HV_OR_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(hardware_interrupt_common, 0x500, do_IRQ)
+EXC_COMMON_BEGIN(hardware_interrupt_common)
+   INT_COMMON 0x500, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_IRQ
+   b   ret_from_except_lite
 
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1455,7 +1439,13 @@ EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
INT_HANDLER decrementer, 0x900, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(decrementer, 0x4900, 0x80)
 INT_KVM_HANDLER decrementer, 0x900, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON_ASYNC(decrementer_common, 0x900, timer_interrupt)
+EXC_COMMON_BEGIN(decrementer_common)
+   INT_COMMON 0x900, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+   bl  timer_interrupt
+   b   ret_from_except_lite
 
 
 EXC_REAL_BEGIN(hdecrementer, 0x980, 0x80)
@@ -1465,7 +1455,12 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
INT_HANDLER hdecrementer, 0x980, virt=1, hsrr=EXC_HV, kvm=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 INT_KVM_HANDLER hdecrementer, 0x980, EXC_HV, PACA_EXGEN, 0
-EXC_COMMON(hdecrementer_common, 0x980, hdec_interrupt)
+EXC_COMMON_BEGIN(hdecrementer_common)
+   INT_COMMON 0x980, PACA_EXGEN, 1, 1, 1, 0, 0
+   bl  save_nvgprs
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+   bl  hdec_interrupt
+   b   ret_from_except
 
 
 EXC_REAL_BEGIN(doorbell_super, 0xa00, 0x100)
@@ -1475,11 +1470,17 @@ EXC_VIRT_BEGIN(doorbell_super, 0x4a00, 0x100)
INT_HANDLER doorbell_super, 0xa00, virt=1, bitmask=IRQS_DISABLED
 EXC_VIRT_END(doorbell_super, 0x4a00, 0x100)
 INT_KVM_HANDLER doorbell_super, 0xa00, EXC_STD, PACA_EXGEN, 0
+EXC_COMMON_BEGIN(doorbell_super_common)
+   INT_COMMON 0xa00, PACA_EXGEN, 1, 1, 1, 0, 0
+   FINISH_NAP
+   RUNLATCH_ON
+	addi	r3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_PPC_DOORBELL
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, doorbell_exception)
+   bl  doorbell_exception
 #else
-EXC_COMMON_ASYNC(doorbell_super_common, 0xa00, unknown_exception)
+   bl  unknown_exception
 #endif
+   b   ret_from_except_lite
 
 
 EXC_REAL_NONE(0xb00, 0x100)
@@ -1610,7 +1611,12 @@ EXC_VIRT_BEGIN(single_step, 0x4d00, 0x100)
INT_HANDLER single_step, 0xd00, virt=1
 EXC_VIRT_END(single_step, 0x4d00, 0x100)
 INT_KVM_HANDLER single_step, 0xd00, EXC_STD, PACA_EXGEN, 0
-EXC_COMMON(single_step_common, 0xd00, single_step_exception)
+EXC_COMMON_BEGIN(single_step_common)
+   INT_COMMON 0xd00, PACA_EXGEN, 1, 1, 1, 0, 0
+   bl  save_nvgprs
+	addi	r3,r1,STACK_FRAME_

[PATCH v3 05/32] powerpc/64s/exception: Move all interrupt handlers to new style code gen macros

2020-02-25 Thread Nicholas Piggin
Aside from label names and BUG line numbers, the generated code change
is an additional HMI KVM handler added for the "late" KVM handler,
because early and late HMI generation is achieved by defining two
different interrupt types.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 556 ---
 1 file changed, 418 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0f1da3099c28..0157ba48efe9 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -206,8 +206,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define __IKVM_REAL(name)  .L_IKVM_REAL_ ## name
 #define IKVM_VIRT  .L_IKVM_VIRT_\name\()
 #define ISTACK .L_ISTACK_\name\()
+#define __ISTACK(name) .L_ISTACK_ ## name
 #define IRECONCILE .L_IRECONCILE_\name\()
 #define IKUAP  .L_IKUAP_\name\()
 
@@ -570,7 +572,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
/* nothing more */
.elseif \early
mfctr   r10 /* save ctr, even for !RELOCATABLE */
-   BRANCH_TO_C000(r11, \name\()_early_common)
+   BRANCH_TO_C000(r11, \name\()_common)
.elseif !\virt
INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
.else
@@ -843,6 +845,19 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+INT_DEFINE_BEGIN(system_reset)
+   IVEC=0x100
+   IAREA=PACA_EXNMI
+   /*
+* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
+* being used, so a nested NMI exception would corrupt it.
+*/
+   ISET_RI=0
+   ISTACK=0
+   IRECONCILE=0
+   IKVM_REAL=1
+INT_DEFINE_END(system_reset)
+
 EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
 #ifdef CONFIG_PPC_P7_NAP
/*
@@ -880,11 +895,8 @@ BEGIN_FTR_SECTION
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 #endif
 
-   INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0, kvm=1
+   GEN_INT_ENTRY system_reset, virt=0
/*
-* MSR_RI is not enabled, because PACA_EXNMI and nmi stack is
-* being used, so a nested NMI exception would corrupt it.
-*
 * In theory, we should not enable relocation here if it was disabled
 * in SRR1, because the MMU may not be configured to support it (e.g.,
 * SLB may have been cleared). In practice, there should only be a few
@@ -893,7 +905,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
 */
 EXC_REAL_END(system_reset, 0x100, 0x100)
 EXC_VIRT_NONE(0x4100, 0x100)
-INT_KVM_HANDLER system_reset 0x100, EXC_STD, PACA_EXNMI, 0
+TRAMP_KVM_BEGIN(system_reset_kvm)
+   GEN_KVM system_reset
 
 #ifdef CONFIG_PPC_P7_NAP
 TRAMP_REAL_BEGIN(system_reset_idle_wake)
@@ -908,8 +921,8 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
-   /* See comment at system_reset exception, don't turn on RI */
-   INT_HANDLER system_reset, 0x100, area=PACA_EXNMI, ri=0
+   __IKVM_REAL(system_reset)=0
+   GEN_INT_ENTRY system_reset, virt=0
 
 #endif /* CONFIG_PPC_PSERIES */
 
@@ -929,7 +942,7 @@ EXC_COMMON_BEGIN(system_reset_common)
mr  r10,r1
ld  r1,PACA_NMI_EMERG_SP(r13)
subir1,r1,INT_FRAME_SIZE
-   INT_COMMON 0x100, PACA_EXNMI, 0, 1, 0, 0, 0
+   GEN_COMMON system_reset
bl  save_nvgprs
/*
 * Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
@@ -971,23 +984,46 @@ EXC_COMMON_BEGIN(system_reset_common)
RFI_TO_USER_OR_KERNEL
 
 
-EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
-   INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, dar=1, 
dsisr=1
+INT_DEFINE_BEGIN(machine_check_early)
+   IVEC=0x200
+   IAREA=PACA_EXMC
/*
 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 * nested machine check corrupts it. machine_check_common enables
 * MSR_RI.
 */
+   ISET_RI=0
+   ISTACK=0
+   IEARLY=1
+   IDAR=1
+   IDSISR=1
+   IRECONCILE=0
+   IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
+INT_DEFINE_END(machine_check_early)
+
+INT_DEFINE_BEGIN(machine_check)
+   IVEC=0x200
+   IAREA=PACA_EXMC
+   ISET_RI=0
+   IDAR=1
+   IDSISR=1
+   IKVM_SKIP=1
+   IKVM_REAL=1
+INT_DEFINE_END(machine_check)
+
+EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
+   GEN_INT_ENTRY machine_check_early, virt=0
 EXC_REAL_END(machine_check, 0x200, 0x100)
 EXC_VIRT_NONE(0x4200, 0x100)
 
 #ifdef CONFIG_PPC_PSERIES
 TRAMP_REAL_BEGIN(machine_check_fwnmi)
/* See comment at machine_check exception, don't turn on RI */
-   INT_HANDLER machine_check, 0x200, early=1, area=PACA_EXMC, da

[PATCH v3 06/32] powerpc/64s/exception: Remove old INT_ENTRY macro

2020-02-25 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 68 
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 0157ba48efe9..74bf6e0bf61f 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -482,13 +482,13 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * - Fall through and continue executing in real, unrelocated mode.
  *   This is done if early=2.
  */
-.macro INT_HANDLER name, vec, ool=0, early=0, virt=0, hsrr=0, area=PACA_EXGEN, 
ri=1, dar=0, dsisr=0, bitmask=0, kvm=0
+.macro GEN_INT_ENTRY name, virt, ool=0
SET_SCRATCH0(r13)   /* save r13 */
GET_PACA(r13)
-   std r9,\area\()+EX_R9(r13)  /* save r9 */
+   std r9,IAREA+EX_R9(r13) /* save r9 */
OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
HMT_MEDIUM
-   std r10,\area\()+EX_R10(r13)/* save r10 - r12 */
+   std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
.if \ool
.if !\virt
@@ -502,47 +502,47 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
.endif
 
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR)
+   OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
+   OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
INTERRUPT_TO_KERNEL
-   SAVE_CTR(r10, \area\())
+   SAVE_CTR(r10, IAREA)
mfcrr9
-   .if \kvm
-   KVMTEST \name \hsrr \vec
+   .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
+   KVMTEST \name IHSRR IVEC
.endif
-   .if \bitmask
+   .if IMASK
lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,\bitmask
+   andi.   r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
-   .if \vec == 0x500 || \vec == 0xea0
+   .if IVEC == 0x500 || IVEC == 0xea0
li  r10,PACA_IRQ_EE
-   .elseif \vec == 0x900
+   .elseif IVEC == 0x900
li  r10,PACA_IRQ_DEC
-   .elseif \vec == 0xa00 || \vec == 0xe80
+   .elseif IVEC == 0xa00 || IVEC == 0xe80
li  r10,PACA_IRQ_DBELL
-   .elseif \vec == 0xe60
+   .elseif IVEC == 0xe60
li  r10,PACA_IRQ_HMI
-   .elseif \vec == 0xf00
+   .elseif IVEC == 0xf00
li  r10,PACA_IRQ_PMI
.else
.abort "Bad maskable vector"
.endif
 
-   .if \hsrr == EXC_HV_OR_STD
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
bne masked_Hinterrupt
FTR_SECTION_ELSE
bne masked_interrupt
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
+   .elseif IHSRR
bne masked_Hinterrupt
.else
bne masked_interrupt
.endif
.endif
 
-   std r11,\area\()+EX_R11(r13)
-   std r12,\area\()+EX_R12(r13)
+   std r11,IAREA+EX_R11(r13)
+   std r12,IAREA+EX_R12(r13)
 
/*
 * DAR/DSISR, SCRATCH0 must be read before setting MSR[RI],
@@ -550,47 +550,39 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 * not recoverable if they are live.
 */
GET_SCRATCH0(r10)
-   std r10,\area\()+EX_R13(r13)
-   .if \dar == 1
-   .if \hsrr
+   std r10,IAREA+EX_R13(r13)
+   .if IDAR == 1
+   .if IHSRR
mfspr   r10,SPRN_HDAR
.else
mfspr   r10,SPRN_DAR
.endif
-   std r10,\area\()+EX_DAR(r13)
+   std r10,IAREA+EX_DAR(r13)
.endif
-   .if \dsisr == 1
-   .if \hsrr
+   .if IDSISR == 1
+   .if IHSRR
mfspr   r10,SPRN_HDSISR
.else
mfspr   r10,SPRN_DSISR
.endif
-   stw r10,\area\()+EX_DSISR(r13)
+   stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if \early == 2
+   .if IEARLY == 2
/* nothing more */
-   .elseif \early
+   .elseif IEARLY
mfctr   r10 /* save ctr, even for !RELOCATABLE */
BRANCH_TO_C000(r11, \name\()_common)
.elseif !\virt
-   INT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr, \ri
+   INT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR, ISET_RI
.else
-   INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, \hsrr
+   INT_VIRT_SAVE_SRR_AND_JUMP \name\()_common, IHSRR
.endif
.if \ool
.popsection
.endif
 .endm
 

[PATCH v3 08/32] powerpc/64s/exception: Remove old INT_KVM_HANDLER

2020-02-25 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 55 +---
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 90514766dc7d..cba99f9a815b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -266,15 +266,6 @@ do_define_int n
.endif
 .endm
 
-.macro INT_KVM_HANDLER name, vec, hsrr, area, skip
-   TRAMP_KVM_BEGIN(\name\()_kvm)
-   KVM_HANDLER \vec, \hsrr, \area, \skip
-.endm
-
-.macro GEN_KVM name
-   KVM_HANDLER IVEC, IHSRR, IAREA, IKVM_SKIP
-.endm
-
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -293,35 +284,35 @@ do_define_int n
bne \name\()_kvm
 .endm
 
-.macro KVM_HANDLER vec, hsrr, area, skip
-   .if \skip
+.macro GEN_KVM name
+   .if IKVM_SKIP
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
.else
 BEGIN_FTR_SECTION_NESTED(947)
-   ld  r10,\area+EX_CFAR(r13)
+   ld  r10,IAREA+EX_CFAR(r13)
std r10,HSTATE_CFAR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
.endif
 
 BEGIN_FTR_SECTION_NESTED(948)
-   ld  r10,\area+EX_PPR(r13)
+   ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-   ld  r10,\area+EX_R10(r13)
+   ld  r10,IAREA+EX_R10(r13)
std r12,HSTATE_SCRATCH0(r13)
sldir12,r9,32
/* HSRR variants have the 0x2 bit added to their trap number */
-   .if \hsrr == EXC_HV_OR_STD
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
-   ori r12,r12,(\vec + 0x2)
+   ori r12,r12,(IVEC + 0x2)
FTR_SECTION_ELSE
-   ori r12,r12,(\vec)
+   ori r12,r12,(IVEC)
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
-   ori r12,r12,(\vec + 0x2)
+   .elseif IHSRR
+   ori r12,r12,(IVEC+ 0x2)
.else
-   ori r12,r12,(\vec)
+   ori r12,r12,(IVEC)
.endif
 
 #ifdef CONFIG_RELOCATABLE
@@ -334,25 +325,25 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r9,HSTATE_SCRATCH1(r13)
__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
mtctr   r9
-   ld  r9,\area+EX_R9(r13)
+   ld  r9,IAREA+EX_R9(r13)
bctr
 #else
-   ld  r9,\area+EX_R9(r13)
+   ld  r9,IAREA+EX_R9(r13)
b   kvmppc_interrupt
 #endif
 
 
-   .if \skip
+   .if IKVM_SKIP
 89:mtocrf  0x80,r9
-   ld  r9,\area+EX_R9(r13)
-   ld  r10,\area+EX_R10(r13)
-   .if \hsrr == EXC_HV_OR_STD
+   ld  r9,IAREA+EX_R9(r13)
+   ld  r10,IAREA+EX_R10(r13)
+   .if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
FTR_SECTION_ELSE
b   kvmppc_skip_interrupt
ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
+   .elseif IHSRR
b   kvmppc_skip_Hinterrupt
.else
b   kvmppc_skip_interrupt
@@ -363,7 +354,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 #else
 .macro KVMTEST name, hsrr, n
 .endm
-.macro KVM_HANDLER name, vec, hsrr, area, skip
+.macro GEN_KVM name
 .endm
 #endif
 
@@ -1627,6 +1618,12 @@ EXC_VIRT_NONE(0x4b00, 0x100)
  * without saving, though xer is not a good idea to use, as hardware may
  * interpret some bits so it may be costly to change them.
  */
+INT_DEFINE_BEGIN(system_call)
+   IVEC=0xc00
+   IKVM_REAL=1
+   IKVM_VIRT=1
+INT_DEFINE_END(system_call)
+
 .macro SYSTEM_CALL virt
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
/*
@@ -1720,7 +1717,7 @@ TRAMP_KVM_BEGIN(system_call_kvm)
SET_SCRATCH0(r10)
std r9,PACA_EXGEN+EX_R9(r13)
mfcrr9
-   KVM_HANDLER 0xc00, EXC_STD, PACA_EXGEN, 0
+   GEN_KVM system_call
 #endif
 
 
-- 
2.23.0



[PATCH v3 07/32] powerpc/64s/exception: Remove old INT_COMMON macro

2020-02-25 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 51 +---
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 74bf6e0bf61f..90514766dc7d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -591,8 +591,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  * If stack=0, then the stack is already set in r1, and r1 is saved in r10.
  * PPR save and CPU accounting is not done for the !stack case (XXX why not?)
  */
-.macro INT_COMMON vec, area, stack, kuap, reconcile, dar, dsisr
-   .if \stack
+.macro GEN_COMMON name
+   .if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
mr  r10,r1  /* Save r1  */
subir1,r1,INT_FRAME_SIZE/* alloc frame on kernel stack  */
@@ -609,54 +609,54 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r0,GPR0(r1) /* save r0 in stackframe*/
std r10,GPR1(r1)/* save r1 in stackframe*/
 
-   .if \stack
-   .if \kuap
+   .if ISTACK
+   .if IKUAP
kuap_save_amr_and_lock r9, r10, cr1, cr0
.endif
beq 101f/* if from kernel mode  */
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-   SAVE_PPR(\area, r9)
+   SAVE_PPR(IAREA, r9)
 101:
.else
-   .if \kuap
+   .if IKUAP
kuap_save_amr_and_lock r9, r10, cr1
.endif
.endif
 
/* Save original regs values from save area to stack frame. */
-   ld  r9,\area+EX_R9(r13) /* move r9, r10 to stackframe   */
-   ld  r10,\area+EX_R10(r13)
+   ld  r9,IAREA+EX_R9(r13) /* move r9, r10 to stackframe   */
+   ld  r10,IAREA+EX_R10(r13)
std r9,GPR9(r1)
std r10,GPR10(r1)
-   ld  r9,\area+EX_R11(r13)/* move r11 - r13 to stackframe */
-   ld  r10,\area+EX_R12(r13)
-   ld  r11,\area+EX_R13(r13)
+   ld  r9,IAREA+EX_R11(r13)/* move r11 - r13 to stackframe */
+   ld  r10,IAREA+EX_R12(r13)
+   ld  r11,IAREA+EX_R13(r13)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
-   .if \dar
-   .if \dar == 2
+   .if IDAR
+   .if IDAR == 2
ld  r10,_NIP(r1)
.else
-   ld  r10,\area+EX_DAR(r13)
+   ld  r10,IAREA+EX_DAR(r13)
.endif
std r10,_DAR(r1)
.endif
-   .if \dsisr
-   .if \dsisr == 2
+   .if IDSISR
+   .if IDSISR == 2
ld  r10,_MSR(r1)
lis r11,DSISR_SRR1_MATCH_64S@h
and r10,r10,r11
.else
-   lwz r10,\area+EX_DSISR(r13)
+   lwz r10,IAREA+EX_DSISR(r13)
.endif
std r10,_DSISR(r1)
.endif
 BEGIN_FTR_SECTION_NESTED(66)
-   ld  r10,\area+EX_CFAR(r13)
+   ld  r10,IAREA+EX_CFAR(r13)
std r10,ORIG_GPR3(r1)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
-   GET_CTR(r10, \area)
+   GET_CTR(r10, IAREA)
std r10,_CTR(r1)
std r2,GPR2(r1) /* save r2 in stackframe*/
SAVE_4GPRS(3, r1)   /* save r3 - r6 in stackframe   */
@@ -668,26 +668,22 @@ END_FTR_SECTION_NESTED(CPU_FTR_CFAR, CPU_FTR_CFAR, 66)
mfspr   r11,SPRN_XER/* save XER in stackframe   */
std r10,SOFTE(r1)
std r11,_XER(r1)
-   li  r9,(\vec)+1
+   li  r9,(IVEC)+1
std r9,_TRAP(r1)/* set trap number  */
li  r10,0
ld  r11,exception_marker@toc(r2)
std r10,RESULT(r1)  /* clear regs->result   */
std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame   */
 
-   .if \stack
+   .if ISTACK
ACCOUNT_STOLEN_TIME
.endif
 
-   .if \reconcile
+   .if IRECONCILE
RECONCILE_IRQ_STATE(r10, r11)
.endif
 .endm
 
-.macro GEN_COMMON name
-   INT_COMMON IVEC, IAREA, ISTACK, IKUAP, IRECONCILE, IDAR, IDSISR
-.endm
-
 /*
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
@@ -2387,7 +2383,8 @@ EXC_COMMON_BEGIN(soft_nmi_common)
mr  r10,r1
ld  r1,PACAEMERGSP(r13)
subir1,r1,INT_FRAME_SIZE
-   INT_COMMON 0x900, PACA_EXGEN, 0, 1, 1, 0, 0
+   __ISTACK(decrementer)=0
+   GEN_COMMON decrementer
bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
-- 
2.23.0



[PATCH v3 09/32] powerpc/64s/exception: Add ISIDE option

2020-02-25 Thread Nicholas Piggin
Rather than using DAR=2 to select the i-side registers, add an
explicit option.
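
The spelling change for the two i-side handlers, per the hunks below:

	-	IDAR=2		/* magic value 2: "use SRR0/SRR1" */
	-	IDSISR=2
	+	IISIDE=1	/* explicit: i-side, report SRR0/SRR1 */
	+	IDAR=1
	+	IDSISR=1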

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index cba99f9a815b..4eb099046f9d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -199,6 +199,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
 #define IAREA  .L_IAREA_\name\()
+#define IISIDE .L_IISIDE_\name\()
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
 #define ISET_RI.L_ISET_RI_\name\()
@@ -231,6 +232,9 @@ do_define_int n
.ifndef IAREA
IAREA=PACA_EXGEN
.endif
+   .ifndef IISIDE
+   IISIDE=0
+   .endif
.ifndef IDAR
IDAR=0
.endif
@@ -542,7 +546,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 */
GET_SCRATCH0(r10)
std r10,IAREA+EX_R13(r13)
-   .if IDAR == 1
+   .if IDAR && !IISIDE
.if IHSRR
mfspr   r10,SPRN_HDAR
.else
@@ -550,7 +554,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,IAREA+EX_DAR(r13)
.endif
-   .if IDSISR == 1
+   .if IDSISR && !IISIDE
.if IHSRR
mfspr   r10,SPRN_HDSISR
.else
@@ -625,16 +629,18 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
std r9,GPR11(r1)
std r10,GPR12(r1)
std r11,GPR13(r1)
+
.if IDAR
-   .if IDAR == 2
+   .if IISIDE
ld  r10,_NIP(r1)
.else
ld  r10,IAREA+EX_DAR(r13)
.endif
std r10,_DAR(r1)
.endif
+
.if IDSISR
-   .if IDSISR == 2
+   .if IISIDE
ld  r10,_MSR(r1)
lis r11,DSISR_SRR1_MATCH_64S@h
and r10,r10,r11
@@ -643,6 +649,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
std r10,_DSISR(r1)
.endif
+
 BEGIN_FTR_SECTION_NESTED(66)
ld  r10,IAREA+EX_CFAR(r13)
std r10,ORIG_GPR3(r1)
@@ -1311,8 +1318,9 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(instruction_access)
IVEC=0x400
-   IDAR=2
-   IDSISR=2
+   IISIDE=1
+   IDAR=1
+   IDSISR=1
IKVM_REAL=1
 INT_DEFINE_END(instruction_access)
 
@@ -1341,7 +1349,8 @@ INT_DEFINE_BEGIN(instruction_access_slb)
IVEC=0x480
IAREA=PACA_EXSLB
IRECONCILE=0
-   IDAR=2
+   IISIDE=1
+   IDAR=1
IKVM_REAL=1
 INT_DEFINE_END(instruction_access_slb)
 
-- 
2.23.0



[PATCH v3 10/32] powerpc/64s/exception: move real->virt switch into the common handler

2020-02-25 Thread Nicholas Piggin
The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated,
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...)
but that's not a huge impact these days. It could be optimized away in
future.
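
The switch itself is short. A minimal sketch of the common-entry
sequence (assuming PACAKMSR holds the kernel MSR value, with IR/DR
set):

	ld	r10,PACAKMSR(r13)	/* MSR value for the kernel */
	mtmsrd	r10			/* relocation on, SRR0/SRR1 intact */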

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h |   4 -
 arch/powerpc/kernel/exceptions-64s.S | 251 ++-
 2 files changed, 109 insertions(+), 146 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 33f4f72eb035..47bd4ea0837d 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -33,11 +33,7 @@
 #include 
 
 /* PACA save area size in u64 units (exgen, exmc, etc) */
-#if defined(CONFIG_RELOCATABLE)
 #define EX_SIZE10
-#else
-#define EX_SIZE9
-#endif
 
 /*
  * maximum recursive depth of MCE exceptions
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 4eb099046f9d..112cdb446e03 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -32,16 +32,10 @@
 #define EX_CCR 52
 #define EX_CFAR56
 #define EX_PPR 64
-#if defined(CONFIG_RELOCATABLE)
 #define EX_CTR 72
 .if EX_SIZE != 10
.error "EX_SIZE is wrong"
 .endif
-#else
-.if EX_SIZE != 9
-   .error "EX_SIZE is wrong"
-.endif
-#endif
 
 /*
  * Following are fixed section helper macros.
@@ -124,22 +118,6 @@ name:
 #define EXC_HV 1
 #define EXC_STD0
 
-#if defined(CONFIG_RELOCATABLE)
-/*
- * If we support interrupts with relocation on AND we're a relocatable kernel,
- * we need to use CTR to get to the 2nd level handler.  So, save/restore it
- * when required.
- */
-#define SAVE_CTR(reg, area)mfctr   reg ;   std reg,area+EX_CTR(r13)
-#define GET_CTR(reg, area) ld  reg,area+EX_CTR(r13)
-#define RESTORE_CTR(reg, area) ld  reg,area+EX_CTR(r13) ; mtctr reg
-#else
-/* ...else CTR is unused and in register. */
-#define SAVE_CTR(reg, area)
-#define GET_CTR(reg, area) mfctr   reg
-#define RESTORE_CTR(reg, area)
-#endif
-
 /*
  * PPR save/restore macros used in exceptions-64s.S
  * Used for P7 or later processors
@@ -199,6 +177,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
 #define IAREA  .L_IAREA_\name\()
+#define IVIRT  .L_IVIRT_\name\()
 #define IISIDE .L_IISIDE_\name\()
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
@@ -232,6 +211,9 @@ do_define_int n
.ifndef IAREA
IAREA=PACA_EXGEN
.endif
+   .ifndef IVIRT
+   IVIRT=1
+   .endif
.ifndef IISIDE
IISIDE=0
.endif
@@ -325,7 +307,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 * outside the head section. CONFIG_RELOCATABLE KVM expects CTR
 * to be saved in HSTATE_SCRATCH1.
 */
-   mfctr   r9
+   ld  r9,IAREA+EX_CTR(r13)
std r9,HSTATE_SCRATCH1(r13)
__LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
mtctr   r9
@@ -362,101 +344,6 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .endm
 #endif
 
-.macro INT_SAVE_SRR_AND_JUMP label, hsrr, set_ri
-   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
-   .if ! \set_ri
-   xorir10,r10,MSR_RI  /* Clear MSR_RI */
-   .endif
-   .if \hsrr == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   FTR_SECTION_ELSE
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   .endif
-   LOAD_HANDLER(r10, \label\())
-   .if \hsrr == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   mtspr   SPRN_HSRR0,r10
-   HRFI_TO_KERNEL
-   FTR_SECTION_ELSE
-   mtspr   SPRN_SRR0,r10
-   RFI_TO_KERNE

[PATCH v3 11/32] powerpc/64s/exception: move soft-mask test to common code

2020-02-25 Thread Nicholas Piggin
As well as moving code out of the unrelocated vectors, this allows the
masked handlers to be moved to common code, and allows the soft_nmi
handler to be generated more like a regular handler.
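
Stripped down, the test that moves into the common body is
(simplified from the hunks below):

	lbz	r10,PACAIRQSOFTMASK(r13)
	andi.	r10,r10,IMASK
	bne	masked_interrupt	/* record and replay the IRQ later */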

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 106 +--
 1 file changed, 49 insertions(+), 57 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 112cdb446e03..a23f2450f9ed 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -411,36 +411,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
KVMTEST \name IHSRR IVEC
.endif
-   .if IMASK
-   lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,IMASK
-   /* Associate vector numbers with bits in paca->irq_happened */
-   .if IVEC == 0x500 || IVEC == 0xea0
-   li  r10,PACA_IRQ_EE
-   .elseif IVEC == 0x900
-   li  r10,PACA_IRQ_DEC
-   .elseif IVEC == 0xa00 || IVEC == 0xe80
-   li  r10,PACA_IRQ_DBELL
-   .elseif IVEC == 0xe60
-   li  r10,PACA_IRQ_HMI
-   .elseif IVEC == 0xf00
-   li  r10,PACA_IRQ_PMI
-   .else
-   .abort "Bad maskable vector"
-   .endif
-
-   .if IHSRR == EXC_HV_OR_STD
-   BEGIN_FTR_SECTION
-   bne masked_Hinterrupt
-   FTR_SECTION_ELSE
-   bne masked_interrupt
-   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
-   .elseif IHSRR
-   bne masked_Hinterrupt
-   .else
-   bne masked_interrupt
-   .endif
-   .endif
 
std r11,IAREA+EX_R11(r13)
std r12,IAREA+EX_R12(r13)
@@ -524,6 +494,37 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 .endm
 
 .macro __GEN_COMMON_BODY name
+   .if IMASK
+   lbz r10,PACAIRQSOFTMASK(r13)
+   andi.   r10,r10,IMASK
+   /* Associate vector numbers with bits in paca->irq_happened */
+   .if IVEC == 0x500 || IVEC == 0xea0
+   li  r10,PACA_IRQ_EE
+   .elseif IVEC == 0x900
+   li  r10,PACA_IRQ_DEC
+   .elseif IVEC == 0xa00 || IVEC == 0xe80
+   li  r10,PACA_IRQ_DBELL
+   .elseif IVEC == 0xe60
+   li  r10,PACA_IRQ_HMI
+   .elseif IVEC == 0xf00
+   li  r10,PACA_IRQ_PMI
+   .else
+   .abort "Bad maskable vector"
+   .endif
+
+   .if IHSRR == EXC_HV_OR_STD
+   BEGIN_FTR_SECTION
+   bne masked_Hinterrupt
+   FTR_SECTION_ELSE
+   bne masked_interrupt
+   ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
+   .elseif IHSRR
+   bne masked_Hinterrupt
+   .else
+   bne masked_interrupt
+   .endif
+   .endif
+
.if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
mr  r10,r1  /* Save r1  */
@@ -2330,18 +2331,10 @@ EXC_VIRT_NONE(0x5800, 0x100)
 
 #ifdef CONFIG_PPC_WATCHDOG
 
-#define MASKED_DEC_HANDLER_LABEL 3f
-
-#define MASKED_DEC_HANDLER(_H) \
-3: /* soft-nmi */  \
-   std r12,PACA_EXGEN+EX_R12(r13); \
-   GET_SCRATCH0(r10);  \
-   std r10,PACA_EXGEN+EX_R13(r13); \
-   mfspr   r11,SPRN_SRR0;  /* save SRR0 */ \
-   mfspr   r12,SPRN_SRR1;  /* and SRR1 */  \
-   LOAD_HANDLER(r10, soft_nmi_common); \
-   mtctr   r10;\
-   bctr
+INT_DEFINE_BEGIN(soft_nmi)
+   IVEC=0x900
+   ISTACK=0
+INT_DEFINE_END(soft_nmi)
 
 /*
  * Branch to soft_nmi_interrupt using the emergency stack. The emergency
@@ -2353,19 +2346,16 @@ EXC_VIRT_NONE(0x5800, 0x100)
  * and run it entirely with interrupts hard disabled.
  */
 EXC_COMMON_BEGIN(soft_nmi_common)
+   mfspr   r11,SPRN_SRR0
mr  r10,r1
ld  r1,PACAEMERGSP(r13)
subir1,r1,INT_FRAME_SIZE
-   __ISTACK(decrementer)=0
-   __GEN_COMMON_BODY decrementer
+   __GEN_COMMON_BODY soft_nmi
bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
b   ret_from_except
 
-#else /* CONFIG_PPC_WATCHDOG */
-#define MASKED_DEC_HANDLER_LABEL 2f /* normal return */
-#define MASKED_DEC_HANDLER(_H)
 #endif /* CONFIG_PPC_WATCHDOG */
 
 /*
@@ -2384,7 +2374,6 @@ masked_Hinterrupt:
.else
 masked_interrupt:
.

[PATCH v3 14/32] powerpc/64s/exception: remove the SPR saving patch code macros

2020-02-25 Thread Nicholas Piggin
These are used infrequently enough that they don't provide much help, so
inline them.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 82 ++--
 1 file changed, 28 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index f4f35d01fe00..feb563416abd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -110,46 +110,6 @@ name:
 #define EXC_HV 1
 #define EXC_STD0
 
-/*
- * PPR save/restore macros used in exceptions-64s.S
- * Used for P7 or later processors
- */
-#define SAVE_PPR(area, ra) \
-BEGIN_FTR_SECTION_NESTED(940)  \
-   ld  ra,area+EX_PPR(r13);/* Read PPR from paca */\
-   std ra,_PPR(r1);\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
-
-#define RESTORE_PPR_PACA(area, ra) \
-BEGIN_FTR_SECTION_NESTED(941)  \
-   ld  ra,area+EX_PPR(r13);\
-   mtspr   SPRN_PPR,ra;\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
-
-/*
- * Get an SPR into a register if the CPU has the given feature
- */
-#define OPT_GET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mfspr   ra,spr; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Set an SPR from a register if the CPU has the given feature
- */
-#define OPT_SET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mtspr   spr,ra; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Save a register to the PACA if the CPU has the given feature
- */
-#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   std ra,offset(r13); \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -278,18 +238,18 @@ do_define_int n
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
.else
-BEGIN_FTR_SECTION_NESTED(947)
+BEGIN_FTR_SECTION
ld  r10,IAREA+EX_CFAR(r13)
std r10,HSTATE_CFAR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
.endif
 
ld  r10,PACA_EXGEN+EX_CTR(r13)
mtctr   r10
-BEGIN_FTR_SECTION_NESTED(948)
+BEGIN_FTR_SECTION
ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r11,IAREA+EX_R11(r13)
ld  r12,IAREA+EX_R12(r13)
std r12,HSTATE_SCRATCH0(r13)
@@ -386,10 +346,14 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
SET_SCRATCH0(r13)   /* save r13 */
GET_PACA(r13)
std r9,IAREA+EX_R9(r13) /* save r9 */
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+   mfspr   r9,SPRN_PPR
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
HMT_MEDIUM
std r10,IAREA+EX_R10(r13)   /* save r10 - r12 */
-   OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+   mfspr   r10,SPRN_CFAR
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
.if \ool
.if !\virt
b   tramp_real_\name
@@ -402,8 +366,12 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.endif
.endif
 
-   OPT_SAVE_REG_TO_PACA(IAREA+EX_PPR, r9, CPU_FTR_HAS_PPR)
-   OPT_SAVE_REG_TO_PACA(IAREA+EX_CFAR, r10, CPU_FTR_CFAR)
+BEGIN_FTR_SECTION
+   std r9,IAREA+EX_PPR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+BEGIN_FTR_SECTION
+   std r10,IAREA+EX_CFAR(r13)
+END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
INTERRUPT_TO_KERNEL
mfctr   r10
std r10,IAREA+EX_CTR(r13)
@@ -558,7 +526,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
.endif
beq 101f/* if from kernel mode  */
ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
-   SAVE_PPR(IAREA, r9)
+BEGIN_FTR_SECTION
+   ld  r9,IAREA+EX_PPR(r13)/* Read PPR from paca   */
+   std r9,_PPR(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 101:
.else
.if IKUAP
@@ -598,10 +569,10 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
std r10,_DSISR(r1)
.endif
 
-BEGIN_FT

[PATCH v3 13/32] powerpc/64s/exception: remove confusing IEARLY option

2020-02-25 Thread Nicholas Piggin
Replace IEARLY=1 and IEARLY=2 with IBRANCH_TO_COMMON, which controls
whether the entry code branches to a common handler, and
IREALMODE_COMMON, which controls whether the common handler should
remain in real mode.

These special cases no longer avoid loading the SRR registers; there
is no point, as most of them load the registers immediately anyway.
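
In terms of the old encoding, the mapping is roughly:

	IEARLY=1  ->  IREALMODE_COMMON=1 (branch to a common handler
		      which stays in real mode: machine check, HMI early)
	IEARLY=2  ->  IBRANCH_TO_COMMON=0 (no branch at all, the entry
		      code falls through: the denorm exception)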

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 48 ++--
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index eb2f6ee4d652..f4f35d01fe00 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -174,7 +174,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define IDAR   .L_IDAR_\name\()
 #define IDSISR .L_IDSISR_\name\()
 #define ISET_RI.L_ISET_RI_\name\()
-#define IEARLY .L_IEARLY_\name\()
+#define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\()
+#define IREALMODE_COMMON   .L_IREALMODE_COMMON_\name\()
 #define IMASK  .L_IMASK_\name\()
 #define IKVM_SKIP  .L_IKVM_SKIP_\name\()
 #define IKVM_REAL  .L_IKVM_REAL_\name\()
@@ -218,8 +219,15 @@ do_define_int n
.ifndef ISET_RI
ISET_RI=1
.endif
-   .ifndef IEARLY
-   IEARLY=0
+   .ifndef IBRANCH_TO_COMMON
+   IBRANCH_TO_COMMON=1
+   .endif
+   .ifndef IREALMODE_COMMON
+   IREALMODE_COMMON=0
+   .else
+   .if ! IBRANCH_TO_COMMON
+   .error "IREALMODE_COMMON=1 but IBRANCH_TO_COMMON=0"
+   .endif
.endif
.ifndef IMASK
IMASK=0
@@ -353,6 +361,11 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
  */
 
 .macro GEN_BRANCH_TO_COMMON name, virt
+   .if IREALMODE_COMMON
+   LOAD_HANDLER(r10, \name\()_common)
+   mtctr   r10
+   bctr
+   .else
.if \virt
 #ifndef CONFIG_RELOCATABLE
b   \name\()_common_virt
@@ -366,6 +379,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mtctr   r10
bctr
.endif
+   .endif
 .endm
 
 .macro GEN_INT_ENTRY name, virt, ool=0
@@ -421,11 +435,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if IEARLY == 2
-   /* nothing more */
-   .elseif IEARLY
-   BRANCH_TO_C000(r11, \name\()_common)
-   .else
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
@@ -441,6 +450,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mfspr   r11,SPRN_SRR0   /* save SRR0 */
mfspr   r12,SPRN_SRR1   /* and SRR1 */
.endif
+
+   .if IBRANCH_TO_COMMON
GEN_BRANCH_TO_COMMON \name \virt
.endif
 
@@ -926,6 +937,7 @@ INT_DEFINE_BEGIN(machine_check_early)
IVEC=0x200
IAREA=PACA_EXMC
IVIRT=0 /* no virt entry point */
+   IREALMODE_COMMON=1
/*
 * MSR_RI is not enabled, because PACA_EXMC is being used, so a
 * nested machine check corrupts it. machine_check_common enables
@@ -933,7 +945,6 @@ INT_DEFINE_BEGIN(machine_check_early)
 */
ISET_RI=0
ISTACK=0
-   IEARLY=1
IDAR=1
IDSISR=1
IRECONCILE=0
@@ -973,9 +984,6 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
EXCEPTION_RESTORE_REGS EXC_STD
 
 EXC_COMMON_BEGIN(machine_check_early_common)
-   mfspr   r11,SPRN_SRR0
-   mfspr   r12,SPRN_SRR1
-
/*
 * Switch to mc_emergency stack and handle re-entrancy (we limit
 * the nested MCE upto level 4 to avoid stack overflow).
@@ -1809,7 +1817,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
 INT_DEFINE_BEGIN(hmi_exception_early)
IVEC=0xe60
IHSRR=EXC_HV
-   IEARLY=1
+   IREALMODE_COMMON=1
ISTACK=0
IRECONCILE=0
IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
@@ -1829,8 +1837,6 @@ EXC_REAL_END(hmi_exception, 0xe60, 0x20)
 EXC_VIRT_NONE(0x4e60, 0x20)
 
 EXC_COMMON_BEGIN(hmi_exception_early_common)
-   mfspr   r11,SPRN_HSRR0  /* Save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* Save HSRR1 */
mr  r10,r1  /* Save r1 */
ld  r1,PACAEMERGSP(r13) /* Use emergency stack for realmode */
subir1,r1,INT_FRAME_SIZE/* alloc stack frame*/
@@ -2156,29 +2162,23 @@ EXC_VIRT_NONE(0x5400, 0x100)
 INT_DEFINE_BEGIN(denorm_exception)
IVEC=0x1500
IHSRR=EXC_HV
-   IEARLY=2
+   IBRANCH_TO_COMMON=0
IKVM_REAL=1
 INT_DEFINE_END(denorm_exception)
 
 EXC_REAL_BEGIN(denorm_exception, 0x1500, 0x100)
GEN_INT_ENTRY denorm_exception, virt=0
 #ifdef CONFIG_PPC_DENORMALISATION
-   mfspr   r10,SPRN_HSRR1
-   andis.  r10,r10,(HSRR1_DENORM)@h /* denorm

[PATCH v3 12/32] powerpc/64s/exception: move KVM test to common code

2020-02-25 Thread Nicholas Piggin
This allows more code to be moved out of unrelocated regions. The system
call KVMTEST is changed to be open-coded and remain in the tramp area to
avoid having to move it to entry_64.S. The custom nature of the system
call entry code means the hcall case can be made more streamlined than
regular interrupt handlers.
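
For reference, the test itself is only a few instructions, so the
open-coded syscall variant keeps essentially this in the trampoline
(sketch, cf. the hunks below):

	lbz	r10,HSTATE_IN_GUEST(r13)
	cmpwi	r10,0
	bne	system_call_kvm		/* interrupted a guest, go to KVM */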

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S| 239 
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  11 --
 arch/powerpc/kvm/book3s_segment.S   |   7 -
 3 files changed, 119 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index a23f2450f9ed..eb2f6ee4d652 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -44,7 +44,6 @@
  * EXC_VIRT_BEGIN/END  - virt (AIL), unrelocated exception vectors
  * TRAMP_REAL_BEGIN- real, unrelocated helpers (virt may call these)
  * TRAMP_VIRT_BEGIN- virt, unreloc helpers (in practice, real can use)
- * TRAMP_KVM_BEGIN - KVM handlers, these are put into real, unrelocated
  * EXC_COMMON  - After switching to virtual, relocated mode.
  */
 
@@ -74,13 +73,6 @@ name:
 #define TRAMP_VIRT_BEGIN(name) \
FIXED_SECTION_ENTRY_BEGIN(virt_trampolines, name)
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
-#define TRAMP_KVM_BEGIN(name)  \
-   TRAMP_VIRT_BEGIN(name)
-#else
-#define TRAMP_KVM_BEGIN(name)
-#endif
-
 #define EXC_REAL_NONE(start, size) \
FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, 
exc_real_##start##_##unused, start, size); \
FIXED_SECTION_ENTRY_END_LOCATION(real_vectors, 
exc_real_##start##_##unused, start, size)
@@ -271,6 +263,9 @@ do_define_int n
 .endm
 
 .macro GEN_KVM name
+   .balign IFETCH_ALIGN_BYTES
+\name\()_kvm:
+
.if IKVM_SKIP
cmpwi   r10,KVM_GUEST_MODE_SKIP
beq 89f
@@ -281,13 +276,18 @@ BEGIN_FTR_SECTION_NESTED(947)
 END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
.endif
 
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
 BEGIN_FTR_SECTION_NESTED(948)
ld  r10,IAREA+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
 END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-   ld  r10,IAREA+EX_R10(r13)
+   ld  r11,IAREA+EX_R11(r13)
+   ld  r12,IAREA+EX_R12(r13)
std r12,HSTATE_SCRATCH0(r13)
sldir12,r9,32
+   ld  r9,IAREA+EX_R9(r13)
+   ld  r10,IAREA+EX_R10(r13)
/* HSRR variants have the 0x2 bit added to their trap number */
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
@@ -300,29 +300,16 @@ 
END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
.else
ori r12,r12,(IVEC)
.endif
-
-#ifdef CONFIG_RELOCATABLE
-   /*
-* KVM requires __LOAD_FAR_HANDLER beause kvmppc_interrupt lives
-* outside the head section. CONFIG_RELOCATABLE KVM expects CTR
-* to be saved in HSTATE_SCRATCH1.
-*/
-   ld  r9,IAREA+EX_CTR(r13)
-   std r9,HSTATE_SCRATCH1(r13)
-   __LOAD_FAR_HANDLER(r9, kvmppc_interrupt)
-   mtctr   r9
-   ld  r9,IAREA+EX_R9(r13)
-   bctr
-#else
-   ld  r9,IAREA+EX_R9(r13)
b   kvmppc_interrupt
-#endif
-
 
.if IKVM_SKIP
 89:mtocrf  0x80,r9
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
ld  r9,IAREA+EX_R9(r13)
ld  r10,IAREA+EX_R10(r13)
+   ld  r11,IAREA+EX_R11(r13)
+   ld  r12,IAREA+EX_R12(r13)
.if IHSRR == EXC_HV_OR_STD
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
@@ -407,11 +394,6 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
mfctr   r10
std r10,IAREA+EX_CTR(r13)
mfcrr9
-
-   .if (!\virt && IKVM_REAL) || (\virt && IKVM_VIRT)
-   KVMTEST \name IHSRR IVEC
-   .endif
-
std r11,IAREA+EX_R11(r13)
std r12,IAREA+EX_R12(r13)
 
@@ -475,6 +457,10 @@ END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
 .macro __GEN_COMMON_ENTRY name
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
+   .if IKVM_REAL
+   KVMTEST \name IHSRR IVEC
+   .endif
+
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
/* MSR[RI] is clear iff using SRR regs */
.if IHSRR == EXC_HV_OR_STD
@@ -487,9 +473,17 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
mtmsrd  r10
 
.if IVIRT
+   .if IKVM_VIRT
+   b   1f /* skip the virt test coming from real */
+   .endif
+
.balign IFETCH_ALIGN_BYTES
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
+   .if IKVM_VIRT
+   KVMTEST \name IHSRR IVEC
+1:
+   .endif
.endif /* IVIRT */
 .endm
 
@@ -848,8 +842,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE |

[PATCH v3 15/32] powerpc/64s/exception: trim unused arguments from KVMTEST macro

2020-02-25 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index feb563416abd..7e056488d42a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -224,7 +224,7 @@ do_define_int n
 #define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
lbz r10,HSTATE_IN_GUEST(r13)
cmpwi   r10,0
bne \name\()_kvm
@@ -293,7 +293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 .endm
 
 #else
-.macro KVMTEST name, hsrr, n
+.macro KVMTEST name
 .endm
 .macro GEN_KVM name
 .endm
@@ -437,7 +437,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
 DEFINE_FIXED_SYMBOL(\name\()_common_real)
 \name\()_common_real:
.if IKVM_REAL
-   KVMTEST \name IHSRR IVEC
+   KVMTEST \name
.endif
 
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
@@ -460,7 +460,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
 DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 \name\()_common_virt:
.if IKVM_VIRT
-   KVMTEST \name IHSRR IVEC
+   KVMTEST \name
 1:
.endif
.endif /* IVIRT */
@@ -1582,7 +1582,7 @@ INT_DEFINE_END(system_call)
GET_PACA(r13)
std r10,PACA_EXGEN+EX_R10(r13)
INTERRUPT_TO_KERNEL
-   KVMTEST system_call EXC_STD 0xc00 /* uses r10, branch to 
system_call_kvm */
+   KVMTEST system_call /* uses r10, branch to system_call_kvm */
mfctr   r9
 #else
mr  r9,r13
-- 
2.23.0



[PATCH v3 16/32] powerpc/64s/exception: hdecrementer avoid touching the stack

2020-02-25 Thread Nicholas Piggin
The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exits. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].
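
The resulting handler never loads or stores through r1; it restores
everything from the PACA save area and returns (abridged from the
hunk below):

	ld	r10,PACA_EXGEN+EX_CTR(r13)	/* all state is in the PACA */
	mtctr	r10
	mtcrf	0x80,r9
	ld	r9,PACA_EXGEN+EX_R9(r13)
	...
	ld	r13,PACA_EXGEN+EX_R13(r13)	/* r13 last, it holds the PACA */
	HRFI_TO_KERNEL			/* no stack frame was ever built */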

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/time.h  |  1 -
 arch/powerpc/kernel/exceptions-64s.S | 25 -
 arch/powerpc/kernel/time.c   |  9 -
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 08dbe3e6831c..e0107495c4de 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -24,7 +24,6 @@ extern struct clock_event_device decrementer_clockevent;
 
 
 extern void generic_calibrate_decr(void);
-extern void hdec_interrupt(struct pt_regs *regs);
 
 /* Some sane defaults: 125 MHz timebase, 1GHz processor */
 extern unsigned long ppc_proc_freq;
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 7e056488d42a..f87dc4bf937d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1491,6 +1491,8 @@ EXC_COMMON_BEGIN(decrementer_common)
 INT_DEFINE_BEGIN(hdecrementer)
IVEC=0x980
IHSRR=EXC_HV
+   ISTACK=0
+   IRECONCILE=0
IKVM_REAL=1
IKVM_VIRT=1
 INT_DEFINE_END(hdecrementer)
@@ -1502,11 +1504,24 @@ EXC_VIRT_BEGIN(hdecrementer, 0x4980, 0x80)
GEN_INT_ENTRY hdecrementer, virt=1
 EXC_VIRT_END(hdecrementer, 0x4980, 0x80)
 EXC_COMMON_BEGIN(hdecrementer_common)
-   GEN_COMMON hdecrementer
-   bl  save_nvgprs
-   addir3,r1,STACK_FRAME_OVERHEAD
-   bl  hdec_interrupt
-   b   ret_from_except
+   __GEN_COMMON_ENTRY hdecrementer
+   /*
+* Hypervisor decrementer interrupts not caught by the KVM test
+* shouldn't occur but are sometimes left pending on exit from a KVM
+* guest.  We don't need to do anything to clear them, as they are
+* edge-triggered.
+*
+* Be careful to avoid touching the kernel stack.
+*/
+   ld  r10,PACA_EXGEN+EX_CTR(r13)
+   mtctr   r10
+   mtcrf   0x80,r9
+   ld  r9,PACA_EXGEN+EX_R9(r13)
+   ld  r10,PACA_EXGEN+EX_R10(r13)
+   ld  r11,PACA_EXGEN+EX_R11(r13)
+   ld  r12,PACA_EXGEN+EX_R12(r13)
+   ld  r13,PACA_EXGEN+EX_R13(r13)
+   HRFI_TO_KERNEL
 
GEN_KVM hdecrementer
 
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 1168e8b37e30..bda9cb4a0a5f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -663,15 +663,6 @@ void timer_broadcast_interrupt(void)
 }
 #endif
 
-/*
- * Hypervisor decrementer interrupts shouldn't occur but are sometimes
- * left pending on exit from a KVM guest.  We don't need to do anything
- * to clear them, as they are edge-triggered.
- */
-void hdec_interrupt(struct pt_regs *regs)
-{
-}
-
 #ifdef CONFIG_SUSPEND
 static void generic_suspend_disable_irqs(void)
 {
-- 
2.23.0



[PATCH v3 17/32] powerpc/64s/exception: re-inline some handlers

2020-02-25 Thread Nicholas Piggin
The reduction in interrupt entry size allows some handlers to be
re-inlined.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index f87dc4bf937d..ae0e68899f0e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1186,7 +1186,7 @@ INT_DEFINE_BEGIN(data_access)
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
-   GEN_INT_ENTRY data_access, virt=0, ool=1
+   GEN_INT_ENTRY data_access, virt=0
 EXC_REAL_END(data_access, 0x300, 0x80)
 EXC_VIRT_BEGIN(data_access, 0x4300, 0x80)
GEN_INT_ENTRY data_access, virt=1
@@ -1216,7 +1216,7 @@ INT_DEFINE_BEGIN(data_access_slb)
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   GEN_INT_ENTRY data_access_slb, virt=0, ool=1
+   GEN_INT_ENTRY data_access_slb, virt=0
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
GEN_INT_ENTRY data_access_slb, virt=1
@@ -1472,7 +1472,7 @@ INT_DEFINE_BEGIN(decrementer)
 INT_DEFINE_END(decrementer)
 
 EXC_REAL_BEGIN(decrementer, 0x900, 0x80)
-   GEN_INT_ENTRY decrementer, virt=0, ool=1
+   GEN_INT_ENTRY decrementer, virt=0
 EXC_REAL_END(decrementer, 0x900, 0x80)
 EXC_VIRT_BEGIN(decrementer, 0x4900, 0x80)
GEN_INT_ENTRY decrementer, virt=1
-- 
2.23.0



[PATCH v3 18/32] powerpc/64s/exception: Clean up SRR specifiers

2020-02-25 Thread Nicholas Piggin
Remove more magic numbers and replace with nicely named bools.
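
That is (per the hunks below):

	IHSRR=EXC_STD        becomes  the default (IHSRR=0)
	IHSRR=EXC_HV         becomes  IHSRR=1
	IHSRR=EXC_HV_OR_STD  becomes  IHSRR_IF_HVMODE=1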

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 68 +---
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ae0e68899f0e..b01ff51892dc 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -105,11 +105,6 @@ name:
ori reg,reg,(ABS_ADDR(label))@l;\
addis   reg,reg,(ABS_ADDR(label))@h
 
-/* Exception register prefixes */
-#define EXC_HV_OR_STD  2 /* depends on HVMODE */
-#define EXC_HV 1
-#define EXC_STD0
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -128,6 +123,7 @@ name:
  */
 #define IVEC   .L_IVEC_\name\()
 #define IHSRR  .L_IHSRR_\name\()
+#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\()
 #define IAREA  .L_IAREA_\name\()
 #define IVIRT  .L_IVIRT_\name\()
 #define IISIDE .L_IISIDE_\name\()
@@ -159,7 +155,10 @@ do_define_int n
.error "IVEC not defined"
.endif
.ifndef IHSRR
-   IHSRR=EXC_STD
+   IHSRR=0
+   .endif
+   .ifndef IHSRR_IF_HVMODE
+   IHSRR_IF_HVMODE=0
.endif
.ifndef IAREA
IAREA=PACA_EXGEN
@@ -257,7 +256,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r9,IAREA+EX_R9(r13)
ld  r10,IAREA+EX_R10(r13)
/* HSRR variants have the 0x2 bit added to their trap number */
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
ori r12,r12,(IVEC + 0x2)
FTR_SECTION_ELSE
@@ -278,7 +277,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r10,IAREA+EX_R10(r13)
ld  r11,IAREA+EX_R11(r13)
ld  r12,IAREA+EX_R12(r13)
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
b   kvmppc_skip_Hinterrupt
FTR_SECTION_ELSE
@@ -403,7 +402,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
stw r10,IAREA+EX_DSISR(r13)
.endif
 
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
@@ -485,7 +484,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
.abort "Bad maskable vector"
.endif
 
-   .if IHSRR == EXC_HV_OR_STD
+   .if IHSRR_IF_HVMODE
BEGIN_FTR_SECTION
bne masked_Hinterrupt
FTR_SECTION_ELSE
@@ -618,12 +617,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
  * Restore all registers including H/SRR0/1 saved in a stack frame of a
  * standard exception.
  */
-.macro EXCEPTION_RESTORE_REGS hsrr
+.macro EXCEPTION_RESTORE_REGS hsrr=0
/* Move original SRR0 and SRR1 into the respective regs */
ld  r9,_MSR(r1)
-   .if \hsrr == EXC_HV_OR_STD
-   .error "EXC_HV_OR_STD Not implemented for EXCEPTION_RESTORE_REGS"
-   .endif
.if \hsrr
mtspr   SPRN_HSRR1,r9
.else
@@ -898,7 +894,7 @@ EXC_COMMON_BEGIN(system_reset_common)
ld  r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
 
-   EXCEPTION_RESTORE_REGS EXC_STD
+   EXCEPTION_RESTORE_REGS
RFI_TO_USER_OR_KERNEL
 
GEN_KVM system_reset
@@ -952,7 +948,7 @@ TRAMP_REAL_BEGIN(machine_check_fwnmi)
lhz r12,PACA_IN_MCE(r13);   \
subir12,r12,1;  \
sth r12,PACA_IN_MCE(r13);   \
-   EXCEPTION_RESTORE_REGS EXC_STD
+   EXCEPTION_RESTORE_REGS
 
 EXC_COMMON_BEGIN(machine_check_early_common)
/*
@@ -1321,7 +1317,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(hardware_interrupt)
IVEC=0x500
-   IHSRR=EXC_HV_OR_STD
+   IHSRR_IF_HVMODE=1
IMASK=IRQS_DISABLED
IKVM_REAL=1
IKVM_VIRT=1
@@ -1490,7 +1486,7 @@ EXC_COMMON_BEGIN(decrementer_common)
 
 INT_DEFINE_BEGIN(hdecrementer)
IVEC=0x980
-   IHSRR=EXC_HV
+   IHSRR=1
ISTACK=0
IRECONCILE=0
IKVM_REAL=1
@@ -1719,7 +1715,7 @@ EXC_COMMON_BEGIN(single_step_common)
 
 INT_DEFINE_BEGIN(h_data_storage)
IVEC=0xe00
-   IHSRR=EXC_HV
+   IHSRR=1
IDAR=1
IDSISR=1
IKVM_SKIP=1
@@ -1751,7 +1747,7 @@ ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
 
 INT_DEFINE_BEGIN(h_instr_storage)
IVEC=0xe20
-   IHSRR=EXC_HV
+   IHSRR=1
IKVM_REAL=1
IKVM_VIRT=1
 INT_DEFINE_END(h_instr_storage)
@@ -1774,7 +1770,7 @@ EXC_COMMON_BEGIN(h_instr_storage_common)
 
 INT_DEFINE_BEGIN(emulation_assist)
IVEC=0xe40
-   IHSR

[PATCH v3 19/32] powerpc/64s/exception: add more comments for interrupt handlers

2020-02-25 Thread Nicholas Piggin
A few of the non-standard handlers are left uncommented, and more
description could still be added to some.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 391 ---
 1 file changed, 353 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index b01ff51892dc..e976cbf4f4aa 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -121,26 +121,26 @@ name:
 /*
  * Interrupt code generation macros
  */
-#define IVEC   .L_IVEC_\name\()
-#define IHSRR  .L_IHSRR_\name\()
-#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\()
-#define IAREA  .L_IAREA_\name\()
-#define IVIRT  .L_IVIRT_\name\()
-#define IISIDE .L_IISIDE_\name\()
-#define IDAR   .L_IDAR_\name\()
-#define IDSISR .L_IDSISR_\name\()
-#define ISET_RI.L_ISET_RI_\name\()
-#define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\()
-#define IREALMODE_COMMON   .L_IREALMODE_COMMON_\name\()
-#define IMASK  .L_IMASK_\name\()
-#define IKVM_SKIP  .L_IKVM_SKIP_\name\()
-#define IKVM_REAL  .L_IKVM_REAL_\name\()
+#define IVEC   .L_IVEC_\name\()/* Interrupt vector address */
+#define IHSRR  .L_IHSRR_\name\()   /* Sets SRR or HSRR registers */
+#define IHSRR_IF_HVMODE.L_IHSRR_IF_HVMODE_\name\() /* HSRR if HV else 
SRR */
+#define IAREA  .L_IAREA_\name\()   /* PACA save area */
+#define IVIRT  .L_IVIRT_\name\()   /* Has virt mode entry point */
+#define IISIDE .L_IISIDE_\name\()  /* Uses SRR0/1 not DAR/DSISR */
+#define IDAR   .L_IDAR_\name\()/* Uses DAR (or SRR0) */
+#define IDSISR .L_IDSISR_\name\()  /* Uses DSISR (or SRR1) */
+#define ISET_RI.L_ISET_RI_\name\() /* Run common code w/ 
MSR[RI]=1 */
+#define IBRANCH_TO_COMMON  .L_IBRANCH_TO_COMMON_\name\() /* ENTRY branch 
to common */
+#define IREALMODE_COMMON   .L_IREALMODE_COMMON_\name\() /* Common runs in 
realmode */
+#define IMASK  .L_IMASK_\name\()   /* IRQ soft-mask bit */
+#define IKVM_SKIP  .L_IKVM_SKIP_\name\()   /* Generate KVM skip handler */
+#define IKVM_REAL  .L_IKVM_REAL_\name\()   /* Real entry tests KVM */
 #define __IKVM_REAL(name)  .L_IKVM_REAL_ ## name
-#define IKVM_VIRT  .L_IKVM_VIRT_\name\()
-#define ISTACK .L_ISTACK_\name\()
+#define IKVM_VIRT  .L_IKVM_VIRT_\name\()   /* Virt entry tests KVM */
+#define ISTACK .L_ISTACK_\name\()  /* Set regular kernel stack */
 #define __ISTACK(name) .L_ISTACK_ ## name
-#define IRECONCILE .L_IRECONCILE_\name\()
-#define IKUAP  .L_IKUAP_\name\()
+#define IRECONCILE .L_IRECONCILE_\name\()  /* Do RECONCILE_IRQ_STATE */
+#define IKUAP  .L_IKUAP_\name\()   /* Do KUAP lock */
 
 #define INT_DEFINE_BEGIN(n)\
 .macro int_define_ ## n name
@@ -759,6 +759,39 @@ __start_interrupts:
 EXC_VIRT_NONE(0x4000, 0x100)
 
 
+/**
+ * Interrupt 0x100 - System Reset Interrupt (SRESET aka NMI).
+ * This is a non-maskable, asynchronous interrupt always taken in real-mode.
+ * It is caused by:
+ * - Wake from power-saving state, on powernv.
+ * - An NMI from another CPU, triggered by firmware or hypercall.
+ * - As crash/debug signal injected from BMC, firmware or hypervisor.
+ *
+ * Handling:
+ * Power-save wakeup is the only performance critical path, so this is
+ * determined quickly as possible first. In this case volatile registers
+ * can be discarded and SPRs like CFAR don't need to be read.
+ *
+ * If not a powersave wakeup, then it's run as a regular interrupt, however
+ * it uses its own stack and PACA save area to preserve the regular kernel
+ * environment for debugging.
+ *
+ * This interrupt is not maskable, so triggering it when MSR[RI] is clear,
+ * or SCRATCH0 is in use, etc. may cause a crash. It's also not entirely
+ * correct to switch to virtual mode to run the regular interrupt handler
+ * because it might be interrupted when the MMU is in a bad state (e.g., SLB
+ * is clear).
+ *
+ * FWNMI:
+ * PAPR specifies a "fwnmi" facility which sends the sreset to a different
+ * entry point with a different register set up. Some hypervisors will
+ * send the sreset to 0x100 in the guest if it is not fwnmi capable.
+ *
+ * KVM:
+ * Unlike most SRR interrupts, this may be taken by the host while executing
+ * in a guest, so a KVM test is required. KVM will pull the CPU out of guest
+ * mode and then raise the sreset.
+ */
 INT_DEFINE_BEGIN(system_reset)
IVEC=0x100
IAREA=PACA_EXNMI
@@ -834,6 +867,7 @@ TRAMP_REAL_BEGIN(system_reset_idle_wake)
  * Vectors for the FWNMI option.  Share common code.
  */
 TRAMP_REAL_BEGIN(system_reset_fwnmi)
+   /* XXX: fwnmi guest could run a nested/PR guest, so why no test?  */
__IKVM_REAL(system_reset)=0
GEN_INT_ENTR

[PATCH v3 20/32] powerpc/64s/exception: only test KVM in SRR interrupts when PR KVM is supported

2020-02-25 Thread Nicholas Piggin
Apart from SRESET, MCE, and syscall (hcall variant), the SRR type
interrupts are not escalated to hypervisor mode, so they are delivered
to the OS.

When running PR KVM, the OS is the hypervisor, and the guest runs with
MSR[PR]=1, so these interrupts must test if a guest was running when
interrupted. These tests are required at the real-mode entry points
because the PR KVM host runs with LPCR[AIL]=0.

In HV KVM and nested HV KVM, the guest always receives these interrupts,
so there is no need for the host to make this test. So remove the tests
if PR KVM is not configured.
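
Concretely, each such handler definition grows a conditional, in the
pattern of the hunks below:

	INT_DEFINE_BEGIN(program_check)
		IVEC=0x700
	#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
		IKVM_REAL=1	/* only a PR host takes this for a guest */
	#endif
	INT_DEFINE_END(program_check)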

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 65 ++--
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index e976cbf4f4aa..c23eb9c572b2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -214,9 +214,36 @@ do_define_int n
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
- * If hv is possible, interrupts come into to the hv version
- * of the kvmppc_interrupt code, which then jumps to the PR handler,
- * kvmppc_interrupt_pr, if the guest is a PR guest.
+ * All interrupts which set HSRR registers, as well as SRESET and MCE and
+ * syscall when invoked with "sc 1" switch to MSR[HV]=1 (HVMODE) to be taken,
+ * so they all generally need to test whether they were taken in guest context.
+ *
+ * Note: SRESET and MCE may also be sent to the guest by the hypervisor, and be
+ * taken with MSR[HV]=0.
+ *
+ * Interrupts which set SRR registers (with the above exceptions) do not
+ * elevate to MSR[HV]=1 mode, though most can be taken when running with
+ * MSR[HV]=1  (e.g., bare metal kernel and userspace). So these interrupts do
+ * not need to test whether a guest is running because they get delivered to
+ * the guest directly, including nested HV KVM guests.
+ *
+ * The exception is PR KVM, where the guest runs with MSR[PR]=1 and the host
+ * runs with MSR[HV]=0, so the host takes all interrupts on behalf of the
+ * guest. PR KVM runs with LPCR[AIL]=0 which causes interrupts to always be
+ * delivered to the real-mode entry point, therefore such interrupts only test
+ * KVM in their real mode handlers, and only when PR KVM is possible.
+ *
+ * Interrupts that are taken in MSR[HV]=0 and escalate to MSR[HV]=1 are always
+ * delivered in real-mode when the MMU is in hash mode because the MMU
+ * registers are not set appropriately to translate host addresses. In nested
+ * radix mode these can be delivered in virt-mode as the host translations are
+ * used implicitly (see: effective LPID, effective PID).
+ */
+
+/*
+ * If an interrupt is taken while a guest is running, it is immediately routed
+ * to KVM to handle. If both HV and PR KVM are possible, KVM interrupts go first
+ * to kvmppc_interrupt_hv, which handles the PR guest case.
  */
 #define kvmppc_interrupt kvmppc_interrupt_hv
 #else
@@ -1258,8 +1285,10 @@ INT_DEFINE_BEGIN(data_access)
IVEC=0x300
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_SKIP=1
IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access)
 
 EXC_REAL_BEGIN(data_access, 0x300, 0x80)
@@ -1306,8 +1335,10 @@ INT_DEFINE_BEGIN(data_access_slb)
IAREA=PACA_EXSLB
IRECONCILE=0
IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_SKIP=1
IKVM_REAL=1
+#endif
 INT_DEFINE_END(data_access_slb)
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
@@ -1357,7 +1388,9 @@ INT_DEFINE_BEGIN(instruction_access)
IISIDE=1
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access)
 
 EXC_REAL_BEGIN(instruction_access, 0x400, 0x80)
@@ -1396,7 +1429,9 @@ INT_DEFINE_BEGIN(instruction_access_slb)
IRECONCILE=0
IISIDE=1
IDAR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(instruction_access_slb)
 
 EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
@@ -1488,7 +1523,9 @@ INT_DEFINE_BEGIN(alignment)
IVEC=0x600
IDAR=1
IDSISR=1
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(alignment)
 
 EXC_REAL_BEGIN(alignment, 0x600, 0x100)
@@ -1518,7 +1555,9 @@ EXC_COMMON_BEGIN(alignment_common)
  */
 INT_DEFINE_BEGIN(program_check)
IVEC=0x700
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(program_check)
 
 EXC_REAL_BEGIN(program_check, 0x700, 0x100)
@@ -1581,7 +1620,9 @@ EXC_COMMON_BEGIN(program_check_common)
 INT_DEFINE_BEGIN(fp_unavailable)
IVEC=0x800
IRECONCILE=0
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
IKVM_REAL=1
+#endif
 INT_DEFINE_END(fp_unavailable)
 
 EXC_REAL_BEGIN(fp_unavailable, 0x800, 0x100)
@@ -1643,7 +1684,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
 INT_DEFINE_BEGIN(decrementer)
IVEC=0x900
IMASK=IRQS_DISAB

[PATCH v3 21/32] powerpc/64s/exception: sreset interrupts reconcile fix

2020-02-25 Thread Nicholas Piggin
This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to
matter much because we're not allowed to enable irqs in an NMI handler,
the soft-irq debugging code is becoming more strict about ensuring
IRQ_HARD_DIS is in sync with MSR[EE], this may help avoid asserts or
other issues.

Add a comment explaining why MCE does not have this. Early machine
check is generally much smaller and more contained code which will
explode if you look at it wrong anyway as it runs in real mode, though
there's an argument that we should do similar reconciling for the MCE
as well.
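
For illustration only, here is roughly what the asm below now records on
NMI entry, rendered as C (a sketch, not code from this patch; the field
and macro names are the kernel's existing paca/hw_irq ones):

#include <asm/paca.h>		/* local_paca, PACA_IRQ_HARD_DIS */
#include <asm/hw_irq.h>		/* IRQS_ALL_DISABLED */
#include <asm/ptrace.h>		/* struct pt_regs */

static void sreset_entry_reconcile_sketch(struct pt_regs *regs)
{
	/* Make irqs_disabled() report the truth inside the NMI */
	local_paca->irq_soft_mask = IRQS_ALL_DISABLED;
	/* Stash the old hard-disable state in the otherwise unused _DAR slot */
	regs->dar = local_paca->irq_happened;
	/* We run with MSR[EE]=0, so keep irq_happened in sync */
	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
}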

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index c23eb9c572b2..6ff5ea236b17 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -920,18 +920,19 @@ EXC_COMMON_BEGIN(system_reset_common)
__GEN_COMMON_BODY system_reset
bl  save_nvgprs
/*
-* Set IRQS_ALL_DISABLED unconditionally so arch_irqs_disabled does
+* Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does
 * the right thing. We do not want to reconcile because that goes
 * through irq tracing which we don't want in NMI.
 *
-* Save PACAIRQHAPPENED because some code will do a hard disable
-* (e.g., xmon). So we want to restore this back to where it was
-* when we return. DAR is unused in the stack, so save it there.
+* Save PACAIRQHAPPENED to _DAR (otherwise unused), and set HARD_DIS
+* as we are running with MSR[EE]=0.
 */
li  r10,IRQS_ALL_DISABLED
stb r10,PACAIRQSOFTMASK(r13)
lbz r10,PACAIRQHAPPENED(r13)
std r10,_DAR(r1)
+   ori r10,r10,PACA_IRQ_HARD_DIS
+   stb r10,PACAIRQHAPPENED(r13)
 
addir3,r1,STACK_FRAME_OVERHEAD
bl  system_reset_exception
@@ -976,6 +977,11 @@ EXC_COMMON_BEGIN(system_reset_common)
  * error detected there), determines if it was recoverable and logs the
  * event.
  *
+ * This early code does not "reconcile" irq soft-mask state like SRESET or
+ * regular interrupts do, so irqs_disabled() among other things may not work
+ * properly (irq disable/enable already doesn't work because irq tracing can
+ * not work in real mode).
+ *
  * Then, depending on the execution context when the interrupt is taken, there
  * are 3 main actions:
  * - Executing in kernel mode. The event is queued with irq_work, which means
-- 
2.23.0



[PATCH v3 22/32] powerpc/64s/exception: soft nmi interrupt should not use ret_from_except

2020-02-25 Thread Nicholas Piggin
The soft nmi handler does not reconcile interrupt state, so it should
not return via the normal ret_from_except path. Return like other NMIs,
using the EXCEPTION_RESTORE_REGS macro.

This becomes important when the scv interrupt is implemented, which
must handle soft-masked interrupts that have r13 set to something other
than the PACA -- returning to kernel in this case must restore r13.

Signed-off-by: Nicholas Piggin 
---
v3:
- save/restore irq soft mask state like other NMIs rather than a normal
  reconcile, to avoid soft mask warnings or possibly worse.

 arch/powerpc/kernel/exceptions-64s.S | 29 +++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 6ff5ea236b17..5ddfc32cacad 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -2713,6 +2713,7 @@ EXC_VIRT_NONE(0x5800, 0x100)
 INT_DEFINE_BEGIN(soft_nmi)
IVEC=0x900
ISTACK=0
+   IRECONCILE=0/* Soft-NMI may fire under local_irq_disable */
 INT_DEFINE_END(soft_nmi)
 
 /*
@@ -2731,9 +2732,35 @@ EXC_COMMON_BEGIN(soft_nmi_common)
subir1,r1,INT_FRAME_SIZE
__GEN_COMMON_BODY soft_nmi
bl  save_nvgprs
+
+   /*
+* Set IRQS_ALL_DISABLED and save PACAIRQHAPPENED (see
+* system_reset_common)
+*/
+   li  r10,IRQS_ALL_DISABLED
+   stb r10,PACAIRQSOFTMASK(r13)
+   lbz r10,PACAIRQHAPPENED(r13)
+   std r10,_DAR(r1)
+   ori r10,r10,PACA_IRQ_HARD_DIS
+   stb r10,PACAIRQHAPPENED(r13)
+
addir3,r1,STACK_FRAME_OVERHEAD
bl  soft_nmi_interrupt
-   b   ret_from_except
+
+   /* Clear MSR_RI before setting SRR0 and SRR1. */
+   li  r9,0
+   mtmsrd  r9,1
+
+   /*
+* Restore soft mask settings.
+*/
+   ld  r10,_DAR(r1)
+   stb r10,PACAIRQHAPPENED(r13)
+   ld  r10,SOFTE(r1)
+   stb r10,PACAIRQSOFTMASK(r13)
+
+   EXCEPTION_RESTORE_REGS hsrr=0
+   RFI_TO_KERNEL
 
 #endif /* CONFIG_PPC_WATCHDOG */
 
-- 
2.23.0



[PATCH v3 23/32] powerpc/64: system call remove non-volatile GPR save optimisation

2020-02-25 Thread Nicholas Piggin
powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame if
they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

system calls are the most difficult: some system calls always require
all registers (e.g., fork, to copy regs into the child).  Sometimes
registers are only required under certain conditions (e.g., tracing,
signal delivery). These cases require ugly logic in the call chains
(e.g., ppc_fork), and require a lot of logic to be implemented in asm.

So remove the optimisation for system calls, and always save NVGPRs on
entry. Modern high performance CPUs are not so sensitive, because the
stores are dense in cache and can be hidden by other expensive work in
the syscall path -- the null syscall selftests benchmark on POWER9 is
not slowed (124.40ns before and 123.64ns after, i.e., within the noise).

Other interrupts retain the NVGPR optimisation for now.
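
As a sketch of why the fork-type calls need this (illustration only, not
code from this patch): the child's pt_regs are copied wholesale from the
parent's interrupt frame, so any NVGPR never stored there would be
garbage in the child.

#include <asm/ptrace.h>

static void copy_regs_to_child_sketch(struct pt_regs *childregs,
				      const struct pt_regs *regs)
{
	*childregs = *regs;	/* gpr[14..31] only valid if saved on entry */
	childregs->gpr[3] = 0;	/* the child sees fork() return 0 */
}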

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S   | 72 +---
 arch/powerpc/kernel/syscalls/syscall.tbl | 22 +---
 2 files changed, 28 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6ba675b0cf7d..14afe12eae8c 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -98,13 +98,14 @@ END_BTB_FLUSH_SECTION
std r11,_XER(r1)
std r11,_CTR(r1)
std r9,GPR13(r1)
+   SAVE_NVGPRS(r1)
mflrr10
/*
 * This clears CR0.SO (bit 28), which is the error indication on
 * return from this system call.
 */
rldimi  r2,r11,28,(63-28)
-   li  r11,0xc01
+   li  r11,0xc00
std r10,_LINK(r1)
std r11,_TRAP(r1)
std r3,ORIG_GPR3(r1)
@@ -323,7 +324,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 /* Traced system call support */
 .Lsyscall_dotrace:
-   bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_syscall_trace_enter
 
@@ -408,7 +408,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mtmsrd  r10,1
 #endif /* CONFIG_PPC_BOOK3E */
 
-   bl  save_nvgprs
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_syscall_trace_leave
b   ret_from_except
@@ -442,62 +441,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 _ASM_NOKPROBE_SYMBOL(system_call_common);
 _ASM_NOKPROBE_SYMBOL(system_call_exit);
 
-/* Save non-volatile GPRs, if not already saved. */
-_GLOBAL(save_nvgprs)
-   ld  r11,_TRAP(r1)
-   andi.   r0,r11,1
-   beqlr-
-   SAVE_NVGPRS(r1)
-   clrrdi  r0,r11,1
-   std r0,_TRAP(r1)
-   blr
-_ASM_NOKPROBE_SYMBOL(save_nvgprs);
-
-   
-/*
- * The sigsuspend and rt_sigsuspend system calls can call do_signal
- * and thus put the process into the stopped state where we might
- * want to examine its user state with ptrace.  Therefore we need
- * to save all the nonvolatile registers (r14 - r31) before calling
- * the C code.  Similarly, fork, vfork and clone need the full
- * register state on the stack so that it can be copied to the child.
- */
-
-_GLOBAL(ppc_fork)
-   bl  save_nvgprs
-   bl  sys_fork
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_vfork)
-   bl  save_nvgprs
-   bl  sys_vfork
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_clone)
-   bl  save_nvgprs
-   bl  sys_clone
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_clone3)
-   bl  save_nvgprs
-   bl  sys_clone3
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc32_swapcontext)
-   bl  save_nvgprs
-   bl  compat_sys_swapcontext
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc64_swapcontext)
-   bl  save_nvgprs
-   bl  sys_swapcontext
-   b   .Lsyscall_exit
-
-_GLOBAL(ppc_switch_endian)
-   bl  save_nvgprs
-   bl  sys_switch_endian
-   b   .Lsyscall_exit
-
 _GLOBAL(ret_from_fork)
bl  schedule_tail
REST_NVGPRS(r1)
@@ -516,6 +459,17 @@ _GLOBAL(ret_from_kernel_thread)
li  r3,0
b   .Lsyscall_exit
 
+/* Save non-volatile GPRs, if not already saved. */
+_GLOBAL(save_nvgprs)
+   ld  r11,_TRAP(r1)
+   andi.   r0,r11,1
+   beqlr-
+   SAVE_NVGPRS(r1)
+   clrrdi  r0,r11,1
+   std r0,_TRAP(r1)
+   blr
+_ASM_NOKPROBE_SYMBOL(save_nvgprs);
+
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #define FLUSH_COUNT_CACHE  \
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
index 35b61bfc1b1a..220ae11555f2 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -9,7 +9,9 @@
 #
 0  nospu   restar

[PATCH v3 24/32] powerpc/64: sstep ifdef the deprecated fast endian switch syscall

2020-02-25 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/lib/sstep.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index c077acb983a1..5f3a7bd9d90d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -3179,8 +3179,9 @@ int emulate_step(struct pt_regs *regs, unsigned int instr)
 * entry code works.  If that is changed, this will
 * need to be changed also.
 */
-   if (regs->gpr[0] == 0x1ebe &&
-   cpu_has_feature(CPU_FTR_REAL_LE)) {
+   if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
+   cpu_has_feature(CPU_FTR_REAL_LE) &&
+   regs->gpr[0] == 0x1ebe) {
regs->msr ^= MSR_LE;
goto instr_done;
}
-- 
2.23.0



[PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C

2020-02-25 Thread Nicholas Piggin
System call entry and particularly exit code is beyond the limit of what
is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.

Null syscall test is about 5% faster after this patch, because the exit
work is handled under local_irq_disable, and the hard mask and pending
interrupt replay is handled after that, which avoids games with MSR.
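
A minimal sketch of the entry-side shape this introduces, with tracing,
auditing and time accounting omitted (the real system_call_exception()
in this patch handles all of those; system_call_sketch is a made-up name
for illustration):

#include <linux/compiler.h>
#include <linux/errno.h>
#include <asm/hw_irq.h>
#include <asm/ptrace.h>
#include <asm/unistd.h>

typedef long (*syscall_fn)(long, long, long, long, long, long);
extern const unsigned long sys_call_table[];

notrace long system_call_sketch(long r3, long r4, long r5, long r6,
				long r7, long r8, unsigned long r0,
				struct pt_regs *regs)
{
	syscall_fn f;

	regs->softe = IRQS_ENABLED;	/* make the frame look reconciled */
	__hard_irq_enable();

	if (unlikely(r0 >= NR_syscalls))
		return -ENOSYS;
	f = (void *)sys_call_table[r0];
	return f(r3, r4, r5, r6, r7, r8);
}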

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michal Suchanek 
---

v2,rebase (from Michal):
- Add endian conversion for dtl_idx (ms)
- Fix sparse warning about missing declaration (ms)
- Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)

v3: Fixes thanks to reports from mpe and selftests errors:
- Several soft-mask debug and unsafe smp_processor_id() warnings caused by
  tracing, and other false positives from checks in "unreconciled" code.
- Fix a bug with syscall tracing functions that set registers (e.g.,
  PTRACE_SETREG) not setting GPRs properly.
- Fix silly tabort_syscall bug that causes kernel crashes when making system
  calls in transactional state.

 arch/powerpc/include/asm/asm-prototypes.h |  17 +-
 .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
 arch/powerpc/include/asm/cputime.h|  29 ++
 arch/powerpc/include/asm/hw_irq.h |   4 +
 arch/powerpc/include/asm/ptrace.h |   3 +
 arch/powerpc/include/asm/signal.h |   3 +
 arch/powerpc/include/asm/switch_to.h  |   5 +
 arch/powerpc/include/asm/time.h   |   3 +
 arch/powerpc/kernel/Makefile  |   3 +-
 arch/powerpc/kernel/entry_64.S| 338 +++---
 arch/powerpc/kernel/signal.h  |   2 -
 arch/powerpc/kernel/syscall_64.c  | 213 +++
 arch/powerpc/kernel/systbl.S  |   9 +-
 13 files changed, 328 insertions(+), 315 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_64.c

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 983c0084fb3f..4b3609554e76 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, 
fd_set __user *exp,
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
+#ifdef CONFIG_PPC64
+long system_call_exception(long r3, long r4, long r5, long r6, long r7, long 
r8, unsigned long r0, struct pt_regs *regs);
+notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs 
*regs);
+notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, 
unsigned long msr);
+notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, 
unsigned long msr);
+#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low);
@@ -104,14 +110,6 @@ long sys_switch_endian(void);
 notrace unsigned int __check_irq_replay(void);
 void notrace restore_interrupts(void);
 
-/* ptrace */
-long do_syscall_trace_enter(struct pt_regs *regs);
-void do_syscall_trace_leave(struct pt_regs *regs);
-
-/* process */
-void restore_math(struct pt_regs *regs);
-void restore_tm_state(struct pt_regs *regs);
-
 /* prom_init (OpenFirmware) */
 unsigned long __init prom_init(unsigned long r3, unsigned long r4,
   unsigned long pp,
@@ -122,9 +120,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned 
long r4,
 void __init early_setup(unsigned long dt_ptr);
 void early_setup_secondary(void);
 
-/* time */
-void accumulate_stolen_time(void);
-
 /* misc runtime */
 extern u64 __bswapdi2(u64);
 extern s64 __lshrdi3(s64, int);
diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 90dd3a3fc8c7..71081d90f999 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -3,6 +3,7 @@
 #define _ASM_POWERPC_BOOK3S_64_KUP_RADIX_H
 
 #include 
+#include 
 
 #define AMR_KUAP_BLOCK_READUL(0x4000)
 #define AMR_KUAP_BLOCK_WRITE   UL(0x8000)
@@ -56,7 +57,14 @@
 
 #ifdef CONFIG_PPC_KUAP
 
-#include 
+#include 
+#include 
+
+static inline void kuap_check_amr(void)
+{
+   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   WARN_ON_ONCE(mfspr(SPRN_AMR) != AMR_KUAP_BLOCKED);
+}
 
 /*
  * We support individually allowing read or write, but we don't support nesting
@@ -127,6 +135,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long 
address, bool is_write)
(regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : 
AMR_KUAP_BLOCK_READ)),
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
+#else /* CONFIG_PPC_KUAP */
+static inline void kuap_check_amr(void)
+{
+}
 #endif /* C

[PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning

2020-02-25 Thread Nicholas Piggin
Kernel addresses and potentially other sensitive data could be leaked
in volatile registers after a syscall.
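
A userspace probe for the leak being closed might look like this (sketch
only; assumes 20 is __NR_getpid, as on powerpc -- after this patch the
sampled register reads back as zero):

#include <stdio.h>

int main(void)
{
	long leaked;

	asm volatile(
		"li 0,20\n\t"		/* getpid */
		"sc\n\t"
		"mr %0,9"		/* sample r9 right after return */
		: "=r" (leaked)
		:
		: "r0", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
		  "r10", "r11", "r12", "cr0", "ctr", "xer", "memory");
	printf("r9 after sc: %#lx\n", leaked);
	return 0;
}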

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7404290fa132..0e2c56573a41 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -135,6 +135,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS)
 
cmpdi   r3,0
bne .Lsyscall_restore_regs
+   li  r0,0
+   li  r4,0
+   li  r5,0
+   li  r6,0
+   li  r7,0
+   li  r8,0
+   li  r9,0
+   li  r10,0
+   li  r11,0
+   li  r12,0
+   mtctr   r0
+   mtspr   SPRN_XER,r0
 .Lsyscall_restore_regs_cont:
 
 BEGIN_FTR_SECTION
-- 
2.23.0



[PATCH v3 27/32] powerpc/64: implement soft interrupt replay in C

2020-02-25 Thread Nicholas Piggin
When local_irq_enable() finds a pending soft-masked interrupt, it
"replays" it by setting up registers like the initial interrupt entry,
then calls into the low level handler to set up an interrupt stack
frame and process the interrupt.

This is not necessary, and uses more stack than needed. The high level
interrupt handler can be called directly from C, with just pt_regs set
up on stack. This should be faster and use less stack.
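
A sketch of the direct-call replay described above (illustration only;
the real replay_soft_interrupts() in this patch also covers PMU,
doorbell and HMI sources and loops until nothing is pending):

#include <asm/hw_irq.h>
#include <asm/irq.h>
#include <asm/paca.h>
#include <asm/ptrace.h>

extern void ppc_save_regs(struct pt_regs *regs);

static void replay_soft_interrupts_sketch(void)
{
	struct pt_regs regs;

	ppc_save_regs(&regs);		/* a minimal frame, on our own stack */
	regs.softe = IRQS_ENABLED;	/* saved state must look irq-enabled */

	if (local_paca->irq_happened & PACA_IRQ_DEC) {
		local_paca->irq_happened &= ~PACA_IRQ_DEC;
		timer_interrupt(&regs);
	}
	if (local_paca->irq_happened & PACA_IRQ_EE) {
		local_paca->irq_happened &= ~PACA_IRQ_EE;
		do_IRQ(&regs);
	}
}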

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/hw_irq.h|   1 -
 arch/powerpc/kernel/exceptions-64e.S |  32 --
 arch/powerpc/kernel/exceptions-64s.S |  47 
 arch/powerpc/kernel/irq.c| 165 +--
 4 files changed, 130 insertions(+), 115 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 310583e62bd9..0e9a9598f91f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -52,7 +52,6 @@
 #ifndef __ASSEMBLY__
 
 extern void replay_system_reset(void);
-extern void __replay_interrupt(unsigned int vector);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void timer_broadcast_interrupt(void);
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index e4076e3c072d..4efac5490216 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1002,38 +1002,6 @@ masked_interrupt_book3e_0x280:
 masked_interrupt_book3e_0x2c0:
masked_interrupt_book3e PACA_IRQ_DBELL 0
 
-/*
- * Called from arch_local_irq_enable when an interrupt needs
- * to be resent. r3 contains either 0x500,0x900,0x260 or 0x280
- * to indicate the kind of interrupt. MSR:EE is already off.
- * We generate a stackframe like if a real interrupt had happened.
- *
- * Note: While MSR:EE is off, we need to make sure that _MSR
- * in the generated frame has EE set to 1 or the exception
- * handler will not properly re-enable them.
- */
-_GLOBAL(__replay_interrupt)
-   /* We are going to jump to the exception common code which
-* will retrieve various register values from the PACA which
-* we don't give a damn about.
-*/
-   mflrr10
-   mfmsr   r11
-   mfcrr4
-   mtspr   SPRN_SPRG_GEN_SCRATCH,r13;
-   std r1,PACA_EXGEN+EX_R1(r13);
-   stw r4,PACA_EXGEN+EX_CR(r13);
-   ori r11,r11,MSR_EE
-   subir1,r1,INT_FRAME_SIZE;
-   cmpwi   cr0,r3,0x500
-   beq exc_0x500_common
-   cmpwi   cr0,r3,0x900
-   beq exc_0x900_common
-   cmpwi   cr0,r3,0x280
-   beq exc_0x280_common
-   blr
-
-
 /*
  * This is called from 0x300 and 0x400 handlers after the prologs with
  * r14 and r15 containing the fault address and error code, with the
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 5ddfc32cacad..bad8cd9e7dba 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -3146,50 +3146,3 @@ doorbell_super_common_msgclr:
LOAD_REG_IMMEDIATE(r3, PPC_DBELL_MSGTYPE << (63-36))
PPC_MSGCLRP(3)
b   doorbell_super_common
-
-/*
- * Called from arch_local_irq_enable when an interrupt needs
- * to be resent. r3 contains 0x500, 0x900, 0xa00 or 0xe80 to indicate
- * which kind of interrupt. MSR:EE is already off. We generate a
- * stackframe like if a real interrupt had happened.
- *
- * Note: While MSR:EE is off, we need to make sure that _MSR
- * in the generated frame has EE set to 1 or the exception
- * handler will not properly re-enable them.
- *
- * Note that we don't specify LR as the NIP (return address) for
- * the interrupt because that would unbalance the return branch
- * predictor.
- */
-_GLOBAL(__replay_interrupt)
-   /* We are going to jump to the exception common code which
-* will retrieve various register values from the PACA which
-* we don't give a damn about, so we don't bother storing them.
-*/
-   mfmsr   r12
-   LOAD_REG_ADDR(r11, replay_interrupt_return)
-   mfcrr9
-   ori r12,r12,MSR_EE
-   cmpwi   r3,0x900
-   beq decrementer_common
-   cmpwi   r3,0x500
-BEGIN_FTR_SECTION
-   beq h_virt_irq_common
-FTR_SECTION_ELSE
-   beq hardware_interrupt_common
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_300)
-   cmpwi   r3,0xf00
-   beq performance_monitor_common
-BEGIN_FTR_SECTION
-   cmpwi   r3,0xa00
-   beq h_doorbell_common_msgclr
-   cmpwi   r3,0xe60
-   beq hmi_exception_common
-FTR_SECTION_ELSE
-   cmpwi   r3,0xa00
-   beq doorbell_super_common_msgclr
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
-replay_interrupt_return:
-   blr
-
-_ASM_NOKPROBE_SYMBOL(__replay_interrupt)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5c9b11878555..afd74eba70aa 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.

[PATCH v3 28/32] powerpc/64s: interrupt implement exit logic in C

2020-02-25 Thread Nicholas Piggin
Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack store.

The stack store emulation is significantly simplified: rather than creating
a new return frame and switching to that before performing the store, it
uses the PACA to keep a scratch register around to perform the store.

The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.

This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.
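
The user-exit side, sketched in C (illustration only; the real
interrupt_exit_user_prepare() in this patch also restores AMR and FP
state, accounts time, and decides whether full GPRs must be restored):

#include <linux/irqflags.h>
#include <linux/sched.h>
#include <linux/thread_info.h>
#include <asm/ptrace.h>

extern void do_notify_resume(struct pt_regs *regs, unsigned long flags);

notrace unsigned long user_exit_sketch(struct pt_regs *regs)
{
	unsigned long ti_flags;

again:
	local_irq_disable();
	ti_flags = READ_ONCE(current_thread_info()->flags);
	if (unlikely(ti_flags & (_TIF_NEED_RESCHED | _TIF_SIGPENDING))) {
		local_irq_enable();
		if (ti_flags & _TIF_NEED_RESCHED)
			schedule();
		else
			do_notify_resume(regs, ti_flags);
		goto again;
	}
	return 0;	/* 0: restore volatiles only; nonzero: full GPRs */
}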

Signed-off-by: Nicholas Piggin 
Signed-off-by: Michal Suchanek 
---
v2,rebase (from Michal):
- Move the FP restore functions to restore_math. They are not used
  anywhere else and when restore_math is not built gcc warns about them
  being unused (ms)
- Add asm/context_tracking.h include to exceptions-64e.S for SCHEDULE_USER
  definition

v3:
- Fix return from interrupt replay problem by replaying interrupts rather
  than enabling irqs. This ends up being cleaner and __check_irq_replay
  goes away completely for 64s. Should bring 64e up to speed and kill a lot
  of cruft after it's proven on 64s.
- Don't use _GLOBAL if it's not called from C
- Simplify stack store emulation code further, add a bit more commenting.
- Some missing no probe annotations

 .../powerpc/include/asm/book3s/64/kup-radix.h |  10 +
 arch/powerpc/include/asm/hw_irq.h |   1 +
 arch/powerpc/include/asm/switch_to.h  |   6 +
 arch/powerpc/kernel/entry_64.S| 486 +-
 arch/powerpc/kernel/exceptions-64e.S  | 255 -
 arch/powerpc/kernel/exceptions-64s.S  | 119 ++---
 arch/powerpc/kernel/irq.c |  36 +-
 arch/powerpc/kernel/process.c |  89 ++--
 arch/powerpc/kernel/syscall_64.c  | 164 +-
 arch/powerpc/kernel/vector.S  |   2 +-
 10 files changed, 642 insertions(+), 526 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 71081d90f999..3bcef989a35d 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -60,6 +60,12 @@
 #include 
 #include 
 
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+   if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   mtspr(SPRN_AMR, regs->kuap);
+}
+
 static inline void kuap_check_amr(void)
 {
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -136,6 +142,10 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long 
address, bool is_write)
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
 #else /* CONFIG_PPC_KUAP */
+static inline void kuap_restore_amr(struct pt_regs *regs)
+{
+}
+
 static inline void kuap_check_amr(void)
 {
 }
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 0e9a9598f91f..e0e71777961f 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -52,6 +52,7 @@
 #ifndef __ASSEMBLY__
 
 extern void replay_system_reset(void);
+extern void replay_soft_interrupts(void);
 
 extern void timer_interrupt(struct pt_regs *);
 extern void timer_broadcast_interrupt(void);
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 476008bc3d08..b867b58b1093 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -23,7 +23,13 @@ extern void switch_booke_debug_regs(struct debug_reg 
*new_debug);
 
 extern int emulate_altivec(struct pt_regs *);
 
+#ifdef CONFIG_PPC_BOOK3S_64
 void restore_math(struct pt_regs *regs);
+#else
+static inline void restore_math(struct pt_regs *regs)
+{
+}
+#endif
 
 void restore_tm_state(struct pt_regs *regs);
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0e2c56573a41..e13eac968dfc 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -221,6 +222,7 @@ _GLOBAL(ret_from_kernel_thread)
li  r3,0
b   .Lsyscall_exit
 
+#ifdef CONFIG_PPC_BOOK3E
 /* Save non-volatile GPRs, if not already saved. */
 _GLOBAL(save_nvgprs)
ld  r11,_TRAP(r1)
@@ -231,6 +233,7 @@ _GLOBAL(save_nvgprs)
std r0,_TRAP(r1)
blr
 _ASM_NOKPROBE_SYMBOL(save_nvgprs);
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 
@@ -294,7 +297,7 @@ flush_count_cache:
  * state of one is saved on its kernel stack.  Then the state
  * of the other is restored from its kernel stack.  The memory
  * management hardware is updated to the second process's state.
-

[PATCH v3 29/32] powerpc/64s/exception: remove lite interrupt return

2020-02-25 Thread Nicholas Piggin
The difference between lite and regular returns is that the regular case
restores all NVGPRs, whereas lite skips that. This is quite clumsy
though: most interrupts want the NVGPRs saved for debugging, not to
modify in the caller, so the NVGPRs restore is not necessary most of
the time. Restore NVGPRs explicitly for the cases that require it,
and move everything else over to avoiding the restore unless the
interrupt return demands it (e.g., handling a signal).

Signed-off-by: Nicholas Piggin 
---
v3:
- Add a couple of missing restore cases for instruction emulation

 arch/powerpc/kernel/entry_64.S   |  6 --
 arch/powerpc/kernel/exceptions-64s.S | 24 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index e13eac968dfc..6d5464f83c05 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -471,12 +471,6 @@ _ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
.globl interrupt_return
 interrupt_return:
 _ASM_NOKPROBE_SYMBOL(interrupt_return)
-   REST_NVGPRS(r1)
-
-   .balign IFETCH_ALIGN_BYTES
-   .globl interrupt_return_lite
-interrupt_return_lite:
-_ASM_NOKPROBE_SYMBOL(interrupt_return_lite)
ld  r4,_MSR(r1)
andi.   r0,r4,MSR_PR
beq .Lkernel_interrupt_return
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index d635fd4e40ea..b53e452cbca0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1513,7 +1513,7 @@ EXC_COMMON_BEGIN(hardware_interrupt_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_IRQ
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM hardware_interrupt
 
@@ -1541,6 +1541,7 @@ EXC_COMMON_BEGIN(alignment_common)
GEN_COMMON alignment
addir3,r1,STACK_FRAME_OVERHEAD
bl  alignment_exception
+   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
b   interrupt_return
 
GEN_KVM alignment
@@ -1604,6 +1605,7 @@ EXC_COMMON_BEGIN(program_check_common)
 3:
addir3,r1,STACK_FRAME_OVERHEAD
bl  program_check_exception
+   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
b   interrupt_return
 
GEN_KVM program_check
@@ -1700,7 +1702,7 @@ EXC_COMMON_BEGIN(decrementer_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  timer_interrupt
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM decrementer
 
@@ -1791,7 +1793,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
 #else
bl  unknown_exception
 #endif
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM doorbell_super
 
@@ -2060,6 +2062,7 @@ EXC_COMMON_BEGIN(emulation_assist_common)
GEN_COMMON emulation_assist
addir3,r1,STACK_FRAME_OVERHEAD
bl  emulation_assist_interrupt
+   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
b   interrupt_return
 
GEN_KVM emulation_assist
@@ -2176,7 +2179,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
 #else
bl  unknown_exception
 #endif
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM h_doorbell
 
@@ -2206,7 +2209,7 @@ EXC_COMMON_BEGIN(h_virt_irq_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_IRQ
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM h_virt_irq
 
@@ -2253,7 +2256,7 @@ EXC_COMMON_BEGIN(performance_monitor_common)
RUNLATCH_ON
addir3,r1,STACK_FRAME_OVERHEAD
bl  performance_monitor_exception
-   b   interrupt_return_lite
+   b   interrupt_return
 
GEN_KVM performance_monitor
 
@@ -2650,6 +2653,7 @@ EXC_COMMON_BEGIN(altivec_assist_common)
addir3,r1,STACK_FRAME_OVERHEAD
 #ifdef CONFIG_ALTIVEC
bl  altivec_assist_exception
+   REST_NVGPRS(r1) /* instruction emulation may change GPRs */
 #else
bl  unknown_exception
 #endif
@@ -3038,7 +3042,7 @@ do_hash_page:
 cmpdi  r3,0/* see if __hash_page succeeded */
 
/* Success */
-   beq interrupt_return_lite   /* Return from exception on success */
+   beq interrupt_return/* Return from exception on success */
 
/* Error */
blt-13f
@@ -3055,7 +3059,7 @@ handle_page_fault:
addir3,r1,STACK_FRAME_OVERHEAD
bl  do_page_fault
cmpdi   r3,0
-   beq+interrupt_return_lite
+   beq+interrupt_return
mr  r5,r3
addir3,r1,STACK_FRAME_OVERHEAD
ld  r4,_DAR(r1)
@@ -3070,9 +3074,9 @@ handle_dabr_fault:
bl  do_break
/*
 * do_break() may have changed the NV GPRS while h

[PATCH v3 30/32] powerpc/64: system call reconcile interrupts

2020-02-25 Thread Nicholas Piggin
This reconciles interrupts in the system call case like all other
interrupts. This allows system_call_common to be shared with the
scv system call implementation in a subsequent patch.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S   | 11 +++
 arch/powerpc/kernel/syscall_64.c | 28 +---
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6d5464f83c05..8406812c9734 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -113,6 +113,17 @@ END_BTB_FLUSH_SECTION
ld  r11,exception_marker@toc(r2)
std r11,-16(r10)/* "regshere" marker */
 
+   /*
+* RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which
+* would clobber syscall parameters. Also we always enter with IRQs
+* enabled and nothing pending. system_call_exception() will call
+* trace_hardirqs_off().
+*/
+   li  r11,IRQS_ALL_DISABLED
+   li  r12,PACA_IRQ_HARD_DIS
+   stb r11,PACAIRQSOFTMASK(r13)
+   stb r12,PACAIRQHAPPENED(r13)
+
/* Calling convention has r9 = orig r0, r10 = regs */
mr  r9,r0
bl  system_call_exception
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 08e0bebbd3b6..32601a572ff0 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -19,13 +19,19 @@ extern void __noreturn tabort_syscall(unsigned long nip, 
unsigned long msr);
 
 typedef long (*syscall_fn)(long, long, long, long, long, long);
 
-/* Has to run notrace because it is entered "unreconciled" */
-notrace long system_call_exception(long r3, long r4, long r5, long r6, long 
r7, long r8,
-  unsigned long r0, struct pt_regs *regs)
+/* Has to run notrace because it is entered not completely "reconciled" */
+notrace long system_call_exception(long r3, long r4, long r5,
+  long r6, long r7, long r8,
+  unsigned long r0, struct pt_regs *regs)
 {
unsigned long ti_flags;
syscall_fn f;
 
+   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
+   BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
+
+   trace_hardirqs_off(); /* finish reconciling */
+
if (IS_ENABLED(CONFIG_PPC_BOOK3S))
BUG_ON(!(regs->msr & MSR_RI));
BUG_ON(!(regs->msr & MSR_PR));
@@ -33,8 +39,10 @@ notrace long system_call_exception(long r3, long r4, long 
r5, long r6, long r7,
BUG_ON(regs->softe != IRQS_ENABLED);
 
if (IS_ENABLED(CONFIG_PPC_TRANSACTIONAL_MEM) &&
-   unlikely(regs->msr & MSR_TS_T))
+   unlikely(regs->msr & MSR_TS_T)) {
+   local_irq_enable();
tabort_syscall(regs->nip, regs->msr);
+   }
 
account_cpu_user_entry();
 
@@ -50,16 +58,6 @@ notrace long system_call_exception(long r3, long r4, long 
r5, long r6, long r7,
 
kuap_check_amr();
 
-   /*
-* A syscall should always be called with interrupts enabled
-* so we just unconditionally hard-enable here. When some kind
-* of irq tracing is used, we additionally check that condition
-* is correct
-*/
-   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG)) {
-   WARN_ON(irq_soft_mask_return() != IRQS_ENABLED);
-   WARN_ON(local_paca->irq_happened);
-   }
/*
 * This is not required for the syscall exit path, but makes the
 * stack frame look nicer. If this was initialised in the first stack
@@ -68,7 +66,7 @@ notrace long system_call_exception(long r3, long r4, long r5, 
long r6, long r7,
 */
regs->softe = IRQS_ENABLED;
 
-   __hard_irq_enable();
+   local_irq_enable();
 
ti_flags = current_thread_info()->flags;
if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
-- 
2.23.0



[PATCH v3 31/32] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked

2020-02-25 Thread Nicholas Piggin
The scv instruction causes an interrupt which can enter the kernel with
MSR[EE]=1, thus allowing interrupts to hit at any time. These must not
be taken as normal interrupts, because they come from MSR[PR]=0 context,
and yet the kernel stack is not yet set up and r13 is not set to the
PACA.

Treat this as a soft-masked interrupt regardless of the soft masked
state. This does not affect behaviour yet, because currently all
interrupts are taken with MSR[EE]=0.
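
In C-like pseudocode, the test the masked-interrupt entry asm now
performs is roughly the following (a sketch only; the real code is the
__GEN_COMMON_BODY asm below, which compares against the real-mode copy
of __end_interrupts via LOAD_HANDLER):

#include <linux/types.h>
#include <asm/paca.h>
#include <asm/ptrace.h>
#include <asm/reg.h>
#include <asm/sections.h>

static bool interrupt_is_masked(struct pt_regs *regs, unsigned long imask)
{
	if (regs->msr & MSR_PR)
		return false;	/* from user: skip the soft-mask tests */
	if (regs->nip < (unsigned long)__end_interrupts)
		return true;	/* low kernel text: implicitly soft-masked */
	return local_paca->irq_soft_mask & imask;
}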

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index b53e452cbca0..7a6be3f32973 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -494,8 +494,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 
 .macro __GEN_COMMON_BODY name
.if IMASK
+   .if ! ISTACK
+   .error "No support for masked interrupt to use custom stack"
+   .endif
+
+   /* If coming from user, skip soft-mask tests. */
+   andi.   r10,r12,MSR_PR
+   bne 2f
+
+   /* Kernel code running below __end_interrupts is implicitly
+* soft-masked */
+   LOAD_HANDLER(r10, __end_interrupts)
+   cmpdr11,r10
+   li  r10,IMASK
+   blt-1f
+
+   /* Test the soft mask state against our interrupt's bit */
lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,IMASK
+1: andi.   r10,r10,IMASK
/* Associate vector numbers with bits in paca->irq_happened */
.if IVEC == 0x500 || IVEC == 0xea0
li  r10,PACA_IRQ_EE
@@ -526,7 +542,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_virt)
 
.if ISTACK
andi.   r10,r12,MSR_PR  /* See if coming from user  */
-   mr  r10,r1  /* Save r1  */
+2: mr  r10,r1  /* Save r1  */
subir1,r1,INT_FRAME_SIZE/* alloc frame on kernel stack  */
beq-100f
ld  r1,PACAKSAVE(r13)   /* kernel stack to use  */
@@ -2791,7 +2807,8 @@ masked_interrupt:
ld  r10,PACA_EXGEN+EX_R10(r13)
ld  r11,PACA_EXGEN+EX_R11(r13)
ld  r12,PACA_EXGEN+EX_R12(r13)
-   /* returns to kernel where r13 must be set up, so don't restore it */
+   ld  r13,PACA_EXGEN+EX_R13(r13)
+   /* May return to masked low address where r13 is not set up */
.if \hsrr
HRFI_TO_KERNEL
.else
@@ -2950,6 +2967,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline)
 
 USE_FIXED_SECTION(virt_trampolines)
/*
+* All code below __end_interrupts is treated as soft-masked. If
+* any code runs here with MSR[EE]=1, it must then cope with pending
+* soft interrupt being raised (i.e., by ensuring it is replayed).
+*
 * The __end_interrupts marker must be past the out-of-line (OOL)
 * handlers, so that they are copied to real address 0x100 when running
 * a relocatable kernel. This ensures they can be reached from the short
-- 
2.23.0



[PATCH v3 32/32] powerpc/64s: system call support for scv/rfscv instructions

2020-02-25 Thread Nicholas Piggin
Add support for the scv instruction on POWER9 and later CPUs.

For now this implements the zeroth scv vector 'scv 0', as identical
to 'sc' system calls, with the exception that lr is not preserved, and
it is 64-bit only. There may yet be changes made to this ABI, so it's
for testing only.

rfscv is implemented to return from scv type system calls. It can not
be used to return from sc system calls because those are defined to
preserve lr.

In a comparison of getpid syscall, the test program had scv taking
about 3 more cycles in user mode (92 vs 89 for sc), due to lr handling.
getpid syscall throughput on POWER9 is improved by 33%, mostly due to
reducing mtmsr and mtspr.
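
As a usage illustration (sketch only, against the proposed ABI below;
assumes a POWER9-aware assembler and that PPC_FEATURE2_SCV was found in
AT_HWCAP2 -- the clobber list is deliberately generous):

#define MAX_ERRNO	4095UL

static long scv0(long nr, long a1)
{
	register long r0 asm("r0") = nr;
	register long r3 asm("r3") = a1;

	asm volatile("scv 0"
		     : "+r" (r3)
		     : "r" (r0)
		     : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
		       "r12", "lr", "ctr", "xer", "cr0", "cr1", "cr5",
		       "cr6", "cr7", "memory");
	return r3;	/* as unsigned, >= -MAX_ERRNO means negated errno */
}

/* e.g. long pid = scv0(20, 0);  (20 == __NR_getpid on powerpc) */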

Signed-off-by: Nicholas Piggin 
---
 Documentation/powerpc/syscall64-abi.rst   |  42 +---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +-
 arch/powerpc/include/asm/exception-64s.h  |   6 ++
 arch/powerpc/include/asm/head-64.h|   2 +-
 arch/powerpc/include/asm/ppc_asm.h|   2 +
 arch/powerpc/include/asm/processor.h  |   2 +-
 arch/powerpc/include/asm/setup.h  |   4 +-
 arch/powerpc/kernel/cpu_setup_power.S |   2 +-
 arch/powerpc/kernel/cputable.c|   3 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |   1 +
 arch/powerpc/kernel/entry_64.S| 114 +
 arch/powerpc/kernel/exceptions-64s.S  | 119 +-
 arch/powerpc/kernel/setup_64.c|   5 +-
 arch/powerpc/kernel/syscall_64.c  |  14 ++-
 arch/powerpc/platforms/pseries/setup.c|   8 +-
 15 files changed, 295 insertions(+), 31 deletions(-)

diff --git a/Documentation/powerpc/syscall64-abi.rst 
b/Documentation/powerpc/syscall64-abi.rst
index e49f69f941b9..30c045e8726e 100644
--- a/Documentation/powerpc/syscall64-abi.rst
+++ b/Documentation/powerpc/syscall64-abi.rst
@@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI
 syscall
 ===
 
+Invocation
+--
+The syscall is made with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the
+scv 0 instruction is an alternative that may be used, with some differences
+to calling sequence.
+
 syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI
 specification C function calling sequence, including register preservation
 rules, with the following differences.
@@ -12,16 +21,23 @@ rules, with the following differences.
 .. [1] Some syscalls (typically low-level management functions) may have
different calling sequences (e.g., rt_sigreturn).
 
-Parameters and return value

+Parameters
+--
 The system call number is specified in r0.
 
 There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
 
-Both a return value and a return error code are returned. cr0.SO is the return
-error code, and r3 is the return value or error code. When cr0.SO is clear,
-the syscall succeeded and r3 is the return value. When cr0.SO is set, the
-syscall failed and r3 is the error code that generally corresponds to errno.
+Return value
+
+- For the sc instruction, both a return value and a return error code are
+  returned. cr0.SO is the return error code, and r3 is the return value or
+  error code. When cr0.SO is clear, the syscall succeeded and r3 is the return
+  value. When cr0.SO is set, the syscall failed and r3 is the error code that
+  generally corresponds to errno.
+
+- For the scv 0 instruction, the return value indicates failure if it
+  is >= -MAX_ERRNO (-4095) as an unsigned comparison, in which case it is the
+  negated return error code. Otherwise it is the successful return value.
 
 Stack
 -
@@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling 
sequence with the
 following differences:
 
 === = 
+--- For the sc instruction ---
 r0  Volatile  (System call number.)
 r3  Volatile  (Parameter 1, and return value.)
 r4-r8   Volatile  (Parameters 2-6.)
-cr0 Volatile  (cr0.SO is the return error condition)
+cr0 Volatile  (cr0.SO is the return error condition.)
 cr1, cr5-7  Nonvolatile
 lr  Nonvolatile
+
+--- For the scv 0 instruction ---
+r0  Volatile  (System call number.)
+r3  Volatile  (Parameter 1, and return value.)
+r4-r8   Volatile  (Parameters 2-6.)
 === = 
 
 All floating point and vector data registers as well as control and status
 registers are nonvolatile.
 
-Invocation
---
-The syscall is performed with the sc instruction, and returns with execution
-continuing at the instruction following the sc instruction.
-
 Transactional Memory
 
 Syscall behavior can change if the processor is in transactional or suspended
@@ -75,6 +92,7 @@ auxil

[PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices

2020-02-25 Thread Wolfram Sang
Removing attach_adapter from this driver caused a regression for at
least some machines. Those machines had the sensors described in their
DT, too, so they didn't need manual creation of the sensor devices. The
old code worked, though, because manual creation came first. Creation of
DT devices then failed later and caused error logs, but the sensors
worked nonetheless because of the manually created devices.

When removing attach_adapter, manual creation now comes later and loses
the race. The sensor devices were already registered via DT, yet with
another binding, so the driver could not be bound to it.

This fix refactors the code to remove the race and only manually creates
devices if there are no DT nodes present. Also, the DT binding is updated
to match both the DT-created and the manually created devices. Because we don't
know which device creation will be used at runtime, the code to start
the kthread is moved to do_probe() which will be called by both methods.

Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
Reported-by: Erhard Furtner 
Tested-by: Erhard Furtner 
Signed-off-by: Wolfram Sang 
---

I suggest this stable-tag: # v4.19+

Adding the Debian-PPC List to reach further people maybe willing to
test.

This patch does not depend on "[PATCH RESEND] macintosh: convert to
i2c_new_scanned_device". In fact, this one here should go in first as
5.6 material. I will rebase and resend the i2c_new_scanned_device()
conversion on top of this regression fix.

I can also take this via I2C if easier.

 drivers/macintosh/therm_windtunnel.c | 52 +---
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/macintosh/therm_windtunnel.c 
b/drivers/macintosh/therm_windtunnel.c
index 8c744578122a..a0d87ed9da69 100644
--- a/drivers/macintosh/therm_windtunnel.c
+++ b/drivers/macintosh/therm_windtunnel.c
@@ -300,9 +300,11 @@ static int control_loop(void *dummy)
 /* i2c probing and setup   */
 //
 
-static int
-do_attach( struct i2c_adapter *adapter )
+static void do_attach(struct i2c_adapter *adapter)
 {
+   struct i2c_board_info info = { };
+   struct device_node *np;
+
	/* scan 0x48-0x4f (DS1775) and 0x2c-0x2f (ADM1030) */
static const unsigned short scan_ds1775[] = {
0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
@@ -313,25 +315,24 @@ do_attach( struct i2c_adapter *adapter )
I2C_CLIENT_END
};
 
-   if( strncmp(adapter->name, "uni-n", 5) )
-   return 0;
-
-   if( !x.running ) {
-   struct i2c_board_info info;
+   if (x.running || strncmp(adapter->name, "uni-n", 5))
+   return;
 
-   memset(&info, 0, sizeof(struct i2c_board_info));
-   strlcpy(info.type, "therm_ds1775", I2C_NAME_SIZE);
+   np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,ds1775");
+   if (np) {
+   of_node_put(np);
+   } else {
+   strlcpy(info.type, "MAC,ds1775", I2C_NAME_SIZE);
i2c_new_probed_device(adapter, &info, scan_ds1775, NULL);
+   }
 
-   strlcpy(info.type, "therm_adm1030", I2C_NAME_SIZE);
+   np = of_find_compatible_node(adapter->dev.of_node, NULL, "MAC,adm1030");
+   if (np) {
+   of_node_put(np);
+   } else {
+   strlcpy(info.type, "MAC,adm1030", I2C_NAME_SIZE);
i2c_new_probed_device(adapter, &info, scan_adm1030, NULL);
-
-   if( x.thermostat && x.fan ) {
-   x.running = 1;
-   x.poll_task = kthread_run(control_loop, NULL, "g4fand");
-   }
}
-   return 0;
 }
 
 static int
@@ -404,8 +405,8 @@ attach_thermostat( struct i2c_client *cl )
 enum chip { ds1775, adm1030 };
 
 static const struct i2c_device_id therm_windtunnel_id[] = {
-   { "therm_ds1775", ds1775 },
-   { "therm_adm1030", adm1030 },
+   { "MAC,ds1775", ds1775 },
+   { "MAC,adm1030", adm1030 },
{ }
 };
 MODULE_DEVICE_TABLE(i2c, therm_windtunnel_id);
@@ -414,6 +415,7 @@ static int
 do_probe(struct i2c_client *cl, const struct i2c_device_id *id)
 {
struct i2c_adapter *adapter = cl->adapter;
+   int ret = 0;
 
if( !i2c_check_functionality(adapter, I2C_FUNC_SMBUS_WORD_DATA
 | I2C_FUNC_SMBUS_WRITE_BYTE) )
@@ -421,11 +423,19 @@ do_probe(struct i2c_client *cl, const struct 
i2c_device_id *id)
 
switch (id->driver_data) {
case adm1030:
-   return attach_fan( cl );
+   ret = attach_fan(cl);
+   break;
case ds1775:
-   return attach_thermostat(cl);
+   ret = attach_thermostat(cl);
+   break;
}
-   return 0;
+
+   if (!x.running && x.th

[PATCH] i2c: powermac: correct comment about custom handling

2020-02-25 Thread Wolfram Sang
The comment had some flaws which are now fixed:
- the prefix is 'MAC' not 'AAPL'
- it did not follow kernel coding style and used too short a line length
- 'we do' instead of 'we to'

Signed-off-by: Wolfram Sang 
---
 drivers/i2c/busses/i2c-powermac.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-powermac.c 
b/drivers/i2c/busses/i2c-powermac.c
index 973e5339033c..d565714c1f13 100644
--- a/drivers/i2c/busses/i2c-powermac.c
+++ b/drivers/i2c/busses/i2c-powermac.c
@@ -279,14 +279,13 @@ static bool i2c_powermac_get_type(struct i2c_adapter 
*adap,
 {
char tmp[16];
 
-   /* Note: we to _NOT_ want the standard
-* i2c drivers to match with any of our powermac stuff
-* unless they have been specifically modified to handle
-* it on a case by case basis. For example, for thermal
-* control, things like lm75 etc... shall match with their
-* corresponding windfarm drivers, _NOT_ the generic ones,
-* so we force a prefix of AAPL, onto the modalias to
-* make that happen
+   /*
+* Note: we do _NOT_ want the standard i2c drivers to match with any of
+* our powermac stuff unless they have been specifically modified to
+* handle it on a case by case basis. For example, for thermal control,
+* things like lm75 etc... shall match with their corresponding
+* windfarm drivers, _NOT_ the generic ones, so we force a prefix of
+* 'MAC', onto the modalias to make that happen
 */
 
/* First try proper modalias */
-- 
2.20.1



Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices

2020-02-25 Thread Wolfram Sang
On Tue, Feb 25, 2020 at 03:41:22PM +0100, John Paul Adrian Glaubitz wrote:
> Hello!
> 
> On 2/25/20 3:12 PM, Wolfram Sang wrote:
> > Adding the Debian-PPC List to reach further people maybe willing to
> > test.
> 
> This might be related [1].

IIUC, this is the same as
https://bugzilla.kernel.org/show_bug.cgi?id=199471.

I don't think my patch helps here.



signature.asc
Description: PGP signature


Re: [PATCH] evh_bytechan: fix out of bounds accesses

2020-02-25 Thread Stephen Rothwell
Hi Laurentiu,

On Tue, 25 Feb 2020 11:54:17 +0200 Laurentiu Tudor  
wrote:
>
> On 21.02.2020 01:57, Stephen Rothwell wrote:
> > 
> > On Thu, 16 Jan 2020 11:37:14 +1100 Stephen Rothwell  
> > wrote:  
> >>
> >> On Wed, 15 Jan 2020 14:01:35 -0600 Scott Wood  wrote:  
> >>>
> >>> On Thu, 2020-01-16 at 06:42 +1100, Stephen Rothwell wrote:  
> 
>  On Wed, 15 Jan 2020 07:25:45 -0600 Timur Tabi  wrote:  
> > On 1/14/20 12:31 AM, Stephen Rothwell wrote:  
> >> +/**
> >> + * ev_byte_channel_send - send characters to a byte stream
> >> + * @handle: byte stream handle
> >> + * @count: (input) num of chars to send, (output) num chars sent
> >> + * @bp: pointer to chars to send
> >> + *
> >> + * Returns 0 for success, or an error code.
> >> + */
> >> +static unsigned int ev_byte_channel_send(unsigned int handle,
> >> +  unsigned int *count, const char *bp)  
> >
> > Well, now you've moved this into the .c file and it is no longer
> > available to other callers.  Anything wrong with keeping it in the .h
> > file?  
> 
>  There are currently no other callers - are there likely to be in the
>  future?  Even if there are, is it time critical enough that it needs to
>  be inlined everywhere?  
> >>>
> >>> It's not performance critical and there aren't likely to be other users --
> >>> just a matter of what's cleaner.  FWIW I'd rather see the original patch,
> >>> that keeps the raw asm hcall stuff as simple wrappers in one place.  
> >>
> >> And I don't mind either way :-)
> >>
> >> I just want to get rid of the warnings.  
> > 
> > Any progress with this?
> 
> I think that the consensus was to pick up the original patch, that is,
> this one: https://patchwork.ozlabs.org/patch/1220186/
> 
> I've tested it too, so please feel free to add a:
> 
> Tested-by: Laurentiu Tudor 

So, whose tree should his go via?

-- 
Cheers,
Stephen Rothwell


pgp8HKriNlqei.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning

2020-02-25 Thread Segher Boessenkool
Hi!

On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote:
> Kernel addresses and potentially other sensitive data could be leaked
> in volatile registers after a syscall.

>   cmpdi   r3,0
>   bne .Lsyscall_restore_regs
> + li  r0,0
> + li  r4,0
> + li  r5,0
> + li  r6,0
> + li  r7,0
> + li  r8,0
> + li  r9,0
> + li  r10,0
> + li  r11,0
> + li  r12,0
> + mtctr   r0
> + mtspr   SPRN_XER,r0
>  .Lsyscall_restore_regs_cont:

What about LR?  Is that taken care of later?

This also deserves a big fat comment imo, it is very important after
all, and not so obvious.


Segher


Re: MCE handler gets NIP wrong on MPC8378

2020-02-25 Thread Radu Rendec
On 02/20/2020 at 12:48 PM Christophe Leroy  wrote:
> Le 20/02/2020 à 18:34, Radu Rendec a écrit :
> > On 02/20/2020 at 11:25 AM Christophe Leroy  wrote:
> >> Le 20/02/2020 à 17:02, Radu Rendec a écrit :
> >>> On 02/20/2020 at 3:38 AM Christophe Leroy  wrote:
>  On 02/19/2020 10:39 PM, Radu Rendec wrote:
> > On 02/19/2020 at 4:21 PM Christophe Leroy  
> > wrote:
> >>> Interesting.
> >>>
> >>> 0x900 is the adress of the timer interrupt.
> >>>
> >>> Would the MCE occur just after the timer interrupt ?
> >
> > I doubt that. I'm using a small test module to artificially trigger the
> > MCE. Basically it's just this (the full code is in my original post):
> >
> >bad_addr_base = ioremap(0xf000, 0x100);
> >x = ioread32(bad_addr_base);
> >
> > I find it hard to believe that every time I load the module the lwbrx
> > instruction that triggers the MCE is executed exactly after the timer
> > interrupt (or that the timer interrupt always occurs close to the lwbrx
> > instruction).
> 
>  Can you try to see how much time there is between your read and the MCE ?
>  The below should allow it, you'll see first value in r13 and the other
>  in r14 (mce.c is your test code)
> 
>  Also provide the timebase frequency as reported in /proc/cpuinfo
> >>>
> >>> I just ran a test: r13 is 0xda8e0f91 and r14 is 0xdaae0f9c.
> >>>
> >>> # cat /proc/cpuinfo
> >>> processor   : 0
> >>> cpu : e300c4
> >>> clock   : 800.04MHz
> >>> revision: 1.1 (pvr 8086 1011)
> >>> bogomips: 200.00
> >>> timebase: 100000000
> >>>
> >>> The difference between r14 and r13 is 0x20000b. Assuming TB is
> >>> incremented with 'timebase' frequency, that means 20.97 milliseconds
> >>> (although the e300 manual says TB is "incremented once every four core
> >>> input clock cycles").
> >>
> >> I wouldn't be surprised that the internal CPU clock be twice the input
> >> clock.
> >>
> >> So that's long enough to surely get a timer interrupt during every bad
> >> access.
> >>
> >> Now we have to understand why SRR1 contains the address of the timer
> >> exception entry and not the address of the bad access.
> >>
> >> The value of SRR1 confirms that it comes from 0x900 as MSR[IR] and [DR]
> >> are cleared when interrupts are enabled.
> >>
> >> Maybe you should file a support case at NXP. They are usually quite
> >> professionnal at responding.
> >
> > I already did (quite some time ago), but it started off as "why does the
> > MCE occur in the first place". That part has already been figured out,
> > but unfortunately I don't have a viable solution to it. Like you said,
> > now the focus has shifted to understanding why the SRR0 value is not
> > what we expect.
>
> Yes now the point is to understand why it starts processing the timer
> interrupt at 0x900 (with IR and DR cleared as observed in SRR1) just
> before taking the Machine Check.
>
> Allthough the execution of the decrementer interrupt is queue for after
> the completion of the failing memory access, I'd expect the Machine
> Check to take priority.
>
> Note that I have never observed such a behaviour on MPC8321 which has an
> e300c2 core.

I apologize for the silence during the past few days, I've been diverted
with something else. This is the feedback that I got from NXP:

| The e300 core uses SRR0/1 for both non-critical interrupts and machine
| check interrupts and if they happen simultaneously a problem can occur
| where the return address from the first exception is lost when handling
| the second exception concurrently. This only occurs in the rare case
| when the software ISR hasn't had the time to save SRR0/1 to the sw stack.
|
| If the ability to nest interrupts is desired, software then saves off
| enough state (i.e. the contents of SRR0, SRR1, etc) that will allow it
| to recover (i.e. resume handling the current interrupt) if another
| interrupt occurs.

So basically what they describe is a race condition between the MCE and
a regular interrupt, where the regular interrupt (the timer interrupt,
in our case) kicks in after the MCE handler is entered but before
it saves SRR0. This not only requires very precise timing, but would
also end up with a saved SRR0 value that points back somewhere inside
the MCE handler.

But I've thought about something else. We already timed it and we know
it consistently takes around 20 ms between the faulty read and the MCE
handler execution. I'm thinking that the faulty read is essentially a
failed transaction on the internal bus, because no peripheral replies
to the access on the bad address. The 20 ms is probably the bus timeout.
How does this scenario look to you?

- The faulty read starts to execute. A new internal bus transaction is
  started, the bad address is put on the bus and the CPU waits for a
  peripheral to reply.
- The timer interrupt kicks in. The CPU saves NIP to SRR0 and 

RE: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-25 Thread Alastair D'Silva
On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote:
> On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva <
> alast...@au1.ibm.com> wrote:
> > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> > > > V3:
> > > >   - Rebase against next/next-20200220
> > > >   - Move driver to arch/powerpc/platforms/powernv, we now
> > > > expect
> > > > this
> > > > driver to go upstream via the powerpc tree
> > > 
> > > That's rather the opposite direction of normal; mostly drivers
> > > live
> > > under
> > > drivers/ and not in arch/.  It's easier for drivers to get
> > > overlooked
> > > when doing tree-wide changes if they're hiding.
> > 
> > This is true, however, given that it was not all that desirable to
> > have
> > it under drivers/nvdimm, it's sister driver (for the same hardware)
> > is
> > also under arch, and that we don't expect this driver to be used on
> > any
> > platform other than powernv, we think this was the most reasonable
> > place to put it.
> 
> Historically powernv specific platform drivers go in their respective
> subsystem trees rather than in arch/ and I'd prefer we kept it that
> way. When I added the papr_scm driver I put it in the pseries
> platform
> directory because most of the pseries paravirt code lives there for
> some reason; I don't know why. Luckily for me that followed the same
> model that Dan used when he put the NFIT driver in drivers/acpi/ and
> the libnvdimm core in drivers/nvdimm/ so we didn't have anything to
> argue about. However, as Matthew pointed out, it is at odds with how
> most subsystems operate. Is there any particular reason we're doing
> things this way or should we think about moving libnvdimm users to
> drivers/nvdimm/?
> 
> Oliver


I'm not too fussed where it ends up, as long as it ends up somewhere :)

From what I can tell, the issue is that we have both "infrastructure"
drivers and end-device drivers. To me, it feels like drivers/nvdimm
should contain both, and I think this feels like the right approach.

I could move it back to drivers/nvdimm/ocxl, but I felt that it was
only tolerated there, not desired. This could be cleared up with a
response from Dan Williams, and if it is indeed desired, this is my
preferred location.

I think a case could also be made for drivers/ocxl, simply because we
don't expect more than a handful of drivers to ever live there (I
expect most users will drive their devices from userspace via libocxl).

In defence of keeping it in arch/powerpc/powernv, I highly doubt this
driver will end up being used on any platform other than this. Even
though OpenCAPI was engineered as an open standard, there is some
competition from industry giants with a competing standard on a much
more popular platform.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v3 03/27] powerpc: Map & release OpenCAPI LPC memory

2020-02-25 Thread Alastair D'Silva
On Tue, 2020-02-25 at 11:02 +0100, Frederic Barrat wrote:
> 
> On 21/02/2020 at 04:26, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > This patch adds platform support to map & release LPC memory.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/include/asm/pnv-ocxl.h   |  4 +++
> >   arch/powerpc/platforms/powernv/ocxl.c | 43
> > +++
> >   2 files changed, 47 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h
> > b/arch/powerpc/include/asm/pnv-ocxl.h
> > index 7de82647e761..0b2a6707e555 100644
> > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > @@ -32,5 +32,9 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void
> > *platform_data, int pe_handle)
> >   
> >   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> >   extern void pnv_ocxl_free_xive_irq(u32 irq);
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size);
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
> > +#endif
> 
> This breaks the compilation of the ocxl driver if
> CONFIG_MEMORY_HOTPLUG=n
> 
> Those functions still make sense even without memory hotplug, for 
> example in the context of the implementation you had to access
> opencapi 
> LPC memory through mmap(). The #ifdef is really needed only around
> the 
> check_hotplug_memory_addressable() call.
> 
>Fred

Hmm, we do still need sparsemem though. Let me think about this some
more.
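
For reference, Fred's suggestion amounts to keeping the function itself
unconditional and guarding only the hotplug check, roughly (an untested
sketch against the quoted code):

u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
{
	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
	struct pnv_phb *phb = hose->private_data;
	u32 bdfn = pci_dev_id(pdev);
	__be64 base_addr_be64;
	u64 base_addr;
	int rc;

	rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size, &base_addr_be64);
	if (rc) {
		dev_warn(&pdev->dev,
			 "OPAL could not allocate LPC memory, rc=%d\n", rc);
		return 0;
	}

	base_addr = be64_to_cpu(base_addr_be64);

#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
	/* Only the addressability check depends on memory hotplug */
	rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
					      size >> PAGE_SHIFT);
	if (rc)
		return 0;
#endif

	return base_addr;
}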

> 
> 
> >   #endif /* _ASM_PNV_OCXL_H */
> > diff --git a/arch/powerpc/platforms/powernv/ocxl.c
> > b/arch/powerpc/platforms/powernv/ocxl.c
> > index 8c65aacda9c8..f2edbcc67361 100644
> > --- a/arch/powerpc/platforms/powernv/ocxl.c
> > +++ b/arch/powerpc/platforms/powernv/ocxl.c
> > @@ -475,6 +475,49 @@ void pnv_ocxl_spa_release(void *platform_data)
> >   }
> >   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> >   
> > +#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   u32 bdfn = pci_dev_id(pdev);
> > +   __be64 base_addr_be64;
> > +   u64 base_addr;
> > +   int rc;
> > +
> > +   rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size,
> > &base_addr_be64);
> > +   if (rc) {
> > +   dev_warn(&pdev->dev,
> > +"OPAL could not allocate LPC memory, rc=%d\n",
> > rc);
> > +   return 0;
> > +   }
> > +
> > +   base_addr = be64_to_cpu(base_addr_be64);
> > +
> > +   rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
> > + size >> PAGE_SHIFT);
> > +   if (rc)
> > +   return 0;
> > +
> > +   return base_addr;
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
> > +
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   u32 bdfn = pci_dev_id(pdev);
> > +   int rc;
> > +
> > +   rc = opal_npu_mem_release(phb->opal_id, bdfn);
> > +   if (rc)
> > +   dev_warn(&pdev->dev,
> > +"OPAL reported rc=%d when releasing LPC
> > memory\n", rc);
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
> > +#endif
> > +
> >   int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int
> > pe_handle)
> >   {
> > struct spa_data *data = (struct spa_data *) platform_data;
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v3 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-25 Thread Alastair D'Silva
On Tue, 2020-02-25 at 17:30 +0100, Frederic Barrat wrote:
> 
> On 21/02/2020 at 04:26, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Tally up the LPC memory on an OpenCAPI link & allow it to be mapped
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   drivers/misc/ocxl/core.c  | 10 ++
> >   drivers/misc/ocxl/link.c  | 53
> > +++
> >   drivers/misc/ocxl/ocxl_internal.h | 33 +++
> >   3 files changed, 96 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index b7a09b21ab36..2531c6cf19a0 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> > if (rc)
> > goto err_free_pasid;
> >   
> > +   if (afu->config.lpc_mem_size || afu-
> > >config.special_purpose_mem_size) {
> > +   rc = ocxl_link_add_lpc_mem(afu->fn->link, afu-
> > >config.lpc_mem_offset,
> > +  afu->config.lpc_mem_size +
> > +  afu-
> > >config.special_purpose_mem_size);
> > +   if (rc)
> > +   goto err_free_mmio;
> > +   }
> > +
> > return 0;
> >   
> > +err_free_mmio:
> > +   unmap_mmio_areas(afu);
> >   err_free_pasid:
> > reclaim_afu_pasid(afu);
> >   err_free_actag:
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 58d111afd9f6..1e039cc5ebe5 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -84,6 +84,11 @@ struct ocxl_link {
> > int dev;
> > atomic_t irq_available;
> > struct spa *spa;
> > +   struct mutex lpc_mem_lock; /* protects lpc_mem & lpc_mem_sz */
> > +   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the
> > link */
> > +   u64 lpc_mem;
> > +   int lpc_consumers;
> > +
> > void *platform_data;
> >   };
> >   static struct list_head links_list = LIST_HEAD_INIT(links_list);
> > @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int
> > PE_mask, struct ocxl_link **out_l
> > if (rc)
> > goto err_spa;
> >   
> > +   mutex_init(&link->lpc_mem_lock);
> > +
> > /* platform specific hook */
> > rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> > &link->platform_data);
> > @@ -711,3 +718,49 @@ void ocxl_link_free_irq(void *link_handle, int
> > hw_irq)
> > atomic_inc(&link->irq_available);
> >   }
> >   EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
> > +
> > +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   // Check for overflow
> > +   if (offset > (offset + size))
> > +   return -EINVAL;
> > +
> > +   mutex_lock(&link->lpc_mem_lock);
> > +   link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
> > +
> > +   mutex_unlock(&link->lpc_mem_lock);
> > +
> > +   return 0;
> > +}
> > +
> > +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   mutex_lock(&link->lpc_mem_lock);
> > +
> > +   if(!link->lpc_mem)
> > +   link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link-
> > >lpc_mem_sz);
> > +
> > +   if(link->lpc_mem)
> > +   link->lpc_consumers++;
> > +   mutex_unlock(&link->lpc_mem_lock);
> > +
> > +   return link->lpc_mem;
> > +}
> > +
> > +void ocxl_link_lpc_release(void *link_handle, struct pci_dev
> > *pdev)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   mutex_lock(&link->lpc_mem_lock);
> > +   WARN_ON(--link->lpc_consumers < 0);
> 
> Here, we always decrement the lpc_consumers count. However, it was
> only 
> incremented if the mapping was setup correctly in opal.
> 
> We could arguably claim that ocxl_link_lpc_release() should only be 
> called if ocxl_link_lpc_map() succeeded, but it would make error
> path 
> handling easier if we only decrement the lpc_consumers count if 
> link->lpc_mem is set. So that we can just call
> ocxl_link_lpc_release() 
> in error paths without having to worry about triggering the WARN_ON
> message.
> 
>Fred
> 
> 

Ok, this makes sense.
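
One way to implement that, as an untested sketch against the quoted
code: only touch the consumer count once a mapping actually exists.

void ocxl_link_lpc_release(void *link_handle, struct pci_dev *pdev)
{
	struct ocxl_link *link = (struct ocxl_link *) link_handle;

	mutex_lock(&link->lpc_mem_lock);
	if (link->lpc_mem) {
		/* Only drop the count if a mapping was set up */
		WARN_ON(--link->lpc_consumers < 0);
		if (link->lpc_consumers == 0) {
			pnv_ocxl_platform_lpc_release(pdev);
			link->lpc_mem = 0;
		}
	}
	mutex_unlock(&link->lpc_mem_lock);
}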

> 
> > +   if (link->lpc_consumers == 0) {
> > +   pnv_ocxl_platform_lpc_release(pdev);
> > +   link->lpc_mem = 0;
> > +   }
> > +
> > +   mutex_unlock(&link->lpc_mem_lock);
> > +}
> > diff --git a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index 198e4e4bc51d..d0c8c4838f42 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -142,4 +142,37 @@ int ocxl_irq_offset_to_id(struct ocxl_context
> > *ctx, u64 offset);
> >   u64 ocxl_irq_id_to_offset(struct ocxl_context *ctx, int irq_id);
> >   void ocxl_afu_irq_free_all(struct ocxl_context *ctx);
> >   
> > +/**
> > + * ocxl_link

Re: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-25 Thread Dan Williams
On Tue, Feb 25, 2020 at 4:14 PM Alastair D'Silva  wrote:
>
> On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote:
> > On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva <
> > alast...@au1.ibm.com> wrote:
> > > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> > > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva wrote:
> > > > > V3:
> > > > >   - Rebase against next/next-20200220
> > > > >   - Move driver to arch/powerpc/platforms/powernv, we now
> > > > > expect
> > > > > this
> > > > > driver to go upstream via the powerpc tree
> > > >
> > > > That's rather the opposite direction of normal; mostly drivers
> > > > live
> > > > under
> > > > drivers/ and not in arch/.  It's easier for drivers to get
> > > > overlooked
> > > > when doing tree-wide changes if they're hiding.
> > >
> > > This is true, however, given that it was not all that desirable to
> > > have
> > > it under drivers/nvdimm, its sister driver (for the same hardware)
> > > is
> > > also under arch, and that we don't expect this driver to be used on
> > > any
> > > platform other than powernv, we think this was the most reasonable
> > > place to put it.
> >
> > Historically powernv specific platform drivers go in their respective
> > subsystem trees rather than in arch/ and I'd prefer we kept it that
> > way. When I added the papr_scm driver I put it in the pseries
> > platform
> > directory because most of the pseries paravirt code lives there for
> > some reason; I don't know why. Luckily for me that followed the same
> > model that Dan used when he put the NFIT driver in drivers/acpi/ and
> > the libnvdimm core in drivers/nvdimm/ so we didn't have anything to
> > argue about. However, as Matthew pointed out, it is at odds with how
> > most subsystems operate. Is there any particular reason we're doing
> > things this way or should we think about moving libnvdimm users to
> > drivers/nvdimm/?
> >
> > Oliver
>
>
> I'm not too fussed where it ends up, as long as it ends up somewhere :)
>
> From what I can tell, the issue is that we have both "infrastructure"
> drivers, and end-device drivers. To me, it feels like drivers/nvdimm
> should contain both, and I think this feels like the right approach.
>
> I could move it back to drivers/nvdimm/ocxl, but I felt that it was
> only tolerated there, not desired. This could be cleared up with a
> response from Dan Williams, and if it is indeed desired, this is my
> preferred location.

Apologies if I gave the impression it was only tolerated. I'm ok with
drivers/nvdimm/ocxl/, and to the larger point I'd also be ok with a
drivers/{acpi => nvdimm}/nfit and {arch/powerpc/platforms/pseries =>
drivers/nvdimm}/papr_scm.c move as well to keep all the consumers of
the nvdimm related code together with the core.


RE: [PATCH v3 00/27] Add support for OpenCAPI Persistent Memory devices

2020-02-25 Thread Alastair D'Silva
On Tue, 2020-02-25 at 16:32 -0800, Dan Williams wrote:
> On Tue, Feb 25, 2020 at 4:14 PM Alastair D'Silva <
> alast...@au1.ibm.com> wrote:
> > On Mon, 2020-02-24 at 17:51 +1100, Oliver O'Halloran wrote:
> > > On Mon, Feb 24, 2020 at 3:43 PM Alastair D'Silva <
> > > alast...@au1.ibm.com> wrote:
> > > > On Sun, 2020-02-23 at 20:37 -0800, Matthew Wilcox wrote:
> > > > > On Mon, Feb 24, 2020 at 03:34:07PM +1100, Alastair D'Silva
> > > > > wrote:
> > > > > > V3:
> > > > > >   - Rebase against next/next-20200220
> > > > > >   - Move driver to arch/powerpc/platforms/powernv, we now
> > > > > > expect
> > > > > > this
> > > > > > driver to go upstream via the powerpc tree
> > > > > 
> > > > > That's rather the opposite direction of normal; mostly
> > > > > drivers
> > > > > live
> > > > > under
> > > > > drivers/ and not in arch/.  It's easier for drivers to get
> > > > > overlooked
> > > > > when doing tree-wide changes if they're hiding.
> > > > 
> > > > This is true, however, given that it was not all that desirable
> > > > to
> > > > have
> > > > it under drivers/nvdimm, its sister driver (for the same
> > > > hardware)
> > > > is
> > > > also under arch, and that we don't expect this driver to be
> > > > used on
> > > > any
> > > > platform other than powernv, we think this was the most
> > > > reasonable
> > > > place to put it.
> > > 
> > > Historically powernv specific platform drivers go in their
> > > respective
> > > subsystem trees rather than in arch/ and I'd prefer we kept it
> > > that
> > > way. When I added the papr_scm driver I put it in the pseries
> > > platform
> > > directory because most of the pseries paravirt code lives there
> > > for
> > > some reason; I don't know why. Luckily for me that followed the
> > > same
> > > model that Dan used when he put the NFIT driver in drivers/acpi/
> > > and
> > > the libnvdimm core in drivers/nvdimm/ so we didn't have anything
> > > to
> > > argue about. However, as Matthew pointed out, it is at odds with
> > > how
> > > most subsystems operate. Is there any particular reason we're
> > > doing
> > > things this way or should we think about moving libnvdimm users
> > > to
> > > drivers/nvdimm/?
> > > 
> > > Oliver
> > 
> > I'm not too fussed where it ends up, as long as it ends up
> > somewhere :)
> > 
> > From what I can tell, the issue is that we have both
> > "infrastructure"
> > drivers, and end-device drivers. To me, it feels like
> > drivers/nvdimm
> > should contain both, and I think this feels like the right
> > approach.
> > 
> > I could move it back to drivers/nvdimm/ocxl, but I felt that it was
> > only tolerated there, not desired. This could be cleared up with a
> > response from Dan Williams, and if it is indeed desired, this is
> > my
> > preferred location.
> 
> Apologies if I gave the impression it was only tolerated. I'm ok with
> drivers/nvdimm/ocxl/, and to the larger point I'd also be ok with a
> drivers/{acpi => nvdimm}/nfit and {arch/powerpc/platforms/pseries =>
> drivers/nvdimm}/papr_scm.c move as well to keep all the consumers of
> the nvdimm related code together with the core.

Great, thanks for clarifying, text is so imprecise when it comes to
nuance :)

I'll move it back to drivers/nvdimm/ocxl then.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers

2020-02-25 Thread Shengjiu Wang
On Tue, Feb 25, 2020 at 4:05 PM Nicolin Chen  wrote:
>
> On Mon, Feb 24, 2020 at 08:53:25AM +, S.j. Wang wrote:
> > Hi
> >
> > > >
> > > > Signed-off-by: Shengjiu Wang 
> > > > ---
> > > >  sound/soc/fsl/Kconfig   |   10 +
> > > >  sound/soc/fsl/Makefile  |2 +
> > > >  sound/soc/fsl/fsl_asrc_common.h |1 +
> > > >  sound/soc/fsl/fsl_easrc.c   | 2265 +++
> > > >  sound/soc/fsl/fsl_easrc.h   |  668 +
> > > >  sound/soc/fsl/fsl_easrc_dma.c   |  440 ++
> > >
> > > I see a 90% similarity between fsl_asrc_dma and fsl_easrc_dma files.
> > > Would it be possible reuse the existing code? Could share structures from
> > > my point of view, just like it reuses "enum asrc_pair_index", I know
> > > differentiating "pair" and "context" is a big point here though.
> > >
> > > A possible quick solution for that, off the top of my head, could be:
> > >
> > > 1) in fsl_asrc_common.h
> > >
> > > struct fsl_asrc {
> > > 
> > > };
> > >
> > > struct fsl_asrc_pair {
> > > 
> > > };
> > >
> > > 2) in fsl_easrc.h
> > >
> > > /* Renaming shared structures */
> > > #define fsl_easrc fsl_asrc
> > > #define fsl_easrc_context fsl_asrc_pair
> > >
> > > May be a good idea to see if others have some opinion too.
> > >
> >
> > We need to modify fsl_asrc and fsl_asrc_pair so they can
> > be used by both drivers; we also need to put the specific
> > definitions for each module into the same struct, right?
>
> Yea. A merged structure if that doesn't look that bad. I see most
> of the fields in struct fsl_asrc are being reused in fsl_easrc.
>
> > >
> > > > +static const struct regmap_config fsl_easrc_regmap_config = {
> > > > + .readable_reg = fsl_easrc_readable_reg,
> > > > + .volatile_reg = fsl_easrc_volatile_reg,
> > > > + .writeable_reg = fsl_easrc_writeable_reg,
> > >
> > > Can we use regmap_range and regmap_access_table?
> > >
> >
> > Can the regmap_range support discontinuous registers?  The
> > reg_stride = 4.
>
> I think it does. Giving an example here:
> https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c

The registers in this i2c driver are continuous, from 0x00, 0x01, 0x02...

But in our case they are 0x00, 0x04, 0x08; does it work?

best regards
wang shengjiu


Re: [PATCH v3 1/6] powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and kaslr_early_init()

2020-02-25 Thread Jason Yan




On 2020/2/20 21:40, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

Some code refactor in kaslr_legal_offset() and kaslr_early_init(). No
functional change. This is a preparation for KASLR fsl_booke64.

Signed-off-by: Jason Yan 
Cc: Scott Wood 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
---
  arch/powerpc/mm/nohash/kaslr_booke.c | 40 ++--
  1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
b/arch/powerpc/mm/nohash/kaslr_booke.c

index 4a75f2d9bf0e..07b036e98353 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -25,6 +25,7 @@ struct regions {
  unsigned long pa_start;
  unsigned long pa_end;
  unsigned long kernel_size;
+    unsigned long linear_sz;
  unsigned long dtb_start;
  unsigned long dtb_end;
  unsigned long initrd_start;
@@ -260,11 +261,23 @@ static __init void get_cell_sizes(const void 
*fdt, int node, int *addr_cells,

  *size_cells = fdt32_to_cpu(*prop);
  }
-static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned 
long index,

-   unsigned long offset)
+static unsigned long __init kaslr_legal_offset(void *dt_ptr, unsigned 
long random)

  {
  unsigned long koffset = 0;
  unsigned long start;
+    unsigned long index;
+    unsigned long offset;
+
+    /*
+ * Decide which 64M we want to start
+ * Only use the low 8 bits of the random seed
+ */
+    index = random & 0xFF;
+    index %= regions.linear_sz / SZ_64M;
+
+    /* Decide offset inside 64M */
+    offset = random % (SZ_64M - regions.kernel_size);
+    offset = round_down(offset, SZ_16K);
  while ((long)index >= 0) {
  offset = memstart_addr + index * SZ_64M + offset;
@@ -289,10 +302,9 @@ static inline __init bool kaslr_disabled(void)
  static unsigned long __init kaslr_choose_location(void *dt_ptr, 
phys_addr_t size,

    unsigned long kernel_sz)
  {
-    unsigned long offset, random;
+    unsigned long random;
  unsigned long ram, linear_sz;
  u64 seed;
-    unsigned long index;
  kaslr_get_cmdline(dt_ptr);
  if (kaslr_disabled())
@@ -333,22 +345,12 @@ static unsigned long __init 
kaslr_choose_location(void *dt_ptr, phys_addr_t size

  regions.dtb_start = __pa(dt_ptr);
  regions.dtb_end = __pa(dt_ptr) + fdt_totalsize(dt_ptr);
  regions.kernel_size = kernel_sz;
+    regions.linear_sz = linear_sz;
  get_initrd_range(dt_ptr);
  get_crash_kernel(dt_ptr, ram);
-    /*
- * Decide which 64M we want to start
- * Only use the low 8 bits of the random seed
- */
-    index = random & 0xFF;
-    index %= linear_sz / SZ_64M;
-
-    /* Decide offset inside 64M */
-    offset = random % (SZ_64M - kernel_sz);
-    offset = round_down(offset, SZ_16K);
-
-    return kaslr_legal_offset(dt_ptr, index, offset);
+    return kaslr_legal_offset(dt_ptr, random);
  }
  /*
@@ -358,8 +360,6 @@ static unsigned long __init 
kaslr_choose_location(void *dt_ptr, phys_addr_t size

   */
  notrace void __init kaslr_early_init(void *dt_ptr, phys_addr_t size)
  {
-    unsigned long tlb_virt;
-    phys_addr_t tlb_phys;
  unsigned long offset;
  unsigned long kernel_sz;
@@ -375,8 +375,8 @@ notrace void __init kaslr_early_init(void *dt_ptr, 
phys_addr_t size)

  is_second_reloc = 1;
  if (offset >= SZ_64M) {
-    tlb_virt = round_down(kernstart_virt_addr, SZ_64M);
-    tlb_phys = round_down(kernstart_addr, SZ_64M);
+    unsigned long tlb_virt = round_down(kernstart_virt_addr, 
SZ_64M);

+    phys_addr_t tlb_phys = round_down(kernstart_addr, SZ_64M);


That looks like cleanup unrelated to the patch itself.


Hi, Christophe

These two variables are only for the booke32 code, so I moved the
definitions here so that I can save an "#ifdef CONFIG_PPC32" for them.

Thanks,
Jason




  /* Create kernel map to relocate in */
  create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);



Christophe

.




Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64

2020-02-25 Thread Jason Yan




On 2020/2/20 21:48, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

The implementation for Freescale BookE64 is similar to BookE32. One
difference is that Freescale BookE64 sets up a TLB mapping of 1G during
booting. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel in this 1G mapping and make
it 64K-aligned. This can save some code to create another TLB map at
early boot. The disadvantage is that we only have about 1G/64K = 16384
slots to put the kernel in.

To support secondary cpu boot up, a variable __kaslr_offset was added in
first_256B section. This can help secondary cpu get the kaslr offset
before the 1:1 mapping has been setup.

Signed-off-by: Jason Yan 
Cc: Scott Wood 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
---
  arch/powerpc/Kconfig |  2 +-
  arch/powerpc/kernel/exceptions-64e.S | 10 +
  arch/powerpc/kernel/head_64.S    |  7 ++
  arch/powerpc/kernel/setup_64.c   |  4 +++-
  arch/powerpc/mm/mmu_decl.h   | 16 +++---
  arch/powerpc/mm/nohash/kaslr_booke.c | 33 +---
  6 files changed, 59 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c150a9d49343..754aeb96bb1c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -568,7 +568,7 @@ config RELOCATABLE
  config RANDOMIZE_BASE
  bool "Randomize the address of the kernel image"
-    depends on (FSL_BOOKE && FLATMEM && PPC32)
+    depends on (PPC_FSL_BOOK3E && FLATMEM)
  depends on RELOCATABLE
  help
    Randomizes the virtual address at which the kernel image is
diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S

index 1b9b174bee86..c1c05b8684ca 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -1378,6 +1378,7 @@ skpinv:    addi    r6,r6,1    /* 
Increment */

  1:    mflr    r6
  addi    r6,r6,(2f - 1b)
  tovirt(r6,r6)
+    add    r6,r6,r19
  lis    r7,MSR_KERNEL@h
  ori    r7,r7,MSR_KERNEL@l
  mtspr    SPRN_SRR0,r6
@@ -1400,6 +1401,7 @@ skpinv:    addi    r6,r6,1    /* 
Increment */

  /* We translate LR and return */
  tovirt(r8,r8)
+    add    r8,r8,r19
  mtlr    r8
  blr
@@ -1528,6 +1530,7 @@ a2_tlbinit_code_end:
   */
  _GLOBAL(start_initialization_book3e)
  mflr    r28
+    li    r19, 0
  /* First, we need to setup some initial TLBs to map the kernel
   * text, data and bss at PAGE_OFFSET. We don't have a real mode
@@ -1570,6 +1573,12 @@ _GLOBAL(book3e_secondary_core_init)
  cmplwi    r4,0
  bne    2f
+    li    r19, 0
+#ifdef CONFIG_RANDOMIZE_BASE
+    LOAD_REG_ADDR_PIC(r19, __kaslr_offset)
+    lwz    r19,0(r19)
+    rlwinm  r19,r19,0,0,5
+#endif
  /* Setup TLB for this core */
  bl    initial_tlb_book3e
@@ -1602,6 +1611,7 @@ _GLOBAL(book3e_secondary_core_init)
  lis    r3,PAGE_OFFSET@highest
  sldi    r3,r3,32
  or    r28,r28,r3
+    add    r28,r28,r19
  1:    mtlr    r28
  blr
diff --git a/arch/powerpc/kernel/head_64.S 
b/arch/powerpc/kernel/head_64.S

index ad79fddb974d..744624140fb8 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -104,6 +104,13 @@ __secondary_hold_acknowledge:
  .8byte    0x0
  #ifdef CONFIG_RELOCATABLE
+#ifdef CONFIG_RANDOMIZE_BASE
+    . = 0x58
+    .globl    __kaslr_offset
+__kaslr_offset:
+DEFINE_FIXED_SYMBOL(__kaslr_offset)
+    .long    0
+#endif
  /* This flag is set to 1 by a loader if the kernel should run
   * at the loaded address instead of the linked address.  This
   * is used by kexec-tools to keep the the kdump kernel in the
diff --git a/arch/powerpc/kernel/setup_64.c 
b/arch/powerpc/kernel/setup_64.c

index 6104917a282d..a16b970a8d1a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -66,7 +66,7 @@
  #include 
  #include 
  #include 
-


Why remove this new line which clearly separates things in asm/ and 
things in local dir ?


Sorry to break this. I will add the new line back.




+#include 
  #include "setup.h"
  int spinning_secondaries;
@@ -300,6 +300,8 @@ void __init early_setup(unsigned long dt_ptr)
  /* Enable early debugging if any specified (see udbg.h) */
  udbg_early_init();
+    kaslr_early_init(__va(dt_ptr), 0);
+
  udbg_printf(" -> %s(), dt_ptr: 0x%lx\n", __func__, dt_ptr);
  /*
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 3e1c85c7d10b..bbd721d1e3d7 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -147,14 +147,6 @@ void reloc_kernel_entry(void *fdt, long addr);
  extern void loadcam_entry(unsigned int index);
  extern void loadcam_multi(int first_idx, int num, int tmp_idx);
-#ifdef CONFIG_RANDOMIZE_BASE
-void kaslr_early_init(void *d

Re: [PATCH v3 5/6] powerpc/fsl_booke/64: clear the original kernel if randomized

2020-02-25 Thread Jason Yan




On 2020/2/20 21:49, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

The original kernel still exists in the memory, clear it now.


No such problem with PPC32? Or is that common?



PPC32 did this in relocate_init() in fsl_booke.c because PPC32 will not 
reach kaslr_early_init for the second pass after relocation.


Thanks,
Jason


Christophe



Signed-off-by: Jason Yan 
Cc: Scott Wood 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
---
  arch/powerpc/mm/nohash/kaslr_booke.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
b/arch/powerpc/mm/nohash/kaslr_booke.c

index c6f5c1db1394..ed1277059368 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -378,8 +378,10 @@ notrace void __init kaslr_early_init(void 
*dt_ptr, phys_addr_t size)

  unsigned int *__kaslr_offset = (unsigned int *)(KERNELBASE + 0x58);
  unsigned int *__run_at_load = (unsigned int *)(KERNELBASE + 0x5c);
-    if (*__run_at_load == 1)
+    if (*__run_at_load == 1) {
+    kaslr_late_init();
  return;
+    }
  /* Setup flat device-tree pointer */
  initial_boot_params = dt_ptr;



.




Re: [PATCH v3 6/6] powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst and add 64bit part

2020-02-25 Thread Jason Yan




On 2020/2/20 21:50, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

Now we support both 32 and 64 bit KASLR for fsl booke. Add document for
64 bit part and rename kaslr-booke32.rst to kaslr-booke.rst.

Signed-off-by: Jason Yan 
Cc: Scott Wood 
Cc: Diana Craciun 
Cc: Michael Ellerman 
Cc: Christophe Leroy 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nicholas Piggin 
Cc: Kees Cook 
---
  .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 ---
  1 file changed, 31 insertions(+), 4 deletions(-)
  rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} 
(59%)


Also update Documentation/powerpc/index.rst?



Oh yes, thanks for reminding me of this.

Thanks,
Jason


Christophe

.




Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers

2020-02-25 Thread Nicolin Chen
On Wed, Feb 26, 2020 at 09:51:39AM +0800, Shengjiu Wang wrote:
> > > > > +static const struct regmap_config fsl_easrc_regmap_config = {
> > > > > + .readable_reg = fsl_easrc_readable_reg,
> > > > > + .volatile_reg = fsl_easrc_volatile_reg,
> > > > > + .writeable_reg = fsl_easrc_writeable_reg,
> > > >
> > > > Can we use regmap_range and regmap_access_table?
> > > >
> > >
> > > Can the regmap_range support discontinuous registers?  The
> > > reg_stride = 4.
> >
> > I think it does. Giving an example here:
> > https://github.com/torvalds/linux/blob/master/drivers/mfd/da9063-i2c.c
> 
> The registers in this i2c driver are continuous, from 0x00, 0x01, 0x02...
>
> But in our case they are 0x00, 0x04, 0x08; does it work?

Ah... I see your point now. I am not very sure -- I have only used
it in I2C drivers. You can ignore this if it isn't likely to work for us.
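
For what it's worth, regmap ranges are inclusive and regmap already
rejects accesses that are not aligned to reg_stride, so a stride-4
register map can be described with plain ranges. A minimal sketch (the
names and range bounds below are illustrative, not the real EASRC map):

static const struct regmap_range fsl_easrc_readable_ranges[] = {
	regmap_reg_range(0x000, 0x1fc),	/* covers 0x000, 0x004, 0x008, ... */
};

static const struct regmap_access_table fsl_easrc_readable_table = {
	.yes_ranges = fsl_easrc_readable_ranges,
	.n_yes_ranges = ARRAY_SIZE(fsl_easrc_readable_ranges),
};

static const struct regmap_config fsl_easrc_regmap_config = {
	.reg_bits = 32,
	.reg_stride = 4,
	.val_bits = 32,
	.rd_table = &fsl_easrc_readable_table,
	/* .wr_table and .volatile_table can be built the same way */
};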


Re: [PATCH v3 26/32] powerpc/64: system call zero volatile registers when returning

2020-02-25 Thread Nicholas Piggin
Segher Boessenkool's on February 26, 2020 7:20 am:
> Hi!
> 
> On Wed, Feb 26, 2020 at 03:35:35AM +1000, Nicholas Piggin wrote:
>> Kernel addresses and potentially other sensitive data could be leaked
>> in volatile registers after a syscall.
> 
>>  cmpdi   r3,0
>>  bne .Lsyscall_restore_regs
>> +li  r0,0
>> +li  r4,0
>> +li  r5,0
>> +li  r6,0
>> +li  r7,0
>> +li  r8,0
>> +li  r9,0
>> +li  r10,0
>> +li  r11,0
>> +li  r12,0
>> +mtctr   r0
>> +mtspr   SPRN_XER,r0
>>  .Lsyscall_restore_regs_cont:
> 
> What about LR?  Is that taken care of later?

LR is preserved by sc as per ABI.

> This also deserves a big fat comment imo, it is very important after
> all, and not so obvious.

Sure I can add something.
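
For example, something along these lines above the clearing sequence
(the wording is only a suggestion):

	/*
	 * Zero volatile registers so we don't leak kernel addresses or
	 * other sensitive data to userspace. LR is not cleared because
	 * it is preserved across sc as per the ABI; CTR and XER are
	 * cleared via r0.
	 */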

Thanks,
Nick


Re: [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load

2020-02-25 Thread Nicholas Piggin
bugzilla-dae...@bugzilla.kernel.org's on February 26, 2020 1:26 am:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> Bug ID: 206669
>Summary: Little-endian kernel crashing on POWER8 on heavy
> big-endian PowerKVM load
>Product: Platform Specific/Hardware
>Version: 2.5
> Kernel Version: 5.4.x
>   Hardware: All
> OS: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: PPC-64
>   Assignee: platform_ppc...@kernel-bugs.osdl.org
>   Reporter: glaub...@physik.fu-berlin.de
> CC: mator...@gmail.com
> Regression: No
> 
> Created attachment 287605
>   --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
> Backtrace of host system crashing with little-endian kernel
> 
> We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
> unstable hosting a big-endian ppc64 virtual machine running the same kernel in
> big-endian mode.
> 
> When building OpenJDK-11 on the big-endian VM, the testsuite crashes the 
> *host*
> system which is little-endian with the following kernel backtrace. The problem
> reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host
> running 5.4.x.
> 
> Backtrace attached.

Thanks for the report, we need to get more data about the first BUG if 
we can. What function in your vmlinux contains address 
0xc017a778? (use nm or objdump etc) Is that the first message you get?
No warnings or anything else earlier in the dmesg?
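
(For example, assuming the vmlinux has symbols, "nm -n vmlinux" lists
symbols in address order so the enclosing function can be found, or
"addr2line -e vmlinux 0xc017a778" names it directly if the kernel was
built with debug info.)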

Also 0xc02659a0 would be interesting.

When reproducing, do you ever get a clean trace of the first bug? Could
you try setting /proc/sys/kernel/panic_on_oops and reproducing?

Thanks,
Nick



[PATCH v3 00/14] Initial Prefixed Instruction support

2020-02-25 Thread Jordan Niethe
A future revision of the ISA will introduce prefixed instructions. A
prefixed instruction is composed of a 4-byte prefix followed by a
4-byte suffix.

All prefixes have the major opcode 1. A prefix will never be a valid
word instruction. A suffix may be an existing word instruction or a
new instruction.

This series enables prefixed instructions and extends the instruction
emulation to support them. Then the places where prefixed instructions
might need to be emulated are updated.
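
For illustration, detecting a prefix word comes down to checking the
primary opcode; a sketch of the check this series implements:

static inline bool is_prefix_word(unsigned int word)
{
	/* The primary opcode is the top 6 bits; all prefixes use opcode 1 */
	return (word >> 26) == 1;
}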

v3 is based on feedback from Christophe Leroy. The major changes:
- Completely replacing store_inst() with patch_instruction() in
  xmon
- Improve implementation of mread_instr() to not use mread().
- Base the series on top of
  https://patchwork.ozlabs.org/patch/1232619/ as this will affect
  kprobes.
- Some renaming and simplification of conditionals.

v2 incorporates feedback from Daniel Axtens and Balamuruhan
S. The major changes are:
- Squashing together all commits about SRR1 bits
- Squashing all commits for supporting prefixed load stores
- Changing abbreviated references to sufx/prfx -> suffix/prefix
- Introducing macros for returning the length of an instruction
- Removing sign extension flag from pstd/pld in sstep.c
- Dropping patch  "powerpc/fault: Use analyse_instr() to check for
  store with updates to sp" from the series, it did not really fit
  with prefixed enablement in the first place and as reported by Greg
  Kurz did not work correctly.

Alistair Popple (1):
  powerpc: Enable Prefixed Instructions

Jordan Niethe (13):
  powerpc: Define new SRR1 bits for a future ISA version
  powerpc sstep: Prepare to support prefixed instructions
  powerpc sstep: Add support for prefixed load/stores
  powerpc sstep: Add support for prefixed fixed-point arithmetic
  powerpc: Support prefixed instructions in alignment handler
  powerpc/traps: Check for prefixed instructions in
facility_unavailable_exception()
  powerpc/xmon: Remove store_inst() for patch_instruction()
  powerpc/xmon: Add initial support for prefixed instructions
  powerpc/xmon: Dump prefixed instructions
  powerpc/kprobes: Support kprobes on prefixed instructions
  powerpc/uprobes: Add support for prefixed instructions
  powerpc/hw_breakpoints: Initial support for prefixed instructions
  powerpc: Add prefix support to mce_find_instr_ea_and_pfn()

 arch/powerpc/include/asm/kprobes.h|   5 +-
 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/include/asm/reg.h|   7 +-
 arch/powerpc/include/asm/sstep.h  |   9 +-
 arch/powerpc/include/asm/uaccess.h|  25 
 arch/powerpc/include/asm/uprobes.h|  16 ++-
 arch/powerpc/kernel/align.c   |   8 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  23 
 arch/powerpc/kernel/hw_breakpoint.c   |   9 +-
 arch/powerpc/kernel/kprobes.c |  43 --
 arch/powerpc/kernel/mce_power.c   |   6 +-
 arch/powerpc/kernel/optprobes.c   |  31 +++--
 arch/powerpc/kernel/optprobes_head.S  |   6 +
 arch/powerpc/kernel/traps.c   |  22 ++-
 arch/powerpc/kernel/uprobes.c |   4 +-
 arch/powerpc/kvm/book3s_hv_nested.c   |   2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c   |   2 +-
 arch/powerpc/kvm/emulate_loadstore.c  |   2 +-
 arch/powerpc/lib/sstep.c  | 191 +-
 arch/powerpc/lib/test_emulate_step.c  |  30 ++--
 arch/powerpc/xmon/xmon.c  | 140 +++
 21 files changed, 497 insertions(+), 97 deletions(-)

-- 
2.17.1



[PATCH v3 01/14] powerpc: Enable Prefixed Instructions

2020-02-25 Thread Jordan Niethe
From: Alistair Popple 

Prefixed instructions have their own FSCR bit which needs to be enabled via
a CPU feature. The kernel will save the FSCR for problem state but it
needs to be enabled initially.

Signed-off-by: Alistair Popple 
---
 arch/powerpc/include/asm/reg.h|  3 +++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 23 +++
 2 files changed, 26 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1aa46dff0957..c7758c2ccc5f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -397,6 +397,7 @@
 #define SPRN_RWMR  0x375   /* Region-Weighting Mode Register */
 
 /* HFSCR and FSCR bit numbers are the same */
+#define FSCR_PREFIX_LG 13  /* Enable Prefix Instructions */
 #define FSCR_SCV_LG12  /* Enable System Call Vectored */
 #define FSCR_MSGP_LG   10  /* Enable MSGP */
 #define FSCR_TAR_LG8   /* Enable Target Address Register */
@@ -408,11 +409,13 @@
 #define FSCR_VECVSX_LG 1   /* Enable VMX/VSX  */
 #define FSCR_FP_LG 0   /* Enable Floating Point */
 #define SPRN_FSCR  0x099   /* Facility Status & Control Register */
+#define   FSCR_PREFIX  __MASK(FSCR_PREFIX_LG)
 #define   FSCR_SCV __MASK(FSCR_SCV_LG)
 #define   FSCR_TAR __MASK(FSCR_TAR_LG)
 #define   FSCR_EBB __MASK(FSCR_EBB_LG)
 #define   FSCR_DSCR__MASK(FSCR_DSCR_LG)
 #define SPRN_HFSCR 0xbe/* HV=1 Facility Status & Control Register */
+#define   HFSCR_PREFIX __MASK(FSCR_PREFIX_LG)
 #define   HFSCR_MSGP   __MASK(FSCR_MSGP_LG)
 #define   HFSCR_TAR__MASK(FSCR_TAR_LG)
 #define   HFSCR_EBB__MASK(FSCR_EBB_LG)
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 182b4047c1ef..396f2c6c588e 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -553,6 +553,28 @@ static int __init feat_enable_large_ci(struct 
dt_cpu_feature *f)
return 1;
 }
 
+static int __init feat_enable_prefix(struct dt_cpu_feature *f)
+{
+   u64 fscr, hfscr;
+
+   if (f->usable_privilege & USABLE_HV) {
+   hfscr = mfspr(SPRN_HFSCR);
+   hfscr |= HFSCR_PREFIX;
+   mtspr(SPRN_HFSCR, hfscr);
+   }
+
+   if (f->usable_privilege & USABLE_OS) {
+   fscr = mfspr(SPRN_FSCR);
+   fscr |= FSCR_PREFIX;
+   mtspr(SPRN_FSCR, fscr);
+
+   if (f->usable_privilege & USABLE_PR)
+   current->thread.fscr |= FSCR_PREFIX;
+   }
+
+   return 1;
+}
+
 struct dt_cpu_feature_match {
const char *name;
int (*enable)(struct dt_cpu_feature *f);
@@ -626,6 +648,7 @@ static struct dt_cpu_feature_match __initdata
{"vector-binary128", feat_enable, 0},
{"vector-binary16", feat_enable, 0},
{"wait-v3", feat_enable, 0},
+   {"prefix-instructions", feat_enable_prefix, 0},
 };
 
 static bool __initdata using_dt_cpu_ftrs;
-- 
2.17.1



[PATCH v3 02/14] powerpc: Define new SRR1 bits for a future ISA version

2020-02-25 Thread Jordan Niethe
Add the BOUNDARY SRR1 bit definition for when the cause of an alignment
exception is a prefixed instruction that crosses a 64-byte boundary.
Add the PREFIXED SRR1 bit definition for exceptions caused by prefixed
instructions.

Bit 35 of SRR1 is called SRR1_ISI_N_OR_G. This name comes from it being
used to indicate that an ISI was due to the access being no-exec or
guarded. A future ISA version adds another purpose. It is also set if
there is an access to a cache-inhibited location for a prefixed
instruction. Rename from SRR1_ISI_N_OR_G to SRR1_ISI_N_G_OR_CIP.
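
Handlers can then distinguish the cause from the saved SRR1, e.g.
(illustrative only, not part of this patch):

	if (regs->msr & SRR1_PREFIXED) {
		/* the exception was caused by a prefixed instruction */
		if (regs->msr & SRR1_BOUNDARY) {
			/* ... and it crossed a 64-byte boundary */
		}
	}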

Signed-off-by: Jordan Niethe 
---
v2: Combined all the commits concerning SRR1 bits.
---
 arch/powerpc/include/asm/reg.h  | 4 +++-
 arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c7758c2ccc5f..173f33df4fab 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -762,7 +762,7 @@
 #endif
 
 #define   SRR1_ISI_NOPT0x4000 /* ISI: Not found in hash */
-#define   SRR1_ISI_N_OR_G  0x1000 /* ISI: Access is no-exec or G */
+#define   SRR1_ISI_N_G_OR_CIP  0x1000 /* ISI: Access is no-exec or G or CI 
for a prefixed instruction */
 #define   SRR1_ISI_PROT0x0800 /* ISI: Other protection 
fault */
 #define   SRR1_WAKEMASK0x0038 /* reason for wakeup */
 #define   SRR1_WAKEMASK_P8 0x003c /* reason for wakeup on POWER8 and 9 
*/
@@ -789,6 +789,8 @@
 #define   SRR1_PROGADDR0x0001 /* SRR0 contains subsequent 
addr */
 
 #define   SRR1_MCE_MCP 0x0008 /* Machine check signal caused 
interrupt */
+#define   SRR1_BOUNDARY0x1000 /* Prefixed instruction 
crosses 64-byte boundary */
+#define   SRR1_PREFIXED0x2000 /* Exception caused by 
prefixed instruction */
 
 #define SPRN_HSRR0 0x13A   /* Save/Restore Register 0 */
 #define SPRN_HSRR1 0x13B   /* Save/Restore Register 1 */
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index dc97e5be76f6..6ab685227574 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1169,7 +1169,7 @@ static int kvmhv_translate_addr_nested(struct kvm_vcpu 
*vcpu,
} else if (vcpu->arch.trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
/* Can we execute? */
if (!gpte_p->may_execute) {
-   flags |= SRR1_ISI_N_OR_G;
+   flags |= SRR1_ISI_N_G_OR_CIP;
goto forward_to_l1;
}
} else {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 220305454c23..b53a9f1c1a46 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -1260,7 +1260,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned 
long addr,
status &= ~DSISR_NOHPTE;/* DSISR_NOHPTE == SRR1_ISI_NOPT */
if (!data) {
if (gr & (HPTE_R_N | HPTE_R_G))
-   return status | SRR1_ISI_N_OR_G;
+   return status | SRR1_ISI_N_G_OR_CIP;
if (!hpte_read_permission(pp, slb_v & key))
return status | SRR1_ISI_PROT;
} else if (status & DSISR_ISSTORE) {
-- 
2.17.1



[PATCH v3 03/14] powerpc sstep: Prepare to support prefixed instructions

2020-02-25 Thread Jordan Niethe
Currently all instructions are a single word long. A future ISA version
will include prefixed instructions which have a double word length. The
functions used for analysing and emulating instructions need to be
modified so that they can handle these new instruction types.

A prefixed instruction is a word prefix followed by a word suffix. All
prefixes uniquely have the primary op-code 1. Suffixes may be valid word
instructions or instructions that only exist as suffixes.

In handling prefixed instructions it will be convenient to treat the
suffix and prefix as separate words. To facilitate this modify
analyse_instr() and emulate_step() to take a suffix as a
parameter. For word instructions it does not matter what is passed in
here - it will be ignored.

We also define a new flag, PREFIXED, to be used in instruction_op:type.
This flag indicates, when emulating an analysed instruction, whether the
NIP should be advanced by one word or by a doubleword.

The callers of analyse_instr() and emulate_step() will need their own
changes to be able to support prefixed instructions. For now modify them
to pass in 0 as a suffix.

Note that at this point no prefixed instructions are emulated or
analysed - this is just making it possible to do so.
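
Once an instruction is analysed, callers can advance the NIP by the
instruction's length rather than a fixed 4 bytes, e.g. (a sketch, not
the exact caller code):

	struct instruction_op op;

	if (analyse_instr(&op, regs, instr, suffix) > 0)
		regs->nip += GETLENGTH(op.type);	/* 4, or 8 if prefixed */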

Signed-off-by: Jordan Niethe 
---
v2: - Move definition of __get_user_instr() and
__get_user_instr_inatomic() to "powerpc: Support prefixed instructions
in alignment handler."
- Use a macro for returning the length of an op
- Rename sufx -> suffix
- Define and use PPC_NO_SUFFIX instead of 0
v3: - Define and use OP_PREFIX
- Rename OP_LENGTH() to GETLENGTH()
- Define IS_PREFIX() as 0 for non 64 bit ppc
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 
 arch/powerpc/include/asm/sstep.h  |  9 ++--
 arch/powerpc/kernel/align.c   |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c   |  4 ++--
 arch/powerpc/kernel/kprobes.c |  2 +-
 arch/powerpc/kernel/mce_power.c   |  2 +-
 arch/powerpc/kernel/optprobes.c   |  3 ++-
 arch/powerpc/kernel/uprobes.c |  2 +-
 arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
 arch/powerpc/lib/sstep.c  | 12 ++-
 arch/powerpc/lib/test_emulate_step.c  | 30 +--
 arch/powerpc/xmon/xmon.c  |  5 +++--
 12 files changed, 54 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index c1df75edde44..24dc193cd3ef 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -158,6 +158,9 @@
 /* VMX Vector Store Instructions */
 #define OP_31_XOP_STVX  231
 
+/* Prefixed Instructions */
+#define OP_PREFIX  1
+
 #define OP_31   31
 #define OP_LWZ  32
 #define OP_STFS 52
@@ -377,6 +380,16 @@
 #define PPC_INST_VCMPEQUD  0x10c7
 #define PPC_INST_VCMPEQUB  0x1006
 
+/* macros for prefixed instructions */
+#ifdef __powerpc64__
+#define IS_PREFIX(x)   (((x) >> 26) == OP_PREFIX)
+#else
+#define IS_PREFIX(x)   (0)
+#endif
+
+#definePPC_NO_SUFFIX   0
+#definePPC_INST_LENGTH(x)  (IS_PREFIX(x) ? 8 : 4)
+
 /* macros to insert fields into opcodes */
 #define ___PPC_RA(a)   (((a) & 0x1f) << 16)
 #define ___PPC_RB(b)   (((b) & 0x1f) << 11)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 769f055509c9..5539df5c50a4 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -89,11 +89,15 @@ enum instruction_type {
 #define VSX_LDLEFT 4   /* load VSX register from left */
 #define VSX_CHECK_VEC  8   /* check MSR_VEC not MSR_VSX for reg >= 32 */
 
+/* Prefixed flag, ORed in with type */
+#define PREFIXED   0x800
+
 /* Size field in type word */
 #define SIZE(n)((n) << 12)
 #define GETSIZE(w) ((w) >> 12)
 
 #define GETTYPE(t) ((t) & INSTR_TYPE_MASK)
+#define GETLENGTH(t)   (((t) & PREFIXED) ? 8 : 4)
 
 #define MKOP(t, f, s)  ((t) | (f) | SIZE(s))
 
@@ -132,7 +136,7 @@ union vsx_reg {
  * otherwise.
  */
 extern int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
-unsigned int instr);
+unsigned int instr, unsigned int suffix);
 
 /*
  * Emulate an instruction that can be executed just by updating
@@ -149,7 +153,8 @@ void emulate_update_regs(struct pt_regs *reg, struct 
instruction_op *op);
  * 0 if it could not be emulated, or -1 for an instruction that
  * should not be emulated (rfid, mtmsrd clearing MSR_RI, etc.).
  */
-extern int emulate_step(struct pt_regs *regs, unsigned int instr);
+extern int emulate_step(struct pt_regs *regs, unsigned int instr,
+   unsigned int suffix);
 
 /*
  * Emulate a load or store instruction by reading/writing the
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 92045ed64976..ba3bf5c3ab62 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/pow

[PATCH v3 04/14] powerpc sstep: Add support for prefixed load/stores

2020-02-25 Thread Jordan Niethe
This adds emulation support for the following prefixed integer
load/stores:
  * Prefixed Load Byte and Zero (plbz)
  * Prefixed Load Halfword and Zero (plhz)
  * Prefixed Load Halfword Algebraic (plha)
  * Prefixed Load Word and Zero (plwz)
  * Prefixed Load Word Algebraic (plwa)
  * Prefixed Load Doubleword (pld)
  * Prefixed Store Byte (pstb)
  * Prefixed Store Halfword (psth)
  * Prefixed Store Word (pstw)
  * Prefixed Store Doubleword (pstd)
  * Prefixed Load Quadword (plq)
  * Prefixed Store Quadword (pstq)

the following prefixed floating-point load/stores:
  * Prefixed Load Floating-Point Single (plfs)
  * Prefixed Load Floating-Point Double (plfd)
  * Prefixed Store Floating-Point Single (pstfs)
  * Prefixed Store Floating-Point Double (pstfd)

and for the following prefixed VSX load/stores:
  * Prefixed Load VSX Scalar Doubleword (plxsd)
  * Prefixed Load VSX Scalar Single-Precision (plxssp)
  * Prefixed Load VSX Vector [0|1]  (plxv, plxv0, plxv1)
  * Prefixed Store VSX Scalar Doubleword (pstxsd)
  * Prefixed Store VSX Scalar Single-Precision (pstxssp)
  * Prefixed Store VSX Vector [0|1] (pstxv, pstxv0, pstxv1)

Signed-off-by: Jordan Niethe 
---
v2: - Combine all load/store patches
- Fix the name of Type 01 instructions
- Remove sign extension flag from pstd/pld
- Rename sufx -> suffix
v3: - Move prefixed loads and stores into the switch statement
---
 arch/powerpc/lib/sstep.c | 159 +++
 1 file changed, 159 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index efbe72370670..8e4ec953e279 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -187,6 +187,44 @@ static nokprobe_inline unsigned long xform_ea(unsigned int 
instr,
return ea;
 }
 
+/*
+ * Calculate effective address for a MLS:D-form / 8LS:D-form
+ * prefixed instruction
+ */
+static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
+ unsigned int suffix,
+ const struct pt_regs *regs)
+{
+   int ra, prefix_r;
+   unsigned int  dd;
+   unsigned long ea, d0, d1, d;
+
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+
+   d0 = instr & 0x3;
+   d1 = suffix & 0x;
+   d = (d0 << 16) | d1;
+
+   /*
+* sign extend a 34 bit number
+*/
+   dd = (unsigned int)(d >> 2);
+   ea = (signed int)dd;
+   ea = (ea << 2) | (d & 0x3);
+
+   if (!prefix_r && ra)
+   ea += regs->gpr[ra];
+   else if (!prefix_r && !ra)
+   ; /* Leave ea as is */
+   else if (prefix_r && !ra)
+   ea += regs->nip;
+   else if (prefix_r && ra)
+   ; /* Invalid form. Should already be checked for by caller! */
+
+   return ea;
+}
+
 /*
  * Return the largest power of 2, not greater than sizeof(unsigned long),
  * such that x is a multiple of it.
@@ -1166,6 +1204,7 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
  unsigned int instr, unsigned int suffix)
 {
unsigned int opcode, ra, rb, rc, rd, spr, u;
+   unsigned int suffixopcode, prefixtype, prefix_r;
unsigned long int imm;
unsigned long int val, val2;
unsigned int mb, me, sh;
@@ -2648,6 +2687,126 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
break;
}
break;
+   case 1: /* Prefixed instructions */
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+   op->update_reg = ra;
+   rd = (suffix >> 21) & 0x1f;
+   op->reg = rd;
+   op->val = regs->gpr[rd];
+
+   suffixopcode = suffix >> 26;
+   prefixtype = (instr >> 24) & 0x3;
+   switch (prefixtype) {
+   case 0: /* Type 00  Eight-Byte Load/Store */
+   if (prefix_r && ra)
+   break;
+   op->ea = mlsd_8lsd_ea(instr, suffix, regs);
+   switch (suffixopcode) {
+   case 41:/* plwa */
+   op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
+   break;
+   case 42:/* plxsd */
+   op->reg = rd + 32;
+   op->type = MKOP(LOAD_VSX, PREFIXED, 8);
+   op->element_size = 8;
+   op->vsx_flags = VSX_CHECK_VEC;
+   break;
+   case 43:/* plxssp */
+   op->reg = rd + 32;
+   op->type = MKOP(LOAD_VSX, PREFIXED, 4);
+   op->element_size = 8;
+   op->vsx_flags = VSX_FPCONV | VS

[PATCH v3 05/14] powerpc sstep: Add support for prefixed fixed-point arithmetic

2020-02-25 Thread Jordan Niethe
This adds emulation support for the following prefixed Fixed-Point
Arithmetic instructions:
  * Prefixed Add Immediate (paddi)
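
For reference, a sketch of paddi's semantics as emulated here (an
illustrative helper, not kernel code; the names are made up):

/*
 * paddi RT,RA,SI34,R:
 *   R=0, RA=0 : RT = si34
 *   R=0, RA!=0: RT = GPR[RA] + si34
 *   R=1, RA=0 : RT = NIP + si34
 *   R=1, RA!=0: invalid form
 */
static unsigned long paddi_result(unsigned long gpr_ra, unsigned long nip,
				  long si34, int r, int ra)
{
	unsigned long base = 0;

	if (!r && ra)
		base = gpr_ra;
	else if (r && !ra)
		base = nip;

	return base + si34;
}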

Signed-off-by: Jordan Niethe 
---
v3: Since we moved the prefixed loads/stores into the load/store switch
statement it no longer makes sense to have paddi in there, so move it
out.
---
 arch/powerpc/lib/sstep.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 8e4ec953e279..f2010a3e1e06 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1331,6 +1331,26 @@ int analyse_instr(struct instruction_op *op, const 
struct pt_regs *regs,
 
switch (opcode) {
 #ifdef __powerpc64__
+   case 1:
+   prefix_r = instr & (1ul << 20);
+   ra = (suffix >> 16) & 0x1f;
+   rd = (suffix >> 21) & 0x1f;
+   op->reg = rd;
+   op->val = regs->gpr[rd];
+   suffixopcode = suffix >> 26;
+   prefixtype = (instr >> 24) & 0x3;
+   switch (prefixtype) {
+   case 2:
+   if (prefix_r && ra)
+   return 0;
+   switch (suffixopcode) {
+   case 14:/* paddi */
+   op->type = COMPUTE | PREFIXED;
+   op->val = mlsd_8lsd_ea(instr, suffix, regs);
+   goto compute_done;
+   }
+   }
+   break;
case 2: /* tdi */
if (rd & trap_compare(regs->gpr[ra], (short) instr))
goto trap;
-- 
2.17.1



[PATCH v3 06/14] powerpc: Support prefixed instructions in alignment handler

2020-02-25 Thread Jordan Niethe
Alignment interrupts can be caused by prefixed instructions accessing
memory. In the alignment handler the instruction that caused the
exception is loaded and an attempt is made to emulate it. If the
instruction is a prefixed instruction, load the prefix and suffix to
emulate. After emulating, increment the NIP by 8.

Prefixed instructions are not permitted to cross 64-byte boundaries. If
they do, the alignment interrupt is invoked with the SRR1 BOUNDARY bit
set. If this occurs, send a SIGBUS to the offending process if in user
mode; if in kernel mode, call bad_page_fault().
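
As a sketch of the fixup (illustrative, mirroring the diff below): after a
successful emulation, the NIP is advanced past the faulting instruction,
8 bytes for a prefixed instruction and 4 otherwise:

	/* IS_PREFIX() is the helper introduced earlier in this series */
	regs->nip += IS_PREFIX(instr) ? 8 : 4;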

Signed-off-by: Jordan Niethe 
---
v2: - Move __get_user_instr() and __get_user_instr_inatomic() to this
commit (previously in "powerpc sstep: Prepare to support prefixed
instructions").
- Rename sufx to suffix
- Use a macro for calculating instruction length
v3: Move __get_user_{instr(), instr_inatomic()} up with the other
get_user definitions and remove nested if.
---
 arch/powerpc/include/asm/uaccess.h | 25 +
 arch/powerpc/kernel/align.c|  8 +---
 arch/powerpc/kernel/traps.c| 21 -
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 2f500debae21..8903a96cbb4b 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -105,6 +105,31 @@ static inline int __access_ok(unsigned long addr, unsigned 
long size,
 #define __put_user_inatomic(x, ptr) \
__put_user_nosleep((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
+/*
+ * When reading an instruction, if it is a prefix, the suffix also needs
+ * to be loaded.
+ */
+#define __get_user_instr(x, y, ptr)\
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user(x, ptr); \
+   if (!__gui_ret && IS_PREFIX(x)) \
+   __gui_ret = __get_user(y, ptr + 1); \
+   __gui_ret;  \
+})
+
+#define __get_user_instr_inatomic(x, y, ptr)   \
+({ \
+   long __gui_ret = 0; \
+   y = 0;  \
+   __gui_ret = __get_user_inatomic(x, ptr);\
+   if (!__gui_ret && IS_PREFIX(x)) \
+   __gui_ret = __get_user_inatomic(y, ptr + 1);\
+   __gui_ret;  \
+})
+
+
 extern long __put_user_bad(void);
 
 /*
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index ba3bf5c3ab62..4984cf681215 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -293,7 +293,7 @@ static int emulate_spe(struct pt_regs *regs, unsigned int 
reg,
 
 int fix_alignment(struct pt_regs *regs)
 {
-   unsigned int instr;
+   unsigned int instr, suffix;
struct instruction_op op;
int r, type;
 
@@ -303,13 +303,15 @@ int fix_alignment(struct pt_regs *regs)
 */
CHECK_FULL_REGS(regs);
 
-   if (unlikely(__get_user(instr, (unsigned int __user *)regs->nip)))
+   if (unlikely(__get_user_instr(instr, suffix,
+ (unsigned int __user *)regs->nip)))
return -EFAULT;
if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
/* We don't handle PPC little-endian any more... */
if (cpu_has_feature(CPU_FTR_PPC_LE))
return -EIO;
instr = swab32(instr);
+   suffix = swab32(suffix);
}
 
 #ifdef CONFIG_SPE
@@ -334,7 +336,7 @@ int fix_alignment(struct pt_regs *regs)
if ((instr & 0xfc0006fe) == (PPC_INST_COPY & 0xfc0006fe))
return -EIO;
 
-   r = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+   r = analyse_instr(&op, regs, instr, suffix);
if (r < 0)
return -EINVAL;
 
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 82a3438300fd..d80b82fc1ae3 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -583,6 +583,10 @@ static inline int check_io_access(struct pt_regs *regs)
 #define REASON_ILLEGAL (ESR_PIL | ESR_PUO)
 #define REASON_PRIVILEGED  ESR_PPR
 #define REASON_TRAPESR_PTR
+#define REASON_PREFIXED0
+#define REASON_BOUNDARY0
+
+#define inst_length(reason)4
 
 /* single-step stuff */
 #define single_stepping(regs)  (current->thread.debug.dbcr0 & DBCR0_IC)
@@ -597,6 +601,10 @@ static inline int check_io_access(struct pt_regs *regs)
 #define REASON_ILLEGAL SRR1_PROGILL
 #define REASON_PRIVILEGED  SRR1_PROGPRIV
 #define REASON_TRAPSRR1_PROGTRAP
+#define REASON_PREFIXEDSRR1_PREFIXED
+#define REASON_BOUN

[PATCH v3 07/14] powerpc/traps: Check for prefixed instructions in facility_unavailable_exception()

2020-02-25 Thread Jordan Niethe
If prefixed instructions are made unavailable by the [H]FSCR, attempting
to use them will cause a facility unavailable exception. Add "PREFIX" to
the facility_strings[].

Currently there are no prefixed instructions that are actually emulated
by emulate_instruction() within facility_unavailable_exception().
However, when caused by a prefixed instruction, the SRR1 PREFIXED bit is
set. Prepare for dealing with emulated prefixed instructions by checking
for this bit.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kernel/traps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index d80b82fc1ae3..cd8b3043c268 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1739,6 +1739,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
[FSCR_TAR_LG] = "TAR",
[FSCR_MSGP_LG] = "MSGP",
[FSCR_SCV_LG] = "SCV",
+   [FSCR_PREFIX_LG] = "PREFIX",
};
char *facility = "unknown";
u64 value;
-- 
2.17.1



[PATCH v3 08/14] powerpc/xmon: Remove store_inst() for patch_instruction()

2020-02-25 Thread Jordan Niethe
For modifying instructions in xmon, patch_instruction() can serve the
same role that store_inst() is performing with the advantage of not
being specific to xmon. In some places patch_instruction() is already
being used, followed by store_inst(). In these cases just remove the
store_inst(). Otherwise replace store_inst() with patch_instruction().

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/xmon/xmon.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 897e512c6379..a673cf55641c 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -325,11 +325,6 @@ static inline void sync(void)
asm volatile("sync; isync");
 }
 
-static inline void store_inst(void *p)
-{
-   asm volatile ("dcbst 0,%0; sync; icbi 0,%0; isync" : : "r" (p));
-}
-
 static inline void cflush(void *p)
 {
asm volatile ("dcbf 0,%0; icbi 0,%0" : : "r" (p));
@@ -882,8 +877,7 @@ static struct bpt *new_breakpoint(unsigned long a)
for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
if (!bp->enabled && atomic_read(&bp->ref_count) == 0) {
bp->address = a;
-   bp->instr[1] = bpinstr;
-   store_inst(&bp->instr[1]);
+   patch_instruction(&bp->instr[1], bpinstr);
return bp;
}
}
@@ -913,7 +907,7 @@ static void insert_bpts(void)
bp->enabled = 0;
continue;
}
-   store_inst(&bp->instr[0]);
+   patch_instruction(&bp->instr[0], bp->instr[0]);
if (bp->enabled & BP_CIABR)
continue;
if (patch_instruction((unsigned int *)bp->address,
@@ -923,7 +917,6 @@ static void insert_bpts(void)
bp->enabled &= ~BP_TRAP;
continue;
}
-   store_inst((void *)bp->address);
}
 }
 
@@ -958,8 +951,6 @@ static void remove_bpts(void)
(unsigned int *)bp->address, bp->instr[0]) != 0)
printf("Couldn't remove breakpoint at %lx\n",
   bp->address);
-   else
-   store_inst((void *)bp->address);
}
 }
 
-- 
2.17.1



[PATCH v3 09/14] powerpc/xmon: Add initial support for prefixed instructions

2020-02-25 Thread Jordan Niethe
A prefixed instruction is composed of a word prefix and a word suffix.
It does not make sense to be able to have a breakpoint on the suffix of
a prefixed instruction, so make this impossible.

When leaving xmon_core() we check to see if we are currently at a
breakpoint. If this is the case, the breakpoint needs to be proceeded
from. Initially emulate_step() is tried, but if this fails then we need
to execute the saved instruction out of line. The NIP is set to the
address of bpt::instr[] for the current breakpoint.  bpt::instr[]
contains the instruction replaced by the breakpoint, followed by a trap
instruction.  After bpt::instr[0] is executed and we hit the trap we
enter back into xmon_bpt(). We know that if we got here and the offset
indicates we are at bpt::instr[1] then we have just executed out of line
so we can put the NIP back to the instruction after the breakpoint
location and continue on.

Adding prefixed instructions complicates this, as bpt::instr[1] needs
to be used to hold the suffix. To deal with this make bpt::instr[] big
enough for three word instructions.  bpt::instr[2] contains the trap,
and in the case of word instructions pad bpt::instr[1] with a noop.

No support for disassembling prefixed instructions.
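
For clarity, the resulting bpt::instr[] layouts look like this (an
illustration, not code from the patch):

	/* word instruction:     instr[0] = insn    instr[1] = nop     instr[2] = trap */
	/* prefixed instruction: instr[0] = prefix  instr[1] = suffix  instr[2] = trap */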

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
v3: - Just directly use PPC_INST_NOP
- Typo: plac -> place
- Rename read_inst() to mread_inst(). Do not have it call mread().
---
 arch/powerpc/xmon/xmon.c | 90 ++--
 1 file changed, 78 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a673cf55641c..a73a35aa4a75 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -97,7 +97,8 @@ static long *xmon_fault_jmp[NR_CPUS];
 /* Breakpoint stuff */
 struct bpt {
unsigned long   address;
-   unsigned intinstr[2];
+   /* Prefixed instructions can not cross 64-byte boundaries */
+   unsigned intinstr[3] __aligned(64);
atomic_tref_count;
int enabled;
unsigned long   pad;
@@ -120,6 +121,7 @@ static unsigned bpinstr = 0x7fe00008;   /* trap */
 static int cmds(struct pt_regs *);
 static int mread(unsigned long, void *, int);
 static int mwrite(unsigned long, void *, int);
+static int mread_instr(unsigned long, unsigned int *, unsigned int *);
 static int handle_fault(struct pt_regs *);
 static void byterev(unsigned char *, int);
 static void memex(void);
@@ -701,7 +703,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
bp = at_breakpoint(regs->nip);
if (bp != NULL) {
int stepped = emulate_step(regs, bp->instr[0],
-  PPC_NO_SUFFIX);
+  bp->instr[1]);
if (stepped == 0) {
regs->nip = (unsigned long) &bp->instr[0];
atomic_inc(&bp->ref_count);
@@ -756,8 +758,8 @@ static int xmon_bpt(struct pt_regs *regs)
 
/* Are we at the trap at bp->instr[1] for some bp? */
bp = in_breakpoint_table(regs->nip, &offset);
-   if (bp != NULL && offset == 4) {
-   regs->nip = bp->address + 4;
+   if (bp != NULL && (offset == 4 || offset == 8)) {
+   regs->nip = bp->address + offset;
atomic_dec(&bp->ref_count);
return 1;
}
@@ -858,8 +860,9 @@ static struct bpt *in_breakpoint_table(unsigned long nip, 
unsigned long *offp)
if (off >= sizeof(bpts))
return NULL;
off %= sizeof(struct bpt);
-   if (off != offsetof(struct bpt, instr[0])
-   && off != offsetof(struct bpt, instr[1]))
+   if (off != offsetof(struct bpt, instr[0]) &&
+   off != offsetof(struct bpt, instr[1]) &&
+   off != offsetof(struct bpt, instr[2]))
return NULL;
*offp = off - offsetof(struct bpt, instr[0]);
return (struct bpt *) (nip - off);
@@ -876,8 +879,16 @@ static struct bpt *new_breakpoint(unsigned long a)
 
for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
if (!bp->enabled && atomic_read(&bp->ref_count) == 0) {
+   /*
+* Prefixed instructions are two words, but regular
+* instructions are only one. Use a nop to pad out the
+* regular instructions so that we can place the trap
+* at the same place. For prefixed instructions the nop
+* will get overwritten during insert_bpts().
+*/
bp->address = a;
-   patch_instruction(&bp->instr[1], bpinstr);
+   patch_instruction(&bp->instr[1], PPC_INST_NOP);
+   patch_instruction(&bp->instr[2], bpinstr);
return bp

[PATCH v3 10/14] powerpc/xmon: Dump prefixed instructions

2020-02-25 Thread Jordan Niethe
Currently when xmon is dumping instructions it reads a word at a time
and then prints that instruction (either as a hex number or by
disassembling it). For prefixed instructions it would be nice to show
the prefix and suffix together. Use mread_instr() so that if a prefix
is encountered its suffix is loaded too. Then print these in the form:
prefix:suffix
Xmon uses the disassembly routines from GNU binutils. These currently do
not support prefixed instructions so we will not disassemble the
prefixed instructions yet.
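
With made-up addresses and encodings, a dumped prefixed instruction would
then look like:

	c000000000000000  04000000:e4050000	04000000:e4050000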

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
v3: Simplify generic_inst_dump()
---
 arch/powerpc/xmon/xmon.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index a73a35aa4a75..bf304189e33a 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2900,6 +2900,21 @@ prdump(unsigned long adrs, long ndump)
}
 }
 
+static bool instrs_are_equal(unsigned long insta, unsigned long suffixa,
+unsigned long instb, unsigned long suffixb)
+{
+   if (insta != instb)
+   return false;
+
+   if (!IS_PREFIX(insta) && !IS_PREFIX(instb))
+   return true;
+
+   if (IS_PREFIX(insta) && IS_PREFIX(instb))
+   return suffixa == suffixb;
+
+   return false;
+}
+
 typedef int (*instruction_dump_func)(unsigned long inst, unsigned long addr);
 
 static int
@@ -2908,12 +2923,11 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
 {
int nr, dotted;
unsigned long first_adr;
-   unsigned int inst, last_inst = 0;
-   unsigned char val[4];
+   unsigned int inst, suffix, last_inst = 0, last_suffix = 0;
 
dotted = 0;
-   for (first_adr = adr; count > 0; --count, adr += 4) {
-   nr = mread(adr, val, 4);
+   for (first_adr = adr; count > 0; --count, adr += nr) {
+   nr = mread_instr(adr, &inst, &suffix);
if (nr == 0) {
if (praddr) {
const char *x = fault_chars[fault_type];
@@ -2921,8 +2935,9 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
break;
}
-   inst = GETWORD(val);
-   if (adr > first_adr && inst == last_inst) {
+   if (adr > first_adr && instrs_are_equal(inst, suffix,
+   last_inst,
+   last_suffix)) {
if (!dotted) {
printf(" ...\n");
dotted = 1;
@@ -2931,10 +2946,17 @@ generic_inst_dump(unsigned long adr, long count, int 
praddr,
}
dotted = 0;
last_inst = inst;
-   if (praddr)
+   last_suffix = suffix;
+   if (praddr) {
printf(REG"  %.8x", adr, inst);
+   if (IS_PREFIX(inst))
+   printf(":%.8x", suffix);
+   }
printf("\t");
-   dump_func(inst, adr);
+   if (IS_PREFIX(inst))
+   printf("%.8x:%.8x", inst, suffix);
+   else
+   dump_func(inst, adr);
printf("\n");
}
return adr - first_adr;
-- 
2.17.1



[PATCH v3 11/14] powerpc/kprobes: Support kprobes on prefixed instructions

2020-02-25 Thread Jordan Niethe
A prefixed instruction is composed of a word prefix followed by a word
suffix. It does not make sense to be able to have a kprobe on the suffix
of a prefixed instruction, so make this impossible.

Kprobes work by replacing an instruction with a trap and saving that
instruction to be single stepped out of place later. Currently there is
not enough space allocated to keep a prefixed instruction for single
stepping. Increase the amount of space allocated for holding the
instruction copy.

kprobe_post_handler() expects all instructions to be 4 bytes long which
means that it does not function correctly for prefixed instructions.
Add checks for prefixed instructions which will use a length of 8 bytes
instead.

For optprobes we normally patch in an instruction that loads the probed
instruction into r4 before calling emulate_step(). Now make space and
also patch in loading the suffix into r5.
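
With MAX_INSN_SIZE now 2, the saved copy has the following shape (an
illustration, not code from the patch):

	/* word instruction:     ainsn.insn[0] = insn,   ainsn.insn[1] = unused */
	/* prefixed instruction: ainsn.insn[0] = prefix, ainsn.insn[1] = suffix */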

Signed-off-by: Jordan Niethe 
---
v3: - Base on top of  https://patchwork.ozlabs.org/patch/1232619/
- Change printing format to %x:%x
---
 arch/powerpc/include/asm/kprobes.h   |  5 ++--
 arch/powerpc/kernel/kprobes.c| 43 +---
 arch/powerpc/kernel/optprobes.c  | 32 -
 arch/powerpc/kernel/optprobes_head.S |  6 
 4 files changed, 60 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kprobes.h 
b/arch/powerpc/include/asm/kprobes.h
index 66b3f2983b22..0d44ce8a3163 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -38,12 +38,13 @@ extern kprobe_opcode_t optprobe_template_entry[];
 extern kprobe_opcode_t optprobe_template_op_address[];
 extern kprobe_opcode_t optprobe_template_call_handler[];
 extern kprobe_opcode_t optprobe_template_insn[];
+extern kprobe_opcode_t optprobe_template_suffix[];
 extern kprobe_opcode_t optprobe_template_call_emulate[];
 extern kprobe_opcode_t optprobe_template_ret[];
 extern kprobe_opcode_t optprobe_template_end[];
 
-/* Fixed instruction size for powerpc */
-#define MAX_INSN_SIZE  1
+/* Prefixed instructions are two words */
+#define MAX_INSN_SIZE  2
 #define MAX_OPTIMIZED_LENGTH   sizeof(kprobe_opcode_t) /* 4 bytes */
 #define MAX_OPTINSN_SIZE   (optprobe_template_end - 
optprobe_template_entry)
 #define RELATIVEJUMP_SIZE  sizeof(kprobe_opcode_t) /* 4 bytes */
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 6b2e9e37f12b..9ccf1b9a1275 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -117,16 +117,28 @@ void *alloc_insn_page(void)
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
+   struct kprobe *prev;
kprobe_opcode_t insn = *p->addr;
+   kprobe_opcode_t prefix = *(p->addr - 1);
 
+   preempt_disable();
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
} else if (IS_MTMSRD(insn) || IS_RFID(insn) || IS_RFI(insn)) {
printk("Cannot register a kprobe on rfi/rfid or mtmsr[d]\n");
ret = -EINVAL;
+   } else if (IS_PREFIX(prefix)) {
+   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
+   ret = -EINVAL;
+   }
+   prev = get_kprobe(p->addr - 1);
+   if (prev && IS_PREFIX(*prev->ainsn.insn)) {
+   printk("Cannot register a kprobe on the second word of prefixed 
instruction\n");
+   ret = -EINVAL;
}
 
+
/* insn must be on a special executable page on ppc64.  This is
 * not explicitly required on ppc32 (right now), but it doesn't hurt */
if (!ret) {
@@ -136,11 +148,14 @@ int arch_prepare_kprobe(struct kprobe *p)
}
 
if (!ret) {
-   patch_instruction(p->ainsn.insn, *p->addr);
+   patch_instruction(&p->ainsn.insn[0], p->addr[0]);
+   if (IS_PREFIX(insn))
+   patch_instruction(&p->ainsn.insn[1], p->addr[1]);
p->opcode = *p->addr;
}
 
p->ainsn.boostable = 0;
+   preempt_enable_no_resched();
return ret;
 }
 NOKPROBE_SYMBOL(arch_prepare_kprobe);
@@ -225,10 +240,11 @@ NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 static int try_to_emulate(struct kprobe *p, struct pt_regs *regs)
 {
int ret;
-   unsigned int insn = *p->ainsn.insn;
+   unsigned int insn = p->ainsn.insn[0];
+   unsigned int suffix = p->ainsn.insn[1];
 
/* regs->nip is also adjusted if emulate_step returns 1 */
-   ret = emulate_step(regs, insn, PPC_NO_SUFFIX);
+   ret = emulate_step(regs, insn, suffix);
if (ret > 0) {
/*
 * Once this instruction has been boosted
@@ -242,7 +258,11 @@ static int try_to_emulate(struct kprobe *p, struct pt_regs 
*regs)
 * So, we should never get here... but, its still
 * good to catch them, just

[PATCH v3 12/14] powerpc/uprobes: Add support for prefixed instructions

2020-02-25 Thread Jordan Niethe
Uprobes can execute instructions out of line. Increase the size of the
buffer used for this so that it works for prefixed instructions. Take
into account the length of prefixed instructions when fixing up the nip.
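
A quick way to see why the 64-byte alignment used below is sufficient (a
stand-alone sketch, not kernel code):

	/* does [addr, addr + len) cross a 64-byte boundary? */
	static int crosses_64(unsigned long addr, unsigned long len)
	{
		return (addr / 64) != ((addr + len - 1) / 64);
	}

	/* with addr % 64 == 0 (from __aligned(64)) and len == 8 for a
	 * prefix/suffix pair, this is always 0 */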

Signed-off-by: Jordan Niethe 
---
v2: - Fix typo
- Use macro for instruction length
---
 arch/powerpc/include/asm/uprobes.h | 16 
 arch/powerpc/kernel/uprobes.c  |  4 ++--
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/uprobes.h 
b/arch/powerpc/include/asm/uprobes.h
index 2bbdf27d09b5..5516ab27db47 100644
--- a/arch/powerpc/include/asm/uprobes.h
+++ b/arch/powerpc/include/asm/uprobes.h
@@ -14,18 +14,26 @@
 
 typedef ppc_opcode_t uprobe_opcode_t;
 
+/*
+ * Ensure we have enough space for prefixed instructions, which
+ * are double the size of a word instruction, i.e. 8 bytes.
+ */
 #define MAX_UINSN_BYTES4
-#define UPROBE_XOL_SLOT_BYTES  (MAX_UINSN_BYTES)
+#define UPROBE_XOL_SLOT_BYTES  (2 * MAX_UINSN_BYTES)
 
 /* The following alias is needed for reference from arch-agnostic code */
 #define UPROBE_SWBP_INSN   BREAKPOINT_INSTRUCTION
 #define UPROBE_SWBP_INSN_SIZE  4 /* swbp insn size in bytes */
 
 struct arch_uprobe {
+/*
+ * Ensure there is enough space for prefixed instructions. Prefixed
+ * instructions must not cross 64-byte boundaries.
+ */
union {
-   u32 insn;
-   u32 ixol;
-   };
+   uprobe_opcode_t insn[2];
+   uprobe_opcode_t ixol[2];
+   } __aligned(64);
 };
 
 struct arch_uprobe_task {
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index 4ab40c4b576f..7e0334ad5cfe 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -111,7 +111,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * support doesn't exist and have to fix-up the next instruction
 * to be executed.
 */
-   regs->nip = utask->vaddr + MAX_UINSN_BYTES;
+   regs->nip = utask->vaddr + PPC_INST_LENGTH(auprobe->insn[0]);
 
user_disable_single_step(current);
return 0;
@@ -173,7 +173,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
struct pt_regs *regs)
 * emulate_step() returns 1 if the insn was successfully emulated.
 * For all other cases, we need to single-step in hardware.
 */
-   ret = emulate_step(regs, auprobe->insn, PPC_NO_SUFFIX);
+   ret = emulate_step(regs, auprobe->insn[0], auprobe->insn[1]);
if (ret > 0)
return true;
 
-- 
2.17.1



[PATCH v3 13/14] powerpc/hw_breakpoints: Initial support for prefixed instructions

2020-02-25 Thread Jordan Niethe
Currently when getting an instruction to emulate in
hw_breakpoint_handler() we do not load the suffix of a prefixed
instruction. Ensure we load the suffix if the instruction we need to
emulate is a prefixed instruction.

Signed-off-by: Jordan Niethe 
---
v2: Rename sufx to suffix
v3: Add __user to type cast to remove sparse warning
---
 arch/powerpc/kernel/hw_breakpoint.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 3a7ec6760dab..edf46356dfb2 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -243,15 +243,16 @@ dar_range_overlaps(unsigned long dar, int size, struct 
arch_hw_breakpoint *info)
 static bool stepping_handler(struct pt_regs *regs, struct perf_event *bp,
 struct arch_hw_breakpoint *info)
 {
-   unsigned int instr = 0;
+   unsigned int instr = 0, suffix = 0;
int ret, type, size;
struct instruction_op op;
unsigned long addr = info->address;
 
-   if (__get_user_inatomic(instr, (unsigned int *)regs->nip))
+   if (__get_user_instr_inatomic(instr, suffix,
+ (unsigned int __user *)regs->nip))
goto fail;
 
-   ret = analyse_instr(&op, regs, instr, PPC_NO_SUFFIX);
+   ret = analyse_instr(&op, regs, instr, suffix);
type = GETTYPE(op.type);
size = GETSIZE(op.type);
 
@@ -275,7 +276,7 @@ static bool stepping_handler(struct pt_regs *regs, struct 
perf_event *bp,
return false;
}
 
-   if (!emulate_step(regs, instr, PPC_NO_SUFFIX))
+   if (!emulate_step(regs, instr, suffix))
goto fail;
 
return true;
-- 
2.17.1



[PATCH v3 14/14] powerpc: Add prefix support to mce_find_instr_ea_and_pfn()

2020-02-25 Thread Jordan Niethe
mce_find_instr_ea_and_phys() analyses an instruction to determine the
effective address that caused the machine check. Update this to load and
pass the suffix to analyse_instr for prefixed instructions.

Signed-off-by: Jordan Niethe 
---
v2: - Rename sufx to suffix
---
 arch/powerpc/kernel/mce_power.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 824eda536f5d..091bab4a5464 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -365,7 +365,7 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, 
uint64_t *addr,
 * in real-mode is tricky and can lead to recursive
 * faults
 */
-   int instr;
+   int instr, suffix = 0;
unsigned long pfn, instr_addr;
struct instruction_op op;
struct pt_regs tmp = *regs;
@@ -374,7 +374,9 @@ static int mce_find_instr_ea_and_phys(struct pt_regs *regs, 
uint64_t *addr,
if (pfn != ULONG_MAX) {
instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
instr = *(unsigned int *)(instr_addr);
-   if (!analyse_instr(&op, &tmp, instr, PPC_NO_SUFFIX)) {
+   if (IS_PREFIX(instr))
+   suffix = *(unsigned int *)(instr_addr + 4);
+   if (!analyse_instr(&op, &tmp, instr, suffix)) {
pfn = addr_to_pfn(regs, op.ea);
*addr = op.ea;
*phys_addr = (pfn << PAGE_SHIFT);
-- 
2.17.1



[PATCH] ocxl: Fix misleading comment

2020-02-25 Thread Andrew Donnellan
In ocxl_context_free() we note that the AFU reference we're releasing was
taken in "ocxl_context_init", a function that doesn't actually exist.

Fix it to say ocxl_context_alloc() instead, which I expect was what was
intended.

Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi devices")
Cc: Frederic Barrat 
Signed-off-by: Andrew Donnellan 
---
 drivers/misc/ocxl/context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index de8a66b9d76b..c21f65a5c762 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -287,7 +287,7 @@ void ocxl_context_free(struct ocxl_context *ctx)
 
ocxl_afu_irq_free_all(ctx);
idr_destroy(&ctx->irq_idr);
-   /* reference to the AFU taken in ocxl_context_init */
+   /* reference to the AFU taken in ocxl_context_alloc() */
ocxl_afu_put(ctx->afu);
kfree(ctx);
 }
-- 
2.20.1



[PATCH 0/3] mm/vma: some more minor changes

2020-02-25 Thread Anshuman Khandual
The motivation here is to consolidate VMA flags and helpers in the generic
memory header and reduce code duplication wherever applicable. If there
are other possible similar instances which might be missing here, please
do let me know. I will be happy to incorporate them.

This series is based on v5.6-rc3. This series has been build tested on
multiple platforms but boot tested only on arm64 and x86.

Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@kvack.org

Anshuman Khandual (3):
  mm/vma: Move VM_NO_KHUGEPAGED into generic header
  mm/vma: Make vma_is_foreign() available for general use
  mm/vma: Make is_vma_temporary_stack() available for general use

 arch/powerpc/mm/book3s64/pkeys.c   | 12 
 arch/x86/include/asm/mmu_context.h | 15 ---
 include/linux/huge_mm.h|  2 --
 include/linux/mm.h | 28 +++-
 mm/khugepaged.c|  2 --
 mm/rmap.c  | 14 --
 6 files changed, 27 insertions(+), 46 deletions(-)

-- 
2.20.1



[PATCH 2/3] mm/vma: Make vma_is_foreign() available for general use

2020-02-25 Thread Anshuman Khandual
The idea of a foreign VMA with respect to the present context is very
generic, but currently there are two identical definitions for this on the
powerpc and x86 platforms. Let's consolidate those redundant definitions
while making vma_is_foreign() available for general use later. This should
not cause any functional change.
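
A usage sketch (hypothetical caller, in the spirit of the pkey code this
patch touches): protection keys should not be enforced on accesses to a
VMA from another context:

	if (foreign || vma_is_foreign(vma))
		return true;	/* do not enforce pkeys for foreign accesses */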

Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux...@kvack.org
Signed-off-by: Anshuman Khandual 
---
 arch/powerpc/mm/book3s64/pkeys.c   | 12 
 arch/x86/include/asm/mmu_context.h | 15 ---
 include/linux/mm.h | 11 +++
 3 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
index 59e0ebbd8036..07527f1ed108 100644
--- a/arch/powerpc/mm/book3s64/pkeys.c
+++ b/arch/powerpc/mm/book3s64/pkeys.c
@@ -381,18 +381,6 @@ bool arch_pte_access_permitted(u64 pte, bool write, bool 
execute)
  * So do not enforce things if the VMA is not from the current mm, or if we are
  * in a kernel thread.
  */
-static inline bool vma_is_foreign(struct vm_area_struct *vma)
-{
-   if (!current->mm)
-   return true;
-
-   /* if it is not our ->mm, it has to be foreign */
-   if (current->mm != vma->vm_mm)
-   return true;
-
-   return false;
-}
-
 bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
   bool execute, bool foreign)
 {
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index b538d9ddee9c..4e55370e48e8 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -213,21 +213,6 @@ static inline void arch_unmap(struct mm_struct *mm, 
unsigned long start,
  * So do not enforce things if the VMA is not from the current
  * mm, or if we are in a kernel thread.
  */
-static inline bool vma_is_foreign(struct vm_area_struct *vma)
-{
-   if (!current->mm)
-   return true;
-   /*
-* Should PKRU be enforced on the access to this VMA?  If
-* the VMA is from another process, then PKRU has no
-* relevance and should not be enforced.
-*/
-   if (current->mm != vma->vm_mm)
-   return true;
-
-   return false;
-}
-
 static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
bool write, bool execute, bool foreign)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f7e400e6ea3..2fd4b9bec4be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct mempolicy;
 struct anon_vma;
@@ -542,6 +543,16 @@ static inline bool vma_is_anonymous(struct vm_area_struct 
*vma)
return !vma->vm_ops;
 }
 
+static inline bool vma_is_foreign(struct vm_area_struct *vma)
+{
+   if (!current->mm)
+   return true;
+
+   if (current->mm != vma->vm_mm)
+   return true;
+
+   return false;
+}
 #ifdef CONFIG_SHMEM
 /*
  * The vma_is_shmem is not inline because it is used only by slow
-- 
2.20.1



[RFC PATCH] Use IS_ENABLED() instead of #ifdefs

2020-02-25 Thread Christophe Leroy
---
This works for me. Only had to leave the #ifdef around the map_mem_in_cams()
call. Also had to set linear_sz and ram for the alternative case, otherwise I
get:



arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
arch/powerpc/mm/nohash/kaslr_booke.c:355:33: error: 'linear_sz' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
  regions.pa_end = memstart_addr + linear_sz;
   ~~^~~
arch/powerpc/mm/nohash/kaslr_booke.c:315:21: note: 'linear_sz' was declared here
  unsigned long ram, linear_sz;
 ^
arch/powerpc/mm/nohash/kaslr_booke.c:187:8: error: 'ram' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
  ret = parse_crashkernel(boot_command_line, size, &crash_size,
^~~
 &crash_base);
 
arch/powerpc/mm/nohash/kaslr_booke.c:315:16: note: 'ram' was declared here
  unsigned long ram, linear_sz;

---
 arch/powerpc/mm/mmu_decl.h   |  2 +-
 arch/powerpc/mm/nohash/kaslr_booke.c | 97 +++-
 2 files changed, 52 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index b869ea893301..3700e7c04e51 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -139,9 +139,9 @@ extern unsigned long calc_cam_sz(unsigned long ram, 
unsigned long virt,
 extern void adjust_total_lowmem(void);
 extern int switch_to_as1(void);
 extern void restore_to_as0(int esel, int offset, void *dt_ptr, int bootcpu);
+#endif
 void create_kaslr_tlb_entry(int entry, unsigned long virt, phys_addr_t phys);
 extern int is_second_reloc;
-#endif
 
 void reloc_kernel_entry(void *fdt, long addr);
 extern void loadcam_entry(unsigned int index);
diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
b/arch/powerpc/mm/nohash/kaslr_booke.c
index c6f5c1db1394..bf69cece9b8c 100644
--- a/arch/powerpc/mm/nohash/kaslr_booke.c
+++ b/arch/powerpc/mm/nohash/kaslr_booke.c
@@ -267,35 +267,37 @@ static unsigned long __init kaslr_legal_offset(void 
*dt_ptr, unsigned long rando
unsigned long start;
unsigned long offset;
 
-#ifdef CONFIG_PPC32
-   /*
-* Decide which 64M we want to start
-* Only use the low 8 bits of the random seed
-*/
-   unsigned long index = random & 0xFF;
-   index %= regions.linear_sz / SZ_64M;
-
-   /* Decide offset inside 64M */
-   offset = random % (SZ_64M - regions.kernel_size);
-   offset = round_down(offset, SZ_16K);
+   if (IS_ENABLED(CONFIG_PPC32)) {
+   unsigned long index;
+
+   /*
+* Decide which 64M we want to start
+* Only use the low 8 bits of the random seed
+*/
+   index = random & 0xFF;
+   index %= regions.linear_sz / SZ_64M;
+
+   /* Decide offset inside 64M */
+   offset = random % (SZ_64M - regions.kernel_size);
+   offset = round_down(offset, SZ_16K);
+
+   while ((long)index >= 0) {
+   offset = memstart_addr + index * SZ_64M + offset;
+   start = memstart_addr + index * SZ_64M;
+   koffset = get_usable_address(dt_ptr, start, offset);
+   if (koffset)
+   break;
+   index--;
+   }
+   } else {
+   /* Decide kernel offset inside 1G */
+   offset = random % (SZ_1G - regions.kernel_size);
+   offset = round_down(offset, SZ_64K);
 
-   while ((long)index >= 0) {
-   offset = memstart_addr + index * SZ_64M + offset;
-   start = memstart_addr + index * SZ_64M;
+   start = memstart_addr;
+   offset = memstart_addr + offset;
koffset = get_usable_address(dt_ptr, start, offset);
-   if (koffset)
-   break;
-   index--;
}
-#else
-   /* Decide kernel offset inside 1G */
-   offset = random % (SZ_1G - regions.kernel_size);
-   offset = round_down(offset, SZ_64K);
-
-   start = memstart_addr;
-   offset = memstart_addr + offset;
-   koffset = get_usable_address(dt_ptr, start, offset);
-#endif
 
if (koffset != 0)
koffset -= memstart_addr;
@@ -342,6 +344,8 @@ static unsigned long __init kaslr_choose_location(void 
*dt_ptr, phys_addr_t size
/* If the linear size is smaller than 64M, do not randmize */
if (linear_sz < SZ_64M)
return 0;
+#else
+   linear_sz = ram = size;
 #endif
 
/* check for a reserved-memory node and record its cell sizes */
@@ -373,17 +377,19 @@ notrace void __init kaslr_early_init(void *dt_ptr, 
phys_addr_t size)
 {
unsigned long offset;
unsigned long kernel_sz;
+   unsigned int *__kaslr_offset;
+   unsigned int *__

Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory

2020-02-25 Thread Andrew Donnellan

On 21/2/20 2:27 pm, Alastair D'Silva wrote:

From: Alastair D'Silva 

This driver exposes LPC memory on OpenCAPI pmem cards
as an NVDIMM, allowing the existing nvdimm infrastructure
to be used.

Namespace metadata is stored on the media itself, so
scm_reserve_metadata() maps 1 section's worth of PMEM storage
at the start to hold this. The rest of the PMEM range is registered
with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() provide
callbacks to libnvdimm to access the metadata.

Signed-off-by: Alastair D'Silva 


I'm not particularly familiar with the nvdimm subsystem, so the scope of 
my review is more on the ocxl + misc issues side.


A few minor checkpatch warnings that don't matter all that much:

https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11786//artifact/linux/checkpatch.log

A few other comments below.


diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c 
b/arch/powerpc/platforms/powernv/pmem/ocxl.c
new file mode 100644
index ..3c4eeb5dcc0f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c
@@ -0,0 +1,473 @@
+// SPDX-License-Id
+// Copyright 2019 IBM Corp.
+
+/*
+ * A driver for OpenCAPI devices that implement the Storage Class
+ * Memory specification.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ocxl_internal.h"
+
+
+static const struct pci_device_id ocxlpmem_pci_tbl[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
+   { }
+};
+
+MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl);
+
+#define NUM_MINORS 256 // Total to reserve
+
+static dev_t ocxlpmem_dev;
+static struct class *ocxlpmem_class;
+static struct mutex minors_idr_lock;
+static struct idr minors_idr;
+
+/**
+ * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl
+ * @ocxlpmem: the device metadata
+ * @command: the incoming data to write
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
+ struct nd_cmd_set_config_hdr *command)
+{
+   if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
+   return -EINVAL;
+
+   memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, 
command->in_buf,
+ command->in_length);


Out of scope for this patch - given that we use memcpy_mcsafe in the 
config read, does it make sense to change memcpy_flushcache to be mcsafe 
as well?



+
+   return 0;
+}
+
+/**
+ * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl
+ * @ocxlpmem: the device metadata
+ * @command: the read request
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
+struct nd_cmd_get_config_data_hdr *command)
+{
+   if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
+   return -EINVAL;
+
+   memcpy_mcsafe(command->out_buf, ocxlpmem->metadata_addr + 
command->in_offset,
+ command->in_length);
+
+   return 0;
+}
+
+/**
+ * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command from ndctl
+ * @command: the read request
+ * Return: 0 on success, negative on failure
+ */
+static int ndctl_config_size(struct nd_cmd_get_config_size *command)
+{
+   command->status = 0;
+   command->config_size = LABEL_AREA_SIZE;
+   command->max_xfer = PAGE_SIZE;
+
+   return 0;
+}
+
+static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
+struct nvdimm *nvdimm,
+unsigned int cmd, void *buf, unsigned int buf_len, int *cmd_rc)
+{
+   struct ocxlpmem *ocxlpmem = container_of(nd_desc, struct ocxlpmem, 
bus_desc);
+
+   switch (cmd) {
+   case ND_CMD_GET_CONFIG_SIZE:
+   *cmd_rc = ndctl_config_size(buf);
+   return 0;
+
+   case ND_CMD_GET_CONFIG_DATA:
+   *cmd_rc = ndctl_config_read(ocxlpmem, buf);
+   return 0;
+
+   case ND_CMD_SET_CONFIG_DATA:
+   *cmd_rc = ndctl_config_write(ocxlpmem, buf);
+   return 0;
+
+   default:
+   return -ENOTTY;
+   }
+}
+
+/**
+ * reserve_metadata() - Reserve space for nvdimm metadata
+ * @ocxlpmem: the device metadata
+ * @lpc_mem: The resource representing the LPC memory of the OpenCAPI device
+ */
+static int reserve_metadata(struct ocxlpmem *ocxlpmem,
+   struct resource *lpc_mem)
+{
+   ocxlpmem->metadata_addr = devm_memremap(&ocxlpmem->dev, lpc_mem->start,
+   LABEL_AREA_SIZE, MEMREMAP_WB);
+   if (IS_ERR(ocxlpmem->metadata_addr))
+   return PTR_ERR(ocxlpmem->metadata_addr);
+
+   return 0;
+}
+
+/**
+ * register_lpc_mem() - Discover persistent memory on a device and register it 
with the NVDIMM subsystem
+ * @ocxlpmem: the device metadata
+ * Return: 0 on success
+ */
+static int register_lpc_mem(struct ocxlpmem *ocxlpmem)
+{
+   struct nd_region_de

Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64

2020-02-25 Thread Christophe Leroy




On 26/02/2020 at 03:40, Jason Yan wrote:



On 2020/2/20 21:48, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

  /*
   * Decide which 64M we want to start
   * Only use the low 8 bits of the random seed
   */
-    index = random & 0xFF;
+    unsigned long index = random & 0xFF;


That's not good in terms of readability; the index declaration should 
remain at the top of the function, which should be possible if using 
IS_ENABLED() instead


I'm wondering how to declare a variable inside a code block such as if 
(IS_ENABLED(CONFIG_PPC32)) at the top of the function and use the 
variable in another if (IS_ENABLED(CONFIG_PPC32)) block. Is there a good way to do this?


You declare it outside the block as usual:

unsigned long some_var;

if (condition) {
some_var = something;
}
do_many_things();
do_other_things();

if (condition)
return some_var;
else
return 0;


Christophe


Re: [PATCH v3 3/6] powerpc/fsl_booke/64: implement KASLR for fsl_booke64

2020-02-25 Thread Christophe Leroy




On 26/02/2020 at 04:33, Jason Yan wrote:



On 2020/2/26 10:40, Jason Yan wrote:



On 2020/2/20 21:48, Christophe Leroy wrote:



On 06/02/2020 at 03:58, Jason Yan wrote:

Hi Christophe,

When using a standard C if/else, all code is compiled for both PPC32 and
PPC64, but this causes build errors because not all variables are defined
for both PPC32 and PPC64.


[yanaijie@138 linux]$ sh ppc64build.sh
   CALL    scripts/atomic/check-atomics.sh
   CALL    scripts/checksyscalls.sh
   CHK include/generated/compile.h
   CC  arch/powerpc/mm/nohash/kaslr_booke.o
arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_choose_location':
arch/powerpc/mm/nohash/kaslr_booke.c:341:30: error: 
'CONFIG_LOWMEM_CAM_NUM' undeclared (first use in this function); did you 
mean 'CONFIG_FLATMEM_MANUAL'?

    ram = map_mem_in_cams(ram, CONFIG_LOWMEM_CAM_NUM, true);
   ^
   CONFIG_FLATMEM_MANUAL


This one has to remain inside an #ifdef. That's the only one that has to 
remain.


arch/powerpc/mm/nohash/kaslr_booke.c:341:30: note: each undeclared 
identifier is reported only once for each function it appears in

arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
arch/powerpc/mm/nohash/kaslr_booke.c:404:3: error: 'is_second_reloc' 


In mmu_decl.h, put the declaration outside the #ifdef CONFIG_PPC32


undeclared (first use in this function); did you mean '__cond_lock'?
    is_second_reloc = 1;
    ^~~
    __cond_lock
arch/powerpc/mm/nohash/kaslr_booke.c:411:4: error: implicit declaration 
of function 'create_kaslr_tlb_entry'; did you mean 'reloc_kernel_entry'? 


Same, put the declaration outside of the #ifdef


[-Werror=implicit-function-declaration]
     create_kaslr_tlb_entry(1, tlb_virt, tlb_phys);
     ^~
     reloc_kernel_entry
cc1: all warnings being treated as errors
make[3]: *** [scripts/Makefile.build:268: 
arch/powerpc/mm/nohash/kaslr_booke.o] Error 1

make[2]: *** [scripts/Makefile.build:505: arch/powerpc/mm/nohash] Error 2
make[1]: *** [scripts/Makefile.build:505: arch/powerpc/mm] Error 2
make: *** [Makefile:1681: arch/powerpc] Error 2


See the patch I sent you. It builds ok for me.

Christophe


Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory

2020-02-25 Thread Alastair D'Silva
On Wed, 2020-02-26 at 16:07 +1100, Andrew Donnellan wrote:
> On 21/2/20 2:27 pm, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > This driver exposes LPC memory on OpenCAPI pmem cards
> > as an NVDIMM, allowing the existing nvdimm infrastructure
> > to be used.
> > 
> > Namespace metadata is stored on the media itself, so
> > scm_reserve_metadata() maps 1 section's worth of PMEM storage
> > at the start to hold this. The rest of the PMEM range is registered
> > with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size()
> > provide
> > callbacks to libnvdimm to access the metadata.
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> I'm not particularly familiar with the nvdimm subsystem, so the scope
> of 
> my review is more on the ocxl + misc issues side.
> 
> A few minor checkpatch warnings that don't matter all that much:
> 
> https://openpower.xyz/job/snowpatch/job/snowpatch-linux-checkpatch/11786//artifact/linux/checkpatch.log
> 
> A few other comments below.
> 
> > diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c
> > b/arch/powerpc/platforms/powernv/pmem/ocxl.c
> > new file mode 100644
> > index ..3c4eeb5dcc0f
> > --- /dev/null
> > +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c
> > @@ -0,0 +1,473 @@
> > +// SPDX-License-Id
> > +// Copyright 2019 IBM Corp.
> > +
> > +/*
> > + * A driver for OpenCAPI devices that implement the Storage Class
> > + * Memory specification.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "ocxl_internal.h"
> > +
> > +
> > +static const struct pci_device_id ocxlpmem_pci_tbl[] = {
> > +   { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> > +   { }
> > +};
> > +
> > +MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl);
> > +
> > +#define NUM_MINORS 256 // Total to reserve
> > +
> > +static dev_t ocxlpmem_dev;
> > +static struct class *ocxlpmem_class;
> > +static struct mutex minors_idr_lock;
> > +static struct idr minors_idr;
> > +
> > +/**
> > + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command
> > from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the incoming data to write
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_write(struct ocxlpmem *ocxlpmem,
> > + struct nd_cmd_set_config_hdr *command)
> > +{
> > +   if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > +   return -EINVAL;
> > +
> > +   memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, 
> > command->in_buf,
> > + command->in_length);
> 
> Out of scope for this patch - given that we use memcpy_mcsafe in the 
> config read, does it make sense to change memcpy_flushcache to be
> mcsafe 
> as well?
> 

Aneesh has confirmed that stores don't generate machine checks.

> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command
> > from ndctl
> > + * @ocxlpmem: the device metadata
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_read(struct ocxlpmem *ocxlpmem,
> > +struct nd_cmd_get_config_data_hdr
> > *command)
> > +{
> > +   if (command->in_offset + command->in_length > LABEL_AREA_SIZE)
> > +   return -EINVAL;
> > +
> > +   memcpy_mcsafe(command->out_buf, ocxlpmem->metadata_addr +
> > command->in_offset,
> > + command->in_length);
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * ndctl_config_size() - Handle a ND_CMD_GET_CONFIG_SIZE command
> > from ndctl
> > + * @command: the read request
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int ndctl_config_size(struct nd_cmd_get_config_size
> > *command)
> > +{
> > +   command->status = 0;
> > +   command->config_size = LABEL_AREA_SIZE;
> > +   command->max_xfer = PAGE_SIZE;
> > +
> > +   return 0;
> > +}
> > +
> > +static int ndctl(struct nvdimm_bus_descriptor *nd_desc,
> > +struct nvdimm *nvdimm,
> > +unsigned int cmd, void *buf, unsigned int buf_len, int
> > *cmd_rc)
> > +{
> > +   struct ocxlpmem *ocxlpmem = container_of(nd_desc, struct
> > ocxlpmem, bus_desc);
> > +
> > +   switch (cmd) {
> > +   case ND_CMD_GET_CONFIG_SIZE:
> > +   *cmd_rc = ndctl_config_size(buf);
> > +   return 0;
> > +
> > +   case ND_CMD_GET_CONFIG_DATA:
> > +   *cmd_rc = ndctl_config_read(ocxlpmem, buf);
> > +   return 0;
> > +
> > +   case ND_CMD_SET_CONFIG_DATA:
> > +   *cmd_rc = ndctl_config_write(ocxlpmem, buf);
> > +   return 0;
> > +
> > +   default:
> > +   return -ENOTTY;
> > +   }
> > +}
> > +
> > +/**
> > + * reserve_metadata() - Reserve space for nvdimm metadata
> > + * @ocxlpmem: the device metadata
> > + * @lpc_mem: The resource representing the LPC memory of the
> > OpenCAPI device
> > + */
> > +static int reserve_metadata(struct ocxlpmem *ocxlpmem,
> > +   

[PATCH] powerpc: fix emulate_step std test

2020-02-25 Thread Nicholas Piggin
The success check used a logical OR, so the test passed if either
emulate_step() returned success or the value happened to match, even when
the other condition failed. Use a logical AND so both must hold.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/lib/test_emulate_step.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 42347067739c..00d70253cb5b 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -160,7 +160,7 @@ static void __init test_std(void)
 
/* std r5, 0(r3) */
	stepped = emulate_step(&regs, TEST_STD(5, 3, 0));
-   if (stepped == 1 || regs.gpr[5] == a)
+   if (stepped == 1 && regs.gpr[5] == a)
show_result("std", "PASS");
else
show_result("std", "FAIL");
-- 
2.23.0



[PATCH v4 2/8] powerpc/kprobes: Mark newly allocated probes as RO

2020-02-25 Thread Russell Currey
From: Christophe Leroy 

With CONFIG_STRICT_KERNEL_RWX=y and CONFIG_KPROBES=y, there will be one
W+X page at boot by default.  This can be tested with
CONFIG_PPC_PTDUMP=y and CONFIG_PPC_DEBUG_WX=y set, and checking the
kernel log during boot.

powerpc doesn't implement its own alloc() for kprobes like other
architectures do, but we couldn't immediately mark RO anyway since we do
a memcpy to the page we allocate later.  After that, nothing should be
allowed to modify the page, and write permissions are removed well
before the kprobe is armed.

The memcpy() would fail if more than one probe was allocated, so use
patch_instruction() instead, which is safe for RO.

Reviewed-by: Daniel Axtens 
Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..bfab91ded234 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
@@ -102,6 +104,16 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
return addr;
 }
 
+void *alloc_insn_page(void)
+{
+   void *page = vmalloc_exec(PAGE_SIZE);
+
+   if (page)
+   set_memory_ro((unsigned long)page, 1);
+
+   return page;
+}
+
 int arch_prepare_kprobe(struct kprobe *p)
 {
int ret = 0;
@@ -124,11 +136,8 @@ int arch_prepare_kprobe(struct kprobe *p)
}
 
if (!ret) {
-   memcpy(p->ainsn.insn, p->addr,
-   MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   patch_instruction(p->ainsn.insn, *p->addr);
p->opcode = *p->addr;
-   flush_icache_range((unsigned long)p->ainsn.insn,
-   (unsigned long)p->ainsn.insn + sizeof(kprobe_opcode_t));
}
 
p->ainsn.boostable = 0;
-- 
2.25.1



[PATCH v4 0/8] set_memory() routines and STRICT_MODULE_RWX

2020-02-25 Thread Russell Currey
Picking up from Christophe's last series, including the following changes:

- [6/8] Cast "data" to unsigned long instead of int to fix build
- [8/8] New, to fix an issue reported by Jordan Niethe

Christophe's last series is here:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=156428

Christophe Leroy (4):
  powerpc/mm: Implement set_memory() routines
  powerpc/kprobes: Mark newly allocated probes as RO
  powerpc/mm: implement set_memory_attr()
  powerpc/32: use set_memory_attr()

Russell Currey (4):
  powerpc/mm/ptdump: debugfs handler for W+X checks at runtime
  powerpc: Set ARCH_HAS_STRICT_MODULE_RWX
  powerpc/configs: Enable STRICT_MODULE_RWX in skiroot_defconfig
  powerpc/mm: Disable set_memory() routines when strict RWX isn't
enabled

 arch/powerpc/Kconfig   |   2 +
 arch/powerpc/Kconfig.debug |   6 +-
 arch/powerpc/configs/skiroot_defconfig |   1 +
 arch/powerpc/include/asm/set_memory.h  |  34 
 arch/powerpc/kernel/kprobes.c  |  17 +++-
 arch/powerpc/mm/Makefile   |   2 +-
 arch/powerpc/mm/pageattr.c | 112 +
 arch/powerpc/mm/pgtable_32.c   |  95 +++--
 arch/powerpc/mm/ptdump/ptdump.c|  21 -
 9 files changed, 197 insertions(+), 93 deletions(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

-- 
2.25.1



[PATCH v4 1/8] powerpc/mm: Implement set_memory() routines

2020-02-25 Thread Russell Currey
From: Christophe Leroy 

The set_memory_{ro/rw/nx/x}() functions are required for STRICT_MODULE_RWX,
and are generally useful primitives to have.  This implementation is
designed to be completely generic across powerpc's many MMUs.

It's possible that this could be optimised to be faster for specific
MMUs, but the focus is on having a generic and safe implementation for
now.

This implementation does not handle cases where the caller is attempting
to change the mapping of the page it is executing from, or if another
CPU is concurrently using the page being altered.  These cases likely
shouldn't happen, but a more complex implementation with MMU-specific code
could safely handle them, so that is left as a TODO for now.
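
A minimal usage sketch (hypothetical caller; assumes addr is a page-aligned
kernel virtual address backed by normal page tables):

	#include <asm/set_memory.h>

	/* write-protect and make executable a freshly written trampoline */
	set_memory_ro(addr, 1);
	set_memory_x(addr, 1);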

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/set_memory.h | 32 
 arch/powerpc/mm/Makefile  |  2 +-
 arch/powerpc/mm/pageattr.c| 74 +++
 4 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 497b7d0b2d7e..bd074246e34e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -129,6 +129,7 @@ config PPC
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE 
&& PPC_BOOK3S_64
+   select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && 
!HIBERNATION)
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
diff --git a/arch/powerpc/include/asm/set_memory.h 
b/arch/powerpc/include/asm/set_memory.h
new file mode 100644
index ..64011ea444b4
--- /dev/null
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_SET_MEMORY_H
+#define _ASM_POWERPC_SET_MEMORY_H
+
+#define SET_MEMORY_RO  0
+#define SET_MEMORY_RW  1
+#define SET_MEMORY_NX  2
+#define SET_MEMORY_X   3
+
+int change_memory_attr(unsigned long addr, int numpages, long action);
+
+static inline int set_memory_ro(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_RO);
+}
+
+static inline int set_memory_rw(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_RW);
+}
+
+static inline int set_memory_nx(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_NX);
+}
+
+static inline int set_memory_x(unsigned long addr, int numpages)
+{
+   return change_memory_attr(addr, numpages, SET_MEMORY_X);
+}
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 5e147986400d..a998fdac52f9 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -5,7 +5,7 @@
 
 ccflags-$(CONFIG_PPC64):= $(NO_MINIMAL_TOC)
 
-obj-y  := fault.o mem.o pgtable.o mmap.o \
+obj-y  := fault.o mem.o pgtable.o mmap.o pageattr.o \
   init_$(BITS).o pgtable_$(BITS).o \
   pgtable-frag.o ioremap.o ioremap_$(BITS).o \
   init-common.o mmu_context.o drmem.o
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
new file mode 100644
index ..2b573768a7f7
--- /dev/null
+++ b/arch/powerpc/mm/pageattr.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * MMU-generic set_memory implementation for powerpc
+ *
+ * Copyright 2019, IBM Corporation.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+
+/*
+ * Updates the attributes of a page in three steps:
+ *
+ * 1. invalidate the page table entry
+ * 2. flush the TLB
+ * 3. install the new entry with the updated attributes
+ *
+ * This is unsafe if the caller is attempting to change the mapping of the
+ * page it is executing from, or if another CPU is concurrently using the
+ * page being altered.
+ *
+ * TODO make the implementation resistant to this.
+ */
+static int change_page_attr(pte_t *ptep, unsigned long addr, void *data)
+{
+   long action = (long)data;
+   pte_t pte;
+
+   spin_lock(&init_mm.page_table_lock);
+
+   /* invalidate the PTE so it's safe to modify */
+   pte = ptep_get_and_clear(&init_mm, addr, ptep);
+   flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+
+   /* modify the PTE bits as desired, then apply */
+   switch (action) {
+   case SET_MEMORY_RO:
+   pte = pte_wrprotect(pte);
+   break;
+   case SET_MEMORY_RW:
+   pte = pte_mkwrite(pte);
+   break;
+   case SET_MEMORY_NX:
+   pte = pte_exprotect(pte);
+   break;
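
The archived message is truncated above, mid-switch.  Judging from the
four actions declared in set_memory.h, the missing tail of pageattr.c
presumably handles SET_MEMORY_X, reinstalls the PTE, and walks the
requested range; a hedged reconstruction (not the literal patch text)
could look like:

		case SET_MEMORY_X:
			pte = pte_mkexec(pte);
			break;
		}

		/* install the new entry with the updated attributes */
		set_pte_at(&init_mm, addr, ptep, pte);
		spin_unlock(&init_mm.page_table_lock);

		return 0;
	}

	int change_memory_attr(unsigned long addr, int numpages, long action)
	{
		unsigned long start = ALIGN_DOWN(addr, PAGE_SIZE);
		unsigned long size = numpages * PAGE_SIZE;

		/* apply change_page_attr() to each PTE in the range */
		return apply_to_page_range(&init_mm, start, size,
					   change_page_attr, (void *)action);
	}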

[PATCH v4 3/8] powerpc/mm/ptdump: debugfs handler for W+X checks at runtime

2020-02-25 Thread Russell Currey
Very rudimentary, just

echo 1 > [debugfs]/check_wx_pages

and check the kernel log.  Useful for testing strict module RWX.

Updated the Kconfig entry to reflect this.

Also fixed a typo.

Signed-off-by: Russell Currey 
---
 arch/powerpc/Kconfig.debug  |  6 --
 arch/powerpc/mm/ptdump/ptdump.c | 21 -
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 0b063830eea8..e37960ef68c6 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -370,7 +370,7 @@ config PPC_PTDUMP
  If you are unsure, say N.
 
 config PPC_DEBUG_WX
-   bool "Warn on W+X mappings at boot"
+   bool "Warn on W+X mappings at boot & enable manual checks at runtime"
depends on PPC_PTDUMP && STRICT_KERNEL_RWX
help
  Generate a warning if any W+X mappings are found at boot.
@@ -384,7 +384,9 @@ config PPC_DEBUG_WX
  of other unfixed kernel bugs easier.
 
  There is no runtime or memory usage effect of this option
- once the kernel has booted up - it's a one time check.
+ once the kernel has booted up, it only automatically checks once.
+
+ Enables the "check_wx_pages" debugfs entry for checking at runtime.
 
  If in doubt, say "Y".
 
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 206156255247..a15e19a3b14e 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -4,7 +4,7 @@
  *
  * This traverses the kernel pagetables and dumps the
  * information about the used sections of memory to
- * /sys/kernel/debug/kernel_pagetables.
+ * /sys/kernel/debug/kernel_page_tables.
  *
  * Derived from the arm64 implementation:
  * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
@@ -413,6 +413,25 @@ void ptdump_check_wx(void)
else
pr_info("Checked W+X mappings: passed, no W+X pages found\n");
 }
+
+static int check_wx_debugfs_set(void *data, u64 val)
+{
+   if (val != 1ULL)
+   return -EINVAL;
+
+   ptdump_check_wx();
+
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(check_wx_fops, NULL, check_wx_debugfs_set, "%llu\n");
+
+static int ptdump_check_wx_init(void)
+{
+   return debugfs_create_file("check_wx_pages", 0200, NULL,
+  NULL, &check_wx_fops) ? 0 : -ENOMEM;
+}
+device_initcall(ptdump_check_wx_init);
 #endif
 
 static int ptdump_init(void)
-- 
2.25.1
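
The write-only debugfs trigger used in this patch is a generic,
reusable pattern.  A self-contained sketch of the same mechanism with
invented names (only DEFINE_SIMPLE_ATTRIBUTE() and
debugfs_create_file() are the real kernel API):

	#include <linux/debugfs.h>
	#include <linux/module.h>

	static int demo_trigger_set(void *data, u64 val)
	{
		if (val != 1ULL)		/* accept "echo 1" only */
			return -EINVAL;

		pr_info("demo trigger fired\n");
		return 0;
	}

	DEFINE_SIMPLE_ATTRIBUTE(demo_fops, NULL, demo_trigger_set, "%llu\n");

	static struct dentry *demo_file;

	static int __init demo_init(void)
	{
		/* 0200: writable by root only, created in the debugfs root */
		demo_file = debugfs_create_file("demo_trigger", 0200,
						NULL, NULL, &demo_fops);
		return demo_file ? 0 : -ENOMEM;
	}

	static void __exit demo_exit(void)
	{
		debugfs_remove(demo_file);
	}

	module_init(demo_init);
	module_exit(demo_exit);
	MODULE_LICENSE("GPL");

Writing any value other than 1 returns -EINVAL, mirroring the
check_wx_debugfs_set() behaviour above.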



[PATCH v4 4/8] powerpc: Set ARCH_HAS_STRICT_MODULE_RWX

2020-02-25 Thread Russell Currey
To enable strict module RWX on powerpc, set:

CONFIG_STRICT_MODULE_RWX=y

You should also have CONFIG_STRICT_KERNEL_RWX=y set to have any real
security benefit.

ARCH_HAS_STRICT_MODULE_RWX is set to require ARCH_HAS_STRICT_KERNEL_RWX.
This is due to a quirk in arch/Kconfig and arch/powerpc/Kconfig that
makes STRICT_MODULE_RWX *on by default* in configurations where
STRICT_KERNEL_RWX is *unavailable*.

Since neither that default nor module RWX without kernel RWX makes
much sense, giving STRICT_MODULE_RWX the same dependencies as kernel
RWX works around this problem.

Signed-off-by: Russell Currey 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bd074246e34e..e1fc7fba10bf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,6 +131,7 @@ config PPC
	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
select ARCH_HAS_SET_MEMORY
	select ARCH_HAS_STRICT_KERNEL_RWX	if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+   select ARCH_HAS_STRICT_MODULE_RWX   if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
select ARCH_HAS_UACCESS_MCSAFE  if PPC64
-- 
2.25.1


