[PATCH] selftests/powerpc: Increase timeout for vmx_signal test
On the max config P10 machine (1920 threads and 64TB) this test fails
with a timeout:

  Sending signals to all threads 10 times...!! killing vmx_signal
  !! child died by signal 15
  failure: vmx_signal

The default timeout is 120sec so increase this 3x to 360sec. With this
change the test passes on these large machines.

Signed-off-by: Michael Neuling
---
 tools/testing/selftests/powerpc/math/vmx_signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/powerpc/math/vmx_signal.c b/tools/testing/selftests/powerpc/math/vmx_signal.c
index b340a5c4e79d..c307dff19c12 100644
--- a/tools/testing/selftests/powerpc/math/vmx_signal.c
+++ b/tools/testing/selftests/powerpc/math/vmx_signal.c
@@ -151,5 +151,6 @@ int test_signal_vmx(void)
 
 int main(int argc, char *argv[])
 {
+	test_harness_set_timeout(360);
 	return test_harness(test_signal_vmx, "vmx_signal");
 }
--
2.39.2
Re: [PATCH] powerpc/microwatt: Add LiteX MMC driver
On Thu, 2022-08-04 at 11:27 +0930, Joel Stanley wrote: > Enable the LiteX MMC device and it's dependency the common clock > framework. > > Signed-off-by: Joel Stanley Acked-by: Michael Neuling > --- > arch/powerpc/configs/microwatt_defconfig | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/arch/powerpc/configs/microwatt_defconfig > b/arch/powerpc/configs/microwatt_defconfig > index eff933ebbb9e..ea2dbd778aad 100644 > --- a/arch/powerpc/configs/microwatt_defconfig > +++ b/arch/powerpc/configs/microwatt_defconfig > @@ -75,7 +75,12 @@ CONFIG_SPI_BITBANG=y > CONFIG_SPI_SPIDEV=y > # CONFIG_HWMON is not set > # CONFIG_USB_SUPPORT is not set > +CONFIG_MMC=y > +# CONFIG_PWRSEQ_EMMC is not set > +# CONFIG_PWRSEQ_SIMPLE is not set > +CONFIG_MMC_LITEX=y > # CONFIG_VIRTIO_MENU is not set > +CONFIG_COMMON_CLK=y > # CONFIG_IOMMU_SUPPORT is not set > # CONFIG_NVMEM is not set > CONFIG_EXT4_FS=y
Re: [PATCH] powerpc/microwatt: Add mmu bits to device tree
On Fri, 2022-05-20 at 10:06 +1000, Nicholas Piggin wrote: > Excerpts from Joel Stanley's message of May 19, 2022 10:57 pm: > > In commit 5402e239d09f ("powerpc/64s: Get LPID bit width from device > > tree") the kernel tried to determine the pid and lpid bits from the > > device tree. If they are not found, there is a fallback, but Microwatt > > wasn't covered as has the unusual configuration of being both !HV and bare > > metal. > > > > Set the values in the device tree to avoid having to add a special case. > > The lpid value is the only one required, but add both for completeness. > > > > Fixes: 5402e239d09f ("powerpc/64s: Get LPID bit width from device tree") > > Signed-off-by: Joel Stanley > > LGTM... does Microwatt actually need 12 lpid bits, or does it just need > the LPID 0 partition table entry? Microwatt just assumes LPID=0 as it doesn't actually have an LPID SPR. It just uses the first entry in the partition table. There are some details here: https://github.com/antonblanchard/microwatt/commit/18120f153d138f733fa7e8a89c3456bb93683f96 > I wonder if you set it to 1. That's a minor nit though. .. or even set it to 0. Mikey
Re: [PATCH] powerpc/tm: Fix more userspace r13 corruption
On Fri, 2022-03-11 at 12:47 +1000, Nicholas Piggin wrote: > Commit cf13435b730a ("powerpc/tm: Fix userspace r13 corruption") fixes > a problem in treclaim where a SLB miss can occur on the > thread_struct->ckpt_regs while SCRATCH0 is live with the saved user r13 > value, clobbering it with the kernel r13 and ultimately resulting in > kernel r13 being stored in ckpt_regs. > > There is an equivalent problem in trechkpt where the user r13 value is > loaded into r13 from chkpt_regs to be recheckpointed, but a SLB miss > could occur on ckpt_regs accesses after that, which will result in r13 > being clobbered with a kernel value and that will get recheckpointed and > then restored to user registers. > > The same memory page is accessed right before this critical window where > a SLB miss could cause corruption, so hitting the bug requires the SLB > entry be removed within a small window of instructions, which is possible > if a SLB related MCE hits there. PAPR also permits the hypervisor to > discard this SLB entry (because slb_shadow->persistent is only set to > SLB_NUM_BOLTED) although it's not known whether any implementations would > do this (KVM does not). So this is an extremely unlikely bug, only found > by inspection. > > Fix this by also storing user r13 in a temporary location on the kernel > stack and don't chane the r13 register from kernel r13 until the RI=0 > critical section that does not fault. s/chane/change/ > > [ The SCRATCH0 change is not strictly part of the fix, it's only used in > the RI=0 section so it does not have the same problem as the previous > SCRATCH0 bug. ] > > Signed-off-by: Nicholas Piggin This needs to be marked for stable also. Other than that: Acked-by: Michael Neuling Thanks! 
> --- > arch/powerpc/kernel/tm.S | 25 - > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S > index 3beecc32940b..5a0f023a26e9 100644 > --- a/arch/powerpc/kernel/tm.S > +++ b/arch/powerpc/kernel/tm.S > @@ -443,7 +443,8 @@ restore_gprs: > > REST_GPR(0, r7) /* GPR0 */ > REST_GPRS(2, 4, r7) /* GPR2-4 */ > - REST_GPRS(8, 31, r7)/* GPR8-31 */ > + REST_GPRS(8, 12, r7)/* GPR8-12 */ > + REST_GPRS(14, 31, r7) /* GPR14-31 */ > > /* Load up PPR and DSCR here so we don't run with user values for long > */ > mtspr SPRN_DSCR, r5 > @@ -479,18 +480,24 @@ restore_gprs: > REST_GPR(6, r7) > > /* > - * Store r1 and r5 on the stack so that we can access them after we > - * clear MSR RI. > + * Store user r1 and r5 and r13 on the stack (in the unused save > + * areas / compiler reserved areas), so that we can access them after > + * we clear MSR RI. > */ > > REST_GPR(5, r7) > std r5, -8(r1) > - ld r5, GPR1(r7) > + ld r5, GPR13(r7) > std r5, -16(r1) > + ld r5, GPR1(r7) > + std r5, -24(r1) > > REST_GPR(7, r7) > > - /* Clear MSR RI since we are about to use SCRATCH0. EE is already off > */ > + /* Stash the stack pointer away for use after recheckpoint */ > + std r1, PACAR1(r13) > + > + /* Clear MSR RI since we are about to clobber r13. EE is already off > */ > li r5, 0 > mtmsrd r5, 1 > > @@ -501,9 +508,9 @@ restore_gprs: > * until we turn MSR RI back on. > */ > > - SET_SCRATCH0(r1) > ld r5, -8(r1) > - ld r1, -16(r1) > + ld r13, -16(r1) > + ld r1, -24(r1) > > /* Commit register state as checkpointed state: */ > TRECHKPT > @@ -519,9 +526,9 @@ restore_gprs: > */ > > GET_PACA(r13) > - GET_SCRATCH0(r1) > + ld r1, PACAR1(r13) > > - /* R1 is restored, so we are recoverable again. EE is still off */ > + /* R13, R1 is restored, so we are recoverable again. EE is still off > */ > li r4, MSR_RI > mtmsrd r4, 1 >
Re: [PATCH] powerpc/powernv/memtrace: Fake non-memblock aligned sized traces
CC Rashmica Gupta On Wed, 2020-11-11 at 16:55 +1100, Jordan Niethe wrote: > The hardware trace macros which use the memory provided by memtrace are > able to use trace sizes as small as 16MB. Only memblock aligned values > can be removed from each NUMA node by writing that value to > memtrace/enable in debugfs. This means setting up, say, a 16MB trace is > not possible. To allow such a trace size, instead align whatever value > is written to memtrace/enable to the memblock size for the purpose of > removing it from each NUMA node but report the written value from > memtrace/enable and memtrace/x/size in debugfs. > > Signed-off-by: Jordan Niethe > --- > arch/powerpc/platforms/powernv/memtrace.c | 20 ++-- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/memtrace.c > b/arch/powerpc/platforms/powernv/memtrace.c > index 6828108486f8..1188bc8fd090 100644 > --- a/arch/powerpc/platforms/powernv/memtrace.c > +++ b/arch/powerpc/platforms/powernv/memtrace.c > @@ -191,7 +191,7 @@ static int memtrace_init_debugfs(void) > ent->dir = dir; > debugfs_create_file("trace", 0400, dir, ent, _fops); > debugfs_create_x64("start", 0400, dir, >start); > - debugfs_create_x64("size", 0400, dir, >size); > + debugfs_create_x64("size", 0400, dir, _size); > } > > return ret; > @@ -259,33 +259,25 @@ static int memtrace_enable_set(void *data, u64 val) > { > u64 bytes; > > - /* > - * Don't attempt to do anything if size isn't aligned to a memory > - * block or equal to zero. 
> - */ > - bytes = memory_block_size_bytes(); > - if (val & (bytes - 1)) { > - pr_err("Value must be aligned with 0x%llx\n", bytes); > - return -EINVAL; > - } > - > /* Re-add/online previously removed/offlined memory */ > if (memtrace_size) { > if (memtrace_online()) > return -EAGAIN; > } > > + memtrace_size = val; > + > if (!val) > return 0; > > - /* Offline and remove memory */ > - if (memtrace_init_regions_runtime(val)) > + /* Offline and remove memory aligned to memory blocks */ > + bytes = memory_block_size_bytes(); > + if (memtrace_init_regions_runtime(ALIGN(val, bytes))) > return -EINVAL; > > if (memtrace_init_debugfs()) > return -EINVAL; > > - memtrace_size = val; > > return 0; > }
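The core of the memtrace change above is rounding the requested trace size up to the memblock size for the actual removal while still reporting the value the user wrote. As a rough stand-alone illustration (the helper name is invented; the round-up is the kernel's usual power-of-two `ALIGN()` idiom):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the kernel's ALIGN() macro: round x up to the next
 * multiple of a, where a is a power of two (e.g. the memblock size). */
#define ALIGN_UP(x, a) (((x) + ((uint64_t)(a) - 1)) & ~((uint64_t)(a) - 1))

/* Hypothetical helper mirroring the patch's logic: the amount removed
 * from each NUMA node is the aligned value, while debugfs keeps
 * reporting the requested size. */
static uint64_t memtrace_removed_bytes(uint64_t requested, uint64_t block)
{
	return ALIGN_UP(requested, block);
}
```

With, say, a 256MB memory block size, a 16MB request removes one full 256MB block, but size/enable in debugfs still read back 16MB — which is what lets the trace macros use sizes smaller than a memblock.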
[PATCH 1/2] powerpc: Fix user data corruption with P9N DD2.1 VSX CI load workaround emulation
__get_user_atomic_128_aligned() stores to kaddr using stvx which is a
VMX store instruction, hence kaddr must be 16 byte aligned otherwise
the store won't occur as expected.

Unfortunately when we call __get_user_atomic_128_aligned() in
p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
guaranteed to be 16B aligned. This means that the write to vbuf in
__get_user_atomic_128_aligned() has the bottom bits of the address
truncated. This results in other local variables being overwritten.
Also vbuf will not contain the correct data which results in the
userspace emulation being wrong and hence user data corruption.

In the past we've been mostly lucky as vbuf has ended up aligned but
this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
particular can change the stack arrangement enough that our luck runs
out.

This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal.

The fix is to align vbuf to a 16 byte boundary.

Fixes: 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue")
Signed-off-by: Michael Neuling
Cc:  # v4.15+
---
 arch/powerpc/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c5f39f13e96e..5006dcbe1d9f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -885,7 +885,7 @@ static void p9_hmi_special_emu(struct pt_regs *regs)
 {
 	unsigned int ra, rb, t, i, sel, instr, rc;
 	const void __user *addr;
-	u8 vbuf[16], *vdst;
+	u8 vbuf[16] __aligned(16), *vdst;
 	unsigned long ea, msr, msr_mask;
 	bool swap;
--
2.26.2
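For background on why the alignment matters: a 16-byte VMX store (stvx) effectively ignores the low four bits of the effective address, so storing through an unaligned buffer writes to a truncated address instead. The sketch below only models that behaviour in portable C — `stvx_effective_addr()` is an illustration, not the real instruction — and shows the fix: in the kernel, `__aligned(16)` expands to the GCC attribute used here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * stvx ignores the low 4 bits of the effective address, so a store
 * through an unaligned pointer lands at the truncated address. This
 * models that truncation; it is an illustration, not the real VMX
 * instruction.
 */
static uintptr_t stvx_effective_addr(uintptr_t ea)
{
	return ea & ~(uintptr_t)0xf;
}

/*
 * The fix: ask the compiler for 16-byte alignment, as the patch does
 * with __aligned(16) (the kernel's shorthand for GCC's
 * __attribute__((aligned(16)))). The struct name here is invented
 * for illustration.
 */
struct emu_buf {
	uint8_t vbuf[16] __attribute__((aligned(16)));
};
```

With the attribute, the compiler guarantees `vbuf` starts on a 16-byte boundary regardless of how the stack frame is laid out, so the truncation above becomes a no-op.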
[PATCH 2/2] selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
alignment_handler currently only tests the unaligned cases but it can
also be useful for testing the workaround for the P9N DD2.1 vector CI
load issue fixed by p9_hmi_special_emu(). This workaround was
introduced in 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector
CI load issue").

This changes the loop to start from offset 0 rather than 1 so that we
test the kernel emulation in p9_hmi_special_emu().

Signed-off-by: Michael Neuling
---
 .../selftests/powerpc/alignment/alignment_handler.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 2a0503bc7e49..cb53a8b777e6 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -266,8 +266,12 @@ int do_test(char *test_name, void (*test_func)(char *, char *))
 	}
 
 	rc = 0;
-	/* offset = 0 no alignment fault, so skip */
-	for (offset = 1; offset < 16; offset++) {
+	/*
+	 * offset = 0 is aligned but tests the workaround for the P9N
+	 * DD2.1 vector CI load issue (see 5080332c2c89 "powerpc/64s:
+	 * Add workaround for P9 vector CI load issue")
+	 */
+	for (offset = 0; offset < 16; offset++) {
 		width = 16; /* vsx == 16 bytes */
 		r = 0;
 
--
2.26.2
Re: [PATCH] powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
On Mon, 2020-08-03 at 22:41 +1000, Michael Ellerman wrote:
> Michael Neuling writes:
> > On POWER10 bit 12 in the PVR indicates if the core is SMT4 or
> > SMT8. Bit 12 is set for SMT4.
> >
> > Without this patch, /proc/cpuinfo on a SMT4 DD1 POWER10 looks like
> > this:
> >   cpu     : POWER10, altivec supported
> >   revision: 17.0 (pvr 0080 1100)
> >
> > Signed-off-by: Michael Neuling
> > ---
> >  arch/powerpc/kernel/setup-common.c | 1 +
> >  1 file changed, 1 insertion(+)
>
> This should have a Fixes: pointing at something so it gets backported.

Yes it should.

Mikey
[PATCH] powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
On POWER10 bit 12 in the PVR indicates if the core is SMT4 or
SMT8. Bit 12 is set for SMT4.

Without this patch, /proc/cpuinfo on a SMT4 DD1 POWER10 looks like
this:

  cpu     : POWER10, altivec supported
  revision: 17.0 (pvr 0080 1100)

Signed-off-by: Michael Neuling
---
 arch/powerpc/kernel/setup-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index b198b0ff25..808ec9fab6 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -311,6 +311,7 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 		min = pvr & 0xFF;
 		break;
 	case 0x004e: /* POWER9 bits 12-15 give chip type */
+	case 0x0080: /* POWER10 bit 12 gives SMT8/4 */
 		maj = (pvr >> 8) & 0x0F;
 		min = pvr & 0xFF;
 		break;
--
2.26.2
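To see why the bogus "17.0" is printed: without the patch POWER10 falls into the default PVR decode, which keeps the whole high byte of the low halfword — including the SMT4 bit (bit 12) — as the major revision, while the POWER9-style case the patch extends masks the chip-type/SMT bits off. A sketch with the two decodes side by side (helper names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Default decode: whole byte is the major revision. On a POWER10
 * SMT4 DD1 part (pvr 0x00801100) this folds the SMT4 bit into the
 * major, giving 0x11, printed in decimal as "17". */
static unsigned pvr_maj_default(uint32_t pvr)
{
	return (pvr >> 8) & 0xFF;
}

/* POWER9/POWER10 decode from the patch: mask off the chip-type/SMT
 * bits so only the true major revision remains. */
static unsigned pvr_maj_p10(uint32_t pvr)
{
	return (pvr >> 8) & 0x0F;
}

static unsigned pvr_min(uint32_t pvr)
{
	return pvr & 0xFF;
}
```

So the same DD1 part decodes as revision 17.0 via the default path but 1.0 via the masked path, which is what the one-line `case 0x0080:` addition fixes.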
Re: [PATCH v2 2/3] powerpc/powernv/idle: save-restore DAWR0,DAWRX0 for P10
On Fri, 2020-07-10 at 10:52 +0530, Pratik Rajesh Sampat wrote: > Additional registers DAWR0, DAWRX0 may be lost on Power 10 for > stop levels < 4. > Therefore save the values of these SPRs before entering a "stop" > state and restore their values on wakeup. > > Signed-off-by: Pratik Rajesh Sampat > --- > arch/powerpc/platforms/powernv/idle.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 19d94d021357..f2e2a6a4c274 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -600,6 +600,8 @@ struct p9_sprs { > u64 iamr; > u64 amor; > u64 uamor; > + u64 dawr0; > + u64 dawrx0; > }; > > static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) > @@ -687,6 +689,10 @@ static unsigned long power9_idle_stop(unsigned long > psscr, bool mmu_on) > sprs.iamr = mfspr(SPRN_IAMR); > sprs.amor = mfspr(SPRN_AMOR); > sprs.uamor = mfspr(SPRN_UAMOR); > + if (cpu_has_feature(CPU_FTR_ARCH_31)) { Can you add a comment here saying even though DAWR0 is ARCH_30, it's only required to be saved on 31. Otherwise this looks pretty odd. > + sprs.dawr0 = mfspr(SPRN_DAWR0); > + sprs.dawrx0 = mfspr(SPRN_DAWRX0); > + } > > srr1 = isa300_idle_stop_mayloss(psscr); /* go idle */ > > @@ -710,6 +716,10 @@ static unsigned long power9_idle_stop(unsigned long > psscr, bool mmu_on) > mtspr(SPRN_IAMR,sprs.iamr); > mtspr(SPRN_AMOR,sprs.amor); > mtspr(SPRN_UAMOR, sprs.uamor); > + if (cpu_has_feature(CPU_FTR_ARCH_31)) { > + mtspr(SPRN_DAWR0, sprs.dawr0); > + mtspr(SPRN_DAWRX0, sprs.dawrx0); > + } > > /* >* Workaround for POWER9 DD2.0, if we lost resources, the ERAT
Re: [PATCH v2 07/10] powerpc/perf: support BHRB disable bit and new filtering modes
On Wed, 2020-07-01 at 05:20 -0400, Athira Rajeev wrote: > PowerISA v3.1 has few updates for the Branch History Rolling Buffer(BHRB). > First is the addition of BHRB disable bit and second new filtering > modes for BHRB. > > BHRB disable is controlled via Monitor Mode Control Register A (MMCRA) > bit 26, namely "BHRB Recording Disable (BHRBRD)". This field controls > whether BHRB entries are written when BHRB recording is enabled by other > bits. Patch implements support for this BHRB disable bit. Probably good to note here that this is backwards compatible. So if you have a kernel that doesn't know about this bit, it'll clear it and hence you still get BHRB. You should also note why you'd want to do disable this (ie. the core will run faster). > Secondly PowerISA v3.1 introduce filtering support for > PERF_SAMPLE_BRANCH_IND_CALL/COND. The patch adds BHRB filter support > for "ind_call" and "cond" in power10_bhrb_filter_map(). > > 'commit bb19af816025 ("powerpc/perf: Prevent kernel address leak to userspace > via BHRB buffer")' > added a check in bhrb_read() to filter the kernel address from BHRB buffer. > Patch here modified > it to avoid that check for PowerISA v3.1 based processors, since PowerISA v3.1 > allows > only MSR[PR]=1 address to be written to BHRB buffer. > > Signed-off-by: Athira Rajeev > --- > arch/powerpc/perf/core-book3s.c | 27 +-- > arch/powerpc/perf/isa207-common.c | 13 + > arch/powerpc/perf/power10-pmu.c | 13 +++-- > arch/powerpc/platforms/powernv/idle.c | 14 ++ This touches the idle code so we should get those guys on CC (adding Vaidy and Ego). > 4 files changed, 59 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c > index fad5159..9709606 100644 > --- a/arch/powerpc/perf/core-book3s.c > +++ b/arch/powerpc/perf/core-book3s.c > @@ -466,9 +466,13 @@ static void power_pmu_bhrb_read(struct perf_event *event, > struct cpu_hw_events * >* addresses at this point. 
Check the privileges before >* exporting it to userspace (avoid exposure of regions >* where we could have speculative execution) > + * Incase of ISA 310, BHRB will capture only user-space > + * address,hence include a check before filtering code >*/ > - if (is_kernel_addr(addr) && perf_allow_kernel( > >attr) != 0) > - continue; > + if (!(ppmu->flags & PPMU_ARCH_310S)) > + if (is_kernel_addr(addr) && > + perf_allow_kernel(>attr) != 0) > + continue; > > /* Branches are read most recent first (ie. mfbhrb 0 is >* the most recent branch). > @@ -1212,7 +1216,7 @@ static void write_mmcr0(struct cpu_hw_events *cpuhw, > unsigned long mmcr0) > static void power_pmu_disable(struct pmu *pmu) > { > struct cpu_hw_events *cpuhw; > - unsigned long flags, mmcr0, val; > + unsigned long flags, mmcr0, val, mmcra = 0; > > if (!ppmu) > return; > @@ -1245,12 +1249,23 @@ static void power_pmu_disable(struct pmu *pmu) > mb(); > isync(); > > + val = mmcra = cpuhw->mmcr[2]; > + > /* >* Disable instruction sampling if it was enabled >*/ > - if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) { > - mtspr(SPRN_MMCRA, > - cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE); > + if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) > + mmcra = cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE; > + > + /* Disable BHRB via mmcra [:26] for p10 if needed */ > + if (!(cpuhw->mmcr[2] & MMCRA_BHRB_DISABLE)) > + mmcra |= MMCRA_BHRB_DISABLE; > + > + /* Write SPRN_MMCRA if mmcra has either disabled > + * instruction sampling or BHRB > + */ > + if (val != mmcra) { > + mtspr(SPRN_MMCRA, mmcra); > mb(); > isync(); > } > diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207- > common.c > index 7d4839e..463d925 100644 > --- a/arch/powerpc/perf/isa207-common.c > +++ b/arch/powerpc/perf/isa207-common.c > @@ -404,6 +404,12 @@ int isa207_compute_mmcr(u64 event[], int n_ev, > > mmcra = mmcr1 = mmcr2 = mmcr3 = 0; > > + /* Disable bhrb unless explicitly requested > + * by setting MMCRA [:26] bit. 
> + */ > + if (cpu_has_feature(CPU_FTR_ARCH_31)) > + mmcra |= MMCRA_BHRB_DISABLE; > + > /* Second pass: assign PMCs, set all MMCR1 fields */ > for (i = 0; i < n_ev; ++i) { > pmc =
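The MMCRA handling under review boils down to: clear the instruction-sampling bit, set the BHRB disable bit, and only perform the (relatively expensive) SPR write when the computed value actually differs from the current one. A stand-alone sketch of that control flow, using stand-in bit values rather than the architected MMCRA bit positions:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in bit definitions for illustration only; the real MMCRA
 * layout (SAMPLE_ENABLE, BHRBRD at bit 26) is not reproduced here. */
#define SAMPLE_ENABLE	(1ULL << 0)
#define BHRB_DISABLE	(1ULL << 1)

/* Compute the MMCRA value power_pmu_disable() would want: sampling
 * off, BHRB recording disabled. */
static uint64_t mmcra_disable_value(uint64_t old)
{
	uint64_t mmcra = old & ~SAMPLE_ENABLE;	/* disable sampling */

	mmcra |= BHRB_DISABLE;			/* stop BHRB writes */
	return mmcra;
}

/* The mtspr/mb/isync sequence is only needed if the value changed. */
static int mmcra_needs_write(uint64_t old)
{
	return mmcra_disable_value(old) != old;
}
```

This mirrors the `if (val != mmcra)` guard in the patch: if sampling was already off and BHRB already disabled, no SPR write (and no barrier pair) is issued.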
Re: [PATCH v2 06/10] powerpc/perf: power10 Performance Monitoring support
> @@ -480,6 +520,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev, > mmcr[1] = mmcr1; > mmcr[2] = mmcra; > mmcr[3] = mmcr2; > + mmcr[4] = mmcr3; This is fragile like the kvm vcpu case I commented on before but it gets passed in via a function parameter?! Can you create a struct to store these in rather than this odd ball numbering? The cleanup should start in patch 1/10 here: /* * The order of the MMCR array is: -* - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2 +* - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2, MMCR3 * - 32-bit, MMCR0, MMCR1, MMCR2 */ - unsigned long mmcr[4]; + unsigned long mmcr[5]; mikey
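The cleanup being asked for — named fields instead of a magic-index array — could look something like the struct below. This is purely a sketch of the suggestion; the field names and ordering are hypothetical, not a layout taken from the kernel:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical replacement for "unsigned long mmcr[5]" where mmcr[4]
 * silently means MMCR3. Named fields make isa207_compute_mmcr() and
 * its callers self-documenting and let the compiler catch misuse.
 */
struct mmcr_regs {
	unsigned long mmcr0;
	unsigned long mmcr1;
	unsigned long mmcr2;
	unsigned long mmcr3;
	unsigned long mmcra;
};
```

Callers would then write `mmcr->mmcr3 = mmcr3;` rather than `mmcr[4] = mmcr3;`, which is the point of the review comment: the ordinal no longer has to match an undocumented convention.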
Re: [PATCH v2 04/10] powerpc/perf: Add power10_feat to dt_cpu_ftrs
On Wed, 2020-07-01 at 05:20 -0400, Athira Rajeev wrote: > From: Madhavan Srinivasan > > Add power10 feature function to dt_cpu_ftrs.c along > with a power10 specific init() to initialize pmu sprs. Can you say why you're doing this? Can you add some text about what you're doing to the BHRB in this patch? Mikey > > Signed-off-by: Madhavan Srinivasan > --- > arch/powerpc/include/asm/reg.h| 3 +++ > arch/powerpc/kernel/cpu_setup_power.S | 7 +++ > arch/powerpc/kernel/dt_cpu_ftrs.c | 26 ++ > 3 files changed, 36 insertions(+) > > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 21a1b2d..900ada1 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -1068,6 +1068,9 @@ > #define MMCR0_PMC2_LOADMISSTIME 0x5 > #endif > > +/* BHRB disable bit for PowerISA v3.10 */ > +#define MMCRA_BHRB_DISABLE 0x0020 > + > /* > * SPRG usage: > * > diff --git a/arch/powerpc/kernel/cpu_setup_power.S > b/arch/powerpc/kernel/cpu_setup_power.S > index efdcfa7..e8b3370c 100644 > --- a/arch/powerpc/kernel/cpu_setup_power.S > +++ b/arch/powerpc/kernel/cpu_setup_power.S > @@ -233,3 +233,10 @@ __init_PMU_ISA207: > li r5,0 > mtspr SPRN_MMCRS,r5 > blr > + > +__init_PMU_ISA31: > + li r5,0 > + mtspr SPRN_MMCR3,r5 > + LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE) > + mtspr SPRN_MMCRA,r5 > + blr > diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c > b/arch/powerpc/kernel/dt_cpu_ftrs.c > index a0edeb3..14a513f 100644 > --- a/arch/powerpc/kernel/dt_cpu_ftrs.c > +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c > @@ -449,6 +449,31 @@ static int __init feat_enable_pmu_power9(struct > dt_cpu_feature *f) > return 1; > } > > +static void init_pmu_power10(void) > +{ > + init_pmu_power9(); > + > + mtspr(SPRN_MMCR3, 0); > + mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE); > +} > + > +static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f) > +{ > + hfscr_pmu_enable(); > + > + init_pmu_power10(); > + init_pmu_registers = init_pmu_power10; > + > + 
cur_cpu_spec->cpu_features |= CPU_FTR_MMCRA; > + cur_cpu_spec->cpu_user_features |= PPC_FEATURE_PSERIES_PERFMON_COMPAT; > + > + cur_cpu_spec->num_pmcs = 6; > + cur_cpu_spec->pmc_type = PPC_PMC_IBM; > + cur_cpu_spec->oprofile_cpu_type = "ppc64/power10"; > + > + return 1; > +} > + > static int __init feat_enable_tm(struct dt_cpu_feature *f) > { > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > @@ -638,6 +663,7 @@ struct dt_cpu_feature_match { > {"pc-relative-addressing", feat_enable, 0}, > {"machine-check-power9", feat_enable_mce_power9, 0}, > {"performance-monitor-power9", feat_enable_pmu_power9, 0}, > + {"performance-monitor-power10", feat_enable_pmu_power10, 0}, > {"event-based-branch-v3", feat_enable, 0}, > {"random-number-generator", feat_enable, 0}, > {"system-call-vectored", feat_disable, 0},
Re: [PATCH v2 02/10] KVM: PPC: Book3S HV: Save/restore new PMU registers
> @@ -637,12 +637,12 @@ struct kvm_vcpu_arch {
> 	u32 ccr1;
> 	u32 dbsr;
>
> -	u64 mmcr[5];
> +	u64 mmcr[6];
> 	u32 pmc[8];
> 	u32 spmc[2];
> 	u64 siar;

> +	mfspr	r5, SPRN_MMCR3
> +	mfspr	r6, SPRN_SIER2
> +	mfspr	r7, SPRN_SIER3
> +	std	r5, VCPU_MMCR + 40(r9)
> +	std	r6, VCPU_SIER + 8(r9)
> +	std	r7, VCPU_SIER + 16(r9)

This is looking pretty fragile now. vcpu mmcr[6] stores (in this
strict order): mmcr0, mmcr1, mmcra, mmcr2, mmcrs, mmcr3. Can we clean
that up? Give mmcra and mmcrs their own entries in vcpu and then have
a flat array for mmcr0-3.

Mikey
Re: [RFC PATCH 1/4] powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
On Fri, 2020-05-29 at 11:24 +1000, Alistair Popple wrote: > For what it's worth I tested this series on Mambo PowerNV and it seems to > correctly enable/disable the prefix FSCR bit based on the cpu feature so feel > free to add: > > Tested-by: Alistair Popple > > Mikey is going to test out pseries. FWIW this worked for me in the P10 + powervm sim testing. Tested-by: Michael Neuling > > - Alistair > > On Thursday, 28 May 2020 12:58:40 AM AEST Michael Ellerman wrote: > > __init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc: > > Add support for context switching the TAR register") (Feb 2013), and > > only set FSCR_TAR. > > > > At that point FSCR (Facility Status and Control Register) was not > > context switched, so the setting was permanent after boot. > > > > Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit > > 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again > > that was permanent after boot. > > > > Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on > > POWER8") (Aug 2013) added a limited context switch of FSCR, just the > > FSCR_DSCR bit was context switched based on thread.dscr_inherit. That > > commit said "This clears the H/FSCR DSCR bit initially", but it > > didn't, it left the initialisation of FSCR_DSCR in __init_FSCR(). > > However the initial context switch from init_task to pid 1 would clear > > FSCR_DSCR because thread.dscr_inherit was 0. > > > > That commit also introduced the requirement that FSCR_DSCR be clear > > for user processes, so that we can take the facility unavailable > > interrupt in order to manage dscr_inherit. > > > > Then in commit 152d523e6307 ("powerpc: Create context switch helpers > > save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to > > thread_struct. However it still wasn't fully context switched, we just > > took the existing value and set FSCR_DSCR if the new thread had > > dscr_inherit set. 
FSCR was still initialised at boot to FSCR_DSCR | > > FSCR_TAR, but that value was not propagated into the thread_struct, so > > the initial context switch set FSCR_DSCR back to 0. > > > > Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context > > switching") (Jun 2016) added a full context switch of the FSCR, and > > added an initialisation of init_task.thread.fscr to FSCR_TAR | > > FSCR_EBB, but omitted FSCR_DSCR. > > > > The end result is that swapper runs with FSCR_DSCR set because of the > > initialisation in __init_FSCR(), but no other processes do, they use > > the value from init_task.thread.fscr. > > > > Having FSCR_DSCR set for swapper allows it to access SPR 3 from > > userspace, but swapper never runs userspace, so it has no useful > > effect. It's also confusing to have the value initialised in two > > places to two different values. > > > > So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the > > point where there's a single value of FSCR, even if it's still set in > > two places. > > > > Signed-off-by: Michael Ellerman > > --- > > arch/powerpc/kernel/cpu_setup_power.S | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/kernel/cpu_setup_power.S > > b/arch/powerpc/kernel/cpu_setup_power.S index a460298c7ddb..f91ecb10d0ae > > 100644 > > --- a/arch/powerpc/kernel/cpu_setup_power.S > > +++ b/arch/powerpc/kernel/cpu_setup_power.S > > @@ -184,7 +184,7 @@ _GLOBAL(__restore_cpu_power9) > > > > __init_FSCR: > > mfspr r3,SPRN_FSCR > > - ori r3,r3,FSCR_TAR|FSCR_DSCR|FSCR_EBB > > + ori r3,r3,FSCR_TAR|FSCR_EBB > > mtspr SPRN_FSCR,r3 > > blr > >
[PATCH] powerpc: Fix misleading small cores print
Currently when we boot on a big core system, we get this print:

  [0.040500] Using small cores at SMT level

This is misleading as we've actually detected big cores. This patch
clears up the print to say we've detected big cores but are using
small cores for scheduling.

Signed-off-by: Michael Neuling
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6d2a3a3666..c820c95162 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1383,7 +1383,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 #ifdef CONFIG_SCHED_SMT
 	if (has_big_cores) {
-		pr_info("Using small cores at SMT level\n");
+		pr_info("Big cores detected but using small core scheduling\n");
 		power9_topology[0].mask = smallcore_smt_mask;
 		powerpc_topology[0].mask = smallcore_smt_mask;
 	}
--
2.26.2
[PATCH] powerpc/configs: Add LIBNVDIMM to ppc64_defconfig
This gives us OF_PMEM which is useful in mambo.

This adds 153K to the text of ppc64le_defconfig, which is 0.8% of the
total text.

  LIBNVDIMM     text      data     bss      dec       hex
  Without       18574833  5518150  1539240  25632223  1871ddf
  With          18727834  5546206  1539368  25813408  189e1a0

Signed-off-by: Michael Neuling
---
 arch/powerpc/configs/ppc64_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index bae8170d74..0a92549924 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -281,6 +281,7 @@ CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_DS1307=y
 CONFIG_VIRTIO_PCI=m
 CONFIG_VIRTIO_BALLOON=m
+CONFIG_LIBNVDIMM=y
 CONFIG_RAS=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT2_FS_XATTR=y
--
2.26.2
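The size accounting quoted above can be sanity-checked directly: text grows from 18574833 to 18727834 bytes, i.e. 153001 bytes (~153K), which integer arithmetic confirms is roughly 0.8% of the original text size. A trivial check:

```c
#include <assert.h>

/* Difference in text size between the "With" and "Without" rows of
 * the size(1)-style table in the commit message. */
static long text_delta(long with, long without)
{
	return with - without;
}

/* Growth expressed in tenths of a percent (per-mille), using integer
 * arithmetic to avoid floating point. */
static long growth_permille(long with, long without)
{
	return (text_delta(with, without) * 1000) / without;
}
```

8 per-mille rounds to the "0.8%" claimed in the commit message.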
Re: [PATCH v2 1/7] powerpc: Add new HWCAP bits
On Tue, 2020-05-19 at 10:31 +1000, Alistair Popple wrote: > POWER10 introduces two new architectural features - ISAv3.1 and matrix > multiply accumulate (MMA) instructions. Userspace detects the presence > of these features via two HWCAP bits introduced in this patch. These > bits have been agreed to by the compiler and binutils team. > > Signed-off-by: Alistair Popple I've test booted this series + powerpc/next (30df74d67d) on top of powervm and OPAL on a P10 simulator. In both cases, it enables MMA and prefix instructions and advertises them via HWCAP2 MMA + ISA 3.1. Hence: Tested-by: Michael Neuling > --- > arch/powerpc/include/uapi/asm/cputable.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/powerpc/include/uapi/asm/cputable.h > b/arch/powerpc/include/uapi/asm/cputable.h > index 540592034740..2692a56bf20b 100644 > --- a/arch/powerpc/include/uapi/asm/cputable.h > +++ b/arch/powerpc/include/uapi/asm/cputable.h > @@ -50,6 +50,8 @@ > #define PPC_FEATURE2_DARN0x0020 /* darn random number insn */ > #define PPC_FEATURE2_SCV 0x0010 /* scv syscall */ > #define PPC_FEATURE2_HTM_NO_SUSPEND 0x0008 /* TM w/out suspended state > */ > +#define PPC_FEATURE2_ARCH_3_10x0004 /* ISA 3.1 */ > +#define PPC_FEATURE2_MMA 0x0002 /* Matrix Multiply Accumulate > */ > > /* > * IMPORTANT!
Re: [PATCH] powerpc: Add new HWCAP bits
On Tue, 2020-03-31 at 12:12 -0300, Tulio Magno Quites Machado Filho wrote:
> Alistair Popple writes:
>
> > diff --git a/arch/powerpc/include/uapi/asm/cputable.h
> > b/arch/powerpc/include/uapi/asm/cputable.h
> > index 540592034740..c6fe10b2 100644
> > --- a/arch/powerpc/include/uapi/asm/cputable.h
> > +++ b/arch/powerpc/include/uapi/asm/cputable.h
> > @@ -50,6 +50,8 @@
> >  #define PPC_FEATURE2_DARN		0x0020 /* darn random number insn */
> >  #define PPC_FEATURE2_SCV		0x0010 /* scv syscall */
> >  #define PPC_FEATURE2_HTM_NO_SUSPEND	0x0008 /* TM w/out suspended state */
> > +#define PPC_FEATURE2_ARCH_3_10		0x0004 /* ISA 3.10 */
>
> I think this should have been:
>
> #define PPC_FEATURE2_ARCH_3_1		0x0004 /* ISA 3.1 */

Agreed. That's the new name.

Sorry Al I should have caught that earlier.

Mikey
[PATCH] powerpc/tm: Document h/rfid and mtmsrd quirk
The ISA has a quirk that's useful for the Linux implementation.
Document it here so others are less likely to trip over it.

Signed-off-by: Michael Neuling
Suggested-by: Michael Ellerman
---
 .../powerpc/transactional_memory.rst | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/powerpc/transactional_memory.rst b/Documentation/powerpc/transactional_memory.rst
index 09955103ac..74ae71001a 100644
--- a/Documentation/powerpc/transactional_memory.rst
+++ b/Documentation/powerpc/transactional_memory.rst
@@ -245,3 +245,30 @@ POWER9N DD2.2.
 Guest migration from POWER8 to POWER9 will work with POWER9N DD2.2
 and POWER9C DD1.2. Since earlier POWER9 processors don't support TM
 emulation, migration from POWER8 to POWER9 is not supported there.
+
+Kernel implementation
+=====================
+
+h/rfid mtmsrd quirk
+===================
+
+As defined in the ISA, rfid has a quirk which is useful in early
+exception handling. When in a userspace transaction and we enter the
+kernel via some exception, MSR will end up as TM=0 and TS=01 (ie. TM
+off but TM suspended). Regularly the kernel will want to change bits
+in the MSR and will perform an rfid to do this. In this case rfid can
+have SRR1 TM = 0 and TS = 00 (ie. TM off and non transactional) and
+the resulting MSR will retain TM = 0 and TS=01 from before (ie. stay
+in suspend). This is a quirk in the architecture as this would
+normally be a transition from TS=01 to TS=00 (ie. suspend -> non
+transactional) which is an illegal transition.
+
+This quirk is described in the architecture in the definition of rfid
+with these lines:
+
+  if (MSR 29:31 ¬ = 0b010 | SRR1 29:31 ¬ = 0b000) then
+      MSR 29:31 <- SRR1 29:31
+
+hrfid and mtmsrd have the same quirk.
+
+The Linux kernel uses this quirk in its early exception handling.
--
2.25.1
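The architected pseudocode quoted in the patch can be modelled directly. This sketch (not kernel code) shows that the MSR keeps its old TS/TM bits only in the one quirky combination — current state "TM off, suspended" (0b010) with SRR1 asking for "TM off, non-transactional" (0b000) — and takes SRR1's bits in every other case:

```c
#include <assert.h>

/*
 * Model of the rfid rule for MSR bits 29:31 (IBM bit numbering):
 *
 *   if (MSR 29:31 != 0b010 | SRR1 29:31 != 0b000) then
 *       MSR 29:31 <- SRR1 29:31
 *
 * i.e. SRR1 is copied unless MSR==0b010 and SRR1==0b000, in which
 * case the MSR is left alone and the CPU stays in suspend.
 */
static unsigned rfid_msr_29_31(unsigned msr, unsigned srr1)
{
	if (msr != 0x2 /* 0b010 */ || srr1 != 0x0 /* 0b000 */)
		return srr1;
	return msr;	/* the quirk: suspended state is retained */
}
```

This is what lets early exception handlers rfid with "plain" kernel MSR values in SRR1 without illegally transitioning a suspended transaction to non-transactional.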
Re: [RFC PATCH v2 00/12] Reduce ifdef mess in ptrace
Christophe, > Le 28/06/2019 à 17:47, Christophe Leroy a écrit : > > The purpose of this series is to reduce the amount of #ifdefs > > in ptrace.c > > > > Any feedback on this series which aims at fixing the issue you opened at > https://github.com/linuxppc/issues/issues/128 ? Yeah, sorry my bad. You did all the hard work and I ignored it. I like the approach and it is along the lines I was thinking. Putting it in a ptrace subdir, splitting out adv_debug_regs, TM, SPE, Altivec, VSX. ppc_gethwdinfo() looks a lot nicer now too (that was some of the worst of it). I've not gone through it with a fine-tooth comb though. There is (rightly) a lot of code moved around which could have introduced some issues. It applies on v5.2 but are you planning on updating it to a newer base? Mikey
Re: [PATCH] powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
On Mon, 2020-02-17 at 07:40 +0100, Christophe Leroy wrote: > > Le 16/02/2020 à 23:40, Michael Neuling a écrit : > > On Fri, 2020-02-14 at 08:33 +, Christophe Leroy wrote: > > > With CONFIG_VMAP_STACK, data MMU has to be enabled > > > to read data on the stack. > > > > Can you describe what goes wrong without this? Some oops message? rtas blows > > up? > > Get corrupt data? > > Larry reported a machine check. Or in fact, he reported an Oops in > kprobe_handler(), that Oops being a bug in kprobe_handler() triggered by > this machine check. > > By converting a VM address to a phys-like address as if it was linear > mem, you get in the dark. Either there is some physical memory at that > address and you corrupt it. Or there is none and you get a machine check. Excellent. Please put that in the commit message. > > Also can you say what you're actually doing (ie turning on MSR[DR]) > > Euh ... I'm saying that data MMU has to be enabled, so I'm enabling it. Yeah, it's a minor point. > > > > > Signed-off-by: Christophe Leroy > > > --- > > > arch/powerpc/kernel/entry_32.S | 9 +++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/powerpc/kernel/entry_32.S > > > b/arch/powerpc/kernel/entry_32.S > > > index 0713daa651d9..bc056d906b51 100644 > > > --- a/arch/powerpc/kernel/entry_32.S > > > +++ b/arch/powerpc/kernel/entry_32.S > > > @@ -1354,12 +1354,17 @@ _GLOBAL(enter_rtas) > > > mtspr SPRN_SRR0,r8 > > > mtspr SPRN_SRR1,r9 > > > RFI > > > -1: tophys(r9,r1) > > > +1: tophys_novmstack r9, r1 > > > +#ifdef CONFIG_VMAP_STACK > > > + li r0, MSR_KERNEL & ~MSR_IR/* can take DTLB miss */ > > > > You're potentially turning on more than MSR DR here. This should be clear in > > the > > commit message. > > Am I ? > > At the time of the RFI just above, SRR1 contains the value of r9 which > has been set 2 lines before to MSR_KERNEL & ~(MSR_IR|MSR_DR). > > What should be clear in the commit message ? You're right. 
I was just looking at the patch and not the code. It's clearer in the code. Mikey > > > > + mtmsr r0 > > > + isync > > > +#endif > > > lwz r8,INT_FRAME_SIZE+4(r9) /* get return address */ > > > lwz r9,8(r9)/* original msr value */ > > > addir1,r1,INT_FRAME_SIZE > > > li r0,0 > > > - tophys(r7, r2) > > > + tophys_novmstack r7, r2 > > > stw r0, THREAD + RTAS_SP(r7) > > > mtspr SPRN_SRR0,r8 > > > mtspr SPRN_SRR1,r9 > > Christophe >
Re: [PATCH] KVM: PPC: Book3S HV: Treat unrecognized TM instructions as illegal
On Sun, 2020-02-16 at 23:57 -0600, Segher Boessenkool wrote: > On Mon, Feb 17, 2020 at 12:07:31PM +1100, Michael Neuling wrote: > > On Thu, 2020-02-13 at 10:15 -0500, Gustavo Romero wrote: > > > On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated > > > by > > > KVM. This is handled at first by the hardware raising a softpatch > > > interrupt > > > when certain TM instructions that need KVM assistance are executed in the > > > guest. Some TM instructions, although not defined in the Power ISA, might > > > raise a softpatch interrupt. For instance, 'tresume.' instruction as > > > defined in the ISA must have bit 31 set (1), but an instruction that > > > matches 'tresume.' OP and XO opcodes but has bit 31 not set (0), like > > > 0x7cfe9ddc, also raises a softpatch interrupt, for example, if a code > > > like the following is executed in the guest it will raise a softpatch > > > interrupt just like a 'tresume.' when the TM facility is enabled: > > > > > > int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); } > > > and then treats the executed instruction as 'nop' whilst it should > > > actually > > > be treated as an illegal instruction since it's not defined by the ISA. > > > > The ISA has this: > > > >1.3.3 Reserved Fields, Reserved Values, and Reserved SPRs > > > >Reserved fields in instructions are ignored by the pro- > >cessor. > > > > Hence the hardware will ignore reserved bits. For example executing your > > little > > program on P8 just exits normally with 0x7cfe9ddc being executed as a NOP. > > > > Hence, we should NOP this, not generate an illegal. > > It is not a reserved bit. > > The IMC entry for it matches op1=01 op2=101110 presumably, which > catches all TM instructions and nothing else (bits 0..5 and bits 21..30). > That does not look at bit 31, the softpatch handler has to deal with this. > > Some TM insns have bit 31 as 1 and some have it as /. All instructions > with a "." 
in the mnemonic have bit 31 as 1, all others have it reserved. > The tables in appendices D, E, F show tend. and tsr. as having it > reserved, which contradicts the individual instruction description (and > does not make much sense). (Only tcheck has /, everything else has 1; > everything else has a mnemonic with a dot, and does write CR0 always). Wow, interesting. P8 seems to be treating 31 as a reserved bit (with the table definition rather than the individual instruction description). I'm inclined to match P8 even though it's inconsistent with the dot mnemonic as you say. Mikey
Re: [PATCH] KVM: PPC: Book3S HV: Treat unrecognized TM instructions as illegal
On Thu, 2020-02-13 at 10:15 -0500, Gustavo Romero wrote: > On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by > KVM. This is handled at first by the hardware raising a softpatch interrupt > when certain TM instructions that need KVM assistance are executed in the > guest. Some TM instructions, although not defined in the Power ISA, might > raise a softpatch interrupt. For instance, 'tresume.' instruction as > defined in the ISA must have bit 31 set (1), but an instruction that > matches 'tresume.' OP and XO opcodes but has bit 31 not set (0), like > 0x7cfe9ddc, also raises a softpatch interrupt, for example, if a code > like the following is executed in the guest it will raise a softpatch > interrupt just like a 'tresume.' when the TM facility is enabled: > > int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); } > > Currently in such a case KVM throws a complete trace like the following: > > [345523.705984] WARNING: CPU: 24 PID: 64413 at > arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv] > [345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT > nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat > iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 > ebtable_filter ebtables ip6table_filter > ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 > vmx_crypto ipmi_devintf ipmi_msghandler > ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm > iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp > libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs > blake2b_generic zstd_compress raid10 raid456 > async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor > raid6_pq raid1 raid0 multipath linear tg3 > crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv] > [345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: GW E > 5.5.0+ #1 > [345523.706031] NIP: c008072cb9c0 LR: c008072b5e80 CTR: > c008085c7850 > 
[345523.706034] REGS: c00399467680 TRAP: 0700 Tainted: GW E >(5.5.0+) > [345523.706034] MSR: 90010282b033 > CR: 24022428 XER: > [345523.706042] CFAR: c008072b5e7c IRQMASK: 0 > GPR00: c008072b5e80 c00399467910 c008072db500 > c00375ccc720 > GPR04: c00375ccc720 0003fbec a10395dda5a6 > > GPR08: 7cfe9ddc 7cfe9ddc05dc 7cfe9ddc7c0005dc > c008072cd530 > GPR12: c008085c7850 c003fffeb800 0001 > 7dfb737f > GPR16: c0002001edcca558 > 0001 > GPR20: c1b21258 c0002001edcca558 0018 > > GPR24: 0100 0001 > 1500 > GPR28: c0002001edcc4278 c0037dd8 80050280f033 > c00375ccc720 > [345523.706062] NIP [c008072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 > [kvm_hv] > [345523.706065] LR [c008072b5e80] > kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv] > [345523.706066] Call Trace: > [345523.706069] [c00399467910] [c00399467940] 0xc00399467940 > (unreliable) > [345523.706071] [c00399467950] [c00399467980] 0xc00399467980 > [345523.706075] [c003994679f0] [c008072bd1c4] > kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv] > [345523.706079] [c00399467ac0] [c008072bd8e0] > kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv] > [345523.706087] [c00399467b90] [c008085c93cc] > kvmppc_vcpu_run+0x34/0x48 [kvm] > [345523.706095] [c00399467bb0] [c008085c582c] > kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] > [345523.706101] [c00399467c40] [c008085b7498] > kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm] > [345523.706105] [c00399467db0] [c04adf9c] ksys_ioctl+0x13c/0x170 > [345523.706107] [c00399467e00] [c04adff8] sys_ioctl+0x28/0x80 > [345523.706111] [c00399467e20] [c000b278] system_call+0x5c/0x68 > [345523.706112] Instruction dump: > [345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c > 6d497c00 2f8907dd > [345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe0> 38210040 > 3860 ebc1fff0 > > and then treats the executed instruction as 'nop' whilst it should actually > be treated as an illegal instruction since it's not defined by the ISA. 
The ISA has this: 1.3.3 Reserved Fields, Reserved Values, and Reserved SPRs Reserved fields in instructions are ignored by the pro- cessor. Hence the hardware will ignore reserved bits. For example executing your little program on P8 just exits normally with 0x7cfe9ddc being executed as a NOP. Hence, we should NOP this, not generate an illegal. Mikey > This commit changes the handling of the case above by treating the > unrecognized TM instructions that can raise a softpatch but are not >
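For reference, the decode being argued over can be sketched as below. The field extraction follows the standard Power ISA instruction layout; the XO value 750 for tsr./tresume. is taken from the ISA and should be treated as an assumption here:

```c
#include <assert.h>
#include <stdint.h>

/* Power instructions are 32 bits; IBM bit 0 is the MSB, bit 31 the LSB */
static uint32_t primary_op(uint32_t insn) { return insn >> 26; }          /* bits 0:5   */
static uint32_t xo_field(uint32_t insn)   { return (insn >> 1) & 0x3ff; } /* bits 21:30 */
static uint32_t ibm_bit31(uint32_t insn)  { return insn & 1; }            /* Rc/reserved */
```

The encoding 0x7cfe9ddc from the thread decodes with primary opcode 31 and XO 750 (tsr.) but bit 31 clear — exactly the case under discussion, since the softpatch IMC match looks only at the opcode fields and ignores bit 31.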
Re: [PATCH v7 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support
Daniel. Can you start this commit message with a simple description of what you are actually doing? This reads like you've been on a long journey to Mordor and back, which as a reader of this patch in the long distant future, I don't care about. I just want to know what you're implementing. Also I'm struggling to review this as I don't know what software or hardware mechanisms you are using to perform sanitisation. Mikey On Thu, 2020-02-13 at 11:47 +1100, Daniel Axtens wrote: > KASAN support on Book3S is a bit tricky to get right: > > - It would be good to support inline instrumentation so as to be able to >catch stack issues that cannot be caught with outline mode. > > - Inline instrumentation requires a fixed offset. > > - Book3S runs code in real mode after booting. Most notably a lot of KVM >runs in real mode, and it would be good to be able to instrument it. > > - Because code runs in real mode after boot, the offset has to point to >valid memory both in and out of real mode. > > [ppc64 mm note: The kernel installs a linear mapping at effective > address c000... onward. This is a one-to-one mapping with physical > memory from ... onward. Because of how memory accesses work on > powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the > same memory both with translations on (accessing as an 'effective > address'), and with translations off (accessing as a 'real > address'). This works in both guests and the hypervisor. For more > details, see s5.7 of Book III of version 3 of the ISA, in particular > the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this > KASAN implementation currently only supports Radix.] > > One approach is just to give up on inline instrumentation. This way all > checks can be delayed until after everything set is up correctly, and the > address-to-shadow calculations can be overridden. However, the features and > speed boost provided by inline instrumentation are worth trying to do > better. 
> > If _at compile time_ it is known how much contiguous physical memory a > system has, the top 1/8th of the first block of physical memory can be set > aside for the shadow. This is a big hammer and comes with 3 big > consequences: > > - there's no nice way to handle physically discontiguous memory, so only >the first physical memory block can be used. > > - kernels will simply fail to boot on machines with less memory than >specified when compiling. > > - kernels running on machines with more memory than specified when >compiling will simply ignore the extra memory. > > Implement and document KASAN this way. The current implementation is Radix > only. > > Despite the limitations, it can still find bugs, > e.g. http://patchwork.ozlabs.org/patch/1103775/ > > At the moment, this physical memory limit must be set _even for outline > mode_. This may be changed in a later series - a different implementation > could be added for outline mode that dynamically allocates shadow at a > fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/ > > Suggested-by: Michael Ellerman > Cc: Balbir Singh # ppc64 out-of-line radix version > Cc: Christophe Leroy # ppc32 version > Signed-off-by: Daniel Axtens > > --- > Changes since v6: > - rework kasan_late_init support, which also fixes book3e problem that > snowpatch >picked up (I think) > - fix a checkpatch error that snowpatch picked up > - don't needlessly move the include in kasan.h > > Changes since v5: > - rebase on powerpc/merge, with Christophe's latest changes integrating >kasan-vmalloc > - documentation tweaks based on latest 32-bit changes > > Changes since v4: > - fix some ppc32 build issues > - support ptdump > - clean up the header file. It turns out we don't need or use > KASAN_SHADOW_SIZE, >so just dump it, and make KASAN_SHADOW_END the thing that varies between 32 >and 64 bit. 
As part of this, make sure KASAN_SHADOW_OFFSET is only > configured for >32 bit - it is calculated in the Makefile for ppc64. > - various cleanups > > Changes since v3: > - Address further feedback from Christophe. > - Drop changes to stack walking, it looks like the issue I observed is >related to that particular stack, not stack-walking generally. > > Changes since v2: > > - Address feedback from Christophe around cleanups and docs. > - Address feedback from Balbir: at this point I don't have a good solution >for the issues you identify around the limitations of the inline > implementation >but I think that it's worth trying to get the stack instrumentation > support. >I'm happy to have an alternative and more flexible outline mode - I had >envisoned this would be called 'lightweight' mode as it imposes fewer > restrictions. >I've linked to your implementation. I think it's best to add it in a > follow-up series. > - Made the default PHYS_MEM_SIZE_FOR_KASAN value
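The arithmetic behind "set aside the top 1/8th of the first block of physical memory" is the generic KASAN shadow mapping: one shadow byte covers eight bytes of memory. A sketch — the scale shift is the standard KASAN value, but the offset and sizes here are illustrative, not the real ppc64 constants:

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3	/* 1 shadow byte per 8 bytes */

/* Address-to-shadow translation used by inline/outline instrumentation */
static uintptr_t kasan_mem_to_shadow(uintptr_t addr, uintptr_t shadow_offset)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + shadow_offset;
}

/* Shadow needed to cover a contiguous block of 'size' bytes */
static uintptr_t kasan_shadow_size(uintptr_t size)
{
	return size >> KASAN_SHADOW_SCALE_SHIFT;
}
```

When the memory size is fixed at compile time, `shadow_offset` becomes a compile-time constant, which is what makes inline instrumentation possible.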
Re: Kernel (little-endian) crashing on POWER8 on heavy PowerKVM load
Paulus, Something below for you I think > We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian > unstable > hosting a big-endian ppc64 virtual machine running the same kernel in > big-endian > mode. > > When building OpenJDK-11 on the big-endian VM, the testsuite crashes the > *host* system > with the following kernel backtrace. The problem reproduces both with kernel > 4.19.98 > as well as 5.4.13. > > Backtrace has been attached at the end of this mail. > > Thanks, > Adrian > > watson login: [17667518570.438744] BUG: Unable to handle kernel data access > at 0xc2bfd038 > [17667518570.438772] Faulting instruction address: 0xc017a778 > [17667518570.438777] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [17667518570.438781] Faulting instruction address: 0xc02659a0 > [17667518570.438785] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [17667518570.438789] Faulting instruction address: 0xc02659a0 > [17667518570.438793] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [17667518570.438797] Faulting instruction address: 0xc02659a0 > [17667518570.438801] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [17667518570.438804] Faulting instruction address: 0xc02659a0 > [17667518570.438808] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [17667518570.439197] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [ 8142.397983] async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) > raid6_pq(E) libcrc32c(E) crc32c_generic(E) > [17667518570.439207] Faulting instruction address: 0xc02659a0 > [ 8142.397992] raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) > xhci_pci(E) xhci_hcd(E) > [17667518570.439215] Thread overran stack, or stack corrupted > [ 8142.398000] e1000e(E) usbcore(E) ptp(E) pps_core(E) ipr(E) usb_common(E) > [ 8142.398011] CPU: 48 PID: 2571 Comm: CPU 0/KVM Tainted: GE > 5.4.0-0.bpo.3-powerpc64le #1 Debian 5.4.13-1~bpo10+1 > [ 8142.398014] NIP: c00fe3117a00 
LR: c0196b9c CTR: > c00fe3117a00 > [17667518570.439234] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [ 8142.398026] REGS: c00fe315f4c0 TRAP: 0400 Tainted: GE > (5.4.0-0.bpo.3-powerpc64le Debian 5.4.13-1~bpo10+1) > [17667518570.439243] Faulting instruction address: 0xc02659a0 > [17667518570.439245] Thread overran stack, or stack corrupted > [ 8142.398038] MSR: 900010009033 CR: 28448484 > XER: > [ 8142.398046] CFAR: c0196b98 IRQMASK: 1 > [ 8142.398046] GPR00: c0196e0c c00fe315f750 c12e0800 > c00fe31179c0 > [ 8142.398046] GPR04: 0003 > > [ 8142.398046] GPR08: c00fe315f7f0 c00fe3117a00 8030 > c008082bcd80 > [ 8142.398046] GPR12: c00fe3117a00 c00f5a00 > 0008 > [ 8142.398046] GPR16: c13a5c18 c00ff1035e00 c00fe315f8e8 > 0001 > [ 8142.398046] GPR20: c00fe315f8e8 c00fe31179c0 > > [ 8142.398046] GPR24: c00fe315f7f0 0001 > 0003 > [ 8142.398046] GPR28: c00fedc6e750 0010 > c00fe311f8d0 > [ 8142.398079] NIP [c00fe3117a00] 0xc00fe3117a00 > [ 8142.398087] LR [c0196b9c] __wake_up_common+0xcc/0x290 > [17667518570.439321] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [ 8142.398109] Call Trace: > [17667518570.439328] Faulting instruction address: 0xc02659a0 > [17667518570.439330] Thread overran stack, or stack corrupted > [ 8142.398122] [c00fe315f750] [c0196b9c] > __wake_up_common+0xcc/0x290 (unreliable) > [ 8142.398127] [c00fe315f7d0] [c0196e0c] > __wake_up_common_lock+0xac/0x110 > [ 8142.398134] [c00fe315f850] [c008082a9760] > kvmppc_run_core+0x12f8/0x18c0 [kvm_hv] > [ 8142.398140] [c00fe315fa10] [c008082acf14] > kvmppc_vcpu_run_hv+0x62c/0xb20 [kvm_hv] > [ 8142.398149] [c00fe315fae0] [c008081098cc] > kvmppc_vcpu_run+0x34/0x48 [kvm] > [ 8142.398158] [c00fe315fb00] [c0080810587c] > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] > [ 8142.398166] [c00fe315fb90] [c008080f7ac8] > kvm_vcpu_ioctl+0x340/0x7d0 [kvm] > [ 8142.398172] [c00fe315fd00] [c0445410] do_vfs_ioctl+0xe0/0xac0 > [ 8142.398176] [c00fe315fdb0] [c0445eb4] ksys_ioctl+0xc4/0x110 > [ 8142.398180] 
[c00fe315fe00] [c0445f28] sys_ioctl+0x28/0x80 > [ 8142.398184] [c00fe315fe20] [c000b9c8] system_call+0x5c/0x68 > [ 8142.398186] Instruction dump: > [17667518570.439406] BUG: Unable to handle kernel data access at > 0xc007f9070c08 > [ 8142.398196] >
Re: [PATCH] powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
On Fri, 2020-02-14 at 08:33 +, Christophe Leroy wrote: > With CONFIG_VMAP_STACK, data MMU has to be enabled > to read data on the stack. Can you describe what goes wrong without this? Some oops message? rtas blows up? Get corrupt data? Also can you say what you're actually doing (ie turning on MSR[DR]) > Signed-off-by: Christophe Leroy > --- > arch/powerpc/kernel/entry_32.S | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S > index 0713daa651d9..bc056d906b51 100644 > --- a/arch/powerpc/kernel/entry_32.S > +++ b/arch/powerpc/kernel/entry_32.S > @@ -1354,12 +1354,17 @@ _GLOBAL(enter_rtas) > mtspr SPRN_SRR0,r8 > mtspr SPRN_SRR1,r9 > RFI > -1: tophys(r9,r1) > +1: tophys_novmstack r9, r1 > +#ifdef CONFIG_VMAP_STACK > + li r0, MSR_KERNEL & ~MSR_IR/* can take DTLB miss */ You're potentially turning on more than MSR DR here. This should be clear in the commit message. > + mtmsr r0 > + isync > +#endif > lwz r8,INT_FRAME_SIZE+4(r9) /* get return address */ > lwz r9,8(r9)/* original msr value */ > addir1,r1,INT_FRAME_SIZE > li r0,0 > - tophys(r7, r2) > + tophys_novmstack r7, r2 > stw r0, THREAD + RTAS_SP(r7) > mtspr SPRN_SRR0,r8 > mtspr SPRN_SRR1,r9
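To make the "turning on more than MSR DR" point concrete: `MSR_KERNEL & ~MSR_IR` sets DR plus whatever else MSR_KERNEL carries, with only IR left off. A sketch, with MSR_KERNEL simplified to a plausible subset of the 32-bit definition (the exact contents vary by platform, so treat it as an assumption):

```c
#include <assert.h>

#define MSR_IR (1UL << 5)	/* instruction address translation */
#define MSR_DR (1UL << 4)	/* data address translation */
#define MSR_ME (1UL << 12)	/* machine check enable */
#define MSR_RI (1UL << 1)	/* recoverable interrupt */

/* Simplified 32-bit MSR_KERNEL, for illustration only */
#define MSR_KERNEL (MSR_ME | MSR_RI | MSR_IR | MSR_DR)

/* The value the patch loads into r0 before mtmsr */
#define RTAS_MSR (MSR_KERNEL & ~MSR_IR)
```

So the mtmsr enables data translation (the stated goal) but also ME and RI in this model — which is the reviewer's point about the commit message.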
Re: [PATCH 1/1] powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
On Sat, 2020-02-15 at 02:36 -0300, Leonardo Bras wrote: > Before checking for cpu_type == NULL, this same copy happens, so doing > it here will just write the same value to the t->oprofile_type > again. > > Remove the repeated copy, as it is unnecessary. > > Signed-off-by: Leonardo Bras LGTM Reviewed-by: Michael Neuling > --- > arch/powerpc/kernel/cputable.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c > index e745abc5457a..5a87ec96582f 100644 > --- a/arch/powerpc/kernel/cputable.c > +++ b/arch/powerpc/kernel/cputable.c > @@ -2197,7 +2197,6 @@ static struct cpu_spec * __init setup_cpu_spec(unsigned > long offset, >*/ > if (old.oprofile_cpu_type != NULL) { > t->oprofile_cpu_type = old.oprofile_cpu_type; > - t->oprofile_type = old.oprofile_type; > } > } >
Re: [PATCH v3 1/3] powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
> Found with tm-signal-context-force-tm kernel selftest. > > v3: Subject and comment improvements. > v2: Fix build failure when tm is disabled. > > Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the > signal context") > Cc: sta...@vger.kernel.org # v3.9 > Signed-off-by: Gustavo Luiz Duarte Acked-By: Michael Neuling
Re: [PATCH V5 06/14] powerpc/vas: Setup thread IRQ handler per VAS instance
On Sun, 2020-02-09 at 21:17 -0800, Haren Myneni wrote: > On Fri, 2020-02-07 at 16:57 +1100, Michael Neuling wrote: > > > /* > > > + * Process CRBs that we receive on the fault window. > > > + */ > > > +irqreturn_t vas_fault_handler(int irq, void *data) > > > +{ > > > + struct vas_instance *vinst = data; > > > + struct coprocessor_request_block buf, *crb; > > > + struct vas_window *window; > > > + void *fifo; > > > + > > > + /* > > > + * VAS can interrupt with multiple page faults. So process all > > > + * valid CRBs within fault FIFO until reaches invalid CRB. > > > + * NX updates nx_fault_stamp in CRB and pastes in fault FIFO. > > > + * kernel retrives send window from parition send window ID > > > + * (pswid) in nx_fault_stamp. So pswid should be non-zero and > > > + * use this to check whether CRB is valid. > > > + * After reading CRB entry, it is reset with 0's in fault FIFO. > > > + * > > > + * In case kernel receives another interrupt with different page > > > + * fault and CRBs are processed by the previous handling, will be > > > + * returned from this function when it sees invalid CRB (means 0's). > > > + */ > > > + do { > > > + mutex_lock(>mutex); > > > > This isn't going to work. > > > > From Documentation/locking/mutex-design.rst > > > > - Mutexes may not be used in hardware or software interrupt > > contexts such as tasklets and timers. > > Initially used kernel thread per VAS instance and later using IRQ > thread. > > vas_fault_handler() is IRQ thread function, not IRQ handler. I thought > we can use mutex_lock() in thread function. Sorry, I missed it was a threaded IRQ handler, so I think it is ok to use a mutex_lock() in there. You should run with CONFIG_DEBUG_MUTEXES and CONFIG_LOCKDEP enabled to give you some more confidence. It would be good to document how this mutex is used, and to note at the start of the function that it is a threaded handler, so it doesn't get changed later to a non-threaded handler. Mikey
Re: [PATCH V5 09/14] powerpc/vas: Update CSB and notify process for fault CRBs
> > > + > > > + csb.cc = CSB_CC_TRANSLATION; > > > + csb.ce = CSB_CE_TERMINATION; > > > + csb.cs = 0; > > > + csb.count = 0; > > > + > > > + /* > > > + * Returns the fault address in CPU format since it is passed with > > > + * signal. But if the user space expects BE format, need changes. > > > + * i.e either kernel (here) or user should convert to CPU format. > > > + * Not both! > > > + */ > > > + csb.address = be64_to_cpu(crb->stamp.nx.fault_storage_addr); > > > > This looks wrong and I don't understand the comment. You need to convert > > this > > back to be64 to write it to csb.address. ie. > > > > csb.address = cpu_to_be64(be64_to_cpu(crb->stamp.nx.fault_storage_addr)); > > > > Which I think you can just avoid the endian conversion all together. > > NX pastes fault CRB in big-endian, so passing this address in CPU format > to user space, otherwise the library has to convert. OK, then please change the definition in struct coprocessor_status_block to just __u64. struct coprocessor_status_block { u8 flags; u8 cs; u8 cc; u8 ce; __be32 count; __be64 address; } __packed __aligned(CSB_ALIGN); Big "but": I thought "struct coprocessor_status_block" was also written by hardware. If that's the case then it needs to be __be64 and you need the kernel to synthesize exactly what the hardware is doing. Hence the struct definition is correct and the kernel needs to convert to __be64 on writing. > What is the standard way for passing to user space? CPU endian. > > > + * process will be polling on csb.flags after request is sent to > > > + * NX. So generally CSB update should not fail except when an > > > + * application does not follow the process properly. So an error > > > + * message will be displayed and leave it to user space whether > > > + * to ignore or handle this signal. > > > + */ > > > + rcu_read_lock(); > > > + rc = kill_pid_info(SIGSEGV, , pid); > > > + rcu_read_unlock(); > > > > why the rcu_read_un/lock() here? 
> > Used same as in kill_proc_info()/kill_something_info() Please document. Mikey
Re: [PATCH V5 06/14] powerpc/vas: Setup thread IRQ handler per VAS instance
> /* > + * Process CRBs that we receive on the fault window. > + */ > +irqreturn_t vas_fault_handler(int irq, void *data) > +{ > + struct vas_instance *vinst = data; > + struct coprocessor_request_block buf, *crb; > + struct vas_window *window; > + void *fifo; > + > + /* > + * VAS can interrupt with multiple page faults. So process all > + * valid CRBs within fault FIFO until reaches invalid CRB. > + * NX updates nx_fault_stamp in CRB and pastes in fault FIFO. > + * kernel retrives send window from parition send window ID > + * (pswid) in nx_fault_stamp. So pswid should be non-zero and > + * use this to check whether CRB is valid. > + * After reading CRB entry, it is reset with 0's in fault FIFO. > + * > + * In case kernel receives another interrupt with different page > + * fault and CRBs are processed by the previous handling, will be > + * returned from this function when it sees invalid CRB (means 0's). > + */ > + do { > + mutex_lock(>mutex); This isn't going to work. >From Documentation/locking/mutex-design.rst - Mutexes may not be used in hardware or software interrupt contexts such as tasklets and timers. Mikey
Re: [PATCH V5 09/14] powerpc/vas: Update CSB and notify process for fault CRBs
On Wed, 2020-01-22 at 00:17 -0800, Haren Myneni wrote: > For each fault CRB, update fault address in CRB (fault_storage_addr) > and translation error status in CSB so that user space can touch the > fault address and resend the request. If the user space passed invalid > CSB address send signal to process with SIGSEGV. > > Signed-off-by: Sukadev Bhattiprolu > Signed-off-by: Haren Myneni > --- > arch/powerpc/platforms/powernv/vas-fault.c | 116 > + > 1 file changed, 116 insertions(+) > > diff --git a/arch/powerpc/platforms/powernv/vas-fault.c > b/arch/powerpc/platforms/powernv/vas-fault.c > index 5c2cada..2cfab0c 100644 > --- a/arch/powerpc/platforms/powernv/vas-fault.c > +++ b/arch/powerpc/platforms/powernv/vas-fault.c > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -26,6 +27,120 @@ > #define VAS_FAULT_WIN_FIFO_SIZE (4 << 20) > > /* > + * Update the CSB to indicate a translation error. > + * > + * If the fault is in the CSB address itself or if we are unable to > + * update the CSB, send a signal to the process, because we have no > + * other way of notifying the user process. > + * > + * Remaining settings in the CSB are based on wait_for_csb() of > + * NX-GZIP. > + */ > +static void update_csb(struct vas_window *window, > + struct coprocessor_request_block *crb) > +{ > + int rc; > + struct pid *pid; > + void __user *csb_addr; > + struct task_struct *tsk; > + struct kernel_siginfo info; > + struct coprocessor_status_block csb; > + > + /* > + * NX user space windows can not be opened for task->mm=NULL > + * and faults will not be generated for kernel requests. > + */ > + if (!window->mm || !window->user_win) > + return; > + > + csb_addr = (void *)be64_to_cpu(crb->csb_addr); > + > + csb.cc = CSB_CC_TRANSLATION; > + csb.ce = CSB_CE_TERMINATION; > + csb.cs = 0; > + csb.count = 0; > + > + /* > + * Returns the fault address in CPU format since it is passed with > + * signal. 
But if the user space expects BE format, need changes. > + * i.e either kernel (here) or user should convert to CPU format. > + * Not both! > + */ > + csb.address = be64_to_cpu(crb->stamp.nx.fault_storage_addr); This looks wrong and I don't understand the comment. You need to convert this back to be64 to write it to csb.address. ie. csb.address = cpu_to_be64(be64_to_cpu(crb->stamp.nx.fault_storage_addr)); Which I think you can just avoid the endian conversion all together. > + csb.flags = 0; > + > + pid = window->pid; > + tsk = get_pid_task(pid, PIDTYPE_PID); > + /* > + * Send window will be closed after processing all NX requests > + * and process exits after closing all windows. In multi-thread > + * applications, thread may not exists, but does not close FD > + * (means send window) upon exit. Parent thread (tgid) can use > + * and close the window later. > + * pid and mm references are taken when window is opened by > + * process (pid). So tgid is used only when child thread opens > + * a window and exits without closing it in multithread tasks. > + */ > + if (!tsk) { > + pid = window->tgid; > + tsk = get_pid_task(pid, PIDTYPE_PID); > + /* > + * Parent thread will be closing window during its exit. > + * So should not get here. > + */ > + if (!tsk) > + return; > + } > + > + /* Return if the task is exiting. */ > + if (tsk->flags & PF_EXITING) { > + put_task_struct(tsk); > + return; > + } > + > + use_mm(window->mm); > + rc = copy_to_user(csb_addr, , sizeof(csb)); > + /* > + * User space polls on csb.flags (first byte). So add barrier > + * then copy first byte with csb flags update. > + */ > + smp_mb(); > + if (!rc) { > + csb.flags = CSB_V; > + rc = copy_to_user(csb_addr, , sizeof(u8)); > + } > + unuse_mm(window->mm); > + put_task_struct(tsk); > + > + /* Success */ > + if (!rc) > + return; > + > + pr_err("Invalid CSB address 0x%p signalling pid(%d)\n", > + csb_addr, pid_vnr(pid)); This is a userspace error, not a kernel error. This should not be a pr_err(). 
Userspace could spam the console with this. > + > + clear_siginfo(); > + info.si_signo = SIGSEGV; > + info.si_errno = EFAULT; > + info.si_code = SEGV_MAPERR; > + info.si_addr = csb_addr; > + > + /* > + * process will be polling on csb.flags after request is sent to > + * NX. So generally CSB update should not fail except when an > + * application does not follow the process properly. So an error > + * message will be displayed and leave it to user space whether > + * to ignore
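The "add barrier then copy first byte with csb flags update" sequence in the patch is a classic publish pattern: fill in the payload, then set the valid flag last, so a poller never observes the valid bit with stale fields. A userspace C11 model of the idea — the struct layout and the CSB_V value here are illustrative, not the NX hardware definitions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CSB_V 0x80	/* valid bit; illustrative value */

struct csb {
	_Atomic uint8_t flags;	/* polled by the consumer */
	uint8_t cs, cc, ce;
	uint64_t address;
};

/* Publish payload first, flags last, with release ordering */
static void csb_publish(struct csb *dst, const struct csb *src)
{
	dst->cs = src->cs;
	dst->cc = src->cc;
	dst->ce = src->ce;
	dst->address = src->address;
	/* release store plays the role of the smp_mb() in the patch */
	atomic_store_explicit(&dst->flags, CSB_V, memory_order_release);
}

static int csb_publish_demo(void)
{
	struct csb src = { 0 }, dst = { 0 };

	src.cc = 5;		/* e.g. a translation error code */
	src.address = 0x1000;	/* faulting address */
	csb_publish(&dst, &src);
	return atomic_load_explicit(&dst.flags, memory_order_acquire) == CSB_V
		&& dst.cc == 5 && dst.address == 0x1000;
}
```

In the kernel patch the same ordering is achieved with two copy_to_user() calls separated by a barrier, since the CSB lives in user memory.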
Re: [PATCH v2 1/3] powerpc/tm: Clear the current thread's MSR[TS] after treclaim
On Thu, 2020-02-06 at 19:13 -0300, Gustavo Luiz Duarte wrote: > > On 2/5/20 1:58 AM, Michael Neuling wrote: > > Other than the minor things below that I think you need, the patch good with > > me. > > > > Acked-by: Michael Neuling > > > > > Subject: Re: [PATCH v2 1/3] powerpc/tm: Clear the current thread's MSR[TS] > > > after treclaim > > > > The subject should mention "signals". > > How about "powerpc/tm: Clear the current thread's MSR[TS] when > transaction is reclaimed on signal delivery" ? mpe also likes the word "fix" in the subject if it's a fix. So maybe: powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery Thanks, Mikey
Re: [PATCH v2 1/3] powerpc/tm: Clear the current thread's MSR[TS] after treclaim
Other than the minor things below that I think you need, the patch is good with me. Acked-by: Michael Neuling > Subject: Re: [PATCH v2 1/3] powerpc/tm: Clear the current thread's MSR[TS] > after treclaim The subject should mention "signals". On Mon, 2020-02-03 at 13:09 -0300, Gustavo Luiz Duarte wrote: > After a treclaim, we expect to be in non-transactional state. If we don't > immediately clear the current thread's MSR[TS] and we get preempted, then > tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in > suspended transaction state. It's not "immediately", it's before re-enabling preemption. There is a similar comment in the code that needs to be fixed too. > When handling a signal caught in transactional state, handle_rt_signal64() > calls get_tm_stackpointer() that treclaims the transaction using > tm_reclaim_current() but without clearing the thread's MSR[TS]. This can cause > the TM Bad Thing exception below if later we pagefault and get preempted > trying > to access the user's sigframe, using __put_user(). Afterwards, when we are > rescheduled back into do_page_fault() (but now in suspended state since the > thread's MSR[TS] was not cleared), upon executing 'rfid' after completion of > the page fault handling, the exception is raised because a transition from > suspended to non-transactional state is invalid. 
> > Unexpected TM Bad Thing exception at c000de44 (msr > 0x800302a03031) tm_scratch=80010280b033 > Oops: Unrecoverable exception, sig: 6 [#1] > LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > Modules linked in: nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 ip6_tables ip_tables nft_compat ip_set nf_tables nfnetlink xts > vmx_crypto sg virtio_balloon > r_mod cdrom virtio_net net_failover virtio_blk virtio_scsi failover > dm_mirror dm_region_hash dm_log dm_mod > CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32 > NIP: c000de44 LR: c0034728 CTR: > REGS: c0003fe7bd70 TRAP: 0700 Not tainted (5.4.0-rc2) > MSR: 800302a03031 CR: 44000884 > XER: > CFAR: c000dda4 IRQMASK: 0 > PACATMSCRATCH: 80010280b033 > GPR00: c0034728 c00f65a17c80 c1662800 > 7fffacf3fd78 > GPR04: 1000 1000 > c00f611f8af0 > GPR08: 78006001 > 000c > GPR12: c00f611f84b0 c0003ffcb200 > > GPR16: > > GPR20: > c00f611f8140 > GPR24: 7fffacf3fd68 c00f65a17d90 > c00f611f7800 > GPR28: c00f65a17e90 c00f65a17e90 c1685e18 > 7fffacf3f000 > NIP [c000de44] fast_exception_return+0xf4/0x1b0 > LR [c0034728] handle_rt_signal64+0x78/0xc50 > Call Trace: > [c00f65a17c80] [c0034710] handle_rt_signal64+0x60/0xc50 > (unreliable) > [c00f65a17d30] [c0023640] do_notify_resume+0x330/0x460 > [c00f65a17e20] [c000dcc4] ret_from_except_lite+0x70/0x74 > Instruction dump: > 7c4ff120 e8410170 7c5a03a6 3840 f8410060 e8010070 e8410080 e8610088 > 6000 6000 e8810090 e8210078 <4c24> 4800 e8610178 > 88ed0989 > ---[ end trace 93094aa44b442f87 ]--- > > The simplified sequence of events that triggers the above exception is: > > ... # userspace in NON-TRANSACTIONAL state > tbegin # userspace in TRANSACTIONAL state > signal delivery # kernelspace in SUSPENDED state > handle_rt_signal64() > get_tm_stackpointer() > treclaim# kernelspace in NON-TRANSACTIONAL state > __put_user() > page fault happens. We will never get back here because of the TM Bad > Thing exception. 
> > page fault handling kicks in and we voluntarily preempt ourselves > do_page_fault() > __schedule() > __switch_to(other_task) > > our task is rescheduled and we recheckpoint because the thread's MSR[TS] > was not cleared > __switch_to(our_task) > switch_to_tm() > tm_recheckpoint_new_task() > trechkpt # kernelspace in SUSPENDED state > > The page fault handling resumes, but now we are in suspended transaction > state > do_page_fault() completes > rfid <- trying to get back where the page fau
CVE-2019-15031: Linux kernel: powerpc: data leak with FP/VMX triggerable by interrupt in transaction
The Linux kernel for powerpc since v4.15 has a bug in its TM handling during interrupts where any user can read the FP/VMX registers of a different user's process. Users of TM + FP/VMX can also experience corruption of their FP/VMX state. To trigger the bug, a process starts a transaction with FP/VMX off and then takes an interrupt. Due to the kernel's incorrect handling of the interrupt, FP/VMX is turned on but the checkpointed state is not updated. If this transaction then rolls back, the checkpointed state may contain the state of a different process. This checkpointed state can then be read by the process hence leaking data from one process to another. The trigger for this bug is an interrupt inside a transaction where FP/VMX is off, hence the process needs FP/VMX off when starting the transaction. FP/VMX availability is under the control of the kernel and is transparent to the user, hence the user has to retry the transaction many times to trigger this bug. High interrupt loads also help trigger this bug. All 64-bit machines where TM is present are affected. This includes all POWER8 variants and POWER9 VMs under KVM or LPARs under PowerVM. POWER9 bare metal doesn't support TM and hence is not affected. The bug was introduced in commit: a7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") Which was originally merged in v4.15 The upstream fix is here: https://git.kernel.org/torvalds/c/a8318c13e79badb92bc6640704a64cc022a6eb97 The fix can be verified by running the tm-poison test from the kernel selftests. This test is in a patch here: https://patchwork.ozlabs.org/patch/1157467/ which should eventually end up here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/tm/tm-poison.c cheers Mikey
CVE-2019-15030: Linux kernel: powerpc: data leak with FP/VMX triggerable by unavailable exception in transaction
The Linux kernel for powerpc since v4.12 has a bug in its TM handling where any user can read the FP/VMX registers of a different user's process. Users of TM + FP/VMX can also experience corruption of their FP/VMX state. To trigger the bug, a process starts a transaction and reads a FP/VMX register. This transaction can then fail which causes a rollback to the checkpointed state. Due to the kernel taking an FP/VMX unavailable exception inside a transaction and the kernel's incorrect handling of this, the checkpointed state can be set to the FP/VMX registers of another process. This checkpointed state can then be read by the process hence leaking data from one process to another. The trigger for this bug is an FP/VMX unavailable exception inside a transaction, hence the process needs FP/VMX off when starting the transaction. FP/VMX availability is under the control of the kernel and is transparent to the user, hence the user has to retry the transaction many times to trigger this bug. All 64-bit machines where TM is present are affected. This includes all POWER8 variants and POWER9 VMs under KVM or LPARs under PowerVM. POWER9 bare metal doesn't support TM and hence is not affected. The bug was introduced in commit: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") Which was originally merged in v4.12 The upstream fix is here: https://git.kernel.org/torvalds/c/8205d5d98ef7f155de211f5e2eb6ca03d95a5a60 The fix can be verified by running the tm-poison test from the kernel selftests. This test is in a patch here: https://patchwork.ozlabs.org/patch/1157467/ which should eventually end up here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/tm/tm-poison.c cheers Mikey
[PATCH 2/3] powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
From: Gustavo Romero When in userspace and MSR FP=0 the hardware FP state is unrelated to the current process. This is extended for transactions where if tbegin is run with FP=0, the hardware checkpoint FP state will also be unrelated to the current process. Due to this, we need to ensure this hardware checkpoint is updated with the correct state before we enable FP for this process. Unfortunately we get this wrong when returning to a process from a hardware interrupt. A process that starts a transaction with FP=0 can take an interrupt. When the kernel returns back to that process, we change to FP=1 but with hardware checkpoint FP state not updated. If this transaction is then rolled back, the FP registers now contain the wrong state. The process looks like this: Userspace: Kernel Start userspace with MSR FP=0 TM=1 < - ... tbegin bne Hardware interrupt > ret_from_except restore_math() /* sees FP=0 */ restore_fp() tm_active_with_fp() /* sees FP=1 (Incorrect) */ load_fp_state() FP = 0 -> 1 < - Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk When returning from the hardware exception, tm_active_with_fp() is incorrectly making restore_fp() call load_fp_state() which is setting FP=1. The fix is to remove tm_active_with_fp(). tm_active_with_fp() is attempting to handle the case where FP state has been changed inside a transaction. In this case the checkpointed and transactional FP state is different and hence we must restore the FP state (ie. we can't do lazy FP restore inside a transaction that's used FP). It's safe to remove tm_active_with_fp() as this case is handled by restore_tm_state(). restore_tm_state() detects if FP has been used inside a transaction and will set load_fp and call restore_math() to ensure the FP state (checkpoint and transaction) is restored. This is a data integrity problem for the current process as the FP registers are corrupted. 
It's also a security problem as the FP registers from one process may be leaked to another. Similarly for VMX. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c This fixes CVE-2019-15031. Fixes: a7771176b439 ("powerpc: Don't enable FP/Altivec if not checkpointed") Cc: sta...@vger.kernel.org # 4.15+ Signed-off-by: Gustavo Romero Signed-off-by: Michael Neuling --- arch/powerpc/kernel/process.c | 18 ++ 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 437b57068c..7a84c9f177 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -101,21 +101,8 @@ static void check_if_tm_restore_required(struct task_struct *tsk) } } -static bool tm_active_with_fp(struct task_struct *tsk) -{ - return MSR_TM_ACTIVE(tsk->thread.regs->msr) && - (tsk->thread.ckpt_regs.msr & MSR_FP); -} - -static bool tm_active_with_altivec(struct task_struct *tsk) -{ - return MSR_TM_ACTIVE(tsk->thread.regs->msr) && - (tsk->thread.ckpt_regs.msr & MSR_VEC); -} #else static inline void check_if_tm_restore_required(struct task_struct *tsk) { } -static inline bool tm_active_with_fp(struct task_struct *tsk) { return false; } -static inline bool tm_active_with_altivec(struct task_struct *tsk) { return false; } #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ bool strict_msr_control; @@ -252,7 +239,7 @@ EXPORT_SYMBOL(enable_kernel_fp); static int restore_fp(struct task_struct *tsk) { - if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { + if (tsk->thread.load_fp) { load_fp_state(&current->thread.fp_state); current->thread.load_fp++; return 1; @@ -334,8 +321,7 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread); static int restore_altivec(struct task_struct *tsk) { - if (cpu_has_feature(CPU_FTR_ALTIVEC) && - (tsk->thread.load_vec || tm_active_with_altivec(tsk))) { + if (cpu_has_feature(CPU_FTR_ALTIVEC) && (tsk->thread.load_vec)) { load_vr_state(&tsk->thread.vr_state); tsk->thread.used_vr 
= 1; tsk->thread.load_vec++; -- 2.21.0
[PATCH 1/3] powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
From: Gustavo Romero When we take an FP unavailable exception in a transaction we have to account for the hardware FP TM checkpointed registers being incorrect. In this case for this process we know the current and checkpointed FP registers must be the same (since FP wasn't used inside the transaction) hence in the thread_struct we copy the current FP registers to the checkpointed ones. This copy is done in tm_reclaim_thread(). We use thread->ckpt_regs.msr to determine if FP was on when in userspace. thread->ckpt_regs.msr represents the state of the MSR when exiting userspace. This is set up by check_if_tm_restore_required(). Unfortunately there is an optimisation in giveup_all() which returns early if tsk->thread.regs->msr (via local variable `usermsr`) has FP=VEC=VSX=SPE=0. This optimisation means that check_if_tm_restore_required() is not called and hence thread->ckpt_regs.msr is not updated and will contain an old value. This can happen if due to load_fp=255 we start a userspace process with MSR FP=1 and then we are context switched out. In this case thread->ckpt_regs.msr will contain FP=1. If that same process is then context switched in and load_fp overflows, MSR will have FP=0. If that process now enters a transaction and does an FP instruction, the FP unavailable exception will not update thread->ckpt_regs.msr (the bug) and MSR FP=1 will be retained in thread->ckpt_regs.msr. tm_reclaim_thread() will then not perform the required memcpy and the checkpointed FP regs in the thread struct will contain the wrong values. The code path for this happening is: Userspace: Kernel Start userspace with MSR FP/VEC/VSX/SPE=0 TM=1 < - ... 
tbegin bne fp instruction FP unavailable > fp_unavailable_tm() tm_reclaim_current() tm_reclaim_thread() giveup_all() return early since FP/VMX/VSX=0 /* ckpt MSR not updated (Incorrect) */ tm_reclaim() /* thread_struct ckpt FP regs contain junk (OK) */ /* Sees ckpt MSR FP=1 (Incorrect) */ no memcpy() performed /* thread_struct ckpt FP regs not fixed (Incorrect) */ tm_recheckpoint() /* Put junk in hardware checkpoint FP regs */ < - Return to userspace with MSR TM=1 FP=1 with junk in the FP TM checkpoint TM rollback reads FP junk This is a data integrity problem for the current process as the FP registers are corrupted. It's also a security problem as the FP registers from one process may be leaked to another. This patch moves up check_if_tm_restore_required() in giveup_all() to ensure thread->ckpt_regs.msr is updated correctly. A simple testcase to replicate this will be posted to tools/testing/selftests/powerpc/tm/tm-poison.c Similarly for VMX. This fixes CVE-2019-15030. Fixes: f48e91e87e67 ("powerpc/tm: Fix FP and VMX register corruption") Cc: sta...@vger.kernel.org # 4.12+ Signed-off-by: Gustavo Romero Signed-off-by: Michael Neuling --- arch/powerpc/kernel/process.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 8fc4de0d22..437b57068c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -497,13 +497,14 @@ void giveup_all(struct task_struct *tsk) if (!tsk->thread.regs) return; + check_if_tm_restore_required(tsk); + usermsr = tsk->thread.regs->msr; if ((usermsr & msr_all_available) == 0) return; msr_check_and_set(msr_all_available); - check_if_tm_restore_required(tsk); WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC))); -- 2.21.0
[PATCH 3/3] powerpc/tm: Add tm-poison test
From: Gustavo Romero Add TM selftest to check if FP or VEC register values from one process can leak into another process when both run on the same CPU. This tests for CVE-2019-15030 and CVE-2019-15031. Signed-off-by: Gustavo Romero Signed-off-by: Michael Neuling --- tools/testing/selftests/powerpc/tm/.gitignore | 1 + tools/testing/selftests/powerpc/tm/Makefile | 2 +- .../testing/selftests/powerpc/tm/tm-poison.c | 180 ++ 3 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/powerpc/tm/tm-poison.c diff --git a/tools/testing/selftests/powerpc/tm/.gitignore b/tools/testing/selftests/powerpc/tm/.gitignore index 951fe855f7..98f2708d86 100644 --- a/tools/testing/selftests/powerpc/tm/.gitignore +++ b/tools/testing/selftests/powerpc/tm/.gitignore @@ -17,3 +17,4 @@ tm-vmx-unavail tm-unavailable tm-trap tm-sigreturn +tm-poison diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile index c0734ed0ef..b15a1a325b 100644 --- a/tools/testing/selftests/powerpc/tm/Makefile +++ b/tools/testing/selftests/powerpc/tm/Makefile @@ -5,7 +5,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \ tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable tm-trap \ $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-sigreturn-nt \ - tm-signal-context-force-tm + tm-signal-context-force-tm tm-poison top_srcdir = ../../../../.. include ../../lib.mk diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c b/tools/testing/selftests/powerpc/tm/tm-poison.c new file mode 100644 index 00..0113da7a8d --- /dev/null +++ b/tools/testing/selftests/powerpc/tm/tm-poison.c @@ -0,0 +1,180 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright 2019, Gustavo Romero, Michael Neuling, IBM Corp. + * + * This test will spawn two processes. Both will be attached to the same + * CPU (CPU 0). 
The child will be in a loop writing to FP register f31 and + * VMX/VEC/Altivec register vr31 a known value, called poison, calling + * sched_yield syscall after to allow the parent to switch on the CPU. + * Parent will set f31 and vr31 to 1 and in a loop will check if f31 and + * vr31 remain 1 as expected until a given timeout (2m). If the issue is + * present child's poison will leak into parent's f31 or vr31 registers, + * otherwise, poison will never leak into parent's f31 and vr31 registers. + * + * This tests for CVE-2019-15030 and CVE-2019-15031. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include + +#include "tm.h" + +int tm_poison_test(void) +{ + int pid; + cpu_set_t cpuset; + uint64_t poison = 0xdeadbeefc0dec0fe; + uint64_t unknown = 0; + bool fail_fp = false; + bool fail_vr = false; + + SKIP_IF(!have_htm()); + + /* Attach both Child and Parent to CPU 0 */ + CPU_ZERO(&cpuset); + CPU_SET(0, &cpuset); + sched_setaffinity(0, sizeof(cpuset), &cpuset); + + pid = fork(); + if (!pid) { + /** +* child +*/ + while (1) { + sched_yield(); + asm ( + "mtvsrd 31, %[poison];" // f31 = poison + "mtvsrd 63, %[poison];" // vr31 = poison + + : : [poison] "r" (poison) : ); + } + } + + /** +* parent +*/ + asm ( + /* +* Set r3, r4, and f31 to known value 1 before entering +* in transaction. They won't be written after that. +*/ + " li 3, 0x1 ;" + " li 4, 0x1 ;" + " mtvsrd 31, 4 ;" + + /* +* The Time Base (TB) is a 64-bit counter register that is +* independent of the CPU clock and which is incremented +* at a frequency of 512000000 Hz, so every 1.953125ns. +* So it's necessary 120s/1.953125ns = 61440000000 +* increments to get a 2 minutes timeout. Below we set that +* value in r5 and then use r6 to track initial TB value, +* updating TB values in r7 at every iteration and comparing it +* to r6. When r7 (current) - r6 (initial) > 61440000000 we bail +* out since for sure we spent already 2 minutes in the loop. +* SPR 268 is the TB register. 
+*/ + " lis 5, 14 ;" + " ori 5, 5, 19996 ;" + " sldi5, 5, 16
CVE-2019-13648: Linux kernel: powerpc: kernel crash in TM handling triggerable by any local user
The Linux kernel for powerpc since v3.9 has a bug in the TM handling where any unprivileged local user may crash the operating system. This bug affects machines using 64-bit CPUs where Transactional Memory (TM) is not present or has been disabled (see below for more details on affected CPUs). To trigger the bug a process constructs a signal context which still has the MSR TS bits set. That process then passes this signal context to the sigreturn() system call. When returning back to userspace, the kernel then crashes with a bad TM transition (TM Bad Thing) or by executing TM code on a non-TM system. All 64bit machines where TM is not present are affected. This includes PowerPC 970 (G5), PA6T, POWER5/6/7 VMs under KVM or LPARs under PowerVM and POWER9 bare metal. Additionally systems with TM hardware but where TM is disabled in software (via ppc_tm=off kernel cmdline) are also affected. This includes POWER8/9 VMs under KVM or LPARs under PowerVM and POWER8 bare metal. The bug was introduced in commit: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Which was originally merged in v3.9. The upstream fix is here: https://git.kernel.org/torvalds/c/f16d80b75a096c52354c6e0a574993f3b0dfbdfe The fix can be verified by running `sigfuz -m` from the kernel selftests: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/powerpc/signal/sigfuz.c?h=v5.2 cheers Mikey
[PATCH] powerpc/tm: Fix oops on sigreturn on systems without TM
On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: [ 74.980557] Unexpected TM Bad Thing exception at c00022fc (msr 0x800102a03031) tm_scratch=80020280f033 [ 74.980741] Oops: Unrecoverable exception, sig: 6 [#1] [ 74.980820] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 74.980917] Modules linked in: [ 74.980980] CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69 [ 74.981096] NIP: c00022fc LR: 7fffb2d67e48 CTR: [ 74.981212] REGS: c0003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) [ 74.981325] MSR: 800102a03031 CR: 42004242 XER: [ 74.981463] CFAR: c00022e0 IRQMASK: 0 [ 74.981463] GPR00: 0072 7fffb2b6e560 7fffb2d87f00 0669 [ 74.981463] GPR04: 7fffb2b6e728 7fffb2b6f2a8 [ 74.981463] GPR08: [ 74.981463] GPR12: 7fffb2b76900 [ 74.981463] GPR16: 7fffb237 7fffb2d84390 7fffea3a15ac 01000a250420 [ 74.981463] GPR20: 7fffb2b6f260 10001770 [ 74.981463] GPR24: 7fffb2d843a0 7fffea3a14a0 0001 0080 [ 74.981463] GPR28: 7fffea3a14d8 003d0f00 7fffb2b6e728 [ 74.982420] NIP [c00022fc] rfi_flush_fallback+0x7c/0x80 [ 74.982517] LR [7fffb2d67e48] 0x7fffb2d67e48 [ 74.982593] Call Trace: [ 74.982632] Instruction dump: [ 74.982691] e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 [ 74.982809] e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c24> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is on. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. 
Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context") Cc: sta...@vger.kernel.org # v3.9 Reported-by: Praveen Pandey Signed-off-by: Michael Neuling --- arch/powerpc/kernel/signal_32.c | 3 +++ arch/powerpc/kernel/signal_64.c | 5 + 2 files changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index f50b708d6d..98600b276f 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1198,6 +1198,9 @@ SYSCALL_DEFINE0(rt_sigreturn) goto bad; if (MSR_TM_ACTIVE(msr_hi<<32)) { + /* Trying to start TM on non TM system */ + if (!cpu_has_feature(CPU_FTR_TM)) + goto bad; /* We only recheckpoint on return if we're * transaction. */ diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 2f80e270c7..117515564e 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -771,6 +771,11 @@ SYSCALL_DEFINE0(rt_sigreturn) if (MSR_TM_ACTIVE(msr)) { /* We recheckpoint on return. */ struct ucontext __user *uc_transact; + + /* Trying to start TM on non TM system */ + if (!cpu_has_feature(CPU_FTR_TM)) + goto badframe; + if (__get_user(uc_transact, &uc->uc_link)) goto badframe; if (restore_tm_sigcontexts(current, &uc->uc_mcontext, -- 2.21.0
Re: [PATCH] KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
On Mon, 2019-06-24 at 21:48 +1000, Michael Ellerman wrote: > Michael Neuling writes: > > When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The > > code currently sets: > > CR0 <- 00 || MSR[TS] > > but according to the ISA it should be: > > CR0 <- 0 || MSR[TS] || 0 > > Seems bad, what's the worst case impact? It's a data integrity issue as CR0 is corrupted. > Do we have a test case for this? Suraj has a KVM unit test for it. > > This fixes the bit shift to put the bits in the correct location. > > Fixes: ? It's been around since we first wrote the code so: Fixes: 4bb3c7a0208fc13c ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Mikey
[PATCH] KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
When emulating tsr, treclaim and trechkpt, we incorrectly set CR0. The code currently sets: CR0 <- 00 || MSR[TS] but according to the ISA it should be: CR0 <- 0 || MSR[TS] || 0 This fixes the bit shift to put the bits in the correct location. Tested-by: Suraj Jitindar Singh Signed-off-by: Michael Neuling --- arch/powerpc/kvm/book3s_hv_tm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c index 888e2609e3..31cd0f327c 100644 --- a/arch/powerpc/kvm/book3s_hv_tm.c +++ b/arch/powerpc/kvm/book3s_hv_tm.c @@ -131,7 +131,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu) } /* Set CR0 to indicate previous transactional state */ vcpu->arch.regs.ccr = (vcpu->arch.regs.ccr & 0x0fffffff) | - (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28); + (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 29); /* L=1 => tresume, L=0 => tsuspend */ if (instr & (1 << 21)) { if (MSR_TM_SUSPENDED(msr)) @@ -175,7 +175,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu) /* Set CR0 to indicate previous transactional state */ vcpu->arch.regs.ccr = (vcpu->arch.regs.ccr & 0x0fffffff) | - (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28); + (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 29); vcpu->arch.shregs.msr &= ~MSR_TS_MASK; return RESUME_GUEST; @@ -205,7 +205,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu) /* Set CR0 to indicate previous transactional state */ vcpu->arch.regs.ccr = (vcpu->arch.regs.ccr & 0x0fffffff) | - (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 28); + (((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 29); vcpu->arch.shregs.msr = msr | MSR_TS_S; return RESUME_GUEST; } -- 2.21.0
Re: [PATCH v5 2/2] powerpc: Fix compile issue with force DAWR
On Tue, 2019-06-18 at 18:28 +0200, Christophe Leroy wrote: > > Le 04/06/2019 à 05:00, Michael Neuling a écrit : > > If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail > > at linking with: > >arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined > > reference to `dawr_force_enable' > > > > This was caused by commit c1fe190c0672 ("powerpc: Add force enable of > > DAWR on P9 option"). > > > > This moves a bunch of code around to fix this. It moves a lot of the > > DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable > > compiling it. > > After looking at all this once more, I'm just wondering: why are we > creating stuff specific to DAWR ? > > In the old days, we only add DABR, and everything was named on DABR. > When DAWR was introduced some years ago we renamed stuff like do_dabr() > to do_break() so that we could regroup things together. And now we are > taking dawr() out of the rest. Why not keep dabr() stuff and dawr() > stuff all together in something dedicated to breakpoints, and try to > regroup all breakpoint stuff in a single place ? I see some > breakpointing stuff done in kernel/process.c and other things done in > hw_breakpoint.c, to common functions call from one file to the other, > preventing GCC to fully optimise, etc ... > > Also, behing this thinking, I have the idea that we could easily > implement 512 bytes breakpoints on the 8xx too. The 8xx have neither > DABR nor DAWR, but is using a set of comparators. And as you can see in > the 8xx version of __set_dabr() in kernel/process.c, we emulate the DABR > behaviour by setting two comparators. By using the same comparators with > a different setup, we should be able to implement breakpoints on larger > ranges of address. Christophe, I agree that there are opportunities to refactor this code and I appreciate your efforts in making this code better but... 
We have a problem here of not being able to compile an odd ball case that almost no one ever hits (it was just an odd mpe CI case). We're up to v5 of a simple fix which is just silly. So let's get this fix in and move on to the whole bunch of refactoring we can do in this code which is already documented in the github issue tracking. Regards, Mikey
Re: [PATCH 5/5] Powerpc/Watchpoint: Fix length calculation for unaligned target
On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote: > Watchpoint match range is always doubleword(8 bytes) aligned on > powerpc. If the given range is crossing doubleword boundary, we > need to increase the length such that next doubleword also get > covered. Ex, > > address len = 6 bytes > [ASCII diagram: a 6-byte range starting mid-doubleword spills into the next 8-byte doubleword] > > In such case, current code configures hw as: > start_addr = address & ~HW_BREAKPOINT_ALIGN > len = 8 bytes > > And thus read/write in last 4 bytes of the given range is ignored. > Fix this by including next doubleword in the length. Watchpoint > exception handler already ignores extraneous exceptions, so no > changes required for that. Nice catch. Thanks. I assume this has been broken forever? Should we be CCing stable? If so, it would be nice to have this self-contained (separate from the refactor) so we can more easily backport it. Also, can you update tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c to catch this issue? A couple more comments below. 
> > Signed-off-by: Ravi Bangoria > --- > arch/powerpc/include/asm/hw_breakpoint.h | 7 ++-- > arch/powerpc/kernel/hw_breakpoint.c | 44 +--- > arch/powerpc/kernel/process.c| 34 -- > 3 files changed, 60 insertions(+), 25 deletions(-) > > diff --git a/arch/powerpc/include/asm/hw_breakpoint.h > b/arch/powerpc/include/asm/hw_breakpoint.h > index 8acbbdd4a2d5..749a357164d5 100644 > --- a/arch/powerpc/include/asm/hw_breakpoint.h > +++ b/arch/powerpc/include/asm/hw_breakpoint.h > @@ -34,6 +34,8 @@ struct arch_hw_breakpoint { > #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ >HW_BRK_TYPE_HYP) > > +#define HW_BREAKPOINT_ALIGN 0x7 > + > #ifdef CONFIG_HAVE_HW_BREAKPOINT > #include > #include > @@ -45,8 +47,6 @@ struct pmu; > struct perf_sample_data; > struct task_struct; > > -#define HW_BREAKPOINT_ALIGN 0x7 > - > extern int hw_breakpoint_slots(int type); > extern int arch_bp_generic_fields(int type, int *gen_bp_type); > extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); > @@ -76,7 +76,8 @@ static inline void hw_breakpoint_disable(void) > } > extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); > int hw_breakpoint_handler(struct die_args *args); > - > +extern u16 hw_breakpoint_get_final_len(struct arch_hw_breakpoint *brk, > + unsigned long *start_addr, unsigned long *end_addr); > extern int set_dawr(struct arch_hw_breakpoint *brk); > extern bool dawr_force_enable; > static inline bool dawr_enabled(void) > diff --git a/arch/powerpc/kernel/hw_breakpoint.c > b/arch/powerpc/kernel/hw_breakpoint.c > index 36bcf705df65..c122fd55aa44 100644 > --- a/arch/powerpc/kernel/hw_breakpoint.c > +++ b/arch/powerpc/kernel/hw_breakpoint.c > @@ -126,6 +126,28 @@ int arch_bp_generic_fields(int type, int *gen_bp_type) > return 0; > } > > +/* Maximum len for DABR is 8 bytes and DAWR is 512 bytes */ > +static int hw_breakpoint_validate_len(struct arch_hw_breakpoint *hw) > +{ > + u16 length_max = 8; > + u16 final_len; > + unsigned long 
start_addr, end_addr; > + > + final_len = hw_breakpoint_get_final_len(hw, &start_addr, &end_addr); > + > + if (dawr_enabled()) { > + length_max = 512; > + /* DAWR region can't cross 512 bytes boundary */ > + if ((start_addr >> 9) != (end_addr >> 9)) > + return -EINVAL; > + } > + > + if (final_len > length_max) > + return -EINVAL; > + > + return 0; > +} > + > /* > * Validate the arch-specific HW Breakpoint register settings > */ > @@ -133,12 +155,10 @@ int hw_breakpoint_arch_parse(struct perf_event *bp, >const struct perf_event_attr *attr, >struct arch_hw_breakpoint *hw) > { > - int length_max; > - > if (!ppc_breakpoint_available()) > return -ENODEV; > > - if (!bp) > + if (!bp || !attr->bp_len) > return -EINVAL; > > hw->type = HW_BRK_TYPE_TRANSLATE; > @@ -160,23 +180,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp, > hw->address = attr->bp_addr; > hw->len = attr->bp_len; > > - length_max = 8; /* DABR */ > - if (dawr_enabled()) { > - length_max = 512 ; /* 64 doublewords */ > - /* DAWR region can't cross 512 bytes boundary */ > - if ((hw->address >> 9) != ((hw->address + hw->len - 1) >> 9)) > - return -EINVAL; > - } > - > - /* > - * Since breakpoint length can be a maximum of length_max and > - * breakpoint addresses are aligned to nearest double-word > - * HW_BREAKPOINT_ALIGN by rounding off to the lower address, > - * the 'symbolsize' should satisfy the check
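The validation logic in the patch above boils down to two checks: after double-word alignment, the effective length must not exceed 8 bytes (DABR) or 512 bytes (DAWR), and a DAWR range must not cross a 512-byte boundary. A stand-alone sketch of those checks, outside the kernel, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch (not the kernel code) of the checks in the quoted
 * hw_breakpoint_validate_len(): align the requested range to double
 * words (HW_BREAKPOINT_ALIGN is 0x7), then enforce the 8-byte DABR /
 * 512-byte DAWR limits and the DAWR 512-byte-boundary restriction.
 */
static bool bp_len_ok(unsigned long addr, unsigned long len, bool dawr)
{
    unsigned long start = addr & ~0x7UL;            /* round down */
    unsigned long end = (addr + len - 1) | 0x7UL;   /* round up   */
    unsigned long final_len = end - start + 1;
    unsigned long length_max = dawr ? 512 : 8;

    /* DAWR region can't cross a 512-byte boundary */
    if (dawr && (start >> 9) != (end >> 9))
        return false;

    return final_len <= length_max;
}
```

Note how an unaligned 8-byte target needs a 16-byte aligned range, which is exactly the case the series fixes for DABR.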
Re: [PATCH 0/5] Powerpc/hw-breakpoint: Fixes plus Code refactor
On Tue, 2019-06-18 at 08:01 +0200, Christophe Leroy wrote: > > Le 18/06/2019 à 06:27, Ravi Bangoria a écrit : > > patch 1-3: Code refactor > > patch 4: Speedup disabling breakpoint > > patch 5: Fix length calculation for unaligned targets > > While you are playing with hw breakpoints, did you have a look at > https://github.com/linuxppc/issues/issues/38 ? Agreed and also: https://github.com/linuxppc/issues/issues/170 https://github.com/linuxppc/issues/issues/128 Mikey
Re: [PATCH 4/5] Powerpc/hw-breakpoint: Optimize disable path
On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote: > Directly setting dawr and dawrx with 0 should be enough to > disable watchpoint. No need to reset individual bits in > variable and then set in hw. This seems like a pointless optimisation to me. I'm all for adding more code/complexity if it buys us some performance, but I can't imagine this is a fast path (nor have you stated any performance benefits). Mikey > > Signed-off-by: Ravi Bangoria > --- > arch/powerpc/include/asm/hw_breakpoint.h | 3 ++- > arch/powerpc/kernel/process.c| 12 > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/hw_breakpoint.h > b/arch/powerpc/include/asm/hw_breakpoint.h > index 78202d5fb13a..8acbbdd4a2d5 100644 > --- a/arch/powerpc/include/asm/hw_breakpoint.h > +++ b/arch/powerpc/include/asm/hw_breakpoint.h > @@ -19,6 +19,7 @@ struct arch_hw_breakpoint { > /* Note: Don't change the the first 6 bits below as they are in the same > order > * as the dabr and dabrx. > */ > +#define HW_BRK_TYPE_DISABLE 0x00 > #define HW_BRK_TYPE_READ 0x01 > #define HW_BRK_TYPE_WRITE0x02 > #define HW_BRK_TYPE_TRANSLATE0x04 > @@ -68,7 +69,7 @@ static inline void hw_breakpoint_disable(void) > struct arch_hw_breakpoint brk; > > brk.address = 0; > - brk.type = 0; > + brk.type = HW_BRK_TYPE_DISABLE; > brk.len = 0; > if (ppc_breakpoint_available()) > __set_breakpoint(&brk); > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index f002d286..265fac9fb3a4 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -793,10 +793,22 @@ static inline int set_dabr(struct arch_hw_breakpoint > *brk) > return __set_dabr(dabr, dabrx); > } > > +static int disable_dawr(void) > +{ > + if (ppc_md.set_dawr) > + return ppc_md.set_dawr(0, 0); > + > + mtspr(SPRN_DAWRX, 0); > + return 0; > +} > + > int set_dawr(struct arch_hw_breakpoint *brk) > { > unsigned long dawr, dawrx, mrd; > > + if (brk->type == HW_BRK_TYPE_DISABLE) > + return
disable_dawr(); > + > dawr = brk->address; > > dawrx = (brk->type & HW_BRK_TYPE_RDWR) << (63 - 58);
Re: [PATCH 3/5] Powerpc/hw-breakpoint: Refactor set_dawr()
This is going to collide with this patch https://patchwork.ozlabs.org/patch/1109594/ Mikey On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote: > Remove unnecessary comments. Code itself is self explanatory. > And, ISA already talks about MRD field. I Don't think we need > to re-describe it. > > Signed-off-by: Ravi Bangoria > --- > arch/powerpc/kernel/process.c | 17 + > 1 file changed, 5 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index f0fbbf6a6a1f..f002d286 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -799,18 +799,11 @@ int set_dawr(struct arch_hw_breakpoint *brk) > > dawr = brk->address; > > - dawrx = (brk->type & (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)) \ > -<< (63 - 58); //* read/write bits */ > - dawrx |= ((brk->type & (HW_BRK_TYPE_TRANSLATE)) >> 2) \ > -<< (63 - 59); //* translate */ > - dawrx |= (brk->type & (HW_BRK_TYPE_PRIV_ALL)) \ > ->> 3; //* PRIM bits */ > - /* dawr length is stored in field MDR bits 48:53. Matches range in > -doublewords (64 bits) baised by -1 eg. 0b00=1DW and > -0b11=64DW. > -brk->len is in bytes. > -This aligns up to double word size, shifts and does the bias. > - */ > + dawrx = (brk->type & HW_BRK_TYPE_RDWR) << (63 - 58); > + dawrx |= ((brk->type & HW_BRK_TYPE_TRANSLATE) >> 2) << (63 - 59); > + dawrx |= (brk->type & HW_BRK_TYPE_PRIV_ALL) >> 3; > + > + /* brk->len is in bytes. */ > mrd = ((brk->len + 7) >> 3) - 1; > dawrx |= (mrd & 0x3f) << (63 - 53); >
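The arithmetic being refactored above packs the breakpoint type and length into the DAWRX control word; the MRD field in particular is easy to get wrong, since it holds (range in doublewords) minus one. A stand-alone sketch of just that packing, with the shift constants copied from the quoted patch (the PRIV_ALL value is assumed from the kernel header's USER|KERNEL|HYP bits, which are not shown in this mail):

```c
#include <assert.h>

/* Type bits as in the quoted arch/powerpc header; PRIV_ALL assumed */
#define HW_BRK_TYPE_READ      0x01
#define HW_BRK_TYPE_WRITE     0x02
#define HW_BRK_TYPE_TRANSLATE 0x04
#define HW_BRK_TYPE_RDWR      (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)
#define HW_BRK_TYPE_PRIV_ALL  0x38

/* MRD (DAWRX bits 48:53) holds the match range in doublewords, biased
 * by -1: 0b000000 = 1 doubleword ... 0b111111 = 64 doublewords. */
static unsigned long mrd_from_len(unsigned long len)
{
    return ((len + 7) >> 3) - 1;  /* round up to doublewords, then bias */
}

/* Same packing as the refactored set_dawr() in the quoted patch */
static unsigned long pack_dawrx(int type, unsigned long len)
{
    unsigned long dawrx;

    dawrx  = (unsigned long)(type & HW_BRK_TYPE_RDWR) << (63 - 58);
    dawrx |= (unsigned long)((type & HW_BRK_TYPE_TRANSLATE) >> 2) << (63 - 59);
    dawrx |= (unsigned long)(type & HW_BRK_TYPE_PRIV_ALL) >> 3;
    dawrx |= (mrd_from_len(len) & 0x3f) << (63 - 53);
    return dawrx;
}
```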
Re: [PATCH 1/5] Powerpc/hw-breakpoint: Replace stale do_dabr() with do_break()
> Subject: Powerpc/hw-breakpoint: Replace stale do_dabr() with do_break() Can you add the word "comment" to this subject. Currently it implies there are code changes here. Mikey On Tue, 2019-06-18 at 09:57 +0530, Ravi Bangoria wrote: > do_dabr() was renamed with do_break() long ago. But I still see > some comments mentioning do_dabr(). Replace it. > > Signed-off-by: Ravi Bangoria > --- > arch/powerpc/kernel/hw_breakpoint.c | 2 +- > arch/powerpc/kernel/ptrace.c| 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/hw_breakpoint.c > b/arch/powerpc/kernel/hw_breakpoint.c > index a293a53b4365..1908e4fcc132 100644 > --- a/arch/powerpc/kernel/hw_breakpoint.c > +++ b/arch/powerpc/kernel/hw_breakpoint.c > @@ -232,7 +232,7 @@ int hw_breakpoint_handler(struct die_args *args) >* Return early after invoking user-callback function without restoring >* DABR if the breakpoint is from ptrace which always operates in >* one-shot mode. The ptrace-ed process will receive the SIGTRAP signal > - * generated in do_dabr(). > + * generated in do_break(). >*/ > if (bp->overflow_handler == ptrace_triggered) { > perf_bp_event(bp, regs); > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c > index 684b0b315c32..44b823e5e8c8 100644 > --- a/arch/powerpc/kernel/ptrace.c > +++ b/arch/powerpc/kernel/ptrace.c > @@ -2373,7 +2373,7 @@ void ptrace_triggered(struct perf_event *bp, > /* >* Disable the breakpoint request here since ptrace has defined a >* one-shot behaviour for breakpoint exceptions in PPC64. > - * The SIGTRAP signal is generated automatically for us in do_dabr(). > + * The SIGTRAP signal is generated automatically for us in do_break(). >* We don't have to do anything about that here >*/ > attr = bp->attr;
Re: [PATCH] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
> > > 3: > > > /* Emulate H_SET_DABR/X on P8 for the sake of compat mode > > > guests */ > > > rlwimi r5, r4, 5, DAWRX_DR | DAWRX_DW > > > c010b03c: 74 2e 85 50 rlwimi r5,r4,5,25,26 > > > rlwimi r5, r4, 2, DAWRX_WT > > > c010b040: f6 16 85 50 rlwimi r5,r4,2,27,27 > > > clrrdi r4, r4, 3 > > > c010b044: 24 07 84 78 rldicr r4,r4,0,60 > > > std r4, VCPU_DAWR(r3) > > > c010b048: c0 13 83 f8 std r4,5056(r3) > > > std r5, VCPU_DAWRX(r3) > > > c010b04c: c8 13 a3 f8 std r5,5064(r3) > > > mtspr SPRN_DAWR, r4 > > > c010b050: a6 2b 94 7c mtspr 180,r4 > > > mtspr SPRN_DAWRX, r5 > > > c010b054: a6 2b bc 7c mtspr 188,r5 > > > li r3, 0 > > > c010b058: 00 00 60 38 li r3,0 > > > blr > > > c010b05c: 20 00 80 4e blr > > > > It's the `mtspr SPRN_DAWR, r4` as you're HV=0. I'm not sure how > > nested works > > in that regard. Is the level above suppose to trap and emulate > > that? > > > > Yeah so as a nested hypervisor we need to avoid that call to mtspr > SPRN_DAWR since it's HV privileged and we run with HV = 0. > > The fix will be to check kvmhv_on_pseries() before doing the write. In > fact we should avoid the write any time we call the function from _not_ > real mode. > > I'll submit a fix for the KVM side. Doesn't look like this is anything > to do with Mikey's patch, was always broken as far as I can tell. Thanks Suraj. Mikey
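The fix sketched in this reply amounts to guarding the HV-privileged `mtspr SPRN_DAWR` behind a "running nested?" check: a nested hypervisor (HV=0) only records the guest's value and lets the level above apply it. Roughly, with all state mocked and all names illustrative rather than the actual KVM code:

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked state: the guest's software copy of DAWR vs. the real SPR. */
struct dawr_state {
    unsigned long soft;   /* vcpu->arch.dawr-style software copy */
    unsigned long hw;     /* stands in for the SPRN_DAWR register */
};

/*
 * mtspr SPRN_DAWR is HV-privileged. A nested hypervisor runs with
 * HV=0, so it must only update its software copy; writing the SPR
 * directly is what crashed here.
 */
static void set_guest_dawr(struct dawr_state *s, unsigned long val,
                           bool nested_on_pseries)
{
    s->soft = val;               /* always keep the software copy */
    if (!nested_on_pseries)
        s->hw = val;             /* the real mtspr, only when HV=1 */
}
```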
[PATCH v2] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
Commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") screwed up some assembler and corrupted a pointer in r3. This resulted in crashes like the below: [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf [ 44.374848] Faulting instruction address: 0xc010b044 [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3 [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: c010aff4 [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) [ 44.375691] MSR: 8280b033 CR: 42244842 XER: [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 IRQMASK: 0 [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 [ 44.375815] GPR04: 0003 2444b05d c017f11c45d0 [ 44.375815] GPR08: 07803e018dfe 0028 0001 0075 [ 44.375815] GPR12: c010aff4 c7ff6300 [ 44.375815] GPR16: c017f11d c017f11ca7a8 [ 44.375815] GPR20: c017f11c42ec 000a [ 44.375815] GPR24: fffc c017f11c c1a77ed8 [ 44.375815] GPR28: c0179af7 fffc c008089ff170 c0179ae88540 [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] [ 44.376849] Call Trace: [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c (unreliable) [ 44.376982] [c0179b397a10] [c008089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] [ 44.377084] [c0179b397ae0] [c008093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 44.377185] [c0179b397b00] [c008093f522c] 
kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [ 44.377286] [c0179b397b90] [c008093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm] [ 44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40 [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 [ 44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70 [ 44.377712] Instruction dump: [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 3860 [ 44.377862] 4d820020 50852e74 508516f6 78840724 f8a313c8 7c942ba6 7cbc2ba6 Fix the bug by only changing r3 when we are returning immediately. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling Reported-by: Cédric Le Goater -- mpe: This is for 5.2 fixes v2: Review from Christophe Leroy - De-Mikey/Cedric-ify commit message - Add "Fixes:" - Other trivial commit messages changes - No code change --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 139027c62d..f781ee1458 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) LOAD_REG_ADDR(r11, dawr_force_enable) lbz r11, 0(r11) cmpdi r11, 0 + bne 3f li r3, H_HARDWARE - beqlr + blr +3: /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */ rlwimi r5, r4, 5, DAWRX_DR | DAWRX_DW rlwimi r5, r4, 2, DAWRX_WT -- 2.21.0
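The underlying bug this patch fixes is a classic conditional-return hazard: the original sequence loaded the error code into r3 *before* the conditional return (`beqlr`), so the fall-through path continued with r3 (the vcpu pointer) already clobbered. In C terms, the broken and fixed control flow look roughly like this (a sketch of the shape of the bug, not the real KVM code):

```c
#include <assert.h>

#define H_HARDWARE (-1L)

/* Broken shape: r3 is clobbered before we know whether we return,
 * so the fall-through path loses the vcpu pointer. */
static long h_set_dabr_broken(long vcpu_r3, int dawr_force_enable)
{
    long r3 = vcpu_r3;

    r3 = H_HARDWARE;                /* li r3, H_HARDWARE -- clobbers r3 */
    if (dawr_force_enable == 0)     /* beqlr: return only if disabled */
        return r3;
    return r3;                      /* fall through: r3 is no longer vcpu */
}

/* Fixed shape: branch around the error path; touch r3 only on the
 * path that actually returns. */
static long h_set_dabr_fixed(long vcpu_r3, int dawr_force_enable)
{
    long r3 = vcpu_r3;

    if (dawr_force_enable == 0) {   /* bne 3f */
        r3 = H_HARDWARE;            /* li r3, H_HARDWARE */
        return r3;                  /* blr */
    }
    return r3;                      /* 3: continue with the vcpu intact */
}
```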
Re: [PATCH] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
On Wed, 2019-06-12 at 09:43 +0200, Cédric Le Goater wrote: > On 12/06/2019 09:22, Michael Neuling wrote: > > In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 > > option") I screwed up some assembler and corrupted a pointer in > > r3. This resulted in crashes like the below from Cédric: > > > > [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf > > [ 44.374848] Faulting instruction address: 0xc010b044 > > [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] > > [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > > pSeries > > [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM > > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack > > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp > > bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables > > iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv > > kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover > > virtio_scsi failover > > [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not > > tainted 5.2.0-rc4+ #3 > > [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: > > c010aff4 > > [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted > > (5.2.0-rc4+) > > [ 44.375691] MSR: 8280b033 > > CR: 42244842 XER: > > [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: > > 4200 IRQMASK: 0 > > [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 > > > > [ 44.375815] GPR04: 0003 2444b05d > > c017f11c45d0 > > [ 44.375815] GPR08: 07803e018dfe 0028 0001 > > 0075 > > [ 44.375815] GPR12: c010aff4 c7ff6300 > > > > [ 44.375815] GPR16: c017f11d > > c017f11ca7a8 > > [ 44.375815] GPR20: c017f11c42ec > > 000a > > [ 44.375815] GPR24: fffc c017f11c > > c1a77ed8 > > [ 44.375815] GPR28: c0179af7 fffc c008089ff170 > > c0179ae88540 > > [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 > > [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 > > [kvm_hv] > > 
[ 44.376849] Call Trace: > > [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c > > (unreliable) > > [ 44.376982] [c0179b397a10] [c008089dd6bc] > > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] > > [ 44.377084] [c0179b397ae0] [c008093f8bcc] > > kvmppc_vcpu_run+0x34/0x48 [kvm] > > [ 44.377185] [c0179b397b00] [c008093f522c] > > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] > > [ 44.377286] [c0179b397b90] [c008093e3618] > > kvm_vcpu_ioctl+0x460/0x850 [kvm] > > [ 44.377384] [c0179b397d00] [c04ba6c4] > > do_vfs_ioctl+0xe4/0xb40 > > [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 > > [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 > > [ 44.377628] [c0179b397e20] [c0000000b888] system_call+0x5c/0x70 > > [ 44.377712] Instruction dump: > > [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b > > 2c2b 3860 > > [ 44.377862] 4d820020 50852e74 508516f6 78840724 f8a313c8 > > 7c942ba6 7cbc2ba6 > > > > This fixes the problem by only changing r3 when we are returning > > immediately. > > > > Signed-off-by: Michael Neuling > > Reported-by: Cédric Le Goater > > On nested, I still see : > > [ 94.609274] Oops: Exception in kernel mode, sig: 4 [#1] > [ 94.609432] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > pSeries > [ 94.609596] Modules linked in: vhost_net vhost tap xt_CHECKSUM > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp > bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables > iptable_filter bpfilter vmx_crypto kvm_hv crct10dif_vpmsum crc32c_vpmsum kvm > sch_fq_codel ip_tables x_tables autofs4 virtio_net virtio_scsi net_failover > failover > [ 94.610179] CPU: 12 PID: 2026 Comm: qemu-system-ppc Kdump: loaded No
[PATCH] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") I screwed up some assembler and corrupted a pointer in r3. This resulted in crashes like the below from Cédric: [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf [ 44.374848] Faulting instruction address: 0xc010b044 [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3 [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: c010aff4 [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) [ 44.375691] MSR: 8280b033 CR: 42244842 XER: [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 IRQMASK: 0 [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 [ 44.375815] GPR04: 0003 2444b05d c017f11c45d0 [ 44.375815] GPR08: 07803e018dfe 0028 0001 0075 [ 44.375815] GPR12: c010aff4 c7ff6300 [ 44.375815] GPR16: c017f11d c017f11ca7a8 [ 44.375815] GPR20: c017f11c42ec 000a [ 44.375815] GPR24: fffc c017f11c c1a77ed8 [ 44.375815] GPR28: c0179af7 fffc c008089ff170 c0179ae88540 [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] [ 44.376849] Call Trace: [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c (unreliable) [ 44.376982] [c0179b397a10] [c008089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] [ 44.377084] [c0179b397ae0] [c008093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 44.377185] [c0179b397b00] [c008093f522c] 
kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [ 44.377286] [c0179b397b90] [c008093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm] [ 44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40 [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 [ 44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70 [ 44.377712] Instruction dump: [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 3860 [ 44.377862] 4d820020 50852e74 508516f6 78840724 f8a313c8 7c942ba6 7cbc2ba6 This fixes the problem by only changing r3 when we are returning immediately. Signed-off-by: Michael Neuling Reported-by: Cédric Le Goater -- mpe: This is for 5.2 fixes --- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 139027c62d..f781ee1458 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) LOAD_REG_ADDR(r11, dawr_force_enable) lbz r11, 0(r11) cmpdi r11, 0 + bne 3f li r3, H_HARDWARE - beqlr + blr +3: /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */ rlwimi r5, r4, 5, DAWRX_DR | DAWRX_DW rlwimi r5, r4, 2, DAWRX_WT -- 2.21.0
Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
On Tue, 2019-06-11 at 09:51 +0200, Christophe Leroy wrote: > > Le 11/06/2019 à 09:24, Michael Neuling a écrit : > > On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote: > > > On 11/06/2019 08:44, Michael Neuling wrote: > > > > > > 2: > > > > > > -BEGIN_FTR_SECTION > > > > > > - /* POWER9 with disabled DAWR */ > > > > > > + LOAD_REG_ADDR(r11, dawr_force_enable) > > > > > > + lbz r11, 0(r11) > > > > > > + cmpdi r11, 0 > > > > > > li r3, H_HARDWARE > > > > > > - blr > > > > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR) > > > > > > + beqlr > > > > > > > > > > Why is this a 'beqlr' ? Shouldn't it be a blr ? > > > > > > > > I believe it's right and should be a beqlr. It's to replace the FTR > > > > section to > > > > make it dynamic based on the dawr_force_enable bit. > > > > > > hmm, see the crash below on a L1 running a nested guest. r3 is set > > > to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix > > > this ? > > > > > > C. > > > > > > > > > [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf > > > [ 44.374848] Faulting instruction address: 0xc010b044 > > > [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] > > > [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > > > pSeries > > > [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM > > > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack > > > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 > > > xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter > > > ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum > > > crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 > > > virtio_net net_failover virtio_scsi failover > > > [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not > > > tainted 5.2.0-rc4+ #3 > > > [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: > > > c010aff4 > > > [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted (5.2.0- > > > rc4+) > > > [ 
44.375691] MSR: 8280b033 > > > CR: 42244842 XER: > > > [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: > > > 4200 IRQMASK: 0 > > > [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 > > > > > > [ 44.375815] GPR04: 0003 2444b05d > > > c017f11c45d0 > > > [ 44.375815] GPR08: 07803e018dfe 0028 0001 > > > 0075 > > > [ 44.375815] GPR12: c010aff4 c7ff6300 > > > > > > [ 44.375815] GPR16: c017f11d > > > c017f11ca7a8 > > > [ 44.375815] GPR20: c017f11c42ec > > > 000a > > > [ 44.375815] GPR24: fffc c017f11c > > > c1a77ed8 > > > [ 44.375815] GPR28: c0179af7 fffc c008089ff170 > > > c0179ae88540 > > > [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 > > > [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 > > > [kvm_hv] > > > [ 44.376849] Call Trace: > > > [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c > > > (unreliable) > > > [ 44.376982] [c0179b397a10] [c008089dd6bc] > > > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] > > > [ 44.377084] [c0179b397ae0] [c008093f8bcc] > > > kvmppc_vcpu_run+0x34/0x48 [kvm] > > > [ 44.377185] [c0179b397b00] [c008093f522c] > > > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] > > > [ 44.377286] [c0179b397b90] [c008093e3618] > > > kvm_vcpu_ioctl+0x460/0x850 [kvm] > > > [ 44.377384] [c0179b397d00] [c04ba6c4] > > > do_vfs_ioctl+0xe4/0xb40 > > > [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 > > > [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 > > > [ 44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70 > > > [ 44.3777
Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
On Tue, 2019-06-11 at 08:48 +0200, Cédric Le Goater wrote: > On 11/06/2019 08:44, Michael Neuling wrote: > > > > 2: > > > > -BEGIN_FTR_SECTION > > > > - /* POWER9 with disabled DAWR */ > > > > + LOAD_REG_ADDR(r11, dawr_force_enable) > > > > + lbz r11, 0(r11) > > > > + cmpdi r11, 0 > > > > li r3, H_HARDWARE > > > > - blr > > > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR) > > > > + beqlr > > > > > > Why is this a 'beqlr' ? Shouldn't it be a blr ? > > > > I believe it's right and should be a beqlr. It's to replace the FTR > > section to > > make it dynamic based on the dawr_force_enable bit. > > hmm, see the crash below on a L1 running a nested guest. r3 is set > to -1 (H_HARDWARE) but a vpcu pointer was expected. How can we fix > this ? > > C. > > > [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf > [ 44.374848] Faulting instruction address: 0xc010b044 > [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] > [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA > pSeries > [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack > nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp > bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables > iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm > sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi > failover > [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not > tainted 5.2.0-rc4+ #3 > [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: > c010aff4 > [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) > [ 44.375691] MSR: 8280b033 CR: > 42244842 XER: > [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 > IRQMASK: 0 > [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 > > [ 44.375815] GPR04: 0003 2444b05d > c017f11c45d0 > [ 44.375815] GPR08: 07803e018dfe 0028 0001 > 0075 > [ 44.375815] GPR12: 
c010aff4 c7ff6300 > > [ 44.375815] GPR16: c017f11d > c017f11ca7a8 > [ 44.375815] GPR20: c017f11c42ec > 000a > [ 44.375815] GPR24: fffc c017f11c > c1a77ed8 > [ 44.375815] GPR28: c0179af7 fffc c008089ff170 > c0179ae88540 > [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 > [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 > [kvm_hv] > [ 44.376849] Call Trace: > [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c > (unreliable) > [ 44.376982] [c0179b397a10] [c008089dd6bc] > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] > [ 44.377084] [c0179b397ae0] [c008093f8bcc] > kvmppc_vcpu_run+0x34/0x48 [kvm] > [ 44.377185] [c0179b397b00] [c008093f522c] > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] > [ 44.377286] [c0179b397b90] [c008093e3618] > kvm_vcpu_ioctl+0x460/0x850 [kvm] > [ 44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40 > [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 > [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 > [ 44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70 > [ 44.377712] Instruction dump: > [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b > 3860 > [ 44.377862] 4d820020 50852e74 508516f6 78840724 f8a313c8 > 7c942ba6 7cbc2ba6 Oops, it's because I corrupted r3 :-( Does this fix it? diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index 139027c62d..f781ee1458 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -2519,8 +2519,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) LOAD_REG_ADDR(r11, dawr_force_enable) lbz r11, 0(r11) cmpdi r11, 0 + bne 3f li r3, H_HARDWARE - beqlr + blr +3: /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */ rlwimi r5, r4, 5, DAWRX_DR | DAWRX_DW rlwimi r5, r4, 2, DAWRX_WT
Re: [PATCH v2] powerpc: Add force enable of DAWR on P9 option
> > 2: > > -BEGIN_FTR_SECTION > > - /* POWER9 with disabled DAWR */ > > + LOAD_REG_ADDR(r11, dawr_force_enable) > > + lbz r11, 0(r11) > > + cmpdi r11, 0 > > li r3, H_HARDWARE > > - blr > > -END_FTR_SECTION_IFCLR(CPU_FTR_DAWR) > > + beqlr > > Why is this a 'beqlr' ? Shouldn't it be a blr ? I believe it's right and should be a beqlr. It's to replace the FTR section to make it dynamic based on the dawr_force_enable bit. Mikey
Re: [PATCH] Powerpc/Watchpoint: Restore nvgprs while returning from exception
On Thu, 2019-06-06 at 12:59 +0530, Ravi Bangoria wrote: > Powerpc hw triggers watchpoint before executing the instruction. > To make trigger-after-execute behavior, kernel emulates the > instruction. If the instruction is 'load something into non-volatile > register', exception handler should restore emulated > register state while returning back, otherwise there will be > register state corruption. Ex, Adding a watchpoint on a list > can corrupt the list: > > # cat /proc/kallsyms | grep kthread_create_list > c121c8b8 d kthread_create_list > > Add watchpoint on kthread_create_list->next: > > # perf record -e mem:0xc121c8c0 > > Run some workload such that new kthread gets invoked. Ex, I > just logged out from console: > > list_add corruption. next->prev should be prev (c1214e00), \ > but was c121c8b8. (next=c121c8b8). > WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0 > CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69 > ... > NIP __list_add_valid+0xb4/0xc0 > LR __list_add_valid+0xb0/0xc0 > Call Trace: > __list_add_valid+0xb0/0xc0 (unreliable) > __kthread_create_on_node+0xe0/0x260 > kthread_create_on_node+0x34/0x50 > create_worker+0xe8/0x260 > worker_thread+0x444/0x560 > kthread+0x160/0x1a0 > ret_from_kernel_thread+0x5c/0x70 > > Signed-off-by: Ravi Bangoria How long has this been around? Should we be CCing stable? Mikey > --- > arch/powerpc/kernel/exceptions-64s.S | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 9481a11..96de0d1 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -1753,7 +1753,7 @@ handle_dabr_fault: > ld r5,_DSISR(r1) > addi r3,r1,STACK_FRAME_OVERHEAD > bl do_break > -12: b ret_from_except_lite > +12: b ret_from_except > > > #ifdef CONFIG_PPC_BOOK3S_64
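The corruption signature quoted above comes from the kernel's __list_add_valid() sanity check, which verifies the doubly-linked-list invariants before splicing in a new node. A simplified stand-alone version of that check (sketched from lib/list_debug.c) shows exactly why a stale, unrestored register holding a list pointer trips the "next->prev should be prev" warning:

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

/*
 * The invariants __list_add_valid() enforces before inserting 'node'
 * between 'prev' and 'next' (simplified). Emulating a watched load
 * without restoring the non-volatile register that holds 'next' is
 * precisely what makes these checks fire.
 */
static bool list_add_valid(struct list_head *node,
                           struct list_head *prev, struct list_head *next)
{
    return next->prev == prev &&   /* "next->prev should be prev" */
           prev->next == next &&
           node != prev && node != next;
}
```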
[PATCH v5 1/2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
From: Mathieu Malaterre In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") the following piece of code was added: smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0); Since GCC 8 this triggers the following warning about incompatible function types: arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type] Since the warning is there for a reason, and should not be hidden behind a cast, provide an intermediate callback function to avoid the warning. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Suggested-by: Christoph Hellwig Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Neuling --- arch/powerpc/kernel/hw_breakpoint.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index da307dd93e..ca3a2358b7 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -384,6 +384,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp) bool dawr_force_enable; EXPORT_SYMBOL_GPL(dawr_force_enable); +static void set_dawr_cb(void *info) +{ + set_dawr(info); +} + static ssize_t dawr_write_file_bool(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) @@ -403,7 +408,7 @@ static ssize_t dawr_write_file_bool(struct file *file, /* If we are clearing, make sure all CPUs have the DAWR cleared */ if (!dawr_force_enable) - smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0); + smp_call_function(set_dawr_cb, &null_brk, 0); return rc; } -- 2.21.0
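The fix above is the standard pattern for -Wcast-function-type: instead of casting int (*)(struct arch_hw_breakpoint *) to void (*)(void *), route the call through a trampoline that has the exact signature the API expects. A minimal illustration of the same pattern outside the kernel (all names here are hypothetical stand-ins, not the real API):

```c
#include <assert.h>

/* A worker with its natural signature */
struct fake_brk { int val; };

static int last_seen;

static int fake_set_dawr(struct fake_brk *brk)
{
    last_seen = brk->val;
    return 0;
}

/* Trampoline with the exact callback type -- no function-pointer cast,
 * so the warning stays away and the indirect call is well-defined. */
static void fake_set_dawr_cb(void *info)
{
    fake_set_dawr(info);
}

typedef void (*call_func_t)(void *);

/* Minimal stand-in for smp_call_function(): runs the callback once. */
static void call_everywhere(call_func_t fn, void *info)
{
    fn(info);
}
```

Calling through a pointer of the wrong function type is undefined behaviour even when it happens to work, which is why GCC 8 started warning on the cast rather than just noting a style issue.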
[PATCH v5 2/2] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling -- v5: - Changes based on comments by hch - Change // to /* comments - Change return code from -1 to -ENODEV - Remove unneeded default n in new Kconfig option - Remove setting to default value - Remove unnecessary braces v4: - Fix merge conflict with patch from Mathieu Malaterre: powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool - Fixed checkpatch issues noticed by Christophe Leroy. v3: Fixes based on Christophe Leroy's comments: - Fix Kconfig options to better reflect reality - Reorder alphabetically - Inline vs #define - Fixed default return for dawr_enabled() when CONFIG_PPC_DAWR=N V2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 4 + arch/powerpc/include/asm/hw_breakpoint.h | 21 +++-- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 98 arch/powerpc/kernel/hw_breakpoint.c | 61 --- arch/powerpc/kernel/process.c| 28 --- arch/powerpc/kvm/Kconfig | 1 + 7 files changed, 118 insertions(+), 96 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308..9d61b36df4 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -234,6 +234,7 @@ config PPC select OLD_SIGSUSPEND select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI + select PPC_DAWR if PPC64 select RTC_LIB select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE @@ -370,6 +371,9 @@ config 
PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR + bool + config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..41abdae6d0 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -90,18 +90,25 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); +#else /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline void hw_breakpoint_disable(void) { } +static inline void thread_change_pc(struct task_struct *tsk, + struct pt_regs *regs) { } + +#endif /* CONFIG_HAVE_HW_BREAKPOINT */ + + +#ifdef CONFIG_PPC_DAWR extern bool dawr_force_enable; static inline bool dawr_enabled(void) { return dawr_force_enable; } - -#else /* CONFIG_HAVE_HW_BREAKPOINT */ -static inline void hw_breakpoint_disable(void) { } -static inline void thread_change_pc(struct task_struct *tsk, - struct pt_regs *regs) { } +int set_dawr(struct arch_hw_breakpoint *brk); +#else static inline bool dawr_enabled(void) { return false; } -#endif /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline int set_dawr(struct arch_hw_breakpoint *brk) { return -1; } +#endif + #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..56dfa7a2a6 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR) += dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o 
obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..ae3bd6abac --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,98 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * DAWR infrastructure + * + * Copyright 2019, Michael Neuling, IBM Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_ena
Re: [PATCH v4 2/2] powerpc: Fix compile issue with force DAWR
I agree with all the below and will address in v5. Mikey On Tue, 2019-05-28 at 23:28 -0700, Christoph Hellwig wrote: > > +config PPC_DAWR > > + bool > > + default n > > "default n" is the default default. No need to write this line. > > > +++ b/arch/powerpc/kernel/dawr.c > > @@ -0,0 +1,100 @@ > > +// SPDX-License-Identifier: GPL-2.0+ > > +// > > +// DAWR infrastructure > > +// > > +// Copyright 2019, Michael Neuling, IBM Corporation. > > Normal top of file header should be /* */, //-style comments are only > for the actual SPDX header line. > > > + /* Send error to user if they hypervisor won't allow us to write DAWR */ > > + if ((!dawr_force_enable) && > > + (firmware_has_feature(FW_FEATURE_LPAR)) && > > + (set_dawr(&null_brk) != H_SUCCESS)) > > None of the three inner brace sets here are required, and the code > becomes much easier to read without them. > > > + return -1; > > What about returning a proper error code? > > > +static int __init dawr_force_setup(void) > > +{ > > + dawr_force_enable = false; > > This variable already is initialized to false by default, so this line > is not required. > > > + if (PVR_VER(mfspr(SPRN_PVR)) == PVR_POWER9) { > > + /* Turn DAWR off by default, but allow admin to turn it on */ > > + dawr_force_enable = false; > > .. and neither is this one. >
[PATCH v4 1/2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool
From: Mathieu Malaterre In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") the following piece of code was added: smp_call_function((smp_call_func_t)set_dawr, _brk, 0); Since GCC 8 this triggers the following warning about incompatible function types: arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void *)' [-Werror=cast-function-type] Since the warning is there for a reason, and should not be hidden behind a cast, provide an intermediate callback function to avoid the warning. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Suggested-by: Christoph Hellwig Signed-off-by: Mathieu Malaterre Signed-off-by: Michael Neuling --- arch/powerpc/kernel/hw_breakpoint.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index da307dd93e..ca3a2358b7 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -384,6 +384,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp) bool dawr_force_enable; EXPORT_SYMBOL_GPL(dawr_force_enable); +static void set_dawr_cb(void *info) +{ + set_dawr(info); +} + static ssize_t dawr_write_file_bool(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) @@ -403,7 +408,7 @@ static ssize_t dawr_write_file_bool(struct file *file, /* If we are clearing, make sure all CPUs have the DAWR cleared */ if (!dawr_force_enable) - smp_call_function((smp_call_func_t)set_dawr, _brk, 0); + smp_call_function(set_dawr_cb, _brk, 0); return rc; } -- 2.21.0
[PATCH v4 2/2] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling -- v4: - Fix merge conflict with patch from Mathieu Malaterre: powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool - Fixed checkpatch issues noticed by Christophe Leroy. v3: Fixes based on Christophe Leroy's comments: - Fix Kconfig options to better reflect reality - Reorder alphabetically - Inline vs #define - Fixed default return for dawr_enabled() when CONFIG_PPC_DAWR=N V2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 5 ++ arch/powerpc/include/asm/hw_breakpoint.h | 21 +++-- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 100 +++ arch/powerpc/kernel/hw_breakpoint.c | 61 -- arch/powerpc/kernel/process.c| 28 --- arch/powerpc/kvm/Kconfig | 1 + 7 files changed, 121 insertions(+), 96 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 8c1c636308..87a3ce4e92 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -234,6 +234,7 @@ config PPC select OLD_SIGSUSPEND select PCI_DOMAINS if PCI select PCI_SYSCALL if PCI + select PPC_DAWR if PPC64 select RTC_LIB select SPARSE_IRQ select SYSCTL_EXCEPTION_TRACE @@ -370,6 +371,10 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR + bool + default n + config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..41abdae6d0 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -90,18 +90,25 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); +#else /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline void hw_breakpoint_disable(void) { } +static inline void thread_change_pc(struct task_struct *tsk, + struct pt_regs *regs) { } + +#endif /* CONFIG_HAVE_HW_BREAKPOINT */ + + +#ifdef CONFIG_PPC_DAWR extern bool dawr_force_enable; static inline bool dawr_enabled(void) { return dawr_force_enable; } - -#else /* CONFIG_HAVE_HW_BREAKPOINT */ -static inline void hw_breakpoint_disable(void) { } -static inline void thread_change_pc(struct task_struct *tsk, - struct pt_regs *regs) { } +int set_dawr(struct arch_hw_breakpoint *brk); +#else static inline bool dawr_enabled(void) { return false; } -#endif /* CONFIG_HAVE_HW_BREAKPOINT */ +static inline int set_dawr(struct arch_hw_breakpoint *brk) { return -1; } +#endif + #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..56dfa7a2a6 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR) += dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..c8b3fb610c --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,100 @@ 
+// SPDX-License-Identifier: GPL-2.0+ +// +// DAWR infrastructure +// +// Copyright 2019, Michael Neuling, IBM Corporation. + +#include +#include +#include +#include +#include +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_enable); + +int set_dawr(struct arch_hw_breakpoint *brk) +{ + unsigned long dawr, dawrx, mrd; + + dawr = brk->address; + + dawrx = (brk->type & (HW_BRK_TYPE_READ | HW_BRK_TYPE_WRITE)) +
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
> > > > -- > > > > v2: > > > > Fixes based on Christophe Leroy's comments: > > > > - Fix commit message formatting > > > > - Move more DAWR code into dawr.c > > > > --- > > > >arch/powerpc/Kconfig | 5 ++ > > > >arch/powerpc/include/asm/hw_breakpoint.h | 20 --- > > > >arch/powerpc/kernel/Makefile | 1 + > > > >arch/powerpc/kernel/dawr.c | 75 > > > > > > > >arch/powerpc/kernel/hw_breakpoint.c | 56 -- > > > >arch/powerpc/kvm/Kconfig | 1 + > > > >6 files changed, 94 insertions(+), 64 deletions(-) > > > >create mode 100644 arch/powerpc/kernel/dawr.c > > > > > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > > > index 2711aac246..f4b19e48cc 100644 > > > > --- a/arch/powerpc/Kconfig > > > > +++ b/arch/powerpc/Kconfig > > > > @@ -242,6 +242,7 @@ config PPC > > > > select SYSCTL_EXCEPTION_TRACE > > > > select THREAD_INFO_IN_TASK > > > > select VIRT_TO_BUS if !PPC64 > > > > + select PPC_DAWR_FORCE_ENABLE if PPC64 || PERF > > > > > > What's PERF ? Did you mean PERF_EVENTS ? > > > > > > Then what you mean is: > > > - Selected all the time for PPC64 > > > - Selected for PPC32 when PERF is also selected. > > > > > > Is that what you want ? At first I thought it was linked to P9. > > > > This is wrong. I think we just want PPC64. PERF is a red herring. > > Are you sure ? Michael suggested PERF || KVM as far as I remember. It was mpe but I think it was wrong. We could make it more selective with something like: PPC64 && (HW_BREAK || PPC_ADV_DEBUG_REGS || PERF) but I think that will end up back in the larger mess of https://github.com/linuxppc/issues/issues/128 I don't think the minor size gain is worth delving into that mess, hence I made it simply PPC64 which is hopefully an improvement on what we have since it eliminates 32bit. 
> > > >#else /* CONFIG_HAVE_HW_BREAKPOINT */ > > > >static inline void hw_breakpoint_disable(void) { } > > > >static inline void thread_change_pc(struct task_struct *tsk, > > > > struct pt_regs *regs) { } > > > > -static inline bool dawr_enabled(void) { return false; } > > > > + > > > >#endif /* CONFIG_HAVE_HW_BREAKPOINT */ > > > > + > > > > +extern bool dawr_force_enable; > > > > + > > > > +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE > > > > +extern bool dawr_enabled(void); > > > > > > Functions should not be 'extern'. I'm sure checkpatch --strict will tell > > > you. > > > > That's not what's currently being done in this header file. I'm keeping > > with > > the style of that file. > > style is not defined on a per file basis. There is the style from the > past and the nowadays style. If you keep old style just because the file > includes old style statements, then the code will never improve. > > All new patches should come with clean 'checkpatch' report and follow > new style. Having mixed styles in a file is not a problem, that's the > way to improvement. See arch/powerpc/mm/mmu_decl.h as an example. I'm not sure that's mpe's policy. mpe? > > > > + > > > > +static ssize_t dawr_write_file_bool(struct file *file, > > > > + const char __user *user_buf, > > > > + size_t count, loff_t *ppos) > > > > +{ > > > > + struct arch_hw_breakpoint null_brk = {0, 0, 0}; > > > > + size_t rc; > > > > + > > > > + /* Send error to user if they hypervisor won't allow us to write > > > > DAWR */ > > > > + if ((!dawr_force_enable) && > > > > + (firmware_has_feature(FW_FEATURE_LPAR)) && > > > > + (set_dawr(&null_brk) != H_SUCCESS)) > > > > > > The above is not real clear. > > > set_dabr() returns 0, H_SUCCESS is not used there. > > > > pseries_set_dawr() will return an hcall number. > > Right, then it maybe means set_dawr() should be fixed? Sorry, I don't understand this. > > This code hasn't changed. I'm just moving it. > > Right, but could be an improvement for another patch. 
> As far as I remember you are the one who wrote that code in the first place, > aren't you? Yep, classic crap Mikey code :-) Mikey
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
On Mon, 2019-05-13 at 11:08 +0200, Christophe Leroy wrote: > > Le 13/05/2019 à 09:17, Michael Neuling a écrit : > > If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail > > at linking with: > >arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined > > reference to `dawr_force_enable' > > > > This was caused by commit c1fe190c0672 ("powerpc: Add force enable of > > DAWR on P9 option"). > > > > This puts more of the dawr infrastructure in a new file. > > I think you are doing a bit more than that. I think you should explain > that you define a new CONFIG_ option, when it is selected, etc ... > > The commit you are referring to is talking about P9. It looks like your > patch covers a lot more, so it should be mentioned here I guess. Not really. It looks like I'm doing a lot but I'm really just moving code around to deal with the ugliness of a bunch of config options and dependencies. > > Signed-off-by: Michael Neuling > > You should add a Fixes: tag, ie: > > Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Ok > > > -- > > v2: > >Fixes based on Christophe Leroy's comments: > >- Fix commit message formatting > >- Move more DAWR code into dawr.c > > --- > > arch/powerpc/Kconfig | 5 ++ > > arch/powerpc/include/asm/hw_breakpoint.h | 20 --- > > arch/powerpc/kernel/Makefile | 1 + > > arch/powerpc/kernel/dawr.c | 75 > > arch/powerpc/kernel/hw_breakpoint.c | 56 -- > > arch/powerpc/kvm/Kconfig | 1 + > > 6 files changed, 94 insertions(+), 64 deletions(-) > > create mode 100644 arch/powerpc/kernel/dawr.c > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index 2711aac246..f4b19e48cc 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -242,6 +242,7 @@ config PPC > > select SYSCTL_EXCEPTION_TRACE > > select THREAD_INFO_IN_TASK > > select VIRT_TO_BUS if !PPC64 > > + select PPC_DAWR_FORCE_ENABLE if PPC64 || PERF > > What's PERF ? Did you mean PERF_EVENTS ? 
> > Then what you mean is: > - Selected all the time for PPC64 > - Selected for PPC32 when PERF is also selected. > > Is that what you want ? At first I thought it was linked to P9. This is wrong. I think we just want PPC64. PERF is a red herring. > And ... did you read below statement ? Clearly not :-) > > > # > > # Please keep this list sorted alphabetically. > > # > > @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE > > depends on PPC_ADV_DEBUG_REGS && 44x > > default y > > > > +config PPC_DAWR_FORCE_ENABLE > > + bool > > + default y > > + > > Why defaulting it to y. Then how is it set to n ? Good point. > > > config ZONE_DMA > > bool > > default y if PPC_BOOK3E_64 > > diff --git a/arch/powerpc/include/asm/hw_breakpoint.h > > b/arch/powerpc/include/asm/hw_breakpoint.h > > index 0fe8c1e46b..ffbc8eab41 100644 > > --- a/arch/powerpc/include/asm/hw_breakpoint.h > > +++ b/arch/powerpc/include/asm/hw_breakpoint.h > > @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { > > #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | > > \ > > HW_BRK_TYPE_HYP) > > > > +extern int set_dawr(struct arch_hw_breakpoint *brk); > > + > > #ifdef CONFIG_HAVE_HW_BREAKPOINT > > #include > > #include > > @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) > > extern void thread_change_pc(struct task_struct *tsk, struct pt_regs > > *regs); > > int hw_breakpoint_handler(struct die_args *args); > > > > -extern int set_dawr(struct arch_hw_breakpoint *brk); > > -extern bool dawr_force_enable; > > -static inline bool dawr_enabled(void) > > -{ > > - return dawr_force_enable; > > -} > > - > > That's a very simple function, why not keep it here (or in another .h) > as 'static inline' ? Sure. 
> > #else /* CONFIG_HAVE_HW_BREAKPOINT */ > > static inline void hw_breakpoint_disable(void) { } > > static inline void thread_change_pc(struct task_struct *tsk, > > struct pt_regs *regs) { } > > -static inline bool dawr_enabled(void) { return false; } > > + > > #endif/* CONFIG_HAVE_HW_BREAKPOINT */ > > + > > +extern bool daw
[PATCH v2] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This puts more of the dawr infrastructure in a new file. Signed-off-by: Michael Neuling -- v2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 5 ++ arch/powerpc/include/asm/hw_breakpoint.h | 20 --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 75 arch/powerpc/kernel/hw_breakpoint.c | 56 -- arch/powerpc/kvm/Kconfig | 1 + 6 files changed, 94 insertions(+), 64 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2711aac246..f4b19e48cc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -242,6 +242,7 @@ config PPC select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 + select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF # # Please keep this list sorted alphabetically. 
# @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR_FORCE_ENABLE + bool + default y + config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..ffbc8eab41 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +extern int set_dawr(struct arch_hw_breakpoint *brk); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); -extern bool dawr_force_enable; -static inline bool dawr_enabled(void) -{ - return dawr_force_enable; -} - #else /* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { } -static inline bool dawr_enabled(void) { return false; } + #endif /* CONFIG_HAVE_HW_BREAKPOINT */ + +extern bool dawr_force_enable; + +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE +extern bool dawr_enabled(void); +#else +#define dawr_enabled() true +#endif + #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..a9c497c34f 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR_FORCE_ENABLE)+= dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o 
obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..cf1d02fe1e --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0+ +// +// DAWR infrastructure +// +// Copyright 2019, Michael Neuling, IBM Corporation. + +#include +#include +#include +#include +#include +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_enable); + +extern bool dawr_enabled(void) +{ + return dawr_force_enable; +} +EXPORT_SYMBOL_GPL(dawr_enabled); + +static ssize_t dawr_write_file_bool(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct arch_hw_breakpoint null_brk = {0, 0, 0}; + size_t rc; + + /* Send error to user if they hypervisor won't allow us to write DAWR */ + if ((!dawr_force_enable) && + (firmware_has_feature(FW_FEATURE_LPAR)) && + (set_dawr(_brk) != H_SUCCESS)) + return -1; + + rc = debugfs_write_file_bool(file, user_buf, count, ppos); + if (rc) + return rc; + + /* If we are clearing, make sure all CPUs have the DAWR cleared
[PATCH] powerpc: Document xive=off option
commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") added an option to turn off Linux native XIVE usage via the xive=off kernel command line option. This documents this option. Signed-off-by: Michael Neuling --- Documentation/admin-guide/kernel-parameters.txt | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c45a19d654..ee410d0ef4 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -5177,6 +5177,15 @@ Format: ,[,[,[,]]] + xive= [PPC] + By default on POWER9 and above, the kernel will + natively use the XIVE interrupt controller. This option + allows the fallback firmware mode to be used: + + off Fallback to firmware control of XIVE interrupt + controller on both pseries and powernv + platforms. Only useful on POWER9 and above. + xhci-hcd.quirks [USB,KNL] A hex value specifying bitmask with supplemental xhci host controller quirks. Meaning of each bit can be -- 2.21.0
[PATCH] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by this recent patch: commit c1fe190c06723322f2dfac31d3b982c581e434ef Author: Michael Neuling powerpc: Add force enable of DAWR on P9 option This builds dawr_force_enable in always via a new file. Signed-off-by: Michael Neuling --- arch/powerpc/kernel/Makefile| 2 +- arch/powerpc/kernel/dawr.c | 11 +++ arch/powerpc/kernel/hw_breakpoint.c | 3 --- 3 files changed, 12 insertions(+), 4 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..48a20ef5be 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -49,7 +49,7 @@ obj-y := cputable.o ptrace.o syscalls.o \ signal.o sysfs.o cacheinfo.o time.o \ prom.o traps.o setup-common.o \ udbg.o misc.o io.o misc_$(BITS).o \ - of_platform.o prom_parse.o + of_platform.o prom_parse.o dawr.o obj-$(CONFIG_PPC64)+= setup_64.o sys_ppc32.o \ signal_64.o ptrace32.o \ paca.o nvram_64.o firmware.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..ca343efd23 --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0+ +// +// DAWR global variables +// +// Copyright 2019, Michael Neuling, IBM Corporation. 
+ +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_enable); diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index da307dd93e..78a17454f4 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -381,9 +381,6 @@ void hw_breakpoint_pmu_read(struct perf_event *bp) /* TODO */ } -bool dawr_force_enable; -EXPORT_SYMBOL_GPL(dawr_force_enable); - static ssize_t dawr_write_file_bool(struct file *file, const char __user *user_buf, size_t count, loff_t *ppos) -- 2.21.0
Re: [PATCH] powerpc/tm: Avoid machine crash on rt_sigreturn
On Wed, 2019-01-16 at 14:47 -0200, Breno Leitao wrote: > There is a kernel crash that happens if rt_sigreturn is called inside a > transactional block. > > This crash happens if the kernel hits an in-kernel page fault when > accessing userspace memory, usually through copy_ckvsx_to_user(). A major > page fault calls the might_sleep() function, which can cause a task > reschedule. A task reschedule (switch_to()) reclaims and recheckpoints the > TM states, but, in the signal return path, the checkpointed memory was > already reclaimed, thus the exception stack has an MSR that points to > MSR[TS]=0. > > When the code returns from might_sleep() and a task reschedule happened, > then this task is returned with the memory recheckpointed, and > CPU MSR[TS] = suspended. > > This means that there is a side effect at might_sleep() if it is called > with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0. > > This side effect can cause a TM bad thing, since at the exception entrance, > the stack saves MSR[TS]=0, and this is what will be used at RFID, but, > the processor has MSR[TS] = Suspended, and this transition will be invalid > and a TM Bad thing will be raised, causing the following crash: > > Unexpected TM Bad Thing exception at c000e9ec (msr > 0x800302a03031) tm_scratch=80010280b033 > cpu 0xc: Vector: 700 (Program Check) at [c0003ff1fd70] > pc: c000e9ec: fast_exception_return+0x100/0x1bc > lr: c0032948: handle_rt_signal64+0xb8/0xaf0 > sp: c004263ebc40 > msr: 800302a03031 > current = 0xc00415050300 > paca= 0xc0003ffc4080 irqmask: 0x03 irq_happened: 0x01 > pid = 25006, comm = sigfuz > Linux version 5.0.0-rc1-1-g3bd6e94bec12 (breno@debian) (gcc version > 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019 > WARNING: exception is not recoverable, can't continue > enter ? 
for help > [c004263ebc40] c0032948 handle_rt_signal64+0xb8/0xaf0 > (unreliable) > [c004263ebd30] c0022780 do_notify_resume+0x2f0/0x430 > [c004263ebe20] c000e844 ret_from_except_lite+0x70/0x74 > --- Exception: c00 (System Call) at 7fffbaac400c > SP (7fffeca90f40) is in userspace > > The solution for this problem is running the sigreturn code with > regs->msr[TS] disabled, thus, avoiding hitting the side effect above. This > does not seem to be a problem since regs->msr will be replaced by the > ucontext value, so, it is being flushed already. In this case, it is > flushed earlier. > > Signed-off-by: Breno Leitao Acked-by: Michael Neuling This still applies on powerpc/next so just acking rather than reposting > --- > arch/powerpc/kernel/signal_64.c | 27 ++- > 1 file changed, 26 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c > index 6794466f6420..06c299ef6132 100644 > --- a/arch/powerpc/kernel/signal_64.c > +++ b/arch/powerpc/kernel/signal_64.c > @@ -565,7 +565,7 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, > preempt_disable(); > > /* pull in MSR TS bits from user context */ > - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); > + regs->msr |= msr & MSR_TS_MASK; > > /* >* Ensure that TM is enabled in regs->msr before we leave the signal > @@ -745,6 +745,31 @@ SYSCALL_DEFINE0(rt_sigreturn) > if (MSR_TM_SUSPENDED(mfmsr())) > tm_reclaim_current(0); > > + /* > + * Disable MSR[TS] bit also, so, if there is an exception in the > + * code below (as a page fault in copy_ckvsx_to_user()), it does > + * not recheckpoint this task if there was a context switch inside > + * the exception. > + * > + * A major page fault can indirectly call schedule(). 
A reschedule > + * process in the middle of an exception can have a side effect > + * (Changing the CPU MSR[TS] state), since schedule() is called > + * with the CPU MSR[TS] disable and returns with MSR[TS]=Suspended > + * (switch_to() calls tm_recheckpoint() for the 'new' process). In > + * this case, the process continues to be the same in the CPU, but > + * the CPU state just changed. > + * > + * This can cause a TM Bad Thing, since the MSR in the stack will > + * have the MSR[TS]=0, and this is what will be used to RFID. > + * > + * Clearing MSR[TS] state here will avoid a recheckpoint if there > + * is any process reschedule in kernel space. The MSR[TS] state > + * does not need to be saved also, since it will be replaced with > + * the MSR[TS] that came from user context later, at > + * restore_tm_sigcontexts. > + */ > + regs->msr &= ~MSR_TS_MASK; > + > if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR])) > goto badframe; > if (MSR_TM_ACTIVE(msr)) {
Re: [PATCH] Documentation: powerpc: Expand the DAWR acronym
On Mon, 2019-04-01 at 16:41 +1030, Joel Stanley wrote: > Those of us not drowning in POWER might not know what this means. Hehe... thanks! > Signed-off-by: Joel Stanley Acked-by: Michael Neuling > --- > Documentation/powerpc/DAWR-POWER9.txt | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/Documentation/powerpc/DAWR-POWER9.txt > b/Documentation/powerpc/DAWR-POWER9.txt > index 2feaa6619658..f6c24e54ce83 100644 > --- a/Documentation/powerpc/DAWR-POWER9.txt > +++ b/Documentation/powerpc/DAWR-POWER9.txt > @@ -1,10 +1,10 @@ > DAWR issues on POWER9 > > > -On POWER9 the DAWR can cause a checkstop if it points to cache > -inhibited (CI) memory. Currently Linux has no way to disinguish CI > -memory when configuring the DAWR, so (for now) the DAWR is disabled by > -this commit: > +On POWER9 the Data Address Watchpoint Register (DAWR) can cause a checkstop > +if it points to cache inhibited (CI) memory. Currently Linux has no way to > +disinguish CI memory when configuring the DAWR, so (for now) the DAWR is > +disabled by this commit: > > commit 9654153158d3e0684a1bdb76dbababdb7111d5a0 > Author: Michael Neuling
[PATCH v2] powerpc: Add force enable of DAWR on P9 option
This adds a flag so that the DAWR can be enabled on P9 via: echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous The DAWR was previously force disabled on POWER9 in: 9654153158 powerpc: Disable DAWR in the base POWER9 CPU features Also see Documentation/powerpc/DAWR-POWER9.txt This is a dangerous setting, USE AT YOUR OWN RISK. Some users may not care about a bad user crashing their box (ie. single user/desktop systems) and really want the DAWR. This allows them to force enable DAWR. This flag can also be used to disable DAWR access. Once this is cleared, all DAWR access should be cleared immediately and your machine once again safe from crashing. Userspace may get confused by toggling this. If DAWR is force enabled/disabled between getting the number of breakpoints (via PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an inconsistent view of what's available. Similarly for guests. For the DAWR to be enabled in a KVM guest, the DAWR needs to be force enabled in the host AND the guest. For this reason, this won't work on POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the dawr_enable_dangerous file will fail if the hypervisor doesn't support writing the DAWR. To double check the DAWR is working, run this kernel selftest: tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c Any errors/failures/skips mean something is wrong. Signed-off-by: Michael Neuling --- v2: Fix compile errors found by kbuild test robot. 
--- Documentation/powerpc/DAWR-POWER9.txt| 32 arch/powerpc/include/asm/hw_breakpoint.h | 8 +++ arch/powerpc/kernel/hw_breakpoint.c | 62 +++- arch/powerpc/kernel/process.c| 9 ++-- arch/powerpc/kernel/ptrace.c | 3 +- arch/powerpc/kvm/book3s_hv.c | 3 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 + 7 files changed, 123 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/DAWR-POWER9.txt b/Documentation/powerpc/DAWR-POWER9.txt index 2feaa66196..bdec036509 100644 --- a/Documentation/powerpc/DAWR-POWER9.txt +++ b/Documentation/powerpc/DAWR-POWER9.txt @@ -56,3 +56,35 @@ POWER9. Loads and stores to the watchpoint locations will not be trapped in GDB. The watchpoint is remembered, so if the guest is migrated back to the POWER8 host, it will start working again. +Force enabling the DAWR += +Kernels (since ~v5.2) have an option to force enable the DAWR via: + + echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous + +This enables the DAWR even on POWER9. + +This is a dangerous setting, USE AT YOUR OWN RISK. + +Some users may not care about a bad user crashing their box +(ie. single user/desktop systems) and really want the DAWR. This +allows them to force enable DAWR. + +This flag can also be used to disable DAWR access. Once this is +cleared, all DAWR access should be cleared immediately and your +machine once again safe from crashing. + +Userspace may get confused by toggling this. If DAWR is force +enabled/disabled between getting the number of breakpoints (via +PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an +inconsistent view of what's available. Similarly for guests. + +For the DAWR to be enabled in a KVM guest, the DAWR needs to be force +enabled in the host AND the guest. For this reason, this won't work on +POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the +dawr_enable_dangerous file will fail if the hypervisor doesn't support +writing the DAWR. 
+ +To double check the DAWR is working, run this kernel selftest: + tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +Any errors/failures/skips mean something is wrong. diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index ece4dc89c9..0fe8c1e46b 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -90,10 +90,18 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); +extern int set_dawr(struct arch_hw_breakpoint *brk); +extern bool dawr_force_enable; +static inline bool dawr_enabled(void) +{ + return dawr_force_enable; +} + #else /* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { } +static inline bool dawr_enabled(void) { return false; } #endif /* CONFIG_HAVE_HW_BREAKPOINT */ #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index fec8a67731..da307dd93e 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -29,11 +29,15 @@ #include #include #include +#include +#i
[PATCH] powerpc: Add force enable of DAWR on P9 option
This adds a flag so that the DAWR can be enabled on P9 via: echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous The DAWR was previously force disabled on POWER9 in: 9654153158 powerpc: Disable DAWR in the base POWER9 CPU features Also see Documentation/powerpc/DAWR-POWER9.txt This is a dangerous setting, USE AT YOUR OWN RISK. Some users may not care about a bad user crashing their box (ie. single user/desktop systems) and really want the DAWR. This allows them to force enable DAWR. This flag can also be used to disable DAWR access. Once this is cleared, all DAWR access should be cleared immediately and your machine once again safe from crashing. Userspace may get confused by toggling this. If DAWR is force enabled/disabled between getting the number of breakpoints (via PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an inconsistent view of what's available. Similarly for guests. For the DAWR to be enabled in a KVM guest, the DAWR needs to be force enabled in the host AND the guest. For this reason, this won't work on POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the dawr_enable_dangerous file will fail if the hypervisor doesn't support writing the DAWR. To double check the DAWR is working, run this kernel selftest: tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c Any errors/failures/skips mean something is wrong. 
Signed-off-by: Michael Neuling --- Documentation/powerpc/DAWR-POWER9.txt| 32 + arch/powerpc/include/asm/hw_breakpoint.h | 7 +++ arch/powerpc/kernel/hw_breakpoint.c | 60 +++- arch/powerpc/kernel/process.c| 9 ++-- arch/powerpc/kernel/ptrace.c | 3 +- arch/powerpc/kvm/book3s_hv.c | 3 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 23 + 7 files changed, 120 insertions(+), 17 deletions(-) diff --git a/Documentation/powerpc/DAWR-POWER9.txt b/Documentation/powerpc/DAWR-POWER9.txt index 2feaa66196..bdec036509 100644 --- a/Documentation/powerpc/DAWR-POWER9.txt +++ b/Documentation/powerpc/DAWR-POWER9.txt @@ -56,3 +56,35 @@ POWER9. Loads and stores to the watchpoint locations will not be trapped in GDB. The watchpoint is remembered, so if the guest is migrated back to the POWER8 host, it will start working again. +Force enabling the DAWR += +Kernels (since ~v5.2) have an option to force enable the DAWR via: + + echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous + +This enables the DAWR even on POWER9. + +This is a dangerous setting, USE AT YOUR OWN RISK. + +Some users may not care about a bad user crashing their box +(ie. single user/desktop systems) and really want the DAWR. This +allows them to force enable DAWR. + +This flag can also be used to disable DAWR access. Once this is +cleared, all DAWR access should be cleared immediately and your +machine once again safe from crashing. + +Userspace may get confused by toggling this. If DAWR is force +enabled/disabled between getting the number of breakpoints (via +PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an +inconsistent view of what's available. Similarly for guests. + +For the DAWR to be enabled in a KVM guest, the DAWR needs to be force +enabled in the host AND the guest. For this reason, this won't work on +POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the +dawr_enable_dangerous file will fail if the hypervisor doesn't support +writing the DAWR. 
+ +To double check the DAWR is working, run this kernel selftest: + tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +Any errors/failures/skips mean something is wrong. diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index ece4dc89c9..87c2a74f64 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -90,6 +90,13 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); +extern int set_dawr(struct arch_hw_breakpoint *brk); +extern bool dawr_force_enable; +static inline bool dawr_enabled(void) +{ + return dawr_force_enable; +} + #else /* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index fec8a67731..0b2461e1d9 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -29,11 +29,14 @@ #include #include #include +#include +#include #include #include #include #include +#include #include /* @@ -174,7 +177,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp, if (!ppc_breakpoint_available()) return -ENODEV; length_max = 8; /* DABR */ - if (cpu_has_f
Re: [PATCH] powerpc/security: Fix spectre_v2 reporting
On Thu, 2019-03-21 at 15:24 +1100, Michael Ellerman wrote: > When I updated the spectre_v2 reporting to handle software count cache > flush I got the logic wrong when there's no software count cache > enabled at all. > > The result is that on systems with the software count cache flush > disabled we print: > > Mitigation: Indirect branch cache disabled, Software count cache flush > > Which correctly indicates that the count cache is disabled, but > incorrectly says the software count cache flush is enabled. > > The root of the problem is that we are trying to handle all > combinations of options. But we know now that we only expect to see > the software count cache flush enabled if the other options are false. > > So split the two cases, which simplifies the logic and fixes the bug. > We were also missing a space before "(hardware accelerated)". > > The result is we see one of: > > Mitigation: Indirect branch serialisation (kernel only) > Mitigation: Indirect branch cache disabled > Mitigation: Software count cache flush > Mitigation: Software count cache flush (hardware accelerated) > > Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache > flush") > Cc: sta...@vger.kernel.org # v4.19+ > Signed-off-by: Michael Ellerman LGTM Reviewed-by: Michael Neuling > --- > arch/powerpc/kernel/security.c | 23 --- > 1 file changed, 8 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c > index 9b8631533e02..b33bafb8fcea 100644 > --- a/arch/powerpc/kernel/security.c > +++ b/arch/powerpc/kernel/security.c > @@ -190,29 +190,22 @@ ssize_t cpu_show_spectre_v2(struct device *dev, struct > device_attribute *attr, c > bcs = security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED); > ccd = security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED); > > - if (bcs || ccd || count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { > - bool comma = false; > + if (bcs || ccd) { > seq_buf_printf(, "Mitigation: "); > > - if (bcs) { > + if 
(bcs) > seq_buf_printf(, "Indirect branch serialisation > (kernel only)"); > - comma = true; > - } > > - if (ccd) { > - if (comma) > - seq_buf_printf(, ", "); > - seq_buf_printf(, "Indirect branch cache disabled"); > - comma = true; > - } > - > - if (comma) > + if (bcs && ccd) > seq_buf_printf(, ", "); > > - seq_buf_printf(, "Software count cache flush"); > + if (ccd) > + seq_buf_printf(, "Indirect branch cache disabled"); > + } else if (count_cache_flush_type != COUNT_CACHE_FLUSH_NONE) { > + seq_buf_printf(, "Mitigation: Software count cache flush"); > > if (count_cache_flush_type == COUNT_CACHE_FLUSH_HW) > - seq_buf_printf(, "(hardware accelerated)"); > + seq_buf_printf(, " (hardware accelerated)"); > } else if (btb_flush_enabled) { > seq_buf_printf(, "Mitigation: Branch predictor state flush"); > } else {
Re: [PATCH] selftests/powerpc: New TM signal self test
On Wed, 2018-11-28 at 11:23 -0200, Breno Leitao wrote: > A new self test that forces MSR[TS] to be set without calling any TM > instruction. This test also tries to cause a page fault at a signal > handler, exactly between MSR[TS] set and tm_recheckpoint(), forcing > thread->texasr to be rewritten with TEXASR[FS] = 0, which will cause a BUG > when tm_recheckpoint() is called. > > This test is not deterministic since it is hard to guarantee that the page > access will cause a page fault. Tests have shown that the bug could be > exposed with few interactions in a buggy kernel. This test is configured to > loop 5000x, having a good chance to hit the kernel issue in just one run. > This self test takes less than two seconds to run. You could try using sigaltstack() to put the ucontext somewhere else. Then you could play tricks with that memory to try to force a fault. madvise()+MADV_DONTNEED or fadvise()+POSIX_FADV_DONTNEED might do the trick. This is more extra credit to make it more reliable. Not a requirement. > This test uses set/getcontext because the kernel will recheckpoint > zeroed structures, causing the test to segfault, which is undesired because > the test needs to rerun, so, there is a signal handler for SIGSEGV which > will restart the test. Please put this description at the top of the test also. Other than that, it looks good. 
Mikey > > Signed-off-by: Breno Leitao > --- > tools/testing/selftests/powerpc/tm/.gitignore | 1 + > tools/testing/selftests/powerpc/tm/Makefile | 3 +- > .../powerpc/tm/tm-signal-force-msr.c | 115 ++ > 3 files changed, 118 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > > diff --git a/tools/testing/selftests/powerpc/tm/.gitignore > b/tools/testing/selftests/powerpc/tm/.gitignore > index c3ee8393dae8..89679822ebc9 100644 > --- a/tools/testing/selftests/powerpc/tm/.gitignore > +++ b/tools/testing/selftests/powerpc/tm/.gitignore > @@ -11,6 +11,7 @@ tm-signal-context-chk-fpu > tm-signal-context-chk-gpr > tm-signal-context-chk-vmx > tm-signal-context-chk-vsx > +tm-signal-force-msr > tm-vmx-unavail > tm-unavailable > tm-trap > diff --git a/tools/testing/selftests/powerpc/tm/Makefile > b/tools/testing/selftests/powerpc/tm/Makefile > index 9fc2cf6fbc92..58a2ebd13958 100644 > --- a/tools/testing/selftests/powerpc/tm/Makefile > +++ b/tools/testing/selftests/powerpc/tm/Makefile > @@ -4,7 +4,7 @@ SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm- > signal-context-chk-fpu > > TEST_GEN_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal- > stack \ > tm-vmxcopy tm-fork tm-tar tm-tmspr tm-vmx-unavail tm-unavailable > tm-trap > \ > - $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn > + $(SIGNAL_CONTEXT_CHK_TESTS) tm-sigreturn tm-signal-force-msr > > top_srcdir = ../../../../.. 
> include ../../lib.mk > @@ -20,6 +20,7 @@ $(OUTPUT)/tm-vmx-unavail: CFLAGS += -pthread -m64 > $(OUTPUT)/tm-resched-dscr: ../pmu/lib.c > $(OUTPUT)/tm-unavailable: CFLAGS += -O0 -pthread -m64 -Wno- > error=uninitialized -mvsx > $(OUTPUT)/tm-trap: CFLAGS += -O0 -pthread -m64 > +$(OUTPUT)/tm-signal-force-msr: CFLAGS += -pthread > > SIGNAL_CONTEXT_CHK_TESTS := $(patsubst > %,$(OUTPUT)/%,$(SIGNAL_CONTEXT_CHK_TESTS)) > $(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S > diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > new file mode 100644 > index ..4441d61c2328 > --- /dev/null > +++ b/tools/testing/selftests/powerpc/tm/tm-signal-force-msr.c > @@ -0,0 +1,115 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright 2018, Breno Leitao, Gustavo Romero, IBM Corp. > + */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "tm.h" > +#include "utils.h" > + > +#define __MASK(X) (1UL<<(X)) > +#define MSR_TS_S_LG 33 /* Trans Mem state: Suspended */ > +#define MSR_TM __MASK(MSR_TM_LG) /* Transactional Mem Available */ > +#define MSR_TS_S__MASK(MSR_TS_S_LG) /* Transaction Suspended */ Surely we have these defined somewhere else in selftests? > + > +#define COUNT_MAX 5000/* Number of interactions */ > + > +/* Setting contexts because the test will crash and we want to recover */ > +ucontext_t init_context, main_context; > + > +static int count, first_time; > + > +void trap_signal_handler(int signo, siginfo_t *si, void *uc) > +{ > + ucontext_t *ucp = uc; > + > + /* > + * Allocating memory in a signal handler, and never freeing it on > + * purpose, forcing the heap increase, so, the memory leak is what > + * we want here. > + */ > + ucp->uc_link = malloc(sizeof(ucontext_t)); > + memcpy(>uc_link, >uc_mcontext, sizeof(ucp->uc_mcontext)); > + > + /* Forcing to enable MSR[TM] */ > + ucp->uc_mcontext.gp_regs[PT_MSR]
Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
> Do you mean in this part of code? > > SYSCALL_DEFINE0(rt_sigreturn) > { > > if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set))) > goto badframe; > > ... > if (MSR_TM_SUSPENDED(mfmsr())) > tm_reclaim_current(0); I'm actually thinking after the reclaim, not before. If I follow your original email properly, you have a problem because you end up in this scenario: 1) Current MSR is not TM suspended 2) regs->msr[TS] set 3) get_user() (which may fault) After the tm_reclaim there are cases in restore_tm_sigcontexts() where the above is also the case. Hence why I think we have a problem there too. Mikey
Re: [PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote: > On a signal handler return, the user could set a context with MSR[TS] bits > set, and these bits would be copied to task regs->msr. > > At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, > several __get_user() are called and then a recheckpoint is executed. Do we have the same problem on the entry side? There are a bunch of copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is there some chance of the same problem there? > This is a problem since a page fault (in kernel space) could happen when > calling __get_user(). If it happens, the process MSR[TS] bits were > already set, but recheckpoint was not executed, and SPRs are still invalid. > > The page fault can cause the current process to be de-scheduled, with > MSR[TS] active and without tm_recheckpoint() being called. More > importantly, without TEXAR[FS] bit set also. This patch is great and should go in ASAP > Since TEXASR might not have the FS bit set, and when the process is > scheduled back, it will try to reclaim, which will be aborted because of > the CPU is not in the suspended state, and, then, recheckpoint. This > recheckpoint will restore thread->texasr into TEXASR SPR, which might be > zero, hitting a BUG_ON(). > > [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446! Can you put more of the backtrace here? Can be useful for people searching for a similar problem. > This patch simply delays the MSR[TS] set, so, if there is any page fault in > the __get_user() section, it does not have regs->msr[TS] set, since the TM > structures are still invalid, thus avoiding doing TM operations for > in-kernel exceptions and possible process reschedule. Can you put a comment in the code what says after the MSR[TS] setting, there can be no get/put_user before the recheckpoint? Also, it looks like the 32bit signals case is safe, but please add a comment in there too. 
> With this patch, the MSR[TS] will only be set just before recheckpointing > and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in > invalid state. > > It is not possible to move tm_recheckpoint to happen earlier, because it is > required to get the checkpointed registers from userspace, with > __get_user(), thus, the only way to avoid this undesired behavior is > delaying the MSR[TS] set, as done in this patch. > > Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") > Cc: sta...@vger.kernel.org (v3.9+) > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/signal_64.c | 29 +++-- > 1 file changed, 15 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c > index 83d51bf586c7..15b153bdd826 100644 > --- a/arch/powerpc/kernel/signal_64.c > +++ b/arch/powerpc/kernel/signal_64.c > @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, > if (MSR_TM_RESV(msr)) > return -EINVAL; > > - /* pull in MSR TS bits from user context */ > - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); > - > - /* > - * Ensure that TM is enabled in regs->msr before we leave the signal > - * handler. It could be the case that (a) user disabled the TM bit > - * through the manipulation of the MSR bits in uc_mcontext or (b) the > - * TM bit was disabled because a sufficient number of context switches > - * happened whilst in the signal handler and load_tm overflowed, > - * disabling the TM bit. In either case we can end up with an illegal > - * TM state leading to a TM Bad Thing when we return to userspace. 
> - */ > - regs->msr |= MSR_TM; > - > /* pull in MSR LE from user context */ > regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); > > @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, > tm_enable(); > /* Make sure the transaction is marked as failed */ > tsk->thread.tm_texasr |= TEXASR_FS; > + > + /* pull in MSR TS bits from user context */ > + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); > + > + /* > + * Ensure that TM is enabled in regs->msr before we leave the signal > + * handler. It could be the case that (a) user disabled the TM bit > + * through the manipulation of the MSR bits in uc_mcontext or > (b) the > + * TM bit was disabled because a sufficient number of context switches > + * happened whilst in the signal handler and load_tm overflowed, > + * disabling the TM bit. In either case we can end up with an illegal > + * TM state leading to a TM Bad Thing when we return to userspace. > + */ > + regs->msr |= MSR_TM; > + > /* This loads the checkpointed FP/VEC state, if used */ > tm_recheckpoint(&tsk->thread); >
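The ordering constraint being asked for — no user memory access between setting MSR[TS] and the recheckpoint — can be sketched as follows (pseudocode only, not the actual kernel flow):

```
restore_tm_sigcontexts():
    /* all __get_user()/copy_from_user() of the checkpointed state happen
       here, while regs->msr[TS] is still clear: a fault is harmless */
    __get_user(...)

    tm_enable()
    tsk->thread.tm_texasr |= TEXASR_FS      /* mark transaction as failed */

    /* point of no return: from here until tm_recheckpoint() there must be
       no get_user()/put_user() and nothing else that can fault */
    regs->msr |= (msr & MSR_TS_MASK)
    regs->msr |= MSR_TM
    tm_recheckpoint(&tsk->thread)
```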
Re: [RFC PATCH 13/14] powerpc/tm: Do not restore TM without SPRs
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > Currently the signal context restore code enables the bit on the MSR > register without restoring the TM SPRs, which can cause undesired side > effects. > > This is not correct because if TM is enabled in MSR, it means the TM SPR > registers are valid and updated, which is not correct here. In fact, the > live registers may contain previous' thread SPRs. > > Functions check if the register values are valid or not through looking > if the facility is enabled at MSR, as MSR[TM] set means that the TM SPRs > are hot. > > When just enabling MSR[TM] without updating the live SPRs, this can cause a > crash, since current TM SPR from previous thread will be saved on the > current thread, and might not have TEXASR[FS] set, for example. > > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/signal_64.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c > index 487c3b6aa2e3..156b90e8ee78 100644 > --- a/arch/powerpc/kernel/signal_64.c > +++ b/arch/powerpc/kernel/signal_64.c > @@ -478,8 +478,18 @@ static long restore_tm_sigcontexts(struct task_struct > *tsk, >* happened whilst in the signal handler and load_tm overflowed, >* disabling the TM bit. In either case we can end up with an illegal >* TM state leading to a TM Bad Thing when we return to userspace. > + * > + * Every time MSR_TM is enabled, mainly for the b) case, the TM SPRs > + * must be restored in the live registers, since the live registers > + * could contain garbage and later we want to read from live, since > + * MSR_TM is enabled, and MSR[TM] is what is used to check if the > + * TM SPRs live registers are valid or not. >*/ > - regs->msr |= MSR_TM; > + if ((regs->msr & MSR_TM) == 0) { > + regs->msr |= MSR_TM; > + tm_enable(); > + tm_restore_sprs(>thread); > + } I'm wondering if we should put the save/restore TM registers in the early entry/exit code too. 
We'd need to add the check on msr[tm]/load_tm. Distributing the SPR save/restore throughout the kernel is just going to lead us to similar problems that we are having now with reclaim/recheckpoint. Mikey > > /* pull in MSR LE from user context */ > regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
Re: [RFC PATCH 09/14] powerpc/tm: Warn if state is transactional
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > Since every kernel entrance is calling TM_KERNEL_ENTRY, it is not > expected to arrive at this point with a suspended transaction. > > If that is the case, cause a warning and reclaim the current thread in > order to avoid a TM Bad Thing. > > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/process.c | 7 +++ > arch/powerpc/kernel/signal.c | 2 +- > 2 files changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index 73872f751b33..849591bf0881 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -1752,11 +1752,10 @@ void start_thread(struct pt_regs *regs, unsigned long > start, unsigned long sp) > > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > /* > - * Clear any transactional state, we're exec()ing. The cause is > - * not important as there will never be a recheckpoint so it's not > - * user visible. > + * It is a bug if the transaction was not reclaimed until this > + * point. Warn us and try to workaround it calling tm_reclaim(). >*/ > - if (MSR_TM_SUSPENDED(mfmsr())) > + if (WARN_ON(MSR_TM_SUSPENDED(mfmsr( > tm_reclaim_current(0); > #endif Let's turn these into BUG_ON() Mikey > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c > index b3e8db376ecd..cbaccc2be0fb 100644 > --- a/arch/powerpc/kernel/signal.c > +++ b/arch/powerpc/kernel/signal.c > @@ -203,7 +203,7 @@ unsigned long get_tm_stackpointer(struct task_struct *tsk) > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > BUG_ON(tsk != current); > > - if (MSR_TM_ACTIVE(tsk->thread.regs->msr)) { > + if (WARN_ON(MSR_TM_ACTIVE(mfmsr( { > tm_reclaim_current(TM_CAUSE_SIGNAL); > if (MSR_TM_TRANSACTIONAL(tsk->thread.regs->msr)) > return tsk->thread.ckpt_regs.gpr[1];
Re: [RFC PATCH 08/14] powerpc/tm: Recheckpoint at exit path
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > In the past, TIF_RESTORE_TM was being handled with the rest of the TIF > workers, > but, that was too early, and can cause some IRQ to be replayed in suspended > state (after recheckpoint). > > This patch moves TIF_RESTORE_TM handler to as late as possible, it also > forces the IRQ to be disabled, and it will continue to be until RFID, so, > no IRQ will be replayed at all. I.e, if trecheckpoint happens, it will RFID > to userspace. > > This makes TIF_RESTORE_TM a special case that should not be handled > similarly to the _TIF_USER_WORK_MASK. > > Since _TIF_RESTORE_TM is not part of _TIF_USER_WORK_MASK anymore, we > need to force system_call_exit to continue to leaves through > fast_exception_return, so, we add the flags together with > _TIF_USER_WORK_MASK at system_call_exist path. > > If this flag is set at system_call_exit, it means that recheckpoint > will be called, and doing it through fast_exception_return is the only > way to do so. 
> > Signed-off-by: Breno Leitao > --- > arch/powerpc/include/asm/thread_info.h | 2 +- > arch/powerpc/kernel/entry_64.S | 23 ++- > 2 files changed, 15 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/include/asm/thread_info.h > b/arch/powerpc/include/asm/thread_info.h > index 544cac0474cb..2835d60bc9ef 100644 > --- a/arch/powerpc/include/asm/thread_info.h > +++ b/arch/powerpc/include/asm/thread_info.h > @@ -139,7 +139,7 @@ void arch_setup_new_exec(void); > > #define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \ >_TIF_NOTIFY_RESUME | _TIF_UPROBE | \ > - _TIF_RESTORE_TM | _TIF_PATCH_PENDING | \ > + _TIF_PATCH_PENDING | \ >_TIF_FSCHECK) > #define _TIF_PERSYSCALL_MASK (_TIF_RESTOREALL|_TIF_NOERROR) > > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S > index 17484ebda66c..a86619edf29d 100644 > --- a/arch/powerpc/kernel/entry_64.S > +++ b/arch/powerpc/kernel/entry_64.S > @@ -255,7 +255,7 @@ system_call_exit: > > ld r9,TI_FLAGS(r12) > li r11,-MAX_ERRNO > - andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MAS > K|_TIF_PERSYSCALL_MASK) > + andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK| > _TIF_PERSYSCALL_MASK |_TIF_RESTORE_TM) > bne-.Lsyscall_exit_work > > andi. r0,r8,MSR_FP > @@ -784,14 +784,6 @@ _GLOBAL(ret_from_except_lite) > SCHEDULE_USER > b ret_from_except_lite > 2: > -#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > - andi. r0,r4,_TIF_USER_WORK_MASK & ~_TIF_RESTORE_TM > - bne 3f /* only restore TM if nothing else to do */ > - addir3,r1,STACK_FRAME_OVERHEAD > - bl restore_tm_state > - b restore > -3: > -#endif > bl save_nvgprs > /* >* Use a non volatile GPR to save and restore our thread_info flags > @@ -938,6 +930,19 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) >*/ > .globl fast_exception_return > fast_exception_return: > + > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > + CURRENT_THREAD_INFO(r4, r1) > + ld r4,TI_FLAGS(r4) > + andi. 
r0,r4,_TIF_RESTORE_TM > + beq 22f > + ld r4,_MSR(r1) /* TODO: MSR[!PR] shouldn't be here */ > + andi. r0,r4,MSR_PR > + beq 22f /* Skip if Kernel thread */ > + addir3,r1,STACK_FRAME_OVERHEAD > + bl restore_tm_state Calling out to C here is a bit concerning this late. The main thing you are calling out to is asm anyway (with a small C wrapper). We might want to adapt the asm this case, rather than the old model of getting it called from __switch_to(). We shouldn't need to call tm_reclaim/tm_recheckpoint from C anymore (especially if we use BUG_ON() rather than WARN_ON() in the cases already mentioned.). Same for patch 1. > +22: > +#endif > ld r3,_MSR(r1) > ld r4,_CTR(r1) > ld r0,_LINK(r1)
Re: [RFC PATCH 05/14] powerpc/tm: Refactor the __switch_to_tm code
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > __switch_to_tm is the function that switches between two tasks which might > have TM enabled. This function is clearly split in two parts, the task that > is leaving the CPU, known as 'prev' and the task that is being scheduled, > known as 'new'. > > It starts checking if the previous task had TM enable, if so, it increases > the load_tm (this is the only place we increment load_tm). It also saves > the TM SPRs here. > > If the previous task was scheduled out with a transaction active, the > failure cause needs to be updated, since it might contain the failure cause > that caused the exception, as TM_CAUSE_MISC. In this case, since there was > a context switch, overwrite the failure cause. > > If the previous task has overflowed load_tm, disable TM, putting the > facility save/restore lazy mechanism at lazy mode. > > Regarding the 'new' task being scheduled, restoring TM SPRs is enough if > the task had TM enabled when it was de-scheduled. (Checking if a > recheckpoint would be required will be done later, at restore_tm_state() > stage.) > > On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task() > functions are not used anymore, removing them. Is the above describing the previous functionality or the refactored functionality? > > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/process.c | 167 -- > 1 file changed, 78 insertions(+), 89 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index 1842fd96b123..73872f751b33 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -912,48 +912,6 @@ void tm_reclaim_current(uint8_t cause) > tm_reclaim_thread(>thread, cause); > } > > -static inline void tm_reclaim_task(struct task_struct *tsk) > -{ > - /* We have to work out if we're switching from/to a task that's in the > - * middle of a transaction. 
> - * > - * In switching we need to maintain a 2nd register state as > - * oldtask->thread.ckpt_regs. We tm_reclaim(oldproc); this saves the > - * checkpointed (tbegin) state in ckpt_regs, ckfp_state and > - * ckvr_state > - * > - * We also context switch (save) TFHAR/TEXASR/TFIAR in here. > - */ > - struct thread_struct *thr = >thread; > - > - if (!thr->regs) > - return; > - > - if (!MSR_TM_ACTIVE(thr->regs->msr)) > - goto out_and_saveregs; > - > - WARN_ON(tm_suspend_disabled); > - > - TM_DEBUG("--- tm_reclaim on pid %d (NIP=%lx, " > - "ccr=%lx, msr=%lx, trap=%lx)\n", > - tsk->pid, thr->regs->nip, > - thr->regs->ccr, thr->regs->msr, > - thr->regs->trap); > - > - tm_reclaim_thread(thr, TM_CAUSE_RESCHED); > - > - TM_DEBUG("--- tm_reclaim on pid %d complete\n", > - tsk->pid); > - > -out_and_saveregs: > - /* Always save the regs here, even if a transaction's not active. > - * This context-switches a thread's TM info SPRs. We do it here to > - * be consistent with the restore path (in recheckpoint) which > - * cannot happen later in _switch(). > - */ > - tm_save_sprs(thr); > -} > - > extern void __tm_recheckpoint(struct thread_struct *thread); > > void tm_recheckpoint(struct thread_struct *thread) > @@ -980,59 +938,91 @@ void tm_recheckpoint(struct thread_struct *thread) > local_irq_restore(flags); > } > > -static inline void tm_recheckpoint_new_task(struct task_struct *new) > +static void tm_change_failure_cause(struct task_struct *task, uint8_t cause) > { > - if (!cpu_has_feature(CPU_FTR_TM)) > - return; > - > - /* Recheckpoint the registers of the thread we're about to switch to. > - * > - * If the task was using FP, we non-lazily reload both the original and > - * the speculative FP register states. This is because the kernel > - * doesn't see if/when a TM rollback occurs, so if we take an FP > - * unavailable later, we are unable to determine which set of FP regs > - * need to be restored. 
> - */ > - if (!tm_enabled(new)) > - return; > - > - if (!MSR_TM_ACTIVE(new->thread.regs->msr)){ > - tm_restore_sprs(&new->thread); > - return; > - } > - /* Recheckpoint to restore original checkpointed register state. */ > - TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n", > - new->pid, new->thread.regs->msr); > - > - tm_recheckpoint(&new->thread); > - > - /* > - * The checkpointed state has been restored but the live state has > - * not, ensure all the math functionality is turned off to trigger > - * restore_math() to reload. > - */ > - new->thread.regs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX); > - > - TM_DEBUG("*** tm_recheckpoint of pid %d complete " > - "(kernel msr 0x%lx)\n", > -
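The switch-out flow the commit message describes (increment load_tm, save the TM SPRs, overwrite the failure cause for an active transaction, lazily disable TM on load_tm overflow) can be summarised in a small model. This is an illustrative C sketch only, not the kernel code; `struct task_model` and `TM_LOAD_LIMIT` are invented for the sketch (`TM_CAUSE_RESCHED` does match the kernel's uapi value):

```c
#include <stdbool.h>
#include <stdint.h>

#define TM_CAUSE_RESCHED 0xde  /* as in arch/powerpc/include/uapi/asm/tm.h */
#define TM_LOAD_LIMIT    0x7f  /* hypothetical overflow point for load_tm */

struct task_model {
	bool    tm_enabled;     /* MSR[TM] for the task */
	bool    tm_active;      /* scheduled out mid-transaction? */
	uint8_t load_tm;        /* lazy-enable counter */
	uint8_t failure_cause;  /* e.g. TM_CAUSE_MISC from an earlier trap */
	bool    sprs_saved;
};

/* Model of the 'prev' half of __switch_to_tm(), per the commit message. */
static void switch_out(struct task_model *prev)
{
	if (!prev->tm_enabled)
		return;

	prev->load_tm++;          /* the only place load_tm is incremented */
	prev->sprs_saved = true;  /* TM SPRs are saved on switch-out */

	if (prev->tm_active)      /* a context switch overwrites the cause */
		prev->failure_cause = TM_CAUSE_RESCHED;

	if (prev->load_tm > TM_LOAD_LIMIT)
		prev->tm_enabled = false;  /* back to fully lazy mode */
}
```

The 'new' half, per the message, only needs the TM SPRs restored; the recheckpoint decision is deferred to restore_tm_state().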
Re: [RFC PATCH 03/14] powerpc/tm: Recheckpoint when exiting from kernel
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > This is the only place we are going to recheckpoint now. Now the task > needs to have the TIF_RESTORE_TM flag set, which will get into > restore_tm_state() at the exception exit path, and execute the recheckpoint > depending on the MSR. > > Every time a task is required to recheckpoint, or just have the TM SPRs > restored, the TIF_RESTORE_TM flag should be set and the task MSR should > properly be in a transactional state, which will be checked by > restore_tm_state(). > > After the facility registers are recheckpointed, they are clobbered with > the values that were recheckpointed (and are now also in the checkpoint > area). Which facility registers? I don't understand this. > If a facility is enabled in the MSR that is being returned to user space, then > the facility registers need to be restored, otherwise userspace will see > invalid values. > > This patch simplifies restore_tm_state() to just restore the facility > registers that are enabled when returning to userspace, i.e. the MSR will > be the same that will be put into SRR1, which will be the MSR after RFID. 
> > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/process.c | 38 --- > 1 file changed, 26 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index 4d5322cfad25..c7e758a42b8f 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -1049,8 +1049,6 @@ static inline void __switch_to_tm(struct task_struct > *prev, > */ > void restore_tm_state(struct pt_regs *regs) > { > - unsigned long msr_diff; > - > /* > * This is the only moment we should clear TIF_RESTORE_TM as > * it is here that ckpt_regs.msr and pt_regs.msr become the same > @@ -1061,19 +1059,35 @@ void restore_tm_state(struct pt_regs *regs) > if (!MSR_TM_ACTIVE(regs->msr)) > return; > > - msr_diff = current->thread.ckpt_regs.msr & ~regs->msr; > - msr_diff &= MSR_FP | MSR_VEC | MSR_VSX; > + tm_enable(); > + /* The only place we recheckpoint */ > + tm_recheckpoint(&current->thread); > > - /* Ensure that restore_math() will restore */ > - if (msr_diff & MSR_FP) > - current->thread.load_fp = 1; > + /* > + * Restore the facility registers that were clobbered during > + * recheckpoint. > + */ > + if (regs->msr & MSR_FP) { > + /* > + * Using load_fp_state() instead of restore_fp() because we > + * want to force the restore, independent of > + * tsk->thread.load_fp. Same for other cases below. > + */ > + load_fp_state(&current->thread.fp_state); > + } > #ifdef CONFIG_ALTIVEC > - if (cpu_has_feature(CPU_FTR_ALTIVEC) && msr_diff & MSR_VEC) > - current->thread.load_vec = 1; > + if (cpu_has_feature(CPU_FTR_ALTIVEC) && regs->msr & MSR_VEC) > + load_vr_state(&current->thread.vr_state); > +#endif > +#ifdef CONFIG_VSX > + if (cpu_has_feature(CPU_FTR_VSX) && regs->msr & MSR_VSX) { > + /* > + * If VSX is enabled, it is expected that VEC and FP are > + * also enabled and already restored the full register set. > + * Cause a warning if that is not the case. 
> + */ > + WARN_ON(!(regs->msr & MSR_VEC) || !(regs->msr & MSR_FP)); } > #endif > - restore_math(regs); > - > - regs->msr |= msr_diff; > } > > #else
Re: [RFC PATCH 01/14] powerpc/tm: Reclaim transaction on kernel entry
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > This patch creates a macro that will be invoked on all entrances to the > kernel, so, in kernel space the transaction will be completely reclaimed > and not suspended anymore. > > This patchset checks if we are coming from PR, if not, skip. Remove the double negative here. ie "This skips when coming from the OS". or "Only happens when coming from PR" > This is useful > when there is an irq_replay() being called after recheckpoint, when the IRQ > is re-enabled. So we are talking about tm_recheckpoint on exit? On exit, we do: tm_recheckpoint -> irq_replay -> rfid? Why not swap the order of the recheckpoint and the replay to avoid this problem? > In this case, we do not want to re-reclaim and > re-recheckpoint, thus, if not coming from PR, skip it completely. More double negatives... Try: "if coming from the OS, skip" or "only do it when coming from userspace" > This macro does not touch the TM SPRs either; they will only be saved and > restored in the context switch code from now on. > This macro will return 0 or 1 in the r3 register, to specify whether a reclaim was > executed or not. 
> > This patchset is based on initial work done by Cyril: > https://patchwork.ozlabs.org/cover/875341/ > > Signed-off-by: Breno Leitao > --- > arch/powerpc/include/asm/exception-64s.h | 46 > arch/powerpc/kernel/entry_64.S | 10 ++ > arch/powerpc/kernel/exceptions-64s.S | 12 +-- > 3 files changed, 66 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/exception-64s.h > b/arch/powerpc/include/asm/exception-64s.h > index 3b4767ed3ec5..931a74ba037b 100644 > --- a/arch/powerpc/include/asm/exception-64s.h > +++ b/arch/powerpc/include/asm/exception-64s.h > @@ -36,6 +36,7 @@ > */ > #include > #include > +#include > > /* PACA save area offsets (exgen, exmc, etc) */ > #define EX_R90 > @@ -677,10 +678,54 @@ BEGIN_FTR_SECTION \ > beqlppc64_runlatch_on_trampoline; \ > END_FTR_SECTION_IFSET(CPU_FTR_CTRL) > > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM > + > +/* > + * This macro will reclaim a transaction if called when coming from userspace > + * (MSR.PR = 1) and if the transaction state is active or suspended. > + * > + * Since we don't want to reclaim when coming from kernel, for instance after > + * a trechkpt. or a IRQ replay, the live MSR is not useful and instead of it > the > + * MSR from thread stack is used to check the MSR.PR bit. > + * This macro has one argument which is the cause that will be used by > treclaim. > + * and returns in r3 '1' if the reclaim happens or '0' if reclaim didn't > + * happen, which is useful to know what registers were clobbered. > + * > + * NOTE: If addition registers are clobbered here, make sure the callee > + * function restores them before proceeding. > + */ > +#define TM_KERNEL_ENTRY(cause) > \ > + ld r3, _MSR(r1); \ > + andi. r0, r3, MSR_PR; /* Coming from userspace? */\ > + beq 1f; /* Skip reclaim if MSR.PR != 1 */ \ > + rldicl. r0, r3, (64-MSR_TM_LG), 63; /* Is TM enabled? */\ > + beq 1f; /* Skip reclaim if TM is off */ \ > + rldicl. 
r0, r3, (64-MSR_TS_LG), 62; /* Is active */ \ > + beq 1f; /* Skip reclaim if neither */ \ > + /* \ > + * If there is a transaction active or suspended, save the \ > + * non-volatile GPRs if they are not already saved. \ > + */ \ > + bl save_nvgprs;\ > + /* \ > + * Soft disable the IRQs, otherwise it might cause a CPU hang. \ > + */ \ > + RECONCILE_IRQ_STATE(r10, r11); \ > + li r3, cause; \ > + bl tm_reclaim_current; \ Are we ready to call out to C at this point in the exception handlers? > + li r3, 1; /* Reclaim happened */ \ > + b 2f; \ > +1: li r3, 0; /* Reclaim didn't happen */ \ > +2: > +#else > +#define TM_KERNEL_ENTRY(cause) > +#endif > + > #define EXCEPTION_COMMON(area, trap, label, hdlr, ret, additions) \ > EXCEPTION_PROLOG_COMMON(trap, area);\ > /* Volatile regs are potentially clobbered here */ \ > additions; \ > + TM_KERNEL_ENTRY(TM_CAUSE_MISC);
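The guard at the top of TM_KERNEL_ENTRY — skip unless we trapped from userspace with TM enabled and a transaction active or suspended — boils down to a three-way predicate. A hedged C model (bit positions invented for the sketch; the real checks are the andi./rldicl. sequence in the macro above):

```c
#include <stdint.h>

/* Bit positions invented for this model. */
#define MSR_PR      (1u << 0)  /* problem state, i.e. userspace */
#define MSR_TM      (1u << 1)  /* TM facility enabled */
#define MSR_TS_MASK (3u << 2)  /* transaction state: active or suspended */

/* Returns what TM_KERNEL_ENTRY leaves in r3: 1 if a reclaim happened. */
static int tm_kernel_entry_reclaims(uint32_t msr)
{
	if (!(msr & MSR_PR))       /* from the kernel (e.g. after a */
		return 0;          /* trecheckpoint or IRQ replay): skip */
	if (!(msr & MSR_TM))       /* TM facility off: skip */
		return 0;
	if (!(msr & MSR_TS_MASK))  /* no transaction in flight: skip */
		return 0;
	return 1;                  /* reclaim with the supplied cause */
}
```

Note the MSR tested is the one saved on the stack (`_MSR(r1)`), not the live MSR, exactly because the live MSR is meaningless for the "did we come from PR?" question once we are in the kernel.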
Re: [RFC PATCH 02/14] powerpc/tm: Reclaim on unavailable exception
On Tue, 2018-11-06 at 10:40 -0200, Breno Leitao wrote: > If there is a FP/VEC/Altivec touch inside a transaction and the facility is > disabled, then a facility unavailable exception is raised and ends up > calling {fp,vec,vsx}_unavailable_tm, which was reclaiming and > recheckpointing. > > This is not required anymore, since the checkpointed state was reclaimed in > the exception entrance, and it will be recheckpointed by restore_tm_state > later. > > Adding a WARN_ON() warning if we hit the _unavailable_tm() in suspended > mode, i.e, the reclaim was not executed somehow in the trap entrance, and > this is a bug. The "why" above is good and the important part of the commit but, Can you also add what you're doing? The above would suggest you're just removing some things but you're actually adding the TM_KERNEL_ENTRY() macro too. Mikey > > Signed-off-by: Breno Leitao > --- > arch/powerpc/include/asm/exception-64s.h | 4 > arch/powerpc/kernel/exceptions-64s.S | 3 +++ > arch/powerpc/kernel/traps.c | 22 -- > 3 files changed, 11 insertions(+), 18 deletions(-) > > diff --git a/arch/powerpc/include/asm/exception-64s.h > b/arch/powerpc/include/asm/exception-64s.h > index 931a74ba037b..80f01d5683c3 100644 > --- a/arch/powerpc/include/asm/exception-64s.h > +++ b/arch/powerpc/include/asm/exception-64s.h > @@ -711,6 +711,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL) >* Soft disable the IRQs, otherwise it might cause a CPU hang. 
\ >*/ \ > RECONCILE_IRQ_STATE(r10, r11); \ > + /* \ > + * Although this cause will be set initially, it might be \ > + * updated later, once the exception is better understood \ > + */ \ > li r3, cause; \ > bl tm_reclaim_current; \ > li r3, 1; /* Reclaim happened */ \ > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 5c685a46202d..47e05b09eed6 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -786,6 +786,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > 2: /* User process was in a transaction */ > bl save_nvgprs > + TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV) > RECONCILE_IRQ_STATE(r10, r11) > addir3,r1,STACK_FRAME_OVERHEAD > bl fp_unavailable_tm > @@ -1128,6 +1129,7 @@ BEGIN_FTR_SECTION > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > 2: /* User process was in a transaction */ > bl save_nvgprs > + TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV) > RECONCILE_IRQ_STATE(r10, r11) > addir3,r1,STACK_FRAME_OVERHEAD > bl altivec_unavailable_tm > @@ -1164,6 +1166,7 @@ BEGIN_FTR_SECTION > #ifdef CONFIG_PPC_TRANSACTIONAL_MEM > 2: /* User process was in a transaction */ > bl save_nvgprs > + TM_KERNEL_ENTRY(TM_CAUSE_FAC_UNAV) > RECONCILE_IRQ_STATE(r10, r11) > addir3,r1,STACK_FRAME_OVERHEAD > bl vsx_unavailable_tm > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c > index 9a86572db1ef..e74b735e974c 100644 > --- a/arch/powerpc/kernel/traps.c > +++ b/arch/powerpc/kernel/traps.c > @@ -1742,23 +1742,10 @@ void fp_unavailable_tm(struct pt_regs *regs) > * transaction, and probably retry but now with FP enabled. So the > * checkpointed FP registers need to be loaded. >*/ > - tm_reclaim_current(TM_CAUSE_FAC_UNAV); > - > - /* > - * Reclaim initially saved out bogus (lazy) FPRs to ckfp_state, and > - * then it was overwrite by the thr->fp_state by tm_reclaim_thread(). > - * > - * At this point, ck{fp,vr}_state contains the exact values we want to > - * recheckpoint. 
> - */ > + WARN_ON(MSR_TM_SUSPENDED(mfmsr())); > > /* Enable FP for the task: */ > current->thread.load_fp = 1; > - > - /* > - * Recheckpoint all the checkpointed ckpt, ck{fp, vr}_state registers. > - */ > - tm_recheckpoint(>thread); > } > > void altivec_unavailable_tm(struct pt_regs *regs) > @@ -1770,10 +1757,10 @@ void altivec_unavailable_tm(struct pt_regs *regs) > TM_DEBUG("Vector Unavailable trap whilst transactional at 0x%lx," >"MSR=%lx\n", >regs->nip, regs->msr); > - tm_reclaim_current(TM_CAUSE_FAC_UNAV); > + WARN_ON(MSR_TM_SUSPENDED(mfmsr())); > current->thread.load_vec = 1; > - tm_recheckpoint(>thread); > current->thread.used_vr = 1; > + > } > > void vsx_unavailable_tm(struct pt_regs *regs) > @@ -1792,12 +1779,11 @@ void vsx_unavailable_tm(struct pt_regs *regs) >
Re: [RFC PATCH v2 00/14] New TM Model
> In fact, I was the one that identified this performance degradation issue, > and reported to Adhemerval who kindly fixed it with > f0458cf4f9ff3d870c43b624e6dccaaf657d5e83. > > Anyway, I think we are safe here. FWIW Agreed. PPC_FEATURE2_HTM_NOSC should be preserved by this series. Mikey
Re: [PATCH v4] powerpc: Avoid code patching freed init sections
On Tue, 2018-10-02 at 23:35 +0200, Andreas Schwab wrote: > On Sep 14 2018, Michael Neuling wrote: > > > This stops us from doing code patching in init sections after they've > > been freed. > > This breaks booting on PowerBook6,7, crashing very early. Sorry. Can you try: http://patchwork.ozlabs.org/patch/977195/ Mikey
Re: [PATCH 2/2] powerpc/tm: Avoid SPR flush if TM is disabled
On Mon, 2018-10-01 at 16:47 -0300, Breno Leitao wrote: > There is a bug in the flush_tmregs_to_thread() function, where it forces > TM SPRs to be saved to the thread even if the TM facility is disabled. > > This bug could be reproduced using a simple test case: > > mtspr(SPRN_TEXASR, XX); > sleep until load_tm == 0 > cause a coredump > read SPRN_TEXASR in the coredump > > In this case, the coredump may contain an invalid SPR, because the > current code is flushing live SPRs (Used by the last thread with TM > active) into the current thread, overwriting the latest SPRs (which were > valid). > > This patch checks if TM is enabled for current task before > saving the SPRs, otherwise, the TM is lazily disabled and the thread > value is already up-to-date and could be used directly, and saving is > not required. Acked-by: Michael Neuling Breno, can you send your selftest upstream that also? Mikey > > Fixes: cd63f3cf1d5 ("powerpc/tm: Fix saving of TM SPRs in core dump") > Signed-off-by: Breno Leitao > --- > arch/powerpc/kernel/ptrace.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c > index 9667666eb18e..e0a2ee865032 100644 > --- a/arch/powerpc/kernel/ptrace.c > +++ b/arch/powerpc/kernel/ptrace.c > @@ -138,7 +138,12 @@ static void flush_tmregs_to_thread(struct task_struct > *tsk) > > if (MSR_TM_SUSPENDED(mfmsr())) { > tm_reclaim_current(TM_CAUSE_SIGNAL); > - } else { > + } else if (tm_enabled(tsk)) { > + /* > + * Only flush TM SPRs to the thread if TM was enabled, > + * otherwise (TM lazily disabled), the thread already > + * contains the latest SPR value > + */ > tm_enable(); > tm_save_sprs(&(tsk->thread)); > }
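The fixed flush_tmregs_to_thread() now distinguishes three cases instead of two. A sketch of the decision (illustrative C, not the kernel code):

```c
#include <stdbool.h>

enum flush_action {
	FLUSH_RECLAIM,    /* MSR TM-suspended: reclaim the live transaction */
	FLUSH_SAVE_SPRS,  /* task has TM enabled: live SPRs belong to it */
	FLUSH_NOTHING     /* TM lazily off: thread copy already up to date */
};

static enum flush_action flush_action(bool msr_tm_suspended,
				      bool task_tm_enabled)
{
	if (msr_tm_suspended)
		return FLUSH_RECLAIM;
	if (task_tm_enabled)
		return FLUSH_SAVE_SPRS;
	return FLUSH_NOTHING;
}
```

Before the fix, the FLUSH_NOTHING case incorrectly fell into FLUSH_SAVE_SPRS, copying whatever stale TM SPRs happened to be live on the CPU (possibly another task's) over the thread's valid values — the coredump corruption described above.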
Re: [PATCH] powerpc/lib: fix book3s/32 boot failure due to code patching
On Mon, 2018-10-01 at 12:21 +, Christophe Leroy wrote: > Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init > sections") accesses 'init_mem_is_free' flag too early, before the > kernel is relocated. This provokes early boot failure (before the > console is active). > > As it is not necessary to do this verification that early, this > patch moves the test into patch_instruction() instead of > __patch_instruction(). > > This modification also has the advantage of avoiding unnecessary > remappings. > > Fixes: 51c3c62b58b3 ("powerpc: Avoid code patching freed init sections") > Signed-off-by: Christophe Leroy Thanks Acked-by: Michael Neuling The original patch was also marked for stable so we should do the same here. Cc: sta...@vger.kernel.org # 4.13+ > --- > arch/powerpc/lib/code-patching.c | 20 > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code- > patching.c > index 6ae2777c220d..5ffee298745f 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -28,12 +28,6 @@ static int __patch_instruction(unsigned int *exec_addr, > unsigned int instr, > { > int err; > > - /* Make sure we aren't patching a freed init section */ > - if (init_mem_is_free && init_section_contains(exec_addr, 4)) { > - pr_debug("Skipping init section patching addr: 0x%px\n", > exec_addr); > - return 0; > - } > - > __put_user_size(instr, patch_addr, 4, err); > if (err) > return err; > @@ -148,7 +142,7 @@ static inline int unmap_patch_area(unsigned long addr) > return 0; > } > > -int patch_instruction(unsigned int *addr, unsigned int instr) > +static int do_patch_instruction(unsigned int *addr, unsigned int instr) > { > int err; > unsigned int *patch_addr = NULL; > @@ -188,12 +182,22 @@ int patch_instruction(unsigned int *addr, unsigned int > instr) > } > #else /* !CONFIG_STRICT_KERNEL_RWX */ > > -int patch_instruction(unsigned int *addr, unsigned int instr) > +static int 
do_patch_instruction(unsigned int *addr, unsigned int instr) > { > return raw_patch_instruction(addr, instr); > } > > #endif /* CONFIG_STRICT_KERNEL_RWX */ > + > +int patch_instruction(unsigned int *addr, unsigned int instr) > +{ > + /* Make sure we aren't patching a freed init section */ > + if (init_mem_is_free && init_section_contains(addr, 4)) { > + pr_debug("Skipping init section patching addr: 0x%px\n", addr); > + return 0; > + } > + return do_patch_instruction(addr, instr); > +} > NOKPROBE_SYMBOL(patch_instruction); > > int patch_branch(unsigned int *addr, unsigned long target, int flags)
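The net effect of the move is that the freed-init-section check now guards every path through patch_instruction(), including the raw !CONFIG_STRICT_KERNEL_RWX one, while running before any remapping work. A hedged sketch of the guard itself (plain C with integer addresses; the explicit range bounds stand in for init_section_contains()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the guard: skip patching a 4-byte instruction that lies
 * inside the init section once that section has been freed. */
static bool skip_patch(bool init_mem_is_free, uintptr_t addr,
		       uintptr_t init_start, uintptr_t init_end)
{
	return init_mem_is_free &&
	       addr >= init_start && addr + 4 <= init_end;
}
```

Patching proceeds normally for any address outside the init range, or at any time before free_initmem() flips the flag.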
Re: [v4] powerpc: Avoid code patching freed init sections
On Mon, 2018-10-01 at 13:25 +0200, Christophe LEROY wrote: > > Le 21/09/2018 à 13:59, Michael Ellerman a écrit : > > On Fri, 2018-09-14 at 01:14:11 UTC, Michael Neuling wrote: > > > This stops us from doing code patching in init sections after they've > > > been freed. > > > > > > In this chain: > > >kvm_guest_init() -> > > > kvm_use_magic_page() -> > > >fault_in_pages_readable() -> > > >__get_user() -> > > > __get_user_nocheck() -> > > >barrier_nospec(); > > > > > > We have a code patching location at barrier_nospec() and > > > kvm_guest_init() is an init function. This whole chain gets inlined, > > > so when we free the init section (hence kvm_guest_init()), this code > > > goes away and hence should no longer be patched. > > > > > > We seen this as userspace memory corruption when using a memory > > > checker while doing partition migration testing on powervm (this > > > starts the code patching post migration via > > > /sys/kernel/mobility/migration). In theory, it could also happen when > > > using /sys/kernel/debug/powerpc/barrier_nospec. > > > > > > cc: sta...@vger.kernel.org # 4.13+ > > > Signed-off-by: Michael Neuling > > > Reviewed-by: Nicholas Piggin > > > Reviewed-by: Christophe Leroy > > > > Applied to powerpc fixes, thanks. > > > > https://git.kernel.org/powerpc/c/51c3c62b58b357e8d35e4cc32f7b4e > > > > This patch breaks booting on my MPC83xx board (book3s/32) very early > (before console is active), provoking restart. > u-boot reports a checkstop reset at restart. > > Reverting this commit fixes the issue. > > The following patch fixes the issue as well, but I think it is not the > best solution. I still think the test should be in patch_instruction() > instead of being in __patch_instruction(), see my comment on v2 Arrh, sorry. Can you write this up as a real patch with a signed off by so mpe can take it? 
Mikey > > Christophe > > diff --git a/arch/powerpc/lib/code-patching.c > b/arch/powerpc/lib/code-patching.c > index 6ae2777..6192fda 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -29,7 +29,7 @@ static int __patch_instruction(unsigned int > *exec_addr, unsigned int instr, > int err; > > /* Make sure we aren't patching a freed init section */ > - if (init_mem_is_free && init_section_contains(exec_addr, 4)) { > + if (*PTRRELOC(&init_mem_is_free) && > init_section_contains(exec_addr, 4)) { > pr_debug("Skipping init section patching addr: > 0x%px\n", exec_addr); > return 0; > } > > > Christophe >
Re: [RFC PATCH 08/11] powerpc/tm: Do not reclaim on ptrace
On Sun, 2018-09-30 at 20:51 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/28/2018 02:36 AM, Michael Neuling wrote: > > > > > + WARN_ON(MSR_TM_SUSPENDED(mfmsr())); + + tm_enable(); + > > > > > tm_save_sprs(&(tsk->thread)); > > > > > > > > Do we need to check if TM was enabled in the task before saving the > > > > TM SPRs? > > > > > > > > What happens if TM was lazily off and hence we had someone else's TM > > > > SPRs in the CPU currently? Wouldn't this flush the wrong values to > > > > the task_struct? > > > > > > > > I think we need to check the processes MSR before doing this. > > > > > > Yes, it is a *very* good point, and I think we are vulnerable even > > > before this patch (in the current kernel). Take a look above, we are > > > calling tm_save_sprs() if MSR is not TM suspended independently if TM > > > is lazily off. > > > > I think you're right, we might already have an issue. There are some > > paths in here that don't check the userspace msr or any of the lazy tm > > enable. :( > > I was able to create a test case that reproduces this bug cleanly. > > The testcase basically sleeps for N cycles, and then segfaults. > > If N is high enough to have load_tm overflowed, then you see a corrupted > TEXASR value in the core dump file. If load_tm != 0 during the coredump, you > see the expected TEXASR value. > > I wrote a small bash that check for both cases. > > $ git clone https://github.com/leitao/texasr_corrupt.git > $ make check > > Anyway, I will propose a fix for this problem soon, since this whole patchset > may delay to get ready. Is it OK? Yeah, best to get a fix for this one out soon. Mikey
Re: [RFC PATCH 08/11] powerpc/tm: Do not reclaim on ptrace
On Thu, 2018-09-27 at 18:03 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/18/2018 02:36 AM, Michael Neuling wrote: > > On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote: > > > Make sure that we are not suspended on ptrace and that the registers were > > > already reclaimed. > > > > > > Since the data was already reclaimed, there is nothing to be done here > > > except to restore the SPRs. > > > > > > Signed-off-by: Breno Leitao > > > --- > > > arch/powerpc/kernel/ptrace.c | 10 -- > > > 1 file changed, 4 insertions(+), 6 deletions(-) > > > > > > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c > > > index 9667666eb18e..cf6ee9154b11 100644 > > > --- a/arch/powerpc/kernel/ptrace.c > > > +++ b/arch/powerpc/kernel/ptrace.c > > > @@ -136,12 +136,10 @@ static void flush_tmregs_to_thread(struct > > > task_struct > > > *tsk) > > > if ((!cpu_has_feature(CPU_FTR_TM)) || (tsk != current)) > > > return; > > > > > > - if (MSR_TM_SUSPENDED(mfmsr())) { > > > - tm_reclaim_current(TM_CAUSE_SIGNAL); > > > - } else { > > > - tm_enable(); > > > - tm_save_sprs(&(tsk->thread)); > > > - } > > > + WARN_ON(MSR_TM_SUSPENDED(mfmsr())); > > > + > > > + tm_enable(); > > > + tm_save_sprs(&(tsk->thread)); > > > > Do we need to check if TM was enabled in the task before saving the TM SPRs? > > > > What happens if TM was lazily off and hence we had someone else's TM SPRs in > > the > > CPU currently? Wouldn't this flush the wrong values to the task_struct? > > > > I think we need to check the processes MSR before doing this. > > Yes, it is a *very* good point, and I think we are vulnerable even before > this patch (in the current kernel). Take a look above, we are calling > tm_save_sprs() if MSR is not TM suspended independently if TM is lazily off. I think you're right, we might already have an issue. There are some paths in here that don't check the userspace msr or any of the lazy tm enable. :( Mikey > It shouldn't be hard to create a test case for it. 
I can try to call > ptrace(PTRACE_GETVRREGS) on a task that sleeps until TM is lazily disabled, > and compare whether the TM SPRs changed in the meantime (while doing HTM in parallel > to keep the HTM SPRs changing). Let's see if I can read another task's TM SPRs. > > Thank you. >
Re: [RFC PATCH 05/11] powerpc/tm: Function that updates the failure code
On Thu, 2018-09-27 at 17:58 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/17/2018 10:29 PM, Michael Neuling wrote: > > On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote: > > > Now the transaction reclaim happens much earlier in the trap handler, and > > > it is impossible to know precisely, at that early time, what should be set > > > as the failure cause for some specific cases, as, if the task will be > > > rescheduled, thus, the transaction abort case should be updated from > > > TM_CAUSE_MISC to TM_CAUSE_RESCHED, for example. > > > > Please add comments to where this is used (in the EXCEPTION_COMMON macro I > > think) > > that say this might happen. > > Is it OK if I comment it in the TM_KERNEL_ENTRY macro, since the failure cause > could be updated independently of the exception being executed, so, every call > to TM_KERNEL_ENTRY can have the cause overwritten. Sure. > I.e. it does not matter if the exception is a system call or a page fault, > the failure cause will be updated if there is a process reschedule after the > exception/syscall is handled. > > Thank you >
Re: [RFC PATCH 11/11] selftests/powerpc: Adapt the test
On Thu, 2018-09-27 at 17:57 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/18/2018 03:36 AM, Michael Neuling wrote: > > On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote: > > > The Documentation/powerpc/transactional_memory.txt says: > > > > > > "Syscalls made from within a suspended transaction are performed as > > > normal > > > and the transaction is not explicitly doomed by the kernel. However, > > > what the kernel does to perform the syscall may result in the > > > transaction > > > being doomed by the hardware." > > > > > > With this new TM mechanism, the syscall will continue to be executed if > > > the > > > syscall happens on a suspended syscall, but, the syscall will *fail* if > > > the > > > transaction is still active during the syscall invocation. > > > > Not sure I get this. This doesn't seem any different to before. > > > > An active (not suspended) transaction *will* result in the syscall failing > > and > > the transaction being doomed. > > > > A syscall in a suspended transaction should succeed and the transaction. > > I understand that a transaction will always be doomed if there is a > reclaim/checkpoint, thus, if the you make a system call inside a suspended > transaction, it will reclaim and recheckpoint, thus, dooming the transaction, > and failing it on the next RFID. So currently a syscall won't explicitly treclaim/trecheckpoint so it won't necessarily be doomed. With your new patches it will be. > So, the syscall executed in suspended mode may succeed, because it will not > be skipped (as in active mode), but the transaction will *always* fail, > because there was a reclaim and recheckpoint. > > > You might need to clean up the language. 
I try to use: > > >Active == transactional but not suspended (ie MSR[TS] = T) > >Suspended == suspended (ie MSR[TS] = S) > >Doomed == transaction to be rolled back at next opportunity (ie tcheck > > returns doomed) > > > > (note: the kernel MSR_TM_ACTIVE() macro is not consistent with this since > > it's > > MSR[TS] == S or T). > > Ok, I agree with this language as well. I might want to improve the code to > follow the same language all across the code. > > > > On the syscall path, if the transaction is active and not suspended, it > > > will call TM_KERNEL_ENTRY which will reclaim and recheckpoint the > > > transaction, thus, dooming the transaction on userspace return, with > > > failure code TM_CAUSE_SYSCALL. > > > > But the test below is on a suspend transaction? > > Sorry, I meant "suspended transaction" above instead of "transaction is > active and not suspended". > > > > > > This new model will break part of this test, but I understand that the > > > documentation above didn't guarantee that the syscall would succeed, and > > > it > > > will never succeed from now on. > > > > The syscall should pass in suspend (modulo the normal syscall checks). The > > transaction may fail as a result. > > Perfect, and if the transaction fails, the CPU will rollback the changes and > restore the checkpoint registers (replacing the r3 that contains the pid > value), thus, it will be as if the "getpid" system call didn't execute. No. If we are suspended, then we go back right after the sc. We don't get rolled back till the tresume. Mikey > For this test specifically, it assumes the syscall didn't execute if the > transaction failed. Take a look: > > FUNC_START(getppid_tm_suspended) > tbegin. > beq 1f > li r0, __NR_getppid > tsuspend. > sc > tresume. > tend. > blr > 1: > li r3, -1 > blr > > So, the test thinks the syscall failed because the transaction aborted. > > Anyway, I can try to improve this test other than removing this test, but I > am not sure how. 
> > Thank you >
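Mikey's rollback rule from this exchange — a doomed transaction only rolls back on the transition into Transactional state, so immediately at rfid when returning to T, but only at the later tresume when returning to S — can be stated as a pair of predicates. Illustrative C only:

```c
#include <stdbool.h>

enum ts_state { TS_NONE, TS_SUSPENDED, TS_TRANSACTIONAL };

/* A doomed transaction rolls back on the MSR[TS] -> T transition. */
static bool rolls_back_at_rfid(enum ts_state user_ts)
{
	return user_ts == TS_TRANSACTIONAL;  /* rfid lands directly in T */
}

static bool rolls_back_at_tresume(enum ts_state user_ts)
{
	return user_ts == TS_SUSPENDED;      /* S -> T happens at tresume */
}
```

So in getppid_tm_suspended above, the sc and the syscall itself complete while suspended and execution continues after the sc; the rollback that discards r3 only happens at the tresume.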
Re: [RFC PATCH 10/11] powerpc/tm: Set failure summary
On Thu, 2018-09-27 at 17:52 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/18/2018 02:50 AM, Michael Neuling wrote: > > On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote: > > > Since the transaction will be doomed with treckpt., the TEXASR[FS] > > > should be set, to reflect that the transaction is a failure. This patch > > > ensures it before recheckpointing, and remove changes from other places > > > that were calling recheckpoint. > > > > TEXASR[FS] should be set by the reclaim. > > Do you mean that the CPU should set TEXASR[FS] when treclaim is called, or, > that the tm_reclaim? > > Looking at the ISA, I didn't see TEXASR[FS] being set automatically when a > reclaim happens, treclaim in ISA talks about "TMRecordFailure(cause)" and that macro sets TEXASR[FS]=1. So yes treclaim always sets TEXASR[FS]=1. > although, I see it needs TEXASR[FS] to be set when executing a > trecheckpoint, otherwise it will cause a TM Bad Thing. Yep > That is why I am forcing TEXASR[FS]=1 to doom the transaction so we can > recheckpoint it, but it seems I understood this wrongly. You shouldn't need to. We do a bug_on() just before the trecheckpoint to make sure it's set, but you shouldn't need to set it (other than the signals case). > > I don't know why you'd need to set this > > explicitly in process.c. The only case is when the user supplies a bad > > signal > > context, but we should check that in the signals code, not process.c > > > > Hence I think this patch is wrong. > > > > Also, according to the architecture, TEXASR[FS] HAS TO BE SET on > > trecheckpoint > > otherwise you'll get a TM Bad Thing. You should say that rather than > > suggesting > > it's because the transaction is doomed. It's illegal to not do it. That's > > why we > > have this check in arch/powerpc/kernel/tm.S. > > When you say "HAS TO BE SET", do you mean that the hardware will set it and > we shouldn't care about this flag? Thus, if I am hitting EMIT_BUG_ENTRY, it > means my TEXASR was messed somehow? 
I'm just saying what you said before about the tm bad thing. We have to make sure it's set before we do the trecheckpoint otherwise we end up in the TM bad thing. The treclaim should handle this for us (but again the signals case need to be checked). Mikey
Re: [RFC PATCH 09/11] powerpc/tm: Do not restore default DSCR
On Thu, 2018-09-27 at 17:51 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/18/2018 02:41 AM, Michael Neuling wrote: > > On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote: > > > In the previous TM code, trecheckpoint was being executed in the middle of > > > an exception, thus, DSCR was being restored to default kernel DSCR value > > > after trecheckpoint was done. > > > > > > With this current patchset, trecheckpoint is executed just before getting > > > to userspace, at ret_from_except_lite, for example. Thus, we do not need > > > to > > > set default kernel DSCR value anymore, as we are leaving kernel space. It > > > is OK to keep the checkpointed DSCR value into the live SPR, mainly > > > because > > > the transaction is doomed and it will fail soon (after RFID), > > > > What if we are going back to a suspended transaction? It will remain live > > until > > userspace does a tresume > > Hmm, I understand that once we get in kernel space, and call > treclaim/trecheckpoint, the transaction will be doomed and it will abort and > rollback when we leave kernel space. I.e., if we call treclaim/trecheckpoint > in kernel space, the transaction will *always* abort just after RFID, so, a > possible tresume will never be executed. Is my understanding wrong? Your understanding is wrong. We don't roll back until MSR[TS] = T. There are two cases for the RFID. 1) if you RFID back to userspace that is transactional (ie MSR[TS] = T), then it will immediately rollback as soon as the RFID happens since we are going directly to T. 2) If you RFID back to userspace that is suspended (ie MSR[TS] = S), then it won't roll back until userspace does a tresume. It doesn't rollback until we go MSR[TS] = S -> T. > > > > > > so, > > > continuing with the pre-checkpointed DSCR value is what seems correct. > > > > Reading this description suggests this patch isn't really needed. Right? 
> > Maybe the description is not clear, but I understand this patch is required, > otherwise we will leave userspace with a default DSCR value. > > By the way, do you know if there is a change in DSCR inside a transaction, > will it be reverted if the transaction is aborted? Yes it will be reverted. We even have a selftest for it in tools/testing/selftests/powerpc/tm/tm-resched-dscr.c There are a bunch of checkpointed SPRs. From the arch: Checkpointed registers: The set of registers that are saved to the “checkpoint area” when a transaction is initiated, and restored upon transaction failure, is a subset of the architected register state, consisting of the General Purpose Registers, Floating-Point Registers, Vector Registers, Vector-Scalar Registers, and the following Special Registers and fields: CR fields other than CR0, XER, LR, CTR, FPSCR, AMR, PPR, VRSAVE, VSCR, DSCR, and TAR. Mikey
[PATCH] powerpc/tm: Reformat comments
The comments in this file don't conform to the coding style so take them to "Comment Formatting Re-Education Camp" Suggested-by: Michael "Camp Drill Sargent" Ellerman Signed-off-by: Michael Neuling --- arch/powerpc/kernel/tm.S | 49 +--- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index 50e5cff10d..c954aecd9b 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -92,7 +92,8 @@ _GLOBAL(tm_abort) blr EXPORT_SYMBOL_GPL(tm_abort); -/* void tm_reclaim(struct thread_struct *thread, +/* + * void tm_reclaim(struct thread_struct *thread, *uint8_t cause) * * - Performs a full reclaim. This destroys outstanding @@ -163,14 +164,16 @@ _GLOBAL(tm_reclaim) */ TRECLAIM(R4)/* Cause in r4 */ - /* GPRs */ - /* Stash the checkpointed r13 away in the scratch SPR and get the real + /* +* GPRs +* Stash the checkpointed r13 away in the scratch SPR and get the real * paca */ SET_SCRATCH0(r13) GET_PACA(r13) - /* Stash the checkpointed r1 away in paca tm_scratch and get the real + /* +* Stash the checkpointed r1 away in paca tm_scratch and get the real * stack pointer back */ std r1, PACATMSCRATCH(r13) @@ -195,7 +198,8 @@ _GLOBAL(tm_reclaim) addir7, r12, PT_CKPT_REGS /* Thread's ckpt_regs */ - /* Make r7 look like an exception frame so that we + /* +* Make r7 look like an exception frame so that we * can use the neat GPRx(n) macros. r7 is NOT a pt_regs ptr! */ subir7, r7, STACK_FRAME_OVERHEAD @@ -223,7 +227,8 @@ _GLOBAL(tm_reclaim) /* NIP */ mfspr r3, SPRN_TFHAR std r3, _NIP(r7)/* Returns to failhandler */ - /* The checkpointed NIP is ignored when rescheduling/rechkpting, + /* +* The checkpointed NIP is ignored when rescheduling/rechkpting, * but is used in signal return to 'wind back' to the abort handler. 
*/ @@ -246,12 +251,14 @@ _GLOBAL(tm_reclaim) std r3, THREAD_TM_TAR(r12) std r4, THREAD_TM_DSCR(r12) - /* MSR and flags: We don't change CRs, and we don't need to alter + /* +* MSR and flags: We don't change CRs, and we don't need to alter * MSR. */ - /* FPR/VR/VSRs + /* +* FPR/VR/VSRs * After reclaiming, capture the checkpointed FPRs/VRs. * * We enabled VEC/FP/VSX in the msr above, so we can execute these @@ -277,7 +284,8 @@ _GLOBAL(tm_reclaim) stfdfr0,FPSTATE_FPSCR(r7) - /* TM regs, incl TEXASR -- these live in thread_struct. Note they've + /* +* TM regs, incl TEXASR -- these live in thread_struct. Note they've * been updated by the treclaim, to explain to userland the failure * cause (aborted). */ @@ -313,7 +321,7 @@ _GLOBAL(tm_reclaim) blr - /* + /* * void __tm_recheckpoint(struct thread_struct *thread) * - Restore the checkpointed register state saved by tm_reclaim *when we switch_to a process. @@ -329,7 +337,8 @@ _GLOBAL(__tm_recheckpoint) std r2, STK_GOT(r1) stdur1, -TM_FRAME_SIZE(r1) - /* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. + /* +* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. * This is used for backing up the NVGPRs: */ SAVE_NVGPRS(r1) @@ -338,7 +347,8 @@ _GLOBAL(__tm_recheckpoint) addir7, r3, PT_CKPT_REGS/* Thread's ckpt_regs */ - /* Make r7 look like an exception frame so that we + /* +* Make r7 look like an exception frame so that we * can use the neat GPRx(n) macros. r7 is now NOT a pt_regs ptr! */ subir7, r7, STACK_FRAME_OVERHEAD @@ -407,14 +417,15 @@ restore_gprs: REST_NVGPRS(r7) /* GPR14-31 */ - /* Load up PPR and DSCR here so we don't run with user values for long -*/ + /* Load up PPR and DSCR here so we don't run with user values for long */ mtspr SPRN_DSCR, r5 mtspr SPRN_PPR, r6 - /* Do final sanity check on TEXASR to make sure FS is set. Do this + /* +* Do final sanity check on TEXASR to make sure FS is set. 
Do this * here before we load up the userspace r1 so any bugs we hit will get - * a call chain */ + * a call chain + */ mfspr r5, SPRN_TEXASR
Re: [PATCH] powerpc/tm: Avoid possible userspace r1 corruption on reclaim
On Tue, 2018-09-25 at 22:00 +1000, Michael Ellerman wrote: > Michael Neuling writes: > > Current we store the userspace r1 to PACATMSCRATCH before finally > > saving it to the thread struct. > > > > In theory an exception could be taken here (like a machine check or > > SLB miss) that could write PACATMSCRATCH and hence corrupt the > > userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but > > others do. > > > > We've never actually seen this happen but it's theoretically > > possible. Either way, the code is fragile as it is. > > > > This patch saves r1 to the kernel stack (which can't fault) before we > > turn MSR[RI] back on. PACATMSCRATCH is still used but only with > > MSR[RI] off. We then copy r1 from the kernel stack to the thread > > struct once we have MSR[RI] back on. > > > > Suggested-by: Breno Leitao > > Signed-off-by: Michael Neuling > > --- > > arch/powerpc/kernel/tm.S | 8 +++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S > > index 701b0f5b09..8207816a1e 100644 > > --- a/arch/powerpc/kernel/tm.S > > +++ b/arch/powerpc/kernel/tm.S > > @@ -178,6 +178,12 @@ _GLOBAL(tm_reclaim) > > > > std r11, GPR11(r1) /* Temporary stash */ > > > > + /* Move r1 to kernel stack in case PACATMSCRATCH is used once > > +* we turn on RI > > +*/ > > I see we still need to send you to Comment Formatting Re-Education Camp. The rest of that file has that style, so I'm just keeping with that. I can submit a patch later to fix them all up. > I rewrote it a bit too, to hopefully be clearer? > > /* >* Move the saved user r1 to the kernel stack in case PACATMSCRATCH is >* clobbered by an exception once we turn on MSR_RI below. >*/ > ld r11, PACATMSCRATCH(r13) > std r11, GPR1(r1) Yeah, that's clearer... thanks. Mikey
[PATCH] powerpc/tm: Avoid possible userspace r1 corruption on reclaim
Currently we store the userspace r1 to PACATMSCRATCH before finally saving it to the thread struct. In theory an exception could be taken here (like a machine check or SLB miss) that could write PACATMSCRATCH and hence corrupt the userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but others do. We've never actually seen this happen but it's theoretically possible. Either way, the code is fragile as it is. This patch saves r1 to the kernel stack (which can't fault) before we turn MSR[RI] back on. PACATMSCRATCH is still used but only with MSR[RI] off. We then copy r1 from the kernel stack to the thread struct once we have MSR[RI] back on. Suggested-by: Breno Leitao Signed-off-by: Michael Neuling --- arch/powerpc/kernel/tm.S | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index 701b0f5b09..8207816a1e 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -178,6 +178,12 @@ _GLOBAL(tm_reclaim) std r11, GPR11(r1) /* Temporary stash */ + /* Move r1 to kernel stack in case PACATMSCRATCH is used once +* we turn on RI +*/ + ld r11, PACATMSCRATCH(r13) + std r11, GPR1(r1) + /* Store r13 away so we can free up the scratch SPR for the * SLB fault handler (needed once we start access the * thread_struct) @@ -214,7 +220,7 @@ _GLOBAL(tm_reclaim) SAVE_GPR(8, r7) /* user r8 */ SAVE_GPR(9, r7) /* user r9 */ SAVE_GPR(10, r7)/* user r10 */ - ld r3, PACATMSCRATCH(r13) /* user r1 */ + ld r3, GPR1(r1)/* user r1 */ ld r4, GPR7(r1)/* user r7 */ ld r5, GPR11(r1) /* user r11 */ ld r6, GPR12(r1) /* user r12 */ -- 2.17.1
Re: [PATCH] powerpc/tm: Fix userspace r13 corruption
On Mon, 2018-09-24 at 11:32 -0300, Breno Leitao wrote: > Hi Mikey, > > On 09/24/2018 04:27 AM, Michael Neuling wrote: > > When we treclaim we store the userspace checkpointed r13 to a scratch > > SPR and then later save the scratch SPR to the user thread struct. > > > > Unfortunately, this doesn't work as accessing the user thread struct > > can take an SLB fault and the SLB fault handler will write the same > > scratch SPRG that now contains the userspace r13. > > > > To fix this, we store r13 to the kernel stack (which can't fault) > > before we access the user thread struct. > > > > Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen > > as a random userspace segfault with r13 looking like a kernel address. > > > > Signed-off-by: Michael Neuling > > Reviewed-by: Breno Leitao > > I am wondering if the same problem could not happen with r1 as well, since r1 > is kept in the paca->tm_scratch after MSR[RI] is enabled, in a code similar > to this: > > std r1, PACATMSCRATCH(r13) > .. > li r11, MSR_RI > mtmsrd r11, 1 > > ld r3, PACATMSCRATCH(r13) /* user r1 */ > std r3, GPR1(r7) > > There might be a corruption if the exception that is raised just after > MSR[RI] is enabled somehow exits through 'fast_exception_return' (being > called by most of the exception handlers as I understand) which will > overwrite paca->tm_scratch with the upcoming MSR (aka SRR1) just before the > SRR1, as: > > ld r3,_MSR(r1) > > ... > std r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */ > > > It seems that slb_miss_common and instruction_access_slb does not go through > this path, but handle_page_fault calls > ret_from_except_lite->fast_exception_return, which might cause this > 'possible' issue. Yeah, good catch! I think we are ok but it's delicate. I'll store it in the kernel stack and avoid PACATMSCRATCH before I turn RI on. This look ok? 
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index 701b0f5b09..ff781b2852 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -178,6 +178,12 @@ _GLOBAL(tm_reclaim) std r11, GPR11(r1) /* Temporary stash */ + /* Move r1 to kernel stack in case PACATMSCRATCH is used once +* we turn on RI +*/ + ld r11, PACATMSCRATCH(r13) + std r11, GPR1(r1) + /* Store r13 away so we can free up the scratch SPR for the * SLB fault handler (needed once we start access the * thread_struct) @@ -214,7 +220,7 @@ _GLOBAL(tm_reclaim) SAVE_GPR(8, r7) /* user r8 */ SAVE_GPR(9, r7) /* user r9 */ SAVE_GPR(10, r7)/* user r10 */ - ld r3, PACATMSCRATCH(r13) /* user r1 */ + ld r3, GPR1(r1)/* user r1 */ ld r4, GPR7(r1)/* user r7 */ ld r5, GPR11(r1) /* user r11 */ ld r6, GPR12(r1) /* user r12 */
[PATCH] powerpc/tm: Fix userspace r13 corruption
When we treclaim we store the userspace checkpointed r13 to a scratch SPR and then later save the scratch SPR to the user thread struct. Unfortunately, this doesn't work as accessing the user thread struct can take an SLB fault and the SLB fault handler will write the same scratch SPRG that now contains the userspace r13. To fix this, we store r13 to the kernel stack (which can't fault) before we access the user thread struct. Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen as a random userspace segfault with r13 looking like a kernel address. Signed-off-by: Michael Neuling --- arch/powerpc/kernel/tm.S | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S index 6bffbc5aff..701b0f5b09 100644 --- a/arch/powerpc/kernel/tm.S +++ b/arch/powerpc/kernel/tm.S @@ -176,13 +176,20 @@ _GLOBAL(tm_reclaim) std r1, PACATMSCRATCH(r13) ld r1, PACAR1(r13) - /* Store the PPR in r11 and reset to decent value */ std r11, GPR11(r1) /* Temporary stash */ + /* Store r13 away so we can free up the scratch SPR for the +* SLB fault handler (needed once we start access the +* thread_struct) +*/ + GET_SCRATCH0(r11) + std r11, GPR13(r1) + /* Reset MSR RI so we can take SLB faults again */ li r11, MSR_RI mtmsrd r11, 1 + /* Store the PPR in r11 and reset to decent value */ mfspr r11, SPRN_PPR HMT_MEDIUM @@ -211,7 +218,7 @@ _GLOBAL(tm_reclaim) ld r4, GPR7(r1)/* user r7 */ ld r5, GPR11(r1) /* user r11 */ ld r6, GPR12(r1) /* user r12 */ - GET_SCRATCH0(8) /* user r13 */ + ld r8, GPR13(r1) /* user r13 */ std r3, GPR1(r7) std r4, GPR7(r7) std r5, GPR11(r7) -- 2.17.1
Re: [PATCH v8 1/3] powerpc: Detect the presence of big-cores via "ibm,thread-groups"
On Fri, 2018-09-21 at 22:47 +0530, Gautham R Shenoy wrote: > Hello Michael, > > On Fri, Sep 21, 2018 at 01:02:45PM +1000, Michael Neuling wrote: > > This doesn't compile for me with: > > > > arch/powerpc/kernel/smp.c: In function ‘smp_prepare_cpus’: > > arch/powerpc/kernel/smp.c:812:23: error: ‘tg.threads_per_group’ may be used > > uninitialized in this function [-Werror=maybe-uninitialized] > > struct thread_groups tg; > >^ > > arch/powerpc/kernel/smp.c:812:23: error: ‘tg.nr_groups’ may be used > > uninitialized in this function [-Werror=maybe-uninitialized] > > cc1: all warnings being treated as errors > > /home/mikey/src/linux-ozlabs/scripts/Makefile.build:305: recipe for target > > 'arch/powerpc/kernel/smp.o' failed > > > > I couldn't get this error with gcc 4.8.5 and 8.1.1 with > pseries_defconfig and powernv_defconfig with CONFIG_PPC_WERROR=y. I'm using 5.4.0 which is in Ubuntu 16.04 (LTS). > Does the following the following delta patch make it work? Yep that fixes it, thanks. Mikey