Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >> modules and add the command family to the white list of NVDIMM command >> sets. Also advertise support for ND_CMD_CALL for the dimm >> command mask and implement necessary scaffolding in the module to >> handle ND_CMD_CALL ioctl and PDSM requests that we receive. ... >> + * >> + * Payload Version: >> + * >> + * A 'payload_version' field is present in PDSM header that indicates a >> specific >> + * version of the structure present in PDSM Payload for a given PDSM >> command. >> + * This provides backward compatibility in case the PDSM Payload structure >> + * evolves and different structures are supported by 'papr_scm' and >> 'libndctl'. >> + * >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the >> version >> + * of the payload struct it supports via 'payload_version' field. The >> 'papr_scm' >> + * module when servicing the PDSM envelope checks the 'payload_version' and >> then >> + * uses 'payload struct version' == MIN('payload_version field', >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM. >> + * After servicing the PDSM, 'papr_scm' puts the negotiated version of >> payload >> + * struct in returned 'payload_version' field. > > FWIW many people believe using a size rather than version is more sustainable. > It is expected that new payload structures are larger (more features) than the > previous payload structure. > > I can't find references at the moment though. I think clone_args is a good modern example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88 cheers
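For reference, a minimal sketch of the size-based scheme being suggested here, modeled on clone_args. The struct and field names below are hypothetical; copy_struct_from_user() is the kernel helper that implements the copy-and-check-trailing-bytes pattern:

#include <linux/uaccess.h>

/* Hypothetical payload: v2 grew by one trailing field, so newer
 * structures are strictly larger than older ones.
 */
struct pdsm_example_payload {
	__u64 flags;		/* present since v1 */
	__u64 new_feature;	/* added in v2 */
};

static int pdsm_example_copy_in(struct pdsm_example_payload *kpayload,
				const void __user *upayload, size_t usize)
{
	/*
	 * Copies min(usize, sizeof(*kpayload)) bytes and zero-fills the
	 * rest; returns -E2BIG only when userspace passed extra non-zero
	 * bytes, i.e. actually used a field this kernel doesn't know about.
	 */
	return copy_struct_from_user(kpayload, sizeof(*kpayload),
				     upayload, usize);
}

With this scheme the caller sends sizeof(its struct) instead of a version number, and both sides can grow independently.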
Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour Reviewed-by: Ram Pai > --- > arch/powerpc/kvm/book3s_hv.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..6ad1a3b14300 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); > + /* > + * Even if that call is made by the Ultravisor, the SRR1 value > + * is the guest context one, with the secure bit clear as it has > + * not yet been secured. So we can't check it here. > + */ Frankly speaking, the comment above, when read in isolation, i.e. without the deleted code above, feels out of place. The reasoning for the change is anyway captured in the changelog. So, I think, we should delete this comment. Also the comment above assumes the Ultravisor will call H_SVM_INIT_ABORT with the SRR1(S) bit not set, which may or may not be true. Regardless of who calls H_SVM_INIT_ABORT and how, we should just call kvmppc_h_svm_init_abort() and let it deal with the complexities. RP
Re: linux-next: manual merge of the rcu tree with the powerpc tree
Hi all, On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell wrote: > > Today's linux-next merge of the rcu tree got a conflict in: > > arch/powerpc/kernel/traps.c > > between commit: > > 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > > from the powerpc tree and commit: > > 187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()") > > from the rcu tree. > > I fixed it up (I used the powerpc tree version for now) and can carry the > fix as necessary. This is now fixed as far as linux-next is concerned, > but any non trivial conflicts should be mentioned to your upstream > maintainer when your tree is submitted for merging. You may also want > to consider cooperating with the maintainer of the conflicting tree to > minimise any particularly complex conflicts. This is now a conflict between the powerpc commit and commit 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") from the tip tree. I assume that the rcu and tip trees are sharing some patches (but not commits) :-( -- Cheers, Stephen Rothwell
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Wed, May 20, 2020 at 06:52:21PM -0500, Li Yang wrote: > On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > Hm, looking at this code, I see a few other things that need to be > > fixed: > > > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion > >on the length test (understandably, a little-endian system has never run > >this code since it's ppc specific), but it's still wrong: > > > > if (firmware->header.length != fw->size) { > > > >compare to the firmware loader: > > > > length = be32_to_cpu(hdr->length); > > > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the > >per-microcode offsets, so the uploader might send data outside the > >firmware buffer. Perhaps: > > We do validate the CRC for each microcode, it is unlikely the CRC > check can pass if the offset or length is not correct. But you are > probably right that it will be safer to check the boundary and fail Right, but a malicious firmware file could still match CRC but trick the kernel code. > quicker before we actually start the CRC check. Will you come up with > a formal patch or you want us to deal with it? It sounds like Gustavo will be sending one, though I don't think either of us have the hardware to test it with, so if you could do that part, that would be great! :) -- Kees Cook
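For the ucc_uart.c issue in point 1, the fix is presumably just adding the byte-swap before the comparison; a minimal sketch (the error handling around it is assumed, not taken from the driver):

/* header.length is stored big-endian in the firmware blob, so convert
 * it before comparing against the host-endian fw->size.
 */
if (be32_to_cpu(firmware->header.length) != fw->size) {
	/* hypothetical error handling */
	return -EINVAL;
}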
Re: [PATCH v3] powerpc/64: Option to use ELF V2 ABI for big-endian kernels
Excerpts from Segher Boessenkool's message of May 18, 2020 10:19 pm: > Hi! > > On Mon, May 18, 2020 at 04:35:22PM +1000, Michael Ellerman wrote: >> Nicholas Piggin writes: >> > Provide an option to build big-endian kernels using the ELF V2 ABI. This >> > works >> > on GCC and clang (since about 2014). It is not officially supported by >> > the >> > GNU toolchain, but it can give big-endian kernels some useful advantages >> > of >> > the V2 ABI (e.g., less stack usage). > >> This doesn't build with clang: >> >> /tmp/aesp8-ppc-dad624.s: Assembler messages: >> /tmp/aesp8-ppc-dad624.s: Error: .size expression for >> aes_p8_set_encrypt_key does not evaluate to a constant > > What does this assembler code that clang doesn't swallow look like? Is > that valid code? Etc. The .size directive calculation is .text - .opd because the preprocessor isn't passing -mabi=elfv2, which makes our _GLOBAL function entry macro use the v1 convention. I guess I got the 64-bit vdso wrong as well, it should remain in ELFv1. Thanks, Nick
[PATCH v3 7/7] powerpc: Add POWER10 architected mode
PVR value of 0x0F06 means we are arch v3.1 compliant (i.e. POWER10). This is used by phyp and kvm when booting as a pseries guest to detect the presence of new P10 features and to enable the appropriate hwcap and facility bits. Signed-off-by: Alistair Popple Signed-off-by: Cédric Le Goater --- arch/powerpc/include/asm/cputable.h | 15 -- arch/powerpc/include/asm/mmu.h| 1 + arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/kernel/cpu_setup_power.S | 20 -- arch/powerpc/kernel/cputable.c| 30 +++ arch/powerpc/kernel/prom_init.c | 12 +-- 6 files changed, 73 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 1559dbf72842..bac2252c839e 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -468,6 +468,17 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \ CPU_FTR_P9_TM_HV_ASSIST | \ CPU_FTR_P9_TM_XER_SO_BUG) +#define CPU_FTRS_POWER10 (CPU_FTR_LWSYNC | \ + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\ + CPU_FTR_MMCRA | CPU_FTR_SMT | \ + CPU_FTR_COHERENT_ICACHE | \ + CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ + CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ + CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ + CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ + CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ + CPU_FTR_ARCH_31) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -486,14 +497,14 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index cf2a08bfd5cd..f4ac25d4df05 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -122,6 +122,7 @@ #define MMU_FTRS_POWER7MMU_FTRS_POWER6 #define MMU_FTRS_POWER8MMU_FTRS_POWER6 #define MMU_FTRS_POWER9MMU_FTRS_POWER6 +#define MMU_FTRS_POWER10 MMU_FTRS_POWER6 #define MMU_FTRS_CELL MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \ MMU_FTR_CI_LARGE_PAGE #define MMU_FTRS_PA6T MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \ diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 94e3fd54f2c8..324a13351749 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -117,6 +117,7 @@ extern int of_read_drc_info_cell(struct property **prop, #define OV1_PPC_2_07 0x01/* set if we support PowerPC 2.07 */ #define OV1_PPC_3_00 0x80/* set if we support PowerPC 3.00 */ +#define OV1_PPC_3_10x40/* set if we support PowerPC 3.1 */ /* Option vector 2: Open Firmware options supported */ #define OV2_REAL_MODE 0x20/* set if we want OF in real mode */ diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S index a460298c7ddb..f3730cf904fa 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -91,10 +91,15 @@ _GLOBAL(__restore_cpu_power8) mtlrr11 blr +_GLOBAL(__setup_cpu_power10) + mflrr11 + bl __init_FSCR_P10 + b 1f + _GLOBAL(__setup_cpu_power9) mflrr11 bl __init_FSCR - bl __init_PMU +1: bl __init_PMU bl __init_hvmode_206 mtlrr11 beqlr @@ -116,10 +121,15 @@ _GLOBAL(__setup_cpu_power9) mtlrr11 blr +_GLOBAL(__restore_cpu_power10) + mflrr11 + bl __init_FSCR_P10 + b 1f + _GLOBAL(__restore_cpu_power9) mflrr11 bl __init_FSCR - bl __init_PMU
[PATCH v3 6/7] powerpc/dt_cpu_ftrs: Add MMA feature
Matrix multiply assist (MMA) is a new feature added to ISAv3.1 and POWER10. Support on powernv can be selected via a firmware CPU device tree feature which enables it via a PCR bit. Signed-off-by: Alistair Popple --- arch/powerpc/include/asm/reg.h| 3 ++- arch/powerpc/kernel/dt_cpu_ftrs.c | 17 - 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index dd20af367b57..88e6c78100d9 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -481,7 +481,8 @@ #define PCR_VEC_DIS (__MASK(63-0)) /* Vec. disable (bit NA since POWER8) */ #define PCR_VSX_DIS (__MASK(63-1)) /* VSX disable (bit NA since POWER8) */ #define PCR_TM_DIS (__MASK(63-2)) /* Trans. memory disable (POWER8) */ -#define PCR_HIGH_BITS (PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS) +#define PCR_MMA_DIS (__MASK(63-3)) /* Matrix-Multiply Accelerator */ +#define PCR_HIGH_BITS (PCR_MMA_DIS | PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS) /* * These bits are used in the function kvmppc_set_arch_compat() to specify and * determine both the compatibility level which we want to emulate and the diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 93c340906aad..0a41fce34165 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -75,6 +75,7 @@ static struct { u64 lpcr_clear; u64 hfscr; u64 fscr; + u64 pcr; } system_registers; static void (*init_pmu_registers)(void); @@ -102,7 +103,7 @@ static void __restore_cpu_cpufeatures(void) if (hv_mode) { mtspr(SPRN_LPID, 0); mtspr(SPRN_HFSCR, system_registers.hfscr); - mtspr(SPRN_PCR, PCR_MASK); + mtspr(SPRN_PCR, system_registers.pcr); } mtspr(SPRN_FSCR, system_registers.fscr); @@ -555,6 +556,18 @@ static int __init feat_enable_large_ci(struct dt_cpu_feature *f) return 1; } +static int __init feat_enable_mma(struct dt_cpu_feature *f) +{ + u64 pcr; + + feat_enable(f); + pcr = mfspr(SPRN_PCR); + pcr &= ~PCR_MMA_DIS; + mtspr(SPRN_PCR, pcr); + + return 1; +} + struct dt_cpu_feature_match { const char *name; int (*enable)(struct dt_cpu_feature *f); @@ -629,6 +642,7 @@ static struct dt_cpu_feature_match __initdata {"vector-binary16", feat_enable, 0}, {"wait-v3", feat_enable, 0}, {"prefix-instructions", feat_enable, 0}, + {"matrix-multiply-assist", feat_enable_mma, 0}, }; static bool __initdata using_dt_cpu_ftrs; @@ -779,6 +793,7 @@ static void __init cpufeatures_setup_finished(void) system_registers.lpcr = mfspr(SPRN_LPCR); system_registers.hfscr = mfspr(SPRN_HFSCR); system_registers.fscr = mfspr(SPRN_FSCR); + system_registers.pcr = mfspr(SPRN_PCR); pr_info("final cpu/mmu features = 0x%016lx 0x%08x\n", cur_cpu_spec->cpu_features, cur_cpu_spec->mmu_features); -- 2.20.1
[PATCH v3 5/7] powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
Prefix instructions have their own FSCR bit which needs to be enabled via a CPU feature. The kernel will save the FSCR for problem state but it needs to be enabled initially. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 677190f70cac..93c340906aad 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -628,6 +628,7 @@ static struct dt_cpu_feature_match __initdata {"vector-binary128", feat_enable, 0}, {"vector-binary16", feat_enable, 0}, {"wait-v3", feat_enable, 0}, + {"prefix-instructions", feat_enable, 0}, }; static bool __initdata using_dt_cpu_ftrs; -- 2.20.1
[PATCH v3 4/7] powerpc/dt_cpu_ftrs: Set current thread fscr bits
Setting the FSCR bit directly in the SPR only sets it during the initial boot and early init of the kernel but not for the init process or any subsequent kthreads. This is because the thread_struct for those is copied from the current thread_struct setup at boot which doesn't reflect any changes made to the FSCR during cpu feature detection. This patch ensures the current thread state is updated to match once feature detection is complete. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index b5e21264d168..677190f70cac 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -170,6 +170,7 @@ static int __init feat_try_enable_unknown(struct dt_cpu_feature *f) u64 fscr = mfspr(SPRN_FSCR); fscr |= 1UL << f->fscr_bit_nr; mtspr(SPRN_FSCR, fscr); + current->thread.fscr |= 1UL << f->fscr_bit_nr; } else { /* Does not have a known recipe */ return 0; @@ -205,6 +206,7 @@ static int __init feat_enable(struct dt_cpu_feature *f) u64 fscr = mfspr(SPRN_FSCR); fscr |= 1UL << f->fscr_bit_nr; mtspr(SPRN_FSCR, fscr); + current->thread.fscr |= 1UL << f->fscr_bit_nr; } } -- 2.20.1
[PATCH v3 3/7] powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
On powernv, hardware support for ISAv3.1 is advertised via a cpu feature bit in the device tree. This patch enables the associated HWCAP bit if the device tree indicates ISAv3.1 is available. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 36bc0d5c4f3a..b5e21264d168 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -26,6 +26,7 @@ /* Device-tree visible constants follow */ #define ISA_V2_07B 2070 #define ISA_V3_0B 3000 +#define ISA_V3_1 3100 #define USABLE_PR (1U << 0) #define USABLE_OS (1U << 1) @@ -654,6 +655,11 @@ static void __init cpufeatures_setup_start(u32 isa) cur_cpu_spec->cpu_features |= CPU_FTR_ARCH_300; cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_ARCH_3_00; } + + if (isa >= 3100) { + cur_cpu_spec->cpu_features |= CPU_FTR_ARCH_31; + cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_ARCH_3_1; + } } static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f) -- 2.20.1
[PATCH v3 2/7] powerpc: Add support for ISA v3.1
Newer ISA versions are enabled by clearing all bits in the PCR associated with previous versions of the ISA. Enable ISA v3.1 support by updating the PCR mask to include ISA v3.0. This ensures all PCR bits corresponding to earlier architecture versions get cleared, thereby enabling ISA v3.1 if supported by the hardware. Signed-off-by: Alistair Popple --- arch/powerpc/include/asm/cputable.h | 1 + arch/powerpc/include/asm/reg.h | 3 ++- arch/powerpc/kvm/book3s_hv.c| 3 --- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index c67b94f3334c..1559dbf72842 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -213,6 +213,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000) #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001000000000000) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002000000000000) +#define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004000000000000) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 054f8a71d686..dd20af367b57 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -487,10 +487,11 @@ * determine both the compatibility level which we want to emulate and the * compatibility level which the host is capable of emulating. */ +#define PCR_ARCH_300 0x10 /* Architecture 3.00 */ #define PCR_ARCH_207 0x8 /* Architecture 2.07 */ #define PCR_ARCH_206 0x4 /* Architecture 2.06 */ #define PCR_ARCH_205 0x2 /* Architecture 2.05 */ -#define PCR_LOW_BITS (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205) +#define PCR_LOW_BITS (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205 | PCR_ARCH_300) #define PCR_MASK ~(PCR_HIGH_BITS | PCR_LOW_BITS) /* PCR Reserved Bits */ #define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */ #define SPRN_TLBINDEXR 0x154 /* P7 TLB control register */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index db07199f0977..a0cf17597838 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -344,9 +344,6 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) vcpu->arch.pvr = pvr; } -/* Dummy value used in computing PCR value below */ -#define PCR_ARCH_300 (PCR_ARCH_207 << 1) - static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long host_pcr_bit = 0, guest_pcr_bit = 0; -- 2.20.1
[PATCH v3 1/7] powerpc: Add new HWCAP bits
POWER10 introduces two new architectural features - ISAv3.1 and matrix multiply assist (MMA) instructions. Userspace detects the presence of these features via two HWCAP bits introduced in this patch. These bits have been agreed to by the compiler and binutils team. According to ISAv3.1 MMA is an optional feature and software that makes use of it should first check for availability via this HWCAP bit and use alternate code paths if unavailable. Signed-off-by: Alistair Popple Tested-by: Michael Neuling --- arch/powerpc/include/uapi/asm/cputable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/cputable.h b/arch/powerpc/include/uapi/asm/cputable.h index 540592034740..731b97dc2d15 100644 --- a/arch/powerpc/include/uapi/asm/cputable.h +++ b/arch/powerpc/include/uapi/asm/cputable.h @@ -50,6 +50,8 @@ #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */ #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */ #define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM w/out suspended state */ +#define PPC_FEATURE2_ARCH_3_1 0x00040000 /* ISA 3.1 */ +#define PPC_FEATURE2_MMA 0x00020000 /* Matrix Multiply Assist */ /* * IMPORTANT! -- 2.20.1
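A minimal userspace sketch of the recommended check, using glibc's getauxval(); the fallback defines mirror the values added above, for toolchains whose headers predate them:

#include <stdio.h>
#include <sys/auxv.h>

#ifndef PPC_FEATURE2_ARCH_3_1
#define PPC_FEATURE2_ARCH_3_1	0x00040000
#endif
#ifndef PPC_FEATURE2_MMA
#define PPC_FEATURE2_MMA	0x00020000
#endif

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	if (hwcap2 & PPC_FEATURE2_MMA)
		printf("MMA available, use the accelerated path\n");
	else
		printf("MMA unavailable, use the fallback path\n");
	return 0;
}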
[PATCH v3 0/7] Base support for POWER10
This series brings together several previously posted patches required for POWER10 support and introduces a new patch enabling POWER10 architected mode to enable booting as a POWER10 pseries guest. It includes support for enabling facilities related to MMA and prefix instructions. Changes from v2: - s/accumulate/assist/ - Updated commit messages Changes from v1: - Two bug-fixes to enable prefix and MMA on pseries - Minor updates to commit message wording - Fixes a build error when CONFIG_KVM_BOOK3S_64_HV is enabled Alistair Popple (7): powerpc: Add new HWCAP bits powerpc: Add support for ISA v3.1 powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected powerpc/dt_cpu_ftrs: Set current thread fscr bits powerpc/dt_cpu_ftrs: Enable Prefixed Instructions powerpc/dt_cpu_ftrs: Add MMA feature powerpc: Add POWER10 architected mode arch/powerpc/include/asm/cputable.h | 16 +++-- arch/powerpc/include/asm/mmu.h | 1 + arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/include/asm/reg.h | 6 +++-- arch/powerpc/include/uapi/asm/cputable.h | 2 ++ arch/powerpc/kernel/cpu_setup_power.S| 20 ++-- arch/powerpc/kernel/cputable.c | 30 arch/powerpc/kernel/dt_cpu_ftrs.c| 26 +++- arch/powerpc/kernel/prom_init.c | 12 -- arch/powerpc/kvm/book3s_hv.c | 3 --- 10 files changed, 105 insertions(+), 12 deletions(-) -- 2.20.1
Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()
On Wed, May 20, 2020 at 11:27 AM Daniel Jordan wrote: > > Deferred struct page init is a significant bottleneck in kernel boot. > Optimizing it maximizes availability for large-memory systems and allows > spinning up short-lived VMs as needed without having to leave them > running. It also benefits bare metal machines hosting VMs that are > sensitive to downtime. In projects such as VMM Fast Restart[1], where > guest state is preserved across kexec reboot, it helps prevent > application and network timeouts in the guests. > > Multithread to take full advantage of system memory bandwidth. > > The maximum number of threads is capped at the number of CPUs on the > node because speedups always improve with additional threads on every > system tested, and at this phase of boot, the system is otherwise idle > and waiting on page init to finish. > > Helper threads operate on section-aligned ranges to both avoid false > sharing when setting the pageblock's migrate type and to avoid accessing > uninitialized buddy pages, though max order alignment is enough for the > latter. > > The minimum chunk size is also a section. There was benefit to using > multiple threads even on relatively small memory (1G) systems, and this > is the smallest size that the alignment allows. > > The time (milliseconds) is the slowest node to initialize since boot > blocks until all nodes finish. intel_pstate is loaded in active mode > without hwp and with turbo enabled, and intel_idle is active as well. > > Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal) > 2 nodes * 26 cores * 2 threads = 104 CPUs > 384G/node = 768G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 4078.0 ( 9.0) -- 1779.0 ( 8.7) >2% ( 1) 1.4% 4021.3 ( 2.9) 3.4% 1717.7 ( 7.8) > 12% ( 6) 35.1% 2644.7 ( 35.3) 80.8%341.0 ( 35.5) > 25% ( 13) 38.7% 2498.0 ( 34.2) 89.1%193.3 ( 32.3) > 37% ( 19) 39.1% 2482.0 ( 25.2) 90.1%175.3 ( 31.7) > 50% ( 26) 38.8% 2495.0 ( 8.7) 89.1%193.7 ( 3.5) > 75% ( 39) 39.2% 2478.0 ( 21.0) 90.3%172.7 ( 26.7) > 100% ( 52) 40.0% 2448.0 ( 2.0) 91.9%143.3 ( 1.5) > > Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, bare metal) > 1 node * 16 cores * 2 threads = 32 CPUs > 192G/node = 192G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 1996.0 ( 18.0) -- 1104.3 ( 6.7) >3% ( 1) 1.4% 1968.0 ( 3.0) 2.7% 1074.7 ( 9.0) > 12% ( 4) 40.1% 1196.0 ( 22.7) 72.4%305.3 ( 16.8) > 25% ( 8) 47.4% 1049.3 ( 17.2) 84.2%174.0 ( 10.6) > 37% ( 12) 48.3% 1032.0 ( 14.9) 86.8%145.3 ( 2.5) > 50% ( 16) 48.9% 1020.3 ( 2.5) 88.0%133.0 ( 1.7) > 75% ( 24) 49.1% 1016.3 ( 8.1) 88.4%128.0 ( 1.7) > 100% ( 32) 49.4% 1009.0 ( 8.5) 88.6%126.3 ( 0.6) > > Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal) > 2 nodes * 18 cores * 2 threads = 72 CPUs > 128G/node = 256G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 1682.7 ( 6.7) --630.0 ( 4.6) >3% ( 1) 0.4% 1676.0 ( 2.0) 0.7%625.3 ( 3.2) > 12% ( 4) 25.8% 1249.0 ( 1.0) 68.2%200.3 ( 1.2) > 25% ( 9) 30.0% 1178.0 ( 5.2) 79.7%128.0 ( 3.5) > 37% ( 13) 30.6% 1167.7 ( 3.1) 81.3%117.7 ( 1.2) > 50% ( 18) 30.6% 1167.3 ( 2.3) 81.4%117.0 ( 1.0) > 75% ( 27) 31.0% 1161.3 ( 4.6) 82.5%110.0 ( 6.9) > 100% ( 36) 32.1% 1142.0 ( 3.6) 85.7% 90.0 ( 1.0) > > AMD EPYC 7551 32-Core Processor (Zen, kvm guest) > 1 node * 8 cores * 2 threads = 16 CPUs > 64G/node = 64G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup 
time_ms (stdev) > ( 0) -- 1003.7 ( 16.6) --243.3 ( 8.1) >6% ( 1) 1.4%990.0 ( 4.6) 1.2%240.3 ( 1.5) > 12% ( 2) 11.4%889.3 ( 16.7) 44.5%135.0 ( 3.0) > 25% ( 4) 16.8%835.3 ( 9.0)
Re: Endless soft-lockups for compiling workload since next-20200519
On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote: > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > > Just a heads up. Repeatedly compiling kernels for a while would trigger > > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > > .config are in, > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've > not seen anything like that myself. Let me go have a look. > > > In as far as the logs are readable (they're a wrapped mess, please don't > do that!), they contain very little useful, as is typical with IPIs :/ > > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > > flush_smp_call_function_queue+0x1fa/0x2e0 So I've tried to think of a race that could produce that and here is the only thing I could come up with. It's a bit complicated unfortunately: CPU 0 CPU 1 - - tick { trigger_load_balance() { raise_softirq(SCHED_SOFTIRQ); //but nohz_flags(0) = 0 } kick_ilb() { atomic_fetch_or(, nohz_flags(0)) softirq() {#VMEXIT or anything that could stop a CPU for a while run_rebalance_domain() { nohz_idle_balance() { atomic_andnot(NOHZ_KICK_MASK, nohz_flag(0)) } } } } // schedule nohz_newidle_balance() { kick_ilb() { // pick current CPU atomic_fetch_or(, nohz_flags(0)) #VMENTER smp_call_function_single_async() { smp_call_function_single_async() { // verified csd->flags != CSD_LOCK // verified csd->flags != CSD_LOCK csd->flags = CSD_LOCK csd->flags = CSD_LOCK //execute in place //queue and send IPI csd->flags = 0 nohz_csd_func() } } } IPI { flush_smp_call_function_queue() { csd_unlock() { WARN_ON(csd->flags != CSD_LOCK) <-! The root cause here would be that trigger_load_balance() unconditionally raises the softirq. And I have to confess I'm not clear why since the softirq is essentially a no-op when nohz_flags() is 0. Thanks.
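To make the hazard concrete, a minimal sketch (simplified names, not the actual scheduler code) of why two racing callers of smp_call_function_single_async() on the same csd trip that WARN_ON:

#include <linux/smp.h>

static call_single_data_t nohz_csd;	/* one csd per target CPU, as above */

static void kick_remote(int cpu)
{
	/*
	 * smp_call_function_single_async() rejects a csd whose lock flag is
	 * still set, but the check and the subsequent lock are not a single
	 * atomic step. In the interleaving above both CPUs observe the csd
	 * unlocked, both lock it, and both run it; the second completion
	 * then unlocks an already-unlocked csd, which is what the WARN_ON
	 * in flush_smp_call_function_queue() catches.
	 */
	smp_call_function_single_async(cpu, &nohz_csd);
}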
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Wed, May 20, 2020 at 06:52:21PM -0500, Li Yang wrote: > On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > > > On Mon, May 18, 2020 at 05:19:04PM -0500, Gustavo A. R. Silva wrote: > > > The current codebase makes use of one-element arrays in the following > > > form: > > > > > > struct something { > > > int length; > > > u8 data[1]; > > > }; > > > > > > struct something *instance; > > > > > > instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL); > > > instance->length = size; > > > memcpy(instance->data, source, size); > > > > > > but the preferred mechanism to declare variable-length types such as > > > these ones is a flexible array member[1][2], introduced in C99: > > > > > > struct foo { > > > int stuff; > > > struct boo array[]; > > > }; > > > > > > By making use of the mechanism above, we will get a compiler warning > > > in case the flexible array does not occur last in the structure, which > > > will help us prevent some kind of undefined behavior bugs from being > > > inadvertently introduced[3] to the codebase from now on. So, replace > > > the one-element array with a flexible-array member. > > > > > > Also, make use of the new struct_size() helper to properly calculate the > > > size of struct qe_firmware. > > > > > > This issue was found with the help of Coccinelle and, audited and fixed > > > _manually_. > > > > > > [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html > > > [2] https://github.com/KSPP/linux/issues/21 > > > [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") > > > > > > Signed-off-by: Gustavo A. R. Silva > > > --- > > > drivers/soc/fsl/qe/qe.c | 4 ++-- > > > include/soc/fsl/qe/qe.h | 2 +- > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > > index 447146861c2c1..2df20d6f85fa4 100644 > > > --- a/drivers/soc/fsl/qe/qe.c > > > +++ b/drivers/soc/fsl/qe/qe.c > > > @@ -448,7 +448,7 @@ int qe_upload_firmware(const struct qe_firmware > > > *firmware) > > > unsigned int i; > > > unsigned int j; > > > u32 crc; > > > - size_t calc_size = sizeof(struct qe_firmware); > > > + size_t calc_size; > > > size_t length; > > > const struct qe_header *hdr; > > > > > > @@ -480,7 +480,7 @@ int qe_upload_firmware(const struct qe_firmware > > > *firmware) > > > } > > > > > > /* Validate the length and check if there's a CRC */ > > > - calc_size += (firmware->count - 1) * sizeof(struct qe_microcode); > > > + calc_size = struct_size(firmware, microcode, firmware->count); > > > > > > for (i = 0; i < firmware->count; i++) > > > /* > > > diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h > > > index e282ac01ec081..3feddfec9f87d 100644 > > > --- a/include/soc/fsl/qe/qe.h > > > +++ b/include/soc/fsl/qe/qe.h > > > @@ -307,7 +307,7 @@ struct qe_firmware { > > > u8 revision;/* The microcode version revision */ > > > u8 padding; /* Reserved, for alignment */ > > > u8 reserved[4]; /* Reserved, for future expansion */ > > > - } __attribute__ ((packed)) microcode[1]; > > > + } __packed microcode[]; > > > /* All microcode binaries should be located here */ > > > /* CRC32 should be located here, after the microcode binaries */ > > > } __attribute__ ((packed)); > > > -- > > > 2.26.2 > > > > > > > Hm, looking at this code, I see a few other things that need to be > > fixed: > > > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion > >on the length test (understandably, a little-endian system has never run > >this code since it's ppc specific), but it's still wrong: > 
> > > if (firmware->header.length != fw->size) { > > > >compare to the firmware loader: > > > > length = be32_to_cpu(hdr->length); > > > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the > >per-microcode offsets, so the uploader might send data outside the > >firmware buffer. Perhaps: > > We do validate the CRC for each microcode, it is unlikely the CRC > check can pass if the offset or length is not correct. But you are > probably right that it will be safer to check the boundary and fail > quicker before we actually start the CRC check. Will you come up with > a formal patch or you want us to deal with it? > Li, I will send a proper patch for this. Thanks -- Gustavo > > > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > index 447146861c2c..c4e0bc452f03 100644 > > --- a/drivers/soc/fsl/qe/qe.c > > +++ b/drivers/soc/fsl/qe/qe.c > > @@ -451,6 +451,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > size_t calc_size = sizeof(struct qe_firmware); > > size_t length; > > const struct qe_header *hdr; > > + void *firmware_end; > > > > if (!firmware) { > > printk(KERN_E
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > On Mon, May 18, 2020 at 05:19:04PM -0500, Gustavo A. R. Silva wrote: > > The current codebase makes use of one-element arrays in the following > > form: > > > > struct something { > > int length; > > u8 data[1]; > > }; > > > > struct something *instance; > > > > instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL); > > instance->length = size; > > memcpy(instance->data, source, size); > > > > but the preferred mechanism to declare variable-length types such as > > these ones is a flexible array member[1][2], introduced in C99: > > > > struct foo { > > int stuff; > > struct boo array[]; > > }; > > > > By making use of the mechanism above, we will get a compiler warning > > in case the flexible array does not occur last in the structure, which > > will help us prevent some kind of undefined behavior bugs from being > > inadvertently introduced[3] to the codebase from now on. So, replace > > the one-element array with a flexible-array member. > > > > Also, make use of the new struct_size() helper to properly calculate the > > size of struct qe_firmware. > > > > This issue was found with the help of Coccinelle and, audited and fixed > > _manually_. > > > > [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html > > [2] https://github.com/KSPP/linux/issues/21 > > [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") > > > > Signed-off-by: Gustavo A. R. Silva > > --- > > drivers/soc/fsl/qe/qe.c | 4 ++-- > > include/soc/fsl/qe/qe.h | 2 +- > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > index 447146861c2c1..2df20d6f85fa4 100644 > > --- a/drivers/soc/fsl/qe/qe.c > > +++ b/drivers/soc/fsl/qe/qe.c > > @@ -448,7 +448,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > unsigned int i; > > unsigned int j; > > u32 crc; > > - size_t calc_size = sizeof(struct qe_firmware); > > + size_t calc_size; > > size_t length; > > const struct qe_header *hdr; > > > > @@ -480,7 +480,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > } > > > > /* Validate the length and check if there's a CRC */ > > - calc_size += (firmware->count - 1) * sizeof(struct qe_microcode); > > + calc_size = struct_size(firmware, microcode, firmware->count); > > > > for (i = 0; i < firmware->count; i++) > > /* > > diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h > > index e282ac01ec081..3feddfec9f87d 100644 > > --- a/include/soc/fsl/qe/qe.h > > +++ b/include/soc/fsl/qe/qe.h > > @@ -307,7 +307,7 @@ struct qe_firmware { > > u8 revision;/* The microcode version revision */ > > u8 padding; /* Reserved, for alignment */ > > u8 reserved[4]; /* Reserved, for future expansion */ > > - } __attribute__ ((packed)) microcode[1]; > > + } __packed microcode[]; > > /* All microcode binaries should be located here */ > > /* CRC32 should be located here, after the microcode binaries */ > > } __attribute__ ((packed)); > > -- > > 2.26.2 > > > > Hm, looking at this code, I see a few other things that need to be > fixed: > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion >on the length test (understandably, a little-endian system has never run >this code since it's ppc specific), but it's still wrong: > > if (firmware->header.length != fw->size) { > >compare to the firmware loader: > > length = be32_to_cpu(hdr->length); > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the >per-microcode offsets, so the uploader might send 
data outside the >firmware buffer. Perhaps: We do validate the CRC for each microcode, it is unlikely the CRC check can pass if the offset or length is not correct. But you are probably right that it will be safer to check the boundary and fail quicker before we actually start the CRC check. Will you come up with a formal patch or you want us to deal with it? > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > index 447146861c2c..c4e0bc452f03 100644 > --- a/drivers/soc/fsl/qe/qe.c > +++ b/drivers/soc/fsl/qe/qe.c > @@ -451,6 +451,7 @@ int qe_upload_firmware(const struct qe_firmware *firmware) > size_t calc_size = sizeof(struct qe_firmware); > size_t length; > const struct qe_header *hdr; > + void *firmware_end; > > if (!firmware) { > printk(KERN_ERR "qe-firmware: invalid pointer\n"); > @@ -491,19 +492,39 @@ int qe_upload_firmware(const struct qe_firmware > *firmware) > calc_size += sizeof(__be32) * > be32_to_cpu(firmware->microcode[i].count); > > - /* Validate the length */ > + /* Validate total length */ > if (length != calc_size + sizeof(__be32)) { >
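For reference, a self-contained sketch of the bounds check under discussion. Only microcode[i].count is confirmed by the quoted code; code_offset and the error handling are assumptions, so treat this as illustrative rather than the eventual patch:

/* Reject any microcode whose [offset, offset + length) range falls
 * outside the firmware blob, before any CRC work is done.
 */
static int qe_check_ucode_bounds(const struct qe_firmware *firmware,
				 size_t fw_size)
{
	unsigned int i;

	for (i = 0; i < firmware->count; i++) {
		u32 offset = be32_to_cpu(firmware->microcode[i].code_offset);
		u32 len = sizeof(__be32) *
			be32_to_cpu(firmware->microcode[i].count);

		/* ordered to avoid integer overflow in offset + len */
		if (offset > fw_size || len > fw_size - offset)
			return -EINVAL;
	}
	return 0;
}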
Re: [PATCH v6 09/11] PCI: layerscape: Add EP mode support for ls1088a and ls2088a
On Sat, 14 Mar 2020 11:30:36 +0800, Xiaowei Bao wrote: > Add PCIe EP mode support for ls1088a and ls2088a, there are some > difference between LS1 and LS2 platform, so refactor the code of > the EP driver. > > Signed-off-by: Xiaowei Bao > --- > v2: > - This is a new patch for supporting the ls1088a and ls2088a platform. > v3: > - Adjust the some struct assignment order in probe function. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 72 > +++--- > 1 file changed, 53 insertions(+), 19 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH v6 07/11] PCI: layerscape: Modify the way of getting capability with different PEX
On Sat, 14 Mar 2020 11:30:34 +0800, Xiaowei Bao wrote: > The different PCIe controllers in one board may have different > MSI or MSI-X capabilities, so change the way of getting the MSI > capability to make it more flexible. > > Signed-off-by: Xiaowei Bao > --- > v2: > - Remove the repeated assignment code. > v3: > - Use ep_func msi_cap and msix_cap to decide the msi_capable and > msix_capable of pci_epc_features struct. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 31 > +++--- > 1 file changed, 23 insertions(+), 8 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH v6 04/11] PCI: designware-ep: Modify MSI and MSIX CAP way of finding
On Sat, Mar 14, 2020 at 11:30:31AM +0800, Xiaowei Bao wrote: > Each PF of EP device should have it's own MSI or MSIX capabitily s/it's/its/ > struct, so create a dw_pcie_ep_func struct and remove the msi_cap > and msix_cap to this struct from dw_pcie_ep, and manage the PFs > with a list. > > Signed-off-by: Xiaowei Bao > --- > v3: > - This is a new patch, to fix the issue of MSI and MSIX CAP way of >finding. > v4: > - Correct some word of commit message. > v5: > - No change. > v6: > - Fix up the compile error. > > drivers/pci/controller/dwc/pcie-designware-ep.c | 135 > +--- > drivers/pci/controller/dwc/pcie-designware.h| 18 +++- > 2 files changed, 134 insertions(+), 19 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > b/drivers/pci/controller/dwc/pcie-designware-ep.c > index 933bb89..fb915f2 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > @@ -19,6 +19,19 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) > pci_epc_linkup(epc); > } > > +struct dw_pcie_ep_func * > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no) > +{ > + struct dw_pcie_ep_func *ep_func; > + > + list_for_each_entry(ep_func, &ep->func_list, list) { > + if (ep_func->func_no == func_no) > + return ep_func; > + } > + > + return NULL; > +} > + > static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no) > { > unsigned int func_offset = 0; > @@ -59,6 +72,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum > pci_barno bar) > __dw_pcie_ep_reset_bar(pci, func_no, bar, 0); > } > > +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8 func_no, > + u8 cap_ptr, u8 cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; > + u8 cap_id, next_cap_ptr; > + u16 reg; > + > + if (!cap_ptr) > + return 0; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr); > + cap_id = (reg & 0x00ff); > + > + if (cap_id > PCI_CAP_ID_MAX) > + return 0; > + > + if (cap_id == cap) > + return cap_ptr; > + > + next_cap_ptr = (reg & 0xff00) >> 8; > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8 func_no, u8 > cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; > + u8 next_cap_ptr; > + u16 reg; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST); > + next_cap_ptr = (reg & 0x00ff); > + > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no, > struct pci_epf_header *hdr) > { > @@ -246,13 +300,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 > func_no) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > > - if (!ep->msi_cap) > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > + > + if (!ep_func->msi_cap) > return -EINVAL; if (!ep_func || !ep_func->msi_cap) return -EINVAL; > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > if (!(val & PCI_MSI_FLAGS_ENABLE)) > return -EINVAL; > @@ -268,13 +327,18 @@ static int 
dw_pcie_ep_set_msi(struct pci_epc *epc, u8 > func_no, u8 interrupts) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > + > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > > - if (!ep->msi_cap) > + if (!ep_func->msi_cap) > return -EINVAL; Same here. > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > val &= ~PCI_MSI_FLAGS_QMASK; > val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK; > @@ -291,13 +355,18 @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 > func_no) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > + > + ep_func = dw_pcie_ep_get_func_fr
Re: [PATCH v6 01/11] PCI: designware-ep: Add multiple PFs support for DWC
On Sat, 14 Mar 2020 11:30:28 +0800, Xiaowei Bao wrote: > Add multiple PFs support for DWC. Since different PFs have different > config spaces, we use the func_conf_select callback function to access > each PF's config space; vendors using the DWC IP core that intend to > support the multiple-PFs feature need to implement this callback. > > Signed-off-by: Xiaowei Bao > Acked-by: Gustavo Pimentel > --- > v2: > - Remove duplicate redundant code. > - Reimplement the PF config space access way. > v3: > - Integrate duplicate code for func_select. > - Move PCIE_ATU_FUNC_NUM(pf) (pf << 20) to ((pf) << 20). > - Add the comments for func_conf_select function. > v4: > - Correct the commit message. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pcie-designware-ep.c | 123 > > drivers/pci/controller/dwc/pcie-designware.c| 59 > drivers/pci/controller/dwc/pcie-designware.h| 18 +++- > 3 files changed, 142 insertions(+), 58 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, 20 May 2020 18:51:10 +0200 Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > Why not check vcpu->kvm->arch.secure_guest then? > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour > --- > arch/powerpc/kvm/book3s_hv.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..eb1f96cb7b72 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); or at least put a comment to explain why H_SVM_INIT_ABORT doesn't have the same sanity check as the other SVM hcalls. > + ret = kvmppc_h_svm_init_abort(vcpu->kvm); > break; > > default:
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Thanks for reviewing this patch Ira. My responses below: Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >> modules and add the command family to the white list of NVDIMM command >> sets. Also advertise support for ND_CMD_CALL for the dimm >> command mask and implement necessary scaffolding in the module to >> handle ND_CMD_CALL ioctl and PDSM requests that we receive. >> >> The layout of the PDSM request as we expect from libnvdimm/libndctl is >> described in newly introduced uapi header 'papr_scm_pdsm.h' which >> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used >> to communicate the PDSM request via member >> 'nd_pkg_papr_scm->nd_command' and size of payload that need to be >> sent/received for servicing the PDSM. >> >> A new function is_cmd_valid() is implemented that reads the args to >> papr_scm_ndctl() and performs sanity tests on them. A new function >> papr_scm_service_pdsm() is introduced and is called from >> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL >> command from libnvdimm. >> >> Cc: Dan Williams >> Cc: Michael Ellerman >> Cc: "Aneesh Kumar K . V" >> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> Resend: >> * None >> >> v6..v7 : >> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h >> [Mpe]. >> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe]. >> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h >> [Mpe]. >> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe] >> >> v5..v6 : >> * Changed the usage of the term DSM to PDSM to distinguish it from the >> ACPI term [ Dan Williams ] >> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct >> to reflect the new terminology. >> * Updated the patch description and title to reflect the new terminology. >> * Squashed patch to introduce new command family in 'ndctl.h' with >> this patch [ Dan Williams ] >> * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 >> [ Dan Williams ] >> * Removed redundant license text from the papr_scm_psdm.h file. >> [ Dan Williams ] >> * s/envelop/envelope/ at various places [ Dan Williams ] >> * Added '__packed' attribute to command package header to gaurd >> against different compiler adding paddings between the fields. 
>> [ Dan Williams] >> * Converted various pr_debug to dev_debug [ Dan Williams ] >> >> v4..v5 : >> * None >> >> v3..v4 : >> * None >> >> v2..v3 : >> * Updated the patch prefix to 'ndctl/uapi' [Aneesh] >> >> v1..v2 : >> * None >> --- >> arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++ >> arch/powerpc/platforms/pseries/papr_scm.c | 101 - >> include/uapi/linux/ndctl.h| 1 + >> 3 files changed, 230 insertions(+), 6 deletions(-) >> create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> >> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> new file mode 100644 >> index ..671693439c1c >> --- /dev/null >> +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> @@ -0,0 +1,134 @@ >> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ >> +/* >> + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl >> + * >> + * (C) Copyright IBM 2020 >> + * >> + * Author: Vaibhav Jain >> + */ >> + >> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ >> +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ >> + >> +#include >> + >> +/* >> + * PDSM Envelope: >> + * >> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via >> + * 'envelopes' which consists of a header and user-defined payload sections. >> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a >> + * payload following it and offset of which relative to the struct is >> provided >> + * by 'nd_pdsm_cmd_pkg.payload_offset'. * >> + * >> + * +-+-+---+ >> + * | 64-Bytes | 8-Bytes | Max 184-Bytes | >> + * +-+-+---+ >> + * | nd_pdsm_cmd_pkg | | >> + * |-+ | | >> + * | nd_cmd_pkg | | | >> + * +-+-+---+ >> + * | nd_family | | | >> + * | nd_size_out | cmd_status | | >> + * | nd_size_in | payload_version | PAYLOAD | >> + * | nd_command | payload_offset -> | >> + * | nd_fw_size | | | >> + * +-+-+---+ >> + * >> + * PDSM He
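For readers following along, a reconstruction of the envelope header from the layout diagram above; the field order and exact types are assumptions based on the documented byte counts, and the authoritative definition is the one in papr_scm_pdsm.h:

struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* 64 bytes: nd_family, nd_size_in/out,
				 * nd_command, nd_fw_size */
	__s32 cmd_status;	/* PDSM completion status */
	__u16 payload_offset;	/* offset of PAYLOAD from start of struct */
	__u16 payload_version;	/* negotiated payload-struct version */
	__u8 payload[];		/* max 184 bytes, per the diagram */
} __packed;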
Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
Dan Williams writes: > On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V > wrote: >> >> Dan Williams writes: >> >> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V >> > wrote: >> >> ... >> >> >> Applications using new instructions will behave as expected when running >> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and >> >> 'dcbfps' >> > >> > Right, this is the problem. Applications using new instructions behave >> > as expected, the kernel has been shipping of_pmem and papr_scm for >> > several cycles now, you're saying that the DAX applications written >> > against those platforms are going to be broken on P8 and P9? >> >> The expecation is that both kernel and userspace would get upgraded to >> use the new instruction before actual persistent memory devices are >> made available. >> >> > >> >> > I'm thinking the kernel >> >> > should go as far as to disable DAX operation by default on new >> >> > hardware until userspace asserts that it is prepared to switch to the >> >> > new implementation. Is there any other way to ensure the forward >> >> > compatibility of deployed ppc64 DAX applications? >> >> >> >> AFAIU there is no released persistent memory hardware on ppc64 platform >> >> and we need to make sure before applications get enabled to use these >> >> persistent memory devices, they should switch to use the new >> >> instruction? >> > >> > Right, I want the kernel to offer some level of safety here because >> > everything you are describing sounds like a flag day conversion. Am I >> > misreading? Is there some other gate that prevents existing users of >> > of_pmem and papr_scm from having their expectations violated when >> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control >> > that I'm just not familiar with, I'm only going off the fact that the >> > kernel has shipped a non-zero number of NVDIMM drivers that build with >> > ARCH=ppc64 for several cycles. >> >> If we are looking at adding changes to kernel that will prevent a kernel >> from running on newer hardware in a specific case, we could as well take >> the changes to get the kernel use the newer instructions right? > > Oh, no, I'm not talking about stopping the kernel from running. I'm > simply recommending that support for MAP_SYNC mappings (userspace > managed flushing) be disabled by default on PPC with either a > compile-time or run-time default to assert that userspace has been > audited for legacy applications or that the platform owner is > otherwise willing to take the risk. > >> But I agree with your concern that if we have older kernel/applications >> that continue to use `dcbf` on future hardware we will end up >> having issues w.r.t powerfail consistency. The plan is what you outlined >> above as tighter ecosystem control. Considering we don't have a pmem >> device generally available, we get both kernel and userspace upgraded >> to use these new instructions before such a device is made available. > > Ok, I think a compile time kernel option with a runtime override > satisfies my concern. Does that work for you? something like below? But this still won't handle devdax mmap right? 
diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h new file mode 100644 index ..aee697a72537 --- /dev/null +++ b/arch/arm64/include/asm/libnvdimm.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h new file mode 100644 index ..da479200bfb8 --- /dev/null +++ b/arch/powerpc/include/asm/libnvdimm.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm +extern bool arch_disable_sync_nvdimm(void); + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c index 0666a8d29596..3ce4fb4f167b 100644 --- a/arch/powerpc/lib/pmem.c +++ b/arch/powerpc/lib/pmem.c @@ -9,6 +9,8 @@ #include + +static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT); /* * CONFIG_ARCH_HAS_PMEM_API symbols */ @@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset, memcpy_flushcache(to, page_to_virt(page) + offset, len); } EXPORT_SYMBOL(memcpy_page_flushcache); + +bool arch_disable_sync_nvdimm(void) +{ + return !sync_fault; +} + +static int __init parse_sync_fault(char *p) +{ + sync_fault = true; + return 0; +} +early_param("enable_sync_fault", parse_sync_fault); + diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 27a81c291be8..dde11d75a746 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -383,6 +383,15 @@ config PPC_KUEP If
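For context, a sketch of how the generic side might consume this hook, following the override pattern the #define above implies; the generic code here is illustrative and not part of the diff:

/* Fallback when an architecture does not provide its own hook
 * (hypothetical placement, e.g. include/linux/libnvdimm.h):
 */
#ifndef arch_disable_sync_nvdimm
static inline bool arch_disable_sync_nvdimm(void)
{
	return false;
}
#endif

static bool region_supports_sync_fault(void)
{
	/* do not advertise MAP_SYNC when the arch opts out */
	return !arch_disable_sync_nvdimm();
}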
Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, 20 May 2020 19:43:08 +0200 Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour > --- Reviewed-by: Greg Kurz > arch/powerpc/kvm/book3s_hv.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..6ad1a3b14300 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); > + /* > + * Even if that call is made by the Ultravisor, the SRR1 value > + * is the guest context one, with the secure bit clear as it has > + * not yet been secured. So we can't check it here. > + */ > + ret = kvmppc_h_svm_init_abort(vcpu->kvm); > break; > > default:
[PATCH v2 0/7] padata: parallelize deferred page init
Deferred struct page init is a bottleneck in kernel boot--the biggest for us and probably others. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests.

So, multithread deferred init to take full advantage of system memory bandwidth.

Extend padata, a framework that handles many parallel singlethreaded jobs, to handle multithreaded jobs as well by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. More documentation in patches 4 and 7.

This series is the first step in a project to address other memory proportional bottlenecks in the kernel such as pmem struct page init, vfio page pinning, hugetlb fallocate, and munmap. Deferred page init doesn't require concurrency limits, resource control, or priority adjustments like these other users will because it happens during boot when the system is otherwise idle and waiting for page init to finish.

This has been run on a variety of x86 systems and speeds up kernel boot by 3% to 49%, saving up to 1.6 out of 4 seconds. Patch 5 has more numbers.

Please review and test, and thanks to Alex, Andrew, Josh, and Pavel for their feedback in the last version. The powerpc and s390 lists are included in case they want to give this a try; they had enabled this feature when it was configured per arch.

Series based on 5.7-rc6 plus these three from mmotm:

  mm-call-touch_nmi_watchdog-on-max-order-boundaries-in-deferred-init.patch
  mm-initialize-deferred-pages-with-interrupts-enabled.patch
  mm-call-cond_resched-from-deferred_init_memmap.patch

and it's available here:

  git://oss.oracle.com/git/linux-dmjordan.git padata-mt-definit-v2
  https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-definit-v2

and the future users and related features are available as work-in-progress:

  git://oss.oracle.com/git/linux-dmjordan.git padata-mt-wip-v0.4
  https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-wip-v0.4

v2:
 - Improve the problem statement (Andrew, Josh, Pavel)
 - Add T-b's to unchanged patches (Josh)
 - Fully initialize max-order blocks to avoid buddy issues (Alex)
 - Parallelize on section-aligned boundaries to avoid potential false sharing (Alex)
 - Return the maximum thread count from a function that architectures can override, with the generic version returning 1 (current behavior). Override for x86 since that's the only arch this series has been tested on so far. Other archs can test with more threads by dropping patch 6.
 - Rebase to v5.7-rc6, rerun tests

RFC v4 [2] -> v1:
 - merged with padata (Peter)
 - got rid of the 'task' nomenclature (Peter, Jon)

future work branch:
 - made lockdep-aware (Jason, Peter)
 - adjust workqueue worker priority with renice_or_cancel() (Tejun)
 - fixed undo problem in VFIO (Alex)

The remaining feedback, mainly resource control awareness (cgroup etc), is TODO for later series.
[1] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf https://www.youtube.com/watch?v=pBsHnf93tcQ https://lore.kernel.org/linux-mm/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/ [2] https://lore.kernel.org/linux-mm/20181105165558.11698-1-daniel.m.jor...@oracle.com/ Daniel Jordan (7): padata: remove exit routine padata: initialize earlier padata: allocate work structures for parallel jobs from a pool padata: add basic support for multithreaded jobs mm: parallelize deferred_init_memmap() mm: make deferred init's max threads arch-specific padata: document multithreaded jobs Documentation/core-api/padata.rst | 41 +++-- arch/x86/mm/init_64.c | 12 ++ include/linux/memblock.h | 3 + include/linux/padata.h| 43 - init/main.c | 2 + kernel/padata.c | 277 -- mm/Kconfig| 6 +- mm/page_alloc.c | 67 +++- 8 files changed, 373 insertions(+), 78 deletions(-) base-commit: b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce prerequisite-patch-id: 4ad522141e1119a325a9799dad2bd982fbac8b7c prerequisite-patch-id: 169273327e56f5461101a71dfbd6b4cfd4570cf0 prerequisite-patch-id: 0f34692c8a9673d4c4f6a3545cf8ec3a2abf8620 -- 2.26.2
[PATCH v2 1/7] padata: remove exit routine
padata_driver_exit() is unnecessary because padata isn't built as a module and doesn't exit.

padata's init routine will soon allocate memory, so getting rid of the exit function now avoids pointless code to free it.

Signed-off-by: Daniel Jordan
---
 kernel/padata.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index a6afa12fb75ee..835919c745266 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -1072,10 +1072,4 @@ static __init int padata_driver_init(void)
 }
 module_init(padata_driver_init);
 
-static __exit void padata_driver_exit(void)
-{
-	cpuhp_remove_multi_state(CPUHP_PADATA_DEAD);
-	cpuhp_remove_multi_state(hp_online);
-}
-module_exit(padata_driver_exit);
 #endif
-- 
2.26.2
[PATCH v2 7/7] padata: document multithreaded jobs
Add Documentation for multithreaded jobs. Signed-off-by: Daniel Jordan --- Documentation/core-api/padata.rst | 41 +++ 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst index 9a24c111781d9..b7e047af993e8 100644 --- a/Documentation/core-api/padata.rst +++ b/Documentation/core-api/padata.rst @@ -4,23 +4,26 @@ The padata parallel execution mechanism === -:Date: December 2019 +:Date: April 2020 Padata is a mechanism by which the kernel can farm jobs out to be done in -parallel on multiple CPUs while retaining their ordering. It was developed for -use with the IPsec code, which needs to be able to perform encryption and -decryption on large numbers of packets without reordering those packets. The -crypto developers made a point of writing padata in a sufficiently general -fashion that it could be put to other uses as well. +parallel on multiple CPUs while optionally retaining their ordering. -Usage -= +It was originally developed for IPsec, which needs to perform encryption and +decryption on large numbers of packets without reordering those packets. This +is currently the sole consumer of padata's serialized job support. + +Padata also supports multithreaded jobs, splitting up the job evenly while load +balancing and coordinating between threads. + +Running Serialized Jobs +=== Initializing -The first step in using padata is to set up a padata_instance structure for -overall control of how jobs are to be run:: +The first step in using padata to run parallel jobs is to set up a +padata_instance structure for overall control of how jobs are to be run:: #include @@ -162,6 +165,24 @@ functions that correspond to the allocation in reverse:: It is the user's responsibility to ensure all outstanding jobs are complete before any of the above are called. +Running Multithreaded Jobs +== + +A multithreaded job has a main thread and zero or more helper threads, with the +main thread participating in the job and then waiting until all helpers have +finished. padata splits the job into units called chunks, where a chunk is a +piece of the job that one thread completes in one call to the thread function. + +A user has to do three things to run a multithreaded job. First, describe the +job by defining a padata_mt_job structure, which is explained in the Interface +section. This includes a pointer to the thread function, which padata will +call each time it assigns a job chunk to a thread. Then, define the thread +function, which accepts three arguments, ``start``, ``end``, and ``arg``, where +the first two delimit the range that the thread operates on and the last is a +pointer to the job's shared state, if any. Prepare the shared state, which is +typically a stack-allocated structure that wraps the required data. Last, call +padata_do_multithreaded(), which will return once the job is finished. + Interface = -- 2.26.2
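To make the three steps described in the documentation above concrete, here is a minimal sketch of a caller, assuming the interface added in patch 4 of this series (struct padata_mt_job and padata_do_multithreaded()); all my_* names and the chunk numbers are invented for illustration:

/* Minimal sketch of a multithreaded padata job.  The padata_mt_job
 * fields and padata_do_multithreaded() are from this series; every
 * my_* name is hypothetical.  The caller must be __init code, since
 * padata_do_multithreaded() is declared __init in this series. */
#include <linux/atomic.h>
#include <linux/padata.h>

struct my_job_state {
	atomic_long_t processed;	/* shared state for all helpers */
};

static void my_thread_fn(unsigned long start, unsigned long end, void *arg)
{
	struct my_job_state *state = arg;

	/* Do this thread's chunk of the work for the range it was given. */
	atomic_long_add(end - start, &state->processed);
}

static void __init run_my_job(unsigned long nr_items)
{
	struct my_job_state state = { .processed = ATOMIC_LONG_INIT(0) };
	struct padata_mt_job job = {
		.thread_fn	= my_thread_fn,
		.fn_arg		= &state,
		.start		= 0,
		.size		= nr_items,
		.align		= 1,	/* no alignment requirement */
		.min_chunk	= 1024,	/* smallest unit worth one thread */
		.max_threads	= 4,
	};

	padata_do_multithreaded(&job);	/* returns once the whole job is done */
}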
[PATCH v2 3/7] padata: allocate work structures for parallel jobs from a pool
padata allocates per-CPU, per-instance work structs for parallel jobs. A do_parallel call assigns a job to a sequence number and hashes the number to a CPU, where the job will eventually run using the corresponding work. This approach fit with how padata used to bind a job to each CPU round-robin, makes less sense after commit bfde23ce200e6 ("padata: unbind parallel jobs from specific CPUs") because a work isn't bound to a particular CPU anymore, and isn't needed at all for multithreaded jobs because they don't have sequence numbers. Replace the per-CPU works with a preallocated pool, which allows sharing them between existing padata users and the upcoming multithreaded user. The pool will also facilitate setting NUMA-aware concurrency limits with later users. The pool is sized according to the number of possible CPUs. With this limit, MAX_OBJ_NUM no longer makes sense, so remove it. If the global pool is exhausted, a parallel job is run in the current task instead to throttle a system trying to do too much in parallel. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 8 +-- kernel/padata.c| 118 +++-- 2 files changed, 78 insertions(+), 48 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index 476ecfa41f363..3bfa503503ac5 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -24,7 +24,6 @@ * @list: List entry, to attach to the padata lists. * @pd: Pointer to the internal control structure. * @cb_cpu: Callback cpu for serializatioon. - * @cpu: Cpu for parallelization. * @seq_nr: Sequence number of the parallelized data object. * @info: Used to pass information from the parallel to the serial function. * @parallel: Parallel execution function. @@ -34,7 +33,6 @@ struct padata_priv { struct list_headlist; struct parallel_data*pd; int cb_cpu; - int cpu; unsigned intseq_nr; int info; void(*parallel)(struct padata_priv *padata); @@ -68,15 +66,11 @@ struct padata_serial_queue { /** * struct padata_parallel_queue - The percpu padata parallel queue * - * @parallel: List to wait for parallelization. * @reorder: List to wait for reordering after parallel processing. - * @work: work struct for parallelization. * @num_obj: Number of objects that are processed by this cpu. 
*/ struct padata_parallel_queue { - struct padata_listparallel; struct padata_listreorder; - struct work_structwork; atomic_t num_obj; }; @@ -111,7 +105,7 @@ struct parallel_data { struct padata_parallel_queue__percpu *pqueue; struct padata_serial_queue __percpu *squeue; atomic_trefcnt; - atomic_tseq_nr; + unsigned intseq_nr; unsigned intprocessed; int cpu; struct padata_cpumask cpumask; diff --git a/kernel/padata.c b/kernel/padata.c index 6f709bc0fc413..78ff9aa529204 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -32,7 +32,15 @@ #include #include -#define MAX_OBJ_NUM 1000 +struct padata_work { + struct work_struct pw_work; + struct list_headpw_list; /* padata_free_works linkage */ + void*pw_data; +}; + +static DEFINE_SPINLOCK(padata_works_lock); +static struct padata_work *padata_works; +static LIST_HEAD(padata_free_works); static void padata_free_pd(struct parallel_data *pd); @@ -58,30 +66,44 @@ static int padata_cpu_hash(struct parallel_data *pd, unsigned int seq_nr) return padata_index_to_cpu(pd, cpu_index); } -static void padata_parallel_worker(struct work_struct *parallel_work) +static struct padata_work *padata_work_alloc(void) { - struct padata_parallel_queue *pqueue; - LIST_HEAD(local_list); + struct padata_work *pw; - local_bh_disable(); - pqueue = container_of(parallel_work, - struct padata_parallel_queue, work); + lockdep_assert_held(&padata_works_lock); - spin_lock(&pqueue->parallel.lock); - list_replace_init(&pqueue->parallel.list, &local_list); - spin_unlock(&pqueue->parallel.lock); + if (list_empty(&padata_free_works)) + return NULL;/* No more work items allowed to be queued. */ - while (!list_empty(&local_list)) { - struct padata_priv *padata; + pw = list_first_entry(&padata_free_works, struct padata_work, pw_list); + list_del(&pw->pw_list); + return pw; +} - padata = list_entry(local_list.next, - struct padata_priv, list); +static void padata_work_init(struct padata_work *pw, work_func_t work_fn, +void *
[PATCH v2 2/7] padata: initialize earlier
padata will soon initialize the system's struct pages in parallel, so it needs to be ready by page_alloc_init_late(). The error return from padata_driver_init() triggers an initcall warning, so add a warning to padata_init() to avoid silent failure. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 6 ++ init/main.c| 2 ++ kernel/padata.c| 17 - 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index a0d8b41850b25..476ecfa41f363 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -164,6 +164,12 @@ struct padata_instance { #definePADATA_INVALID 4 }; +#ifdef CONFIG_PADATA +extern void __init padata_init(void); +#else +static inline void __init padata_init(void) {} +#endif + extern struct padata_instance *padata_alloc_possible(const char *name); extern void padata_free(struct padata_instance *pinst); extern struct padata_shell *padata_alloc_shell(struct padata_instance *pinst); diff --git a/init/main.c b/init/main.c index 03371976d3872..8ab521f7af5d2 100644 --- a/init/main.c +++ b/init/main.c @@ -94,6 +94,7 @@ #include #include #include +#include #include #include @@ -1482,6 +1483,7 @@ static noinline void __init kernel_init_freeable(void) smp_init(); sched_init_smp(); + padata_init(); page_alloc_init_late(); /* Initialize page ext after all struct pages are initialized. */ page_ext_init(); diff --git a/kernel/padata.c b/kernel/padata.c index 835919c745266..6f709bc0fc413 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -31,7 +31,6 @@ #include #include #include -#include #define MAX_OBJ_NUM 1000 @@ -1050,26 +1049,26 @@ void padata_free_shell(struct padata_shell *ps) } EXPORT_SYMBOL(padata_free_shell); -#ifdef CONFIG_HOTPLUG_CPU - -static __init int padata_driver_init(void) +void __init padata_init(void) { +#ifdef CONFIG_HOTPLUG_CPU int ret; ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "padata:online", padata_cpu_online, NULL); if (ret < 0) - return ret; + goto err; hp_online = ret; ret = cpuhp_setup_state_multi(CPUHP_PADATA_DEAD, "padata:dead", NULL, padata_cpu_dead); if (ret < 0) { cpuhp_remove_multi_state(hp_online); - return ret; + goto err; } - return 0; -} -module_init(padata_driver_init); + return; +err: + pr_warn("padata: initialization failed\n"); #endif +} -- 2.26.2
[PATCH v2 6/7] mm: make deferred init's max threads arch-specific
Using padata during deferred init has only been tested on x86, so for now limit it to this architecture. If another arch wants this, it can find the max thread limit that's best for it and override deferred_page_init_max_threads(). Signed-off-by: Daniel Jordan --- arch/x86/mm/init_64.c| 12 include/linux/memblock.h | 3 +++ mm/page_alloc.c | 13 - 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 8b5f73f5e207c..2d749ec12ea8a 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1260,6 +1260,18 @@ void __init mem_init(void) mem_init_print_info(NULL); } +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + /* +* More CPUs always led to greater speedups on tested systems, up to +* all the nodes' CPUs. Use all since the system is otherwise idle +* now. +*/ + return max_t(int, cpumask_weight(node_cpumask), 1); +} +#endif + int kernel_set_to_readonly; void mark_rodata_ro(void) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 6bc37a731d27b..2b289df44194f 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -275,6 +275,9 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone, #define for_each_free_mem_pfn_range_in_zone_from(i, zone, p_start, p_end) \ for (; i != U64_MAX; \ __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end)) + +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask); + #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ /** diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9cb780e8dec78..0d7d805f98b2d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1843,6 +1843,13 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, atomic_long_add(nr_pages, &args->nr_pages); } +/* An arch may override for more concurrency. */ +__weak int __init +deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + return 1; +} + /* Initialise remaining memory on a node */ static int __init deferred_init_memmap(void *data) { @@ -1891,11 +1898,7 @@ static int __init deferred_init_memmap(void *data) first_init_pfn)) goto zone_empty; - /* -* More CPUs always led to greater speedups on tested systems, up to -* all the nodes' CPUs. Use all since the system is otherwise idle now. -*/ - max_threads = max(cpumask_weight(cpumask), 1u); + max_threads = deferred_page_init_max_threads(cpumask); while (spfn < epfn) { epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION); -- 2.26.2
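For an architecture that wants more than one thread, the override mirrors the x86 version in the patch above. A hypothetical powerpc variant (illustrative only, not part of this series; the right limit would need per-arch benchmarking) could be:

/* Hypothetical powerpc override of the __weak generic version in
 * mm/page_alloc.c, mirroring the x86 implementation in this patch. */
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
{
	return max_t(int, cpumask_weight(node_cpumask), 1);
}
#endif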
[PATCH v2 5/7] mm: parallelize deferred_init_memmap()
Deferred struct page init is a significant bottleneck in kernel boot. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests.

Multithread to take full advantage of system memory bandwidth. The maximum number of threads is capped at the number of CPUs on the node because speedups always improve with additional threads on every system tested, and at this phase of boot, the system is otherwise idle and waiting on page init to finish.

Helper threads operate on section-aligned ranges to both avoid false sharing when setting the pageblock's migrate type and to avoid accessing uninitialized buddy pages, though max order alignment is enough for the latter.

The minimum chunk size is also a section. There was benefit to using multiple threads even on relatively small memory (1G) systems, and this is the smallest size that the alignment allows.

The time (milliseconds) is the slowest node to initialize since boot blocks until all nodes finish. intel_pstate is loaded in active mode without hwp and with turbo enabled, and intel_idle is active as well.

Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal)
  2 nodes * 26 cores * 2 threads = 104 CPUs
  384G/node = 768G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    4078.0 (    9.0)       --    1779.0 (    8.7)
    2% (  1)      1.4%    4021.3 (    2.9)     3.4%    1717.7 (    7.8)
   12% (  6)     35.1%    2644.7 (   35.3)    80.8%     341.0 (   35.5)
   25% ( 13)     38.7%    2498.0 (   34.2)    89.1%     193.3 (   32.3)
   37% ( 19)     39.1%    2482.0 (   25.2)    90.1%     175.3 (   31.7)
   50% ( 26)     38.8%    2495.0 (    8.7)    89.1%     193.7 (    3.5)
   75% ( 39)     39.2%    2478.0 (   21.0)    90.3%     172.7 (   26.7)
  100% ( 52)     40.0%    2448.0 (    2.0)    91.9%     143.3 (    1.5)

Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, bare metal)
  1 node * 16 cores * 2 threads = 32 CPUs
  192G/node = 192G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1996.0 (   18.0)       --    1104.3 (    6.7)
    3% (  1)      1.4%    1968.0 (    3.0)     2.7%    1074.7 (    9.0)
   12% (  4)     40.1%    1196.0 (   22.7)    72.4%     305.3 (   16.8)
   25% (  8)     47.4%    1049.3 (   17.2)    84.2%     174.0 (   10.6)
   37% ( 12)     48.3%    1032.0 (   14.9)    86.8%     145.3 (    2.5)
   50% ( 16)     48.9%    1020.3 (    2.5)    88.0%     133.0 (    1.7)
   75% ( 24)     49.1%    1016.3 (    8.1)    88.4%     128.0 (    1.7)
  100% ( 32)     49.4%    1009.0 (    8.5)    88.6%     126.3 (    0.6)

Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal)
  2 nodes * 18 cores * 2 threads = 72 CPUs
  128G/node = 256G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1682.7 (    6.7)       --     630.0 (    4.6)
    3% (  1)      0.4%    1676.0 (    2.0)     0.7%     625.3 (    3.2)
   12% (  4)     25.8%    1249.0 (    1.0)    68.2%     200.3 (    1.2)
   25% (  9)     30.0%    1178.0 (    5.2)    79.7%     128.0 (    3.5)
   37% ( 13)     30.6%    1167.7 (    3.1)    81.3%     117.7 (    1.2)
   50% ( 18)     30.6%    1167.3 (    2.3)    81.4%     117.0 (    1.0)
   75% ( 27)     31.0%    1161.3 (    4.6)    82.5%     110.0 (    6.9)
  100% ( 36)     32.1%    1142.0 (    3.6)    85.7%      90.0 (    1.0)

AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
  1 node * 8 cores * 2 threads = 16 CPUs
  64G/node = 64G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1003.7 (   16.6)       --     243.3 (    8.1)
    6% (  1)      1.4%     990.0 (    4.6)     1.2%     240.3 (    1.5)
   12% (  2)     11.4%     889.3 (   16.7)    44.5%     135.0 (    3.0)
   25% (  4)     16.8%     835.3 (    9.0)    65.8%      83.3 (    2.5)
   37% (  6)     18.6%     816.7 (   17.6)    70.4%      72.0 (    1.0)
   50% (  8)     18.2%     821.0 (    5.0)    70.7%      71.3 (    1.2)
   75% ( 12)     19.0%     813.3 (    5.0)
[PATCH v2 4/7] padata: add basic support for multithreaded jobs
Sometimes the kernel doesn't take full advantage of system memory bandwidth, leading to a single CPU spending excessive time in initialization paths where the data scales with memory size. Multithreading naturally addresses this problem. Extend padata, a framework that handles many parallel yet singlethreaded jobs, to also handle multithreaded jobs by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. This is inspired by work from Pavel Tatashin and Steve Sistare. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 29 kernel/padata.c| 152 - 2 files changed, 178 insertions(+), 3 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index 3bfa503503ac5..b0affa466a841 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -4,6 +4,9 @@ * * Copyright (C) 2008, 2009 secunet Security Networks AG * Copyright (C) 2008, 2009 Steffen Klassert + * + * Copyright (c) 2020 Oracle and/or its affiliates. + * Author: Daniel Jordan */ #ifndef PADATA_H @@ -130,6 +133,31 @@ struct padata_shell { struct list_headlist; }; +/** + * struct padata_mt_job - represents one multithreaded job + * + * @thread_fn: Called for each chunk of work that a padata thread does. + * @fn_arg: The thread function argument. + * @start: The start of the job (units are job-specific). + * @size: size of this node's work (units are job-specific). + * @align: Ranges passed to the thread function fall on this boundary, with the + * possible exceptions of the beginning and end of the job. + * @min_chunk: The minimum chunk size in job-specific units. This allows + * the client to communicate the minimum amount of work that's + * appropriate for one worker thread to do at once. + * @max_threads: Max threads to use for the job, actual number may be less + * depending on task size and minimum chunk size. + */ +struct padata_mt_job { + void (*thread_fn)(unsigned long start, unsigned long end, void *arg); + void*fn_arg; + unsigned long start; + unsigned long size; + unsigned long align; + unsigned long min_chunk; + int max_threads; +}; + /** * struct padata_instance - The overall control structure. * @@ -171,6 +199,7 @@ extern void padata_free_shell(struct padata_shell *ps); extern int padata_do_parallel(struct padata_shell *ps, struct padata_priv *padata, int *cb_cpu); extern void padata_do_serial(struct padata_priv *padata); +extern void __init padata_do_multithreaded(struct padata_mt_job *job); extern int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, cpumask_var_t cpumask); extern int padata_start(struct padata_instance *pinst); diff --git a/kernel/padata.c b/kernel/padata.c index 78ff9aa529204..e78f57d9aef90 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -7,6 +7,9 @@ * Copyright (C) 2008, 2009 secunet Security Networks AG * Copyright (C) 2008, 2009 Steffen Klassert * + * Copyright (c) 2020 Oracle and/or its affiliates. + * Author: Daniel Jordan + * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, * version 2, as published by the Free Software Foundation. @@ -21,6 +24,7 @@ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. 
*/ +#include #include #include #include @@ -32,6 +36,8 @@ #include #include +#definePADATA_WORK_ONSTACK 1 /* Work's memory is on stack */ + struct padata_work { struct work_struct pw_work; struct list_headpw_list; /* padata_free_works linkage */ @@ -42,7 +48,17 @@ static DEFINE_SPINLOCK(padata_works_lock); static struct padata_work *padata_works; static LIST_HEAD(padata_free_works); +struct padata_mt_job_state { + spinlock_t lock; + struct completion completion; + struct padata_mt_job*job; + int nworks; + int nworks_fini; + unsigned long chunk_size; +}; + static void padata_free_pd(struct parallel_data *pd); +static void __init padata_mt_helper(struct work_struct *work); static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index) { @@ -81,18 +97,56 @@ static struct padata_work *padata_work_alloc(void) } static void padata_work_init(struct padata_work *pw, work_func_t work_fn, -void *data) +void *data, int flags) { - INIT_WORK(&pw->pw_work, work_fn); + if (flags & PADATA_WORK_ONSTACK) + INIT_WORK_ONSTACK(&pw->
[PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor. However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case. In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case. Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_hv.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 93493f0cbfe8..6ad1a3b14300 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) ret = kvmppc_h_svm_init_done(vcpu->kvm); break; case H_SVM_INIT_ABORT: - ret = H_UNSUPPORTED; - if (kvmppc_get_srr1(vcpu) & MSR_S) - ret = kvmppc_h_svm_init_abort(vcpu->kvm); + /* +* Even if that call is made by the Ultravisor, the SSR1 value +* is the guest context one, with the secure bit clear as it has +* not yet been secured. So we can't check it here. +*/ + ret = kvmppc_h_svm_init_abort(vcpu->kvm); break; default: -- 2.26.2
Re: [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
Le 20/05/2020 à 19:32, Greg Kurz a écrit : On Wed, 20 May 2020 18:51:10 +0200 Laurent Dufour wrote: The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor. However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case. In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case. Why not checking vcpu->kvm->arch.secure_guest then ? I don't think that's the right place. Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_hv.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 93493f0cbfe8..eb1f96cb7b72 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) ret = kvmppc_h_svm_init_done(vcpu->kvm); break; case H_SVM_INIT_ABORT: - ret = H_UNSUPPORTED; - if (kvmppc_get_srr1(vcpu) & MSR_S) - ret = kvmppc_h_svm_init_abort(vcpu->kvm); or at least put a comment to explain why H_SVM_INIT_ABORT doesn't have the same sanity check as the other SVM hcalls. I agree that might help. I'll send a v2 with a comment there. + ret = kvmppc_h_svm_init_abort(vcpu->kvm); break; default:
Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Thanks for reviewing this this patch Ira. My responses below: Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: >> Implement support for fetching nvdimm health information via >> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair >> of 64-bit big-endian integers, bitwise-and of which is then stored in >> 'struct papr_scm_priv' and subsequently partially exposed to >> user-space via newly introduced dimm specific attribute >> 'papr/flags'. Since the hcall is costly, the health information is >> cached and only re-queried, 60s after the previous successful hcall. >> >> The patch also adds a documentation text describing flags reported by >> the the new sysfs attribute 'papr/flags' is also introduced at >> Documentation/ABI/testing/sysfs-bus-papr-scm. >> >> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for >> PAPR hcalls") >> >> Cc: Dan Williams >> Cc: Michael Ellerman >> Cc: "Aneesh Kumar K . V" >> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> Resend: >> * None >> >> v6..v7 : >> * Used the exported buf_seq_printf() function to generate content for >> 'papr/flags' >> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c >> and removed the papr_scm.h file [Mpe] >> * Some minor consistency issued in sysfs-bus-papr-scm >> documentation. [Mpe] >> * s/dimm_mutex/health_mutex/g [Mpe] >> * Split drc_pmem_query_health() into two function one of which takes >> care of caching and locking. [Mpe] >> * Fixed a local copy creation of dimm health information using >> READ_ONCE(). [Mpe] >> >> v5..v6 : >> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' >> [Dan Williams] >> * Include documentation for 'papr/flags' attr [Dan Williams] >> * Change flag 'save_fail' to 'flush_fail' [Dan Williams] >> * Caching of health bitmap to reduce expensive hcalls [Dan Williams] >> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] >> * Replaced two __be64 integers from papr_scm_priv to a single u64 >> integer [Mpe] >> * Updated patch description to reflect the changes made in this >> version. >> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from >> flags_show() [Dan Williams] >> >> v4..v5 : >> * None >> >> v3..v4 : >> * None >> >> v2..v3 : >> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >> NVDIMM unarmed [Aneesh] >> >> v1..v2 : >> * New patch in the series. >> --- >> Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++ >> arch/powerpc/platforms/pseries/papr_scm.c| 169 ++- >> 2 files changed, 194 insertions(+), 2 deletions(-) >> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm >> b/Documentation/ABI/testing/sysfs-bus-papr-scm >> new file mode 100644 >> index ..6143d06072f1 >> --- /dev/null >> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm >> @@ -0,0 +1,27 @@ >> +What: /sys/bus/nd/devices/nmemX/papr/flags >> +Date: Apr, 2020 >> +KernelVersion: v5.8 >> +Contact:linuxppc-dev , >> linux-nvd...@lists.01.org, >> +Description: >> +(RO) Report flags indicating various states of a >> +papr-scm NVDIMM device. Each flag maps to a one or >> +more bits set in the dimm-health-bitmap retrieved in >> +response to H_SCM_HEALTH hcall. The details of the bit >> +flags returned in response to this hcall is available >> +at 'Documentation/powerpc/papr_hcalls.rst' . Below are >> +the flags reported in this sysfs file: >> + >> +* "not_armed" : Indicates that NVDIMM contents will not >> + survive a power cycle. 
>> +* "flush_fail" : Indicates that NVDIMM contents >> + couldn't be flushed during last >> + shut-down event. >> +* "restore_fail": Indicates that NVDIMM contents >> + couldn't be restored during NVDIMM >> + initialization. >> +* "encrypted" : NVDIMM contents are encrypted. >> +* "smart_notify": There is health event for the NVDIMM. >> +* "scrubbed": Indicating that contents of the >> + NVDIMM have been scrubbed. >> +* "locked" : Indicating that NVDIMM contents cant >> + be modified until next power cycle. >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >> b/arch/powerpc/platforms/pseries/papr_scm.c >> index f35592423380..142636e1a59f 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> >> @@ -22,6 +23,44 @@ >> (1ul << ND_CMD_GET_CONFIG_DATA) | \ >>
Re: [PATCH] input: i8042: Remove special PowerPC handling
Hi Michael,

On Wed, May 20, 2020 at 04:07:00PM +1000, Michael Ellerman wrote:
> [ + Dmitry & linux-input ]
>
> Nathan Chancellor writes:
> > This causes a build error with CONFIG_WALNUT because kb_cs and kb_data
> > were removed in commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and
> > include/asm-ppc").
> >
> > ld.lld: error: undefined symbol: kb_cs
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >
> > ld.lld: error: undefined symbol: kb_data
> >> referenced by i8042.c:309 (drivers/input/serio/i8042.c:309)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:33 (drivers/input/serio/i8042-ppcio.h:33)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042.c:319 (drivers/input/serio/i8042.c:319)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced 15 more times
> >
> > Presumably since nobody has noticed this for the last 12 years, there is
> > not anyone actually trying to use this driver so we can just remove this
> > special walnut code and use the generic header so it builds for all
> > configurations.
> >
> > Fixes: 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc")
> > Reported-by: kbuild test robot
> > Signed-off-by: Nathan Chancellor
> > ---
> >  drivers/input/serio/i8042-ppcio.h | 57 ---
> >  drivers/input/serio/i8042.h       |  2 --
> >  2 files changed, 59 deletions(-)
> >  delete mode 100644 drivers/input/serio/i8042-ppcio.h
>
> This LGTM.
>
> Acked-by: Michael Ellerman (powerpc)
>
> I assumed drivers/input/serio would be pretty quiet, but there's
> actually some commits to it in linux-next. So perhaps this should go via
> the input tree.
>
> Dmitry do you want to take this, or should I take it via powerpc?
>
> Original patch is here:
>
>   https://lore.kernel.org/lkml/20200518181043.3363953-1-natechancel...@gmail.com

I'm fine with you taking it through powerpc.

Acked-by: Dmitry Torokhov

Also, while I have your attention ;), could you please ack or take

  https://lore.kernel.org/lkml/20191002214854.GA114387@dtor-ws/

as I believe this is the last user of the input_polled_dev API and I would like to drop it from the tree.

Thanks!

--
Dmitry
Re: [RESEND PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
s/seq_buf: Export seq_buf_printf() to external modules/ seq_buf: export seq_buf_printf/
[PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor.

However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case.

In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case.

Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour
---
 arch/powerpc/kvm/book3s_hv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 93493f0cbfe8..eb1f96cb7b72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		ret = kvmppc_h_svm_init_done(vcpu->kvm);
 		break;
 	case H_SVM_INIT_ABORT:
-		ret = H_UNSUPPORTED;
-		if (kvmppc_get_srr1(vcpu) & MSR_S)
-			ret = kvmppc_h_svm_init_abort(vcpu->kvm);
+		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
 		break;
 
 	default:
-- 
2.26.2
[PATCH] net/ethernet/freescale: rework quiesce/activate for ucc_geth
ugeth_quiesce/activate are used to halt the controller when there is a link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This however causes the initial activation of the netdevice to fail precisely because it's detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced in the above commit is correct and shall be kept.

The netif_device_detach() is thus replaced with netif_tx_stop_all_queues(), which prevents any transmission. This allows performing the MAC config change required by the link change without detaching the corresponding netdevice, and thus does not prevent its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp
Acked-by: Matteo Ghidoni
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 6e5f6dd169b5..552e7554a9f8 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 
 #include "ucc_geth.h"
 
@@ -1548,11 +1549,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-	/* Prevent any further xmits, plus detach the device. */
-	netif_device_detach(ugeth->ndev);
-
-	/* Wait for any current xmits to finish. */
-	netif_tx_disable(ugeth->ndev);
+	/* Prevent any further xmits */
+	netif_tx_stop_all_queues(ugeth->ndev);
 
 	/* Disable the interrupt to avoid NAPI rescheduling. */
 	disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1565,7 +1563,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
 	napi_enable(&ugeth->napi);
 	enable_irq(ugeth->ug_info->uf_info.irq);
-	netif_device_attach(ugeth->ndev);
+
+	/* allow to xmit again */
+	netif_tx_wake_all_queues(ugeth->ndev);
+	__netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: > Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm > modules and add the command family to the white list of NVDIMM command > sets. Also advertise support for ND_CMD_CALL for the dimm > command mask and implement necessary scaffolding in the module to > handle ND_CMD_CALL ioctl and PDSM requests that we receive. > > The layout of the PDSM request as we expect from libnvdimm/libndctl is > described in newly introduced uapi header 'papr_scm_pdsm.h' which > defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used > to communicate the PDSM request via member > 'nd_pkg_papr_scm->nd_command' and size of payload that need to be > sent/received for servicing the PDSM. > > A new function is_cmd_valid() is implemented that reads the args to > papr_scm_ndctl() and performs sanity tests on them. A new function > papr_scm_service_pdsm() is introduced and is called from > papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL > command from libnvdimm. > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog: > > Resend: > * None > > v6..v7 : > * Removed the re-definitions of __packed macro from papr_scm_pdsm.h > [Mpe]. > * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe]. > * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h > [Mpe]. > * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe] > > v5..v6 : > * Changed the usage of the term DSM to PDSM to distinguish it from the > ACPI term [ Dan Williams ] > * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct > to reflect the new terminology. > * Updated the patch description and title to reflect the new terminology. > * Squashed patch to introduce new command family in 'ndctl.h' with > this patch [ Dan Williams ] > * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 > [ Dan Williams ] > * Removed redundant license text from the papr_scm_psdm.h file. > [ Dan Williams ] > * s/envelop/envelope/ at various places [ Dan Williams ] > * Added '__packed' attribute to command package header to gaurd > against different compiler adding paddings between the fields. > [ Dan Williams] > * Converted various pr_debug to dev_debug [ Dan Williams ] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Updated the patch prefix to 'ndctl/uapi' [Aneesh] > > v1..v2 : > * None > --- > arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++ > arch/powerpc/platforms/pseries/papr_scm.c | 101 - > include/uapi/linux/ndctl.h| 1 + > 3 files changed, 230 insertions(+), 6 deletions(-) > create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > > diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > new file mode 100644 > index ..671693439c1c > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > @@ -0,0 +1,134 @@ > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ > +/* > + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl > + * > + * (C) Copyright IBM 2020 > + * > + * Author: Vaibhav Jain > + */ > + > +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > + > +#include > + > +/* > + * PDSM Envelope: > + * > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via > + * 'envelopes' which consists of a header and user-defined payload sections. 
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a > + * payload following it and offset of which relative to the struct is > provided > + * by 'nd_pdsm_cmd_pkg.payload_offset'. * > + * > + * +-+-+---+ > + * | 64-Bytes | 8-Bytes | Max 184-Bytes | > + * +-+-+---+ > + * | nd_pdsm_cmd_pkg | | > + * |-+ | | > + * | nd_cmd_pkg | | | > + * +-+-+---+ > + * | nd_family | | | > + * | nd_size_out | cmd_status || > + * | nd_size_in | payload_version | PAYLOAD | > + * | nd_command | payload_offset ->| > + * | nd_fw_size | || > + * +-+-+---+ > + * > + * PDSM Header: > + * > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member > + * 'nd_cmd_pkg.nd_command'. Apart f
Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: > Implement support for fetching nvdimm health information via > H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair > of 64-bit big-endian integers, bitwise-and of which is then stored in > 'struct papr_scm_priv' and subsequently partially exposed to > user-space via newly introduced dimm specific attribute > 'papr/flags'. Since the hcall is costly, the health information is > cached and only re-queried, 60s after the previous successful hcall. > > The patch also adds a documentation text describing flags reported by > the the new sysfs attribute 'papr/flags' is also introduced at > Documentation/ABI/testing/sysfs-bus-papr-scm. > > [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for > PAPR hcalls") > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog: > > Resend: > * None > > v6..v7 : > * Used the exported buf_seq_printf() function to generate content for > 'papr/flags' > * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c > and removed the papr_scm.h file [Mpe] > * Some minor consistency issued in sysfs-bus-papr-scm > documentation. [Mpe] > * s/dimm_mutex/health_mutex/g [Mpe] > * Split drc_pmem_query_health() into two function one of which takes > care of caching and locking. [Mpe] > * Fixed a local copy creation of dimm health information using > READ_ONCE(). [Mpe] > > v5..v6 : > * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' > [Dan Williams] > * Include documentation for 'papr/flags' attr [Dan Williams] > * Change flag 'save_fail' to 'flush_fail' [Dan Williams] > * Caching of health bitmap to reduce expensive hcalls [Dan Williams] > * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] > * Replaced two __be64 integers from papr_scm_priv to a single u64 > integer [Mpe] > * Updated patch description to reflect the changes made in this > version. > * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from > flags_show() [Dan Williams] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >NVDIMM unarmed [Aneesh] > > v1..v2 : > * New patch in the series. > --- > Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++ > arch/powerpc/platforms/pseries/papr_scm.c| 169 ++- > 2 files changed, 194 insertions(+), 2 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm > > diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm > b/Documentation/ABI/testing/sysfs-bus-papr-scm > new file mode 100644 > index ..6143d06072f1 > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm > @@ -0,0 +1,27 @@ > +What:/sys/bus/nd/devices/nmemX/papr/flags > +Date:Apr, 2020 > +KernelVersion: v5.8 > +Contact: linuxppc-dev , > linux-nvd...@lists.01.org, > +Description: > + (RO) Report flags indicating various states of a > + papr-scm NVDIMM device. Each flag maps to a one or > + more bits set in the dimm-health-bitmap retrieved in > + response to H_SCM_HEALTH hcall. The details of the bit > + flags returned in response to this hcall is available > + at 'Documentation/powerpc/papr_hcalls.rst' . Below are > + the flags reported in this sysfs file: > + > + * "not_armed" : Indicates that NVDIMM contents will not > + survive a power cycle. > + * "flush_fail" : Indicates that NVDIMM contents > + couldn't be flushed during last > + shut-down event. 
> + * "restore_fail": Indicates that NVDIMM contents > + couldn't be restored during NVDIMM > + initialization. > + * "encrypted" : NVDIMM contents are encrypted. > + * "smart_notify": There is health event for the NVDIMM. > + * "scrubbed": Indicating that contents of the > + NVDIMM have been scrubbed. > + * "locked" : Indicating that NVDIMM contents cant > + be modified until next power cycle. > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index f35592423380..142636e1a59f 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > > #include > > @@ -22,6 +23,44 @@ >(1ul << ND_CMD_GET_CONFIG_DATA) | \ >(1ul << ND_CMD_SET_CONFIG_DATA)) > > +/* DIMM health bitmap bitmap indicators */ > +/* SCM device is unable to persist memory contents */ > +#define PAPR_SCM_DIMM_UNARMED
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
On 5/20/20 7:23 PM, Christophe Leroy wrote:

Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :

Christophe Leroy writes:

Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :

Hi again, Christophe,

On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:

Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a? It may have broken swap.

Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!

Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry()).

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset. And it should not have a dependency on pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set not present bits. With that understanding, what am I missing here?

That's probably me who is missing something; I have not dug into the swap functioning yet indeed, so that was only my first feeling.

By the way, the problem is definitely due to the order changes in the PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits or whether that's because _PAGE_PRESENT was moved out of the last 3 bits, I don't know yet.

My (bad) understanding is from the fact that __pte_to_swp_entry() is a right shift by 3 bits, so it loses the last 3 bits, and therefore __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a PTE.

Is there a description somewhere of how swap works exactly?

Looking at __set_pte_at(), I am wondering whether this was due to _PAGE_HASHPTE? This would mean we end up wrongly updating some swap entry details. We call set_pte_at() on swap pte entries.

-aneesh
Re: Endless soft-lockups for compiling workload since next-20200519
On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote: > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > > Just a head up. Repeatedly compiling kernels for a while would trigger > > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > > .config are in, > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've > not seen anything like that myself. Let me go have a look. Yes, I ended up figuring out the same commit a bit earlier. Since then I reverted that commit and its dependency, 2a0a24ebb499 ("sched: Make scheduler_ipi inline") Everything works fine so far. > > > In as far as the logs are readable (they're a wrapped mess, please don't > do that!), they contain very little useful, as is typical with IPIs :/ Sorry about that. I forgot that gmail webUI will wrap things around. I will switch to mutt. > > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > > flush_smp_call_function_queue+0x1fa/0x2e0 > > [ 1168.00][C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat > > fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars > > acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi > > scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror > > dm_region_hash dm_log dm_mod > > [ 1168.029492][C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > 5.7.0-rc6-next-20200519 #1 > > [ 1168.037665][C1] Hardware name: HPE ProLiant DL385 > > Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 > > [ 1168.046978][C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0 > > [ 1168.053658][C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe > > ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b > > e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25 > > 10 52 3f 71 > > [ 1168.073262][C1] RSP: 0018:c9178918 EFLAGS: 00010046 > > [ 1168.079253][C1] RAX: RBX: 430c58f8 > > RCX: 8ec26083 > > [ 1168.087156][C1] RDX: 0003 RSI: dc00 > > RDI: 430c58f8 > > [ 1168.095054][C1] RBP: c91789a8 R08: ed1108618cec > > R09: ed1108618cec > > [ 1168.102964][C1] R10: 430c675b R11: > > R12: 430c58e0 > > [ 1168.110866][C1] R13: 8eb30c40 R14: 430c5880 > > R15: 430c58e0 > > [ 1168.118767][C1] FS: () > > GS:4308() knlGS: > > [ 1168.127628][C1] CS: 0010 DS: ES: CR0: 80050033 > > [ 1168.134129][C1] CR2: 55b169604560 CR3: 000d08a14000 > > CR4: 003406e0 > > [ 1168.142026][C1] Call Trace: > > [ 1168.145206][C1] > > [ 1168.147957][C1] ? smp_call_on_cpu_callback+0xd0/0xd0 > > [ 1168.153421][C1] ? rcu_read_lock_sched_held+0xac/0xe0 > > [ 1168.158880][C1] ? rcu_read_lock_bh_held+0xc0/0xc0 > > [ 1168.164076][C1] generic_smp_call_function_single_interrupt+0x13/0x2b > > [ 1168.170938][C1] smp_call_function_single_interrupt+0x157/0x4e0 > > [ 1168.177278][C1] ? smp_call_function_interrupt+0x4e0/0x4e0 > > [ 1168.183172][C1] ? interrupt_entry+0xe4/0xf0 > > [ 1168.187846][C1] ? trace_hardirqs_off_caller+0x8d/0x1f0 > > [ 1168.193478][C1] ? trace_hardirqs_on_caller+0x1f0/0x1f0 > > [ 1168.199116][C1] ? _nohz_idle_balance+0x221/0x360 > > [ 1168.204228][C1] ? trace_hardirqs_off_thunk+0x1a/0x1c > > [ 1168.209690][C1] call_function_single_interrupt+0xf/0x20
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :

Christophe Leroy writes:

Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :

Hi again, Christophe,

On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:

Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a? It may have broken swap.

Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!

Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry()).

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset. And it should not have a dependency on pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set not present bits. With that understanding, what am I missing here?

That's probably me who is missing something; I have not dug into the swap functioning yet indeed, so that was only my first feeling.

By the way, the problem is definitely due to the order changes in the PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits or whether that's because _PAGE_PRESENT was moved out of the last 3 bits, I don't know yet.

My (bad) understanding is from the fact that __pte_to_swp_entry() is a right shift by 3 bits, so it loses the last 3 bits, and therefore __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a PTE.

Is there a description somewhere of how swap works exactly?

Christophe
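For reference, the conversion Christophe is describing is done by a pair of macros; on 32-bit powerpc they are essentially the following (quoted from memory to illustrate the 3-bit shift, so treat the exact form as approximate):

/* PTE <-> swap entry conversion under discussion: the low 3 bits of
 * the PTE are shifted out, so any PTE bit living there is lost on a
 * swap round-trip (approximate form, for illustration only). */
#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x)	((pte_t) { (x).val << 3 })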
Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
On 2020-05-20 02:38, Jiri Slaby wrote:

On 15. 05. 20, 1:22, rana...@codeaurora.org wrote:

On 2020-05-13 00:04, Greg KH wrote:

On Tue, May 12, 2020 at 02:39:50PM -0700, rana...@codeaurora.org wrote:

On 2020-05-12 01:25, Greg KH wrote:
> On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
> > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
> > Author: Jiri Slaby
> > Date: Tue Aug 7 21:48:04 2012 +0200
> >
> >     TTY: hvc_console, add tty install
> >
> > added hvc_install but did not move 'tty->driver_data = NULL;' from
> > hvc_open's fail path to hvc_cleanup.
> >
> > IOW hvc_open now NULLs tty->driver_data even for another task which
> > opened the tty earlier. The same holds for "tty_port_tty_set(&hp->port,
> > NULL);" there. And actually "tty_port_put(&hp->port);" is also incorrect
> > for the 2nd task opening the tty.
> > ...

These are the traces you get when the issue happens:

[  154.212291] hvc_install called for pid: 666
[  154.216705] hvc_open called for pid: 666
[  154.233657] hvc_open: request_irq failed with rc -22.
[  154.238934] hvc_open called for pid: 678
[  154.243012] Unable to handle kernel NULL pointer dereference at virtual address 00c4
# hvc_install isn't called for pid: 678 as the file wasn't closed yet.

Nice. Does the attached help?

Yeah, it looks good. I think it also eliminates the port.count reference issue discussed on the v2 patch. Are you planning to mainline this?

I wonder how come the tty_port_put in hvc_open does not cause a UAF? I would say hvc_open fails, tty_port_put is called. It decrements the reference taken in hvc_install. So far so good. Now, this should happen IMO:

tty_open
  -> hvc_open (fails)
    -> tty_port_put

hvc_console driver defines port->ops->destruct(). Upon tty_port_put(), the tty_port_destructor() calls port->ops->destruct(), rather than kfree(port).

  -> tty_release
    -> tty_release_struct
      -> tty_kref_put
        -> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
  -> hvc_cleanup
    -> tty_port_put (should die terrible death now)

Since port is not freed, I think we should be good.

What am I missing?

thanks,

Thank you.
Raghavendra
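The destruct-vs-kfree behaviour referred to above lives in the tty_port refcounting core; schematically it looks like this (paraphrased shape, not verbatim kernel source):

/* Paraphrased shape of the release path behind tty_port_put(): when
 * the last reference drops, a driver-supplied destructor such as
 * hvc_console's is called instead of kfree(), which is why the port
 * is not freed out from under hvc in the failure path above. */
static void tty_port_destructor(struct kref *kref)
{
	struct tty_port *port = container_of(kref, struct tty_port, kref);

	/* ... release xmit buffers etc. ... */
	if (port->ops && port->ops->destruct)
		port->ops->destruct(port);
	else
		kfree(port);
}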
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6, 7)
Christophe Leroy writes:
> Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :
>> Hi again, Christophe,
>>
>> On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:
>>>
>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may have broken swap.
>>
>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!
>
> Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry() )

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset, and it should not have a dependency on the pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set the not-present bits. With that understanding, what am I missing here?

> I guess the easiest for the time being is to revert the commit with a proper explanation of the issue, then one day we'll modify the way powerpc manages swap.

-aneesh
[PATCH] powerpc/64s: Disable STRICT_KERNEL_RWX
Several strange crashes have been eventually traced back to STRICT_KERNEL_RWX and its interaction with code patching.

Various paths in our ftrace, kprobes and other patching code need to be hardened against patching failures, otherwise we can end up running with partially/incorrectly patched ftrace paths, kprobes or jump labels, which can then cause strange crashes.

Although fixes for those are in development, they're not -rc material.

There also seem to be problems with the underlying strict RWX logic, which needs further debugging.

So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from enabling the option and tripping over the bugs.

Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Cc: sta...@vger.kernel.org # v4.13+
Signed-off-by: Michael Ellerman
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..d13b5328ca10 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -130,7 +130,7 @@ config PPC
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
-	select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX if (PPC32 && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
 	select ARCH_HAS_UACCESS_MCSAFE if PPC64
--
2.25.1
Re: Endless soft-lockups for compiling workload since next-20200519
On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > Just a head up. Repeatedly compiling kernels for a while would trigger > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > .config are in, Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've not seen anything like that myself. Let me go have a look. In as far as the logs are readable (they're a wrapped mess, please don't do that!), they contain very little useful, as is typical with IPIs :/ > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > flush_smp_call_function_queue+0x1fa/0x2e0 > [ 1168.00][C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat > fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars > acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi > scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror > dm_region_hash dm_log dm_mod > [ 1168.029492][C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > 5.7.0-rc6-next-20200519 #1 > [ 1168.037665][C1] Hardware name: HPE ProLiant DL385 > Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 > [ 1168.046978][C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0 > [ 1168.053658][C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe > ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b > e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25 > 10 52 3f 71 > [ 1168.073262][C1] RSP: 0018:c9178918 EFLAGS: 00010046 > [ 1168.079253][C1] RAX: RBX: 430c58f8 > RCX: 8ec26083 > [ 1168.087156][C1] RDX: 0003 RSI: dc00 > RDI: 430c58f8 > [ 1168.095054][C1] RBP: c91789a8 R08: ed1108618cec > R09: ed1108618cec > [ 1168.102964][C1] R10: 430c675b R11: > R12: 430c58e0 > [ 1168.110866][C1] R13: 8eb30c40 R14: 430c5880 > R15: 430c58e0 > [ 1168.118767][C1] FS: () > GS:4308() knlGS: > [ 1168.127628][C1] CS: 0010 DS: ES: CR0: 80050033 > [ 1168.134129][C1] CR2: 55b169604560 CR3: 000d08a14000 > CR4: 003406e0 > [ 1168.142026][C1] Call Trace: > [ 1168.145206][C1] > [ 1168.147957][C1] ? smp_call_on_cpu_callback+0xd0/0xd0 > [ 1168.153421][C1] ? rcu_read_lock_sched_held+0xac/0xe0 > [ 1168.158880][C1] ? rcu_read_lock_bh_held+0xc0/0xc0 > [ 1168.164076][C1] generic_smp_call_function_single_interrupt+0x13/0x2b > [ 1168.170938][C1] smp_call_function_single_interrupt+0x157/0x4e0 > [ 1168.177278][C1] ? smp_call_function_interrupt+0x4e0/0x4e0 > [ 1168.183172][C1] ? interrupt_entry+0xe4/0xf0 > [ 1168.187846][C1] ? trace_hardirqs_off_caller+0x8d/0x1f0 > [ 1168.193478][C1] ? trace_hardirqs_on_caller+0x1f0/0x1f0 > [ 1168.199116][C1] ? _nohz_idle_balance+0x221/0x360 > [ 1168.204228][C1] ? trace_hardirqs_off_thunk+0x1a/0x1c > [ 1168.209690][C1] call_function_single_interrupt+0xf/0x20
Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
On 2020-05-20 01:59, Jiri Slaby wrote:
> On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
>> Potentially, hvc_open() can be called in parallel when two tasks call open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add() callback in the function fails, and the failure path sets tty->driver_data to NULL, the parallel hvc_open() can see this NULL and cause a memory abort. Hence, do a NULL check at the beginning, before proceeding ahead.
>>
>> The issue can be easily reproduced by launching two tasks simultaneously that do an open() call on /dev/hvcX. For example:
>> $ cat /dev/hvc0 & cat /dev/hvc0 &
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Raghavendra Rao Ananta
>> ---
>>  drivers/tty/hvc/hvc_console.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
>> index 436cc51c92c3..80709f754cc8 100644
>> --- a/drivers/tty/hvc/hvc_console.c
>> +++ b/drivers/tty/hvc/hvc_console.c
>> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>>  	unsigned long flags;
>>  	int rc = 0;
>>
>> +	if (!hp)
>> +		return -ENODEV;
>> +
>
> This is still not fixing the bug properly. See:
> https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f3...@suse.cz/
> In particular, the paragraph starting "IOW".

You are right. This doesn't fix the problem entirely. There are other parts to it which are not handled in a clean way by the driver. Apart from the things you've mentioned, it doesn't seem to handle hp->port.count correctly either:

hvc_open:
  hp->port.count++
  hp->ops->notifier_add(hp, hp->data) fails
  tty->driver_data = NULL
hvc_close:
  returns immediately as tty->driver_data == NULL, without hp->port.count--

This would leave the port in a stale state, and the second caller of hvc_open doesn't get a chance to call/retry hp->ops->notifier_add(hp, hp->data).

However, the patch is not trying to address the logical issues with hvc_open and hvc_close. It's only trying to eliminate the potential NULL pointer dereference, leading to a panic. From what I see, hvc_open is serialized by tty_lock, and adding a NULL check here prevents the second caller from dereferencing it.

Thank you.
Raghavendra
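The port.count imbalance described above can be modelled in a few lines of stand-alone C (assumed control flow, not the driver source): once the failed open NULLs driver_data, the matching close bails out early and the count is never decremented.

#include <stdio.h>
#include <stddef.h>

static int port_count;
static void *driver_data = (void *)1; /* set earlier by hvc_install */

static int hvc_open(int notifier_add_fails)
{
	port_count++;
	if (notifier_add_fails) {
		driver_data = NULL; /* the problematic cleanup on the failure path */
		return -1;
	}
	return 0;
}

static void hvc_close(void)
{
	if (!driver_data)
		return; /* bails out: port_count is never decremented */
	port_count--;
}

int main(void)
{
	hvc_open(1); /* notifier_add() fails */
	hvc_close();
	printf("port.count = %d (stale, should be 0)\n", port_count);
	return 0;
}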
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
On Wed, May 20, 2020 at 07:22:19PM +0800, Shengjiu Wang wrote:
> I see some drivers also request the dma channel in open() or hw_params().
> How can they avoid the deferred-probe issue? For example:
> sound/arm/pxa2xx-pcm-lib.c
> sound/soc/sprd/sprd-pcm-dma.c

Other drivers having problems means those drivers should be fixed, not that we should copy the problems. In the case of the PXA driver that's very old code which predates deferred probe by, I'd guess, a decade.
Re: [PATCH] powerpc: Add ppc_inst_next()
Le 20/05/2020 à 14:21, Jordan Niethe a écrit : On Wed, May 20, 2020 at 9:44 PM Michael Ellerman wrote: In a few places we want to calculate the address of the next instruction. Previously that was simple, we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated, we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Convert several locations to use it. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/inst.h | 9 + arch/powerpc/kernel/uprobes.c | 2 +- arch/powerpc/lib/feature-fixups.c | 10 +- arch/powerpc/xmon/xmon.c | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index d82e0c99cfa1..7d5ee1309b92 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) return ppc_inst_prefixed(x) ? 8 : 4; } +/* + * Return the address of the next instruction, if the instruction @value was + * located at @location. + */ +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value) +{ + return location + ppc_inst_len(value); +} I think this is a good idea. I tried something similar in the initial post for an instruction type. I had: +#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr but how you've got it is much more clear/usable. Yes I agree I wonder why not +static inline struct ppc_inst *ppc_inst_next(void *location) +{ + return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location); +} Because as Michael explains, the instruction to be skipped might not yet be at the pointed memory location (for instance in insert_bpts() ) + int probe_user_read_inst(struct ppc_inst *inst, struct ppc_inst __user *nip); diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index 83e883e1a42d..683ba76919a7 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) * support doesn't exist and have to fix-up the next instruction * to be executed. */ - regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn)); + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn); user_disable_single_step(current); return 0; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 80f320c2e189..0ad01eebf112 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur) src = alt_start; dest = start; - for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)), -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { + for (; src < alt_end; src = ppc_inst_next(src, *src), + dest = ppc_inst_next(dest, *dest)) { The reason to maybe use ppc_inst_read() in the helper instead of just *dest would be we don't always need to read 8 bytes. 
And reading 8 bytes might trigger a page fault if we are reading the very last non prefixed instruction of the last page. if (patch_alt_instruction(src, dest, alt_start, alt_end)) return 1; } - for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP))) + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) But then you wouldn't be able to do this as easily I guess. raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); return 0; @@ -405,8 +405,8 @@ static void do_final_fixups(void) while (src < end) { inst = ppc_inst_read(src); raw_patch_instruction(dest, inst); - src = (void *)src + ppc_inst_len(inst); - dest = (void *)dest + ppc_inst_len(inst); + src = ppc_inst_next(src, *src); + dest = ppc_inst_next(dest, *dest); } #endif } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index fb135f2cd6b0..aa123f56b7d4 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -939,7 +939,7 @@ static void insert_bpts(void) }
Re: [PATCH] powerpc: Add ppc_inst_next()
On Wed, May 20, 2020 at 9:44 PM Michael Ellerman wrote: > > In a few places we want to calculate the address of the next > instruction. Previously that was simple, we just added 4 bytes, or if > using a u32 * we incremented that pointer by 1. > > But prefixed instructions make it more complicated, we need to advance > by either 4 or 8 bytes depending on the actual instruction. We also > can't do pointer arithmetic using struct ppc_inst, because it is > always 8 bytes in size on 64-bit, even though we might only need to > advance by 4 bytes. > > So add a ppc_inst_next() helper which calculates the location of the > next instruction, if the given instruction was located at the given > address. Note the instruction doesn't need to actually be at the > address in memory. > > Convert several locations to use it. > > Signed-off-by: Michael Ellerman > --- > arch/powerpc/include/asm/inst.h | 9 + > arch/powerpc/kernel/uprobes.c | 2 +- > arch/powerpc/lib/feature-fixups.c | 10 +- > arch/powerpc/xmon/xmon.c | 2 +- > 4 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h > index d82e0c99cfa1..7d5ee1309b92 100644 > --- a/arch/powerpc/include/asm/inst.h > +++ b/arch/powerpc/include/asm/inst.h > @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) > return ppc_inst_prefixed(x) ? 8 : 4; > } > > +/* > + * Return the address of the next instruction, if the instruction @value was > + * located at @location. > + */ > +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst > value) > +{ > + return location + ppc_inst_len(value); > +} I think this is a good idea. I tried something similar in the initial post for an instruction type. I had: +#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr but how you've got it is much more clear/usable. I wonder why not +static inline struct ppc_inst *ppc_inst_next(void *location) +{ + return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location); +} > + > int probe_user_read_inst(struct ppc_inst *inst, > struct ppc_inst __user *nip); > > diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c > index 83e883e1a42d..683ba76919a7 100644 > --- a/arch/powerpc/kernel/uprobes.c > +++ b/arch/powerpc/kernel/uprobes.c > @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, > struct pt_regs *regs) > * support doesn't exist and have to fix-up the next instruction > * to be executed. > */ > - regs->nip = utask->vaddr + > ppc_inst_len(ppc_inst_read(&auprobe->insn)); > + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, > auprobe->insn); > > user_disable_single_step(current); > return 0; > diff --git a/arch/powerpc/lib/feature-fixups.c > b/arch/powerpc/lib/feature-fixups.c > index 80f320c2e189..0ad01eebf112 100644 > --- a/arch/powerpc/lib/feature-fixups.c > +++ b/arch/powerpc/lib/feature-fixups.c > @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, > struct fixup_entry *fcur) > src = alt_start; > dest = start; > > - for (; src < alt_end; src = (void *)src + > ppc_inst_len(ppc_inst_read(src)), > -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { > + for (; src < alt_end; src = ppc_inst_next(src, *src), > + dest = ppc_inst_next(dest, *dest)) { The reason to maybe use ppc_inst_read() in the helper instead of just *dest would be we don't always need to read 8 bytes. 
> if (patch_alt_instruction(src, dest, alt_start, alt_end)) > return 1; > } > > - for (; dest < end; dest = (void *)dest + > ppc_inst_len(ppc_inst(PPC_INST_NOP))) > + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) But then you wouldn't be able to do this as easily I guess. > raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); > > return 0; > @@ -405,8 +405,8 @@ static void do_final_fixups(void) > while (src < end) { > inst = ppc_inst_read(src); > raw_patch_instruction(dest, inst); > - src = (void *)src + ppc_inst_len(inst); > - dest = (void *)dest + ppc_inst_len(inst); > + src = ppc_inst_next(src, *src); > + dest = ppc_inst_next(dest, *dest); > } > #endif > } > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > index fb135f2cd6b0..aa123f56b7d4 100644 > --- a/arch/powerpc/xmon/xmon.c > +++ b/arch/powerpc/xmon/xmon.c > @@ -939,7 +939,7 @@ static void insert_bpts(void) > } > > patch_instruction(bp->instr, instr); > - patch_instruction((void *)bp->instr + ppc_inst_len(instr), > + patch_instruction(ppc_inst_next(bp
[PATCH] powerpc/configs/64s: Enable CONFIG_PRINTK_CALLER
This adds the CPU or thread number to printk messages. This helps a lot when deciphering concurrent oopses that have been interleaved. Example output, of PID1 (T1) triggering a warning: [1.581678][T1] WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:539 pkcs1pad_verify+0x38/0x140 [1.581681][T1] Modules linked in: [1.581693][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty #1515 [1.581700][T1] NIP: c0207d64 LR: c0207d3c CTR: c0207d2c [1.581708][T1] REGS: c000fd2e7560 TRAP: 0700 Not tainted (5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty) [1.581712][T1] MSR: 90029033 CR: 44000222 XER: 0004 Signed-off-by: Michael Ellerman --- arch/powerpc/configs/powernv_defconfig | 1 + arch/powerpc/configs/ppc64_defconfig | 1 + arch/powerpc/configs/pseries_defconfig | 1 + 3 files changed, 3 insertions(+) diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index df8bdbaa5d8f..2de9aadf0f50 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -347,3 +347,4 @@ CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index bae8170d7401..57142a648ebd 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -358,6 +358,7 @@ CONFIG_CRYPTO_DEV_NX=y CONFIG_CRYPTO_DEV_NX_ENCRYPT=m CONFIG_CRYPTO_DEV_VMX=y CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACK_USAGE=y diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 0bea4d3ffb85..dfa4a726333b 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -322,3 +322,4 @@ CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y -- 2.25.1
[PATCH] powerpc: Add ppc_inst_next()
In a few places we want to calculate the address of the next instruction. Previously that was simple, we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated, we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Convert several locations to use it. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/inst.h | 9 + arch/powerpc/kernel/uprobes.c | 2 +- arch/powerpc/lib/feature-fixups.c | 10 +- arch/powerpc/xmon/xmon.c | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index d82e0c99cfa1..7d5ee1309b92 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) return ppc_inst_prefixed(x) ? 8 : 4; } +/* + * Return the address of the next instruction, if the instruction @value was + * located at @location. + */ +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value) +{ + return location + ppc_inst_len(value); +} + int probe_user_read_inst(struct ppc_inst *inst, struct ppc_inst __user *nip); diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index 83e883e1a42d..683ba76919a7 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) * support doesn't exist and have to fix-up the next instruction * to be executed. 
*/ - regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn)); + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn); user_disable_single_step(current); return 0; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 80f320c2e189..0ad01eebf112 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur) src = alt_start; dest = start; - for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)), -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { + for (; src < alt_end; src = ppc_inst_next(src, *src), + dest = ppc_inst_next(dest, *dest)) { if (patch_alt_instruction(src, dest, alt_start, alt_end)) return 1; } - for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP))) + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); return 0; @@ -405,8 +405,8 @@ static void do_final_fixups(void) while (src < end) { inst = ppc_inst_read(src); raw_patch_instruction(dest, inst); - src = (void *)src + ppc_inst_len(inst); - dest = (void *)dest + ppc_inst_len(inst); + src = ppc_inst_next(src, *src); + dest = ppc_inst_next(dest, *dest); } #endif } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index fb135f2cd6b0..aa123f56b7d4 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -939,7 +939,7 @@ static void insert_bpts(void) } patch_instruction(bp->instr, instr); - patch_instruction((void *)bp->instr + ppc_inst_len(instr), + patch_instruction(ppc_inst_next(bp->instr, instr), ppc_inst(bpinstr)); if (bp->enabled & BP_CIABR) continue; -- 2.25.1
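To illustrate what the helper buys us, here is a stand-alone model (not the kernel implementation) of walking a stream that mixes 4-byte and 8-byte prefixed instructions; per the prefixed-instruction series, a prefix is identified by major opcode 1, so the step size depends on the word at the current location.

#include <stdio.h>
#include <stdint.h>

static int inst_len(uint32_t word)
{
	return (word >> 26) == 1 ? 8 : 4; /* major opcode 1 => prefixed */
}

int main(void)
{
	uint32_t stream[] = { 0x60000000,             /* nop: 4 bytes */
			      0x04000000, 0x38210010, /* prefixed: 8 bytes */
			      0x4e800020 };           /* blr: 4 bytes */
	unsigned char *p = (unsigned char *)stream;
	unsigned char *end = p + sizeof(stream);

	while (p < end) {
		int len = inst_len(*(uint32_t *)p);

		printf("inst at +%td, len %d\n", p - (unsigned char *)stream, len);
		p += len; /* the ppc_inst_next() idea: advance by 4 or 8 */
	}
	return 0;
}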
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Hi

On Wed, May 20, 2020 at 5:42 PM Lucas Stach wrote:
>
> Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang:
>> Hi
>>
>> On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote:
>>> Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang:
>>>> There are two requirements that we need to move the request
>>>> of dma channel from probe to open.
>>>
>>> How do you handle -EPROBE_DEFER return code from the channel request if
>>> you don't do it in probe?
>>
>> I use dma_request_slave_channel or dma_request_channel instead
>> of dmaengine_pcm_request_chan_of, so there should be no -EPROBE_DEFER
>> return code.
>
> This is a pretty weak argument. The dmaengine device might probe after
> you try to get the channel. Using a function to request the channel
> that doesn't allow you to handle probe deferral is IMHO a bug and
> should be fixed, instead of building even more assumptions on top of
> it.

I see some drivers also request the dma channel in open() or hw_params().
How can they avoid the deferred-probe issue? For example:
sound/arm/pxa2xx-pcm-lib.c
sound/soc/sprd/sprd-pcm-dma.c

>>>> - When dma device binds with power-domains, the power will
>>>> be enabled when we request dma channel. If the request of dma
>>>> channel happens at probe, then the power-domains will be always
>>>> enabled after kernel boot up, which is not good for power
>>>> saving, so we need to move the request of dma channel to .open();
>>>
>>> This is certainly something which could be fixed in the dmaengine
>>> driver.
>>
>> The dma driver always calls pm_runtime_get_sync in
>> device_alloc_chan_resources, and device_alloc_chan_resources is
>> called when the channel is requested, so power is enabled on channel
>> request.
>
> So why can't you fix the dmaengine driver to do that RPM call at a
> later time when the channel is actually going to be used? This will
> allow further power savings with other slave devices than the audio
> PCM.
>
> Regards,
> Lucas

It seems the best place for calling pm_runtime_get_sync is
.device_alloc_chan_resources, with pm_runtime_put_sync called in
.device_free_chan_resources.

For slave_sg mode, .device_prep_slave_sg and .device_issue_pending
will be called many times after .device_alloc_chan_resources, so it is
not good to call pm_runtime_get_sync in .device_prep_slave_sg or
.device_issue_pending.

best regards
wang shengjiu
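For context on the deferral point Lucas makes, a minimal sketch (illustrative probe function and channel name, not the fsl code) of requesting the channel so probe deferral can propagate: dma_request_chan() returns an ERR_PTR, so -EPROBE_DEFER from a not-yet-bound DMA controller can be handed straight back to the driver core, which will retry the probe later.

#include <linux/dmaengine.h>
#include <linux/err.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	struct dma_chan *chan = dma_request_chan(&pdev->dev, "tx");

	if (IS_ERR(chan))
		return PTR_ERR(chan); /* may be -EPROBE_DEFER */

	/* ... set up the PCM using chan ... */
	return 0;
}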
[PATCH] powerpc/xmon: Show task->thread.regs in process display
Show the address of the task's regs in the process listing in xmon. The regs should always be on the stack page that we also print the address of, but it's still helpful not to have to find them by hand.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/xmon/xmon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index de585204d1d2..fb135f2cd6b0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3185,8 +3185,8 @@ static void show_task(struct task_struct *tsk)
 		(tsk->exit_state & EXIT_DEAD) ? 'E' :
 		(tsk->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
 
-	printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
-		tsk->thread.ksp,
+	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
+		tsk->thread.ksp, tsk->thread.regs,
 		tsk->pid, rcu_dereference(tsk->parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
@@ -3309,7 +3309,7 @@ static void show_tasks(void)
 	unsigned long tskv;
 	struct task_struct *tsk = NULL;
 
-	printf("     task_struct     ->thread.ksp    PID   PPID S  P CMD\n");
+	printf("     task_struct     ->thread.ksp    ->thread.regs    PID   PPID S  P CMD\n");
 
 	if (scanhex(&tskv))
 		tsk = (struct task_struct *)tskv;
--
2.25.1
Re: [PATCH] powerpc/5200: update contact email
On Sat, 2 May 2020 16:26:42 +0200, Wolfram Sang wrote: > My 'pengutronix' address is defunct for years. Merge the entries and use > the proper contact address. Applied to powerpc/next. [1/1] powerpc/5200: update contact email https://git.kernel.org/powerpc/c/ad0f522df1b2f4fe5d4ae6418e1ea216154a0662 cheers
Re: [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously
On Tue, 28 Apr 2020 13:45:04 +1000, Sam Bobroff wrote: > Here are some fixes and cleanups that have come from other work but that I > think stand on their own. > > Only one patch ("Release EEH device state synchronously", suggested by Oliver > O'Halloran) is a significant change: it moves the cleanup of some EEH device > data out of the (possibly asynchronous) device release handler and into the > (synchronously called) bus notifier. This is useful for future work as it > makes > it easier to reason about the lifetimes of EEH structures. > > [...] Applied to powerpc/next. [1/2] powerpc/eeh: Fix pseries_eeh_configure_bridge() https://git.kernel.org/powerpc/c/6fa13640aea7bb0760846981aa2da4245307bd26 [2/2] powerpc/eeh: Release EEH device state synchronously https://git.kernel.org/powerpc/c/466381ecdc741b1767d980e10b1ec49f6bde56f3 cheers
Re: [PATCH v6 00/16] powerpc/watchpoint: Preparation for more than one watchpoint
On Thu, 14 May 2020 16:47:25 +0530, Ravi Bangoria wrote: > So far, powerpc Book3S code has been written with an assumption of > only one watchpoint. But Power10[1] is introducing second watchpoint > register (DAWR). Even though this patchset does not enable 2nd DAWR, > it makes the infrastructure ready so that enabling 2nd DAWR should > just be a matter of changing count. > > Existing functionality works fine with the patchset. I've tested it > with perf, ptrace(gdb), xmon. All hw-breakpoint selftests are passing > as well. And I've build tested for 8xx and 'AMCC 44x, 46x or 47x'. > > [...] Applied to powerpc/next. [01/16] powerpc/watchpoint: Rename current DAWR macros https://git.kernel.org/powerpc/c/09f82b063aa9c248a3ef919aeec361054e7b044a [02/16] powerpc/watchpoint: Add SPRN macros for second DAWR https://git.kernel.org/powerpc/c/4a4ec2289a5d748cb64ff67ca8d74535a76a8436 [03/16] powerpc/watchpoint: Introduce function to get nr watchpoints dynamically https://git.kernel.org/powerpc/c/a6ba44e8799230e36c8ab06fda7f77f421e9e795 [04/16] powerpc/watchpoint/ptrace: Return actual num of available watchpoints https://git.kernel.org/powerpc/c/45093b382e0ac25c206b4dcd210c6be1f5e56e60 [05/16] powerpc/watchpoint: Provide DAWR number to set_dawr https://git.kernel.org/powerpc/c/a18b834625d345bfa89c4e2754dd6cbb0133c4d7 [06/16] powerpc/watchpoint: Provide DAWR number to __set_breakpoint https://git.kernel.org/powerpc/c/4a8a9379f2af4c9928529b3959bc2d8f7023c6bc [07/16] powerpc/watchpoint: Get watchpoint count dynamically while disabling them https://git.kernel.org/powerpc/c/c2919132734f29a7a33e1339bef8a67b11f322eb [08/16] powerpc/watchpoint: Disable all available watchpoints when !dawr_force_enable https://git.kernel.org/powerpc/c/22a214e461c5cc9428b86915d9cfcf84c6e11ad7 [09/16] powerpc/watchpoint: Convert thread_struct->hw_brk to an array https://git.kernel.org/powerpc/c/303e6a9ddcdc168e92253c78cdb4bbe1e10d78b3 [10/16] powerpc/watchpoint: Use loop for thread_struct->ptrace_bps https://git.kernel.org/powerpc/c/6b424efa119d5ea06b15ff240dddc3b4b9f9cdfb [11/16] powerpc/watchpoint: Introduce is_ptrace_bp() function https://git.kernel.org/powerpc/c/c9e82aeb197df2d93b1b4234bc0c80943fa594e8 [12/16] powerpc/watchpoint: Use builtin ALIGN*() macros https://git.kernel.org/powerpc/c/e68ef121c1f4c38edf87a3354661ceb99d522729 [13/16] powerpc/watchpoint: Prepare handler to handle more than one watchpoint https://git.kernel.org/powerpc/c/74c6881019b7d56c327fffc268d97adb5eb1b4f9 [14/16] powerpc/watchpoint: Don't allow concurrent perf and ptrace events https://git.kernel.org/powerpc/c/29da4f91c0c1fbda12b8a31be0d564930208c92e [15/16] powerpc/watchpoint/xmon: Don't allow breakpoint overwriting https://git.kernel.org/powerpc/c/514db915e7b33e7eaf8e40192b93380f79b319b5 [16/16] powerpc/watchpoint/xmon: Support 2nd DAWR https://git.kernel.org/powerpc/c/30df74d67d48949da87e3a5b57c381763e8fd526 cheers
Re: [PATCH v4 00/16] powerpc: machine check and system reset fixes
On Fri, 8 May 2020 14:33:52 +1000, Nicholas Piggin wrote: > Since v3, I fixed a compile error and returned the generic machine check > exception handler to be NMI on 32 and 64e, as caught by Christophe's > review. > > Also added the last patch, just found it by looking at the code, a > review for 32s would be good. > > [...] Patches 1-15 applied to powerpc/next. [01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup https://git.kernel.org/powerpc/c/8a5054d8cbbe03c68dcb0957c291c942132e4101 [02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path https://git.kernel.org/powerpc/c/ac2a2a1417391180ef12f908a2864692d6d76d40 [03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT https://git.kernel.org/powerpc/c/16754d25bd7d4e53a52b311d99cc7a8fba875d81 [04/16] powerpc/64s/exceptions: Machine check reconcile irq state https://git.kernel.org/powerpc/c/f0fd9dd3c213c947dfb5bc2cad3ef5e30d3258ec [05/16] powerpc/pseries/ras: Avoid calling rtas_token() in NMI paths https://git.kernel.org/powerpc/c/7368b38b21bfa39df637701a480262c15ab1a49e [06/16] powerpc/pseries/ras: Fix FWNMI_VALID off by one https://git.kernel.org/powerpc/c/deb70f7a35a22dffa55b2c3aac71bc6fb0f486ce [07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case https://git.kernel.org/powerpc/c/dff681e95a23f28b3c688a8bd5535f78bd726bc8 [08/16] powerpc/pseries/ras: fwnmi sreset should not interlock https://git.kernel.org/powerpc/c/d7b14c5c042865070a1411078ab49ea17bad0b41 [09/16] powerpc/pseries: Limit machine check stack to 4GB https://git.kernel.org/powerpc/c/d2cbbd45d433b96e41711a293e59cff259143694 [10/16] powerpc/pseries: Machine check use rtas_call_unlocked() with args on stack https://git.kernel.org/powerpc/c/2576f5f9169620bf329cf1e91086e6041b98e4b2 [11/16] powerpc/64s: machine check interrupt update NMI accounting https://git.kernel.org/powerpc/c/116ac378bb3ff844df333e7609e7604651a0db9d [12/16] powerpc: Implement ftrace_enabled() helpers https://git.kernel.org/powerpc/c/f2d7f62e4abdb03de3f4267361d96c417312d05c [13/16] powerpc/64s: machine check do not trace real-mode handler https://git.kernel.org/powerpc/c/abd106fb437ad1cd8c8df8ccabd0fa941ef6342a [14/16] powerpc/traps: Do not trace system reset https://git.kernel.org/powerpc/c/bbbc8032b00f8ef287894425fbdb691049e28d39 [15/16] powerpc/traps: Make unrecoverable NMIs die instead of panic https://git.kernel.org/powerpc/c/265d6e588d87194c2fe2d6c240247f0264e0c19b cheers
Re: [PATCH v2] powerpc/64: Update Speculation_Store_Bypass in /proc//status
On Thu, 2 Apr 2020 23:49:29 +1100, Michael Ellerman wrote: > Currently we don't report anything useful in /proc//status: > > $ grep Speculation_Store_Bypass /proc/self/status > Speculation_Store_Bypass: unknown > > Our mitigation is currently always a barrier instruction, which > doesn't map that well onto the existing possibilities for the PR_SPEC > values. > > [...] Applied to powerpc/next. [1/1] powerpc/64: Update Speculation_Store_Bypass in /proc//status https://git.kernel.org/powerpc/c/d93e5e2d03d4f41dfedb92200a2c0413ab8ee4e7 cheers
Re: [PATCH v2 1/4] powerpc/64s: Always has full regs, so remove remnant checks
On Thu, 7 May 2020 22:13:29 +1000, Michael Ellerman wrote: > Applied to powerpc/next. [1/4] powerpc/64s: Always has full regs, so remove remnant checks https://git.kernel.org/powerpc/c/feb9df3462e688d073848d85c8bb132fe8fd9ae5 [2/4] powerpc: Use set_trap() and avoid open-coding trap masking https://git.kernel.org/powerpc/c/db30144b5c9cfb09c6b8b2fa7a9c351c94aa3433 [3/4] powerpc: trap_is_syscall() helper to hide syscall trap number https://git.kernel.org/powerpc/c/912237ea166428edcbf3c137adf12cb987c477f2 [4/4] powerpc: Use trap metadata to prevent double restart rather than zeroing trap https://git.kernel.org/powerpc/c/4e0e45b07d790253643ee05300784ab2156e2d5e cheers
Re: [PATCH] powerpc/64: Don't initialise init_task->thread.regs
On Tue, 28 Apr 2020 22:31:30 +1000, Michael Ellerman wrote: > Aneesh increased the size of struct pt_regs by 16 bytes and started > seeing this WARN_ON: > > smp: Bringing up secondary CPUs ... > [ cut here ] > WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 > giveup_all+0xb4/0x110 > Modules linked in: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 > NIP: c001a2b4 LR: c001a29c CTR: c31d > REGS: c26d3980 TRAP: 0700 Not tainted > (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) > MSR: 8282b033 CR: 48048224 > XER: > CFAR: c0019cc8 IRQMASK: 1 > GPR00: c001a264 c26d3c20 c26d7200 8280b033 > GPR04: 0001 0077 30206d7372203164 > GPR08: 2000 02002000 8280b033 3230303030303030 > GPR12: 8800 c31d 00800050 0266 > GPR16: 0309a1a0 0309a4b0 0309a2d8 0309a890 > GPR20: 030d0098 c264da40 fd62 c000ff798080 > GPR24: c264edf0 c001007469f0 fd62 c20e5e90 > GPR28: c264edf0 c264d200 1db6 c264d200 > NIP [c001a2b4] giveup_all+0xb4/0x110 > LR [c001a29c] giveup_all+0x9c/0x110 > Call Trace: > [c26d3c20] [c001a264] giveup_all+0x64/0x110 (unreliable) > [c26d3c90] [c001ae34] __switch_to+0x104/0x480 > [c26d3cf0] [c0e0b8a0] __schedule+0x320/0x970 > [c26d3dd0] [c0e0c518] schedule_idle+0x38/0x70 > [c26d3df0] [c019c7c8] do_idle+0x248/0x3f0 > [c26d3e70] [c019cbb8] cpu_startup_entry+0x38/0x40 > [c26d3ea0] [c0011bb0] rest_init+0xe0/0xf8 > [c26d3ed0] [c2004820] start_kernel+0x990/0x9e0 > [c26d3f90] [c000c49c] start_here_common+0x1c/0x400 > > [...] Applied to powerpc/next. [1/1] powerpc/64: Don't initialise init_task->thread.regs https://git.kernel.org/powerpc/c/7ffa8b7dc11752827329e4e84a574ea6aaf24716 cheers
Re: [PATCH] selftests/powerpc: Add a test of counting larx/stcx
On Sun, 26 Apr 2020 21:44:10 +1000, Michael Ellerman wrote: > This is based on the count_instructions test. > > However this one also counts the number of failed stcx's, and in > conjunction with knowing the size of the stcx loop, can calculate the > total number of instructions executed even in the face of > non-deterministic stcx failures. Applied to powerpc/next. [1/1] selftests/powerpc: Add a test of counting larx/stcx https://git.kernel.org/powerpc/c/7481cad4747303442209bc5dba2f56c3afcea07d cheers
Re: [PATCH] drivers/macintosh: Fix memleak in windfarm_pm112 driver
On Thu, 23 Apr 2020 16:00:38 +1000, Michael Ellerman wrote: > create_cpu_loop() calls smu_sat_get_sdb_partition() which does > kmalloc() and returns the allocated buffer. In fact it's called twice, > and neither buffer is freed. > > This results in a memory leak as reported by Erhard: > unreferenced object 0xc0047081f840 (size 32): > comm "kwindfarm", pid 203, jiffies 4294880630 (age 5552.877s) > hex dump (first 32 bytes): > c8 06 02 7f ff 02 ff 01 fb bf 00 41 00 20 00 00 ...A. .. > 00 07 89 37 00 a0 00 00 00 00 00 00 00 00 00 00 ...7 > backtrace: > [<83f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 > [windfarm_smu_sat] > [<3010fcb7>] .pm112_wf_notify+0x104c/0x13bc [windfarm_pm112] > [] .notifier_call_chain+0xa8/0x180 > [<70490868>] .blocking_notifier_call_chain+0x64/0x90 > [<131d8149>] .wf_thread_func+0x114/0x1a0 > [<0d54838d>] .kthread+0x13c/0x190 > [<669b72bc>] .ret_from_kernel_thread+0x58/0x64 > unreferenced object 0xc004737089f0 (size 16): > comm "kwindfarm", pid 203, jiffies 4294880879 (age 5552.050s) > hex dump (first 16 bytes): > c4 04 01 7f 22 11 e0 e6 ff 55 7b 12 ec 11 00 00 "U{. > backtrace: > [<83f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 > [windfarm_smu_sat] > [ ] .pm112_wf_notify+0x1294/0x13bc [windfarm_pm112] > [ ] .notifier_call_chain+0xa8/0x180 > [<70490868>] .blocking_notifier_call_chain+0x64/0x90 > [<131d8149>] .wf_thread_func+0x114/0x1a0 > [<0d54838d>] .kthread+0x13c/0x190 > [<669b72bc>] .ret_from_kernel_thread+0x58/0x64 > > [...] Applied to powerpc/next. [1/1] drivers/macintosh: Fix memleak in windfarm_pm112 driver https://git.kernel.org/powerpc/c/93900337b9ac2f4eca427eff6d187be2dc3b5551 cheers
Re: [PATCH] powerpc: Drop unneeded cast in task_pt_regs()
On Tue, 28 Apr 2020 22:31:52 +1000, Michael Ellerman wrote: > There's no need to cast in task_pt_regs() as tsk->thread.regs should > already be a struct pt_regs. If someone's using task_pt_regs() on > something that's not a task but happens to have a thread.regs then > we'll deal with them later. Applied to powerpc/next. [1/1] powerpc: Drop unneeded cast in task_pt_regs() https://git.kernel.org/powerpc/c/24ac99e97fa7b8f0db9b48413a76def9cf73295c cheers
Re: [PATCH v8 00/30] Initial Prefixed Instruction support
On Wed, 6 May 2020 13:40:20 +1000, Jordan Niethe wrote: > A future revision of the ISA will introduce prefixed instructions. A > prefixed instruction is composed of a 4-byte prefix followed by a > 4-byte suffix. > > All prefixes have the major opcode 1. A prefix will never be a valid > word instruction. A suffix may be an existing word instruction or a > new instruction. > > [...] Applied to powerpc/next. [01/30] powerpc/xmon: Remove store_inst() for patch_instruction() https://git.kernel.org/powerpc/c/802268fd82676ffce432776f60b93a0b15e58e0c [02/30] powerpc/xmon: Move breakpoint instructions to own array https://git.kernel.org/powerpc/c/51c9ba11f17f25ace1ea6bbfd4586c59105432de [03/30] powerpc/xmon: Move breakpoints to text section https://git.kernel.org/powerpc/c/4eff2b4f32a309e2171bfe53db3e93b5614f77cb [04/30] powerpc/xmon: Use bitwise calculations in_breakpoint_table() https://git.kernel.org/powerpc/c/5a7fdcab54ef17c395fc47e73c828a1432e51683 [05/30] powerpc: Change calling convention for create_branch() et. al. https://git.kernel.org/powerpc/c/7c95d8893fb55869882c9f68f4c94840dc43f18f [06/30] powerpc: Use a macro for creating instructions from u32s https://git.kernel.org/powerpc/c/753462512868674a788ecc77bb96752efb818785 [07/30] powerpc: Use an accessor for instructions https://git.kernel.org/powerpc/c/777e26f0edf8dab58b8dd474d35d83bde0ac6d76 [08/30] powerpc: Use a function for getting the instruction op code https://git.kernel.org/powerpc/c/8094892d1aff14269d3b7bfcd8b941217eecd81f [09/30] powerpc: Use a function for byte swapping instructions https://git.kernel.org/powerpc/c/aabd2233b6aefeee6d7a2f667076d8346be1d30a [10/30] powerpc: Introduce functions for instruction equality https://git.kernel.org/powerpc/c/217862d9b98bf08958d57fd7b31b9de0f1a9477d [11/30] powerpc: Use a datatype for instructions https://git.kernel.org/powerpc/c/94afd069d937d84fb4f696eb9a78db4084e43d21 [12/30] powerpc: Use a function for reading instructions https://git.kernel.org/powerpc/c/f8faaffaa7d99028e457ef2d1dcb43a98f736938 [13/30] powerpc: Add a probe_user_read_inst() function https://git.kernel.org/powerpc/c/7ba68b2172c19031fdc2a2caf37328edd146e299 [14/30] powerpc: Add a probe_kernel_read_inst() function https://git.kernel.org/powerpc/c/95b980a00d1220ca67550a933166704db8bc5c14 [15/30] powerpc/kprobes: Use patch_instruction() https://git.kernel.org/powerpc/c/a8646f43ba5046e7f5c4396125d5136bfcb17b49 [16/30] powerpc: Define and use get_user_instr() et. al. 
https://git.kernel.org/powerpc/c/5249385ad7f0ac178433f0ae9cc5b64612c8ff77 [17/30] powerpc: Introduce a function for reporting instruction length https://git.kernel.org/powerpc/c/622cf6f436a12338bbcfbb3474629755547fd112 [18/30] powerpc/xmon: Use a function for reading instructions https://git.kernel.org/powerpc/c/6c7a4f0a9f66fc7fdc6e208559e5d562f53e0991 [19/30] powerpc/xmon: Move insertion of breakpoint for xol'ing https://git.kernel.org/powerpc/c/7fccfcfba04f9cb46438f368755d368f6c57f3a0 [20/30] powerpc: Make test_translate_branch() independent of instruction length https://git.kernel.org/powerpc/c/0b582db5490a1f250ef63337dd46d5c7599dae80 [21/30] powerpc: Enable Prefixed Instructions https://git.kernel.org/powerpc/c/2aa6195e43b3740258ead93aee42ac719dd4c4b0 [22/30] powerpc: Define new SRR1 bits for a ISA v3.1 https://git.kernel.org/powerpc/c/b691505ef9232a6e82f1c160911afcb4cb20487b [23/30] powerpc: Add prefixed instructions to instruction data type https://git.kernel.org/powerpc/c/650b55b707fdfa764e9f2b81314d3eb4216fb962 [24/30] powerpc: Test prefixed code patching https://git.kernel.org/powerpc/c/f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927 [25/30] powerpc: Test prefixed instructions in feature fixups https://git.kernel.org/powerpc/c/785b79d1e02873c2088ee1301154c66dace66ce5 [26/30] powerpc/xmon: Don't allow breakpoints on suffixes https://git.kernel.org/powerpc/c/c9c831aebd8663d0129bbcee4d76be889f0627fe [27/30] powerpc/kprobes: Don't allow breakpoints on suffixes https://git.kernel.org/powerpc/c/b4657f7650babc9bfb41ce875abe41b18604a105 [28/30] powerpc: Support prefixed instructions in alignment handler https://git.kernel.org/powerpc/c/9409d2f9dad2f0679d67dc24d8116dd3e837b035 [29/30] powerpc sstep: Add support for prefixed load/stores https://git.kernel.org/powerpc/c/50b80a12e4ccff46d53b93754d817acd98bc9ae0 [30/30] powerpc sstep: Add support for prefixed fixed-point arithmetic https://git.kernel.org/powerpc/c/3920742b92f5ea19a220edb947b6f33c99f501da cheers
Re: [PATCH] powerpc: Replace zero-length array with flexible-array
On Thu, 7 May 2020 13:57:49 -0500, Gustavo A. R. Silva wrote: > The current codebase makes use of the zero-length array language > extension to the C90 standard, but the preferred mechanism to declare > variable-length types such as these ones is a flexible array member[1][2], > introduced in C99: > > struct foo { > int stuff; > struct boo array[]; > }; > > [...] Applied to powerpc/next. [1/1] powerpc: Replace zero-length array with flexible-array https://git.kernel.org/powerpc/c/0f6be41c60699fd8cdfa93e5e85a306cec1ac1d0 cheers
Re: [PATCH] powerpc/mm: Replace zero-length array with flexible-array
On Thu, 7 May 2020 13:57:55 -0500, Gustavo A. R. Silva wrote: > The current codebase makes use of the zero-length array language > extension to the C90 standard, but the preferred mechanism to declare > variable-length types such as these ones is a flexible array member[1][2], > introduced in C99: > > struct foo { > int stuff; > struct boo array[]; > }; > > [...] Applied to powerpc/next. [1/1] powerpc/mm: Replace zero-length array with flexible-array https://git.kernel.org/powerpc/c/02bddf21c34d0a918acc8647195ba4507e3db8fc cheers
Re: [PATCH v2 0/9] powerpc + ps3 patches
On Sat, 09 May 2020 18:58:31 +, Geoff Levand wrote: > This is a combined V2 of the two patch sets I sent out on March 27th, > 'PS3 patches for v5.7' and 'powerpc: Minor updates to improve build > debugging'. > > I've dropped these two patches that were in my 'PS3 patches for v5.7' set: > > powerpc/ps3: Add lv1_panic > powerpc/ps3: Add udbg_panic > > [...] Patches 1-6 and 8 applied to powerpc/next. [1/9] powerpc/head_check: Automatic verbosity https://git.kernel.org/powerpc/c/4c592a34391ea4987d29c1718f931b50416ca015 [2/9] powerpc/wrapper: Output linker map file https://git.kernel.org/powerpc/c/f61200d3e3386e78d49677dfb3911c9d7c0dfe4b [3/9] powerpc/head_check: Avoid broken pipe https://git.kernel.org/powerpc/c/331aa46aaf51325d8532a4948f5127b2edc441a5 [4/9] drivers/ps3: Remove duplicate error messages https://git.kernel.org/powerpc/c/6a8aa782cece2330322c33452a767f53f8ba38c9 [5/9] net/ps3_gelic_net: Remove duplicate error message https://git.kernel.org/powerpc/c/7b27b95a894d6a85c076f8d1f00e35316739bf51 [6/9] ps3disk: use the default segment boundary https://git.kernel.org/powerpc/c/720bc316690bd27dea9d71510b50f0cd698ffc32 [8/9] powerpc/ps3: Fix kexec shutdown hang https://git.kernel.org/powerpc/c/126554465d93b10662742128918a5fc338cda4aa cheers
Re: [PATCH] powerpc/8xx: Update email address in MAINTAINERS
On Wed, 6 May 2020 06:51:59 + (UTC), Christophe Leroy wrote: > Since 01 May 2020, our email adresses have changed to @csgroup.eu > > Update MAINTAINERS accordingly. Applied to powerpc/next. [1/1] powerpc/8xx: Update email address in MAINTAINERS https://git.kernel.org/powerpc/c/679d74abc4e14cb369e46f080d90f4dc8c143e65 cheers
Re: [PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
On Wed, 8 Apr 2020 15:58:49 + (UTC), Christophe Leroy wrote: > When CONFIG_KASAN is selected, the stack usage is increased. > > In the same way as x86 and arm64 architectures, increase > THREAD_SHIFT when CONFIG_KASAN is selected. Applied to powerpc/next. [1/1] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT https://git.kernel.org/powerpc/c/edbadaf0671072298e506074128b64e003c5812c cheers
Re: [PATCH 1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
On Mon, 20 Apr 2020 18:36:34 + (UTC), Christophe Leroy wrote: > _ALIGN_UP() is specific to powerpc > ALIGN() is generic and does the same > > Replace _ALIGN_UP() by ALIGN() Applied to powerpc/next. [1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN() https://git.kernel.org/powerpc/c/7bfc3c84cbf5167d943cff9b3d2619dab0b7894c [2/5] powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN() https://git.kernel.org/powerpc/c/e96d904ede6756641563d27daa746875b478a6c8 [3/5] powerpc: Replace _ALIGN_UP() by ALIGN() https://git.kernel.org/powerpc/c/b711531641038f3ff3723914f3d5ba79848d347e [4/5] powerpc: Replace _ALIGN() by ALIGN() https://git.kernel.org/powerpc/c/d3f3d3bf76cfb04e73436a15e3987d3573e7523a [5/5] powerpc: Remove _ALIGN_UP(), _ALIGN_DOWN() and _ALIGN() https://git.kernel.org/powerpc/c/4cdb2da654033d76e1b1cb4ac427d9193dce816b cheers
[PATCH] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
This reverts commit 697ece78f8f749aeea40f2711389901f0974017a. The implementation of SWAP on powerpc requires page protection bits to not be one of the least significant PTE bits. Until the SWAP implementation is changed and this requirement voids, we have to keep at least _PAGE_RW outside of the 3 last bits. For now, revert to previous PTE bits order. A further rework may come later. Reported-by: Rui Salvaterra Fixes: 697ece78f8f7 ("powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.") Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/hash.h | 8 arch/powerpc/kernel/head_32.S | 9 ++--- arch/powerpc/mm/book3s32/hash_low.S | 14 -- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h index 34a7215ae81e..2a0a467d2985 100644 --- a/arch/powerpc/include/asm/book3s/32/hash.h +++ b/arch/powerpc/include/asm/book3s/32/hash.h @@ -17,9 +17,9 @@ * updating the accessed and modified bits in the page table tree. */ -#define _PAGE_USER 0x001 /* usermode access allowed */ -#define _PAGE_RW 0x002 /* software: user write access allowed */ -#define _PAGE_PRESENT 0x004 /* software: pte contains a translation */ +#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ +#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ +#define _PAGE_USER 0x004 /* usermode access allowed */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -27,7 +27,7 @@ #define _PAGE_DIRTY0x080 /* C: page changed */ #define _PAGE_ACCESSED 0x100 /* R: page referenced */ #define _PAGE_EXEC 0x200 /* software: exec allowed */ -#define _PAGE_HASHPTE 0x400 /* hash_page has made an HPTE for this pte */ +#define _PAGE_RW 0x400 /* software: user write access allowed */ #define _PAGE_SPECIAL 0x800 /* software: Special page */ #ifdef CONFIG_PTE_64BIT diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index daaa153950c2..97c887950c3c 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -348,7 +348,7 @@ BEGIN_MMU_FTR_SECTION andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h #endif bne handle_page_fault_tramp_2 /* if not, try to put a PTE */ - rlwinm r3, r5, 32 - 24, 30, 30 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */ bl hash_page b handle_page_fault_tramp_1 FTR_SECTION_ELSE @@ -497,6 +497,7 @@ InstructionTLBMiss: andc. r1,r1,r0/* check access & ~permission */ bne-InstructionAddressInvalid /* return if access not permitted */ /* Convert linux-style PTE to low word of PPC-style PTE */ + rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ ori r1, r1, 0xe06 /* clear out reserved bits */ andcr1, r0, r1 /* PP = user? 1 : 0 */ BEGIN_FTR_SECTION @@ -564,8 +565,9 @@ DataLoadTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwinm r1,r0,0,30,30 /* _PAGE_RW -> PP msb */ - rlwimi r0,r0,1,30,30 /* _PAGE_USER -> PP msb */ + rlwinm r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */ + rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ + rlwimi r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */ ori r1,r1,0xe04 /* clear out reserved bits */ andcr1,r0,r1/* PP = user? rw? 
1: 3: 0 */ BEGIN_FTR_SECTION @@ -643,6 +645,7 @@ DataStoreTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ + rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ li r1,0xe06/* clear out reserved bits & PP msb */ andcr1,r0,r1/* PP = user? 1: 0 */ BEGIN_FTR_SECTION diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S index 6d236080cb1a..877d880890fe 100644 --- a/arch/powerpc/mm/book3s32/hash_low.S +++ b/arch/powerpc/mm/book3s32/hash_low.S @@ -35,7 +35,7 @@ mmu_hash_lock: /* * Load a PTE into the hash table, if possible. * The address is in r4, and r3 contains an access flag: - * _PAGE_RW (0x002) if a write. + * _PAGE_RW (0x400) if a write. * r9 contains the SRR1 value, from which we use the MSR_PR bit. * SPRG_THREAD contains the physical address of the current task's thread. * @@ -69,7 +69,7 @@ _GLOBAL(hash_page) blt+112f
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Thanks for reviewing this patch Aneesh.

"Aneesh Kumar K.V" writes:
> Vaibhav Jain writes:
>> +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
>> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
>> +	__u16 payload_offset;	/* In: offset from start of struct */
>> +	__u16 payload_version;	/* In/Out: version of the payload */
>> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
>> +} __packed;
>
> that payload_offset can be avoided if we prevent userspace from using a
> different variant of nd_pdsm_cmd_pkg with a different header. We can keep
> things simpler if we can always find payload at nd_pdsm_cmd_pkg->payload.

I had introduced this member to handle the case where new fields are added to 'struct nd_pdsm_cmd_pkg' without having to break the ABI. But I agree with the point you made later that this can be simplified by replacing 'payload_offset' with a set of reserved variables. Will address this in the next iteration of this patchset.

>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> +	PAPR_SCM_PDSM_MIN = 0x0,
>> +	PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> +	return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> +		return NULL;
>> +	else
>> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>
> we need to make sure userspace is not passing a wrong payload_offset.

Agreed, this function should have stricter checking for the payload_offset field. However I will be getting rid of 'payload_offset' altogether in the next iteration, as you previously suggested.

> and in the next patch you do
>
> +	/* Copy the health struct to the payload */
> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> +	pkg->hdr.nd_fw_size = copysize;
> +

Yes, this is already being done in the patchset, and changes proposed to pdsm_cmd_to_payload() should not impact other patches, as pdsm_cmd_to_payload() abstracts the rest of the code from how the payload is accessed.

> All this can be simplified if you can keep payload at
> nd_pdsm_cmd_pkg->payload.
>
> If you still want to have the ability to extend the header, then add a
> reserved field similar to nd_cmd_pkg.

Agree to this and will address this in v8.

> -aneesh

--
Cheers
~ Vaibhav
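A hedged sketch of the direction agreed on above (field names illustrative, not the final uapi): with payload_offset gone, payload[] sits at a fixed offset and a reserved field keeps room for compatible header growth, so the payload accessor needs no offset validation at all.

struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* package header containing sub-cmd */
	__s32 cmd_status;	/* out: sub-cmd status returned back */
	__u16 payload_version;	/* in/out: version of the payload */
	__u16 reserved[3];	/* room for future header fields */
	__u8 payload[];		/* always at a fixed, known offset */
} __packed;

static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
{
	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
		return NULL;
	return pcmd->payload;
}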
[PATCH V3 2/2] tools/perf: Add perf tools support for extended register capability in powerpc
From: Anju T Sudhakar Add extended regs to sample_reg_mask in the tool side to use with `-I?` option. Perf tools side uses extended mask to display the platform supported register names (with -I? option) to the user and also send this mask to the kernel to capture the extended registers in each sample. Hence decide the mask value based on the processor version. Signed-off-by: Anju T Sudhakar [Decide extended mask at run time based on platform] Signed-off-by: Athira Rajeev --- tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++- tools/perf/arch/powerpc/include/perf_regs.h | 5 ++- tools/perf/arch/powerpc/util/perf_regs.c| 55 + 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h index f599064..485b1d5 100644 --- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h +++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs { PERF_REG_POWERPC_DSISR, PERF_REG_POWERPC_SIER, PERF_REG_POWERPC_MMCRA, - PERF_REG_POWERPC_MAX, + /* Extended registers */ + PERF_REG_POWERPC_MMCR0, + PERF_REG_POWERPC_MMCR1, + PERF_REG_POWERPC_MMCR2, + /* Max regs without the extended regs */ + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1, }; + +#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1) + +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */ +#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \ + - PERF_REG_PMU_MASK) + #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */ diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h index e18a355..46ed00d 100644 --- a/tools/perf/arch/powerpc/include/perf_regs.h +++ b/tools/perf/arch/powerpc/include/perf_regs.h @@ -64,7 +64,10 @@ [PERF_REG_POWERPC_DAR] = "dar", [PERF_REG_POWERPC_DSISR] = "dsisr", [PERF_REG_POWERPC_SIER] = "sier", - [PERF_REG_POWERPC_MMCRA] = "mmcra" + [PERF_REG_POWERPC_MMCRA] = "mmcra", + [PERF_REG_POWERPC_MMCR0] = "mmcr0", + [PERF_REG_POWERPC_MMCR1] = "mmcr1", + [PERF_REG_POWERPC_MMCR2] = "mmcr2", }; static inline const char *perf_reg_name(int id) diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c index 0a52429..9179230 100644 --- a/tools/perf/arch/powerpc/util/perf_regs.c +++ b/tools/perf/arch/powerpc/util/perf_regs.c @@ -6,9 +6,14 @@ #include "../../../util/perf_regs.h" #include "../../../util/debug.h" +#include "../../../util/event.h" +#include "../../../util/header.h" +#include "../../../perf-sys.h" #include +#define PVR_POWER9 0x004E + const struct sample_reg sample_reg_masks[] = { SMPL_REG(r0, PERF_REG_POWERPC_R0), SMPL_REG(r1, PERF_REG_POWERPC_R1), @@ -55,6 +60,9 @@ SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR), SMPL_REG(sier, PERF_REG_POWERPC_SIER), SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA), + SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0), + SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1), + SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2), SMPL_REG_END }; @@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op) return SDT_ARG_VALID; } + +uint64_t arch__intr_reg_mask(void) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_CPU_CYCLES, + .sample_type= PERF_SAMPLE_REGS_INTR, + .precise_ip = 1, + .disabled = 1, + .exclude_kernel = 1, + }; + int fd, ret; + char buffer[64]; + u32 version; + u64 extended_mask = 0; + + /* Get the PVR value to set the extended +* mask specific to platform +*/ + get_cpuid(buffer, sizeof(buffer)); + ret = 
sscanf(buffer, "%u,", &version); + + if (ret != 1) { + pr_debug("Failed to get the processor version, unable to output extended registers\n"); + return PERF_REGS_MASK; + } + + if (version == PVR_POWER9) + extended_mask = PERF_REG_PMU_MASK_300; + else + return PERF_REGS_MASK; + + attr.sample_regs_intr = extended_mask; + attr.sample_period = 1; + event_attr_init(&attr); + + /* +* check if the pmu supports perf extended regs, before +* returning the register mask to sample. +*/ + fd = sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd != -1) { + close(fd); + return (extended_mask | PERF_REGS_MASK); + } + return PERF_REGS_MASK; +} -- 1.8.3.1
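The tail of arch__intr_reg_mask() is worth a second look: instead of assuming kernel support, it opens a throwaway cycles event with the extended sample_regs_intr mask and only advertises the extra registers if the open succeeds. A standalone sketch of that probe pattern (userspace C; constants mirror the patch, error handling simplified):

/* Sketch: probe whether the running kernel accepts an extended
 * sample_regs_intr mask by opening (and immediately closing) a dummy
 * cycles event, as arch__intr_reg_mask() does above. */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Returns 1 if the kernel/PMU accepts @extended_mask, 0 otherwise. */
static int pmu_supports_extended_regs(unsigned long long extended_mask)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr = extended_mask;
	attr.sample_period = 1;
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0);	/* self, any CPU */
	if (fd == -1)
		return 0;	/* kernel or PMU rejected the mask */
	close(fd);
	return 1;
}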
[PATCH V3 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Anju T Sudhakar Add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Patch adds extended regs support for power9 platform by exposing MMCR0, MMCR1 and MMCR2 registers. REG_RESERVED mask needs update to include extended regs. `PERF_REG_EXTENDED_MASK`, contains mask value of the supported registers, is defined at runtime in the kernel based on platform since the supported registers may differ from one processor version to another and hence the MASK value. with patch -- available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0 ... intr regs: mask 0x ABI 64-bit r00xc012b77c r10xc03fe5e03930 r20xc1b0e000 r30xc03fdcddf800 r40xc03fc788 r50x9c422724be r60xc03fe5e03908 r70xff63bddc8706 r80x9e4 r90x0 r10 0x1 r11 0x0 r12 0xc01299c0 r13 0xc03c4800 r14 0x0 r15 0x7fffdd8b8b00 r16 0x0 r17 0x7fffdd8be6b8 r18 0x7e7076607730 r19 0x2f r20 0xc0001fc26c68 r21 0xc0002041e4227e00 r22 0xc0002018fb60 r23 0x1 r24 0xc03ffec4d900 r25 0x8000 r26 0x0 r27 0x1 r28 0x1 r29 0xc1be1260 r30 0x6008010 r31 0xc03ffebb7218 nip 0xc012b910 msr 0x90009033 orig_r3 0xc012b86c ctr 0xc01299c0 link 0xc012b77c xer 0x0 ccr 0x2800 softe 0x1 trap 0xf00 dar 0x0 dsisr 0x800 sier 0x0 mmcra 0x800 mmcr0 0x82008090 mmcr1 0x1e00 mmcr2 0x0 ... thread: perf:4784 Signed-off-by: Anju T Sudhakar [Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms ] Signed-off-by: Athira Rajeev --- arch/powerpc/include/asm/perf_event_server.h | 8 +++ arch/powerpc/include/uapi/asm/perf_regs.h| 14 +++- arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/perf_regs.c| 34 +--- arch/powerpc/perf/power9-pmu.c | 6 + 5 files changed, 59 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3e9703f..1458e1a 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -15,6 +15,9 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +extern u64 mask_var; +#define PERF_REG_EXTENDED_MASK mask_var + struct perf_event; /* @@ -55,6 +58,11 @@ struct power_pmu { int *blacklist_ev; /* BHRB entries in the PMU */ int bhrb_nr; + /* +* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if +* the pmu supports extended perf regs capability +*/ + int capabilities; }; /* diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h index f599064..485b1d5 100644 --- a/arch/powerpc/include/uapi/asm/perf_regs.h +++ b/arch/powerpc/include/uapi/asm/perf_regs.h @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs { PERF_REG_POWERPC_DSISR, PERF_REG_POWERPC_SIER, PERF_REG_POWERPC_MMCRA, - PERF_REG_POWERPC_MAX, + /* Extended registers */ + PERF_REG_POWERPC_MMCR0, + PERF_REG_POWERPC_MMCR1, + PERF_REG_POWERPC_MMCR2, + /* Max regs without the extended regs */ + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1, }; + +#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1) + +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */ +#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \ + - PERF_REG_PMU_MASK) + #endif /* 
_UAPI_ASM_POWERPC_PERF_REGS_H */ diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 3dcfecf..f56b778 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu) power_pmu.attr_groups = ppmu->attr_groups; + power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS); #ifdef MSR_HV /* * Use FCHV to ignore kernel events if MSR.HV is set. diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index a213a0a..f1dbbc5 10064
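The hunks for perf_regs.c and power9-pmu.c are cut off above, but the opt-in flow they implement can be sketched from the pieces that are visible: a PMU driver sets the runtime mask backing PERF_REG_EXTENDED_MASK and advertises the capability bit that register_power_pmu() copies into power_pmu (see the core-book3s.c hunk). The function below is illustrative, not the patch's code:

/* Sketch only: how a PMU driver opts in under this scheme. */
u64 mask_var;	/* backs PERF_REG_EXTENDED_MASK (see perf_event_server.h) */

static int __init power9_pmu_opt_in_sketch(struct power_pmu *pmu)
{
	/* POWER9 (ISA v3.0) exposes MMCR0/MMCR1/MMCR2 as extended regs */
	mask_var = PERF_REG_PMU_MASK_300;
	pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
	return register_power_pmu(pmu);
}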
[PATCH V3 0/2] powerpc/perf: Add support for perf extended regs in powerpc
Patch set to add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate a PMU which supports extended registers. The generic code defines the mask of extended registers as 0 for non-supported architectures.

Patch 1/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the values of mmcr0, mmcr1, mmcr2 for POWER9, and defines `PERF_REG_EXTENDED_MASK` at runtime, which contains the mask value of the supported registers under extended regs.

Patch 2/2 adds extended regs to sample_reg_mask on the tool side to use with the `-I?` option.

Anju T Sudhakar (2):
  powerpc/perf: Add support for outputting extended regs in perf intr_regs
  tools/perf: Add perf tools support for extended register capability in powerpc

---
Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasan

Changes from v1 -> v2
- PERF_REG_EXTENDED_MASK is defined at runtime in the kernel based on
  platform. This gives flexibility in using extended regs for all
  processor versions where the supported registers may differ.
- Removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
  processor version (from the PVR value), the tool side will return the
  appropriate extended mask.
- Since the tool changes can be handled without a "PERF_REG_EXTENDED_MASK"
  macro, dropped the patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1
---
 arch/powerpc/include/asm/perf_event_server.h    |  8 ++++
 arch/powerpc/include/uapi/asm/perf_regs.h       | 14 ++++++-
 arch/powerpc/perf/core-book3s.c                 |  1 +
 arch/powerpc/perf/perf_regs.c                   | 34 ++++++++++---
 arch/powerpc/perf/power9-pmu.c                  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
 tools/perf/arch/powerpc/include/perf_regs.h     |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c        | 55 ++++++++++++++++++++
 8 files changed, 131 insertions(+), 6 deletions(-)
-- 
1.8.3.1
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang: > Hi > > On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote: > > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang: > > > There are two requirements that we need to move the request > > > of dma channel from probe to open. > > > > How do you handle -EPROBE_DEFER return code from the channel request if > > you don't do it in probe? > > I use the dma_request_slave_channel or dma_request_channel instead > of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER > return code. This is a pretty weak argument. The dmaengine device might probe after you try to get the channel. Using a function to request the channel that doesn't allow you to handle probe deferral is IMHO a bug and should be fixed, instead of building even more assumptions on top of it. > > > - When dma device binds with power-domains, the power will > > > be enabled when we request dma channel. If the request of dma > > > channel happen on probe, then the power-domains will be always > > > enabled after kernel boot up, which is not good for power > > > saving, so we need to move the request of dma channel to .open(); > > > > This is certainly something which could be fixed in the dmaengine > > driver. > > Dma driver always call the pm_runtime_get_sync in > device_alloc_chan_resources, the device_alloc_chan_resources is > called when channel is requested. so power is enabled on channel > request. So why can't you fix the dmaengine driver to do that RPM call at a later time when the channel is actually going to be used? This will allow further power savings with other slave devices than the audio PCM. Regards, Lucas > > > - With FE-BE case, if the dma channel is requested in probe, > > > then there will be below issue, which is caused by that the > > > dma channel will be requested duplicately > > > > Why is this requested a second time? Is this just some missing cleanup > > on a deferred probe path? > > Not relate with deferred probe. With DMA1->ASRC->DMA2->ESAI case, > the DMA1->ASRC->DMA2 is in FE, ESAI is in BE. When ESAI drvier > probe, DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel > is not used in this FE-BE case).When FE-BE startup, DMA2 > channel is created, it needs the ESAI's "dma:tx", so below warning > comes out. 
> > > Regards, > > Lucas > > > > > [ 638.906268] sysfs: cannot create duplicate filename > > > '/devices/soc0/soc/200.bus/200.spba-bus/2024000.esai/dma:tx' > > > [ 638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted > > > 5.7.0-rc1-12956-gfc64b2585593 #287 > > > [ 638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > > [ 638.933690] [] (unwind_backtrace) from [] > > > (show_stack+0x10/0x14) > > > [ 638.941464] [] (show_stack) from [] > > > (dump_stack+0xe4/0x118) > > > [ 638.948808] [] (dump_stack) from [] > > > (sysfs_warn_dup+0x50/0x64) > > > [ 638.956406] [] (sysfs_warn_dup) from [] > > > (sysfs_do_create_link_sd+0xc8/0xd4) > > > [ 638.965134] [] (sysfs_do_create_link_sd) from [] > > > (dma_request_chan+0xb0/0x210) > > > [ 638.974120] [] (dma_request_chan) from [] > > > (dma_request_slave_channel+0x8/0x14) > > > [ 638.983111] [] (dma_request_slave_channel) from [] > > > (fsl_asrc_dma_hw_params+0x1e0/0x438) > > > [ 638.992881] [] (fsl_asrc_dma_hw_params) from [] > > > (soc_pcm_hw_params+0x4a0/0x6a8) > > > [ 639.001952] [] (soc_pcm_hw_params) from [] > > > (dpcm_fe_dai_hw_params+0x70/0xe4) > > > [ 639.010765] [] (dpcm_fe_dai_hw_params) from [] > > > (snd_pcm_hw_params+0x158/0x418) > > > [ 639.019750] [] (snd_pcm_hw_params) from [] > > > (snd_pcm_ioctl+0x734/0x183c) > > > [ 639.028129] [] (snd_pcm_ioctl) from [] > > > (ksys_ioctl+0x2ac/0xb98) > > > [ 639.035812] [] (ksys_ioctl) from [] > > > (ret_fast_syscall+0x0/0x28) > > > [ 639.043490] Exception stack(0xec529fa8 to 0xec529ff0) > > > [ 639.048565] 9fa0: bee84650 01321870 0004 > > > c25c4111 bee84650 0002000f > > > [ 639.056766] 9fc0: bee84650 01321870 01321820 0036 1f40 > > > 0002c2f8 0003 > > > [ 639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8 > > > [ 639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink > > > > > > Signed-off-by: Shengjiu Wang > > > --- > > > sound/soc/fsl/imx-pcm-dma.c | 173 +--- > > > 1 file changed, 159 insertions(+), 14 deletions(-) > > > > > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c > > > index 04a9bc749016..dae53b384df4 100644 > > > --- a/sound/soc/fsl/imx-pcm-dma.c > > > +++ b/sound/soc/fsl/imx-pcm-dma.c > > > @@ -11,6 +11,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void > > > *param) > > > return true; > > > } > > > > > > -static const struct snd_dmaengine_pcm_config imx_dmaengine
Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
On 15. 05. 20, 1:22, rana...@codeaurora.org wrote:
> On 2020-05-13 00:04, Greg KH wrote:
>> On Tue, May 12, 2020 at 02:39:50PM -0700, rana...@codeaurora.org wrote:
>>> On 2020-05-12 01:25, Greg KH wrote:
>>> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
>>> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
>>> > > Author: Jiri Slaby
>>> > > Date:   Tue Aug 7 21:48:04 2012 +0200
>>> > >
>>> > >     TTY: hvc_console, add tty install
>>> > >
>>> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
>>> > > hvc_open's fail path to hvc_cleanup.
>>> > >
>>> > > IOW hvc_open now NULLs tty->driver_data even for another task which
>>> > > opened the tty earlier. The same holds for "tty_port_tty_set(&hp->port,
>>> > > NULL);" there. And actually "tty_port_put(&hp->port);" is also
>>> > > incorrect for the 2nd task opening the tty.
>>> > > ...
> These are the traces you get when the issue happens:
> [  154.212291] hvc_install called for pid: 666
> [  154.216705] hvc_open called for pid: 666
> [  154.233657] hvc_open: request_irq failed with rc -22.
> [  154.238934] hvc_open called for pid: 678
> [  154.243012] Unable to handle kernel NULL pointer dereference at
> virtual address 00c4
> # hvc_install isn't called for pid: 678 as the file wasn't closed yet.

Nice.

Does the attached help?

I wonder how come the tty_port_put in hvc_open does not cause a UAF? I
would say hvc_open fails, tty_port_put is called. It decrements the
reference taken in hvc_install. So far so good.

Now, this should happen IMO:

tty_open
  -> hvc_open (fails)
     -> tty_port_put
  -> tty_release
     -> tty_release_struct
        -> tty_kref_put
           -> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
  -> hvc_cleanup
     -> tty_port_put (should die terrible death now)

What am I missing?

thanks,
-- 
js suse labs

>From d891cdfcbd3b41eb23ddfc8d9e6cbe038ff8fb72 Mon Sep 17 00:00:00 2001
From: Jiri Slaby
Date: Wed, 20 May 2020 11:29:25 +0200
Subject: [PATCH] hvc_console: fix open

Signed-off-by: Jiri Slaby
---
 drivers/tty/hvc/hvc_console.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 436cc51c92c3..cdcc64ea2554 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -371,15 +371,14 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 * tty fields and return the kref reference.
 	 */
 	if (rc) {
-		tty_port_tty_set(&hp->port, NULL);
-		tty->driver_data = NULL;
-		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
-	} else
+	} else {
 		/* We are ready... raise DTR/RTS */
 		if (C_BAUD(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 1);
+		tty_port_set_initialized(&hp->port, true);
+	}
 
 	/* Force wakeup of the polling thread */
 	hvc_kick();
@@ -389,22 +388,12 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 
 static void hvc_close(struct tty_struct *tty, struct file * filp)
 {
-	struct hvc_struct *hp;
+	struct hvc_struct *hp = tty->driver_data;
 	unsigned long flags;
 
 	if (tty_hung_up_p(filp))
 		return;
 
-	/*
-	 * No driver_data means that this close was issued after a failed
-	 * hvc_open by the tty layer's release_dev() function and we can just
-	 * exit cleanly because the kref reference wasn't made.
-	 */
-	if (!tty->driver_data)
-		return;
-
-	hp = tty->driver_data;
-
 	spin_lock_irqsave(&hp->port.lock, flags);
 
 	if (--hp->port.count == 0) {
@@ -412,6 +401,9 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		/* We are done with the tty pointer now. */
 		tty_port_tty_set(&hp->port, NULL);
 
+		if (!tty_port_initialized(&hp->port))
+			return;
+
 		if (C_HUPCL(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 0);
@@ -428,6 +420,7 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		 * waking periodically to check chars_in_buffer().
 		 */
 		tty_wait_until_sent(tty, HVC_CLOSE_WAIT);
+		tty_port_set_initialized(&hp->port, false);
 	} else {
 		if (hp->port.count < 0)
 			printk(KERN_ERR "hvc_close %X: oops, count is %d\n",
-- 
2.26.2
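The flow in question, drawn as a reference-count timeline (a sketch of the sequence described above, assuming hvc_install() takes the port reference via hvc_get_by_index(), as the 2012 commit describes):

/*
 * Sketch of the kref pairing under discussion, not driver code:
 *
 *   hvc_install()              tty_port_get()  ref: N   -> N+1
 *   hvc_open() fails           tty_port_put()  ref: N+1 -> N    (1)
 *   tty_release()
 *     ... release_one_tty()
 *       -> hvc_cleanup()       tty_port_put()  ref: N   -> N-1  (2)
 *
 * (1) already drops the reference taken in hvc_install(), so (2)
 * drops one reference too many -- the suspected over-put behind the
 * "should die terrible death" remark above.
 */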
[v2 1/2] dts: ppc: t4240rdb: remove interrupts property
From: Biwen Li

This removes the interrupts property to drop the following warning:
  - $ hwclock.util-linux
    hwclock.util-linux: select() to /dev/rtc0 to wait for clock tick timed out

My case:
  - The RTC ds1374's INT pin is connected to VCC on T4240RDB, so the RTC
    cannot inform the CPU about the alarm interrupt

Signed-off-by: Biwen Li
---
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index a56a705d41f7..145896f2eef6 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -144,7 +144,6 @@
 		rtc@68 {
 			compatible = "dallas,ds1374";
 			reg = <0x68>;
-			interrupts = <0x1 0x1 0 0>;
 		};
 	};
-- 
2.17.1
[v2 2/2] dts: ppc: t1024rdb: remove interrupts property
From: Biwen Li

This removes the interrupts property to drop the following warning:
  - $ hwclock.util-linux
    hwclock.util-linux: select() to /dev/rtc0 to wait for clock tick timed out

My case:
  - The RTC ds1339's INT pin isn't connected to the CPU's INT pin on
    T1024RDB, so the RTC cannot inform the CPU about the alarm interrupt

How to fix it?
  - Remove the IRQ line

Signed-off-by: Biwen Li
---
 arch/powerpc/boot/dts/fsl/t1024rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1024rdb.dts b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
index 645caff98ed1..605ceec66af3 100644
--- a/arch/powerpc/boot/dts/fsl/t1024rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
@@ -161,7 +161,6 @@
 		rtc@68 {
 			compatible = "dallas,ds1339";
 			reg = <0x68>;
-			interrupts = <0x1 0x1 0 0>;
 		};
 	};
-- 
2.17.1
[PATCH -next] scsi: ibmvscsi: Make some functions static
Fix the following warnings:

drivers/scsi/ibmvscsi/ibmvscsi.c:2387:12: warning: symbol
'ibmvscsi_module_init' was not declared. Should it be static?
drivers/scsi/ibmvscsi/ibmvscsi.c:2409:13: warning: symbol
'ibmvscsi_module_exit' was not declared. Should it be static?

Signed-off-by: Chen Tao
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 59f0f1030c54..44e64aa21194 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2384,7 +2384,7 @@ static struct vio_driver ibmvscsi_driver = {
 static struct srp_function_template ibmvscsi_transport_functions = {
 };
 
-int __init ibmvscsi_module_init(void)
+static int __init ibmvscsi_module_init(void)
 {
 	int ret;
 
@@ -2406,7 +2406,7 @@ int __init ibmvscsi_module_init(void)
 	return ret;
 }
 
-void __exit ibmvscsi_module_exit(void)
+static void __exit ibmvscsi_module_exit(void)
 {
 	vio_unregister_driver(&ibmvscsi_driver);
 	srp_release_transport(ibmvscsi_transport_template);
-- 
2.22.0
Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
> Potentially, hvc_open() can be called in parallel when two tasks call
> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
> callback in the function fails, where it sets the tty->driver_data to
> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
> Hence, do a NULL check at the beginning, before proceeding ahead.
>
> The issue can be easily reproduced by launching two tasks simultaneously
> that do an open() call on /dev/hvcX.
> For example:
> $ cat /dev/hvc0 & cat /dev/hvc0 &
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Raghavendra Rao Ananta
> ---
>  drivers/tty/hvc/hvc_console.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 436cc51c92c3..80709f754cc8 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>  	unsigned long flags;
>  	int rc = 0;
>
> +	if (!hp)
> +		return -ENODEV;
> +

This is still not fixing the bug properly. See:
https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f3...@suse.cz/
In particular, the paragraph starting "IOW".

thanks,
-- 
js suse labs
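For readers reconstructing the failure, the reported sequence (per the trace earlier in this thread) can be sketched as an interleaving; this is an illustration, not driver code:

/*
 * Sketch of the reported sequence, not driver code:
 *
 *   Task A (e.g. pid 666)               Task B (e.g. pid 678)
 *   ---------------------               ---------------------
 *   tty_open()
 *     hvc_install()  driver_data = hp
 *     hvc_open()
 *       request_irq() fails
 *       driver_data = NULL
 *   (tty not yet released)
 *                                       tty_open()   same tty, so
 *                                                    no hvc_install()
 *                                         hvc_open()
 *                                           hp = tty->driver_data;  NULL
 *                                           hp->port...  NULL dereference
 *
 * The check added above turns the dereference into -ENODEV, but the
 * stale NULLing in the failure path (see the linked mail) remains.
 */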
[PATCH v4 6/6] asm-generic/tlb: avoid potential double flush
From: Peter Zijlstra commit 0758cd8304942292e95a0f750c374533db378b32 upstream Aneesh reported that: tlb_flush_mmu() tlb_flush_mmu_tlbonly() tlb_flush() <-- #1 tlb_flush_mmu_free() tlb_table_flush() tlb_table_invalidate() tlb_flush_mmu_tlbonly() tlb_flush() <-- #2 does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not clear tlb->end in that case. Observe that any caller to __tlb_adjust_range() also sets at least one of the tlb->freed_tables || tlb->cleared_p* bits, and those are unconditionally cleared by __tlb_reset_range(). Change the condition for actually issuing TLBI to having one of those bits set, as opposed to having tlb->end != 0. Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.ku...@linux.ibm.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Aneesh Kumar K.V Reported-by: "Aneesh Kumar K.V" Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported to 4.19 stable] --- include/asm-generic/tlb.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 19934cdd143e..427a70c56ddd 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -179,7 +179,12 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) { - if (!tlb->end) + /* +* Anything calling __tlb_adjust_range() also sets at least one of +* these bits. +*/ + if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds || + tlb->cleared_puds || tlb->cleared_p4ds)) return; tlb_flush(tlb); -- 2.25.4
[PATCH v4 5/6] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
From: Peter Zijlstra commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream Architectures for which we have hardware walkers of Linux page table should flush TLB on mmu gather batch allocation failures and batch flush. Some architectures like POWER supports multiple translation modes (hash and radix) and in the case of POWER only radix translation mode needs the above TLBI. This is because for hash translation mode kernel wants to avoid this extra flush since there are no hardware walkers of linux page table. With radix translation, the hardware also walks linux page table and with that, kernel needs to make sure to TLB invalidate page walk cache before page table pages are freed. More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE") The changes to sparc are to make sure we keep the old behavior since we are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for tlb_needs_table_invalidate is to always force an invalidate and sparc can avoid the table invalidate. Hence we define tlb_needs_table_invalidate to false for sparc architecture. Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.ku...@linux.ibm.com Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes") Signed-off-by: Peter Zijlstra (Intel) Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported to 4.19 stable] --- arch/Kconfig| 3 --- arch/powerpc/Kconfig| 1 - arch/powerpc/include/asm/tlb.h | 11 +++ arch/sparc/Kconfig | 1 - arch/sparc/include/asm/tlb_64.h | 9 + include/asm-generic/tlb.h | 15 +++ mm/memory.c | 16 7 files changed, 43 insertions(+), 13 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 061a12b8140e..3abbdb0cea44 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_RCU_TABLE_FREE bool -config HAVE_RCU_TABLE_NO_INVALIDATE - bool - config ARCH_HAVE_NMI_SAFE_CMPXCHG bool diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1a00ce4b0040..e5bc0cfea2b1 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -216,7 +216,6 @@ config PPC select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE - select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN select HAVE_SYSCALL_TRACEPOINTS diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h index f0e571b2dc7c..63418275f402 100644 --- a/arch/powerpc/include/asm/tlb.h +++ b/arch/powerpc/include/asm/tlb.h @@ -30,6 +30,17 @@ #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change extern void tlb_flush(struct mmu_gather *tlb); +/* + * book3s: + * Hash does not use the linux page-tables, so we can avoid + * the TLB invalidate for page-table freeing, Radix otoh does use the + * page-tables and needs the TLBI. + * + * nohash: + * We still do TLB invalidate in the __pte_free_tlb routine before we + * add the page table pages to mmu gather table batch. + */ +#define tlb_needs_table_invalidate() radix_enabled() /* Get the generic bits... 
*/ #include diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index d90d632868aa..e6f2a38d2e61 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -64,7 +64,6 @@ config SPARC64 select HAVE_KRETPROBES select HAVE_KPROBES select HAVE_RCU_TABLE_FREE if SMP - select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_MEMBLOCK_NODE_MAP select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_DYNAMIC_FTRACE diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h index a2f3fa61ee36..8cb8f3833239 100644 --- a/arch/sparc/include/asm/tlb_64.h +++ b/arch/sparc/include/asm/tlb_64.h @@ -28,6 +28,15 @@ void flush_tlb_pending(void); #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0) #define tlb_flush(tlb) flush_tlb_pending() +/* + * SPARC64's hardware TLB fill does not use the Linux page-tables + * and therefore we don't need a TLBI when freeing page-table pages. + */ + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define tlb_needs_table_invalidate() (false) +#endif + #include #endif /* _SPARC64_TLB_H */ diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index f2b9dc9cbaf8..19934cdd143e 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -61,8 +61,23 @@ struct mmu_table_batch { extern void tlb_table_flush(struct mmu_gather *tlb); extern void tlb_remove_table(struct mmu_gather *tlb, void *table); +/* + * This allows an architecture that does not use the linux page-tables for + * hardware to skip the TLBI when freeing page tables. + */ +#ifndef tlb_needs_table_i
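The last hunk above is cut off, but its shape is small; modelled on the upstream commit (a paraphrase, not the verbatim hunk), the generic side reduces to roughly this sketch:

/* asm-generic/tlb.h: default is to force the invalidate ... */
#ifndef tlb_needs_table_invalidate
#define tlb_needs_table_invalidate() (true)
#endif

/* mm/memory.c: ... and the batch-free path consults the hook, so only
 * architectures whose hardware walks the Linux page tables pay for the
 * extra TLBI. */
static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
	if (tlb_needs_table_invalidate())
		tlb_flush_mmu_tlbonly(tlb);
}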
[PATCH v4 4/6] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
From: "Aneesh Kumar K.V" commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream Patch series "Fixup page directory freeing", v4. This is a repost of patch series from Peter with the arch specific changes except ppc64 dropped. ppc64 changes are added here because we are redoing the patch series on top of ppc64 changes. This makes it easy to backport these changes. Only the first 2 patches need to be backported to stable. The thing is, on anything SMP, freeing page directories should observe the exact same order as normal page freeing: 1) unhook page/directory 2) TLB invalidate 3) free page/directory Without this, any concurrent page-table walk could end up with a Use-after-Free. This is esp. trivial for anything that has software page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware caches partial page-walks (ie. caches page directories). Even on UP this might give issues since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. This patch series fixes ppc64 and add generic MMU_GATHER changes to support the conversion of other architectures. I haven't added patches w.r.t other architecture because they are yet to be acked. This patch (of 9): A followup patch is going to make sure we correctly invalidate page walk cache before we free page table pages. In order to keep things simple enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the !SMP case differently in the followup patch !SMP case is right now broken for radix translation w.r.t page walk cache flush. We can get interrupted in between page table free and that would imply we have page walk cache entries pointing to tables which got freed already. Michael said "both our platforms that run on Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a problem for anyone in practice, unless they've hacked their kernel to build it !SMP." 
Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.ku...@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported for 4.19 stable] --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 -- arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 arch/powerpc/mm/pgtable-book3s64.c | 7 --- 5 files changed, 1 insertion(+), 26 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e09cfb109b8c..1a00ce4b0040 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -215,7 +215,7 @@ config PPC select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP - select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_FREE select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h index 82e44b1a00ae..79ba3fbb512e 100644 --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h @@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned index_size) #define check_pgt_cache() do { } while (0) #define get_hugepd_cache_index(x) (x) -#ifdef CONFIG_SMP static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift) { @@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table) pgtable_free(table, shift); } -#else -static inline void pgtable_free_tlb(struct mmu_gather *tlb, - void *table, int shift) -{ - pgtable_free(table, shift); -} -#endif static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table, unsigned long address) diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h index f9019b579903..1013c0214213 100644 --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h @@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long); extern void pte_fragment_free(unsigned long *, int); extern void pmd_fragment_free(unsigned long *); extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift); -#ifdef CONFIG_SMP extern void __tlb_remove_table(void *_table); -#endif static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm) { diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h index 8825953c225b..96eed46d5684 100644 -
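The ordering requirement stated in the cover text is the heart of the series; spelled out as a sketch, any walker racing with steps 1-3 either still sees the old entry or faults, but never dereferences freed memory. Here flush_page_walk_caches() and recycle_page() are hypothetical placeholders for arch primitives:

/* The invariant, as code (hypothetical helper names): */
static void free_pmd_page_safely(pud_t *pud, struct page *pmd_page)
{
	pud_clear(pud);            /* 1) unhook the directory            */
	flush_page_walk_caches();  /* 2) invalidate TLB/page-walk caches */
	recycle_page(pmd_page);    /* 3) only now may the page be reused */
}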
[PATCH v4 3/6] asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
From: Peter Zijlstra commit 96bc9567cbe112e9320250f01b9c060c882e8619 upstream Make issuing a TLB invalidate for page-table pages the normal case. The reason is twofold: - too many invalidates is safer than too few, - most architectures use the linux page-tables natively and would thus require this. Make it an opt-out, instead of an opt-in. No change in behavior intended. Signed-off-by: Peter Zijlstra (Intel) Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for upcoming tlbflush backports] --- arch/Kconfig | 2 +- arch/powerpc/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 - mm/memory.c | 2 +- 5 files changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a336548487e6..061a12b8140e 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -363,7 +363,7 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_RCU_TABLE_FREE bool -config HAVE_RCU_TABLE_INVALIDATE +config HAVE_RCU_TABLE_NO_INVALIDATE bool config ARCH_HAVE_NMI_SAFE_CMPXCHG diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6f475dc5829b..e09cfb109b8c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -216,6 +216,7 @@ config PPC select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN select HAVE_SYSCALL_TRACEPOINTS diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index e6f2a38d2e61..d90d632868aa 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -64,6 +64,7 @@ config SPARC64 select HAVE_KRETPROBES select HAVE_KPROBES select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_MEMBLOCK_NODE_MAP select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_DYNAMIC_FTRACE diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index af35f5caadbe..181d0d522977 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -181,7 +181,6 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE if PARAVIRT - select HAVE_RCU_TABLE_INVALIDATEif HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR diff --git a/mm/memory.c b/mm/memory.c index 1832c5ed6ac0..ba5689610c04 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -327,7 +327,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_ */ static inline void tlb_table_invalidate(struct mmu_gather *tlb) { -#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE +#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE /* * Invalidate page-table caches used by hardware walkers. Then we still * need to RCU-sched wait while freeing the pages because software -- 2.25.4
[PATCH v4 2/6] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Will Deacon commit a6d60245d6d9b1caf66b0d94419988c4836980af upstream It is common for architectures with hugepage support to require only a single TLB invalidation operation per hugepage during unmap(), rather than iterating through the mapping at a PAGE_SIZE increment. Currently, however, the level in the page table where the unmap() operation occurs is not stored in the mmu_gather structure, therefore forcing architectures to issue additional TLB invalidation operations or to give up and over-invalidate by e.g. invalidating the entire TLB. Ideally, we could add an interval rbtree to the mmu_gather structure, which would allow us to associate the correct mapping granule with the various sub-mappings within the range being invalidated. However, this is costly in terms of book-keeping and memory management, so instead we approximate by keeping track of the page table levels that are cleared and provide a means to query the smallest granule required for invalidation. Signed-off-by: Will Deacon Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for upcoming tlbflush backports] --- include/asm-generic/tlb.h | 58 +-- mm/memory.c | 4 ++- 2 files changed, 53 insertions(+), 9 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 97306b32d8d2..f2b9dc9cbaf8 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -114,6 +114,14 @@ struct mmu_gather { */ unsigned intfreed_tables : 1; + /* +* at which levels have we cleared entries? +*/ + unsigned intcleared_ptes : 1; + unsigned intcleared_pmds : 1; + unsigned intcleared_puds : 1; + unsigned intcleared_p4ds : 1; + struct mmu_gather_batch *active; struct mmu_gather_batch local; struct page *__pages[MMU_GATHER_BUNDLE]; @@ -148,6 +156,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) tlb->end = 0; } tlb->freed_tables = 0; + tlb->cleared_ptes = 0; + tlb->cleared_pmds = 0; + tlb->cleared_puds = 0; + tlb->cleared_p4ds = 0; } static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) @@ -197,6 +209,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, } #endif +static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb) +{ + if (tlb->cleared_ptes) + return PAGE_SHIFT; + if (tlb->cleared_pmds) + return PMD_SHIFT; + if (tlb->cleared_puds) + return PUD_SHIFT; + if (tlb->cleared_p4ds) + return P4D_SHIFT; + + return PAGE_SHIFT; +} + +static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb) +{ + return 1UL << tlb_get_unmap_shift(tlb); +} + /* * In the case of tlb vma handling, we can optimise these away in the * case where we're doing a full MM flush. 
When we're doing a munmap, @@ -230,13 +261,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define tlb_remove_tlb_entry(tlb, ptep, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->cleared_ptes = 1; \ __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0) -#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)\ - do { \ - __tlb_adjust_range(tlb, address, huge_page_size(h)); \ - __tlb_remove_tlb_entry(tlb, ptep, address); \ +#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \ + do {\ + unsigned long _sz = huge_page_size(h); \ + __tlb_adjust_range(tlb, address, _sz); \ + if (_sz == PMD_SIZE)\ + tlb->cleared_pmds = 1; \ + else if (_sz == PUD_SIZE) \ + tlb->cleared_puds = 1; \ + __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0) /** @@ -250,6 +287,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address) \ do {\ __tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE); \ + tlb->cleared_pmds = 1; \ __tlb_remove_pmd_tlb_entry(tlb, pmdp, address); \ } while (0) @@ -264,6 +302,7 @@ static inline void tlb_remove_check_page_size_ch
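What the new bookkeeping buys is visible at flush time: an architecture's tlb_flush() can query the smallest granule that was actually unmapped and stride by it instead of by PAGE_SIZE. A sketch of such a consumer, where flush_tlb_mm_range_stride() is a hypothetical stand-in for an arch's ranged invalidate:

/* Sketch of a consumer of tlb_get_unmap_size(). */
static inline void tlb_flush(struct mmu_gather *tlb)
{
	unsigned long stride = tlb_get_unmap_size(tlb);

	/* e.g. one op per 2MB for a pure PMD-hugepage unmap, not 512 */
	flush_tlb_mm_range_stride(tlb->mm, tlb->start, tlb->end, stride);
}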
[PATCH v4 1/6] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
From: Peter Zijlstra commit 22a61c3c4f1379ef8b0ce0d5cb78baf3178950e2 upstream Some architectures require different TLB invalidation instructions depending on whether it is only the last-level of page table being changed, or whether there are also changes to the intermediate (directory) entries higher up the tree. Add a new bit to the flags bitfield in struct mmu_gather so that the architecture code can operate accordingly if it's the intermediate levels being invalidated. Signed-off-by: Peter Zijlstra Signed-off-by: Will Deacon Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for tlbflush backports] --- include/asm-generic/tlb.h | 31 +++ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index b3353e21f3b3..97306b32d8d2 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -97,12 +97,22 @@ struct mmu_gather { #endif unsigned long start; unsigned long end; - /* we are in the middle of an operation to clear -* a full mm and can make some optimizations */ - unsigned intfullmm : 1, - /* we have performed an operation which -* requires a complete flush of the tlb */ - need_flush_all : 1; + /* +* we are in the middle of an operation to clear +* a full mm and can make some optimizations +*/ + unsigned intfullmm : 1; + + /* +* we have performed an operation which +* requires a complete flush of the tlb +*/ + unsigned intneed_flush_all : 1; + + /* +* we have removed page directories +*/ + unsigned intfreed_tables : 1; struct mmu_gather_batch *active; struct mmu_gather_batch local; @@ -137,6 +147,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) tlb->start = TASK_SIZE; tlb->end = 0; } + tlb->freed_tables = 0; } static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) @@ -278,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define pte_free_tlb(tlb, ptep, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pte_free_tlb(tlb, ptep, address); \ } while (0) #endif @@ -285,7 +297,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #ifndef pmd_free_tlb #define pmd_free_tlb(tlb, pmdp, address) \ do {\ - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pmd_free_tlb(tlb, pmdp, address); \ } while (0) #endif @@ -295,6 +308,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define pud_free_tlb(tlb, pudp, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pud_free_tlb(tlb, pudp, address); \ } while (0) #endif @@ -304,7 +318,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #ifndef p4d_free_tlb #define p4d_free_tlb(tlb, pudp, address) \ do {\ - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __p4d_free_tlb(tlb, pudp, address); \ } while (0) #endif -- 2.25.4
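A consumer of the new bit would look roughly like the sketch below; invalidate_tlb_last_level() and invalidate_tlb_all_levels() are hypothetical names for the two classes of invalidation instruction the commit message describes:

/* Sketch only: how an arch flush hook could act on freed_tables. */
static inline void tlb_flush(struct mmu_gather *tlb)
{
	if (tlb->freed_tables)
		/* intermediate (directory) entries changed too */
		invalidate_tlb_all_levels(tlb->mm, tlb->start, tlb->end);
	else
		/* only last-level (leaf) entries were cleared */
		invalidate_tlb_last_level(tlb->mm, tlb->start, tlb->end);
}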
[PATCH v4 0/6] Memory corruption may occur due to incorrent tlb flush
The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC flushes) may result in random memory corruption. Any concurrent page-table walk could end up with a Use-after-Free. Even on UP this might give issues, since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. The series is a backport of the fix sent by Peter [1]. The first three patches are dependencies for the last patch (avoid potential double flush). If the performance impact due to double flush is considered trivial then the first three patches and last patch may be dropped. This is only for v4.19 stable. -- Changelog: v2: Send the patches with the correct format (commit sha1 upstream) for stable v3: Fix compilation for ppc44x_defconfig and mpc885_ads_defconfig v4: No change, Resend. -- Aneesh Kumar K.V (1): powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case Peter Zijlstra (4): asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush asm-generic/tlb: avoid potential double flush Will Deacon (1): asm-generic/tlb: Track which levels of the page tables have been cleared arch/Kconfig | 3 - arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 -- arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 - arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 -- arch/powerpc/include/asm/tlb.h | 11 ++ arch/powerpc/mm/pgtable-book3s64.c | 7 -- arch/sparc/include/asm/tlb_64.h | 9 ++ arch/x86/Kconfig | 1 - include/asm-generic/tlb.h| 103 --- mm/memory.c | 20 ++-- 11 files changed, 122 insertions(+), 52 deletions(-) -- 2.25.4
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Hi On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote: > > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang: > > There are two requirements that we need to move the request > > of dma channel from probe to open. > > How do you handle -EPROBE_DEFER return code from the channel request if > you don't do it in probe? I use the dma_request_slave_channel or dma_request_channel instead of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER return code. > > > - When dma device binds with power-domains, the power will > > be enabled when we request dma channel. If the request of dma > > channel happen on probe, then the power-domains will be always > > enabled after kernel boot up, which is not good for power > > saving, so we need to move the request of dma channel to .open(); > > This is certainly something which could be fixed in the dmaengine > driver. Dma driver always call the pm_runtime_get_sync in device_alloc_chan_resources, the device_alloc_chan_resources is called when channel is requested. so power is enabled on channel request. > > > - With FE-BE case, if the dma channel is requested in probe, > > then there will be below issue, which is caused by that the > > dma channel will be requested duplicately > > Why is this requested a second time? Is this just some missing cleanup > on a deferred probe path? Not relate with deferred probe. With DMA1->ASRC->DMA2->ESAI case, the DMA1->ASRC->DMA2 is in FE, ESAI is in BE. When ESAI drvier probe, DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel is not used in this FE-BE case).When FE-BE startup, DMA2 channel is created, it needs the ESAI's "dma:tx", so below warning comes out. > > Regards, > Lucas > > > [ 638.906268] sysfs: cannot create duplicate filename > > '/devices/soc0/soc/200.bus/200.spba-bus/2024000.esai/dma:tx' > > [ 638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted > > 5.7.0-rc1-12956-gfc64b2585593 #287 > > [ 638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > [ 638.933690] [] (unwind_backtrace) from [] > > (show_stack+0x10/0x14) > > [ 638.941464] [] (show_stack) from [] > > (dump_stack+0xe4/0x118) > > [ 638.948808] [] (dump_stack) from [] > > (sysfs_warn_dup+0x50/0x64) > > [ 638.956406] [] (sysfs_warn_dup) from [] > > (sysfs_do_create_link_sd+0xc8/0xd4) > > [ 638.965134] [] (sysfs_do_create_link_sd) from [] > > (dma_request_chan+0xb0/0x210) > > [ 638.974120] [] (dma_request_chan) from [] > > (dma_request_slave_channel+0x8/0x14) > > [ 638.983111] [] (dma_request_slave_channel) from [] > > (fsl_asrc_dma_hw_params+0x1e0/0x438) > > [ 638.992881] [] (fsl_asrc_dma_hw_params) from [] > > (soc_pcm_hw_params+0x4a0/0x6a8) > > [ 639.001952] [] (soc_pcm_hw_params) from [] > > (dpcm_fe_dai_hw_params+0x70/0xe4) > > [ 639.010765] [] (dpcm_fe_dai_hw_params) from [] > > (snd_pcm_hw_params+0x158/0x418) > > [ 639.019750] [] (snd_pcm_hw_params) from [] > > (snd_pcm_ioctl+0x734/0x183c) > > [ 639.028129] [] (snd_pcm_ioctl) from [] > > (ksys_ioctl+0x2ac/0xb98) > > [ 639.035812] [] (ksys_ioctl) from [] > > (ret_fast_syscall+0x0/0x28) > > [ 639.043490] Exception stack(0xec529fa8 to 0xec529ff0) > > [ 639.048565] 9fa0: bee84650 01321870 0004 c25c4111 > > bee84650 0002000f > > [ 639.056766] 9fc0: bee84650 01321870 01321820 0036 1f40 > > 0002c2f8 0003 > > [ 639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8 > > [ 639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink > > > > Signed-off-by: Shengjiu Wang > > --- > > sound/soc/fsl/imx-pcm-dma.c | 173 +--- > > 1 file changed, 159 
insertions(+), 14 deletions(-) > > > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c > > index 04a9bc749016..dae53b384df4 100644 > > --- a/sound/soc/fsl/imx-pcm-dma.c > > +++ b/sound/soc/fsl/imx-pcm-dma.c > > @@ -11,6 +11,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void *param) > > return true; > > } > > > > -static const struct snd_dmaengine_pcm_config imx_dmaengine_pcm_config = { > > - .prepare_slave_config = snd_dmaengine_pcm_prepare_slave_config, > > - .compat_filter_fn = filter, > > -}; > > +static int imx_pcm_hw_params(struct snd_soc_component *component, > > + struct snd_pcm_substream *substream, > > + struct snd_pcm_hw_params *params) > > +{ > > + struct snd_pcm_runtime *runtime = substream->runtime; > > + struct snd_soc_pcm_runtime *rtd = substream->private_data; > > + struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0); > > + struct snd_dmaengine_dai_dma_data *dma_data; > > + struct dma_slave_config config; > > + struct dma_chan *chan; > > + int ret = 0; > > > > -int imx_pcm_dma_init(struct platform_device *pdev, size_t size) > > + sn
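For reference, the direction argued for in this thread — request the channel in .open() and drop it in .close(), so the dmaengine's runtime-PM reference (taken in device_alloc_chan_resources) is only held while audio runs — reduces to roughly the following sketch. Error handling is simplified and the "tx"/"rx" channel names are assumptions:

#include <linux/dmaengine.h>
#include <sound/pcm.h>
#include <sound/soc.h>

/* Sketch: acquire the DMA channel per PCM open instead of at probe. */
static int imx_pcm_open_sketch(struct snd_soc_component *component,
			       struct snd_pcm_substream *substream)
{
	struct snd_soc_pcm_runtime *rtd = substream->private_data;
	struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
	struct dma_chan *chan;

	chan = dma_request_slave_channel(cpu_dai->dev,
			substream->stream == SNDRV_PCM_STREAM_PLAYBACK ?
			"tx" : "rx");
	if (!chan)
		return -ENXIO;

	substream->runtime->private_data = chan;
	return 0;
}

static int imx_pcm_close_sketch(struct snd_soc_component *component,
				struct snd_pcm_substream *substream)
{
	struct dma_chan *chan = substream->runtime->private_data;

	dma_release_channel(chan);	/* lets the DMA device runtime-suspend */
	return 0;
}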
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
Vaibhav Jain writes:

> +
> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
> +	__u16 payload_offset;	/* In: offset from start of struct */
> +	__u16 payload_version;	/* In/Out: version of the payload */
> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
> +} __packed;

That payload_offset can be avoided if we prevent userspace from using a
different variant of nd_pdsm_cmd_pkg with a different header. We can
keep things simpler if we can always find the payload at
nd_pdsm_cmd_pkg->payload.

> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_scm_pdsm {
> +	PAPR_SCM_PDSM_MIN = 0x0,
> +	PAPR_SCM_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> +	return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> +		return NULL;
> +	else
> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
> +}
> +

We need to make sure userspace is not passing a wrong payload_offset,
and in the next patch you do:

+	/* Copy the health struct to the payload */
+	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+	pkg->hdr.nd_fw_size = copysize;
+

All this can be simplified if you can keep the payload at
nd_pdsm_cmd_pkg->payload. If you still want the ability to extend the
header, then add a reserved field similar to nd_cmd_pkg.

-aneesh
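Concretely, the suggested simplification might look like the following sketch. The reserved field's size is an invented placeholder here; the point is only that the payload lands at a fixed offset and the header keeps room to grow:

/* Sketch of the layout being suggested: drop payload_offset so the
 * payload always starts at a fixed offset, and keep headroom for
 * future header growth via a reserved field (as nd_cmd_pkg does).
 * Field sizes are illustrative, not a proposed ABI. */
struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
	__u16 payload_version;	/* In/Out: version of the payload */
	__u16 reserved[3];	/* room for future header extensions */
	__u8 payload[];		/* In/Out: always at a fixed offset */
} __packed;

/* The accessor then needs no offset arithmetic: */
static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
{
	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
		return NULL;
	return pcmd->payload;
}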