Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >> modules and add the command family to the white list of NVDIMM command >> sets. Also advertise support for ND_CMD_CALL for the dimm >> command mask and implement necessary scaffolding in the module to >> handle ND_CMD_CALL ioctl and PDSM requests that we receive. ... >> + * >> + * Payload Version: >> + * >> + * A 'payload_version' field is present in PDSM header that indicates a >> specific >> + * version of the structure present in PDSM Payload for a given PDSM >> command. >> + * This provides backward compatibility in case the PDSM Payload structure >> + * evolves and different structures are supported by 'papr_scm' and >> 'libndctl'. >> + * >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the >> version >> + * of the payload struct it supports via 'payload_version' field. The >> 'papr_scm' >> + * module when servicing the PDSM envelope checks the 'payload_version' and >> then >> + * uses 'payload struct version' == MIN('payload_version field', >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM. >> + * After servicing the PDSM, 'papr_scm' puts the negotiated version of >> payload >> + * struct in returned 'payload_version' field. > > FWIW many people believe using a size rather than version is more sustainable. > It is expected that new payload structures are larger (more features) than the > previous payload structure. > > I can't find references at the moment though. I think clone_args is a good modern example: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88 cheers
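For reference, a minimal sketch of the size-based scheme being suggested here, modeled on clone_args. The struct and field names below are hypothetical; copy_struct_from_user() is the kernel helper that implements the copy-and-check-trailing-bytes pattern:

#include <linux/uaccess.h>

/* Hypothetical payload: v2 grew by one trailing field, so newer
 * structures are strictly larger than older ones.
 */
struct pdsm_example_payload {
	__u64 flags;		/* present since v1 */
	__u64 new_feature;	/* added in v2 */
};

static int pdsm_example_copy_in(struct pdsm_example_payload *kpayload,
				const void __user *upayload, size_t usize)
{
	/*
	 * Copies min(usize, sizeof(*kpayload)) bytes and zero-fills the
	 * rest; returns -E2BIG only when userspace passed extra non-zero
	 * bytes, i.e. actually used a field this kernel doesn't know about.
	 */
	return copy_struct_from_user(kpayload, sizeof(*kpayload),
				     upayload, usize);
}

With this scheme the caller sends sizeof(its struct) instead of a version number, and both sides can grow independently.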
Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour Reviewed-by: Ram Pai > --- > arch/powerpc/kvm/book3s_hv.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..6ad1a3b14300 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); > + /* > + * Even if that call is made by the Ultravisor, the SRR1 value > + * is the guest context one, with the secure bit clear as it has > + * not yet been secured. So we can't check it here. > + */ Frankly speaking, the comment above, when read in isolation, i.e. without the deleted code above, feels out of place. The reasoning for the change is anyway captured in the changelog. So, I think, we should delete this comment. Also the comment above assumes the Ultravisor will call H_SVM_INIT_ABORT with the SRR1(S) bit not set, which may or may not be true. Regardless of who calls H_SVM_INIT_ABORT and how, we should just call kvmppc_h_svm_init_abort() and let it deal with the complexities. RP
Re: linux-next: manual merge of the rcu tree with the powerpc tree
Hi all, On Tue, 19 May 2020 17:23:16 +1000 Stephen Rothwell wrote: > > Today's linux-next merge of the rcu tree got a conflict in: > > arch/powerpc/kernel/traps.c > > between commit: > > 116ac378bb3f ("powerpc/64s: machine check interrupt update NMI accounting") > > from the powerpc tree and commit: > > 187416eeb388 ("hardirq/nmi: Allow nested nmi_enter()") > > from the rcu tree. > > I fixed it up (I used the powerpc tree version for now) and can carry the > fix as necessary. This is now fixed as far as linux-next is concerned, > but any non trivial conflicts should be mentioned to your upstream > maintainer when your tree is submitted for merging. You may also want > to consider cooperating with the maintainer of the conflicting tree to > minimise any particularly complex conflicts. This is now a conflict between the powerpc commit and commit 69ea03b56ed2 ("hardirq/nmi: Allow nested nmi_enter()") from the tip tree. I assume that the rcu and tip trees are sharing some patches (but not commits) :-( -- Cheers, Stephen Rothwell
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Wed, May 20, 2020 at 06:52:21PM -0500, Li Yang wrote: > On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > Hm, looking at this code, I see a few other things that need to be > > fixed: > > > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion > >on the length test (understandably, a little-endian system has never run > >this code since it's ppc specific), but it's still wrong: > > > > if (firmware->header.length != fw->size) { > > > >compare to the firmware loader: > > > > length = be32_to_cpu(hdr->length); > > > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the > >per-microcode offsets, so the uploader might send data outside the > >firmware buffer. Perhaps: > > We do validate the CRC for each microcode, it is unlikely the CRC > check can pass if the offset or length is not correct. But you are > probably right that it will be safer to check the boundary and fail Right, but a malicious firmware file could still match CRC but trick the kernel code. > quicker before we actually start the CRC check. Will you come up with > a formal patch or you want us to deal with it? It sounds like Gustavo will be sending one, though I don't think either of us have the hardware to test it with, so if you could do that part, that would be great! :) -- Kees Cook
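For the ucc_uart.c issue in point 1, the fix is presumably just adding the byte-swap before the comparison; a minimal sketch (the error handling around it is assumed, not taken from the driver):

/* header.length is stored big-endian in the firmware blob, so convert
 * it before comparing against the host-endian fw->size.
 */
if (be32_to_cpu(firmware->header.length) != fw->size) {
	/* hypothetical error handling */
	return -EINVAL;
}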
Re: [PATCH v3] powerpc/64: Option to use ELF V2 ABI for big-endian kernels
Excerpts from Segher Boessenkool's message of May 18, 2020 10:19 pm: > Hi! > > On Mon, May 18, 2020 at 04:35:22PM +1000, Michael Ellerman wrote: >> Nicholas Piggin writes: >> > Provide an option to build big-endian kernels using the ELF V2 ABI. This >> > works >> > on GCC and clang (since about 2014). It is not officially supported by >> > the >> > GNU toolchain, but it can give big-endian kernels some useful advantages >> > of >> > the V2 ABI (e.g., less stack usage). > >> This doesn't build with clang: >> >> /tmp/aesp8-ppc-dad624.s: Assembler messages: >> /tmp/aesp8-ppc-dad624.s: Error: .size expression for >> aes_p8_set_encrypt_key does not evaluate to a constant > > What does this assembler code that clang doesn't swallow look like? Is > that valid code? Etc. The .size directive calculation is .text - .opd because the preprocessor isn't passing -mabi=elfv2, which makes our _GLOBAL function entry macro use the v1 convention. I guess I got the 64-bit vdso wrong as well, it should remain in ELFv1. Thanks, Nick
[PATCH v3 7/7] powerpc: Add POWER10 architected mode
PVR value of 0x0F06 means we are arch v3.1 compliant (i.e. POWER10). This is used by phyp and kvm when booting as a pseries guest to detect the presence of new P10 features and to enable the appropriate hwcap and facility bits. Signed-off-by: Alistair Popple Signed-off-by: Cédric Le Goater --- arch/powerpc/include/asm/cputable.h | 15 -- arch/powerpc/include/asm/mmu.h| 1 + arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/kernel/cpu_setup_power.S | 20 -- arch/powerpc/kernel/cputable.c| 30 +++ arch/powerpc/kernel/prom_init.c | 12 +-- 6 files changed, 73 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 1559dbf72842..bac2252c839e 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -468,6 +468,17 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \ CPU_FTR_P9_TM_HV_ASSIST | \ CPU_FTR_P9_TM_XER_SO_BUG) +#define CPU_FTRS_POWER10 (CPU_FTR_LWSYNC | \ + CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\ + CPU_FTR_MMCRA | CPU_FTR_SMT | \ + CPU_FTR_COHERENT_ICACHE | \ + CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \ + CPU_FTR_DSCR | CPU_FTR_SAO | \ + CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \ + CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ + CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ + CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \ + CPU_FTR_ARCH_31) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ @@ -486,14 +497,14 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | CPU_FTRS_POWER8 | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_VSX_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) #else #define CPU_FTRS_POSSIBLE \ (CPU_FTRS_PPC970 | CPU_FTRS_POWER5 | \ CPU_FTRS_POWER6 | CPU_FTRS_POWER7 | CPU_FTRS_POWER8E | \ CPU_FTRS_POWER8 | CPU_FTRS_CELL | CPU_FTRS_PA6T | \ CPU_FTR_VSX_COMP | CPU_FTR_ALTIVEC_COMP | CPU_FTRS_POWER9 | \ -CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2) +CPU_FTRS_POWER9_DD2_1 | CPU_FTRS_POWER9_DD2_2 | CPU_FTRS_POWER10) #endif /* CONFIG_CPU_LITTLE_ENDIAN */ #endif #else diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h index cf2a08bfd5cd..f4ac25d4df05 100644 --- a/arch/powerpc/include/asm/mmu.h +++ b/arch/powerpc/include/asm/mmu.h @@ -122,6 +122,7 @@ #define MMU_FTRS_POWER7MMU_FTRS_POWER6 #define MMU_FTRS_POWER8MMU_FTRS_POWER6 #define MMU_FTRS_POWER9MMU_FTRS_POWER6 +#define MMU_FTRS_POWER10 MMU_FTRS_POWER6 #define MMU_FTRS_CELL MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \ MMU_FTR_CI_LARGE_PAGE #define MMU_FTRS_PA6T MMU_FTRS_DEFAULT_HPTE_ARCH_V2 | \ diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 94e3fd54f2c8..324a13351749 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -117,6 +117,7 @@ extern int of_read_drc_info_cell(struct property **prop, #define OV1_PPC_2_07 0x01/* set if we support PowerPC 2.07 */ #define OV1_PPC_3_00 0x80/* set if we support PowerPC 3.00 */ +#define OV1_PPC_3_10x40/* set if we support PowerPC 3.1 */ /* Option vector 2: Open Firmware options supported */ #define OV2_REAL_MODE 0x20/* set if we want OF in real mode */ diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S index a460298c7ddb..f3730cf904fa 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -91,10 +91,15 @@ _GLOBAL(__restore_cpu_power8) mtlrr11 blr +_GLOBAL(__setup_cpu_power10) + mflrr11 + bl __init_FSCR_P10 + b 1f + _GLOBAL(__setup_cpu_power9) mflrr11 bl __init_FSCR - bl __init_PMU +1: bl __init_PMU bl __init_hvmode_206 mtlrr11 beqlr @@ -116,10 +121,15 @@ _GLOBAL(__setup_cpu_power9) mtlrr11 blr +_GLOBAL(__restore_cpu_power10) + mflrr11 + bl __init_FSCR_P10 + b 1f + _GLOBAL(__restore_cpu_power9) mflrr11 bl __init_FSCR - bl __init_PMU
[PATCH v3 6/7] powerpc/dt_cpu_ftrs: Add MMA feature
Matrix multiply assist (MMA) is a new feature added to ISAv3.1 and POWER10. Support on powernv can be selected via a firmware CPU device tree feature which enables it via a PCR bit. Signed-off-by: Alistair Popple --- arch/powerpc/include/asm/reg.h| 3 ++- arch/powerpc/kernel/dt_cpu_ftrs.c | 17 - 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index dd20af367b57..88e6c78100d9 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -481,7 +481,8 @@ #define PCR_VEC_DIS (__MASK(63-0)) /* Vec. disable (bit NA since POWER8) */ #define PCR_VSX_DIS (__MASK(63-1)) /* VSX disable (bit NA since POWER8) */ #define PCR_TM_DIS (__MASK(63-2)) /* Trans. memory disable (POWER8) */ -#define PCR_HIGH_BITS (PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS) +#define PCR_MMA_DIS (__MASK(63-3)) /* Matrix-Multiply Accelerator */ +#define PCR_HIGH_BITS (PCR_MMA_DIS | PCR_VEC_DIS | PCR_VSX_DIS | PCR_TM_DIS) /* * These bits are used in the function kvmppc_set_arch_compat() to specify and * determine both the compatibility level which we want to emulate and the diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 93c340906aad..0a41fce34165 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -75,6 +75,7 @@ static struct { u64 lpcr_clear; u64 hfscr; u64 fscr; + u64 pcr; } system_registers; static void (*init_pmu_registers)(void); @@ -102,7 +103,7 @@ static void __restore_cpu_cpufeatures(void) if (hv_mode) { mtspr(SPRN_LPID, 0); mtspr(SPRN_HFSCR, system_registers.hfscr); - mtspr(SPRN_PCR, PCR_MASK); + mtspr(SPRN_PCR, system_registers.pcr); } mtspr(SPRN_FSCR, system_registers.fscr); @@ -555,6 +556,18 @@ static int __init feat_enable_large_ci(struct dt_cpu_feature *f) return 1; } +static int __init feat_enable_mma(struct dt_cpu_feature *f) +{ + u64 pcr; + + feat_enable(f); + pcr = mfspr(SPRN_PCR); + pcr &= ~PCR_MMA_DIS; + mtspr(SPRN_PCR, pcr); + + return 1; +} + struct dt_cpu_feature_match { const char *name; int (*enable)(struct dt_cpu_feature *f); @@ -629,6 +642,7 @@ static struct dt_cpu_feature_match __initdata {"vector-binary16", feat_enable, 0}, {"wait-v3", feat_enable, 0}, {"prefix-instructions", feat_enable, 0}, + {"matrix-multiply-assist", feat_enable_mma, 0}, }; static bool __initdata using_dt_cpu_ftrs; @@ -779,6 +793,7 @@ static void __init cpufeatures_setup_finished(void) system_registers.lpcr = mfspr(SPRN_LPCR); system_registers.hfscr = mfspr(SPRN_HFSCR); system_registers.fscr = mfspr(SPRN_FSCR); + system_registers.pcr = mfspr(SPRN_PCR); pr_info("final cpu/mmu features = 0x%016lx 0x%08x\n", cur_cpu_spec->cpu_features, cur_cpu_spec->mmu_features); -- 2.20.1
[PATCH v3 5/7] powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
Prefix instructions have their own FSCR bit which needs to be enabled via a CPU feature. The kernel will save the FSCR for problem state but it needs to be enabled initially. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 677190f70cac..93c340906aad 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -628,6 +628,7 @@ static struct dt_cpu_feature_match __initdata {"vector-binary128", feat_enable, 0}, {"vector-binary16", feat_enable, 0}, {"wait-v3", feat_enable, 0}, + {"prefix-instructions", feat_enable, 0}, }; static bool __initdata using_dt_cpu_ftrs; -- 2.20.1
[PATCH v3 4/7] powerpc/dt_cpu_ftrs: Set current thread fscr bits
Setting the FSCR bit directly in the SPR only sets it during the initial boot and early init of the kernel but not for the init process or any subsequent kthreads. This is because the thread_struct for those is copied from the current thread_struct setup at boot which doesn't reflect any changes made to the FSCR during cpu feature detection. This patch ensures the current thread state is updated to match once feature detection is complete. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index b5e21264d168..677190f70cac 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -170,6 +170,7 @@ static int __init feat_try_enable_unknown(struct dt_cpu_feature *f) u64 fscr = mfspr(SPRN_FSCR); fscr |= 1UL << f->fscr_bit_nr; mtspr(SPRN_FSCR, fscr); + current->thread.fscr |= 1UL << f->fscr_bit_nr; } else { /* Does not have a known recipe */ return 0; @@ -205,6 +206,7 @@ static int __init feat_enable(struct dt_cpu_feature *f) u64 fscr = mfspr(SPRN_FSCR); fscr |= 1UL << f->fscr_bit_nr; mtspr(SPRN_FSCR, fscr); + current->thread.fscr |= 1UL << f->fscr_bit_nr; } } -- 2.20.1
[PATCH v3 3/7] powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
On powernv, hardware support for ISAv3.1 is advertised via a cpu feature bit in the device tree. This patch enables the associated HWCAP bit if the device tree indicates ISAv3.1 is available. Signed-off-by: Alistair Popple --- arch/powerpc/kernel/dt_cpu_ftrs.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 36bc0d5c4f3a..b5e21264d168 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -26,6 +26,7 @@ /* Device-tree visible constants follow */ #define ISA_V2_07B 2070 #define ISA_V3_0B 3000 +#define ISA_V3_1 3100 #define USABLE_PR (1U << 0) #define USABLE_OS (1U << 1) @@ -654,6 +655,11 @@ static void __init cpufeatures_setup_start(u32 isa) cur_cpu_spec->cpu_features |= CPU_FTR_ARCH_300; cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_ARCH_3_00; } + + if (isa >= 3100) { + cur_cpu_spec->cpu_features |= CPU_FTR_ARCH_31; + cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_ARCH_3_1; + } } static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f) -- 2.20.1
[PATCH v3 2/7] powerpc: Add support for ISA v3.1
Newer ISA versions are enabled by clearing all bits in the PCR associated with previous versions of the ISA. Enable ISA v3.1 support by updating the PCR mask to include ISA v3.0. This ensures all PCR bits corresponding to earlier architecture versions get cleared, thereby enabling ISA v3.1 if supported by the hardware. Signed-off-by: Alistair Popple --- arch/powerpc/include/asm/cputable.h | 1 + arch/powerpc/include/asm/reg.h | 3 ++- arch/powerpc/kvm/book3s_hv.c| 3 --- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index c67b94f3334c..1559dbf72842 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -213,6 +213,7 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000) #define CPU_FTR_P9_TLBIE_ERAT_BUG LONG_ASM_CONST(0x0001000000000000) #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002000000000000) +#define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004000000000000) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 054f8a71d686..dd20af367b57 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -487,10 +487,11 @@ * determine both the compatibility level which we want to emulate and the * compatibility level which the host is capable of emulating. */ +#define PCR_ARCH_300 0x10 /* Architecture 3.00 */ #define PCR_ARCH_207 0x8 /* Architecture 2.07 */ #define PCR_ARCH_206 0x4 /* Architecture 2.06 */ #define PCR_ARCH_205 0x2 /* Architecture 2.05 */ -#define PCR_LOW_BITS (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205) +#define PCR_LOW_BITS (PCR_ARCH_207 | PCR_ARCH_206 | PCR_ARCH_205 | PCR_ARCH_300) #define PCR_MASK ~(PCR_HIGH_BITS | PCR_LOW_BITS) /* PCR Reserved Bits */ #define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */ #define SPRN_TLBINDEXR 0x154 /* P7 TLB control register */ diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index db07199f0977..a0cf17597838 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -344,9 +344,6 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) vcpu->arch.pvr = pvr; } -/* Dummy value used in computing PCR value below */ -#define PCR_ARCH_300 (PCR_ARCH_207 << 1) - static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long host_pcr_bit = 0, guest_pcr_bit = 0; -- 2.20.1
[PATCH v3 1/7] powerpc: Add new HWCAP bits
POWER10 introduces two new architectural features - ISAv3.1 and matrix multiply assist (MMA) instructions. Userspace detects the presence of these features via two HWCAP bits introduced in this patch. These bits have been agreed to by the compiler and binutils team. According to ISAv3.1 MMA is an optional feature and software that makes use of it should first check for availability via this HWCAP bit and use alternate code paths if unavailable. Signed-off-by: Alistair Popple Tested-by: Michael Neuling --- arch/powerpc/include/uapi/asm/cputable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/include/uapi/asm/cputable.h b/arch/powerpc/include/uapi/asm/cputable.h index 540592034740..731b97dc2d15 100644 --- a/arch/powerpc/include/uapi/asm/cputable.h +++ b/arch/powerpc/include/uapi/asm/cputable.h @@ -50,6 +50,8 @@ #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */ #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */ #define PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM w/out suspended state */ +#define PPC_FEATURE2_ARCH_3_1 0x00040000 /* ISA 3.1 */ +#define PPC_FEATURE2_MMA 0x00020000 /* Matrix Multiply Assist */ /* * IMPORTANT! -- 2.20.1
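A minimal userspace sketch of the recommended check, using glibc's getauxval(); the fallback defines mirror the values added above, for toolchains whose headers predate them:

#include <stdio.h>
#include <sys/auxv.h>

#ifndef PPC_FEATURE2_ARCH_3_1
#define PPC_FEATURE2_ARCH_3_1	0x00040000
#endif
#ifndef PPC_FEATURE2_MMA
#define PPC_FEATURE2_MMA	0x00020000
#endif

int main(void)
{
	unsigned long hwcap2 = getauxval(AT_HWCAP2);

	if (hwcap2 & PPC_FEATURE2_MMA)
		printf("MMA available, use the accelerated path\n");
	else
		printf("MMA unavailable, use the fallback path\n");
	return 0;
}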
[PATCH v3 0/7] Base support for POWER10
This series brings together several previously posted patches required for POWER10 support and introduces a new patch enabling POWER10 architected mode to enable booting as a POWER10 pseries guest. It includes support for enabling facilities related to MMA and prefix instructions. Changes from v2: - s/accumulate/assist/ - Updated commit messages Changes from v1: - Two bug-fixes to enable prefix and MMA on pseries - Minor updates to commit message wording - Fixes a build error when CONFIG_KVM_BOOK3S_64_HV is enabled Alistair Popple (7): powerpc: Add new HWCAP bits powerpc: Add support for ISA v3.1 powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected powerpc/dt_cpu_ftrs: Set current thread fscr bits powerpc/dt_cpu_ftrs: Enable Prefixed Instructions powerpc/dt_cpu_ftrs: Add MMA feature powerpc: Add POWER10 architected mode arch/powerpc/include/asm/cputable.h | 16 +++-- arch/powerpc/include/asm/mmu.h | 1 + arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/include/asm/reg.h | 6 +++-- arch/powerpc/include/uapi/asm/cputable.h | 2 ++ arch/powerpc/kernel/cpu_setup_power.S| 20 ++-- arch/powerpc/kernel/cputable.c | 30 arch/powerpc/kernel/dt_cpu_ftrs.c| 26 +++- arch/powerpc/kernel/prom_init.c | 12 -- arch/powerpc/kvm/book3s_hv.c | 3 --- 10 files changed, 105 insertions(+), 12 deletions(-) -- 2.20.1
Re: [PATCH v2 5/7] mm: parallelize deferred_init_memmap()
On Wed, May 20, 2020 at 11:27 AM Daniel Jordan wrote: > > Deferred struct page init is a significant bottleneck in kernel boot. > Optimizing it maximizes availability for large-memory systems and allows > spinning up short-lived VMs as needed without having to leave them > running. It also benefits bare metal machines hosting VMs that are > sensitive to downtime. In projects such as VMM Fast Restart[1], where > guest state is preserved across kexec reboot, it helps prevent > application and network timeouts in the guests. > > Multithread to take full advantage of system memory bandwidth. > > The maximum number of threads is capped at the number of CPUs on the > node because speedups always improve with additional threads on every > system tested, and at this phase of boot, the system is otherwise idle > and waiting on page init to finish. > > Helper threads operate on section-aligned ranges to both avoid false > sharing when setting the pageblock's migrate type and to avoid accessing > uninitialized buddy pages, though max order alignment is enough for the > latter. > > The minimum chunk size is also a section. There was benefit to using > multiple threads even on relatively small memory (1G) systems, and this > is the smallest size that the alignment allows. > > The time (milliseconds) is the slowest node to initialize since boot > blocks until all nodes finish. intel_pstate is loaded in active mode > without hwp and with turbo enabled, and intel_idle is active as well. > > Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal) > 2 nodes * 26 cores * 2 threads = 104 CPUs > 384G/node = 768G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 4078.0 ( 9.0) -- 1779.0 ( 8.7) >2% ( 1) 1.4% 4021.3 ( 2.9) 3.4% 1717.7 ( 7.8) > 12% ( 6) 35.1% 2644.7 ( 35.3) 80.8%341.0 ( 35.5) > 25% ( 13) 38.7% 2498.0 ( 34.2) 89.1%193.3 ( 32.3) > 37% ( 19) 39.1% 2482.0 ( 25.2) 90.1%175.3 ( 31.7) > 50% ( 26) 38.8% 2495.0 ( 8.7) 89.1%193.7 ( 3.5) > 75% ( 39) 39.2% 2478.0 ( 21.0) 90.3%172.7 ( 26.7) > 100% ( 52) 40.0% 2448.0 ( 2.0) 91.9%143.3 ( 1.5) > > Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, bare metal) > 1 node * 16 cores * 2 threads = 32 CPUs > 192G/node = 192G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 1996.0 ( 18.0) -- 1104.3 ( 6.7) >3% ( 1) 1.4% 1968.0 ( 3.0) 2.7% 1074.7 ( 9.0) > 12% ( 4) 40.1% 1196.0 ( 22.7) 72.4%305.3 ( 16.8) > 25% ( 8) 47.4% 1049.3 ( 17.2) 84.2%174.0 ( 10.6) > 37% ( 12) 48.3% 1032.0 ( 14.9) 86.8%145.3 ( 2.5) > 50% ( 16) 48.9% 1020.3 ( 2.5) 88.0%133.0 ( 1.7) > 75% ( 24) 49.1% 1016.3 ( 8.1) 88.4%128.0 ( 1.7) > 100% ( 32) 49.4% 1009.0 ( 8.5) 88.6%126.3 ( 0.6) > > Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal) > 2 nodes * 18 cores * 2 threads = 72 CPUs > 128G/node = 256G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup time_ms (stdev) > ( 0) -- 1682.7 ( 6.7) --630.0 ( 4.6) >3% ( 1) 0.4% 1676.0 ( 2.0) 0.7%625.3 ( 3.2) > 12% ( 4) 25.8% 1249.0 ( 1.0) 68.2%200.3 ( 1.2) > 25% ( 9) 30.0% 1178.0 ( 5.2) 79.7%128.0 ( 3.5) > 37% ( 13) 30.6% 1167.7 ( 3.1) 81.3%117.7 ( 1.2) > 50% ( 18) 30.6% 1167.3 ( 2.3) 81.4%117.0 ( 1.0) > 75% ( 27) 31.0% 1161.3 ( 4.6) 82.5%110.0 ( 6.9) > 100% ( 36) 32.1% 1142.0 ( 3.6) 85.7% 90.0 ( 1.0) > > AMD EPYC 7551 32-Core Processor (Zen, kvm guest) > 1 node * 8 cores * 2 threads = 16 CPUs > 64G/node = 64G memory > >kernel boot deferred init > > node% (thr)speedup time_ms (stdev)speedup 
time_ms (stdev) > ( 0) -- 1003.7 ( 16.6) --243.3 ( 8.1) >6% ( 1) 1.4%990.0 ( 4.6) 1.2%240.3 ( 1.5) > 12% ( 2) 11.4%889.3 ( 16.7) 44.5%135.0 ( 3.0) > 25% ( 4) 16.8%835.3 ( 9.0)
Re: Endless soft-lockups for compiling workload since next-20200519
On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote: > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > > Just a heads up. Repeatedly compiling kernels for a while would trigger > > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > > .config are in, > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've > not seen anything like that myself. Let me go have a look. > > > In as far as the logs are readable (they're a wrapped mess, please don't > do that!), they contain very little useful, as is typical with IPIs :/ > > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > > flush_smp_call_function_queue+0x1fa/0x2e0 So I've tried to think of a race that could produce that and here is the only thing I could come up with. It's a bit complicated unfortunately: CPU 0 CPU 1 - - tick { trigger_load_balance() { raise_softirq(SCHED_SOFTIRQ); //but nohz_flags(0) = 0 } kick_ilb() { atomic_fetch_or(, nohz_flags(0)) softirq() {#VMEXIT or anything that could stop a CPU for a while run_rebalance_domain() { nohz_idle_balance() { atomic_andnot(NOHZ_KICK_MASK, nohz_flag(0)) } } } } // schedule nohz_newidle_balance() { kick_ilb() { // pick current CPU atomic_fetch_or(, nohz_flags(0)) #VMENTER smp_call_function_single_async() { smp_call_function_single_async() { // verified csd->flags != CSD_LOCK // verified csd->flags != CSD_LOCK csd->flags = CSD_LOCK csd->flags = CSD_LOCK //execute in place //queue and send IPI csd->flags = 0 nohz_csd_func() } } } IPI { flush_smp_call_function_queue() { csd_unlock() { WARN_ON(csd->flags != CSD_LOCK) <-! The root cause here would be that trigger_load_balance() unconditionally raises the softirq. And I have to confess I'm not clear why since the softirq is essentially a no-op when nohz_flags() is 0. Thanks.
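To make the hazard concrete, a minimal sketch (simplified names, not the actual scheduler code) of why two racing callers of smp_call_function_single_async() on the same csd trip that WARN_ON:

#include <linux/smp.h>

static call_single_data_t nohz_csd;	/* one csd per target CPU, as above */

static void kick_remote(int cpu)
{
	/*
	 * smp_call_function_single_async() rejects a csd whose lock flag is
	 * still set, but the check and the subsequent lock are not a single
	 * atomic step. In the interleaving above both CPUs observe the csd
	 * unlocked, both lock it, and both run it; the second completion
	 * then unlocks an already-unlocked csd, which is what the WARN_ON
	 * in flush_smp_call_function_queue() catches.
	 */
	smp_call_function_single_async(cpu, &nohz_csd);
}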
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Wed, May 20, 2020 at 06:52:21PM -0500, Li Yang wrote: > On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > > > On Mon, May 18, 2020 at 05:19:04PM -0500, Gustavo A. R. Silva wrote: > > > The current codebase makes use of one-element arrays in the following > > > form: > > > > > > struct something { > > > int length; > > > u8 data[1]; > > > }; > > > > > > struct something *instance; > > > > > > instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL); > > > instance->length = size; > > > memcpy(instance->data, source, size); > > > > > > but the preferred mechanism to declare variable-length types such as > > > these ones is a flexible array member[1][2], introduced in C99: > > > > > > struct foo { > > > int stuff; > > > struct boo array[]; > > > }; > > > > > > By making use of the mechanism above, we will get a compiler warning > > > in case the flexible array does not occur last in the structure, which > > > will help us prevent some kind of undefined behavior bugs from being > > > inadvertently introduced[3] to the codebase from now on. So, replace > > > the one-element array with a flexible-array member. > > > > > > Also, make use of the new struct_size() helper to properly calculate the > > > size of struct qe_firmware. > > > > > > This issue was found with the help of Coccinelle and, audited and fixed > > > _manually_. > > > > > > [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html > > > [2] https://github.com/KSPP/linux/issues/21 > > > [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") > > > > > > Signed-off-by: Gustavo A. R. Silva > > > --- > > > drivers/soc/fsl/qe/qe.c | 4 ++-- > > > include/soc/fsl/qe/qe.h | 2 +- > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > > index 447146861c2c1..2df20d6f85fa4 100644 > > > --- a/drivers/soc/fsl/qe/qe.c > > > +++ b/drivers/soc/fsl/qe/qe.c > > > @@ -448,7 +448,7 @@ int qe_upload_firmware(const struct qe_firmware > > > *firmware) > > > unsigned int i; > > > unsigned int j; > > > u32 crc; > > > - size_t calc_size = sizeof(struct qe_firmware); > > > + size_t calc_size; > > > size_t length; > > > const struct qe_header *hdr; > > > > > > @@ -480,7 +480,7 @@ int qe_upload_firmware(const struct qe_firmware > > > *firmware) > > > } > > > > > > /* Validate the length and check if there's a CRC */ > > > - calc_size += (firmware->count - 1) * sizeof(struct qe_microcode); > > > + calc_size = struct_size(firmware, microcode, firmware->count); > > > > > > for (i = 0; i < firmware->count; i++) > > > /* > > > diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h > > > index e282ac01ec081..3feddfec9f87d 100644 > > > --- a/include/soc/fsl/qe/qe.h > > > +++ b/include/soc/fsl/qe/qe.h > > > @@ -307,7 +307,7 @@ struct qe_firmware { > > > u8 revision;/* The microcode version revision */ > > > u8 padding; /* Reserved, for alignment */ > > > u8 reserved[4]; /* Reserved, for future expansion */ > > > - } __attribute__ ((packed)) microcode[1]; > > > + } __packed microcode[]; > > > /* All microcode binaries should be located here */ > > > /* CRC32 should be located here, after the microcode binaries */ > > > } __attribute__ ((packed)); > > > -- > > > 2.26.2 > > > > > > > Hm, looking at this code, I see a few other things that need to be > > fixed: > > > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion > >on the length test (understandably, a little-endian system has never run > >this code since it's ppc specific), but it's still wrong: > 
> > > if (firmware->header.length != fw->size) { > > > >compare to the firmware loader: > > > > length = be32_to_cpu(hdr->length); > > > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the > >per-microcode offsets, so the uploader might send data outside the > >firmware buffer. Perhaps: > > We do validate the CRC for each microcode, it is unlikely the CRC > check can pass if the offset or length is not correct. But you are > probably right that it will be safer to check the boundary and fail > quicker before we actually start the CRC check. Will you come up with > a formal patch or you want us to deal with it? > Li, I will send a proper patch for this. Thanks -- Gustavo > > > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > index 447146861c2c..c4e0bc452f03 100644 > > --- a/drivers/soc/fsl/qe/qe.c > > +++ b/drivers/soc/fsl/qe/qe.c > > @@ -451,6 +451,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > size_t calc_size = sizeof(struct qe_firmware); > > size_t length; > > const struct qe_header *hdr; > > + void *firmware_end; > > > > if (!firmware) { > > printk(KERN_E
Re: [PATCH] soc: fsl: qe: Replace one-element array and use struct_size() helper
On Mon, May 18, 2020 at 5:57 PM Kees Cook wrote: > > On Mon, May 18, 2020 at 05:19:04PM -0500, Gustavo A. R. Silva wrote: > > The current codebase makes use of one-element arrays in the following > > form: > > > > struct something { > > int length; > > u8 data[1]; > > }; > > > > struct something *instance; > > > > instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL); > > instance->length = size; > > memcpy(instance->data, source, size); > > > > but the preferred mechanism to declare variable-length types such as > > these ones is a flexible array member[1][2], introduced in C99: > > > > struct foo { > > int stuff; > > struct boo array[]; > > }; > > > > By making use of the mechanism above, we will get a compiler warning > > in case the flexible array does not occur last in the structure, which > > will help us prevent some kind of undefined behavior bugs from being > > inadvertently introduced[3] to the codebase from now on. So, replace > > the one-element array with a flexible-array member. > > > > Also, make use of the new struct_size() helper to properly calculate the > > size of struct qe_firmware. > > > > This issue was found with the help of Coccinelle and, audited and fixed > > _manually_. > > > > [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html > > [2] https://github.com/KSPP/linux/issues/21 > > [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") > > > > Signed-off-by: Gustavo A. R. Silva > > --- > > drivers/soc/fsl/qe/qe.c | 4 ++-- > > include/soc/fsl/qe/qe.h | 2 +- > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > > index 447146861c2c1..2df20d6f85fa4 100644 > > --- a/drivers/soc/fsl/qe/qe.c > > +++ b/drivers/soc/fsl/qe/qe.c > > @@ -448,7 +448,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > unsigned int i; > > unsigned int j; > > u32 crc; > > - size_t calc_size = sizeof(struct qe_firmware); > > + size_t calc_size; > > size_t length; > > const struct qe_header *hdr; > > > > @@ -480,7 +480,7 @@ int qe_upload_firmware(const struct qe_firmware > > *firmware) > > } > > > > /* Validate the length and check if there's a CRC */ > > - calc_size += (firmware->count - 1) * sizeof(struct qe_microcode); > > + calc_size = struct_size(firmware, microcode, firmware->count); > > > > for (i = 0; i < firmware->count; i++) > > /* > > diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h > > index e282ac01ec081..3feddfec9f87d 100644 > > --- a/include/soc/fsl/qe/qe.h > > +++ b/include/soc/fsl/qe/qe.h > > @@ -307,7 +307,7 @@ struct qe_firmware { > > u8 revision;/* The microcode version revision */ > > u8 padding; /* Reserved, for alignment */ > > u8 reserved[4]; /* Reserved, for future expansion */ > > - } __attribute__ ((packed)) microcode[1]; > > + } __packed microcode[]; > > /* All microcode binaries should be located here */ > > /* CRC32 should be located here, after the microcode binaries */ > > } __attribute__ ((packed)); > > -- > > 2.26.2 > > > > Hm, looking at this code, I see a few other things that need to be > fixed: > > 1) drivers/tty/serial/ucc_uart.c does not do a be32_to_cpu() conversion >on the length test (understandably, a little-endian system has never run >this code since it's ppc specific), but it's still wrong: > > if (firmware->header.length != fw->size) { > >compare to the firmware loader: > > length = be32_to_cpu(hdr->length); > > 2) drivers/soc/fsl/qe/qe.c does not perform bounds checking on the >per-microcode offsets, so the uploader might send 
data outside the >firmware buffer. Perhaps: We do validate the CRC for each microcode, it is unlikely the CRC check can pass if the offset or length is not correct. But you are probably right that it will be safer to check the boundary and fail quicker before we actually start the CRC check. Will you come up with a formal patch or you want us to deal with it? > > > diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c > index 447146861c2c..c4e0bc452f03 100644 > --- a/drivers/soc/fsl/qe/qe.c > +++ b/drivers/soc/fsl/qe/qe.c > @@ -451,6 +451,7 @@ int qe_upload_firmware(const struct qe_firmware *firmware) > size_t calc_size = sizeof(struct qe_firmware); > size_t length; > const struct qe_header *hdr; > + void *firmware_end; > > if (!firmware) { > printk(KERN_ERR "qe-firmware: invalid pointer\n"); > @@ -491,19 +492,39 @@ int qe_upload_firmware(const struct qe_firmware > *firmware) > calc_size += sizeof(__be32) * > be32_to_cpu(firmware->microcode[i].count); > > - /* Validate the length */ > + /* Validate total length */ > if (length != calc_size + sizeof(__be32)) { >
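For reference, a self-contained sketch of the bounds check under discussion. Only microcode[i].count is confirmed by the quoted code; code_offset and the error handling are assumptions, so treat this as illustrative rather than the eventual patch:

/* Reject any microcode whose [offset, offset + length) range falls
 * outside the firmware blob, before any CRC work is done.
 */
static int qe_check_ucode_bounds(const struct qe_firmware *firmware,
				 size_t fw_size)
{
	unsigned int i;

	for (i = 0; i < firmware->count; i++) {
		u32 offset = be32_to_cpu(firmware->microcode[i].code_offset);
		u32 len = sizeof(__be32) *
			be32_to_cpu(firmware->microcode[i].count);

		/* ordered to avoid integer overflow in offset + len */
		if (offset > fw_size || len > fw_size - offset)
			return -EINVAL;
	}
	return 0;
}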
Re: [PATCH v6 09/11] PCI: layerscape: Add EP mode support for ls1088a and ls2088a
On Sat, 14 Mar 2020 11:30:36 +0800, Xiaowei Bao wrote: > Add PCIe EP mode support for ls1088a and ls2088a, there are some > difference between LS1 and LS2 platform, so refactor the code of > the EP driver. > > Signed-off-by: Xiaowei Bao > --- > v2: > - This is a new patch for supporting the ls1088a and ls2088a platform. > v3: > - Adjust the some struct assignment order in probe function. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 72 > +++--- > 1 file changed, 53 insertions(+), 19 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH v6 07/11] PCI: layerscape: Modify the way of getting capability with different PEX
On Sat, 14 Mar 2020 11:30:34 +0800, Xiaowei Bao wrote: > The different PCIe controllers in one board may have different > MSI or MSI-X capabilities, so change the way of getting the MSI > capability to make it more flexible. > > Signed-off-by: Xiaowei Bao > --- > v2: > - Remove the repeated assignment code. > v3: > - Use ep_func msi_cap and msix_cap to decide the msi_capable and > msix_capable of pci_epc_features struct. > v4: > - No change. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 31 > +++--- > 1 file changed, 23 insertions(+), 8 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH v6 04/11] PCI: designware-ep: Modify MSI and MSIX CAP way of finding
On Sat, Mar 14, 2020 at 11:30:31AM +0800, Xiaowei Bao wrote: > Each PF of EP device should have it's own MSI or MSIX capabitily s/it's/its/ > struct, so create a dw_pcie_ep_func struct and remove the msi_cap > and msix_cap to this struct from dw_pcie_ep, and manage the PFs > with a list. > > Signed-off-by: Xiaowei Bao > --- > v3: > - This is a new patch, to fix the issue of MSI and MSIX CAP way of >finding. > v4: > - Correct some word of commit message. > v5: > - No change. > v6: > - Fix up the compile error. > > drivers/pci/controller/dwc/pcie-designware-ep.c | 135 > +--- > drivers/pci/controller/dwc/pcie-designware.h| 18 +++- > 2 files changed, 134 insertions(+), 19 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c > b/drivers/pci/controller/dwc/pcie-designware-ep.c > index 933bb89..fb915f2 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > @@ -19,6 +19,19 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) > pci_epc_linkup(epc); > } > > +struct dw_pcie_ep_func * > +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no) > +{ > + struct dw_pcie_ep_func *ep_func; > + > + list_for_each_entry(ep_func, &ep->func_list, list) { > + if (ep_func->func_no == func_no) > + return ep_func; > + } > + > + return NULL; > +} > + > static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no) > { > unsigned int func_offset = 0; > @@ -59,6 +72,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum > pci_barno bar) > __dw_pcie_ep_reset_bar(pci, func_no, bar, 0); > } > > +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8 func_no, > + u8 cap_ptr, u8 cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; > + u8 cap_id, next_cap_ptr; > + u16 reg; > + > + if (!cap_ptr) > + return 0; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr); > + cap_id = (reg & 0x00ff); > + > + if (cap_id > PCI_CAP_ID_MAX) > + return 0; > + > + if (cap_id == cap) > + return cap_ptr; > + > + next_cap_ptr = (reg & 0xff00) >> 8; > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8 func_no, u8 > cap) > +{ > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > + unsigned int func_offset = 0; > + u8 next_cap_ptr; > + u16 reg; > + > + func_offset = dw_pcie_ep_func_select(ep, func_no); > + > + reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST); > + next_cap_ptr = (reg & 0x00ff); > + > + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap); > +} > + > static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no, > struct pci_epf_header *hdr) > { > @@ -246,13 +300,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 > func_no) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > > - if (!ep->msi_cap) > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > + > + if (!ep_func->msi_cap) > return -EINVAL; if (!ep_func || !ep_func->msi_cap) return -EINVAL; > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > if (!(val & PCI_MSI_FLAGS_ENABLE)) > return -EINVAL; > @@ -268,13 +327,18 @@ static int 
dw_pcie_ep_set_msi(struct pci_epc *epc, u8 > func_no, u8 interrupts) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > + > + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no); > + if (!ep_func) > + return -EINVAL; > > - if (!ep->msi_cap) > + if (!ep_func->msi_cap) > return -EINVAL; Same here. > > func_offset = dw_pcie_ep_func_select(ep, func_no); > > - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS; > + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS; > val = dw_pcie_readw_dbi(pci, reg); > val &= ~PCI_MSI_FLAGS_QMASK; > val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK; > @@ -291,13 +355,18 @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 > func_no) > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > u32 val, reg; > unsigned int func_offset = 0; > + struct dw_pcie_ep_func *ep_func; > + > + ep_func = dw_pcie_ep_get_func_fr
Re: [PATCH v6 01/11] PCI: designware-ep: Add multiple PFs support for DWC
On Sat, 14 Mar 2020 11:30:28 +0800, Xiaowei Bao wrote: > Add multiple PFs support for DWC. Since different PFs have different > config spaces, we use the func_conf_select callback function to access > each PF's config space; vendors using the DWC IP core that intend to > support the multiple-PFs feature need to implement this callback. > > Signed-off-by: Xiaowei Bao > Acked-by: Gustavo Pimentel > --- > v2: > - Remove duplicate redundant code. > - Reimplement the PF config space access way. > v3: > - Integrate duplicate code for func_select. > - Move PCIE_ATU_FUNC_NUM(pf) (pf << 20) to ((pf) << 20). > - Add the comments for func_conf_select function. > v4: > - Correct the commit message. > v5: > - No change. > v6: > - No change. > > drivers/pci/controller/dwc/pcie-designware-ep.c | 123 > > drivers/pci/controller/dwc/pcie-designware.c| 59 > drivers/pci/controller/dwc/pcie-designware.h| 18 +++- > 3 files changed, 142 insertions(+), 58 deletions(-) > Reviewed-by: Rob Herring
Re: [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, 20 May 2020 18:51:10 +0200 Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > Why not check vcpu->kvm->arch.secure_guest then? > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour > --- > arch/powerpc/kvm/book3s_hv.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..eb1f96cb7b72 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); or at least put a comment to explain why H_SVM_INIT_ABORT doesn't have the same sanity check as the other SVM hcalls. > + ret = kvmppc_h_svm_init_abort(vcpu->kvm); > break; > > default:
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Thanks for reviewing this patch Ira. My responses below: Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: >> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm >> modules and add the command family to the white list of NVDIMM command >> sets. Also advertise support for ND_CMD_CALL for the dimm >> command mask and implement necessary scaffolding in the module to >> handle ND_CMD_CALL ioctl and PDSM requests that we receive. >> >> The layout of the PDSM request as we expect from libnvdimm/libndctl is >> described in newly introduced uapi header 'papr_scm_pdsm.h' which >> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used >> to communicate the PDSM request via member >> 'nd_pkg_papr_scm->nd_command' and size of payload that need to be >> sent/received for servicing the PDSM. >> >> A new function is_cmd_valid() is implemented that reads the args to >> papr_scm_ndctl() and performs sanity tests on them. A new function >> papr_scm_service_pdsm() is introduced and is called from >> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL >> command from libnvdimm. >> >> Cc: Dan Williams >> Cc: Michael Ellerman >> Cc: "Aneesh Kumar K . V" >> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> Resend: >> * None >> >> v6..v7 : >> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h >> [Mpe]. >> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe]. >> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h >> [Mpe]. >> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe] >> >> v5..v6 : >> * Changed the usage of the term DSM to PDSM to distinguish it from the >> ACPI term [ Dan Williams ] >> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct >> to reflect the new terminology. >> * Updated the patch description and title to reflect the new terminology. >> * Squashed patch to introduce new command family in 'ndctl.h' with >> this patch [ Dan Williams ] >> * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 >> [ Dan Williams ] >> * Removed redundant license text from the papr_scm_psdm.h file. >> [ Dan Williams ] >> * s/envelop/envelope/ at various places [ Dan Williams ] >> * Added '__packed' attribute to command package header to gaurd >> against different compiler adding paddings between the fields. 
>> [ Dan Williams] >> * Converted various pr_debug to dev_debug [ Dan Williams ] >> >> v4..v5 : >> * None >> >> v3..v4 : >> * None >> >> v2..v3 : >> * Updated the patch prefix to 'ndctl/uapi' [Aneesh] >> >> v1..v2 : >> * None >> --- >> arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++ >> arch/powerpc/platforms/pseries/papr_scm.c | 101 - >> include/uapi/linux/ndctl.h| 1 + >> 3 files changed, 230 insertions(+), 6 deletions(-) >> create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> >> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> new file mode 100644 >> index ..671693439c1c >> --- /dev/null >> +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h >> @@ -0,0 +1,134 @@ >> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ >> +/* >> + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl >> + * >> + * (C) Copyright IBM 2020 >> + * >> + * Author: Vaibhav Jain >> + */ >> + >> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ >> +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ >> + >> +#include >> + >> +/* >> + * PDSM Envelope: >> + * >> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via >> + * 'envelopes' which consists of a header and user-defined payload sections. >> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a >> + * payload following it and offset of which relative to the struct is >> provided >> + * by 'nd_pdsm_cmd_pkg.payload_offset'. * >> + * >> + * +-+-+---+ >> + * | 64-Bytes | 8-Bytes | Max 184-Bytes | >> + * +-+-+---+ >> + * | nd_pdsm_cmd_pkg | | >> + * |-+ | | >> + * | nd_cmd_pkg | | | >> + * +-+-+---+ >> + * | nd_family | | | >> + * | nd_size_out | cmd_status | | >> + * | nd_size_in | payload_version | PAYLOAD | >> + * | nd_command | payload_offset -> | >> + * | nd_fw_size | | | >> + * +-+-+---+ >> + * >> + * PDSM He
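For readers following along, a reconstruction of the envelope header from the layout diagram above; the field order and exact types are assumptions based on the documented byte counts, and the authoritative definition is the one in papr_scm_pdsm.h:

struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* 64 bytes: nd_family, nd_size_in/out,
				 * nd_command, nd_fw_size */
	__s32 cmd_status;	/* PDSM completion status */
	__u16 payload_offset;	/* offset of PAYLOAD from start of struct */
	__u16 payload_version;	/* negotiated payload-struct version */
	__u8 payload[];		/* max 184 bytes, per the diagram */
} __packed;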
Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
Dan Williams writes: > On Tue, May 19, 2020 at 6:53 AM Aneesh Kumar K.V > wrote: >> >> Dan Williams writes: >> >> > On Mon, May 18, 2020 at 10:30 PM Aneesh Kumar K.V >> > wrote: >> >> ... >> >> >> Applications using new instructions will behave as expected when running >> >> on P8 and P9. Only future hardware will differentiate between 'dcbf' and >> >> 'dcbfps' >> > >> > Right, this is the problem. Applications using new instructions behave >> > as expected, the kernel has been shipping of_pmem and papr_scm for >> > several cycles now, you're saying that the DAX applications written >> > against those platforms are going to be broken on P8 and P9? >> >> The expecation is that both kernel and userspace would get upgraded to >> use the new instruction before actual persistent memory devices are >> made available. >> >> > >> >> > I'm thinking the kernel >> >> > should go as far as to disable DAX operation by default on new >> >> > hardware until userspace asserts that it is prepared to switch to the >> >> > new implementation. Is there any other way to ensure the forward >> >> > compatibility of deployed ppc64 DAX applications? >> >> >> >> AFAIU there is no released persistent memory hardware on ppc64 platform >> >> and we need to make sure before applications get enabled to use these >> >> persistent memory devices, they should switch to use the new >> >> instruction? >> > >> > Right, I want the kernel to offer some level of safety here because >> > everything you are describing sounds like a flag day conversion. Am I >> > misreading? Is there some other gate that prevents existing users of >> > of_pmem and papr_scm from having their expectations violated when >> > running on P8 / P9 hardware? Maybe there's tighter ecosystem control >> > that I'm just not familiar with, I'm only going off the fact that the >> > kernel has shipped a non-zero number of NVDIMM drivers that build with >> > ARCH=ppc64 for several cycles. >> >> If we are looking at adding changes to kernel that will prevent a kernel >> from running on newer hardware in a specific case, we could as well take >> the changes to get the kernel use the newer instructions right? > > Oh, no, I'm not talking about stopping the kernel from running. I'm > simply recommending that support for MAP_SYNC mappings (userspace > managed flushing) be disabled by default on PPC with either a > compile-time or run-time default to assert that userspace has been > audited for legacy applications or that the platform owner is > otherwise willing to take the risk. > >> But I agree with your concern that if we have older kernel/applications >> that continue to use `dcbf` on future hardware we will end up >> having issues w.r.t powerfail consistency. The plan is what you outlined >> above as tighter ecosystem control. Considering we don't have a pmem >> device generally available, we get both kernel and userspace upgraded >> to use these new instructions before such a device is made available. > > Ok, I think a compile time kernel option with a runtime override > satisfies my concern. Does that work for you? something like below? But this still won't handle devdax mmap right? 
diff --git a/arch/arm64/include/asm/libnvdimm.h b/arch/arm64/include/asm/libnvdimm.h new file mode 100644 index ..aee697a72537 --- /dev/null +++ b/arch/arm64/include/asm/libnvdimm.h @@ -0,0 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h new file mode 100644 index ..da479200bfb8 --- /dev/null +++ b/arch/powerpc/include/asm/libnvdimm.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __ARCH_LIBNVDIMM_H__ +#define __ARCH_LIBNVDIMM_H__ + +#define arch_disable_sync_nvdimm arch_disable_sync_nvdimm +extern bool arch_disable_sync_nvdimm(void); + +#endif /* __ARCH_LIBNVDIMM_H__ */ diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c index 0666a8d29596..3ce4fb4f167b 100644 --- a/arch/powerpc/lib/pmem.c +++ b/arch/powerpc/lib/pmem.c @@ -9,6 +9,8 @@ #include + +static bool sync_fault = IS_ENABLED(CONFIG_PPC_NVDIMM_SYNC_FAULT); /* * CONFIG_ARCH_HAS_PMEM_API symbols */ @@ -57,3 +59,16 @@ void memcpy_page_flushcache(char *to, struct page *page, size_t offset, memcpy_flushcache(to, page_to_virt(page) + offset, len); } EXPORT_SYMBOL(memcpy_page_flushcache); + +bool arch_disable_sync_nvdimm(void) +{ + return !sync_fault; +} + +static int __init parse_sync_fault(char *p) +{ + sync_fault = true; + return 0; +} +early_param("enable_sync_fault", parse_sync_fault); + diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 27a81c291be8..dde11d75a746 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -383,6 +383,15 @@ config PPC_KUEP If
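For context, a sketch of how the generic side might consume this hook, following the override pattern the #define above implies; the generic code here is illustrative and not part of the diff:

/* Fallback when an architecture does not provide its own hook
 * (hypothetical placement, e.g. include/linux/libnvdimm.h):
 */
#ifndef arch_disable_sync_nvdimm
static inline bool arch_disable_sync_nvdimm(void)
{
	return false;
}
#endif

static bool region_supports_sync_fault(void)
{
	/* do not advertise MAP_SYNC when the arch opts out */
	return !arch_disable_sync_nvdimm();
}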
Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, 20 May 2020 19:43:08 +0200 Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour > --- Reviewed-by: Greg Kurz > arch/powerpc/kvm/book3s_hv.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..6ad1a3b14300 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); > + /* > + * Even if that call is made by the Ultravisor, the SRR1 value > + * is the guest context one, with the secure bit clear as it has > + * not yet been secured. So we can't check it here. > + */ > + ret = kvmppc_h_svm_init_abort(vcpu->kvm); > break; > > default:
[PATCH v2 0/7] padata: parallelize deferred page init
Deferred struct page init is a bottleneck in kernel boot--the biggest for us and probably others. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests.

So, multithread deferred init to take full advantage of system memory bandwidth.

Extend padata, a framework that handles many parallel singlethreaded jobs, to handle multithreaded jobs as well by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. More documentation in patches 4 and 7.

This series is the first step in a project to address other memory proportional bottlenecks in the kernel such as pmem struct page init, vfio page pinning, hugetlb fallocate, and munmap. Deferred page init doesn't require concurrency limits, resource control, or priority adjustments like these other users will because it happens during boot when the system is otherwise idle and waiting for page init to finish.

This has been run on a variety of x86 systems and speeds up kernel boot by 3% to 49%, saving up to 1.6 out of 4 seconds. Patch 5 has more numbers.

Please review and test, and thanks to Alex, Andrew, Josh, and Pavel for their feedback in the last version. The powerpc and s390 lists are included in case they want to give this a try; they had enabled this feature when it was configured per arch.

Series based on 5.7-rc6 plus these three from mmotm:

  mm-call-touch_nmi_watchdog-on-max-order-boundaries-in-deferred-init.patch
  mm-initialize-deferred-pages-with-interrupts-enabled.patch
  mm-call-cond_resched-from-deferred_init_memmap.patch

and it's available here:

  git://oss.oracle.com/git/linux-dmjordan.git padata-mt-definit-v2
  https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-definit-v2

and the future users and related features are available as work-in-progress:

  git://oss.oracle.com/git/linux-dmjordan.git padata-mt-wip-v0.4
  https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-wip-v0.4

v2:
 - Improve the problem statement (Andrew, Josh, Pavel)
 - Add T-b's to unchanged patches (Josh)
 - Fully initialize max-order blocks to avoid buddy issues (Alex)
 - Parallelize on section-aligned boundaries to avoid potential false sharing (Alex)
 - Return the maximum thread count from a function that architectures can override, with the generic version returning 1 (current behavior). Override for x86 since that's the only arch this series has been tested on so far. Other archs can test with more threads by dropping patch 6.
 - Rebase to v5.7-rc6, rerun tests

RFC v4 [2] -> v1:
 - merged with padata (Peter)
 - got rid of the 'task' nomenclature (Peter, Jon)

future work branch:
 - made lockdep-aware (Jason, Peter)
 - adjust workqueue worker priority with renice_or_cancel() (Tejun)
 - fixed undo problem in VFIO (Alex)

The remaining feedback, mainly resource control awareness (cgroup etc), is TODO for later series.
[1] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf https://www.youtube.com/watch?v=pBsHnf93tcQ https://lore.kernel.org/linux-mm/1588812129-8596-1-git-send-email-anthony.yzn...@oracle.com/ [2] https://lore.kernel.org/linux-mm/20181105165558.11698-1-daniel.m.jor...@oracle.com/ Daniel Jordan (7): padata: remove exit routine padata: initialize earlier padata: allocate work structures for parallel jobs from a pool padata: add basic support for multithreaded jobs mm: parallelize deferred_init_memmap() mm: make deferred init's max threads arch-specific padata: document multithreaded jobs Documentation/core-api/padata.rst | 41 +++-- arch/x86/mm/init_64.c | 12 ++ include/linux/memblock.h | 3 + include/linux/padata.h| 43 - init/main.c | 2 + kernel/padata.c | 277 -- mm/Kconfig| 6 +- mm/page_alloc.c | 67 +++- 8 files changed, 373 insertions(+), 78 deletions(-) base-commit: b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce prerequisite-patch-id: 4ad522141e1119a325a9799dad2bd982fbac8b7c prerequisite-patch-id: 169273327e56f5461101a71dfbd6b4cfd4570cf0 prerequisite-patch-id: 0f34692c8a9673d4c4f6a3545cf8ec3a2abf8620 -- 2.26.2
[PATCH v2 1/7] padata: remove exit routine
padata_driver_exit() is unnecessary because padata isn't built as a module and doesn't exit.

padata's init routine will soon allocate memory, so getting rid of the exit function now avoids pointless code to free it.

Signed-off-by: Daniel Jordan
---
 kernel/padata.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index a6afa12fb75ee..835919c745266 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -1072,10 +1072,4 @@ static __init int padata_driver_init(void)
 }
 module_init(padata_driver_init);
 
-static __exit void padata_driver_exit(void)
-{
-	cpuhp_remove_multi_state(CPUHP_PADATA_DEAD);
-	cpuhp_remove_multi_state(hp_online);
-}
-module_exit(padata_driver_exit);
 #endif
-- 
2.26.2
[PATCH v2 7/7] padata: document multithreaded jobs
Add Documentation for multithreaded jobs. Signed-off-by: Daniel Jordan --- Documentation/core-api/padata.rst | 41 +++ 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst index 9a24c111781d9..b7e047af993e8 100644 --- a/Documentation/core-api/padata.rst +++ b/Documentation/core-api/padata.rst @@ -4,23 +4,26 @@ The padata parallel execution mechanism === -:Date: December 2019 +:Date: April 2020 Padata is a mechanism by which the kernel can farm jobs out to be done in -parallel on multiple CPUs while retaining their ordering. It was developed for -use with the IPsec code, which needs to be able to perform encryption and -decryption on large numbers of packets without reordering those packets. The -crypto developers made a point of writing padata in a sufficiently general -fashion that it could be put to other uses as well. +parallel on multiple CPUs while optionally retaining their ordering. -Usage -= +It was originally developed for IPsec, which needs to perform encryption and +decryption on large numbers of packets without reordering those packets. This +is currently the sole consumer of padata's serialized job support. + +Padata also supports multithreaded jobs, splitting up the job evenly while load +balancing and coordinating between threads. + +Running Serialized Jobs +=== Initializing -The first step in using padata is to set up a padata_instance structure for -overall control of how jobs are to be run:: +The first step in using padata to run parallel jobs is to set up a +padata_instance structure for overall control of how jobs are to be run:: #include @@ -162,6 +165,24 @@ functions that correspond to the allocation in reverse:: It is the user's responsibility to ensure all outstanding jobs are complete before any of the above are called. +Running Multithreaded Jobs +== + +A multithreaded job has a main thread and zero or more helper threads, with the +main thread participating in the job and then waiting until all helpers have +finished. padata splits the job into units called chunks, where a chunk is a +piece of the job that one thread completes in one call to the thread function. + +A user has to do three things to run a multithreaded job. First, describe the +job by defining a padata_mt_job structure, which is explained in the Interface +section. This includes a pointer to the thread function, which padata will +call each time it assigns a job chunk to a thread. Then, define the thread +function, which accepts three arguments, ``start``, ``end``, and ``arg``, where +the first two delimit the range that the thread operates on and the last is a +pointer to the job's shared state, if any. Prepare the shared state, which is +typically a stack-allocated structure that wraps the required data. Last, call +padata_do_multithreaded(), which will return once the job is finished. + Interface = -- 2.26.2
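To make the three steps described in the documentation above concrete, here is a minimal sketch of a caller, assuming the interface added in patch 4 of this series (struct padata_mt_job and padata_do_multithreaded()); all my_* names and the chunk numbers are invented for illustration:

/* Minimal sketch of a multithreaded padata job.  The padata_mt_job
 * fields and padata_do_multithreaded() are from this series; every
 * my_* name is hypothetical.  The caller must be __init code, since
 * padata_do_multithreaded() is declared __init in this series. */
#include <linux/atomic.h>
#include <linux/padata.h>

struct my_job_state {
	atomic_long_t processed;	/* shared state for all helpers */
};

static void my_thread_fn(unsigned long start, unsigned long end, void *arg)
{
	struct my_job_state *state = arg;

	/* Do this thread's chunk of the work for the range it was given. */
	atomic_long_add(end - start, &state->processed);
}

static void __init run_my_job(unsigned long nr_items)
{
	struct my_job_state state = { .processed = ATOMIC_LONG_INIT(0) };
	struct padata_mt_job job = {
		.thread_fn	= my_thread_fn,
		.fn_arg		= &state,
		.start		= 0,
		.size		= nr_items,
		.align		= 1,	/* no alignment requirement */
		.min_chunk	= 1024,	/* smallest unit worth one thread */
		.max_threads	= 4,
	};

	padata_do_multithreaded(&job);	/* returns once the whole job is done */
}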
[PATCH v2 3/7] padata: allocate work structures for parallel jobs from a pool
padata allocates per-CPU, per-instance work structs for parallel jobs. A do_parallel call assigns a job to a sequence number and hashes the number to a CPU, where the job will eventually run using the corresponding work. This approach fit with how padata used to bind a job to each CPU round-robin, makes less sense after commit bfde23ce200e6 ("padata: unbind parallel jobs from specific CPUs") because a work isn't bound to a particular CPU anymore, and isn't needed at all for multithreaded jobs because they don't have sequence numbers. Replace the per-CPU works with a preallocated pool, which allows sharing them between existing padata users and the upcoming multithreaded user. The pool will also facilitate setting NUMA-aware concurrency limits with later users. The pool is sized according to the number of possible CPUs. With this limit, MAX_OBJ_NUM no longer makes sense, so remove it. If the global pool is exhausted, a parallel job is run in the current task instead to throttle a system trying to do too much in parallel. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 8 +-- kernel/padata.c| 118 +++-- 2 files changed, 78 insertions(+), 48 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index 476ecfa41f363..3bfa503503ac5 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -24,7 +24,6 @@ * @list: List entry, to attach to the padata lists. * @pd: Pointer to the internal control structure. * @cb_cpu: Callback cpu for serializatioon. - * @cpu: Cpu for parallelization. * @seq_nr: Sequence number of the parallelized data object. * @info: Used to pass information from the parallel to the serial function. * @parallel: Parallel execution function. @@ -34,7 +33,6 @@ struct padata_priv { struct list_headlist; struct parallel_data*pd; int cb_cpu; - int cpu; unsigned intseq_nr; int info; void(*parallel)(struct padata_priv *padata); @@ -68,15 +66,11 @@ struct padata_serial_queue { /** * struct padata_parallel_queue - The percpu padata parallel queue * - * @parallel: List to wait for parallelization. * @reorder: List to wait for reordering after parallel processing. - * @work: work struct for parallelization. * @num_obj: Number of objects that are processed by this cpu. 
*/ struct padata_parallel_queue { - struct padata_listparallel; struct padata_listreorder; - struct work_structwork; atomic_t num_obj; }; @@ -111,7 +105,7 @@ struct parallel_data { struct padata_parallel_queue__percpu *pqueue; struct padata_serial_queue __percpu *squeue; atomic_trefcnt; - atomic_tseq_nr; + unsigned intseq_nr; unsigned intprocessed; int cpu; struct padata_cpumask cpumask; diff --git a/kernel/padata.c b/kernel/padata.c index 6f709bc0fc413..78ff9aa529204 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -32,7 +32,15 @@ #include #include -#define MAX_OBJ_NUM 1000 +struct padata_work { + struct work_struct pw_work; + struct list_headpw_list; /* padata_free_works linkage */ + void*pw_data; +}; + +static DEFINE_SPINLOCK(padata_works_lock); +static struct padata_work *padata_works; +static LIST_HEAD(padata_free_works); static void padata_free_pd(struct parallel_data *pd); @@ -58,30 +66,44 @@ static int padata_cpu_hash(struct parallel_data *pd, unsigned int seq_nr) return padata_index_to_cpu(pd, cpu_index); } -static void padata_parallel_worker(struct work_struct *parallel_work) +static struct padata_work *padata_work_alloc(void) { - struct padata_parallel_queue *pqueue; - LIST_HEAD(local_list); + struct padata_work *pw; - local_bh_disable(); - pqueue = container_of(parallel_work, - struct padata_parallel_queue, work); + lockdep_assert_held(&padata_works_lock); - spin_lock(&pqueue->parallel.lock); - list_replace_init(&pqueue->parallel.list, &local_list); - spin_unlock(&pqueue->parallel.lock); + if (list_empty(&padata_free_works)) + return NULL;/* No more work items allowed to be queued. */ - while (!list_empty(&local_list)) { - struct padata_priv *padata; + pw = list_first_entry(&padata_free_works, struct padata_work, pw_list); + list_del(&pw->pw_list); + return pw; +} - padata = list_entry(local_list.next, - struct padata_priv, list); +static void padata_work_init(struct padata_work *pw, work_func_t work_fn, +void *
[PATCH v2 2/7] padata: initialize earlier
padata will soon initialize the system's struct pages in parallel, so it needs to be ready by page_alloc_init_late(). The error return from padata_driver_init() triggers an initcall warning, so add a warning to padata_init() to avoid silent failure. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 6 ++ init/main.c| 2 ++ kernel/padata.c| 17 - 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index a0d8b41850b25..476ecfa41f363 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -164,6 +164,12 @@ struct padata_instance { #definePADATA_INVALID 4 }; +#ifdef CONFIG_PADATA +extern void __init padata_init(void); +#else +static inline void __init padata_init(void) {} +#endif + extern struct padata_instance *padata_alloc_possible(const char *name); extern void padata_free(struct padata_instance *pinst); extern struct padata_shell *padata_alloc_shell(struct padata_instance *pinst); diff --git a/init/main.c b/init/main.c index 03371976d3872..8ab521f7af5d2 100644 --- a/init/main.c +++ b/init/main.c @@ -94,6 +94,7 @@ #include #include #include +#include #include #include @@ -1482,6 +1483,7 @@ static noinline void __init kernel_init_freeable(void) smp_init(); sched_init_smp(); + padata_init(); page_alloc_init_late(); /* Initialize page ext after all struct pages are initialized. */ page_ext_init(); diff --git a/kernel/padata.c b/kernel/padata.c index 835919c745266..6f709bc0fc413 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -31,7 +31,6 @@ #include #include #include -#include #define MAX_OBJ_NUM 1000 @@ -1050,26 +1049,26 @@ void padata_free_shell(struct padata_shell *ps) } EXPORT_SYMBOL(padata_free_shell); -#ifdef CONFIG_HOTPLUG_CPU - -static __init int padata_driver_init(void) +void __init padata_init(void) { +#ifdef CONFIG_HOTPLUG_CPU int ret; ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "padata:online", padata_cpu_online, NULL); if (ret < 0) - return ret; + goto err; hp_online = ret; ret = cpuhp_setup_state_multi(CPUHP_PADATA_DEAD, "padata:dead", NULL, padata_cpu_dead); if (ret < 0) { cpuhp_remove_multi_state(hp_online); - return ret; + goto err; } - return 0; -} -module_init(padata_driver_init); + return; +err: + pr_warn("padata: initialization failed\n"); #endif +} -- 2.26.2
[PATCH v2 6/7] mm: make deferred init's max threads arch-specific
Using padata during deferred init has only been tested on x86, so for now limit it to this architecture. If another arch wants this, it can find the max thread limit that's best for it and override deferred_page_init_max_threads(). Signed-off-by: Daniel Jordan --- arch/x86/mm/init_64.c| 12 include/linux/memblock.h | 3 +++ mm/page_alloc.c | 13 - 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 8b5f73f5e207c..2d749ec12ea8a 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1260,6 +1260,18 @@ void __init mem_init(void) mem_init_print_info(NULL); } +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + /* +* More CPUs always led to greater speedups on tested systems, up to +* all the nodes' CPUs. Use all since the system is otherwise idle +* now. +*/ + return max_t(int, cpumask_weight(node_cpumask), 1); +} +#endif + int kernel_set_to_readonly; void mark_rodata_ro(void) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 6bc37a731d27b..2b289df44194f 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -275,6 +275,9 @@ void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone, #define for_each_free_mem_pfn_range_in_zone_from(i, zone, p_start, p_end) \ for (; i != U64_MAX; \ __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end)) + +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask); + #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ /** diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9cb780e8dec78..0d7d805f98b2d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1843,6 +1843,13 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, atomic_long_add(nr_pages, &args->nr_pages); } +/* An arch may override for more concurrency. */ +__weak int __init +deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + return 1; +} + /* Initialise remaining memory on a node */ static int __init deferred_init_memmap(void *data) { @@ -1891,11 +1898,7 @@ static int __init deferred_init_memmap(void *data) first_init_pfn)) goto zone_empty; - /* -* More CPUs always led to greater speedups on tested systems, up to -* all the nodes' CPUs. Use all since the system is otherwise idle now. -*/ - max_threads = max(cpumask_weight(cpumask), 1u); + max_threads = deferred_page_init_max_threads(cpumask); while (spfn < epfn) { epfn_align = ALIGN_DOWN(epfn, PAGES_PER_SECTION); -- 2.26.2
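For an architecture that wants more than one thread, the override mirrors the x86 version in the patch above. A hypothetical powerpc variant (illustrative only, not part of this series; the right limit would need per-arch benchmarking) could be:

/* Hypothetical powerpc override of the __weak generic version in
 * mm/page_alloc.c, mirroring the x86 implementation in this patch. */
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask)
{
	return max_t(int, cpumask_weight(node_cpumask), 1);
}
#endif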
[PATCH v2 5/7] mm: parallelize deferred_init_memmap()
Deferred struct page init is a significant bottleneck in kernel boot. Optimizing it maximizes availability for large-memory systems and allows spinning up short-lived VMs as needed without having to leave them running. It also benefits bare metal machines hosting VMs that are sensitive to downtime. In projects such as VMM Fast Restart[1], where guest state is preserved across kexec reboot, it helps prevent application and network timeouts in the guests.

Multithread to take full advantage of system memory bandwidth. The maximum number of threads is capped at the number of CPUs on the node because speedups always improve with additional threads on every system tested, and at this phase of boot, the system is otherwise idle and waiting on page init to finish.

Helper threads operate on section-aligned ranges to both avoid false sharing when setting the pageblock's migrate type and to avoid accessing uninitialized buddy pages, though max order alignment is enough for the latter.

The minimum chunk size is also a section. There was benefit to using multiple threads even on relatively small memory (1G) systems, and this is the smallest size that the alignment allows.

The time (milliseconds) is the slowest node to initialize since boot blocks until all nodes finish. intel_pstate is loaded in active mode without hwp and with turbo enabled, and intel_idle is active as well.

Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal)
  2 nodes * 26 cores * 2 threads = 104 CPUs
  384G/node = 768G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    4078.0 (    9.0)       --    1779.0 (    8.7)
    2% (  1)      1.4%    4021.3 (    2.9)     3.4%    1717.7 (    7.8)
   12% (  6)     35.1%    2644.7 (   35.3)    80.8%     341.0 (   35.5)
   25% ( 13)     38.7%    2498.0 (   34.2)    89.1%     193.3 (   32.3)
   37% ( 19)     39.1%    2482.0 (   25.2)    90.1%     175.3 (   31.7)
   50% ( 26)     38.8%    2495.0 (    8.7)    89.1%     193.7 (    3.5)
   75% ( 39)     39.2%    2478.0 (   21.0)    90.3%     172.7 (   26.7)
  100% ( 52)     40.0%    2448.0 (    2.0)    91.9%     143.3 (    1.5)

Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, bare metal)
  1 node * 16 cores * 2 threads = 32 CPUs
  192G/node = 192G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1996.0 (   18.0)       --    1104.3 (    6.7)
    3% (  1)      1.4%    1968.0 (    3.0)     2.7%    1074.7 (    9.0)
   12% (  4)     40.1%    1196.0 (   22.7)    72.4%     305.3 (   16.8)
   25% (  8)     47.4%    1049.3 (   17.2)    84.2%     174.0 (   10.6)
   37% ( 12)     48.3%    1032.0 (   14.9)    86.8%     145.3 (    2.5)
   50% ( 16)     48.9%    1020.3 (    2.5)    88.0%     133.0 (    1.7)
   75% ( 24)     49.1%    1016.3 (    8.1)    88.4%     128.0 (    1.7)
  100% ( 32)     49.4%    1009.0 (    8.5)    88.6%     126.3 (    0.6)

Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal)
  2 nodes * 18 cores * 2 threads = 72 CPUs
  128G/node = 256G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1682.7 (    6.7)       --     630.0 (    4.6)
    3% (  1)      0.4%    1676.0 (    2.0)     0.7%     625.3 (    3.2)
   12% (  4)     25.8%    1249.0 (    1.0)    68.2%     200.3 (    1.2)
   25% (  9)     30.0%    1178.0 (    5.2)    79.7%     128.0 (    3.5)
   37% ( 13)     30.6%    1167.7 (    3.1)    81.3%     117.7 (    1.2)
   50% ( 18)     30.6%    1167.3 (    2.3)    81.4%     117.0 (    1.0)
   75% ( 27)     31.0%    1161.3 (    4.6)    82.5%     110.0 (    6.9)
  100% ( 36)     32.1%    1142.0 (    3.6)    85.7%      90.0 (    1.0)

AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
  1 node * 8 cores * 2 threads = 16 CPUs
  64G/node = 64G memory

                     kernel boot                 deferred init
  node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
       (  0)        --    1003.7 (   16.6)       --     243.3 (    8.1)
    6% (  1)      1.4%     990.0 (    4.6)     1.2%     240.3 (    1.5)
   12% (  2)     11.4%     889.3 (   16.7)    44.5%     135.0 (    3.0)
   25% (  4)     16.8%     835.3 (    9.0)    65.8%      83.3 (    2.5)
   37% (  6)     18.6%     816.7 (   17.6)    70.4%      72.0 (    1.0)
   50% (  8)     18.2%     821.0 (    5.0)    70.7%      71.3 (    1.2)
   75% ( 12)     19.0%     813.3 (    5.0)
[PATCH v2 4/7] padata: add basic support for multithreaded jobs
Sometimes the kernel doesn't take full advantage of system memory bandwidth, leading to a single CPU spending excessive time in initialization paths where the data scales with memory size. Multithreading naturally addresses this problem. Extend padata, a framework that handles many parallel yet singlethreaded jobs, to also handle multithreaded jobs by adding support for splitting up the work evenly, specifying a minimum amount of work that's appropriate for one helper thread to do, load balancing between helpers, and coordinating them. This is inspired by work from Pavel Tatashin and Steve Sistare. Signed-off-by: Daniel Jordan --- include/linux/padata.h | 29 kernel/padata.c| 152 - 2 files changed, 178 insertions(+), 3 deletions(-) diff --git a/include/linux/padata.h b/include/linux/padata.h index 3bfa503503ac5..b0affa466a841 100644 --- a/include/linux/padata.h +++ b/include/linux/padata.h @@ -4,6 +4,9 @@ * * Copyright (C) 2008, 2009 secunet Security Networks AG * Copyright (C) 2008, 2009 Steffen Klassert + * + * Copyright (c) 2020 Oracle and/or its affiliates. + * Author: Daniel Jordan */ #ifndef PADATA_H @@ -130,6 +133,31 @@ struct padata_shell { struct list_headlist; }; +/** + * struct padata_mt_job - represents one multithreaded job + * + * @thread_fn: Called for each chunk of work that a padata thread does. + * @fn_arg: The thread function argument. + * @start: The start of the job (units are job-specific). + * @size: size of this node's work (units are job-specific). + * @align: Ranges passed to the thread function fall on this boundary, with the + * possible exceptions of the beginning and end of the job. + * @min_chunk: The minimum chunk size in job-specific units. This allows + * the client to communicate the minimum amount of work that's + * appropriate for one worker thread to do at once. + * @max_threads: Max threads to use for the job, actual number may be less + * depending on task size and minimum chunk size. + */ +struct padata_mt_job { + void (*thread_fn)(unsigned long start, unsigned long end, void *arg); + void*fn_arg; + unsigned long start; + unsigned long size; + unsigned long align; + unsigned long min_chunk; + int max_threads; +}; + /** * struct padata_instance - The overall control structure. * @@ -171,6 +199,7 @@ extern void padata_free_shell(struct padata_shell *ps); extern int padata_do_parallel(struct padata_shell *ps, struct padata_priv *padata, int *cb_cpu); extern void padata_do_serial(struct padata_priv *padata); +extern void __init padata_do_multithreaded(struct padata_mt_job *job); extern int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, cpumask_var_t cpumask); extern int padata_start(struct padata_instance *pinst); diff --git a/kernel/padata.c b/kernel/padata.c index 78ff9aa529204..e78f57d9aef90 100644 --- a/kernel/padata.c +++ b/kernel/padata.c @@ -7,6 +7,9 @@ * Copyright (C) 2008, 2009 secunet Security Networks AG * Copyright (C) 2008, 2009 Steffen Klassert * + * Copyright (c) 2020 Oracle and/or its affiliates. + * Author: Daniel Jordan + * * This program is free software; you can redistribute it and/or modify it * under the terms and conditions of the GNU General Public License, * version 2, as published by the Free Software Foundation. @@ -21,6 +24,7 @@ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. 
*/ +#include #include #include #include @@ -32,6 +36,8 @@ #include #include +#definePADATA_WORK_ONSTACK 1 /* Work's memory is on stack */ + struct padata_work { struct work_struct pw_work; struct list_headpw_list; /* padata_free_works linkage */ @@ -42,7 +48,17 @@ static DEFINE_SPINLOCK(padata_works_lock); static struct padata_work *padata_works; static LIST_HEAD(padata_free_works); +struct padata_mt_job_state { + spinlock_t lock; + struct completion completion; + struct padata_mt_job*job; + int nworks; + int nworks_fini; + unsigned long chunk_size; +}; + static void padata_free_pd(struct parallel_data *pd); +static void __init padata_mt_helper(struct work_struct *work); static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index) { @@ -81,18 +97,56 @@ static struct padata_work *padata_work_alloc(void) } static void padata_work_init(struct padata_work *pw, work_func_t work_fn, -void *data) +void *data, int flags) { - INIT_WORK(&pw->pw_work, work_fn); + if (flags & PADATA_WORK_ONSTACK) + INIT_WORK_ONSTACK(&pw->
[PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor. However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case. In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case. Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_hv.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 93493f0cbfe8..6ad1a3b14300 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) ret = kvmppc_h_svm_init_done(vcpu->kvm); break; case H_SVM_INIT_ABORT: - ret = H_UNSUPPORTED; - if (kvmppc_get_srr1(vcpu) & MSR_S) - ret = kvmppc_h_svm_init_abort(vcpu->kvm); + /* +* Even if that call is made by the Ultravisor, the SSR1 value +* is the guest context one, with the secure bit clear as it has +* not yet been secured. So we can't check it here. +*/ + ret = kvmppc_h_svm_init_abort(vcpu->kvm); break; default: -- 2.26.2
Re: [PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
Le 20/05/2020 à 19:32, Greg Kurz a écrit : On Wed, 20 May 2020 18:51:10 +0200 Laurent Dufour wrote: The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor. However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case. In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case. Why not checking vcpu->kvm->arch.secure_guest then ? I don't think that's the right place. Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") Signed-off-by: Laurent Dufour --- arch/powerpc/kvm/book3s_hv.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 93493f0cbfe8..eb1f96cb7b72 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) ret = kvmppc_h_svm_init_done(vcpu->kvm); break; case H_SVM_INIT_ABORT: - ret = H_UNSUPPORTED; - if (kvmppc_get_srr1(vcpu) & MSR_S) - ret = kvmppc_h_svm_init_abort(vcpu->kvm); or at least put a comment to explain why H_SVM_INIT_ABORT doesn't have the same sanity check as the other SVM hcalls. I agree that might help. I'll send a v2 with a comment there. + ret = kvmppc_h_svm_init_abort(vcpu->kvm); break; default:
Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
Thanks for reviewing this this patch Ira. My responses below: Ira Weiny writes: > On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: >> Implement support for fetching nvdimm health information via >> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair >> of 64-bit big-endian integers, bitwise-and of which is then stored in >> 'struct papr_scm_priv' and subsequently partially exposed to >> user-space via newly introduced dimm specific attribute >> 'papr/flags'. Since the hcall is costly, the health information is >> cached and only re-queried, 60s after the previous successful hcall. >> >> The patch also adds a documentation text describing flags reported by >> the the new sysfs attribute 'papr/flags' is also introduced at >> Documentation/ABI/testing/sysfs-bus-papr-scm. >> >> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for >> PAPR hcalls") >> >> Cc: Dan Williams >> Cc: Michael Ellerman >> Cc: "Aneesh Kumar K . V" >> Signed-off-by: Vaibhav Jain >> --- >> Changelog: >> >> Resend: >> * None >> >> v6..v7 : >> * Used the exported buf_seq_printf() function to generate content for >> 'papr/flags' >> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c >> and removed the papr_scm.h file [Mpe] >> * Some minor consistency issued in sysfs-bus-papr-scm >> documentation. [Mpe] >> * s/dimm_mutex/health_mutex/g [Mpe] >> * Split drc_pmem_query_health() into two function one of which takes >> care of caching and locking. [Mpe] >> * Fixed a local copy creation of dimm health information using >> READ_ONCE(). [Mpe] >> >> v5..v6 : >> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' >> [Dan Williams] >> * Include documentation for 'papr/flags' attr [Dan Williams] >> * Change flag 'save_fail' to 'flush_fail' [Dan Williams] >> * Caching of health bitmap to reduce expensive hcalls [Dan Williams] >> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] >> * Replaced two __be64 integers from papr_scm_priv to a single u64 >> integer [Mpe] >> * Updated patch description to reflect the changes made in this >> version. >> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from >> flags_show() [Dan Williams] >> >> v4..v5 : >> * None >> >> v3..v4 : >> * None >> >> v2..v3 : >> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >> NVDIMM unarmed [Aneesh] >> >> v1..v2 : >> * New patch in the series. >> --- >> Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++ >> arch/powerpc/platforms/pseries/papr_scm.c| 169 ++- >> 2 files changed, 194 insertions(+), 2 deletions(-) >> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm >> >> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm >> b/Documentation/ABI/testing/sysfs-bus-papr-scm >> new file mode 100644 >> index ..6143d06072f1 >> --- /dev/null >> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm >> @@ -0,0 +1,27 @@ >> +What: /sys/bus/nd/devices/nmemX/papr/flags >> +Date: Apr, 2020 >> +KernelVersion: v5.8 >> +Contact:linuxppc-dev , >> linux-nvd...@lists.01.org, >> +Description: >> +(RO) Report flags indicating various states of a >> +papr-scm NVDIMM device. Each flag maps to a one or >> +more bits set in the dimm-health-bitmap retrieved in >> +response to H_SCM_HEALTH hcall. The details of the bit >> +flags returned in response to this hcall is available >> +at 'Documentation/powerpc/papr_hcalls.rst' . Below are >> +the flags reported in this sysfs file: >> + >> +* "not_armed" : Indicates that NVDIMM contents will not >> + survive a power cycle. 
>> +* "flush_fail" : Indicates that NVDIMM contents >> + couldn't be flushed during last >> + shut-down event. >> +* "restore_fail": Indicates that NVDIMM contents >> + couldn't be restored during NVDIMM >> + initialization. >> +* "encrypted" : NVDIMM contents are encrypted. >> +* "smart_notify": There is health event for the NVDIMM. >> +* "scrubbed": Indicating that contents of the >> + NVDIMM have been scrubbed. >> +* "locked" : Indicating that NVDIMM contents cant >> + be modified until next power cycle. >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >> b/arch/powerpc/platforms/pseries/papr_scm.c >> index f35592423380..142636e1a59f 100644 >> --- a/arch/powerpc/platforms/pseries/papr_scm.c >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> >> #include >> >> @@ -22,6 +23,44 @@ >> (1ul << ND_CMD_GET_CONFIG_DATA) | \ >>
Re: [PATCH] input: i8042: Remove special PowerPC handling
Hi Michael,

On Wed, May 20, 2020 at 04:07:00PM +1000, Michael Ellerman wrote:
> [ + Dmitry & linux-input ]
>
> Nathan Chancellor writes:
> > This causes a build error with CONFIG_WALNUT because kb_cs and kb_data
> > were removed in commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and
> > include/asm-ppc").
> >
> > ld.lld: error: undefined symbol: kb_cs
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:28 (drivers/input/serio/i8042-ppcio.h:28)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >
> > ld.lld: error: undefined symbol: kb_data
> >> referenced by i8042.c:309 (drivers/input/serio/i8042.c:309)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042-ppcio.h:33 (drivers/input/serio/i8042-ppcio.h:33)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced by i8042.c:319 (drivers/input/serio/i8042.c:319)
> >> input/serio/i8042.o:(__i8042_command) in archive drivers/built-in.a
> >> referenced 15 more times
> >
> > Presumably since nobody has noticed this for the last 12 years, there is
> > not anyone actually trying to use this driver so we can just remove this
> > special walnut code and use the generic header so it builds for all
> > configurations.
> >
> > Fixes: 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc")
> > Reported-by: kbuild test robot
> > Signed-off-by: Nathan Chancellor
> > ---
> >  drivers/input/serio/i8042-ppcio.h | 57 ---
> >  drivers/input/serio/i8042.h       |  2 --
> >  2 files changed, 59 deletions(-)
> >  delete mode 100644 drivers/input/serio/i8042-ppcio.h
>
> This LGTM.
>
> Acked-by: Michael Ellerman (powerpc)
>
> I assumed drivers/input/serio would be pretty quiet, but there's
> actually some commits to it in linux-next. So perhaps this should go via
> the input tree.
>
> Dmitry do you want to take this, or should I take it via powerpc?
>
> Original patch is here:
>
>   https://lore.kernel.org/lkml/20200518181043.3363953-1-natechancel...@gmail.com

I'm fine with you taking it through powerpc.

Acked-by: Dmitry Torokhov

Also, while I have your attention ;), could you please ack or take

  https://lore.kernel.org/lkml/20191002214854.GA114387@dtor-ws/

as I believe this is the last user of the input_polled_dev API and I would like to drop it from the tree.

Thanks!

--
Dmitry
Re: [RESEND PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules
s/seq_buf: Export seq_buf_printf() to external modules/ seq_buf: export seq_buf_printf/
[PATCH] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") added checks of secure bit of SRR1 to filter out the Hcall reserved to the Ultravisor.

However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the context of the VM calling UV_ESM. This allows the Hypervisor to return to the guest without going through the Ultravisor. Thus the Secure bit of SRR1 is not set in that particular case.

In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is not set in that case.

Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour
---
 arch/powerpc/kvm/book3s_hv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 93493f0cbfe8..eb1f96cb7b72 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1099,9 +1099,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		ret = kvmppc_h_svm_init_done(vcpu->kvm);
 		break;
 	case H_SVM_INIT_ABORT:
-		ret = H_UNSUPPORTED;
-		if (kvmppc_get_srr1(vcpu) & MSR_S)
-			ret = kvmppc_h_svm_init_abort(vcpu->kvm);
+		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
 		break;
 
 	default:
-- 
2.26.2
[PATCH] net/ethernet/freescale: rework quiesce/activate for ucc_geth
ugeth_quiesce/activate are used to halt the controller when there is a link change that requires reconfiguring the MAC.

The previous implementation called netif_device_detach(). This however causes the initial activation of the netdevice to fail precisely because it's detached. For details, see [1].

A possible workaround was to revert commit
"net: linkwatch: add check for netdevice being present to linkwatch_do_dev".
However, the check introduced in the above commit is correct and shall be kept.

The netif_device_detach() is thus replaced with netif_tx_stop_all_queues(), which prevents any transmission. This allows performing the MAC config change required by the link change without detaching the corresponding netdevice, and thus does not prevent its initial activation.

[1] https://lists.openwall.net/netdev/2020/01/08/201

Signed-off-by: Valentin Longchamp
Acked-by: Matteo Ghidoni
---
 drivers/net/ethernet/freescale/ucc_geth.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
index 6e5f6dd169b5..552e7554a9f8 100644
--- a/drivers/net/ethernet/freescale/ucc_geth.c
+++ b/drivers/net/ethernet/freescale/ucc_geth.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 
 #include "ucc_geth.h"
 
@@ -1548,11 +1549,8 @@ static int ugeth_disable(struct ucc_geth_private *ugeth, enum comm_dir mode)
 
 static void ugeth_quiesce(struct ucc_geth_private *ugeth)
 {
-	/* Prevent any further xmits, plus detach the device. */
-	netif_device_detach(ugeth->ndev);
-
-	/* Wait for any current xmits to finish. */
-	netif_tx_disable(ugeth->ndev);
+	/* Prevent any further xmits */
+	netif_tx_stop_all_queues(ugeth->ndev);
 
 	/* Disable the interrupt to avoid NAPI rescheduling. */
 	disable_irq(ugeth->ug_info->uf_info.irq);
@@ -1565,7 +1563,10 @@ static void ugeth_activate(struct ucc_geth_private *ugeth)
 {
 	napi_enable(&ugeth->napi);
 	enable_irq(ugeth->ug_info->uf_info.irq);
-	netif_device_attach(ugeth->ndev);
+
+	/* allow to xmit again */
+	netif_tx_wake_all_queues(ugeth->ndev);
+	__netdev_watchdog_up(ugeth->ndev);
 }
 
 /* Called every time the controller might need to be made
-- 
2.25.1
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote: > Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm > modules and add the command family to the white list of NVDIMM command > sets. Also advertise support for ND_CMD_CALL for the dimm > command mask and implement necessary scaffolding in the module to > handle ND_CMD_CALL ioctl and PDSM requests that we receive. > > The layout of the PDSM request as we expect from libnvdimm/libndctl is > described in newly introduced uapi header 'papr_scm_pdsm.h' which > defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used > to communicate the PDSM request via member > 'nd_pkg_papr_scm->nd_command' and size of payload that need to be > sent/received for servicing the PDSM. > > A new function is_cmd_valid() is implemented that reads the args to > papr_scm_ndctl() and performs sanity tests on them. A new function > papr_scm_service_pdsm() is introduced and is called from > papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL > command from libnvdimm. > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog: > > Resend: > * None > > v6..v7 : > * Removed the re-definitions of __packed macro from papr_scm_pdsm.h > [Mpe]. > * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe]. > * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h > [Mpe]. > * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe] > > v5..v6 : > * Changed the usage of the term DSM to PDSM to distinguish it from the > ACPI term [ Dan Williams ] > * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct > to reflect the new terminology. > * Updated the patch description and title to reflect the new terminology. > * Squashed patch to introduce new command family in 'ndctl.h' with > this patch [ Dan Williams ] > * Updated the papr_scm_pdsm method starting index from 0x1 to 0x0 > [ Dan Williams ] > * Removed redundant license text from the papr_scm_psdm.h file. > [ Dan Williams ] > * s/envelop/envelope/ at various places [ Dan Williams ] > * Added '__packed' attribute to command package header to gaurd > against different compiler adding paddings between the fields. > [ Dan Williams] > * Converted various pr_debug to dev_debug [ Dan Williams ] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Updated the patch prefix to 'ndctl/uapi' [Aneesh] > > v1..v2 : > * None > --- > arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++ > arch/powerpc/platforms/pseries/papr_scm.c | 101 - > include/uapi/linux/ndctl.h| 1 + > 3 files changed, 230 insertions(+), 6 deletions(-) > create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > > diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > new file mode 100644 > index ..671693439c1c > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h > @@ -0,0 +1,134 @@ > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ > +/* > + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl > + * > + * (C) Copyright IBM 2020 > + * > + * Author: Vaibhav Jain > + */ > + > +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ > + > +#include > + > +/* > + * PDSM Envelope: > + * > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via > + * 'envelopes' which consists of a header and user-defined payload sections. 
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a > + * payload following it and offset of which relative to the struct is > provided > + * by 'nd_pdsm_cmd_pkg.payload_offset'. * > + * > + * +-+-+---+ > + * | 64-Bytes | 8-Bytes | Max 184-Bytes | > + * +-+-+---+ > + * | nd_pdsm_cmd_pkg | | > + * |-+ | | > + * | nd_cmd_pkg | | | > + * +-+-+---+ > + * | nd_family | | | > + * | nd_size_out | cmd_status || > + * | nd_size_in | payload_version | PAYLOAD | > + * | nd_command | payload_offset ->| > + * | nd_fw_size | || > + * +-+-+---+ > + * > + * PDSM Header: > + * > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member > + * 'nd_cmd_pkg.nd_command'. Apart f
Re: [RESEND PATCH v7 3/5] powerpc/papr_scm: Fetch nvdimm health information from PHYP
On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: > Implement support for fetching nvdimm health information via > H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair > of 64-bit big-endian integers, bitwise-and of which is then stored in > 'struct papr_scm_priv' and subsequently partially exposed to > user-space via newly introduced dimm specific attribute > 'papr/flags'. Since the hcall is costly, the health information is > cached and only re-queried, 60s after the previous successful hcall. > > The patch also adds a documentation text describing flags reported by > the the new sysfs attribute 'papr/flags' is also introduced at > Documentation/ABI/testing/sysfs-bus-papr-scm. > > [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for > PAPR hcalls") > > Cc: Dan Williams > Cc: Michael Ellerman > Cc: "Aneesh Kumar K . V" > Signed-off-by: Vaibhav Jain > --- > Changelog: > > Resend: > * None > > v6..v7 : > * Used the exported buf_seq_printf() function to generate content for > 'papr/flags' > * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c > and removed the papr_scm.h file [Mpe] > * Some minor consistency issued in sysfs-bus-papr-scm > documentation. [Mpe] > * s/dimm_mutex/health_mutex/g [Mpe] > * Split drc_pmem_query_health() into two function one of which takes > care of caching and locking. [Mpe] > * Fixed a local copy creation of dimm health information using > READ_ONCE(). [Mpe] > > v5..v6 : > * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags' > [Dan Williams] > * Include documentation for 'papr/flags' attr [Dan Williams] > * Change flag 'save_fail' to 'flush_fail' [Dan Williams] > * Caching of health bitmap to reduce expensive hcalls [Dan Williams] > * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe] > * Replaced two __be64 integers from papr_scm_priv to a single u64 > integer [Mpe] > * Updated patch description to reflect the changes made in this > version. > * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from > flags_show() [Dan Williams] > > v4..v5 : > * None > > v3..v4 : > * None > > v2..v3 : > * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for >NVDIMM unarmed [Aneesh] > > v1..v2 : > * New patch in the series. > --- > Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++ > arch/powerpc/platforms/pseries/papr_scm.c| 169 ++- > 2 files changed, 194 insertions(+), 2 deletions(-) > create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm > > diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm > b/Documentation/ABI/testing/sysfs-bus-papr-scm > new file mode 100644 > index ..6143d06072f1 > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm > @@ -0,0 +1,27 @@ > +What:/sys/bus/nd/devices/nmemX/papr/flags > +Date:Apr, 2020 > +KernelVersion: v5.8 > +Contact: linuxppc-dev , > linux-nvd...@lists.01.org, > +Description: > + (RO) Report flags indicating various states of a > + papr-scm NVDIMM device. Each flag maps to a one or > + more bits set in the dimm-health-bitmap retrieved in > + response to H_SCM_HEALTH hcall. The details of the bit > + flags returned in response to this hcall is available > + at 'Documentation/powerpc/papr_hcalls.rst' . Below are > + the flags reported in this sysfs file: > + > + * "not_armed" : Indicates that NVDIMM contents will not > + survive a power cycle. > + * "flush_fail" : Indicates that NVDIMM contents > + couldn't be flushed during last > + shut-down event. 
> + * "restore_fail": Indicates that NVDIMM contents > + couldn't be restored during NVDIMM > + initialization. > + * "encrypted" : NVDIMM contents are encrypted. > + * "smart_notify": There is health event for the NVDIMM. > + * "scrubbed": Indicating that contents of the > + NVDIMM have been scrubbed. > + * "locked" : Indicating that NVDIMM contents cant > + be modified until next power cycle. > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > b/arch/powerpc/platforms/pseries/papr_scm.c > index f35592423380..142636e1a59f 100644 > --- a/arch/powerpc/platforms/pseries/papr_scm.c > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > > #include > > @@ -22,6 +23,44 @@ >(1ul << ND_CMD_GET_CONFIG_DATA) | \ >(1ul << ND_CMD_SET_CONFIG_DATA)) > > +/* DIMM health bitmap bitmap indicators */ > +/* SCM device is unable to persist memory contents */ > +#define PAPR_SCM_DIMM_UNARMED
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
On 5/20/20 7:23 PM, Christophe Leroy wrote:

Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :

Christophe Leroy writes:

Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :

Hi again, Christophe,

On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:

Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a? It may have broken swap.

Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!

Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry()).

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset. And it should not have a dependency on pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set not present bits. With that understanding, what am I missing here?

That's probably me who is missing something; I have not dug into the swap functioning yet indeed, so that was only my first feeling.

By the way, the problem is definitely due to the order changes in the PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits or whether that's because _PAGE_PRESENT was moved out of the last 3 bits, I don't know yet.

My (bad) understanding is from the fact that __pte_to_swp_entry() is a right shift by 3 bits, so it loses the last 3 bits, and therefore __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a PTE.

Is there a description somewhere of how swap works exactly?

Looking at __set_pte_at(), I am wondering whether this was due to _PAGE_HASHPTE? This would mean we end up wrongly updating some swap entry details. We call set_pte_at() on swap pte entries.

-aneesh
Re: Endless soft-lockups for compiling workload since next-20200519
On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote: > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > > Just a head up. Repeatedly compiling kernels for a while would trigger > > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > > .config are in, > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've > not seen anything like that myself. Let me go have a look. Yes, I ended up figuring out the same commit a bit earlier. Since then I reverted that commit and its dependency, 2a0a24ebb499 ("sched: Make scheduler_ipi inline") Everything works fine so far. > > > In as far as the logs are readable (they're a wrapped mess, please don't > do that!), they contain very little useful, as is typical with IPIs :/ Sorry about that. I forgot that gmail webUI will wrap things around. I will switch to mutt. > > > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > > flush_smp_call_function_queue+0x1fa/0x2e0 > > [ 1168.00][C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat > > fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars > > acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi > > scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror > > dm_region_hash dm_log dm_mod > > [ 1168.029492][C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > > 5.7.0-rc6-next-20200519 #1 > > [ 1168.037665][C1] Hardware name: HPE ProLiant DL385 > > Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 > > [ 1168.046978][C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0 > > [ 1168.053658][C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe > > ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b > > e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25 > > 10 52 3f 71 > > [ 1168.073262][C1] RSP: 0018:c9178918 EFLAGS: 00010046 > > [ 1168.079253][C1] RAX: RBX: 430c58f8 > > RCX: 8ec26083 > > [ 1168.087156][C1] RDX: 0003 RSI: dc00 > > RDI: 430c58f8 > > [ 1168.095054][C1] RBP: c91789a8 R08: ed1108618cec > > R09: ed1108618cec > > [ 1168.102964][C1] R10: 430c675b R11: > > R12: 430c58e0 > > [ 1168.110866][C1] R13: 8eb30c40 R14: 430c5880 > > R15: 430c58e0 > > [ 1168.118767][C1] FS: () > > GS:4308() knlGS: > > [ 1168.127628][C1] CS: 0010 DS: ES: CR0: 80050033 > > [ 1168.134129][C1] CR2: 55b169604560 CR3: 000d08a14000 > > CR4: 003406e0 > > [ 1168.142026][C1] Call Trace: > > [ 1168.145206][C1] > > [ 1168.147957][C1] ? smp_call_on_cpu_callback+0xd0/0xd0 > > [ 1168.153421][C1] ? rcu_read_lock_sched_held+0xac/0xe0 > > [ 1168.158880][C1] ? rcu_read_lock_bh_held+0xc0/0xc0 > > [ 1168.164076][C1] generic_smp_call_function_single_interrupt+0x13/0x2b > > [ 1168.170938][C1] smp_call_function_single_interrupt+0x157/0x4e0 > > [ 1168.177278][C1] ? smp_call_function_interrupt+0x4e0/0x4e0 > > [ 1168.183172][C1] ? interrupt_entry+0xe4/0xf0 > > [ 1168.187846][C1] ? trace_hardirqs_off_caller+0x8d/0x1f0 > > [ 1168.193478][C1] ? trace_hardirqs_on_caller+0x1f0/0x1f0 > > [ 1168.199116][C1] ? _nohz_idle_balance+0x221/0x360 > > [ 1168.204228][C1] ? trace_hardirqs_off_thunk+0x1a/0x1c > > [ 1168.209690][C1] call_function_single_interrupt+0xf/0x20
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6,7)
Le 20/05/2020 à 15:43, Aneesh Kumar K.V a écrit :

Christophe Leroy writes:

Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :

Hi again, Christophe,

On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:

Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a? It may have broken swap.

Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!

Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry()).

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset. And it should not have a dependency on pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set not present bits. With that understanding, what am I missing here?

That's probably me who is missing something; I have not dug into the swap functioning yet indeed, so that was only my first feeling.

By the way, the problem is definitely due to the order changes in the PTE bits, whether that's because _PAGE_RW was moved to the last 3 bits or whether that's because _PAGE_PRESENT was moved out of the last 3 bits, I don't know yet.

My (bad) understanding is from the fact that __pte_to_swp_entry() is a right shift by 3 bits, so it loses the last 3 bits, and therefore __swp_entry_to_pte(__pte_to_swp_entry(pte)) loses the last 3 bits of a PTE.

Is there a description somewhere of how swap works exactly?

Christophe
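For reference, the conversion Christophe is describing is done by a pair of macros; on 32-bit powerpc they are essentially the following (quoted from memory to illustrate the 3-bit shift, so treat the exact form as approximate):

/* PTE <-> swap entry conversion under discussion: the low 3 bits of
 * the PTE are shifted out, so any PTE bit living there is lost on a
 * swap round-trip (approximate form, for illustration only). */
#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 3 })
#define __swp_entry_to_pte(x)	((pte_t) { (x).val << 3 })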
Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
On 2020-05-20 02:38, Jiri Slaby wrote:

On 15. 05. 20, 1:22, rana...@codeaurora.org wrote:

On 2020-05-13 00:04, Greg KH wrote:

On Tue, May 12, 2020 at 02:39:50PM -0700, rana...@codeaurora.org wrote:

On 2020-05-12 01:25, Greg KH wrote:
> On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
> > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
> > Author: Jiri Slaby
> > Date: Tue Aug 7 21:48:04 2012 +0200
> >
> >     TTY: hvc_console, add tty install
> >
> > added hvc_install but did not move 'tty->driver_data = NULL;' from
> > hvc_open's fail path to hvc_cleanup.
> >
> > IOW hvc_open now NULLs tty->driver_data even for another task which
> > opened the tty earlier. The same holds for "tty_port_tty_set(&hp->port,
> > NULL);" there. And actually "tty_port_put(&hp->port);" is also incorrect
> > for the 2nd task opening the tty.
> > ...

These are the traces you get when the issue happens:

[  154.212291] hvc_install called for pid: 666
[  154.216705] hvc_open called for pid: 666
[  154.233657] hvc_open: request_irq failed with rc -22.
[  154.238934] hvc_open called for pid: 678
[  154.243012] Unable to handle kernel NULL pointer dereference at virtual address 00c4
# hvc_install isn't called for pid: 678 as the file wasn't closed yet.

Nice. Does the attached help?

Yeah, it looks good. I think it also eliminates the port.count reference issue discussed on the v2 patch. Are you planning to mainline this?

I wonder how come the tty_port_put in hvc_open does not cause a UAF? I would say hvc_open fails, tty_port_put is called. It decrements the reference taken in hvc_install. So far so good. Now, this should happen IMO:

tty_open
  -> hvc_open (fails)
    -> tty_port_put

hvc_console driver defines port->ops->destruct(). Upon tty_port_put(), the tty_port_destructor() calls port->ops->destruct(), rather than kfree(port).

  -> tty_release
    -> tty_release_struct
      -> tty_kref_put
        -> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
  -> hvc_cleanup
    -> tty_port_put (should die terrible death now)

Since port is not freed, I think we should be good.

What am I missing?

thanks,

Thank you.
Raghavendra
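The destruct-vs-kfree behaviour referred to above lives in the tty_port refcounting core; schematically it looks like this (paraphrased shape, not verbatim kernel source):

/* Paraphrased shape of the release path behind tty_port_put(): when
 * the last reference drops, a driver-supplied destructor such as
 * hvc_console's is called instead of kfree(), which is why the port
 * is not freed out from under hvc in the failure path above. */
static void tty_port_destructor(struct kref *kref)
{
	struct tty_port *port = container_of(kref, struct tty_port, kref);

	/* ... release xmit buffers etc. ... */
	if (port->ops && port->ops->destruct)
		port->ops->destruct(port);
	else
		kfree(port);
}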
Re: [Regression 5.7-rc1] Random hangs on 32-bit PowerPC (PowerBook6, 7)
Christophe Leroy writes:
> Le 18/05/2020 à 17:19, Rui Salvaterra a écrit :
>> Hi again, Christophe,
>>
>> On Mon, 18 May 2020 at 15:03, Christophe Leroy wrote:
>>>
>>> Can you try reverting 697ece78f8f749aeea40f2711389901f0974017a ? It may have broken swap.
>>
>> Yeah, that was a good call. :) Linux 5.7-rc1 with the revert on top survives the beating. I'll be happy to test a definitive patch!
>
> Yeah I discovered recently that the way swap is implemented on powerpc expects RW and other important bits not to be one of the 3 least significant bits (see __pte_to_swp_entry() )

The last 3 bits are there to track _PAGE_PRESENT, right? What is the RW dependency there? Are you suggesting a read/write migration entry? A swap entry should not retain the pte rw bits, right?

A swap entry is built using swap type + offset, and it should not have a dependency on the pte RW bits. Along with type and offset we also should have the ability to mark it as a pte entry and also set the not-present bits. With that understanding, what am I missing here?

> I guess the easiest for the time being is to revert the commit with a proper explanation of the issue, then one day we'll modify the way powerpc manages swap.

-aneesh
[PATCH] powerpc/64s: Disable STRICT_KERNEL_RWX
Several strange crashes have been eventually traced back to STRICT_KERNEL_RWX and its interaction with code patching.

Various paths in our ftrace, kprobes and other patching code need to be hardened against patching failures, otherwise we can end up running with partially/incorrectly patched ftrace paths, kprobes or jump labels, which can then cause strange crashes.

Although fixes for those are in development, they're not -rc material.

There also seem to be problems with the underlying strict RWX logic, which needs further debugging.

So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from enabling the option and tripping over the bugs.

Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Cc: sta...@vger.kernel.org # v4.13+
Signed-off-by: Michael Ellerman
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..d13b5328ca10 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -130,7 +130,7 @@ config PPC
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64
-	select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX if (PPC32 && !HIBERNATION)
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UACCESS_FLUSHCACHE
 	select ARCH_HAS_UACCESS_MCSAFE if PPC64
--
2.25.1
Re: Endless soft-lockups for compiling workload since next-20200519
On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > Just a head up. Repeatedly compiling kernels for a while would trigger > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > .config are in, Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've not seen anything like that myself. Let me go have a look. In as far as the logs are readable (they're a wrapped mess, please don't do that!), they contain very little useful, as is typical with IPIs :/ > [ 1167.993773][C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > flush_smp_call_function_queue+0x1fa/0x2e0 > [ 1168.00][C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat > fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars > acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi > scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror > dm_region_hash dm_log dm_mod > [ 1168.029492][C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted > 5.7.0-rc6-next-20200519 #1 > [ 1168.037665][C1] Hardware name: HPE ProLiant DL385 > Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 > [ 1168.046978][C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0 > [ 1168.053658][C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe > ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b > e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25 > 10 52 3f 71 > [ 1168.073262][C1] RSP: 0018:c9178918 EFLAGS: 00010046 > [ 1168.079253][C1] RAX: RBX: 430c58f8 > RCX: 8ec26083 > [ 1168.087156][C1] RDX: 0003 RSI: dc00 > RDI: 430c58f8 > [ 1168.095054][C1] RBP: c91789a8 R08: ed1108618cec > R09: ed1108618cec > [ 1168.102964][C1] R10: 430c675b R11: > R12: 430c58e0 > [ 1168.110866][C1] R13: 8eb30c40 R14: 430c5880 > R15: 430c58e0 > [ 1168.118767][C1] FS: () > GS:4308() knlGS: > [ 1168.127628][C1] CS: 0010 DS: ES: CR0: 80050033 > [ 1168.134129][C1] CR2: 55b169604560 CR3: 000d08a14000 > CR4: 003406e0 > [ 1168.142026][C1] Call Trace: > [ 1168.145206][C1] > [ 1168.147957][C1] ? smp_call_on_cpu_callback+0xd0/0xd0 > [ 1168.153421][C1] ? rcu_read_lock_sched_held+0xac/0xe0 > [ 1168.158880][C1] ? rcu_read_lock_bh_held+0xc0/0xc0 > [ 1168.164076][C1] generic_smp_call_function_single_interrupt+0x13/0x2b > [ 1168.170938][C1] smp_call_function_single_interrupt+0x157/0x4e0 > [ 1168.177278][C1] ? smp_call_function_interrupt+0x4e0/0x4e0 > [ 1168.183172][C1] ? interrupt_entry+0xe4/0xf0 > [ 1168.187846][C1] ? trace_hardirqs_off_caller+0x8d/0x1f0 > [ 1168.193478][C1] ? trace_hardirqs_on_caller+0x1f0/0x1f0 > [ 1168.199116][C1] ? _nohz_idle_balance+0x221/0x360 > [ 1168.204228][C1] ? trace_hardirqs_off_thunk+0x1a/0x1c > [ 1168.209690][C1] call_function_single_interrupt+0xf/0x20
Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
On 2020-05-20 01:59, Jiri Slaby wrote:
> On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
>> Potentially, hvc_open() can be called in parallel when two tasks call open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add() callback in the function fails, and the failure path sets tty->driver_data to NULL, the parallel hvc_open() can see this NULL and cause a memory abort. Hence, do a NULL check at the beginning, before proceeding ahead.
>>
>> The issue can be easily reproduced by launching two tasks simultaneously that do an open() call on /dev/hvcX. For example:
>> $ cat /dev/hvc0 & cat /dev/hvc0 &
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Raghavendra Rao Ananta
>> ---
>>  drivers/tty/hvc/hvc_console.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
>> index 436cc51c92c3..80709f754cc8 100644
>> --- a/drivers/tty/hvc/hvc_console.c
>> +++ b/drivers/tty/hvc/hvc_console.c
>> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>>  	unsigned long flags;
>>  	int rc = 0;
>>
>> +	if (!hp)
>> +		return -ENODEV;
>> +
>
> This is still not fixing the bug properly. See:
> https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f3...@suse.cz/
> In particular, the paragraph starting "IOW".

You are right. This doesn't fix the problem entirely. There are other parts to it which are not handled in a clean way by the driver. Apart from the things you've mentioned, it doesn't seem to handle hp->port.count correctly either:

hvc_open:
  hp->port.count++
  hp->ops->notifier_add(hp, hp->data) fails
  tty->driver_data = NULL
hvc_close:
  returns immediately as tty->driver_data == NULL, without hp->port.count--

This would leave the port in a stale state, and the second caller of hvc_open doesn't get a chance to call/retry hp->ops->notifier_add(hp, hp->data).

However, the patch is not trying to address the logical issues with hvc_open and hvc_close. It's only trying to eliminate the potential NULL pointer dereference, leading to a panic. From what I see, hvc_open is serialized by tty_lock, and adding a NULL check here prevents the second caller from dereferencing it.

Thank you.
Raghavendra
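The port.count imbalance described above can be modelled in a few lines of stand-alone C (assumed control flow, not the driver source): once the failed open NULLs driver_data, the matching close bails out early and the count is never decremented.

#include <stdio.h>
#include <stddef.h>

static int port_count;
static void *driver_data = (void *)1; /* set earlier by hvc_install */

static int hvc_open(int notifier_add_fails)
{
	port_count++;
	if (notifier_add_fails) {
		driver_data = NULL; /* the problematic cleanup on the failure path */
		return -1;
	}
	return 0;
}

static void hvc_close(void)
{
	if (!driver_data)
		return; /* bails out: port_count is never decremented */
	port_count--;
}

int main(void)
{
	hvc_open(1); /* notifier_add() fails */
	hvc_close();
	printf("port.count = %d (stale, should be 0)\n", port_count);
	return 0;
}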
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
On Wed, May 20, 2020 at 07:22:19PM +0800, Shengjiu Wang wrote:
> I see some drivers also request the dma channel in open() or hw_params().
> How can they avoid the deferred-probe issue? For example:
> sound/arm/pxa2xx-pcm-lib.c
> sound/soc/sprd/sprd-pcm-dma.c

Other drivers having problems means those drivers should be fixed, not that we should copy the problems. In the case of the PXA driver that's very old code which predates deferred probe by, I'd guess, a decade.
Re: [PATCH] powerpc: Add ppc_inst_next()
Le 20/05/2020 à 14:21, Jordan Niethe a écrit : On Wed, May 20, 2020 at 9:44 PM Michael Ellerman wrote: In a few places we want to calculate the address of the next instruction. Previously that was simple, we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated, we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Convert several locations to use it. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/inst.h | 9 + arch/powerpc/kernel/uprobes.c | 2 +- arch/powerpc/lib/feature-fixups.c | 10 +- arch/powerpc/xmon/xmon.c | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index d82e0c99cfa1..7d5ee1309b92 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) return ppc_inst_prefixed(x) ? 8 : 4; } +/* + * Return the address of the next instruction, if the instruction @value was + * located at @location. + */ +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value) +{ + return location + ppc_inst_len(value); +} I think this is a good idea. I tried something similar in the initial post for an instruction type. I had: +#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr but how you've got it is much more clear/usable. Yes I agree I wonder why not +static inline struct ppc_inst *ppc_inst_next(void *location) +{ + return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location); +} Because as Michael explains, the instruction to be skipped might not yet be at the pointed memory location (for instance in insert_bpts() ) + int probe_user_read_inst(struct ppc_inst *inst, struct ppc_inst __user *nip); diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index 83e883e1a42d..683ba76919a7 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) * support doesn't exist and have to fix-up the next instruction * to be executed. */ - regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn)); + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn); user_disable_single_step(current); return 0; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 80f320c2e189..0ad01eebf112 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur) src = alt_start; dest = start; - for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)), -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { + for (; src < alt_end; src = ppc_inst_next(src, *src), + dest = ppc_inst_next(dest, *dest)) { The reason to maybe use ppc_inst_read() in the helper instead of just *dest would be we don't always need to read 8 bytes. 
And reading 8 bytes might trigger a page fault if we are reading the very last non prefixed instruction of the last page. if (patch_alt_instruction(src, dest, alt_start, alt_end)) return 1; } - for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP))) + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) But then you wouldn't be able to do this as easily I guess. raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); return 0; @@ -405,8 +405,8 @@ static void do_final_fixups(void) while (src < end) { inst = ppc_inst_read(src); raw_patch_instruction(dest, inst); - src = (void *)src + ppc_inst_len(inst); - dest = (void *)dest + ppc_inst_len(inst); + src = ppc_inst_next(src, *src); + dest = ppc_inst_next(dest, *dest); } #endif } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index fb135f2cd6b0..aa123f56b7d4 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -939,7 +939,7 @@ static void insert_bpts(void) }
Re: [PATCH] powerpc: Add ppc_inst_next()
On Wed, May 20, 2020 at 9:44 PM Michael Ellerman wrote: > > In a few places we want to calculate the address of the next > instruction. Previously that was simple, we just added 4 bytes, or if > using a u32 * we incremented that pointer by 1. > > But prefixed instructions make it more complicated, we need to advance > by either 4 or 8 bytes depending on the actual instruction. We also > can't do pointer arithmetic using struct ppc_inst, because it is > always 8 bytes in size on 64-bit, even though we might only need to > advance by 4 bytes. > > So add a ppc_inst_next() helper which calculates the location of the > next instruction, if the given instruction was located at the given > address. Note the instruction doesn't need to actually be at the > address in memory. > > Convert several locations to use it. > > Signed-off-by: Michael Ellerman > --- > arch/powerpc/include/asm/inst.h | 9 + > arch/powerpc/kernel/uprobes.c | 2 +- > arch/powerpc/lib/feature-fixups.c | 10 +- > arch/powerpc/xmon/xmon.c | 2 +- > 4 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h > index d82e0c99cfa1..7d5ee1309b92 100644 > --- a/arch/powerpc/include/asm/inst.h > +++ b/arch/powerpc/include/asm/inst.h > @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) > return ppc_inst_prefixed(x) ? 8 : 4; > } > > +/* > + * Return the address of the next instruction, if the instruction @value was > + * located at @location. > + */ > +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst > value) > +{ > + return location + ppc_inst_len(value); > +} I think this is a good idea. I tried something similar in the initial post for an instruction type. I had: +#define PPC_INST_NEXT(ptr) ((ptr) += PPC_INST_LEN(DEREF_PPC_INST_PTR((ptr but how you've got it is much more clear/usable. I wonder why not +static inline struct ppc_inst *ppc_inst_next(void *location) +{ + return location + ppc_inst_len(ppc_inst_read((struct ppc_inst *)location); +} > + > int probe_user_read_inst(struct ppc_inst *inst, > struct ppc_inst __user *nip); > > diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c > index 83e883e1a42d..683ba76919a7 100644 > --- a/arch/powerpc/kernel/uprobes.c > +++ b/arch/powerpc/kernel/uprobes.c > @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, > struct pt_regs *regs) > * support doesn't exist and have to fix-up the next instruction > * to be executed. > */ > - regs->nip = utask->vaddr + > ppc_inst_len(ppc_inst_read(&auprobe->insn)); > + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, > auprobe->insn); > > user_disable_single_step(current); > return 0; > diff --git a/arch/powerpc/lib/feature-fixups.c > b/arch/powerpc/lib/feature-fixups.c > index 80f320c2e189..0ad01eebf112 100644 > --- a/arch/powerpc/lib/feature-fixups.c > +++ b/arch/powerpc/lib/feature-fixups.c > @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, > struct fixup_entry *fcur) > src = alt_start; > dest = start; > > - for (; src < alt_end; src = (void *)src + > ppc_inst_len(ppc_inst_read(src)), > -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { > + for (; src < alt_end; src = ppc_inst_next(src, *src), > + dest = ppc_inst_next(dest, *dest)) { The reason to maybe use ppc_inst_read() in the helper instead of just *dest would be we don't always need to read 8 bytes. 
> if (patch_alt_instruction(src, dest, alt_start, alt_end)) > return 1; > } > > - for (; dest < end; dest = (void *)dest + > ppc_inst_len(ppc_inst(PPC_INST_NOP))) > + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) But then you wouldn't be able to do this as easily I guess. > raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); > > return 0; > @@ -405,8 +405,8 @@ static void do_final_fixups(void) > while (src < end) { > inst = ppc_inst_read(src); > raw_patch_instruction(dest, inst); > - src = (void *)src + ppc_inst_len(inst); > - dest = (void *)dest + ppc_inst_len(inst); > + src = ppc_inst_next(src, *src); > + dest = ppc_inst_next(dest, *dest); > } > #endif > } > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > index fb135f2cd6b0..aa123f56b7d4 100644 > --- a/arch/powerpc/xmon/xmon.c > +++ b/arch/powerpc/xmon/xmon.c > @@ -939,7 +939,7 @@ static void insert_bpts(void) > } > > patch_instruction(bp->instr, instr); > - patch_instruction((void *)bp->instr + ppc_inst_len(instr), > + patch_instruction(ppc_inst_next(bp
[PATCH] powerpc/configs/64s: Enable CONFIG_PRINTK_CALLER
This adds the CPU or thread number to printk messages. This helps a lot when deciphering concurrent oopses that have been interleaved. Example output, of PID1 (T1) triggering a warning: [1.581678][T1] WARNING: CPU: 0 PID: 1 at crypto/rsa-pkcs1pad.c:539 pkcs1pad_verify+0x38/0x140 [1.581681][T1] Modules linked in: [1.581693][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty #1515 [1.581700][T1] NIP: c0207d64 LR: c0207d3c CTR: c0207d2c [1.581708][T1] REGS: c000fd2e7560 TRAP: 0700 Not tainted (5.5.0-rc5-gcc-8.2.0-00121-gf84c2e595927-dirty) [1.581712][T1] MSR: 90029033 CR: 44000222 XER: 0004 Signed-off-by: Michael Ellerman --- arch/powerpc/configs/powernv_defconfig | 1 + arch/powerpc/configs/ppc64_defconfig | 1 + arch/powerpc/configs/pseries_defconfig | 1 + 3 files changed, 3 insertions(+) diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig index df8bdbaa5d8f..2de9aadf0f50 100644 --- a/arch/powerpc/configs/powernv_defconfig +++ b/arch/powerpc/configs/powernv_defconfig @@ -347,3 +347,4 @@ CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig index bae8170d7401..57142a648ebd 100644 --- a/arch/powerpc/configs/ppc64_defconfig +++ b/arch/powerpc/configs/ppc64_defconfig @@ -358,6 +358,7 @@ CONFIG_CRYPTO_DEV_NX=y CONFIG_CRYPTO_DEV_NX_ENCRYPT=m CONFIG_CRYPTO_DEV_VMX=y CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_DEBUG_STACK_USAGE=y diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig index 0bea4d3ffb85..dfa4a726333b 100644 --- a/arch/powerpc/configs/pseries_defconfig +++ b/arch/powerpc/configs/pseries_defconfig @@ -322,3 +322,4 @@ CONFIG_KVM_BOOK3S_64=m CONFIG_KVM_BOOK3S_64_HV=m CONFIG_VHOST_NET=m CONFIG_PRINTK_TIME=y +CONFIG_PRINTK_CALLER=y -- 2.25.1
[PATCH] powerpc: Add ppc_inst_next()
In a few places we want to calculate the address of the next instruction. Previously that was simple, we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated, we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Convert several locations to use it. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/inst.h | 9 + arch/powerpc/kernel/uprobes.c | 2 +- arch/powerpc/lib/feature-fixups.c | 10 +- arch/powerpc/xmon/xmon.c | 2 +- 4 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h index d82e0c99cfa1..7d5ee1309b92 100644 --- a/arch/powerpc/include/asm/inst.h +++ b/arch/powerpc/include/asm/inst.h @@ -100,6 +100,15 @@ static inline int ppc_inst_len(struct ppc_inst x) return ppc_inst_prefixed(x) ? 8 : 4; } +/* + * Return the address of the next instruction, if the instruction @value was + * located at @location. + */ +static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst value) +{ + return location + ppc_inst_len(value); +} + int probe_user_read_inst(struct ppc_inst *inst, struct ppc_inst __user *nip); diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c index 83e883e1a42d..683ba76919a7 100644 --- a/arch/powerpc/kernel/uprobes.c +++ b/arch/powerpc/kernel/uprobes.c @@ -112,7 +112,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) * support doesn't exist and have to fix-up the next instruction * to be executed. 
*/ - regs->nip = utask->vaddr + ppc_inst_len(ppc_inst_read(&auprobe->insn)); + regs->nip = (unsigned long)ppc_inst_next((void *)utask->vaddr, auprobe->insn); user_disable_single_step(current); return 0; diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 80f320c2e189..0ad01eebf112 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -84,13 +84,13 @@ static int patch_feature_section(unsigned long value, struct fixup_entry *fcur) src = alt_start; dest = start; - for (; src < alt_end; src = (void *)src + ppc_inst_len(ppc_inst_read(src)), -(dest = (void *)dest + ppc_inst_len(ppc_inst_read(dest { + for (; src < alt_end; src = ppc_inst_next(src, *src), + dest = ppc_inst_next(dest, *dest)) { if (patch_alt_instruction(src, dest, alt_start, alt_end)) return 1; } - for (; dest < end; dest = (void *)dest + ppc_inst_len(ppc_inst(PPC_INST_NOP))) + for (; dest < end; dest = ppc_inst_next(dest, ppc_inst(PPC_INST_NOP))) raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP)); return 0; @@ -405,8 +405,8 @@ static void do_final_fixups(void) while (src < end) { inst = ppc_inst_read(src); raw_patch_instruction(dest, inst); - src = (void *)src + ppc_inst_len(inst); - dest = (void *)dest + ppc_inst_len(inst); + src = ppc_inst_next(src, *src); + dest = ppc_inst_next(dest, *dest); } #endif } diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index fb135f2cd6b0..aa123f56b7d4 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -939,7 +939,7 @@ static void insert_bpts(void) } patch_instruction(bp->instr, instr); - patch_instruction((void *)bp->instr + ppc_inst_len(instr), + patch_instruction(ppc_inst_next(bp->instr, instr), ppc_inst(bpinstr)); if (bp->enabled & BP_CIABR) continue; -- 2.25.1
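To illustrate what the helper buys us, here is a stand-alone model (not the kernel implementation) of walking a stream that mixes 4-byte and 8-byte prefixed instructions; per the prefixed-instruction series, a prefix is identified by major opcode 1, so the step size depends on the word at the current location.

#include <stdio.h>
#include <stdint.h>

static int inst_len(uint32_t word)
{
	return (word >> 26) == 1 ? 8 : 4; /* major opcode 1 => prefixed */
}

int main(void)
{
	uint32_t stream[] = { 0x60000000,             /* nop: 4 bytes */
			      0x04000000, 0x38210010, /* prefixed: 8 bytes */
			      0x4e800020 };           /* blr: 4 bytes */
	unsigned char *p = (unsigned char *)stream;
	unsigned char *end = p + sizeof(stream);

	while (p < end) {
		int len = inst_len(*(uint32_t *)p);

		printf("inst at +%td, len %d\n", p - (unsigned char *)stream, len);
		p += len; /* the ppc_inst_next() idea: advance by 4 or 8 */
	}
	return 0;
}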
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Hi

On Wed, May 20, 2020 at 5:42 PM Lucas Stach wrote:
>
> Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang:
>> Hi
>>
>> On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote:
>>> Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang:
>>>> There are two requirements that we need to move the request
>>>> of dma channel from probe to open.
>>>
>>> How do you handle -EPROBE_DEFER return code from the channel request if
>>> you don't do it in probe?
>>
>> I use dma_request_slave_channel or dma_request_channel instead
>> of dmaengine_pcm_request_chan_of, so there should be no -EPROBE_DEFER
>> return code.
>
> This is a pretty weak argument. The dmaengine device might probe after
> you try to get the channel. Using a function to request the channel
> that doesn't allow you to handle probe deferral is IMHO a bug and
> should be fixed, instead of building even more assumptions on top of
> it.

I see some drivers also request the dma channel in open() or hw_params().
How can they avoid the deferred-probe issue? For example:
sound/arm/pxa2xx-pcm-lib.c
sound/soc/sprd/sprd-pcm-dma.c

>>>> - When dma device binds with power-domains, the power will
>>>> be enabled when we request dma channel. If the request of dma
>>>> channel happens at probe, then the power-domains will be always
>>>> enabled after kernel boot up, which is not good for power
>>>> saving, so we need to move the request of dma channel to .open();
>>>
>>> This is certainly something which could be fixed in the dmaengine
>>> driver.
>>
>> The dma driver always calls pm_runtime_get_sync in
>> device_alloc_chan_resources, and device_alloc_chan_resources is
>> called when the channel is requested, so power is enabled on channel
>> request.
>
> So why can't you fix the dmaengine driver to do that RPM call at a
> later time when the channel is actually going to be used? This will
> allow further power savings with other slave devices than the audio
> PCM.
>
> Regards,
> Lucas

It seems the best place for calling pm_runtime_get_sync is
.device_alloc_chan_resources, with pm_runtime_put_sync called in
.device_free_chan_resources.

For slave_sg mode, .device_prep_slave_sg and .device_issue_pending
will be called many times after .device_alloc_chan_resources, so it is
not good to call pm_runtime_get_sync in .device_prep_slave_sg or
.device_issue_pending.

best regards
wang shengjiu
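For context on the deferral point Lucas makes, a minimal sketch (illustrative probe function and channel name, not the fsl code) of requesting the channel so probe deferral can propagate: dma_request_chan() returns an ERR_PTR, so -EPROBE_DEFER from a not-yet-bound DMA controller can be handed straight back to the driver core, which will retry the probe later.

#include <linux/dmaengine.h>
#include <linux/err.h>
#include <linux/platform_device.h>

static int foo_probe(struct platform_device *pdev)
{
	struct dma_chan *chan = dma_request_chan(&pdev->dev, "tx");

	if (IS_ERR(chan))
		return PTR_ERR(chan); /* may be -EPROBE_DEFER */

	/* ... set up the PCM using chan ... */
	return 0;
}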
[PATCH] powerpc/xmon: Show task->thread.regs in process display
Show the address of the task's regs in the process listing in xmon. The regs should always be on the stack page that we also print the address of, but it's still helpful not to have to find them by hand.

Signed-off-by: Michael Ellerman
---
 arch/powerpc/xmon/xmon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index de585204d1d2..fb135f2cd6b0 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3185,8 +3185,8 @@ static void show_task(struct task_struct *tsk)
 		(tsk->exit_state & EXIT_DEAD) ? 'E' :
 		(tsk->state & TASK_INTERRUPTIBLE) ? 'S' : '?';
 
-	printf("%px %016lx %6d %6d %c %2d %s\n", tsk,
-		tsk->thread.ksp,
+	printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
+		tsk->thread.ksp, tsk->thread.regs,
 		tsk->pid, rcu_dereference(tsk->parent)->pid,
 		state, task_cpu(tsk),
 		tsk->comm);
@@ -3309,7 +3309,7 @@ static void show_tasks(void)
 	unsigned long tskv;
 	struct task_struct *tsk = NULL;
 
-	printf("     task_struct     ->thread.ksp    PID   PPID S  P CMD\n");
+	printf("     task_struct     ->thread.ksp    ->thread.regs    PID   PPID S  P CMD\n");
 
 	if (scanhex(&tskv))
 		tsk = (struct task_struct *)tskv;
--
2.25.1
Re: [PATCH] powerpc/5200: update contact email
On Sat, 2 May 2020 16:26:42 +0200, Wolfram Sang wrote: > My 'pengutronix' address is defunct for years. Merge the entries and use > the proper contact address. Applied to powerpc/next. [1/1] powerpc/5200: update contact email https://git.kernel.org/powerpc/c/ad0f522df1b2f4fe5d4ae6418e1ea216154a0662 cheers
Re: [PATCH v4 0/2] powerpc/eeh: Release EEH device state synchronously
On Tue, 28 Apr 2020 13:45:04 +1000, Sam Bobroff wrote: > Here are some fixes and cleanups that have come from other work but that I > think stand on their own. > > Only one patch ("Release EEH device state synchronously", suggested by Oliver > O'Halloran) is a significant change: it moves the cleanup of some EEH device > data out of the (possibly asynchronous) device release handler and into the > (synchronously called) bus notifier. This is useful for future work as it > makes > it easier to reason about the lifetimes of EEH structures. > > [...] Applied to powerpc/next. [1/2] powerpc/eeh: Fix pseries_eeh_configure_bridge() https://git.kernel.org/powerpc/c/6fa13640aea7bb0760846981aa2da4245307bd26 [2/2] powerpc/eeh: Release EEH device state synchronously https://git.kernel.org/powerpc/c/466381ecdc741b1767d980e10b1ec49f6bde56f3 cheers
Re: [PATCH v6 00/16] powerpc/watchpoint: Preparation for more than one watchpoint
On Thu, 14 May 2020 16:47:25 +0530, Ravi Bangoria wrote: > So far, powerpc Book3S code has been written with an assumption of > only one watchpoint. But Power10[1] is introducing second watchpoint > register (DAWR). Even though this patchset does not enable 2nd DAWR, > it makes the infrastructure ready so that enabling 2nd DAWR should > just be a matter of changing count. > > Existing functionality works fine with the patchset. I've tested it > with perf, ptrace(gdb), xmon. All hw-breakpoint selftests are passing > as well. And I've build tested for 8xx and 'AMCC 44x, 46x or 47x'. > > [...] Applied to powerpc/next. [01/16] powerpc/watchpoint: Rename current DAWR macros https://git.kernel.org/powerpc/c/09f82b063aa9c248a3ef919aeec361054e7b044a [02/16] powerpc/watchpoint: Add SPRN macros for second DAWR https://git.kernel.org/powerpc/c/4a4ec2289a5d748cb64ff67ca8d74535a76a8436 [03/16] powerpc/watchpoint: Introduce function to get nr watchpoints dynamically https://git.kernel.org/powerpc/c/a6ba44e8799230e36c8ab06fda7f77f421e9e795 [04/16] powerpc/watchpoint/ptrace: Return actual num of available watchpoints https://git.kernel.org/powerpc/c/45093b382e0ac25c206b4dcd210c6be1f5e56e60 [05/16] powerpc/watchpoint: Provide DAWR number to set_dawr https://git.kernel.org/powerpc/c/a18b834625d345bfa89c4e2754dd6cbb0133c4d7 [06/16] powerpc/watchpoint: Provide DAWR number to __set_breakpoint https://git.kernel.org/powerpc/c/4a8a9379f2af4c9928529b3959bc2d8f7023c6bc [07/16] powerpc/watchpoint: Get watchpoint count dynamically while disabling them https://git.kernel.org/powerpc/c/c2919132734f29a7a33e1339bef8a67b11f322eb [08/16] powerpc/watchpoint: Disable all available watchpoints when !dawr_force_enable https://git.kernel.org/powerpc/c/22a214e461c5cc9428b86915d9cfcf84c6e11ad7 [09/16] powerpc/watchpoint: Convert thread_struct->hw_brk to an array https://git.kernel.org/powerpc/c/303e6a9ddcdc168e92253c78cdb4bbe1e10d78b3 [10/16] powerpc/watchpoint: Use loop for thread_struct->ptrace_bps https://git.kernel.org/powerpc/c/6b424efa119d5ea06b15ff240dddc3b4b9f9cdfb [11/16] powerpc/watchpoint: Introduce is_ptrace_bp() function https://git.kernel.org/powerpc/c/c9e82aeb197df2d93b1b4234bc0c80943fa594e8 [12/16] powerpc/watchpoint: Use builtin ALIGN*() macros https://git.kernel.org/powerpc/c/e68ef121c1f4c38edf87a3354661ceb99d522729 [13/16] powerpc/watchpoint: Prepare handler to handle more than one watchpoint https://git.kernel.org/powerpc/c/74c6881019b7d56c327fffc268d97adb5eb1b4f9 [14/16] powerpc/watchpoint: Don't allow concurrent perf and ptrace events https://git.kernel.org/powerpc/c/29da4f91c0c1fbda12b8a31be0d564930208c92e [15/16] powerpc/watchpoint/xmon: Don't allow breakpoint overwriting https://git.kernel.org/powerpc/c/514db915e7b33e7eaf8e40192b93380f79b319b5 [16/16] powerpc/watchpoint/xmon: Support 2nd DAWR https://git.kernel.org/powerpc/c/30df74d67d48949da87e3a5b57c381763e8fd526 cheers
Re: [PATCH v4 00/16] powerpc: machine check and system reset fixes
On Fri, 8 May 2020 14:33:52 +1000, Nicholas Piggin wrote: > Since v3, I fixed a compile error and returned the generic machine check > exception handler to be NMI on 32 and 64e, as caught by Christophe's > review. > > Also added the last patch, just found it by looking at the code, a > review for 32s would be good. > > [...] Patches 1-15 applied to powerpc/next. [01/16] powerpc/64s/exception: Fix machine check no-loss idle wakeup https://git.kernel.org/powerpc/c/8a5054d8cbbe03c68dcb0957c291c942132e4101 [02/16] powerpc/64s/exceptions: Fix in_mce accounting in unrecoverable path https://git.kernel.org/powerpc/c/ac2a2a1417391180ef12f908a2864692d6d76d40 [03/16] powerpc/64s/exceptions: Change irq reconcile for NMIs from reusing _DAR to RESULT https://git.kernel.org/powerpc/c/16754d25bd7d4e53a52b311d99cc7a8fba875d81 [04/16] powerpc/64s/exceptions: Machine check reconcile irq state https://git.kernel.org/powerpc/c/f0fd9dd3c213c947dfb5bc2cad3ef5e30d3258ec [05/16] powerpc/pseries/ras: Avoid calling rtas_token() in NMI paths https://git.kernel.org/powerpc/c/7368b38b21bfa39df637701a480262c15ab1a49e [06/16] powerpc/pseries/ras: Fix FWNMI_VALID off by one https://git.kernel.org/powerpc/c/deb70f7a35a22dffa55b2c3aac71bc6fb0f486ce [07/16] powerpc/pseries/ras: fwnmi avoid modifying r3 in error case https://git.kernel.org/powerpc/c/dff681e95a23f28b3c688a8bd5535f78bd726bc8 [08/16] powerpc/pseries/ras: fwnmi sreset should not interlock https://git.kernel.org/powerpc/c/d7b14c5c042865070a1411078ab49ea17bad0b41 [09/16] powerpc/pseries: Limit machine check stack to 4GB https://git.kernel.org/powerpc/c/d2cbbd45d433b96e41711a293e59cff259143694 [10/16] powerpc/pseries: Machine check use rtas_call_unlocked() with args on stack https://git.kernel.org/powerpc/c/2576f5f9169620bf329cf1e91086e6041b98e4b2 [11/16] powerpc/64s: machine check interrupt update NMI accounting https://git.kernel.org/powerpc/c/116ac378bb3ff844df333e7609e7604651a0db9d [12/16] powerpc: Implement ftrace_enabled() helpers https://git.kernel.org/powerpc/c/f2d7f62e4abdb03de3f4267361d96c417312d05c [13/16] powerpc/64s: machine check do not trace real-mode handler https://git.kernel.org/powerpc/c/abd106fb437ad1cd8c8df8ccabd0fa941ef6342a [14/16] powerpc/traps: Do not trace system reset https://git.kernel.org/powerpc/c/bbbc8032b00f8ef287894425fbdb691049e28d39 [15/16] powerpc/traps: Make unrecoverable NMIs die instead of panic https://git.kernel.org/powerpc/c/265d6e588d87194c2fe2d6c240247f0264e0c19b cheers
Re: [PATCH v2] powerpc/64: Update Speculation_Store_Bypass in /proc//status
On Thu, 2 Apr 2020 23:49:29 +1100, Michael Ellerman wrote: > Currently we don't report anything useful in /proc//status: > > $ grep Speculation_Store_Bypass /proc/self/status > Speculation_Store_Bypass: unknown > > Our mitigation is currently always a barrier instruction, which > doesn't map that well onto the existing possibilities for the PR_SPEC > values. > > [...] Applied to powerpc/next. [1/1] powerpc/64: Update Speculation_Store_Bypass in /proc//status https://git.kernel.org/powerpc/c/d93e5e2d03d4f41dfedb92200a2c0413ab8ee4e7 cheers
Re: [PATCH v2 1/4] powerpc/64s: Always has full regs, so remove remnant checks
On Thu, 7 May 2020 22:13:29 +1000, Michael Ellerman wrote: > Applied to powerpc/next. [1/4] powerpc/64s: Always has full regs, so remove remnant checks https://git.kernel.org/powerpc/c/feb9df3462e688d073848d85c8bb132fe8fd9ae5 [2/4] powerpc: Use set_trap() and avoid open-coding trap masking https://git.kernel.org/powerpc/c/db30144b5c9cfb09c6b8b2fa7a9c351c94aa3433 [3/4] powerpc: trap_is_syscall() helper to hide syscall trap number https://git.kernel.org/powerpc/c/912237ea166428edcbf3c137adf12cb987c477f2 [4/4] powerpc: Use trap metadata to prevent double restart rather than zeroing trap https://git.kernel.org/powerpc/c/4e0e45b07d790253643ee05300784ab2156e2d5e cheers
Re: [PATCH] powerpc/64: Don't initialise init_task->thread.regs
On Tue, 28 Apr 2020 22:31:30 +1000, Michael Ellerman wrote: > Aneesh increased the size of struct pt_regs by 16 bytes and started > seeing this WARN_ON: > > smp: Bringing up secondary CPUs ... > [ cut here ] > WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 > giveup_all+0xb4/0x110 > Modules linked in: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 > NIP: c001a2b4 LR: c001a29c CTR: c31d > REGS: c26d3980 TRAP: 0700 Not tainted > (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) > MSR: 8282b033 CR: 48048224 > XER: > CFAR: c0019cc8 IRQMASK: 1 > GPR00: c001a264 c26d3c20 c26d7200 8280b033 > GPR04: 0001 0077 30206d7372203164 > GPR08: 2000 02002000 8280b033 3230303030303030 > GPR12: 8800 c31d 00800050 0266 > GPR16: 0309a1a0 0309a4b0 0309a2d8 0309a890 > GPR20: 030d0098 c264da40 fd62 c000ff798080 > GPR24: c264edf0 c001007469f0 fd62 c20e5e90 > GPR28: c264edf0 c264d200 1db6 c264d200 > NIP [c001a2b4] giveup_all+0xb4/0x110 > LR [c001a29c] giveup_all+0x9c/0x110 > Call Trace: > [c26d3c20] [c001a264] giveup_all+0x64/0x110 (unreliable) > [c26d3c90] [c001ae34] __switch_to+0x104/0x480 > [c26d3cf0] [c0e0b8a0] __schedule+0x320/0x970 > [c26d3dd0] [c0e0c518] schedule_idle+0x38/0x70 > [c26d3df0] [c019c7c8] do_idle+0x248/0x3f0 > [c26d3e70] [c019cbb8] cpu_startup_entry+0x38/0x40 > [c26d3ea0] [c0011bb0] rest_init+0xe0/0xf8 > [c26d3ed0] [c2004820] start_kernel+0x990/0x9e0 > [c26d3f90] [c000c49c] start_here_common+0x1c/0x400 > > [...] Applied to powerpc/next. [1/1] powerpc/64: Don't initialise init_task->thread.regs https://git.kernel.org/powerpc/c/7ffa8b7dc11752827329e4e84a574ea6aaf24716 cheers
Re: [PATCH] selftests/powerpc: Add a test of counting larx/stcx
On Sun, 26 Apr 2020 21:44:10 +1000, Michael Ellerman wrote: > This is based on the count_instructions test. > > However this one also counts the number of failed stcx's, and in > conjunction with knowing the size of the stcx loop, can calculate the > total number of instructions executed even in the face of > non-deterministic stcx failures. Applied to powerpc/next. [1/1] selftests/powerpc: Add a test of counting larx/stcx https://git.kernel.org/powerpc/c/7481cad4747303442209bc5dba2f56c3afcea07d cheers
Re: [PATCH] drivers/macintosh: Fix memleak in windfarm_pm112 driver
On Thu, 23 Apr 2020 16:00:38 +1000, Michael Ellerman wrote: > create_cpu_loop() calls smu_sat_get_sdb_partition() which does > kmalloc() and returns the allocated buffer. In fact it's called twice, > and neither buffer is freed. > > This results in a memory leak as reported by Erhard: > unreferenced object 0xc0047081f840 (size 32): > comm "kwindfarm", pid 203, jiffies 4294880630 (age 5552.877s) > hex dump (first 32 bytes): > c8 06 02 7f ff 02 ff 01 fb bf 00 41 00 20 00 00 ...A. .. > 00 07 89 37 00 a0 00 00 00 00 00 00 00 00 00 00 ...7 > backtrace: > [<83f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 > [windfarm_smu_sat] > [<3010fcb7>] .pm112_wf_notify+0x104c/0x13bc [windfarm_pm112] > [] .notifier_call_chain+0xa8/0x180 > [<70490868>] .blocking_notifier_call_chain+0x64/0x90 > [<131d8149>] .wf_thread_func+0x114/0x1a0 > [<0d54838d>] .kthread+0x13c/0x190 > [<669b72bc>] .ret_from_kernel_thread+0x58/0x64 > unreferenced object 0xc004737089f0 (size 16): > comm "kwindfarm", pid 203, jiffies 4294880879 (age 5552.050s) > hex dump (first 16 bytes): > c4 04 01 7f 22 11 e0 e6 ff 55 7b 12 ec 11 00 00 "U{. > backtrace: > [<83f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 > [windfarm_smu_sat] > [ ] .pm112_wf_notify+0x1294/0x13bc [windfarm_pm112] > [ ] .notifier_call_chain+0xa8/0x180 > [<70490868>] .blocking_notifier_call_chain+0x64/0x90 > [<131d8149>] .wf_thread_func+0x114/0x1a0 > [<0d54838d>] .kthread+0x13c/0x190 > [<669b72bc>] .ret_from_kernel_thread+0x58/0x64 > > [...] Applied to powerpc/next. [1/1] drivers/macintosh: Fix memleak in windfarm_pm112 driver https://git.kernel.org/powerpc/c/93900337b9ac2f4eca427eff6d187be2dc3b5551 cheers
Re: [PATCH] powerpc: Drop unneeded cast in task_pt_regs()
On Tue, 28 Apr 2020 22:31:52 +1000, Michael Ellerman wrote: > There's no need to cast in task_pt_regs() as tsk->thread.regs should > already be a struct pt_regs. If someone's using task_pt_regs() on > something that's not a task but happens to have a thread.regs then > we'll deal with them later. Applied to powerpc/next. [1/1] powerpc: Drop unneeded cast in task_pt_regs() https://git.kernel.org/powerpc/c/24ac99e97fa7b8f0db9b48413a76def9cf73295c cheers
Re: [PATCH v8 00/30] Initial Prefixed Instruction support
On Wed, 6 May 2020 13:40:20 +1000, Jordan Niethe wrote: > A future revision of the ISA will introduce prefixed instructions. A > prefixed instruction is composed of a 4-byte prefix followed by a > 4-byte suffix. > > All prefixes have the major opcode 1. A prefix will never be a valid > word instruction. A suffix may be an existing word instruction or a > new instruction. > > [...] Applied to powerpc/next. [01/30] powerpc/xmon: Remove store_inst() for patch_instruction() https://git.kernel.org/powerpc/c/802268fd82676ffce432776f60b93a0b15e58e0c [02/30] powerpc/xmon: Move breakpoint instructions to own array https://git.kernel.org/powerpc/c/51c9ba11f17f25ace1ea6bbfd4586c59105432de [03/30] powerpc/xmon: Move breakpoints to text section https://git.kernel.org/powerpc/c/4eff2b4f32a309e2171bfe53db3e93b5614f77cb [04/30] powerpc/xmon: Use bitwise calculations in_breakpoint_table() https://git.kernel.org/powerpc/c/5a7fdcab54ef17c395fc47e73c828a1432e51683 [05/30] powerpc: Change calling convention for create_branch() et. al. https://git.kernel.org/powerpc/c/7c95d8893fb55869882c9f68f4c94840dc43f18f [06/30] powerpc: Use a macro for creating instructions from u32s https://git.kernel.org/powerpc/c/753462512868674a788ecc77bb96752efb818785 [07/30] powerpc: Use an accessor for instructions https://git.kernel.org/powerpc/c/777e26f0edf8dab58b8dd474d35d83bde0ac6d76 [08/30] powerpc: Use a function for getting the instruction op code https://git.kernel.org/powerpc/c/8094892d1aff14269d3b7bfcd8b941217eecd81f [09/30] powerpc: Use a function for byte swapping instructions https://git.kernel.org/powerpc/c/aabd2233b6aefeee6d7a2f667076d8346be1d30a [10/30] powerpc: Introduce functions for instruction equality https://git.kernel.org/powerpc/c/217862d9b98bf08958d57fd7b31b9de0f1a9477d [11/30] powerpc: Use a datatype for instructions https://git.kernel.org/powerpc/c/94afd069d937d84fb4f696eb9a78db4084e43d21 [12/30] powerpc: Use a function for reading instructions https://git.kernel.org/powerpc/c/f8faaffaa7d99028e457ef2d1dcb43a98f736938 [13/30] powerpc: Add a probe_user_read_inst() function https://git.kernel.org/powerpc/c/7ba68b2172c19031fdc2a2caf37328edd146e299 [14/30] powerpc: Add a probe_kernel_read_inst() function https://git.kernel.org/powerpc/c/95b980a00d1220ca67550a933166704db8bc5c14 [15/30] powerpc/kprobes: Use patch_instruction() https://git.kernel.org/powerpc/c/a8646f43ba5046e7f5c4396125d5136bfcb17b49 [16/30] powerpc: Define and use get_user_instr() et. al. 
https://git.kernel.org/powerpc/c/5249385ad7f0ac178433f0ae9cc5b64612c8ff77 [17/30] powerpc: Introduce a function for reporting instruction length https://git.kernel.org/powerpc/c/622cf6f436a12338bbcfbb3474629755547fd112 [18/30] powerpc/xmon: Use a function for reading instructions https://git.kernel.org/powerpc/c/6c7a4f0a9f66fc7fdc6e208559e5d562f53e0991 [19/30] powerpc/xmon: Move insertion of breakpoint for xol'ing https://git.kernel.org/powerpc/c/7fccfcfba04f9cb46438f368755d368f6c57f3a0 [20/30] powerpc: Make test_translate_branch() independent of instruction length https://git.kernel.org/powerpc/c/0b582db5490a1f250ef63337dd46d5c7599dae80 [21/30] powerpc: Enable Prefixed Instructions https://git.kernel.org/powerpc/c/2aa6195e43b3740258ead93aee42ac719dd4c4b0 [22/30] powerpc: Define new SRR1 bits for a ISA v3.1 https://git.kernel.org/powerpc/c/b691505ef9232a6e82f1c160911afcb4cb20487b [23/30] powerpc: Add prefixed instructions to instruction data type https://git.kernel.org/powerpc/c/650b55b707fdfa764e9f2b81314d3eb4216fb962 [24/30] powerpc: Test prefixed code patching https://git.kernel.org/powerpc/c/f77f8ff7f13e6411c2e0ba25bb7e012a5ae6c927 [25/30] powerpc: Test prefixed instructions in feature fixups https://git.kernel.org/powerpc/c/785b79d1e02873c2088ee1301154c66dace66ce5 [26/30] powerpc/xmon: Don't allow breakpoints on suffixes https://git.kernel.org/powerpc/c/c9c831aebd8663d0129bbcee4d76be889f0627fe [27/30] powerpc/kprobes: Don't allow breakpoints on suffixes https://git.kernel.org/powerpc/c/b4657f7650babc9bfb41ce875abe41b18604a105 [28/30] powerpc: Support prefixed instructions in alignment handler https://git.kernel.org/powerpc/c/9409d2f9dad2f0679d67dc24d8116dd3e837b035 [29/30] powerpc sstep: Add support for prefixed load/stores https://git.kernel.org/powerpc/c/50b80a12e4ccff46d53b93754d817acd98bc9ae0 [30/30] powerpc sstep: Add support for prefixed fixed-point arithmetic https://git.kernel.org/powerpc/c/3920742b92f5ea19a220edb947b6f33c99f501da cheers
Re: [PATCH] powerpc: Replace zero-length array with flexible-array
On Thu, 7 May 2020 13:57:49 -0500, Gustavo A. R. Silva wrote: > The current codebase makes use of the zero-length array language > extension to the C90 standard, but the preferred mechanism to declare > variable-length types such as these ones is a flexible array member[1][2], > introduced in C99: > > struct foo { > int stuff; > struct boo array[]; > }; > > [...] Applied to powerpc/next. [1/1] powerpc: Replace zero-length array with flexible-array https://git.kernel.org/powerpc/c/0f6be41c60699fd8cdfa93e5e85a306cec1ac1d0 cheers
Re: [PATCH] powerpc/mm: Replace zero-length array with flexible-array
On Thu, 7 May 2020 13:57:55 -0500, Gustavo A. R. Silva wrote: > The current codebase makes use of the zero-length array language > extension to the C90 standard, but the preferred mechanism to declare > variable-length types such as these ones is a flexible array member[1][2], > introduced in C99: > > struct foo { > int stuff; > struct boo array[]; > }; > > [...] Applied to powerpc/next. [1/1] powerpc/mm: Replace zero-length array with flexible-array https://git.kernel.org/powerpc/c/02bddf21c34d0a918acc8647195ba4507e3db8fc cheers
Re: [PATCH v2 0/9] powerpc + ps3 patches
On Sat, 09 May 2020 18:58:31 +, Geoff Levand wrote: > This is a combined V2 of the two patch sets I sent out on March 27th, > 'PS3 patches for v5.7' and 'powerpc: Minor updates to improve build > debugging'. > > I've dropped these two patches that were in my 'PS3 patches for v5.7' set: > > powerpc/ps3: Add lv1_panic > powerpc/ps3: Add udbg_panic > > [...] Patches 1-6 and 8 applied to powerpc/next. [1/9] powerpc/head_check: Automatic verbosity https://git.kernel.org/powerpc/c/4c592a34391ea4987d29c1718f931b50416ca015 [2/9] powerpc/wrapper: Output linker map file https://git.kernel.org/powerpc/c/f61200d3e3386e78d49677dfb3911c9d7c0dfe4b [3/9] powerpc/head_check: Avoid broken pipe https://git.kernel.org/powerpc/c/331aa46aaf51325d8532a4948f5127b2edc441a5 [4/9] drivers/ps3: Remove duplicate error messages https://git.kernel.org/powerpc/c/6a8aa782cece2330322c33452a767f53f8ba38c9 [5/9] net/ps3_gelic_net: Remove duplicate error message https://git.kernel.org/powerpc/c/7b27b95a894d6a85c076f8d1f00e35316739bf51 [6/9] ps3disk: use the default segment boundary https://git.kernel.org/powerpc/c/720bc316690bd27dea9d71510b50f0cd698ffc32 [8/9] powerpc/ps3: Fix kexec shutdown hang https://git.kernel.org/powerpc/c/126554465d93b10662742128918a5fc338cda4aa cheers
Re: [PATCH] powerpc/8xx: Update email address in MAINTAINERS
On Wed, 6 May 2020 06:51:59 + (UTC), Christophe Leroy wrote: > Since 01 May 2020, our email adresses have changed to @csgroup.eu > > Update MAINTAINERS accordingly. Applied to powerpc/next. [1/1] powerpc/8xx: Update email address in MAINTAINERS https://git.kernel.org/powerpc/c/679d74abc4e14cb369e46f080d90f4dc8c143e65 cheers
Re: [PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT
On Wed, 8 Apr 2020 15:58:49 + (UTC), Christophe Leroy wrote: > When CONFIG_KASAN is selected, the stack usage is increased. > > In the same way as x86 and arm64 architectures, increase > THREAD_SHIFT when CONFIG_KASAN is selected. Applied to powerpc/next. [1/1] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT https://git.kernel.org/powerpc/c/edbadaf0671072298e506074128b64e003c5812c cheers
Re: [PATCH 1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN()
On Mon, 20 Apr 2020 18:36:34 + (UTC), Christophe Leroy wrote: > _ALIGN_UP() is specific to powerpc > ALIGN() is generic and does the same > > Replace _ALIGN_UP() by ALIGN() Applied to powerpc/next. [1/5] drivers/powerpc: Replace _ALIGN_UP() by ALIGN() https://git.kernel.org/powerpc/c/7bfc3c84cbf5167d943cff9b3d2619dab0b7894c [2/5] powerpc: Replace _ALIGN_DOWN() by ALIGN_DOWN() https://git.kernel.org/powerpc/c/e96d904ede6756641563d27daa746875b478a6c8 [3/5] powerpc: Replace _ALIGN_UP() by ALIGN() https://git.kernel.org/powerpc/c/b711531641038f3ff3723914f3d5ba79848d347e [4/5] powerpc: Replace _ALIGN() by ALIGN() https://git.kernel.org/powerpc/c/d3f3d3bf76cfb04e73436a15e3987d3573e7523a [5/5] powerpc: Remove _ALIGN_UP(), _ALIGN_DOWN() and _ALIGN() https://git.kernel.org/powerpc/c/4cdb2da654033d76e1b1cb4ac427d9193dce816b cheers
[PATCH] Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
This reverts commit 697ece78f8f749aeea40f2711389901f0974017a. The implementation of SWAP on powerpc requires page protection bits to not be one of the least significant PTE bits. Until the SWAP implementation is changed and this requirement voids, we have to keep at least _PAGE_RW outside of the 3 last bits. For now, revert to previous PTE bits order. A further rework may come later. Reported-by: Rui Salvaterra Fixes: 697ece78f8f7 ("powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.") Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/32/hash.h | 8 arch/powerpc/kernel/head_32.S | 9 ++--- arch/powerpc/mm/book3s32/hash_low.S | 14 -- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/32/hash.h b/arch/powerpc/include/asm/book3s/32/hash.h index 34a7215ae81e..2a0a467d2985 100644 --- a/arch/powerpc/include/asm/book3s/32/hash.h +++ b/arch/powerpc/include/asm/book3s/32/hash.h @@ -17,9 +17,9 @@ * updating the accessed and modified bits in the page table tree. */ -#define _PAGE_USER 0x001 /* usermode access allowed */ -#define _PAGE_RW 0x002 /* software: user write access allowed */ -#define _PAGE_PRESENT 0x004 /* software: pte contains a translation */ +#define _PAGE_PRESENT 0x001 /* software: pte contains a translation */ +#define _PAGE_HASHPTE 0x002 /* hash_page has made an HPTE for this pte */ +#define _PAGE_USER 0x004 /* usermode access allowed */ #define _PAGE_GUARDED 0x008 /* G: prohibit speculative access */ #define _PAGE_COHERENT 0x010 /* M: enforce memory coherence (SMP systems) */ #define _PAGE_NO_CACHE 0x020 /* I: cache inhibit */ @@ -27,7 +27,7 @@ #define _PAGE_DIRTY0x080 /* C: page changed */ #define _PAGE_ACCESSED 0x100 /* R: page referenced */ #define _PAGE_EXEC 0x200 /* software: exec allowed */ -#define _PAGE_HASHPTE 0x400 /* hash_page has made an HPTE for this pte */ +#define _PAGE_RW 0x400 /* software: user write access allowed */ #define _PAGE_SPECIAL 0x800 /* software: Special page */ #ifdef CONFIG_PTE_64BIT diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S index daaa153950c2..97c887950c3c 100644 --- a/arch/powerpc/kernel/head_32.S +++ b/arch/powerpc/kernel/head_32.S @@ -348,7 +348,7 @@ BEGIN_MMU_FTR_SECTION andis. r0, r5, (DSISR_BAD_FAULT_32S | DSISR_DABRMATCH)@h #endif bne handle_page_fault_tramp_2 /* if not, try to put a PTE */ - rlwinm r3, r5, 32 - 24, 30, 30 /* DSISR_STORE -> _PAGE_RW */ + rlwinm r3, r5, 32 - 15, 21, 21 /* DSISR_STORE -> _PAGE_RW */ bl hash_page b handle_page_fault_tramp_1 FTR_SECTION_ELSE @@ -497,6 +497,7 @@ InstructionTLBMiss: andc. r1,r1,r0/* check access & ~permission */ bne-InstructionAddressInvalid /* return if access not permitted */ /* Convert linux-style PTE to low word of PPC-style PTE */ + rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ ori r1, r1, 0xe06 /* clear out reserved bits */ andcr1, r0, r1 /* PP = user? 1 : 0 */ BEGIN_FTR_SECTION @@ -564,8 +565,9 @@ DataLoadTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ - rlwinm r1,r0,0,30,30 /* _PAGE_RW -> PP msb */ - rlwimi r0,r0,1,30,30 /* _PAGE_USER -> PP msb */ + rlwinm r1,r0,32-9,30,30/* _PAGE_RW -> PP msb */ + rlwimi r0,r0,32-1,30,30/* _PAGE_USER -> PP msb */ + rlwimi r0,r0,32-1,31,31/* _PAGE_USER -> PP lsb */ ori r1,r1,0xe04 /* clear out reserved bits */ andcr1,r0,r1/* PP = user? rw? 
1: 3: 0 */ BEGIN_FTR_SECTION @@ -643,6 +645,7 @@ DataStoreTLBMiss: * we would need to update the pte atomically with lwarx/stwcx. */ /* Convert linux-style PTE to low word of PPC-style PTE */ + rlwimi r0,r0,32-2,31,31/* _PAGE_USER -> PP lsb */ li r1,0xe06/* clear out reserved bits & PP msb */ andcr1,r0,r1/* PP = user? 1: 0 */ BEGIN_FTR_SECTION diff --git a/arch/powerpc/mm/book3s32/hash_low.S b/arch/powerpc/mm/book3s32/hash_low.S index 6d236080cb1a..877d880890fe 100644 --- a/arch/powerpc/mm/book3s32/hash_low.S +++ b/arch/powerpc/mm/book3s32/hash_low.S @@ -35,7 +35,7 @@ mmu_hash_lock: /* * Load a PTE into the hash table, if possible. * The address is in r4, and r3 contains an access flag: - * _PAGE_RW (0x002) if a write. + * _PAGE_RW (0x400) if a write. * r9 contains the SRR1 value, from which we use the MSR_PR bit. * SPRG_THREAD contains the physical address of the current task's thread. * @@ -69,7 +69,7 @@ _GLOBAL(hash_page) blt+112f
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods
Thanks for reviewing this patch Aneesh.

"Aneesh Kumar K.V" writes:
> Vaibhav Jain writes:
>> +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
>> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
>> +	__u16 payload_offset;	/* In: offset from start of struct */
>> +	__u16 payload_version;	/* In/Out: version of the payload */
>> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
>> +} __packed;
>
> that payload_offset can be avoided if we prevent userspace from using a
> different variant of nd_pdsm_cmd_pkg with a different header. We can keep
> things simpler if we can always find payload at nd_pdsm_cmd_pkg->payload.

I had introduced this member to handle the case where new fields are added to 'struct nd_pdsm_cmd_pkg' without having to break the ABI. But I agree with the point you made later that this can be simplified by replacing 'payload_offset' with a set of reserved variables. Will address this in the next iteration of this patchset.

>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> +	PAPR_SCM_PDSM_MIN = 0x0,
>> +	PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> +	return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> +		return NULL;
>> +	else
>> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>
> we need to make sure userspace is not passing a wrong payload_offset.

Agreed, this function should have stricter checking for the payload_offset field. However I will be getting rid of 'payload_offset' altogether in the next iteration, as you previously suggested.

> and in the next patch you do
>
> +	/* Copy the health struct to the payload */
> +	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> +	pkg->hdr.nd_fw_size = copysize;
> +

Yes, this is already being done in the patchset, and changes proposed to pdsm_cmd_to_payload() should not impact other patches, as pdsm_cmd_to_payload() abstracts the rest of the code from how the payload is accessed.

> All this can be simplified if you can keep payload at
> nd_pdsm_cmd_pkg->payload.
>
> If you still want to have the ability to extend the header, then add a
> reserved field similar to nd_cmd_pkg.

Agree to this and will address this in v8.

> -aneesh

--
Cheers
~ Vaibhav
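A hedged sketch of the direction agreed on above (field names illustrative, not the final uapi): with payload_offset gone, payload[] sits at a fixed offset and a reserved field keeps room for compatible header growth, so the payload accessor needs no offset validation at all.

struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* package header containing sub-cmd */
	__s32 cmd_status;	/* out: sub-cmd status returned back */
	__u16 payload_version;	/* in/out: version of the payload */
	__u16 reserved[3];	/* room for future header fields */
	__u8 payload[];		/* always at a fixed, known offset */
} __packed;

static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
{
	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
		return NULL;
	return pcmd->payload;
}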
[PATCH V3 2/2] tools/perf: Add perf tools support for extended register capability in powerpc
From: Anju T Sudhakar Add extended regs to sample_reg_mask in the tool side to use with `-I?` option. Perf tools side uses extended mask to display the platform supported register names (with -I? option) to the user and also send this mask to the kernel to capture the extended registers in each sample. Hence decide the mask value based on the processor version. Signed-off-by: Anju T Sudhakar [Decide extended mask at run time based on platform] Signed-off-by: Athira Rajeev --- tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++- tools/perf/arch/powerpc/include/perf_regs.h | 5 ++- tools/perf/arch/powerpc/util/perf_regs.c| 55 + 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/tools/arch/powerpc/include/uapi/asm/perf_regs.h b/tools/arch/powerpc/include/uapi/asm/perf_regs.h index f599064..485b1d5 100644 --- a/tools/arch/powerpc/include/uapi/asm/perf_regs.h +++ b/tools/arch/powerpc/include/uapi/asm/perf_regs.h @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs { PERF_REG_POWERPC_DSISR, PERF_REG_POWERPC_SIER, PERF_REG_POWERPC_MMCRA, - PERF_REG_POWERPC_MAX, + /* Extended registers */ + PERF_REG_POWERPC_MMCR0, + PERF_REG_POWERPC_MMCR1, + PERF_REG_POWERPC_MMCR2, + /* Max regs without the extended regs */ + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1, }; + +#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1) + +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */ +#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \ + - PERF_REG_PMU_MASK) + #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */ diff --git a/tools/perf/arch/powerpc/include/perf_regs.h b/tools/perf/arch/powerpc/include/perf_regs.h index e18a355..46ed00d 100644 --- a/tools/perf/arch/powerpc/include/perf_regs.h +++ b/tools/perf/arch/powerpc/include/perf_regs.h @@ -64,7 +64,10 @@ [PERF_REG_POWERPC_DAR] = "dar", [PERF_REG_POWERPC_DSISR] = "dsisr", [PERF_REG_POWERPC_SIER] = "sier", - [PERF_REG_POWERPC_MMCRA] = "mmcra" + [PERF_REG_POWERPC_MMCRA] = "mmcra", + [PERF_REG_POWERPC_MMCR0] = "mmcr0", + [PERF_REG_POWERPC_MMCR1] = "mmcr1", + [PERF_REG_POWERPC_MMCR2] = "mmcr2", }; static inline const char *perf_reg_name(int id) diff --git a/tools/perf/arch/powerpc/util/perf_regs.c b/tools/perf/arch/powerpc/util/perf_regs.c index 0a52429..9179230 100644 --- a/tools/perf/arch/powerpc/util/perf_regs.c +++ b/tools/perf/arch/powerpc/util/perf_regs.c @@ -6,9 +6,14 @@ #include "../../../util/perf_regs.h" #include "../../../util/debug.h" +#include "../../../util/event.h" +#include "../../../util/header.h" +#include "../../../perf-sys.h" #include +#define PVR_POWER9 0x004E + const struct sample_reg sample_reg_masks[] = { SMPL_REG(r0, PERF_REG_POWERPC_R0), SMPL_REG(r1, PERF_REG_POWERPC_R1), @@ -55,6 +60,9 @@ SMPL_REG(dsisr, PERF_REG_POWERPC_DSISR), SMPL_REG(sier, PERF_REG_POWERPC_SIER), SMPL_REG(mmcra, PERF_REG_POWERPC_MMCRA), + SMPL_REG(mmcr0, PERF_REG_POWERPC_MMCR0), + SMPL_REG(mmcr1, PERF_REG_POWERPC_MMCR1), + SMPL_REG(mmcr2, PERF_REG_POWERPC_MMCR2), SMPL_REG_END }; @@ -163,3 +171,50 @@ int arch_sdt_arg_parse_op(char *old_op, char **new_op) return SDT_ARG_VALID; } + +uint64_t arch__intr_reg_mask(void) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_HARDWARE, + .config = PERF_COUNT_HW_CPU_CYCLES, + .sample_type= PERF_SAMPLE_REGS_INTR, + .precise_ip = 1, + .disabled = 1, + .exclude_kernel = 1, + }; + int fd, ret; + char buffer[64]; + u32 version; + u64 extended_mask = 0; + + /* Get the PVR value to set the extended +* mask specific to platform +*/ + get_cpuid(buffer, sizeof(buffer)); + ret = 
sscanf(buffer, "%u,", &version); + + if (ret != 1) { + pr_debug("Failed to get the processor version, unable to output extended registers\n"); + return PERF_REGS_MASK; + } + + if (version == PVR_POWER9) + extended_mask = PERF_REG_PMU_MASK_300; + else + return PERF_REGS_MASK; + + attr.sample_regs_intr = extended_mask; + attr.sample_period = 1; + event_attr_init(&attr); + + /* +* check if the pmu supports perf extended regs, before +* returning the register mask to sample. +*/ + fd = sys_perf_event_open(&attr, 0, -1, -1, 0); + if (fd != -1) { + close(fd); + return (extended_mask | PERF_REGS_MASK); + } + return PERF_REGS_MASK; +} -- 1.8.3.1
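The tail of arch__intr_reg_mask() is worth a second look: instead of assuming kernel support, it opens a throwaway cycles event with the extended sample_regs_intr mask and only advertises the extra registers if the open succeeds. A standalone sketch of that probe pattern (userspace C; constants mirror the patch, error handling simplified):

/* Sketch: probe whether the running kernel accepts an extended
 * sample_regs_intr mask by opening (and immediately closing) a dummy
 * cycles event, as arch__intr_reg_mask() does above. */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Returns 1 if the kernel/PMU accepts @extended_mask, 0 otherwise. */
static int pmu_supports_extended_regs(unsigned long long extended_mask)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_REGS_INTR;
	attr.sample_regs_intr = extended_mask;
	attr.sample_period = 1;
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0);	/* self, any CPU */
	if (fd == -1)
		return 0;	/* kernel or PMU rejected the mask */
	close(fd);
	return 1;
}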
[PATCH V3 1/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs
From: Anju T Sudhakar Add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Patch adds extended regs support for power9 platform by exposing MMCR0, MMCR1 and MMCR2 registers. REG_RESERVED mask needs update to include extended regs. `PERF_REG_EXTENDED_MASK`, contains mask value of the supported registers, is defined at runtime in the kernel based on platform since the supported registers may differ from one processor version to another and hence the MASK value. with patch -- available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 PERF_RECORD_SAMPLE(IP, 0x1): 4784/4784: 0 period: 1 addr: 0 ... intr regs: mask 0x ABI 64-bit r00xc012b77c r10xc03fe5e03930 r20xc1b0e000 r30xc03fdcddf800 r40xc03fc788 r50x9c422724be r60xc03fe5e03908 r70xff63bddc8706 r80x9e4 r90x0 r10 0x1 r11 0x0 r12 0xc01299c0 r13 0xc03c4800 r14 0x0 r15 0x7fffdd8b8b00 r16 0x0 r17 0x7fffdd8be6b8 r18 0x7e7076607730 r19 0x2f r20 0xc0001fc26c68 r21 0xc0002041e4227e00 r22 0xc0002018fb60 r23 0x1 r24 0xc03ffec4d900 r25 0x8000 r26 0x0 r27 0x1 r28 0x1 r29 0xc1be1260 r30 0x6008010 r31 0xc03ffebb7218 nip 0xc012b910 msr 0x90009033 orig_r3 0xc012b86c ctr 0xc01299c0 link 0xc012b77c xer 0x0 ccr 0x2800 softe 0x1 trap 0xf00 dar 0x0 dsisr 0x800 sier 0x0 mmcra 0x800 mmcr0 0x82008090 mmcr1 0x1e00 mmcr2 0x0 ... thread: perf:4784 Signed-off-by: Anju T Sudhakar [Defined PERF_REG_EXTENDED_MASK at run time to add support for different platforms ] Signed-off-by: Athira Rajeev --- arch/powerpc/include/asm/perf_event_server.h | 8 +++ arch/powerpc/include/uapi/asm/perf_regs.h| 14 +++- arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/perf_regs.c| 34 +--- arch/powerpc/perf/power9-pmu.c | 6 + 5 files changed, 59 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3e9703f..1458e1a 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -15,6 +15,9 @@ #define MAX_EVENT_ALTERNATIVES 8 #define MAX_LIMITED_HWCOUNTERS 2 +extern u64 mask_var; +#define PERF_REG_EXTENDED_MASK mask_var + struct perf_event; /* @@ -55,6 +58,11 @@ struct power_pmu { int *blacklist_ev; /* BHRB entries in the PMU */ int bhrb_nr; + /* +* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if +* the pmu supports extended perf regs capability +*/ + int capabilities; }; /* diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h index f599064..485b1d5 100644 --- a/arch/powerpc/include/uapi/asm/perf_regs.h +++ b/arch/powerpc/include/uapi/asm/perf_regs.h @@ -48,6 +48,18 @@ enum perf_event_powerpc_regs { PERF_REG_POWERPC_DSISR, PERF_REG_POWERPC_SIER, PERF_REG_POWERPC_MMCRA, - PERF_REG_POWERPC_MAX, + /* Extended registers */ + PERF_REG_POWERPC_MMCR0, + PERF_REG_POWERPC_MMCR1, + PERF_REG_POWERPC_MMCR2, + /* Max regs without the extended regs */ + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1, }; + +#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1) + +/* PERF_REG_EXTENDED_MASK value for CPU_FTR_ARCH_300 */ +#define PERF_REG_PMU_MASK_300 (((1ULL << (PERF_REG_POWERPC_MMCR2 + 1)) - 1) \ + - PERF_REG_PMU_MASK) + #endif /* 
_UAPI_ASM_POWERPC_PERF_REGS_H */ diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 3dcfecf..f56b778 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu) power_pmu.attr_groups = ppmu->attr_groups; + power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS); #ifdef MSR_HV /* * Use FCHV to ignore kernel events if MSR.HV is set. diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index a213a0a..f1dbbc5 10064
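The hunks for perf_regs.c and power9-pmu.c are cut off above, but the opt-in flow they implement can be sketched from the pieces that are visible: a PMU driver sets the runtime mask backing PERF_REG_EXTENDED_MASK and advertises the capability bit that register_power_pmu() copies into power_pmu (see the core-book3s.c hunk). The function below is illustrative, not the patch's code:

/* Sketch only: how a PMU driver opts in under this scheme. */
u64 mask_var;	/* backs PERF_REG_EXTENDED_MASK (see perf_event_server.h) */

static int __init power9_pmu_opt_in_sketch(struct power_pmu *pmu)
{
	/* POWER9 (ISA v3.0) exposes MMCR0/MMCR1/MMCR2 as extended regs */
	mask_var = PERF_REG_PMU_MASK_300;
	pmu->capabilities |= PERF_PMU_CAP_EXTENDED_REGS;
	return register_power_pmu(pmu);
}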
[PATCH V3 0/2] powerpc/perf: Add support for perf extended regs in powerpc
Patch set to add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate a PMU which supports extended registers. The generic code defines the mask of extended registers as 0 for non-supported architectures.

Patch 1/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the values of mmcr0, mmcr1, mmcr2 for POWER9, and defines `PERF_REG_EXTENDED_MASK` at runtime, which contains the mask value of the supported registers under extended regs.

Patch 2/2 adds extended regs to sample_reg_mask on the tool side to use with the `-I?` option.

Anju T Sudhakar (2):
  powerpc/perf: Add support for outputting extended regs in perf intr_regs
  tools/perf: Add perf tools support for extended register capability in powerpc

---
Changes from v2 -> v3
- Split kernel and tools side patches as suggested by Arnaldo
- Addressed review comment from Madhavan Srinivasan

Changes from v1 -> v2
- PERF_REG_EXTENDED_MASK is defined at runtime in the kernel based on
  platform. This gives flexibility in using extended regs for all
  processor versions where the supported registers may differ.
- Removed PERF_REG_EXTENDED_MASK from the perf tools side. Based on the
  processor version (from the PVR value), the tool side will return the
  appropriate extended mask.
- Since the tool changes can be handled without a "PERF_REG_EXTENDED_MASK"
  macro, dropped the patch to set NO_AUXTRACE.
- Addressed review comments from Ravi Bangoria for V1
---
 arch/powerpc/include/asm/perf_event_server.h    |  8 ++++
 arch/powerpc/include/uapi/asm/perf_regs.h       | 14 ++++++-
 arch/powerpc/perf/core-book3s.c                 |  1 +
 arch/powerpc/perf/perf_regs.c                   | 34 ++++++++++---
 arch/powerpc/perf/power9-pmu.c                  |  6 +++
 tools/arch/powerpc/include/uapi/asm/perf_regs.h | 14 ++++++-
 tools/perf/arch/powerpc/include/perf_regs.h     |  5 ++-
 tools/perf/arch/powerpc/util/perf_regs.c        | 55 ++++++++++++++++++++
 8 files changed, 131 insertions(+), 6 deletions(-)
-- 
1.8.3.1
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Am Mittwoch, den 20.05.2020, 16:20 +0800 schrieb Shengjiu Wang: > Hi > > On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote: > > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang: > > > There are two requirements that we need to move the request > > > of dma channel from probe to open. > > > > How do you handle -EPROBE_DEFER return code from the channel request if > > you don't do it in probe? > > I use the dma_request_slave_channel or dma_request_channel instead > of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER > return code. This is a pretty weak argument. The dmaengine device might probe after you try to get the channel. Using a function to request the channel that doesn't allow you to handle probe deferral is IMHO a bug and should be fixed, instead of building even more assumptions on top of it. > > > - When dma device binds with power-domains, the power will > > > be enabled when we request dma channel. If the request of dma > > > channel happen on probe, then the power-domains will be always > > > enabled after kernel boot up, which is not good for power > > > saving, so we need to move the request of dma channel to .open(); > > > > This is certainly something which could be fixed in the dmaengine > > driver. > > Dma driver always call the pm_runtime_get_sync in > device_alloc_chan_resources, the device_alloc_chan_resources is > called when channel is requested. so power is enabled on channel > request. So why can't you fix the dmaengine driver to do that RPM call at a later time when the channel is actually going to be used? This will allow further power savings with other slave devices than the audio PCM. Regards, Lucas > > > - With FE-BE case, if the dma channel is requested in probe, > > > then there will be below issue, which is caused by that the > > > dma channel will be requested duplicately > > > > Why is this requested a second time? Is this just some missing cleanup > > on a deferred probe path? > > Not relate with deferred probe. With DMA1->ASRC->DMA2->ESAI case, > the DMA1->ASRC->DMA2 is in FE, ESAI is in BE. When ESAI drvier > probe, DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel > is not used in this FE-BE case).When FE-BE startup, DMA2 > channel is created, it needs the ESAI's "dma:tx", so below warning > comes out. 
> > > Regards, > > Lucas > > > > > [ 638.906268] sysfs: cannot create duplicate filename > > > '/devices/soc0/soc/200.bus/200.spba-bus/2024000.esai/dma:tx' > > > [ 638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted > > > 5.7.0-rc1-12956-gfc64b2585593 #287 > > > [ 638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > > [ 638.933690] [] (unwind_backtrace) from [] > > > (show_stack+0x10/0x14) > > > [ 638.941464] [] (show_stack) from [] > > > (dump_stack+0xe4/0x118) > > > [ 638.948808] [] (dump_stack) from [] > > > (sysfs_warn_dup+0x50/0x64) > > > [ 638.956406] [] (sysfs_warn_dup) from [] > > > (sysfs_do_create_link_sd+0xc8/0xd4) > > > [ 638.965134] [] (sysfs_do_create_link_sd) from [] > > > (dma_request_chan+0xb0/0x210) > > > [ 638.974120] [] (dma_request_chan) from [] > > > (dma_request_slave_channel+0x8/0x14) > > > [ 638.983111] [] (dma_request_slave_channel) from [] > > > (fsl_asrc_dma_hw_params+0x1e0/0x438) > > > [ 638.992881] [] (fsl_asrc_dma_hw_params) from [] > > > (soc_pcm_hw_params+0x4a0/0x6a8) > > > [ 639.001952] [] (soc_pcm_hw_params) from [] > > > (dpcm_fe_dai_hw_params+0x70/0xe4) > > > [ 639.010765] [] (dpcm_fe_dai_hw_params) from [] > > > (snd_pcm_hw_params+0x158/0x418) > > > [ 639.019750] [] (snd_pcm_hw_params) from [] > > > (snd_pcm_ioctl+0x734/0x183c) > > > [ 639.028129] [] (snd_pcm_ioctl) from [] > > > (ksys_ioctl+0x2ac/0xb98) > > > [ 639.035812] [] (ksys_ioctl) from [] > > > (ret_fast_syscall+0x0/0x28) > > > [ 639.043490] Exception stack(0xec529fa8 to 0xec529ff0) > > > [ 639.048565] 9fa0: bee84650 01321870 0004 > > > c25c4111 bee84650 0002000f > > > [ 639.056766] 9fc0: bee84650 01321870 01321820 0036 1f40 > > > 0002c2f8 0003 > > > [ 639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8 > > > [ 639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink > > > > > > Signed-off-by: Shengjiu Wang > > > --- > > > sound/soc/fsl/imx-pcm-dma.c | 173 +--- > > > 1 file changed, 159 insertions(+), 14 deletions(-) > > > > > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c > > > index 04a9bc749016..dae53b384df4 100644 > > > --- a/sound/soc/fsl/imx-pcm-dma.c > > > +++ b/sound/soc/fsl/imx-pcm-dma.c > > > @@ -11,6 +11,7 @@ > > > #include > > > #include > > > #include > > > +#include > > > > > > #include > > > #include > > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void > > > *param) > > > return true; > > > } > > > > > > -static const struct snd_dmaengine_pcm_config imx_dmaengine
Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
On 15. 05. 20, 1:22, rana...@codeaurora.org wrote:
> On 2020-05-13 00:04, Greg KH wrote:
>> On Tue, May 12, 2020 at 02:39:50PM -0700, rana...@codeaurora.org wrote:
>>> On 2020-05-12 01:25, Greg KH wrote:
>>> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
>>> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
>>> > > Author: Jiri Slaby
>>> > > Date:   Tue Aug 7 21:48:04 2012 +0200
>>> > >
>>> > >     TTY: hvc_console, add tty install
>>> > >
>>> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
>>> > > hvc_open's fail path to hvc_cleanup.
>>> > >
>>> > > IOW hvc_open now NULLs tty->driver_data even for another task which
>>> > > opened the tty earlier. The same holds for "tty_port_tty_set(&hp->port,
>>> > > NULL);" there. And actually "tty_port_put(&hp->port);" is also
>>> > > incorrect for the 2nd task opening the tty.
>>> > > ...
> These are the traces you get when the issue happens:
> [  154.212291] hvc_install called for pid: 666
> [  154.216705] hvc_open called for pid: 666
> [  154.233657] hvc_open: request_irq failed with rc -22.
> [  154.238934] hvc_open called for pid: 678
> [  154.243012] Unable to handle kernel NULL pointer dereference at
> virtual address 00c4
> # hvc_install isn't called for pid: 678 as the file wasn't closed yet.

Nice.

Does the attached help?

I wonder how come the tty_port_put in hvc_open does not cause a UAF? I
would say hvc_open fails, tty_port_put is called. It decrements the
reference taken in hvc_install. So far so good.

Now, this should happen IMO:

tty_open
  -> hvc_open (fails)
     -> tty_port_put
  -> tty_release
     -> tty_release_struct
        -> tty_kref_put
           -> queue_release_one_tty
SCHEDULED WORKQUEUE
release_one_tty
  -> hvc_cleanup
     -> tty_port_put (should die terrible death now)

What am I missing?

thanks,
-- 
js suse labs

>From d891cdfcbd3b41eb23ddfc8d9e6cbe038ff8fb72 Mon Sep 17 00:00:00 2001
From: Jiri Slaby
Date: Wed, 20 May 2020 11:29:25 +0200
Subject: [PATCH] hvc_console: fix open

Signed-off-by: Jiri Slaby
---
 drivers/tty/hvc/hvc_console.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
index 436cc51c92c3..cdcc64ea2554 100644
--- a/drivers/tty/hvc/hvc_console.c
+++ b/drivers/tty/hvc/hvc_console.c
@@ -371,15 +371,14 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 	 * tty fields and return the kref reference.
 	 */
 	if (rc) {
-		tty_port_tty_set(&hp->port, NULL);
-		tty->driver_data = NULL;
-		tty_port_put(&hp->port);
 		printk(KERN_ERR "hvc_open: request_irq failed with rc %d.\n", rc);
-	} else
+	} else {
 		/* We are ready... raise DTR/RTS */
 		if (C_BAUD(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 1);
+		tty_port_set_initialized(&hp->port, true);
+	}
 
 	/* Force wakeup of the polling thread */
 	hvc_kick();
@@ -389,22 +388,12 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
 
 static void hvc_close(struct tty_struct *tty, struct file * filp)
 {
-	struct hvc_struct *hp;
+	struct hvc_struct *hp = tty->driver_data;
 	unsigned long flags;
 
 	if (tty_hung_up_p(filp))
 		return;
 
-	/*
-	 * No driver_data means that this close was issued after a failed
-	 * hvc_open by the tty layer's release_dev() function and we can just
-	 * exit cleanly because the kref reference wasn't made.
-	 */
-	if (!tty->driver_data)
-		return;
-
-	hp = tty->driver_data;
-
 	spin_lock_irqsave(&hp->port.lock, flags);
 
 	if (--hp->port.count == 0) {
@@ -412,6 +401,9 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		/* We are done with the tty pointer now. */
 		tty_port_tty_set(&hp->port, NULL);
 
+		if (!tty_port_initialized(&hp->port))
+			return;
+
 		if (C_HUPCL(tty))
 			if (hp->ops->dtr_rts)
 				hp->ops->dtr_rts(hp, 0);
@@ -428,6 +420,7 @@ static void hvc_close(struct tty_struct *tty, struct file * filp)
 		 * waking periodically to check chars_in_buffer().
 		 */
 		tty_wait_until_sent(tty, HVC_CLOSE_WAIT);
+		tty_port_set_initialized(&hp->port, false);
 	} else {
 		if (hp->port.count < 0)
 			printk(KERN_ERR "hvc_close %X: oops, count is %d\n",
-- 
2.26.2
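The flow in question, drawn as a reference-count timeline (a sketch of the sequence described above, assuming hvc_install() takes the port reference via hvc_get_by_index(), as the 2012 commit describes):

/*
 * Sketch of the kref pairing under discussion, not driver code:
 *
 *   hvc_install()              tty_port_get()  ref: N   -> N+1
 *   hvc_open() fails           tty_port_put()  ref: N+1 -> N    (1)
 *   tty_release()
 *     ... release_one_tty()
 *       -> hvc_cleanup()       tty_port_put()  ref: N   -> N-1  (2)
 *
 * (1) already drops the reference taken in hvc_install(), so (2)
 * drops one reference too many -- the suspected over-put behind the
 * "should die terrible death" remark above.
 */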
[v2 1/2] dts: ppc: t4240rdb: remove interrupts property
From: Biwen Li

This removes the interrupts property to drop the following warning:
  - $ hwclock.util-linux
    hwclock.util-linux: select() to /dev/rtc0 to wait for clock tick timed out

My case:
  - The RTC ds1374's INT pin is connected to VCC on T4240RDB, so the RTC
    cannot inform the CPU about the alarm interrupt

Signed-off-by: Biwen Li
---
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index a56a705d41f7..145896f2eef6 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -144,7 +144,6 @@
 		rtc@68 {
 			compatible = "dallas,ds1374";
 			reg = <0x68>;
-			interrupts = <0x1 0x1 0 0>;
 		};
 	};
-- 
2.17.1
[v2 2/2] dts: ppc: t1024rdb: remove interrupts property
From: Biwen Li

This removes the interrupts property to drop the following warning:
  - $ hwclock.util-linux
    hwclock.util-linux: select() to /dev/rtc0 to wait for clock tick timed out

My case:
  - The RTC ds1339's INT pin isn't connected to the CPU's INT pin on
    T1024RDB, so the RTC cannot inform the CPU about the alarm interrupt

How to fix it?
  - Remove the IRQ line

Signed-off-by: Biwen Li
---
 arch/powerpc/boot/dts/fsl/t1024rdb.dts | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/fsl/t1024rdb.dts b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
index 645caff98ed1..605ceec66af3 100644
--- a/arch/powerpc/boot/dts/fsl/t1024rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t1024rdb.dts
@@ -161,7 +161,6 @@
 		rtc@68 {
 			compatible = "dallas,ds1339";
 			reg = <0x68>;
-			interrupts = <0x1 0x1 0 0>;
 		};
 	};
-- 
2.17.1
[PATCH -next] scsi: ibmvscsi: Make some functions static
Fix the following warnings:

drivers/scsi/ibmvscsi/ibmvscsi.c:2387:12: warning: symbol
'ibmvscsi_module_init' was not declared. Should it be static?
drivers/scsi/ibmvscsi/ibmvscsi.c:2409:13: warning: symbol
'ibmvscsi_module_exit' was not declared. Should it be static?

Signed-off-by: Chen Tao
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 59f0f1030c54..44e64aa21194 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2384,7 +2384,7 @@ static struct vio_driver ibmvscsi_driver = {
 static struct srp_function_template ibmvscsi_transport_functions = {
 };
 
-int __init ibmvscsi_module_init(void)
+static int __init ibmvscsi_module_init(void)
 {
 	int ret;
 
@@ -2406,7 +2406,7 @@ int __init ibmvscsi_module_init(void)
 	return ret;
 }
 
-void __exit ibmvscsi_module_exit(void)
+static void __exit ibmvscsi_module_exit(void)
 {
 	vio_unregister_driver(&ibmvscsi_driver);
 	srp_release_transport(ibmvscsi_transport_template);
-- 
2.22.0
Re: [PATCH v2] tty: hvc: Fix data abort due to race in hvc_open
On 20. 05. 20, 8:47, Raghavendra Rao Ananta wrote:
> Potentially, hvc_open() can be called in parallel when two tasks call
> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
> callback in the function fails, where it sets the tty->driver_data to
> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
> Hence, do a NULL check at the beginning, before proceeding ahead.
>
> The issue can be easily reproduced by launching two tasks simultaneously
> that do an open() call on /dev/hvcX.
> For example:
> $ cat /dev/hvc0 & cat /dev/hvc0 &
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Raghavendra Rao Ananta
> ---
>  drivers/tty/hvc/hvc_console.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 436cc51c92c3..80709f754cc8 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -350,6 +350,9 @@ static int hvc_open(struct tty_struct *tty, struct file * filp)
>  	unsigned long flags;
>  	int rc = 0;
>
> +	if (!hp)
> +		return -ENODEV;
> +

This is still not fixing the bug properly. See:
https://lore.kernel.org/linuxppc-dev/0f7791f5-0a53-59f6-7277-247a789f3...@suse.cz/
In particular, the paragraph starting "IOW".

thanks,
-- 
js suse labs
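For readers reconstructing the failure, the reported sequence (per the trace earlier in this thread) can be sketched as an interleaving; this is an illustration, not driver code:

/*
 * Sketch of the reported sequence, not driver code:
 *
 *   Task A (e.g. pid 666)               Task B (e.g. pid 678)
 *   ---------------------               ---------------------
 *   tty_open()
 *     hvc_install()  driver_data = hp
 *     hvc_open()
 *       request_irq() fails
 *       driver_data = NULL
 *   (tty not yet released)
 *                                       tty_open()   same tty, so
 *                                                    no hvc_install()
 *                                         hvc_open()
 *                                           hp = tty->driver_data;  NULL
 *                                           hp->port...  NULL dereference
 *
 * The check added above turns the dereference into -ENODEV, but the
 * stale NULLing in the failure path (see the linked mail) remains.
 */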
[PATCH v4 6/6] asm-generic/tlb: avoid potential double flush
From: Peter Zijlstra commit 0758cd8304942292e95a0f750c374533db378b32 upstream Aneesh reported that: tlb_flush_mmu() tlb_flush_mmu_tlbonly() tlb_flush() <-- #1 tlb_flush_mmu_free() tlb_table_flush() tlb_table_invalidate() tlb_flush_mmu_tlbonly() tlb_flush() <-- #2 does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not clear tlb->end in that case. Observe that any caller to __tlb_adjust_range() also sets at least one of the tlb->freed_tables || tlb->cleared_p* bits, and those are unconditionally cleared by __tlb_reset_range(). Change the condition for actually issuing TLBI to having one of those bits set, as opposed to having tlb->end != 0. Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.ku...@linux.ibm.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Aneesh Kumar K.V Reported-by: "Aneesh Kumar K.V" Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported to 4.19 stable] --- include/asm-generic/tlb.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 19934cdd143e..427a70c56ddd 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -179,7 +179,12 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) { - if (!tlb->end) + /* +* Anything calling __tlb_adjust_range() also sets at least one of +* these bits. +*/ + if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds || + tlb->cleared_puds || tlb->cleared_p4ds)) return; tlb_flush(tlb); -- 2.25.4
[PATCH v4 5/6] mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush
From: Peter Zijlstra commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream Architectures for which we have hardware walkers of Linux page table should flush TLB on mmu gather batch allocation failures and batch flush. Some architectures like POWER supports multiple translation modes (hash and radix) and in the case of POWER only radix translation mode needs the above TLBI. This is because for hash translation mode kernel wants to avoid this extra flush since there are no hardware walkers of linux page table. With radix translation, the hardware also walks linux page table and with that, kernel needs to make sure to TLB invalidate page walk cache before page table pages are freed. More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE") The changes to sparc are to make sure we keep the old behavior since we are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for tlb_needs_table_invalidate is to always force an invalidate and sparc can avoid the table invalidate. Hence we define tlb_needs_table_invalidate to false for sparc architecture. Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.ku...@linux.ibm.com Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes") Signed-off-by: Peter Zijlstra (Intel) Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported to 4.19 stable] --- arch/Kconfig| 3 --- arch/powerpc/Kconfig| 1 - arch/powerpc/include/asm/tlb.h | 11 +++ arch/sparc/Kconfig | 1 - arch/sparc/include/asm/tlb_64.h | 9 + include/asm-generic/tlb.h | 15 +++ mm/memory.c | 16 7 files changed, 43 insertions(+), 13 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 061a12b8140e..3abbdb0cea44 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -363,9 +363,6 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_RCU_TABLE_FREE bool -config HAVE_RCU_TABLE_NO_INVALIDATE - bool - config ARCH_HAVE_NMI_SAFE_CMPXCHG bool diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1a00ce4b0040..e5bc0cfea2b1 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -216,7 +216,6 @@ config PPC select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE - select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN select HAVE_SYSCALL_TRACEPOINTS diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h index f0e571b2dc7c..63418275f402 100644 --- a/arch/powerpc/include/asm/tlb.h +++ b/arch/powerpc/include/asm/tlb.h @@ -30,6 +30,17 @@ #define tlb_remove_check_page_size_change tlb_remove_check_page_size_change extern void tlb_flush(struct mmu_gather *tlb); +/* + * book3s: + * Hash does not use the linux page-tables, so we can avoid + * the TLB invalidate for page-table freeing, Radix otoh does use the + * page-tables and needs the TLBI. + * + * nohash: + * We still do TLB invalidate in the __pte_free_tlb routine before we + * add the page table pages to mmu gather table batch. + */ +#define tlb_needs_table_invalidate() radix_enabled() /* Get the generic bits... 
*/ #include diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index d90d632868aa..e6f2a38d2e61 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -64,7 +64,6 @@ config SPARC64 select HAVE_KRETPROBES select HAVE_KPROBES select HAVE_RCU_TABLE_FREE if SMP - select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_MEMBLOCK_NODE_MAP select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_DYNAMIC_FTRACE diff --git a/arch/sparc/include/asm/tlb_64.h b/arch/sparc/include/asm/tlb_64.h index a2f3fa61ee36..8cb8f3833239 100644 --- a/arch/sparc/include/asm/tlb_64.h +++ b/arch/sparc/include/asm/tlb_64.h @@ -28,6 +28,15 @@ void flush_tlb_pending(void); #define __tlb_remove_tlb_entry(tlb, ptep, address) do { } while (0) #define tlb_flush(tlb) flush_tlb_pending() +/* + * SPARC64's hardware TLB fill does not use the Linux page-tables + * and therefore we don't need a TLBI when freeing page-table pages. + */ + +#ifdef CONFIG_HAVE_RCU_TABLE_FREE +#define tlb_needs_table_invalidate() (false) +#endif + #include #endif /* _SPARC64_TLB_H */ diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index f2b9dc9cbaf8..19934cdd143e 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -61,8 +61,23 @@ struct mmu_table_batch { extern void tlb_table_flush(struct mmu_gather *tlb); extern void tlb_remove_table(struct mmu_gather *tlb, void *table); +/* + * This allows an architecture that does not use the linux page-tables for + * hardware to skip the TLBI when freeing page tables. + */ +#ifndef tlb_needs_table_i
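The last hunk above is cut off, but its shape is small; modelled on the upstream commit (a paraphrase, not the verbatim hunk), the generic side reduces to roughly this sketch:

/* asm-generic/tlb.h: default is to force the invalidate ... */
#ifndef tlb_needs_table_invalidate
#define tlb_needs_table_invalidate() (true)
#endif

/* mm/memory.c: ... and the batch-free path consults the hook, so only
 * architectures whose hardware walks the Linux page tables pay for the
 * extra TLBI. */
static inline void tlb_table_invalidate(struct mmu_gather *tlb)
{
	if (tlb_needs_table_invalidate())
		tlb_flush_mmu_tlbonly(tlb);
}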
[PATCH v4 4/6] powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
From: "Aneesh Kumar K.V" commit 12e4d53f3f04e81f9e83d6fc10edc7314ab9f6b9 upstream Patch series "Fixup page directory freeing", v4. This is a repost of patch series from Peter with the arch specific changes except ppc64 dropped. ppc64 changes are added here because we are redoing the patch series on top of ppc64 changes. This makes it easy to backport these changes. Only the first 2 patches need to be backported to stable. The thing is, on anything SMP, freeing page directories should observe the exact same order as normal page freeing: 1) unhook page/directory 2) TLB invalidate 3) free page/directory Without this, any concurrent page-table walk could end up with a Use-after-Free. This is esp. trivial for anything that has software page-table walkers (HAVE_FAST_GUP / software TLB fill) or the hardware caches partial page-walks (ie. caches page directories). Even on UP this might give issues since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. This patch series fixes ppc64 and add generic MMU_GATHER changes to support the conversion of other architectures. I haven't added patches w.r.t other architecture because they are yet to be acked. This patch (of 9): A followup patch is going to make sure we correctly invalidate page walk cache before we free page table pages. In order to keep things simple enable RCU_TABLE_FREE even for !SMP so that we don't have to fixup the !SMP case differently in the followup patch !SMP case is right now broken for radix translation w.r.t page walk cache flush. We can get interrupted in between page table free and that would imply we have page walk cache entries pointing to tables which got freed already. Michael said "both our platforms that run on Power9 force SMP on in Kconfig, so the !SMP case is unlikely to be a problem for anyone in practice, unless they've hacked their kernel to build it !SMP." 
Link: http://lkml.kernel.org/r/20200116064531.483522-2-aneesh.ku...@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: backported for 4.19 stable] --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 -- arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 arch/powerpc/mm/pgtable-book3s64.c | 7 --- 5 files changed, 1 insertion(+), 26 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e09cfb109b8c..1a00ce4b0040 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -215,7 +215,7 @@ config PPC select HAVE_HARDLOCKUP_DETECTOR_PERFif PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !HAVE_HARDLOCKUP_DETECTOR_ARCH select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP - select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_FREE select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h index 82e44b1a00ae..79ba3fbb512e 100644 --- a/arch/powerpc/include/asm/book3s/32/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h @@ -110,7 +110,6 @@ static inline void pgtable_free(void *table, unsigned index_size) #define check_pgt_cache() do { } while (0) #define get_hugepd_cache_index(x) (x) -#ifdef CONFIG_SMP static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift) { @@ -127,13 +126,6 @@ static inline void __tlb_remove_table(void *_table) pgtable_free(table, shift); } -#else -static inline void pgtable_free_tlb(struct mmu_gather *tlb, - void *table, int shift) -{ - pgtable_free(table, shift); -} -#endif static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table, unsigned long address) diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h index f9019b579903..1013c0214213 100644 --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h @@ -47,9 +47,7 @@ extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long); extern void pte_fragment_free(unsigned long *, int); extern void pmd_fragment_free(unsigned long *); extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift); -#ifdef CONFIG_SMP extern void __tlb_remove_table(void *_table); -#endif static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm) { diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h index 8825953c225b..96eed46d5684 100644 -
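The ordering requirement stated in the cover text is the heart of the series; spelled out as a sketch, any walker racing with steps 1-3 either still sees the old entry or faults, but never dereferences freed memory. Here flush_page_walk_caches() and recycle_page() are hypothetical placeholders for arch primitives:

/* The invariant, as code (hypothetical helper names): */
static void free_pmd_page_safely(pud_t *pud, struct page *pmd_page)
{
	pud_clear(pud);            /* 1) unhook the directory            */
	flush_page_walk_caches();  /* 2) invalidate TLB/page-walk caches */
	recycle_page(pmd_page);    /* 3) only now may the page be reused */
}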
[PATCH v4 3/6] asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
From: Peter Zijlstra commit 96bc9567cbe112e9320250f01b9c060c882e8619 upstream Make issuing a TLB invalidate for page-table pages the normal case. The reason is twofold: - too many invalidates is safer than too few, - most architectures use the linux page-tables natively and would thus require this. Make it an opt-out, instead of an opt-in. No change in behavior intended. Signed-off-by: Peter Zijlstra (Intel) Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for upcoming tlbflush backports] --- arch/Kconfig | 2 +- arch/powerpc/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 - mm/memory.c | 2 +- 5 files changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a336548487e6..061a12b8140e 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -363,7 +363,7 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_RCU_TABLE_FREE bool -config HAVE_RCU_TABLE_INVALIDATE +config HAVE_RCU_TABLE_NO_INVALIDATE bool config ARCH_HAVE_NMI_SAFE_CMPXCHG diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6f475dc5829b..e09cfb109b8c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -216,6 +216,7 @@ config PPC select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if PPC64 && CPU_LITTLE_ENDIAN select HAVE_SYSCALL_TRACEPOINTS diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index e6f2a38d2e61..d90d632868aa 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -64,6 +64,7 @@ config SPARC64 select HAVE_KRETPROBES select HAVE_KPROBES select HAVE_RCU_TABLE_FREE if SMP + select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_MEMBLOCK_NODE_MAP select HAVE_ARCH_TRANSPARENT_HUGEPAGE select HAVE_DYNAMIC_FTRACE diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index af35f5caadbe..181d0d522977 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -181,7 +181,6 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE if PARAVIRT - select HAVE_RCU_TABLE_INVALIDATEif HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR diff --git a/mm/memory.c b/mm/memory.c index 1832c5ed6ac0..ba5689610c04 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -327,7 +327,7 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_ */ static inline void tlb_table_invalidate(struct mmu_gather *tlb) { -#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE +#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE /* * Invalidate page-table caches used by hardware walkers. Then we still * need to RCU-sched wait while freeing the pages because software -- 2.25.4
[PATCH v4 2/6] asm-generic/tlb: Track which levels of the page tables have been cleared
From: Will Deacon commit a6d60245d6d9b1caf66b0d94419988c4836980af upstream It is common for architectures with hugepage support to require only a single TLB invalidation operation per hugepage during unmap(), rather than iterating through the mapping at a PAGE_SIZE increment. Currently, however, the level in the page table where the unmap() operation occurs is not stored in the mmu_gather structure, therefore forcing architectures to issue additional TLB invalidation operations or to give up and over-invalidate by e.g. invalidating the entire TLB. Ideally, we could add an interval rbtree to the mmu_gather structure, which would allow us to associate the correct mapping granule with the various sub-mappings within the range being invalidated. However, this is costly in terms of book-keeping and memory management, so instead we approximate by keeping track of the page table levels that are cleared and provide a means to query the smallest granule required for invalidation. Signed-off-by: Will Deacon Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for upcoming tlbflush backports] --- include/asm-generic/tlb.h | 58 +-- mm/memory.c | 4 ++- 2 files changed, 53 insertions(+), 9 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 97306b32d8d2..f2b9dc9cbaf8 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -114,6 +114,14 @@ struct mmu_gather { */ unsigned intfreed_tables : 1; + /* +* at which levels have we cleared entries? +*/ + unsigned intcleared_ptes : 1; + unsigned intcleared_pmds : 1; + unsigned intcleared_puds : 1; + unsigned intcleared_p4ds : 1; + struct mmu_gather_batch *active; struct mmu_gather_batch local; struct page *__pages[MMU_GATHER_BUNDLE]; @@ -148,6 +156,10 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) tlb->end = 0; } tlb->freed_tables = 0; + tlb->cleared_ptes = 0; + tlb->cleared_pmds = 0; + tlb->cleared_puds = 0; + tlb->cleared_p4ds = 0; } static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) @@ -197,6 +209,25 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, } #endif +static inline unsigned long tlb_get_unmap_shift(struct mmu_gather *tlb) +{ + if (tlb->cleared_ptes) + return PAGE_SHIFT; + if (tlb->cleared_pmds) + return PMD_SHIFT; + if (tlb->cleared_puds) + return PUD_SHIFT; + if (tlb->cleared_p4ds) + return P4D_SHIFT; + + return PAGE_SHIFT; +} + +static inline unsigned long tlb_get_unmap_size(struct mmu_gather *tlb) +{ + return 1UL << tlb_get_unmap_shift(tlb); +} + /* * In the case of tlb vma handling, we can optimise these away in the * case where we're doing a full MM flush. 
When we're doing a munmap, @@ -230,13 +261,19 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define tlb_remove_tlb_entry(tlb, ptep, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->cleared_ptes = 1; \ __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0) -#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)\ - do { \ - __tlb_adjust_range(tlb, address, huge_page_size(h)); \ - __tlb_remove_tlb_entry(tlb, ptep, address); \ +#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \ + do {\ + unsigned long _sz = huge_page_size(h); \ + __tlb_adjust_range(tlb, address, _sz); \ + if (_sz == PMD_SIZE)\ + tlb->cleared_pmds = 1; \ + else if (_sz == PUD_SIZE) \ + tlb->cleared_puds = 1; \ + __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0) /** @@ -250,6 +287,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define tlb_remove_pmd_tlb_entry(tlb, pmdp, address) \ do {\ __tlb_adjust_range(tlb, address, HPAGE_PMD_SIZE); \ + tlb->cleared_pmds = 1; \ __tlb_remove_pmd_tlb_entry(tlb, pmdp, address); \ } while (0) @@ -264,6 +302,7 @@ static inline void tlb_remove_check_page_size_ch
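What the new bookkeeping buys is visible at flush time: an architecture's tlb_flush() can query the smallest granule that was actually unmapped and stride by it instead of by PAGE_SIZE. A sketch of such a consumer, where flush_tlb_mm_range_stride() is a hypothetical stand-in for an arch's ranged invalidate:

/* Sketch of a consumer of tlb_get_unmap_size(). */
static inline void tlb_flush(struct mmu_gather *tlb)
{
	unsigned long stride = tlb_get_unmap_size(tlb);

	/* e.g. one op per 2MB for a pure PMD-hugepage unmap, not 512 */
	flush_tlb_mm_range_stride(tlb->mm, tlb->start, tlb->end, stride);
}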
[PATCH v4 1/6] asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
From: Peter Zijlstra commit 22a61c3c4f1379ef8b0ce0d5cb78baf3178950e2 upstream Some architectures require different TLB invalidation instructions depending on whether it is only the last-level of page table being changed, or whether there are also changes to the intermediate (directory) entries higher up the tree. Add a new bit to the flags bitfield in struct mmu_gather so that the architecture code can operate accordingly if it's the intermediate levels being invalidated. Signed-off-by: Peter Zijlstra Signed-off-by: Will Deacon Cc: # 4.19 Signed-off-by: Santosh Sivaraj [santosh: prerequisite for tlbflush backports] --- include/asm-generic/tlb.h | 31 +++ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index b3353e21f3b3..97306b32d8d2 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -97,12 +97,22 @@ struct mmu_gather { #endif unsigned long start; unsigned long end; - /* we are in the middle of an operation to clear -* a full mm and can make some optimizations */ - unsigned intfullmm : 1, - /* we have performed an operation which -* requires a complete flush of the tlb */ - need_flush_all : 1; + /* +* we are in the middle of an operation to clear +* a full mm and can make some optimizations +*/ + unsigned intfullmm : 1; + + /* +* we have performed an operation which +* requires a complete flush of the tlb +*/ + unsigned intneed_flush_all : 1; + + /* +* we have removed page directories +*/ + unsigned intfreed_tables : 1; struct mmu_gather_batch *active; struct mmu_gather_batch local; @@ -137,6 +147,7 @@ static inline void __tlb_reset_range(struct mmu_gather *tlb) tlb->start = TASK_SIZE; tlb->end = 0; } + tlb->freed_tables = 0; } static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb) @@ -278,6 +289,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define pte_free_tlb(tlb, ptep, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pte_free_tlb(tlb, ptep, address); \ } while (0) #endif @@ -285,7 +297,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #ifndef pmd_free_tlb #define pmd_free_tlb(tlb, pmdp, address) \ do {\ - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pmd_free_tlb(tlb, pmdp, address); \ } while (0) #endif @@ -295,6 +308,7 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #define pud_free_tlb(tlb, pudp, address) \ do {\ __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __pud_free_tlb(tlb, pudp, address); \ } while (0) #endif @@ -304,7 +318,8 @@ static inline void tlb_remove_check_page_size_change(struct mmu_gather *tlb, #ifndef p4d_free_tlb #define p4d_free_tlb(tlb, pudp, address) \ do {\ - __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + __tlb_adjust_range(tlb, address, PAGE_SIZE);\ + tlb->freed_tables = 1; \ __p4d_free_tlb(tlb, pudp, address); \ } while (0) #endif -- 2.25.4
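A consumer of the new bit would look roughly like the sketch below; invalidate_tlb_last_level() and invalidate_tlb_all_levels() are hypothetical names for the two classes of invalidation instruction the commit message describes:

/* Sketch only: how an arch flush hook could act on freed_tables. */
static inline void tlb_flush(struct mmu_gather *tlb)
{
	if (tlb->freed_tables)
		/* intermediate (directory) entries changed too */
		invalidate_tlb_all_levels(tlb->mm, tlb->start, tlb->end);
	else
		/* only last-level (leaf) entries were cleared */
		invalidate_tlb_last_level(tlb->mm, tlb->start, tlb->end);
}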
[PATCH v4 0/6] Memory corruption may occur due to incorrent tlb flush
The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC flushes) may result in random memory corruption. Any concurrent page-table walk could end up with a Use-after-Free. Even on UP this might give issues, since mmu_gather is preemptible these days. An interrupt or preempted task accessing user pages might stumble into the free page if the hardware caches page directories. The series is a backport of the fix sent by Peter [1]. The first three patches are dependencies for the last patch (avoid potential double flush). If the performance impact due to double flush is considered trivial then the first three patches and last patch may be dropped. This is only for v4.19 stable. -- Changelog: v2: Send the patches with the correct format (commit sha1 upstream) for stable v3: Fix compilation for ppc44x_defconfig and mpc885_ads_defconfig v4: No change, Resend. -- Aneesh Kumar K.V (1): powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case Peter Zijlstra (4): asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush asm-generic/tlb: avoid potential double flush Will Deacon (1): asm-generic/tlb: Track which levels of the page tables have been cleared arch/Kconfig | 3 - arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 -- arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 - arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 -- arch/powerpc/include/asm/tlb.h | 11 ++ arch/powerpc/mm/pgtable-book3s64.c | 7 -- arch/sparc/include/asm/tlb_64.h | 9 ++ arch/x86/Kconfig | 1 - include/asm-generic/tlb.h| 103 --- mm/memory.c | 20 ++-- 11 files changed, 122 insertions(+), 52 deletions(-) -- 2.25.4
Re: [PATCH] ASoC: fsl: imx-pcm-dma: Don't request dma channel in probe
Hi On Tue, May 19, 2020 at 6:04 PM Lucas Stach wrote: > > Am Dienstag, den 19.05.2020, 17:41 +0800 schrieb Shengjiu Wang: > > There are two requirements that we need to move the request > > of dma channel from probe to open. > > How do you handle -EPROBE_DEFER return code from the channel request if > you don't do it in probe? I use the dma_request_slave_channel or dma_request_channel instead of dmaengine_pcm_request_chan_of. so there should be not -EPROBE_DEFER return code. > > > - When dma device binds with power-domains, the power will > > be enabled when we request dma channel. If the request of dma > > channel happen on probe, then the power-domains will be always > > enabled after kernel boot up, which is not good for power > > saving, so we need to move the request of dma channel to .open(); > > This is certainly something which could be fixed in the dmaengine > driver. Dma driver always call the pm_runtime_get_sync in device_alloc_chan_resources, the device_alloc_chan_resources is called when channel is requested. so power is enabled on channel request. > > > - With FE-BE case, if the dma channel is requested in probe, > > then there will be below issue, which is caused by that the > > dma channel will be requested duplicately > > Why is this requested a second time? Is this just some missing cleanup > on a deferred probe path? Not relate with deferred probe. With DMA1->ASRC->DMA2->ESAI case, the DMA1->ASRC->DMA2 is in FE, ESAI is in BE. When ESAI drvier probe, DMA3 channel is created with ESAI's "dma:tx" (DMA3 channel is not used in this FE-BE case).When FE-BE startup, DMA2 channel is created, it needs the ESAI's "dma:tx", so below warning comes out. > > Regards, > Lucas > > > [ 638.906268] sysfs: cannot create duplicate filename > > '/devices/soc0/soc/200.bus/200.spba-bus/2024000.esai/dma:tx' > > [ 638.919061] CPU: 1 PID: 673 Comm: aplay Not tainted > > 5.7.0-rc1-12956-gfc64b2585593 #287 > > [ 638.927113] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > [ 638.933690] [] (unwind_backtrace) from [] > > (show_stack+0x10/0x14) > > [ 638.941464] [] (show_stack) from [] > > (dump_stack+0xe4/0x118) > > [ 638.948808] [] (dump_stack) from [] > > (sysfs_warn_dup+0x50/0x64) > > [ 638.956406] [] (sysfs_warn_dup) from [] > > (sysfs_do_create_link_sd+0xc8/0xd4) > > [ 638.965134] [] (sysfs_do_create_link_sd) from [] > > (dma_request_chan+0xb0/0x210) > > [ 638.974120] [] (dma_request_chan) from [] > > (dma_request_slave_channel+0x8/0x14) > > [ 638.983111] [] (dma_request_slave_channel) from [] > > (fsl_asrc_dma_hw_params+0x1e0/0x438) > > [ 638.992881] [] (fsl_asrc_dma_hw_params) from [] > > (soc_pcm_hw_params+0x4a0/0x6a8) > > [ 639.001952] [] (soc_pcm_hw_params) from [] > > (dpcm_fe_dai_hw_params+0x70/0xe4) > > [ 639.010765] [] (dpcm_fe_dai_hw_params) from [] > > (snd_pcm_hw_params+0x158/0x418) > > [ 639.019750] [] (snd_pcm_hw_params) from [] > > (snd_pcm_ioctl+0x734/0x183c) > > [ 639.028129] [] (snd_pcm_ioctl) from [] > > (ksys_ioctl+0x2ac/0xb98) > > [ 639.035812] [] (ksys_ioctl) from [] > > (ret_fast_syscall+0x0/0x28) > > [ 639.043490] Exception stack(0xec529fa8 to 0xec529ff0) > > [ 639.048565] 9fa0: bee84650 01321870 0004 c25c4111 > > bee84650 0002000f > > [ 639.056766] 9fc0: bee84650 01321870 01321820 0036 1f40 > > 0002c2f8 0003 > > [ 639.064964] 9fe0: b6f483fc bee8451c b6ee2655 b6e1dcf8 > > [ 639.070339] fsl-esai-dai 2024000.esai: Cannot create DMA dma:tx symlink > > > > Signed-off-by: Shengjiu Wang > > --- > > sound/soc/fsl/imx-pcm-dma.c | 173 +--- > > 1 file changed, 159 
insertions(+), 14 deletions(-) > > > > diff --git a/sound/soc/fsl/imx-pcm-dma.c b/sound/soc/fsl/imx-pcm-dma.c > > index 04a9bc749016..dae53b384df4 100644 > > --- a/sound/soc/fsl/imx-pcm-dma.c > > +++ b/sound/soc/fsl/imx-pcm-dma.c > > @@ -11,6 +11,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -29,24 +30,168 @@ static bool filter(struct dma_chan *chan, void *param) > > return true; > > } > > > > -static const struct snd_dmaengine_pcm_config imx_dmaengine_pcm_config = { > > - .prepare_slave_config = snd_dmaengine_pcm_prepare_slave_config, > > - .compat_filter_fn = filter, > > -}; > > +static int imx_pcm_hw_params(struct snd_soc_component *component, > > + struct snd_pcm_substream *substream, > > + struct snd_pcm_hw_params *params) > > +{ > > + struct snd_pcm_runtime *runtime = substream->runtime; > > + struct snd_soc_pcm_runtime *rtd = substream->private_data; > > + struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0); > > + struct snd_dmaengine_dai_dma_data *dma_data; > > + struct dma_slave_config config; > > + struct dma_chan *chan; > > + int ret = 0; > > > > -int imx_pcm_dma_init(struct platform_device *pdev, size_t size) > > + sn
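For reference, the direction argued for in this thread — request the channel in .open() and drop it in .close(), so the dmaengine's runtime-PM reference (taken in device_alloc_chan_resources) is only held while audio runs — reduces to roughly the following sketch. Error handling is simplified and the "tx"/"rx" channel names are assumptions:

#include <linux/dmaengine.h>
#include <sound/pcm.h>
#include <sound/soc.h>

/* Sketch: acquire the DMA channel per PCM open instead of at probe. */
static int imx_pcm_open_sketch(struct snd_soc_component *component,
			       struct snd_pcm_substream *substream)
{
	struct snd_soc_pcm_runtime *rtd = substream->private_data;
	struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
	struct dma_chan *chan;

	chan = dma_request_slave_channel(cpu_dai->dev,
			substream->stream == SNDRV_PCM_STREAM_PLAYBACK ?
			"tx" : "rx");
	if (!chan)
		return -ENXIO;

	substream->runtime->private_data = chan;
	return 0;
}

static int imx_pcm_close_sketch(struct snd_soc_component *component,
				struct snd_pcm_substream *substream)
{
	struct dma_chan *chan = substream->runtime->private_data;

	dma_release_channel(chan);	/* lets the DMA device runtime-suspend */
	return 0;
}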
Re: [RESEND PATCH v7 4/5] ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
Vaibhav Jain writes:

> +
> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
> +	__u16 payload_offset;	/* In: offset from start of struct */
> +	__u16 payload_version;	/* In/Out: version of the payload */
> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
> +} __packed;

That payload_offset can be avoided if we prevent userspace from using a
different variant of nd_pdsm_cmd_pkg with a different header. We can
keep things simpler if we can always find the payload at
nd_pdsm_cmd_pkg->payload.

> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_scm_pdsm {
> +	PAPR_SCM_PDSM_MIN = 0x0,
> +	PAPR_SCM_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> +	return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> +		return NULL;
> +	else
> +		return (void *)((__u8 *) pcmd + pcmd->payload_offset);
> +}
> +

We need to make sure userspace is not passing a wrong payload_offset,
and in the next patch you do:

+	/* Copy the health struct to the payload */
+	memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+	pkg->hdr.nd_fw_size = copysize;
+

All this can be simplified if you can keep the payload at
nd_pdsm_cmd_pkg->payload. If you still want the ability to extend the
header, then add a reserved field similar to nd_cmd_pkg.

-aneesh
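Concretely, the suggested simplification might look like the following sketch. The reserved field's size is an invented placeholder here; the point is only that the payload lands at a fixed offset and the header keeps room to grow:

/* Sketch of the layout being suggested: drop payload_offset so the
 * payload always starts at a fixed offset, and keep headroom for
 * future header growth via a reserved field (as nd_cmd_pkg does).
 * Field sizes are illustrative, not a proposed ABI. */
struct nd_pdsm_cmd_pkg {
	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
	__u16 payload_version;	/* In/Out: version of the payload */
	__u16 reserved[3];	/* room for future header extensions */
	__u8 payload[];		/* In/Out: always at a fixed offset */
} __packed;

/* The accessor then needs no offset arithmetic: */
static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
{
	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
		return NULL;
	return pcmd->payload;
}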