Re: [PATCH bpf] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
Puranjay Mohan writes:

> The Linux Kernel Memory Model [1][2] requires RMW operations that have a
> return value to be fully ordered.
>
> BPF atomic operations with BPF_FETCH (including BPF_XCHG and
> BPF_CMPXCHG) return a value back so they need to be JITed to fully
> ordered operations. POWERPC currently emits relaxed operations for
> these.

Thanks for catching this.

> diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
> index 2f39c50ca729..b635e5344e8a 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -853,6 +853,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
> 		/* Get offset into TMP_REG */
> 		EMIT(PPC_RAW_LI(tmp_reg, off));
> 		tmp_idx = ctx->idx * 4;
> +		/*
> +		 * Enforce full ordering for operations with BPF_FETCH by emitting a 'sync'
> +		 * before and after the operation.
> +		 *
> +		 * This is a requirement in the Linux Kernel Memory Model.
> +		 * See __cmpxchg_u64() in asm/cmpxchg.h as an example.
> +		 */
> +		if (imm & BPF_FETCH)
> +			EMIT(PPC_RAW_SYNC());
> 		/* load value from memory into r0 */
> 		EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));
>
> @@ -905,6 +914,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
>
> 		/* For the BPF_FETCH variant, get old data into src_reg */
> 		if (imm & BPF_FETCH) {
> +			/* Emit 'sync' to enforce full ordering */
> +			EMIT(PPC_RAW_SYNC());
> 			EMIT(PPC_RAW_MR(ret_reg, ax_reg));
> 			if (!fp->aux->verifier_zext)
> 				EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* higher 32-bit */

On 32-bit there are non-SMP systems where those syncs will probably be
expensive. I think just adding an IS_ENABLED(CONFIG_SMP) around the
syncs is probably sufficient. Christophe?

cheers
Re: [PATCH 0/4] ASoC: Constify static snd_pcm_hardware
On Mon, 29 Apr 2024 13:48:45 +0200, Krzysztof Kozlowski wrote:
> No dependencies.
>
> Static 'struct snd_pcm_hardware' is not modified by few drivers and its
> copy is passed to the core, so it can be made const for increased code
> safety.
>
> Best regards,
> Krzysztof
>
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/4] ASoC: qcom: Constify static snd_pcm_hardware
      commit: e6fa3509cb32df373b15212a99f69a6595efd1c3
[2/4] ASoC: fsl: Constify static snd_pcm_hardware
      commit: ed90156037659473ee95eafe3f72d8498e5384ff
[3/4] ASoC: meson: Constify static snd_pcm_hardware
      commit: 7b5ce9f0c52a5885d34d46bba62e9eaedc3dd459
[4/4] ASoC: uniphier: Constify static snd_pcm_hardware
      commit: 74a15fabd271d0fd82ceecbbfa1b98ea0a4709dd

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if problems
are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files
Hi Vignesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on tip/x86/core kees/for-next/pstore kees/for-next/kspp linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vignesh-Balasubramanian/x86-elf-Add-a-new-note-section-containing-Xfeatures-information-to-x86-core-files/20240507-175615
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link:    https://lore.kernel.org/r/20240507095330.2674-2-vigbalas%40amd.com
patch subject: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files
config: i386-randconfig-013-20240508 (https://download.01.org/0day-ci/archive/20240508/202405080809.lgyhryu3-...@intel.com/config)
compiler: clang version 18.1.4 (https://github.com/llvm/llvm-project e6c3289804a67ea0bb6a86fadbe454dd93b8d855)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240508/202405080809.lgyhryu3-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202405080809.lgyhryu3-...@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/fpu/xstate.c:91:19: warning: unused variable 'owner_name' [-Wunused-const-variable]
      91 | static const char owner_name[] = "LINUX";
         |                   ^~
   1 warning generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for DRM_I915_DEBUG_GEM
   Depends on [n]: HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && DRM_I915_WERROR [=n]
   Selected by [m]:
   - DRM_I915_DEBUG [=y] && HAS_IOMEM [=y] && DRM_I915 [=m] && EXPERT [=y] && !COMPILE_TEST [=n]

vim +/owner_name +91 arch/x86/kernel/fpu/xstate.c

    90	
  > 91	static const char owner_name[] = "LINUX";
    92	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH RESEND v8 07/16] mm/execmem, arch: convert simple overrides of module_alloc to execmem
On Sun, 5 May 2024 19:06:19 +0300 Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > Several architectures override module_alloc() only to define address > range for code allocations different than VMALLOC address space. > > Provide a generic implementation in execmem that uses the parameters for > address space ranges, required alignment and page protections provided > by architectures. > > The architectures must fill execmem_info structure and implement > execmem_arch_setup() that returns a pointer to that structure. This way the > execmem initialization won't be called from every architecture, but rather > from a central place, namely a core_initcall() in execmem. > > The execmem provides execmem_alloc() API that wraps __vmalloc_node_range() > with the parameters defined by the architectures. If an architecture does > not implement execmem_arch_setup(), execmem_alloc() will fall back to > module_alloc(). > Looks good to me. Reviewed-by: Masami Hiramatsu (Google) Thanks, > Signed-off-by: Mike Rapoport (IBM) > Acked-by: Song Liu > --- > arch/loongarch/kernel/module.c | 19 -- > arch/mips/kernel/module.c | 20 -- > arch/nios2/kernel/module.c | 21 --- > arch/parisc/kernel/module.c| 24 > arch/riscv/kernel/module.c | 24 > arch/sparc/kernel/module.c | 20 -- > include/linux/execmem.h| 47 > mm/execmem.c | 67 -- > mm/mm_init.c | 2 + > 9 files changed, 210 insertions(+), 34 deletions(-) > > diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c > index c7d0338d12c1..ca6dd7ea1610 100644 > --- a/arch/loongarch/kernel/module.c > +++ b/arch/loongarch/kernel/module.c > @@ -18,6 +18,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -490,10 +491,22 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char > *strtab, > return 0; > } > > -void *module_alloc(unsigned long size) > +static struct execmem_info execmem_info __ro_after_init; > + > +struct execmem_info __init *execmem_arch_setup(void) > { > - return 
__vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, > - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, > __builtin_return_address(0)); > + execmem_info = (struct execmem_info){ > + .ranges = { > + [EXECMEM_DEFAULT] = { > + .start = MODULES_VADDR, > + .end= MODULES_END, > + .pgprot = PAGE_KERNEL, > + .alignment = 1, > + }, > + }, > + }; > + > + return &execmem_info; > } > > static void module_init_ftrace_plt(const Elf_Ehdr *hdr, > diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c > index 9a6c96014904..59225a3cf918 100644 > --- a/arch/mips/kernel/module.c > +++ b/arch/mips/kernel/module.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > #include > > struct mips_hi16 { > @@ -32,11 +33,22 @@ static LIST_HEAD(dbe_list); > static DEFINE_SPINLOCK(dbe_lock); > > #ifdef MODULES_VADDR > -void *module_alloc(unsigned long size) > +static struct execmem_info execmem_info __ro_after_init; > + > +struct execmem_info __init *execmem_arch_setup(void) > { > - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, > - GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, > - __builtin_return_address(0)); > + execmem_info = (struct execmem_info){ > + .ranges = { > + [EXECMEM_DEFAULT] = { > + .start = MODULES_VADDR, > + .end= MODULES_END, > + .pgprot = PAGE_KERNEL, > + .alignment = 1, > + }, > + }, > + }; > + > + return &execmem_info; > } > #endif > > diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c > index 9c97b7513853..0d1ee86631fc 100644 > --- a/arch/nios2/kernel/module.c > +++ b/arch/nios2/kernel/module.c > @@ -18,15 +18,26 @@ > #include > #include > #include > +#include > > #include > > -void *module_alloc(unsigned long size) > +static struct execmem_info execmem_info __ro_after_init; > + > +struct execmem_info __init *execmem_arch_setup(void) > { > - return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END, > - GFP_KERNEL, PAGE_KERNEL_EXEC, > - VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, > - __builtin_return_address(0)); > + 
execmem_info = (struct execmem_info){ > + .ranges = { > + [EXECMEM_DEFAULT] = { > + .start = MODULES_VADDR,
Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files
Hi Vignesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kees/for-next/execve]
[also build test WARNING on tip/x86/core kees/for-next/pstore kees/for-next/kspp linus/master v6.9-rc7 next-20240507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Vignesh-Balasubramanian/x86-elf-Add-a-new-note-section-containing-Xfeatures-information-to-x86-core-files/20240507-175615
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
patch link:    https://lore.kernel.org/r/20240507095330.2674-2-vigbalas%40amd.com
patch subject: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files
config: i386-buildonly-randconfig-006-20240508 (https://download.01.org/0day-ci/archive/20240508/202405080715.hyq1ae9v-...@intel.com/config)
compiler: gcc-8 (Ubuntu 8.4.0-3ubuntu2) 8.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240508/202405080715.hyq1ae9v-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new
version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202405080715.hyq1ae9v-...@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/fpu/xstate.c:91:19: warning: 'owner_name' defined but not used [-Wunused-const-variable=]
    static const char owner_name[] = "LINUX";
                      ^~
   In file included from include/linux/string.h:369,
                    from arch/x86/include/asm/page_32.h:18,
                    from arch/x86/include/asm/page.h:14,
                    from arch/x86/include/asm/processor.h:20,
                    from arch/x86/include/asm/timex.h:5,
                    from include/linux/timex.h:67,
                    from include/linux/time32.h:13,
                    from include/linux/time.h:60,
                    from include/linux/compat.h:10,
                    from arch/x86/kernel/fpu/xstate.c:8:
   In function 'fortify_memcpy_chk',
       inlined from 'membuf_write.isra.6' at include/linux/regset.h:42:3,
       inlined from '__copy_xstate_to_uabi_buf' at arch/x86/kernel/fpu/xstate.c:1049:2:
   include/linux/fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()?
       __read_overflow2_field(q_size_field, size);
       ^~

vim +/owner_name +91 arch/x86/kernel/fpu/xstate.c

    90	
  > 91	static const char owner_name[] = "LINUX";
    92	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH RESEND v8 05/16] module: make module_memory_{alloc,free} more self-contained
On Sun, 5 May 2024 19:06:17 +0300 Mike Rapoport wrote: > From: "Mike Rapoport (IBM)" > > Move the logic related to the memory allocation and freeing into > module_memory_alloc() and module_memory_free(). > Looks good to me. Reviewed-by: Masami Hiramatsu (Google) Thanks, > Signed-off-by: Mike Rapoport (IBM) > Reviewed-by: Philippe Mathieu-Daudé > --- > kernel/module/main.c | 64 +++- > 1 file changed, 39 insertions(+), 25 deletions(-) > > diff --git a/kernel/module/main.c b/kernel/module/main.c > index e1e8a7a9d6c1..5b82b069e0d3 100644 > --- a/kernel/module/main.c > +++ b/kernel/module/main.c > @@ -1203,15 +1203,44 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type > type) > mod_mem_type_is_core_data(type); > } > > -static void *module_memory_alloc(unsigned int size, enum mod_mem_type type) > +static int module_memory_alloc(struct module *mod, enum mod_mem_type type) > { > + unsigned int size = PAGE_ALIGN(mod->mem[type].size); > + void *ptr; > + > + mod->mem[type].size = size; > + > if (mod_mem_use_vmalloc(type)) > - return vzalloc(size); > - return module_alloc(size); > + ptr = vmalloc(size); > + else > + ptr = module_alloc(size); > + > + if (!ptr) > + return -ENOMEM; > + > + /* > + * The pointer to these blocks of memory are stored on the module > + * structure and we keep that around so long as the module is > + * around. We only free that memory when we unload the module. > + * Just mark them as not being a leak then. The .init* ELF > + * sections *do* get freed after boot so we *could* treat them > + * slightly differently with kmemleak_ignore() and only grey > + * them out as they work as typical memory allocations which > + * *do* eventually get freed, but let's just keep things simple > + * and avoid *any* false positives. 
> + */ > + kmemleak_not_leak(ptr); > + > + memset(ptr, 0, size); > + mod->mem[type].base = ptr; > + > + return 0; > } > > -static void module_memory_free(void *ptr, enum mod_mem_type type) > +static void module_memory_free(struct module *mod, enum mod_mem_type type) > { > + void *ptr = mod->mem[type].base; > + > if (mod_mem_use_vmalloc(type)) > vfree(ptr); > else > @@ -1229,12 +1258,12 @@ static void free_mod_mem(struct module *mod) > /* Free lock-classes; relies on the preceding sync_rcu(). */ > lockdep_free_key_range(mod_mem->base, mod_mem->size); > if (mod_mem->size) > - module_memory_free(mod_mem->base, type); > + module_memory_free(mod, type); > } > > /* MOD_DATA hosts mod, so free it at last */ > lockdep_free_key_range(mod->mem[MOD_DATA].base, > mod->mem[MOD_DATA].size); > - module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA); > + module_memory_free(mod, MOD_DATA); > } > > /* Free a module, remove from lists, etc. */ > @@ -2225,7 +2254,6 @@ static int find_module_sections(struct module *mod, > struct load_info *info) > static int move_module(struct module *mod, struct load_info *info) > { > int i; > - void *ptr; > enum mod_mem_type t = 0; > int ret = -ENOMEM; > > @@ -2234,26 +2262,12 @@ static int move_module(struct module *mod, struct > load_info *info) > mod->mem[type].base = NULL; > continue; > } > - mod->mem[type].size = PAGE_ALIGN(mod->mem[type].size); > - ptr = module_memory_alloc(mod->mem[type].size, type); > - /* > - * The pointer to these blocks of memory are stored on the > module > - * structure and we keep that around so long as the module is > - * around. We only free that memory when we unload the > module. > - * Just mark them as not being a leak then. 
The .init* ELF > - * sections *do* get freed after boot so we *could* treat > them > - * slightly differently with kmemleak_ignore() and only grey > - * them out as they work as typical memory allocations which > - * *do* eventually get freed, but let's just keep things > simple > - * and avoid *any* false positives. > - */ > - kmemleak_not_leak(ptr); > - if (!ptr) { > + > + ret = module_memory_alloc(mod, type); > + if (ret) { > t = type; > goto out_enomem; > } > - memset(ptr, 0, mod->mem[type].size); > - mod->mem[type].base = ptr; > } > > /* Transfer each section which specifies SHF_ALLOC */ > @@ -2296,7 +2310,7 @@ static int move_module(struct module *mod, struct > load_info *info) > return 0; > out_enomem: >
[PATCH bpf] powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
The Linux Kernel Memory Model [1][2] requires RMW operations that have a
return value to be fully ordered.

BPF atomic operations with BPF_FETCH (including BPF_XCHG and
BPF_CMPXCHG) return a value back, so they need to be JITed to fully
ordered operations. POWERPC currently emits relaxed operations for
these.

We can show this by running the following litmus test:

  PPC SB+atomic_add+fetch
  {
  0:r0=x;  (* dst reg assuming offset is 0 *)
  0:r1=2;  (* src reg *)
  0:r2=1;
  0:r4=y;  (* P0 writes to this, P1 reads this *)
  0:r5=z;  (* P1 writes to this, P0 reads this *)
  0:r6=0;
  1:r2=1;
  1:r4=y;
  1:r5=z;
  }
  P0                     | P1            ;
  stw r2, 0(r4)          | stw r2,0(r5)  ;
                         |               ;
  loop:lwarx r3, r6, r0  |               ;
  mr r8, r3              |               ;
  add r3, r3, r1         | sync          ;
  stwcx. r3, r6, r0      |               ;
  bne loop               |               ;
  mr r1, r8              |               ;
                         |               ;
  lwa r7, 0(r5)          | lwa r7,0(r4)  ;

  ~exists(0:r7=0 /\ 1:r7=0)

  Witnesses
  Positive: 9 Negative: 3
  Condition ~exists (0:r7=0 /\ 1:r7=0)
  Observation SB+atomic_add+fetch Sometimes 3 9

This test shows that the older store in P0 is reordered with a newer
load to a different address, even though there is an RMW operation with
fetch between them.
Adding a sync before and after the RMW fixes the issue:

  Witnesses
  Positive: 9 Negative: 0
  Condition ~exists (0:r7=0 /\ 1:r7=0)
  Observation SB+atomic_add+fetch Never 0 9

[1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
[2] https://www.kernel.org/doc/Documentation/atomic_t.txt

Fixes: 65112709115f ("powerpc/bpf/64: add support for BPF_ATOMIC bitwise operations")
Signed-off-by: Puranjay Mohan
---
 arch/powerpc/net/bpf_jit_comp32.c | 11 +++
 arch/powerpc/net/bpf_jit_comp64.c | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 2f39c50ca729..b635e5344e8a 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -853,6 +853,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		/* Get offset into TMP_REG */
 		EMIT(PPC_RAW_LI(tmp_reg, off));
 		tmp_idx = ctx->idx * 4;
+		/*
+		 * Enforce full ordering for operations with BPF_FETCH by emitting a 'sync'
+		 * before and after the operation.
+		 *
+		 * This is a requirement in the Linux Kernel Memory Model.
+		 * See __cmpxchg_u64() in asm/cmpxchg.h as an example.
+		 */
+		if (imm & BPF_FETCH)
+			EMIT(PPC_RAW_SYNC());
 		/* load value from memory into r0 */
 		EMIT(PPC_RAW_LWARX(_R0, tmp_reg, dst_reg, 0));

@@ -905,6 +914,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code

 		/* For the BPF_FETCH variant, get old data into src_reg */
 		if (imm & BPF_FETCH) {
+			/* Emit 'sync' to enforce full ordering */
+			EMIT(PPC_RAW_SYNC());
 			EMIT(PPC_RAW_MR(ret_reg, ax_reg));
 			if (!fp->aux->verifier_zext)
 				EMIT(PPC_RAW_LI(ret_reg - 1, 0)); /* higher 32-bit */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 79f23974a320..27026f19605d 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -804,6 +815,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		/* Get offset into TMP_REG_1 */
 		EMIT(PPC_RAW_LI(tmp1_reg, off));
 		tmp_idx = ctx->idx * 4;
+		/*
+		 * Enforce full ordering for operations with BPF_FETCH by emitting a 'sync'
+		 * before and after the operation.
+		 *
+		 * This is a requirement in the Linux Kernel Memory Model.
+		 * See __cmpxchg_u64() in asm/cmpxchg.h as an example.
+		 */
+		if (imm & BPF_FETCH)
+			EMIT(PPC_RAW_SYNC());
 		/* load value from memory into TMP_REG_2 */
 		if (size == BPF_DW)
 			EMIT(PPC_RAW_LDARX(tmp2_reg, tmp1_reg, dst_reg, 0));
@@ -865,6 +874,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, u32 *fimage, struct code
 		PPC_BCC_SHORT(COND_NE, tmp_idx);
 		if (imm & BPF_FETCH) {
+
Re: [PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue
On Tue, May 07, 2024 at 07:01:11PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
>
> This patch converts drivers/ethernet/* from tasklet to BH workqueue.

I doubt you're going to get many comments on this patch, being so large
and spread across all drivers. I'm not going to bother trying to edit
this down to something more sensible, I'll just plonk my comment here.

For the mvpp2 driver, you're only updating a comment - and looking at
it, the comment no longer reflects the code. It doesn't make use of
tasklets at all. That makes the comment wrong whether or not it's
updated. So I suggest rather than doing a search and replace for
"tasklet" to "BH blahblah" (sorry, I don't remember what you replaced
it with) just get rid of that bit of the comment.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue
On Tue, May 7, 2024 at 12:23 PM Russell King (Oracle) wrote:
>
> On Tue, May 07, 2024 at 07:01:11PM +, Allen Pais wrote:
> > The only generic interface to execute asynchronously in the BH context is
> > tasklet; however, it's marked deprecated and has some design flaws. To
> > replace tasklets, BH workqueue support was recently added. A BH workqueue
> > behaves similarly to regular workqueues except that the queued work items
> > are executed in the BH context.
> >
> > This patch converts drivers/ethernet/* from tasklet to BH workqueue.
>
> I doubt you're going to get many comments on this patch, being so large
> and spread across all drivers. I'm not going to bother trying to edit
> this down to something more sensible, I'll just plonk my comment here.
>
> For the mvpp2 driver, you're only updating a comment - and looking at
> it, the comment no longer reflects the code. It doesn't make use of
> tasklets at all. That makes the comment wrong whether or not it's
> updated. So I suggest rather than doing a search and replace for
> "tasklet" to "BH blahblah" (sorry, I don't remember what you replaced
> it with) just get rid of that bit of the comment.
>

Thank you Russell. I will get rid of the comment.

If it helps, I can create a patch for each driver. We did that in the
past; with this series, I thought it would be easier to apply one patch.

Thanks,
- Allen
[PATCH 1/1] [RFC] ethernet: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/ethernet/* from tasklet to BH workqueue. Based on the work done by Tejun Heo Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git disable_work-v1 Signed-off-by: Allen Pais --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx5/cq.c | 2 +- drivers/net/ethernet/alteon/acenic.c | 26 +++ drivers/net/ethernet/alteon/acenic.h | 7 +- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 30 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c | 16 ++--- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 16 ++--- drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 4 +- drivers/net/ethernet/amd/xgbe/xgbe.h | 11 +-- drivers/net/ethernet/broadcom/cnic.c | 19 ++--- drivers/net/ethernet/broadcom/cnic.h | 2 +- drivers/net/ethernet/cadence/macb.h | 3 +- drivers/net/ethernet/cadence/macb_main.c | 10 +-- .../net/ethernet/cavium/liquidio/lio_core.c | 4 +- .../net/ethernet/cavium/liquidio/lio_main.c | 25 +++ .../ethernet/cavium/liquidio/lio_vf_main.c| 10 +-- .../ethernet/cavium/liquidio/octeon_droq.c| 4 +- .../ethernet/cavium/liquidio/octeon_main.h| 5 +- .../net/ethernet/cavium/octeon/octeon_mgmt.c | 12 ++-- drivers/net/ethernet/cavium/thunder/nic.h | 5 +- .../net/ethernet/cavium/thunder/nicvf_main.c | 24 +++ .../ethernet/cavium/thunder/nicvf_queues.c| 5 +- .../ethernet/cavium/thunder/nicvf_queues.h| 3 +- drivers/net/ethernet/chelsio/cxgb/sge.c | 19 ++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h| 9 +-- .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- .../ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c | 4 +- .../net/ethernet/chelsio/cxgb4/cxgb4_uld.c| 2 +- drivers/net/ethernet/chelsio/cxgb4/sge.c | 41 +-- drivers/net/ethernet/chelsio/cxgb4vf/sge.c| 6 
+- drivers/net/ethernet/dlink/sundance.c | 41 +-- .../net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 2 +- .../net/ethernet/huawei/hinic/hinic_hw_eqs.c | 17 +++-- .../net/ethernet/huawei/hinic/hinic_hw_eqs.h | 2 +- drivers/net/ethernet/ibm/ehea/ehea.h | 3 +- drivers/net/ethernet/ibm/ehea/ehea_main.c | 14 ++-- drivers/net/ethernet/ibm/ibmvnic.c| 24 +++ drivers/net/ethernet/ibm/ibmvnic.h| 2 +- drivers/net/ethernet/jme.c| 72 +-- drivers/net/ethernet/jme.h| 9 +-- .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +- drivers/net/ethernet/marvell/skge.c | 12 ++-- drivers/net/ethernet/marvell/skge.h | 3 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c| 12 ++-- drivers/net/ethernet/mediatek/mtk_wed_wo.h| 3 +- drivers/net/ethernet/mellanox/mlx4/cq.c | 42 +-- drivers/net/ethernet/mellanox/mlx4/eq.c | 10 +-- drivers/net/ethernet/mellanox/mlx4/mlx4.h | 11 +-- drivers/net/ethernet/mellanox/mlx5/core/cq.c | 38 +- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 ++-- .../ethernet/mellanox/mlx5/core/fpga/conn.c | 15 ++-- .../ethernet/mellanox/mlx5/core/fpga/conn.h | 3 +- .../net/ethernet/mellanox/mlx5/core/lib/eq.h | 11 +-- drivers/net/ethernet/mellanox/mlxsw/pci.c | 29 drivers/net/ethernet/micrel/ks8842.c | 29 drivers/net/ethernet/micrel/ksz884x.c | 37 +- drivers/net/ethernet/microchip/lan743x_ptp.c | 2 +- drivers/net/ethernet/natsemi/ns83820.c| 10 +-- drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 7 +- .../net/ethernet/netronome/nfp/nfd3/nfd3.h| 2 +- drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 6 +- .../net/ethernet/netronome/nfp/nfdk/nfdk.h| 3 +- drivers/net/ethernet/netronome/nfp/nfp_net.h | 4 +- .../ethernet/netronome/nfp/nfp_net_common.c | 12 ++-- .../net/ethernet/netronome/nfp/nfp_net_dp.h | 4 +- drivers/net/ethernet/ni/nixge.c | 19 ++--- drivers/net/ethernet/qlogic/qed/qed.h | 2 +- drivers/net/ethernet/qlogic/qed/qed_int.c | 6 +- drivers/net/ethernet/qlogic/qed/qed_int.h | 4 +- drivers/net/ethernet/qlogic/qed/qed_main.c| 20 +++--- drivers/net/ethernet/sfc/falcon/farch.c | 4 +- 
drivers/net/ethernet/sfc/falcon/net_driver.h | 2 +- drivers/net/ethernet/sfc/falcon/selftest.c| 2 +- drivers/net/ethernet/sfc/net_driver.h | 2 +- drivers/net/ethernet/sfc/selftest.c | 2 +- drivers/net/ethernet/sfc/siena/farch.c| 4 +- drivers/net/ethernet/sfc/siena/net_driver.h | 2 +- drivers/net/ethernet/sfc/siena/selftest.c | 2 +- drivers/net/ether
[PATCH 0/1] Convert tasklets to BH workqueues in ethernet drivers
This series focuses on converting the existing implementation of tasklets to bottom half (BH) workqueues across various Ethernet drivers under drivers/net/ethernet/*. Impact: The conversion is expected to maintain or improve the performance of the affected drivers. It also improves the maintainability and readability of the driver code. Testing: - Conducted standard network throughput and latency benchmarks to ensure performance parity or improvement. - Ran kernel regression tests to verify that changes do not introduce new issues. I appreciate your review and feedback on this patch series. And additional tested would be really helpful. Allen Pais (1): [RFC] ethernet: Convert from tasklet to BH workqueue drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx5/cq.c | 2 +- drivers/net/ethernet/alteon/acenic.c | 26 +++ drivers/net/ethernet/alteon/acenic.h | 7 +- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 30 drivers/net/ethernet/amd/xgbe/xgbe-i2c.c | 16 ++--- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 16 ++--- drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 4 +- drivers/net/ethernet/amd/xgbe/xgbe.h | 11 +-- drivers/net/ethernet/broadcom/cnic.c | 19 ++--- drivers/net/ethernet/broadcom/cnic.h | 2 +- drivers/net/ethernet/cadence/macb.h | 3 +- drivers/net/ethernet/cadence/macb_main.c | 10 +-- .../net/ethernet/cavium/liquidio/lio_core.c | 4 +- .../net/ethernet/cavium/liquidio/lio_main.c | 25 +++ .../ethernet/cavium/liquidio/lio_vf_main.c| 10 +-- .../ethernet/cavium/liquidio/octeon_droq.c| 4 +- .../ethernet/cavium/liquidio/octeon_main.h| 5 +- .../net/ethernet/cavium/octeon/octeon_mgmt.c | 12 ++-- drivers/net/ethernet/cavium/thunder/nic.h | 5 +- .../net/ethernet/cavium/thunder/nicvf_main.c | 24 +++ .../ethernet/cavium/thunder/nicvf_queues.c| 5 +- .../ethernet/cavium/thunder/nicvf_queues.h| 3 +- drivers/net/ethernet/chelsio/cxgb/sge.c | 19 ++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h| 9 +-- .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- 
.../ethernet/chelsio/cxgb4/cxgb4_tc_mqprio.c | 4 +- .../net/ethernet/chelsio/cxgb4/cxgb4_uld.c| 2 +- drivers/net/ethernet/chelsio/cxgb4/sge.c | 41 +-- drivers/net/ethernet/chelsio/cxgb4vf/sge.c| 6 +- drivers/net/ethernet/dlink/sundance.c | 41 +-- .../net/ethernet/huawei/hinic/hinic_hw_cmdq.c | 2 +- .../net/ethernet/huawei/hinic/hinic_hw_eqs.c | 17 +++-- .../net/ethernet/huawei/hinic/hinic_hw_eqs.h | 2 +- drivers/net/ethernet/ibm/ehea/ehea.h | 3 +- drivers/net/ethernet/ibm/ehea/ehea_main.c | 14 ++-- drivers/net/ethernet/ibm/ibmvnic.c| 24 +++ drivers/net/ethernet/ibm/ibmvnic.h| 2 +- drivers/net/ethernet/jme.c| 72 +-- drivers/net/ethernet/jme.h| 9 +-- .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 2 +- drivers/net/ethernet/marvell/skge.c | 12 ++-- drivers/net/ethernet/marvell/skge.h | 3 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c| 12 ++-- drivers/net/ethernet/mediatek/mtk_wed_wo.h| 3 +- drivers/net/ethernet/mellanox/mlx4/cq.c | 42 +-- drivers/net/ethernet/mellanox/mlx4/eq.c | 10 +-- drivers/net/ethernet/mellanox/mlx4/mlx4.h | 11 +-- drivers/net/ethernet/mellanox/mlx5/core/cq.c | 38 +- drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 ++-- .../ethernet/mellanox/mlx5/core/fpga/conn.c | 15 ++-- .../ethernet/mellanox/mlx5/core/fpga/conn.h | 3 +- .../net/ethernet/mellanox/mlx5/core/lib/eq.h | 11 +-- drivers/net/ethernet/mellanox/mlxsw/pci.c | 29 drivers/net/ethernet/micrel/ks8842.c | 29 drivers/net/ethernet/micrel/ksz884x.c | 37 +- drivers/net/ethernet/microchip/lan743x_ptp.c | 2 +- drivers/net/ethernet/natsemi/ns83820.c| 10 +-- drivers/net/ethernet/netronome/nfp/nfd3/dp.c | 7 +- .../net/ethernet/netronome/nfp/nfd3/nfd3.h| 2 +- drivers/net/ethernet/netronome/nfp/nfdk/dp.c | 6 +- .../net/ethernet/netronome/nfp/nfdk/nfdk.h| 3 +- drivers/net/ethernet/netronome/nfp/nfp_net.h | 4 +- .../ethernet/netronome/nfp/nfp_net_common.c | 12 ++-- .../net/ethernet/netronome/nfp/nfp_net_dp.h | 4 +- drivers/net/ethernet/ni/nixge.c | 19 ++--- drivers/net/ethernet/qlogic/qed/qed.h | 2 +- 
drivers/net/ethernet/qlogic/qed/qed_int.c | 6 +- drivers/net/ethernet/qlogic/qed/qed_int.h | 4 +- drivers/net/ethernet/qlogic/qed/qed_main.c| 20 +++--- drivers/net/ethernet/sfc/falcon/farch.c | 4 +- drivers/net/ethernet/sfc/falcon/net_driver.h | 2 +- drivers/net/ethernet/sfc/falcon/selftest.c| 2 +- drivers/net/ethernet/sfc/net_driver.h | 2 +- drivers/net/ethernet/sfc/selftest.c | 2 +- dr
Re: [PATCH v3] kprobe/ftrace: bail out if ftrace was killed
Christophe Leroy writes:
> On 01/05/2024 at 18:29, Stephen Brennan wrote:
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>>
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>>
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>>
>>    sudo perf probe --add commit_creds
>>    sudo perf trace -e probe:commit_creds
>>    # In another terminal
>>    make
>>    sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>    # Back to perf terminal
>>    # ctrl-c
>>    sudo perf probe --del commit_creds
>>
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>>
>> Signed-off-by: Stephen Brennan
>> ---
>> Changes in v3:
>>    Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>    variable and check it directly in the kprobe handlers.
>
> Isn't it safer to provide a function rather than direct access to a
> variable?

Is the concern that other code could modify this variable? If so, then I
suppose the function call is safer. But the variable is not exported and I
think built-in code can be trusted not to muck with it. Maybe I'm missing
your point about safety though?
> By the way, wouldn't it be more performant to use a static branch (jump
> label)?

I agree with Steven's concern that text modification would unfortunately
not be a good way to handle an error in text modification. Especially, I
believe there could be deadlock risks, as static key enablement requires
taking the text_mutex and the jump_label_mutex. I'd be concerned that the
text_mutex could already be held in some situations where ftrace_kill()
is called. But I'm not certain about that.

Thanks for taking a look!
Stephen
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
Hi Jason,

On 5/6/24 23:13, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running on
>> downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
>
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\

Though upstream has been broken since 2018 for pSeries, the breaking
patches trickled into downstream distro kernels only in the last few
years. The legacy workloads that were running on PowerNV with these
downstream distros are now broken on the pSeries logical partitions
without the fixes in this series. I firmly believe the refactoring in
this patch series is a step in that direction.

> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.
>
> Thanks,

We are working on it.

Regards,
Shivaprasad

> Jason
Re: [kvm-unit-tests PATCH v9 07/31] scripts: allow machine option to be specified in unittests.cfg
On 04/05/2024 14.28, Nicholas Piggin wrote: This allows different machines with different requirements to be supported by run_tests.sh, similarly to how different accelerators are handled. Acked-by: Thomas Huth Acked-by: Andrew Jones Signed-off-by: Nicholas Piggin --- docs/unittests.txt | 7 +++ scripts/common.bash | 8 ++-- scripts/runtime.bash | 16 3 files changed, 25 insertions(+), 6 deletions(-) diff --git a/docs/unittests.txt b/docs/unittests.txt index 7cf2c55ad..6449efd78 100644 --- a/docs/unittests.txt +++ b/docs/unittests.txt @@ -42,6 +42,13 @@ For / directories that support multiple architectures, this restricts the test to the specified arch. By default, the test will run on any architecture. +machine +--- +For those architectures that support multiple machine types, this restricts +the test to the specified machine. By default, the test will run on +any machine type. (Note, the machine can be specified with the MACHINE= +environment variable, and defaults to the architecture's default.) 
+ smp --- smp = diff --git a/scripts/common.bash b/scripts/common.bash index 5e9ad53e2..3aa557c8c 100644 --- a/scripts/common.bash +++ b/scripts/common.bash @@ -10,6 +10,7 @@ function for_each_unittest() local opts local groups local arch + local machine local check local accel local timeout @@ -21,7 +22,7 @@ function for_each_unittest() if [[ "$line" =~ ^\[(.*)\]$ ]]; then rematch=${BASH_REMATCH[1]} if [ -n "${testname}" ]; then - $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout" + $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$machine" "$check" "$accel" "$timeout" fi testname=$rematch smp=1 @@ -29,6 +30,7 @@ function for_each_unittest() opts="" groups="" arch="" + machine="" check="" accel="" timeout="" @@ -58,6 +60,8 @@ function for_each_unittest() groups=${BASH_REMATCH[1]} elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then arch=${BASH_REMATCH[1]} + elif [[ $line =~ ^machine\ *=\ *(.*)$ ]]; then + machine=${BASH_REMATCH[1]} elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then check=${BASH_REMATCH[1]} elif [[ $line =~ ^accel\ *=\ *(.*)$ ]]; then @@ -67,7 +71,7 @@ function for_each_unittest() fi done if [ -n "${testname}" ]; then - $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout" + $(arch_cmd) "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$machine" "$check" "$accel" "$timeout" fi exec {fd}<&- } diff --git a/scripts/runtime.bash b/scripts/runtime.bash index 177b62166..0c96d6ea2 100644 --- a/scripts/runtime.bash +++ b/scripts/runtime.bash @@ -32,7 +32,7 @@ premature_failure() get_cmdline() { local kernel=$1 -echo "TESTNAME=$testname TIMEOUT=$timeout ACCEL=$accel $RUNTIME_arch_run $kernel -smp $smp $opts" +echo "TESTNAME=$testname TIMEOUT=$timeout MACHINE=$machine ACCEL=$accel $RUNTIME_arch_run $kernel -smp $smp $opts" } skip_nodefault() @@ -80,9 +80,10 @@ function run() local kernel="$4" local opts="$5" local arch="$6" -local 
check="${CHECK:-$7}" -local accel="$8" -local timeout="${9:-$TIMEOUT}" # unittests.cfg overrides the default +local machine="$7" +local check="${CHECK:-$8}" +local accel="$9" +local timeout="${10:-$TIMEOUT}" # unittests.cfg overrides the default if [ "${CONFIG_EFI}" == "y" ]; then kernel=${kernel/%.flat/.efi} @@ -116,6 +117,13 @@ function run() return 2 fi +if [ -n "$machine" ] && [ -n "$MACHINE" ] && [ "$machine" != "$MACHINE" ]; then +print_result "SKIP" $testname "" "$machine only" +return 2 +elif [ -n "$MACHINE" ]; then +machine="$MACHINE" +fi + if [ -n "$accel" ] && [ -n "$ACCEL" ] && [ "$accel" != "$ACCEL" ]; then print_result "SKIP" $testname "" "$accel only, but ACCEL=$ACCEL" return 2 For some reasons that I don't quite understand yet, this patch causes the "sieve" test to always timeout on the s390x runner, see e.g.: https://gitlab.com/thuth/kvm-unit-tests/-/jobs/6798954987 Everything is fine in the previous patches (I pushed now the previous 5 patches to the repo): https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/pipelines/1281919104 Could it be that he TIME
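The machine-matching rule the patch adds to runtime.bash can be pulled out into a small, testable function. This is a hypothetical restatement for readability, not code from the patch: `resolve_machine` takes the `machine=` value from unittests.cfg and the `MACHINE=` environment override, and prints either `SKIP` or the machine the test will run with (empty meaning QEMU's default for the architecture).

```shell
# Sketch of the skip/override logic added to runtime.bash's run().
resolve_machine() {
    local cfg_machine="$1"   # machine= line from unittests.cfg ("" if unset)
    local env_machine="$2"   # MACHINE= environment variable ("" if unset)

    if [ -n "$cfg_machine" ] && [ -n "$env_machine" ] && \
       [ "$cfg_machine" != "$env_machine" ]; then
        echo "SKIP"          # test restricted to a different machine type
    elif [ -n "$env_machine" ]; then
        echo "$env_machine"  # user override wins when compatible
    else
        echo "$cfg_machine"  # cfg restriction, or default when empty
    fi
}

resolve_machine pseries ""        # -> pseries
resolve_machine pseries powernv   # -> SKIP
resolve_machine "" powernv        # -> powernv
```

Note the asymmetry: a cfg-only restriction runs the test on that machine, an env-only setting applies to every test, and only a mismatch between the two produces a SKIP.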
Re: [PATCH V2 3/9] tools/perf: Fix a comment about multi_regs in extract_reg_offset function
On Mon, May 06, 2024 at 09:40:15PM -0700, Namhyung Kim wrote: > On Mon, May 6, 2024 at 5:19 AM Athira Rajeev > wrote: > > > > Fix a comment in function which explains how multi_regs field gets set > > for an instruction. In the example, "mov %rsi, 8(%rbx,%rcx,4)", the > > comment mistakenly referred to "dst_multi_regs = 0". Correct it to use > > "src_multi_regs = 0" > > > > Signed-off-by: Athira Rajeev > > Acked-by: Namhyung Kim Cherry picked this one into perf-tools-next. Thanks, - Arnaldo
Re: [PATCH v18 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug
On Tue, 26 Mar 2024 11:24:07 +0530, Sourabh Jain wrote: > Commit 247262756121 ("crash: add generic infrastructure for crash > hotplug support") added a generic infrastructure that allows > architectures to selectively update the kdump image component during CPU > or memory add/remove events within the kernel itself. > > This patch series adds crash hotplug handler for PowerPC and enable > support to update the kdump image on CPU/Memory add/remove events. > > [...] Applied to powerpc/topic/kdump-hotplug. [1/6] crash: forward memory_notify arg to arch crash hotplug handler https://git.kernel.org/powerpc/c/118005713e35a1893c6ee47ab2926cca277737de [2/6] crash: add a new kexec flag for hotplug support https://git.kernel.org/powerpc/c/79365026f86948b52c3cb7bf099dded92c559b4c [3/6] powerpc/kexec: move *_memory_ranges functions to ranges.c https://git.kernel.org/powerpc/c/f5f0da5a7b18fab383bac92044fd8f4f288c9d38 [4/6] PowerPC/kexec: make the update_cpus_node() function public https://git.kernel.org/powerpc/c/0857beff9c1ec8bb421a8b7a721da0f34cc886c0 [5/6] powerpc/crash: add crash CPU hotplug support https://git.kernel.org/powerpc/c/b741092d59761b98781fcb4f3f521312ed8d5006 [6/6] powerpc/crash: add crash memory hotplug support https://git.kernel.org/powerpc/c/849599b702ef8977fcd5b2f27c61ef773c42bb88 cheers
Re: [PATCH] powerpc/crash: remove unnecessary NULL check before kvfree()
On Thu, 02 May 2024 23:50:40 +0530, Sourabh Jain wrote: > Fix the following coccicheck build warning: > > arch/powerpc/kexec/crash.c:488:2-8: WARNING: NULL check before some > freeing functions is not needed. > > Applied to powerpc/topic/kdump-hotplug. [1/1] powerpc/crash: remove unnecessary NULL check before kvfree() https://git.kernel.org/powerpc/c/9803af291162dbca4b9773586a3f5c392f0dd974 cheers
Re: [PATCH 00/13] ASoC: Use snd_soc_substream_to_rtd() for accessing private_data
On Tue, 30 Apr 2024 16:02:09 +0200, Krzysztof Kozlowski wrote: > Do not open-code snd_soc_substream_to_rtd() when accessing > snd_pcm_substream->private_data. This makes code more consistent with > rest of ASoC and allows in the future to move the field to any other > place or add additional checks in snd_soc_substream_to_rtd(). > > Best regards, > Krzysztof > > [...] Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next Thanks! [01/13] ASoC: qcom: Use snd_soc_substream_to_rtd() for accessing private_data commit: 77678a25d1ecf70dc1d7ea2c0ab7609af15b83d3 [02/13] ASoC: tegra: Use snd_soc_substream_to_rtd() for accessing private_data commit: 3beb985abbf29e660edd1708f8a120ae9bbbddc3 [03/13] ASoC: ti: Use snd_soc_substream_to_rtd() for accessing private_data commit: 72a666f47f958a57db16b6bdd9ed385674069693 [04/13] ASoC: arm: Use snd_soc_substream_to_rtd() for accessing private_data commit: a80f2f8443a4ae10c568566f57fe704ea52c5bdb [05/13] ASoC: amd: Use snd_soc_substream_to_rtd() for accessing private_data commit: a84d84077512fc64cf1fc2292a3638690a026737 [06/13] ASoC: fsl: Use snd_soc_substream_to_rtd() for accessing private_data commit: b695d8be5bba9897ee670ec102ca608ecaf625c4 [07/13] ASoC: img: Use snd_soc_substream_to_rtd() for accessing private_data commit: 3b62178720594e08bdf8a87515ccca0328fe41fe [08/13] ASoC: kirkwood: Use snd_soc_substream_to_rtd() for accessing private_data commit: fe42c3b75b93dee9a4010e2297f1783e48684af7 [09/13] ASoC: loongson: Use snd_soc_substream_to_rtd() for accessing private_data commit: ffad75cebb865fef6f8e40f921c08c79a8faf7e3 [10/13] ASoC: mediatek: Use snd_soc_substream_to_rtd() for accessing private_data commit: 410a45140fb76709cf2bbad84bc8a731acf632c8 [11/13] ASoC: meson: Use snd_soc_substream_to_rtd() for accessing private_data commit: 22f5680a9cbc7388f97e5386c15c325d6961b958 [12/13] ASoC: samsung: Use snd_soc_substream_to_rtd() for accessing private_data commit: 3e726593107d134221f666b4f2be612b278c3ddb 
[13/13] ASoC: sunxi: Use snd_soc_substream_to_rtd() for accessing private_data commit: 47aa51677c975a5f66bc93d1c527e8878cf34d6c All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark
[PATCH] macintosh/ams: Fix unused variable warning
If both CONFIG_SENSORS_AMS_PMU and CONFIG_SENSORS_AMS_I2C are unset, there is an unused variable warning in the ams driver: drivers/macintosh/ams/ams-core.c: In function 'ams_init': drivers/macintosh/ams/ams-core.c:181:29: warning: unused variable 'np' 181 | struct device_node *np; The driver needs at least one of the configs enabled in order to actually function. So fix the compiler warning by ensuring at least one of the configs is enabled. Suggested-by: Christophe Leroy Signed-off-by: Michael Ellerman --- drivers/macintosh/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/macintosh/Kconfig b/drivers/macintosh/Kconfig index a0e717a986dc..fb38f68f 100644 --- a/drivers/macintosh/Kconfig +++ b/drivers/macintosh/Kconfig @@ -262,7 +262,7 @@ config SENSORS_AMS will be called ams. config SENSORS_AMS_PMU - bool "PMU variant" + bool "PMU variant" if SENSORS_AMS_I2C depends on SENSORS_AMS && ADB_PMU default y help -- 2.45.0
Re: [kvm-unit-tests PATCH v9 17/31] powerpc: Add cpu_relax
On 04/05/2024 14.28, Nicholas Piggin wrote: Add a cpu_relax variant that uses SMT priority nop instructions like Linux. This was split out of the SMP patch because it affects the sprs test case. Signed-off-by: Nicholas Piggin --- lib/ppc64/asm/barrier.h | 1 + powerpc/sprs.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) Reviewed-by: Thomas Huth
Re: [PATCH 1/2] radix/kfence: map __kfence_pool at page granularity
Le 24/04/2024 à 13:09, Hari Bathini a écrit : > When KFENCE is enabled, total system memory is mapped at page level > granularity. But in radix MMU mode, ~3GB additional memory is needed > to map 100GB of system memory at page level granularity when compared > to using 2MB direct mapping. This is not desired considering KFENCE is > designed to be enabled in production kernels [1]. Also, mapping memory > allocated for KFENCE pool at page granularity seems sufficient enough > to enable KFENCE support. So, allocate __kfence_pool during bootup and > map it at page granularity instead of mapping all system memory at > page granularity. That seems to be more or less copied from ARM64 ? Is that the best approach ? Can't you implement arch_kfence_init_pool() instead ? Also, it seems your patch only addresses PPC64. The same should be done for PPC32 and there are probably parts that should be common. > > Without patch: > # cat /proc/meminfo > MemTotal: 101201920 kB > > With patch: > # cat /proc/meminfo > MemTotal: 104483904 kB > > All kfence_test.c testcases passed with this patch. > > [1] https://lore.kernel.org/all/20201103175841.3495947-2-el...@google.com/ > > Signed-off-by: Hari Bathini > --- > arch/powerpc/include/asm/kfence.h| 5 > arch/powerpc/mm/book3s64/radix_pgtable.c | 34 ++-- > arch/powerpc/mm/init_64.c| 14 ++ > 3 files changed, 45 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/include/asm/kfence.h > b/arch/powerpc/include/asm/kfence.h > index 424ceef82ae6..18ec2b06ba1e 100644 > --- a/arch/powerpc/include/asm/kfence.h > +++ b/arch/powerpc/include/asm/kfence.h > @@ -8,6 +8,7 @@ > #ifndef __ASM_POWERPC_KFENCE_H > #define __ASM_POWERPC_KFENCE_H > > +#include Why do you need that ? It can't be needed by the extern bool you are adding below. 
If it is needed by some C file that includes asm/kfence.h, it should include linux/kfence.h by itself, see for instance mm/kfence/kfence_test.c and mm/kfence/core.c > #include > #include > > @@ -15,6 +16,10 @@ > #define ARCH_FUNC_PREFIX "." > #endif > > +#ifdef CONFIG_KFENCE > +extern bool kfence_early_init; > +#endif > + > static inline bool arch_kfence_init_pool(void) > { > return true; > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 15e88f1439ec..fccbf92f279b 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > > #include > > @@ -291,9 +292,8 @@ static unsigned long next_boundary(unsigned long addr, > unsigned long end) > return end; > } > > -static int __meminit create_physical_mapping(unsigned long start, > - unsigned long end, > - int nid, pgprot_t _prot) > +static int __meminit create_physical_mapping(unsigned long start, unsigned > long end, int nid, > + pgprot_t _prot, unsigned long > mapping_sz_limit) > { > unsigned long vaddr, addr, mapping_size = 0; > bool prev_exec, exec = false; > @@ -301,7 +301,10 @@ static int __meminit create_physical_mapping(unsigned > long start, > int psize; > unsigned long max_mapping_size = memory_block_size; > > - if (debug_pagealloc_enabled_or_kfence()) > + if (mapping_sz_limit < max_mapping_size) > + max_mapping_size = mapping_sz_limit; > + > + if (debug_pagealloc_enabled()) > max_mapping_size = PAGE_SIZE; > > start = ALIGN(start, PAGE_SIZE); > @@ -358,6 +361,7 @@ static int __meminit create_physical_mapping(unsigned > long start, > > static void __init radix_init_pgtable(void) > { > + phys_addr_t kfence_pool __maybe_unused; Don't do that. Avoid using __maybe_unused. Instead, declare this var where it is used. 
> unsigned long rts_field; > phys_addr_t start, end; > u64 i; > @@ -365,6 +369,13 @@ static void __init radix_init_pgtable(void) > /* We don't support slb for radix */ > slb_set_size(0); > > +#ifdef CONFIG_KFENCE > + if (kfence_early_init) { Declare kfence_pool here. > + kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE); > + memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE); > + } > +#endif > + > /* >* Create the linear mapping >*/ > @@ -380,10 +391,18 @@ static void __init radix_init_pgtable(void) > continue; > } > > - WARN_ON(create_physical_mapping(start, end, > - -1, PAGE_KERNEL)); > + WARN_ON(create_physical_mapping(start, end, -1, PAGE_KERNEL, > ~0UL)); > } > > +#ifdef CONFIG_KFENCE > + if (kfence_early_init) { >
[powerpc:topic/kdump-hotplug] BUILD SUCCESS 9803af291162dbca4b9773586a3f5c392f0dd974
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/kdump-hotplug branch HEAD: 9803af291162dbca4b9773586a3f5c392f0dd974 powerpc/crash: remove unnecessary NULL check before kvfree() elapsed time: 1453m configs tested: 180 configs skipped: 3 The following configs have been built successfully. More configs may be tested in the coming days. tested configs: alpha allnoconfig gcc alphaallyesconfig gcc alpha defconfig gcc arc allmodconfig gcc arc allnoconfig gcc arc allyesconfig gcc arc defconfig gcc arc nsimosci_hs_defconfig gcc arc randconfig-001-20240507 gcc arc randconfig-002-20240507 gcc arm allmodconfig gcc arm allnoconfig clang arm allyesconfig gcc arm am200epdkit_defconfig gcc arm defconfig clang armdove_defconfig gcc arm randconfig-001-20240507 gcc arm randconfig-002-20240507 clang arm randconfig-003-20240507 gcc arm randconfig-004-20240507 clang arm sp7021_defconfig gcc armvt8500_v6_v7_defconfig gcc arm64allmodconfig clang arm64 allnoconfig gcc arm64 defconfig gcc arm64 randconfig-001-20240507 clang arm64 randconfig-002-20240507 clang arm64 randconfig-003-20240507 clang arm64 randconfig-004-20240507 clang csky allmodconfig gcc csky allnoconfig gcc csky allyesconfig gcc cskydefconfig gcc csky randconfig-001-20240507 gcc csky randconfig-002-20240507 gcc hexagon allmodconfig clang hexagon allnoconfig clang hexagon allyesconfig clang hexagon defconfig clang hexagon randconfig-001-20240507 clang hexagon randconfig-002-20240507 clang i386 allmodconfig gcc i386 allnoconfig gcc i386 allyesconfig gcc i386 buildonly-randconfig-001-20240506 gcc i386 buildonly-randconfig-002-20240506 clang i386 buildonly-randconfig-003-20240506 gcc i386 buildonly-randconfig-004-20240506 gcc i386 buildonly-randconfig-005-20240506 gcc i386 buildonly-randconfig-006-20240506 clang i386defconfig clang i386 randconfig-001-20240506 gcc i386 randconfig-002-20240506 clang i386 randconfig-003-20240506 gcc i386 randconfig-004-20240506 clang i386 randconfig-005-20240506 
clang i386 randconfig-006-20240506 gcc i386 randconfig-011-20240506 gcc i386 randconfig-012-20240506 gcc i386 randconfig-013-20240506 gcc i386 randconfig-014-20240506 clang i386 randconfig-015-20240506 clang i386 randconfig-016-20240506 clang loongarchallmodconfig gcc loongarch allnoconfig gcc loongarch defconfig gcc loongarch randconfig-001-20240507 gcc loongarch randconfig-002-20240507 gcc m68k allmodconfig gcc m68k allnoconfig gcc m68k allyesconfig gcc m68kdefconfig gcc m68km5407c3_defconfig gcc microblaze allmodconfig gcc microblazeallnoconfig gcc microblaze allyesconfig gcc microblaze defconfig gcc mips allnoconfig gcc mips allyesconfig gcc mips ath25_defconfig clang mipsmaltaup_xpa_defconfig gcc mipsqi_lb60_defconfig clang nios2allmodconfig gcc nios2 allnoconfig gcc nios2allyesconfig gcc nios2
Re: [kvm-unit-tests PATCH v9 12/31] powerpc: general interrupt tests
On 04/05/2024 14.28, Nicholas Piggin wrote: Add basic testing of various kinds of interrupts, machine check, page fault, illegal, decrementer, trace, syscall, etc. This has a known failure on QEMU TCG pseries machines where MSR[ME] can be incorrectly set to 0. Signed-off-by: Nicholas Piggin --- lib/powerpc/asm/processor.h | 4 + lib/powerpc/asm/reg.h | 17 ++ lib/powerpc/setup.c | 11 + lib/ppc64/asm/ptrace.h | 16 ++ powerpc/Makefile.common | 3 +- powerpc/interrupts.c| 414 powerpc/unittests.cfg | 3 + 7 files changed, 467 insertions(+), 1 deletion(-) create mode 100644 powerpc/interrupts.c Acked-by: Thomas Huth
Re: [kvm-unit-tests PATCH v9 03/31] powerpc: Mark known failing tests as kfail
On 07/05/2024 06.07, Nicholas Piggin wrote: On Mon May 6, 2024 at 5:37 PM AEST, Thomas Huth wrote: On 04/05/2024 14.28, Nicholas Piggin wrote: Mark the failing h_cede_tm and spapr_vpa tests as kfail. Signed-off-by: Nicholas Piggin --- powerpc/spapr_vpa.c | 3 ++- powerpc/tm.c| 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/powerpc/spapr_vpa.c b/powerpc/spapr_vpa.c index c2075e157..46fa0485c 100644 --- a/powerpc/spapr_vpa.c +++ b/powerpc/spapr_vpa.c @@ -150,7 +150,8 @@ static void test_vpa(void) report_fail("Could not deregister after registration"); disp_count1 = be32_to_cpu(vpa->vp_dispatch_count); - report(disp_count1 % 2 == 1, "Dispatch count is odd after deregister"); + /* TCG known fail, could be wrong test, must verify against PowerVM */ + report_kfail(true, disp_count1 % 2 == 1, "Dispatch count is odd after deregister"); Using "true" as first argument looks rather pointless - then you could also simply delete the test completely if it can never be tested reliably. Thus could you please introduce a helper function is_tcg() that could be used to check whether we run under TCG (and not KVM)? I think you could check for "linux,kvm" in the "compatible" property in /hypervisor in the device tree to see whether we're running in KVM mode or in TCG mode. This I added in patch 30. The reason for the suboptimal patch ordering was just me being lazy and avoiding rebasing annoyance. I'd written a bunch of failing test cases for QEMU work, but hadn't done the kvm/tcg test yet. It had a few conflicts so I put it at the end... can rebase if you'd really prefer. Ah, ok, no need to rebase then, as long it's there in the end, it's fine. Thanks, Thomas
[PATCH v7] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests
PAPR hypervisor has introduced three new counters in the VPA area of LPAR CPUs for KVM L2 guest (see [1] for terminology) observability - 2 for context switches from host to guest and vice versa, and 1 counter for getting the total time spent inside the KVM guest. Add a tracepoint that enables reading the counters for use by ftrace/perf. Note that this tracepoint is only available for nestedv2 API (i.e, KVM on PowerVM). [1] Terminology: a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor b. L2 refers to the KVM guest booted on top of L1. Acked-by: Naveen N Rao Signed-off-by: Vaibhav Jain Signed-off-by: Gautam Menghani --- v6 -> v7: 1. Use TRACE_EVENT_FN_COND to handle zero counters case. 2. Use for_each_present_cpu() to handle hotplugs. v5 -> v6: 1. Use TRACE_EVENT_FN to enable/disable counters only once. 2. Remove the agg. counters from vcpu->arch. 3. Use PACA to maintain old counter values instead of zeroing on every entry. 4. Simplify variable names v4 -> v5: 1. Define helper functions for getting/setting the accumulation counter in L2's VPA v3 -> v4: 1. After vcpu_run, check the VPA flag instead of checking for tracepoint being enabled for disabling the cs time accumulation. v2 -> v3: 1. Move the counter disabling and zeroing code to a different function. 2. Move the get_lppaca() inside the tracepoint_enabled() branch. 3. Add the aggregation logic to maintain total context switch time. v1 -> v2: 1. Fix the build error due to invalid struct member reference. 
arch/powerpc/include/asm/lppaca.h | 11 +-- arch/powerpc/include/asm/paca.h | 5 +++ arch/powerpc/kvm/book3s_hv.c | 52 +++ arch/powerpc/kvm/trace_hv.h | 29 + 4 files changed, 94 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h index 61ec2447dabf..f40a646bee3c 100644 --- a/arch/powerpc/include/asm/lppaca.h +++ b/arch/powerpc/include/asm/lppaca.h @@ -62,7 +62,8 @@ struct lppaca { u8 donate_dedicated_cpu; /* Donate dedicated CPU cycles */ u8 fpregs_in_use; u8 pmcregs_in_use; - u8 reserved8[28]; + u8 l2_counters_enable; /* Enable usage of counters for KVM guest */ + u8 reserved8[27]; __be64 wait_state_cycles; /* Wait cycles for this proc */ u8 reserved9[28]; __be16 slb_count; /* # of SLBs to maintain */ @@ -92,9 +93,13 @@ struct lppaca { /* cacheline 4-5 */ __be32 page_ins; /* CMO Hint - # page ins by OS */ - u8 reserved12[148]; + u8 reserved12[28]; + volatile __be64 l1_to_l2_cs_tb; + volatile __be64 l2_to_l1_cs_tb; + volatile __be64 l2_runtime_tb; + u8 reserved13[96]; volatile __be64 dtl_idx;/* Dispatch Trace Log head index */ - u8 reserved13[96]; + u8 reserved14[96]; } cacheline_aligned; #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr) diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 1d58da946739..f20ac7a6efa4 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -278,6 +278,11 @@ struct paca_struct { struct mce_info *mce_info; u8 mce_pending_irq_work; #endif /* CONFIG_PPC_BOOK3S_64 */ +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE + u64 l1_to_l2_cs; + u64 l2_to_l1_cs; + u64 l2_runtime_agg; +#endif } cacheline_aligned; extern void copy_mm_to_paca(struct mm_struct *mm); diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8e86eb577eb8..6acfe59c9b4f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4108,6 +4108,54 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu) } } +static 
inline int kvmhv_get_l2_counters_status(void) +{ + return get_lppaca()->l2_counters_enable; +} + +static inline void kvmhv_set_l2_counters_status(int cpu, bool status) +{ + if (status) + lppaca_of(cpu).l2_counters_enable = 1; + else + lppaca_of(cpu).l2_counters_enable = 0; +} + +int kmvhv_counters_tracepoint_regfunc(void) +{ + int cpu; + + for_each_present_cpu(cpu) { + kvmhv_set_l2_counters_status(cpu, true); + } + return 0; +} + +void kmvhv_counters_tracepoint_unregfunc(void) +{ + int cpu; + + for_each_present_cpu(cpu) { + kvmhv_set_l2_counters_status(cpu, false); + } +} + +static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu) +{ + struct lppaca *lp = get_lppaca(); + u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns; + + l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb)); + l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb)); + l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb)); + trace_kvmppc_vcpu_stats(vcpu, l1_to_l2_ns - local_paca->
Re: [PATCH v4 2/2] powerpc/bpf: enable kfunc call
On Thu, May 02, 2024 at 11:02:05PM GMT, Hari Bathini wrote: > Currently, bpf jit code on powerpc assumes all the bpf functions and > helpers to be part of core kernel text. This is false for kfunc case, > as function addresses may not be part of core kernel text area. So, > add support for addresses that are not within core kernel text area > too, to enable kfunc support. Emit instructions based on whether the > function address is within core kernel text address or not, to retain > optimized instruction sequence where possible. > > In case of PCREL, as a bpf function that is not within core kernel > text area is likely to go out of range with relative addressing on > kernel base, use PC relative addressing. If that goes out of range, > load the full address with PPC_LI64(). > > With addresses that are not within core kernel text area supported, > override bpf_jit_supports_kfunc_call() to enable kfunc support. Also, > override bpf_jit_supports_far_kfunc_call() to enable 64-bit pointers, > as an address offset can be more than 32-bit long on PPC64. > > Signed-off-by: Hari Bathini > --- > > * Changes in v4: > - Use either kernelbase or PC for relative addressing. Also, fallback > to PPC_LI64(), if both are out of range. > - Update r2 with kernel TOC for elfv1 too as elfv1 also uses the > optimization sequence, that expects r2 to be kernel TOC, when > function address is within core kernel text. > > * Changes in v3: > - Retained optimized instruction sequence when function address is > a core kernel address as suggested by Naveen. > - Used unoptimized instruction sequence for PCREL addressing to > avoid out of range errors for core kernel function addresses. > - Folded patch that adds support for kfunc calls with patch that > enables/advertises this support as suggested by Naveen. 
> > > arch/powerpc/net/bpf_jit_comp.c | 10 + > arch/powerpc/net/bpf_jit_comp64.c | 61 ++- > 2 files changed, 61 insertions(+), 10 deletions(-) > > diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c > index 0f9a21783329..984655419da5 100644 > --- a/arch/powerpc/net/bpf_jit_comp.c > +++ b/arch/powerpc/net/bpf_jit_comp.c > @@ -359,3 +359,13 @@ void bpf_jit_free(struct bpf_prog *fp) > > bpf_prog_unlock_free(fp); > } > + > +bool bpf_jit_supports_kfunc_call(void) > +{ > + return true; > +} > + > +bool bpf_jit_supports_far_kfunc_call(void) > +{ > + return IS_ENABLED(CONFIG_PPC64); > +} > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 4de08e35e284..8afc14a4a125 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -208,17 +208,13 @@ bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, > struct codegen_context *ctx, > unsigned long func_addr = func ? ppc_function_entry((void *)func) : 0; > long reladdr; > > - if (WARN_ON_ONCE(!core_kernel_text(func_addr))) > + if (WARN_ON_ONCE(!kernel_text_address(func_addr))) > return -EINVAL; > > - if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) { > - reladdr = func_addr - local_paca->kernelbase; > +#ifdef CONFIG_PPC_KERNEL_PCREL Would be good to retain use of IS_ENABLED(). Reviewed-by: Naveen N Rao - Naveen
Re: [PATCH v4 1/2] powerpc64/bpf: fix tail calls for PCREL addressing
On Thu, May 02, 2024 at 11:02:04PM GMT, Hari Bathini wrote: > With PCREL addressing, there is no kernel TOC. So, it is not setup in > prologue when PCREL addressing is used. But the number of instructions > to skip on a tail call was not adjusted accordingly. That resulted in > not so obvious failures while using tailcalls. 'tailcalls' selftest > crashed the system with the below call trace: > > bpf_test_run+0xe8/0x3cc (unreliable) > bpf_prog_test_run_skb+0x348/0x778 > __sys_bpf+0xb04/0x2b00 > sys_bpf+0x28/0x38 > system_call_exception+0x168/0x340 > system_call_vectored_common+0x15c/0x2ec > > Also, as bpf programs are always module addresses and a bpf helper in > general is a core kernel text address, using PC relative addressing > often fails with "out of range of pcrel address" error. Switch to > using kernel base for relative addressing to handle this better. > > Fixes: 7e3a68be42e1 ("powerpc/64: vmlinux support building with PCREL > addresing") > Cc: sta...@vger.kernel.org > Signed-off-by: Hari Bathini > --- > > * Changes in v4: > - Fix out of range errors by switching to kernelbase instead of PC > for relative addressing. > > * Changes in v3: > - New patch to fix tailcall issues with PCREL addressing. > > > arch/powerpc/net/bpf_jit_comp64.c | 30 -- > 1 file changed, 16 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c > b/arch/powerpc/net/bpf_jit_comp64.c > index 79f23974a320..4de08e35e284 100644 > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -202,7 +202,8 @@ void bpf_jit_build_epilogue(u32 *image, struct > codegen_context *ctx) > EMIT(PPC_RAW_BLR()); > } > > -static int bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context > *ctx, u64 func) > +static int > +bpf_jit_emit_func_call_hlp(u32 *image, u32 *fimage, struct codegen_context > *ctx, u64 func) > { > unsigned long func_addr = func ? 
ppc_function_entry((void *)func) : 0; > long reladdr; > @@ -211,19 +212,20 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, > struct codegen_context *ctx, u > return -EINVAL; > > if (IS_ENABLED(CONFIG_PPC_KERNEL_PCREL)) { > - reladdr = func_addr - CTX_NIA(ctx); > + reladdr = func_addr - local_paca->kernelbase; > > if (reladdr >= (long)SZ_8G || reladdr < -(long)SZ_8G) { > - pr_err("eBPF: address of %ps out of range of pcrel > address.\n", > - (void *)func); > + pr_err("eBPF: address of %ps out of range of 34-bit > relative address.\n", > +(void *)func); > return -ERANGE; > } > - /* pla r12,addr */ > - EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(1) | IMM_H18(reladdr)); > - EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | IMM_L(reladdr)); > - EMIT(PPC_RAW_MTCTR(_R12)); > - EMIT(PPC_RAW_BCTR()); > - > + EMIT(PPC_RAW_LD(_R12, _R13, offsetof(struct paca_struct, > kernelbase))); > + /* Align for subsequent prefix instruction */ > + if (!IS_ALIGNED((unsigned long)fimage + CTX_NIA(ctx), 8)) > + EMIT(PPC_RAW_NOP()); We don't need the prefix instruction to be aligned to a doubleword boundary - it just shouldn't cross a 64-byte boundary. Since we know the exact address of the instruction here, we should be able to check for that case. > + /* paddi r12,r12,addr */ > + EMIT(PPC_PREFIX_MLS | __PPC_PRFX_R(0) | IMM_H18(reladdr)); > + EMIT(PPC_INST_PADDI | ___PPC_RT(_R12) | ___PPC_RA(_R12) | > IMM_L(reladdr)); > } else { > reladdr = func_addr - kernel_toc_addr(); > if (reladdr > 0x7FFF || reladdr < -(0x8000L)) { > @@ -233,9 +235,9 @@ static int bpf_jit_emit_func_call_hlp(u32 *image, struct > codegen_context *ctx, u > > EMIT(PPC_RAW_ADDIS(_R12, _R2, PPC_HA(reladdr))); > EMIT(PPC_RAW_ADDI(_R12, _R12, PPC_LO(reladdr))); > - EMIT(PPC_RAW_MTCTR(_R12)); > - EMIT(PPC_RAW_BCTRL()); > } > + EMIT(PPC_RAW_MTCTR(_R12)); > + EMIT(PPC_RAW_BCTRL()); This change shouldn't be necessary since these instructions are moved back into the conditional in the next patch. 
Other than those minor comments: Reviewed-by: Naveen N Rao - Naveen
Re: [PATCH 6/7] powerpc: Replace CONFIG_4xx with CONFIG_44x
On 06/05/2024 at 14:51, Michael Ellerman wrote: > Replace 4xx usage with 44x, and replace 4xx_SOC with 44x. You can go one step further because CONFIG_44x implies CONFIG_BOOKE, see https://patchwork.ozlabs.org/project/linuxppc-dev/patch/d404d1257cd1d36aa6b25237f54eb2c2223ce447.1607519517.git.christophe.le...@csgroup.eu/ > > Signed-off-by: Michael Ellerman > --- > arch/powerpc/Kconfig | 5 + > arch/powerpc/include/asm/cacheflush.h | 2 +- > arch/powerpc/include/asm/ppc_asm.h | 2 +- > arch/powerpc/kernel/entry_32.S | 6 +++--- > arch/powerpc/kernel/process.c | 2 +- > arch/powerpc/mm/fault.c| 4 ++-- > arch/powerpc/mm/ptdump/Makefile| 2 +- > arch/powerpc/platforms/4xx/Makefile| 2 +- > arch/powerpc/platforms/Kconfig.cputype | 8 +--- > arch/powerpc/sysdev/Kconfig| 4 ++-- > 10 files changed, 14 insertions(+), 23 deletions(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 9a7d2b218516..2b6fa87464a5 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -488,7 +488,7 @@ source "kernel/Kconfig.hz" > > config MATH_EMULATION > bool "Math emulation" > - depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT > + depends on 44x || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT > select PPC_FPU_REGS > help > Some PowerPC chips designed for embedded applications do not have > @@ -1102,9 +1102,6 @@ config PPC4xx_CPM > It also enables support for two different idle states (idle-wait > and idle-doze).
> > -config 4xx_SOC > - bool > - > config FSL_LBC > bool "Freescale Local Bus support" > help > diff --git a/arch/powerpc/include/asm/cacheflush.h > b/arch/powerpc/include/asm/cacheflush.h > index ef7d2de33b89..f2656774aaa9 100644 > --- a/arch/powerpc/include/asm/cacheflush.h > +++ b/arch/powerpc/include/asm/cacheflush.h > @@ -121,7 +121,7 @@ static inline void invalidate_dcache_range(unsigned long > start, > mb(); /* sync */ > } > > -#ifdef CONFIG_4xx > +#ifdef CONFIG_44x > static inline void flush_instruction_cache(void) > { > iccci((void *)KERNELBASE); > diff --git a/arch/powerpc/include/asm/ppc_asm.h > b/arch/powerpc/include/asm/ppc_asm.h > index 1d1018c1e482..02897f4b0dbf 100644 > --- a/arch/powerpc/include/asm/ppc_asm.h > +++ b/arch/powerpc/include/asm/ppc_asm.h > @@ -482,7 +482,7 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, > CPU_FTR_CELL_TB_BUG, 96) >* and they must be used. >*/ > > -#if !defined(CONFIG_4xx) && !defined(CONFIG_PPC_8xx) > +#if !defined(CONFIG_44x) && !defined(CONFIG_PPC_8xx) > #define tlbia \ > li r4,1024;\ > mtctr r4; \ > diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S > index 1522164b10e4..98ad926c056f 100644 > --- a/arch/powerpc/kernel/entry_32.S > +++ b/arch/powerpc/kernel/entry_32.S > @@ -211,7 +211,7 @@ start_kernel_thread: > > .globl fast_exception_return > fast_exception_return: > -#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE)) > +#if !(defined(CONFIG_44x) || defined(CONFIG_BOOKE)) > andi. 
r10,r9,MSR_RI /* check for recoverable interrupt */ > beq 3f /* if not, we've got problems */ > #endif > @@ -365,7 +365,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) > rfi > _ASM_NOKPROBE_SYMBOL(interrupt_return) > > -#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) > +#if defined(CONFIG_44x) || defined(CONFIG_BOOKE) > > /* >* Returning from a critical interrupt in user mode doesn't need > @@ -469,4 +469,4 @@ ret_from_mcheck_exc: > RET_FROM_EXC_LEVEL(SPRN_MCSRR0, SPRN_MCSRR1, PPC_RFMCI) > _ASM_NOKPROBE_SYMBOL(ret_from_mcheck_exc) > #endif /* CONFIG_BOOKE */ > -#endif /* !(CONFIG_4xx || CONFIG_BOOKE) */ > +#endif /* !(CONFIG_44x || CONFIG_BOOKE) */ > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index 196cfa41ad6e..cddb4c099bbd 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -1566,7 +1566,7 @@ static void __show_regs(struct pt_regs *regs) > if (trap == INTERRUPT_MACHINE_CHECK || > trap == INTERRUPT_DATA_STORAGE || > trap == INTERRUPT_ALIGNMENT) { > - if (IS_ENABLED(CONFIG_4xx) || IS_ENABLED(CONFIG_BOOKE)) > + if (IS_ENABLED(CONFIG_44x) || IS_ENABLED(CONFIG_BOOKE)) > pr_cont("DEAR: "REG" ESR: "REG" ", regs->dear, > regs->esr); > else > pr_cont("DAR: "REG" DSISR: %08lx ", regs->dar, > regs->dsisr); > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 53335ae21a40..9af44ddf4b53 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -361,13 +361,13 @@ static void sanity_che
Re: [PATCH V2 8/9] tools/perf: Add support to find global register variables using find_data_type_global_reg
On 06/05/2024 at 14:19, Athira Rajeev wrote: > There are cases where a global register variable is defined and associated > with a specified register. For example, in powerpc, two registers are > defined to represent variables: > 1. r13: represents local_paca > register struct paca_struct *local_paca asm("r13"); > > 2. r1: represents stack_pointer > register void *__stack_pointer asm("r1"); What about r2: register struct task_struct *current asm ("r2"); > > These regs are present in dwarf debug as DW_OP_reg as part of variables > in the cu_die (compile unit). These are not present in the die search done > in the list of nested scopes since these are global register variables. >
Re: [PATCH V2 7/9] tools/perf: Update instruction tracking with add instruction
On 06/05/2024 at 14:19, Athira Rajeev wrote: > Update instruction tracking with the add instruction. Apart from the "mr" > instruction, the register state is carried on by other insns, i.e., > "add, addi, addis". Since these are not memory instructions and don't > fall in the range of (32 to 63), add these as part of the mnemonic table. > For now, add* instructions are added. There is a possibility of getting > more added here. But to extract regs, still the binary code will be > used. So associate this with "load_store_ops" itself and no other > changes are required. Looks fragile. How do you handle addc, adde, addic ? And also addme, addze, which only have two operands instead of 3 ? > > Signed-off-by: Athira Rajeev > --- > .../perf/arch/powerpc/annotate/instructions.c | 21 +++ > tools/perf/util/disasm.c | 1 + > 2 files changed, 22 insertions(+) > > diff --git a/tools/perf/arch/powerpc/annotate/instructions.c > b/tools/perf/arch/powerpc/annotate/instructions.c > index cce7023951fe..1f35d8a65bb4 100644 > --- a/tools/perf/arch/powerpc/annotate/instructions.c > +++ b/tools/perf/arch/powerpc/annotate/instructions.c > @@ -1,6 +1,17 @@ > // SPDX-License-Identifier: GPL-2.0 > #include > > +/* > + * powerpc instruction nmemonic table to associate load/store instructions > with > + * move_ops. mov_ops is used to identify add/mr to do instruction tracking.
> + */ > +static struct ins powerpc__instructions[] = { > + { .name = "mr", .ops = &load_store_ops, }, > + { .name = "addi", .ops = &load_store_ops, }, > + { .name = "addis", .ops = &load_store_ops, }, > + { .name = "add",.ops = &load_store_ops, }, > +}; > + > static struct ins_ops *powerpc__associate_instruction_ops(struct arch > *arch, const char *name) > { > int i; > @@ -75,6 +86,9 @@ static void update_insn_state_powerpc(struct type_state > *state, > if (annotate_get_insn_location(dloc->arch, dl, &loc) < 0) > return; > > + if (!strncmp(dl->ins.name, "add", 3)) > + goto regs_check; > + > if (strncmp(dl->ins.name, "mr", 2)) > return; > > @@ -85,6 +99,7 @@ static void update_insn_state_powerpc(struct type_state > *state, > dst->reg1 = src_reg; > } > > +regs_check: > if (!has_reg_type(state, dst->reg1)) > return; > > @@ -115,6 +130,12 @@ static void update_insn_state_powerpc(struct type_state > *state __maybe_unused, s > static int powerpc__annotate_init(struct arch *arch, char *cpuid > __maybe_unused) > { > if (!arch->initialized) { > + arch->nr_instructions = ARRAY_SIZE(powerpc__instructions); > + arch->instructions = calloc(arch->nr_instructions, > sizeof(struct ins)); > + if (!arch->instructions) > + return -ENOMEM; > + memcpy(arch->instructions, powerpc__instructions, sizeof(struct > ins) * arch->nr_instructions); > + arch->nr_instructions_allocated = arch->nr_instructions; > arch->initialized = true; > arch->associate_instruction_ops = > powerpc__associate_instruction_ops; > arch->objdump.comment_char = '#'; > diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c > index ac6b8b8da38a..32cf506a9010 100644 > --- a/tools/perf/util/disasm.c > +++ b/tools/perf/util/disasm.c > @@ -36,6 +36,7 @@ static struct ins_ops mov_ops; > static struct ins_ops nop_ops; > static struct ins_ops lock_ops; > static struct ins_ops ret_ops; > +static struct ins_ops load_store_ops; > > static int jump__scnprintf(struct ins *ins, char *bf, size_t size, > struct ins_operands 
*ops, int max_ins_name);
[PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files
Add a new .note section containing type, size, offset and flags of every xfeature that is present. This information will be used by the debuggers to understand the XSAVE layout of the machine where the core file is dumped, and to read XSAVE registers, especially during cross-platform debugging. Some background: The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory Protection Keys and the AVX-512 features have been incorporated into the AMD CPUs. This is because AMD never adopted (and hence never left room in the XSAVE layout for) the Intel MPX feature. Tools like GDB had assumed a fixed XSAVE layout matching that of Intel (based on the XCR0 mask). Hence, the core dumps from AMD CPUs didn't match the size expected from the XCR0 mask. This resulted in GDB and other tools not being able to access the values of the AVX-512 and PKRU registers on AMD CPUs. To solve this, an interim solution has been accepted into GDB, and is already a part of GDB 14, thanks to this series of patches [ https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html ]. But this patch series depends on heuristics based on the total XSAVE register set size and the XCR0 mask to infer the layouts of the various register blocks for core dumps, and hence, is not a foolproof mechanism to determine the layout of the XSAVE area. Hence this new core dump note has been proposed as a sturdier mechanism to allow GDB/LLDB and other relevant tools to determine the layout of the XSAVE area of the machine where the corefile was dumped. The new core dump note (which is being proposed as a per-process .note section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures.
Each structure describes an individual extended feature containing offset, size and flags (that is obtained through CPUID instruction) in a format roughly matching the follow C structure: struct xfeat_component { u32 xfeat_type; u32 xfeat_sz; u32 xfeat_off; u32 xfeat_flags; }; Co-developed-by: Jini Susan George Signed-off-by: Jini Susan George Signed-off-by: Vignesh Balasubramanian --- v1->v2: Removed kernel internal defn dependency, code improvements arch/x86/Kconfig | 1 + arch/x86/include/asm/elf.h | 34 + arch/x86/kernel/fpu/xstate.c | 141 +++ fs/binfmt_elf.c | 4 +- include/uapi/linux/elf.h | 1 + 5 files changed, 179 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 928820e61cb5..cc67daab3396 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -105,6 +105,7 @@ config X86 select ARCH_HAS_DEBUG_WX select ARCH_HAS_ZONE_DMA_SET if EXPERT select ARCH_HAVE_NMI_SAFE_CMPXCHG + select ARCH_HAVE_EXTRA_ELF_NOTES select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 1fb83d47711f..5952574db64b 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -13,6 +13,40 @@ #include #include +struct xfeat_component { + u32 xfeat_type; + u32 xfeat_sz; + u32 xfeat_off; + u32 xfeat_flags; +} __packed; + +_Static_assert(sizeof(struct xfeat_component)%4 == 0, "xfeat_component is not aligned"); + +enum custom_feature { + FEATURE_XSAVE_FP = 0, + FEATURE_XSAVE_SSE = 1, + FEATURE_XSAVE_YMM = 2, + FEATURE_XSAVE_BNDREGS = 3, + FEATURE_XSAVE_BNDCSR = 4, + FEATURE_XSAVE_OPMASK = 5, + FEATURE_XSAVE_ZMM_Hi256 = 6, + FEATURE_XSAVE_Hi16_ZMM = 7, + FEATURE_XSAVE_PT = 8, + FEATURE_XSAVE_PKRU = 9, + FEATURE_XSAVE_PASID = 10, + FEATURE_XSAVE_CET_USER = 11, + FEATURE_XSAVE_CET_SHADOW_STACK = 12, + FEATURE_XSAVE_HDC = 13, + FEATURE_XSAVE_UINTR = 14, + FEATURE_XSAVE_LBR = 15, + FEATURE_XSAVE_HWP = 16, + 
FEATURE_XSAVE_XTILE_CFG = 17, + FEATURE_XSAVE_XTILE_DATA = 18, + FEATURE_MAX, + FEATURE_XSAVE_EXTENDED_START = FEATURE_XSAVE_YMM, + FEATURE_XSAVE_EXTENDED_END = FEATURE_XSAVE_XTILE_DATA, +}; + typedef unsigned long elf_greg_t; #define ELF_NGREG (sizeof(struct user_regs_struct) / sizeof(elf_greg_t)) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 33a214b1a4ce..3d1c3c96e34d 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -87,6 +88,8 @@ static unsigned int xstate_flags[XFEATURE_MAX] __ro_after_init; #define XSTATE_FLAG_SUPERVISOR BIT(0) #define XSTATE_FLAG_ALIGNED64 BIT(1) +static const char owner_name[] = "LINUX"; + /* * Return whether the system supports a given xfeature. * @@ -1837,3 +1840,141 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns, return 0; } #endif /* CONFIG
[PATCH v2 0/1] Add XSAVE layout description to Core files for debuggers to support varying XSAVE layouts
This patch proposes to add an extra .note section in the corefile to dump the CPUID information of a machine. This is being done to solve the issue of tools like the debuggers having to deal with coredumps from machines with varying XSAVE layouts in spite of having the same XCR0 bits. The new proposed .note section, at this point, consists of an array of records containing the information of each extended feature that is present. This provides details about the offsets and the sizes of the various extended save state components of the machine where the application crash occurred. Requesting a review for this patch. Please NOTE that this patch has to be applied on top of the patch (https://lore.kernel.org/lkml/874jbt7qz3@oldenburg3.str.redhat.com/T/). Vignesh Balasubramanian (1): x86/elf: Add a new .note section containing Xfeatures information to x86 core files arch/x86/Kconfig | 1 + arch/x86/include/asm/elf.h | 34 + arch/x86/kernel/fpu/xstate.c | 141 +++ fs/binfmt_elf.c | 4 +- include/uapi/linux/elf.h | 1 + 5 files changed, 179 insertions(+), 2 deletions(-) -- 2.34.1
Re: [PATCH V2 6/9] tools/perf: Update instruction tracking for powerpc
On 06/05/2024 at 14:19, Athira Rajeev wrote: > Add instruction tracking function "update_insn_state_powerpc" for > powerpc. Example sequence in powerpc: > > ld r10,264(r3) > mr r31,r3 > < > ld r9,312(r31) Your approach looks fragile. mr is a simplified instruction which in fact is "or r31, r3, r3" By the way, the compiler sometimes does it differently, like below: lwz r10,264(r3) addi r31, r3, 312 lwz r9, 0(r31) And what about sequences with lwzu ? > > Consider the sample is pointing to: "ld r9,312(r31)". > Here the memory reference is hit at "312(r31)" where 312 is the offset > and r31 is the source register. The previous instruction sequence shows that > the register state of r3 is moved to r31. So to identify the data type for r31 > access, the previous instruction ("mr") needs to be tracked and the > state type entry has to be updated. Current instruction tracking support > in the perf tools infrastructure is specific to x86. The patch adds this for > powerpc and adds the "mr" instruction to be tracked. > > Signed-off-by: Athira Rajeev
Re: [PATCH V2 5/9] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc
On 06/05/2024 at 14:19, Athira Rajeev wrote: > Use the raw instruction code and macros to identify memory instructions, > extract register fields and also the offset. The implementation addresses > the D-form, X-form, DS-form instructions. Two main functions are added. > A new parse function "load_store__parse" acts as the instruction ops parser for > memory instructions. Unlike other parsers (like mov__parse), this parser > fills in only the "raw" field for source/target and the newly added "mem_ref" > field. This is because here there is no need to parse the disassembled > code, and arch-specific macros will take care of extracting the offset and > regs, which is easier and will be precise. > > In powerpc, all instructions with a primary opcode from 32 to 63 > are memory instructions. Update the "ins__find" function to take "opcode" > as a parameter too. Don't use "extract_reg_offset"; instead use the > newly added function "get_arch_regs", which will set these fields: reg1, > reg2, offset, depending on whether it is a source or target op. Yes all instructions with a primary opcode from 32 to 63 are memory instructions, but not all memory instructions have opcode 32 to 63.
> > Signed-off-by: Athira Rajeev > --- > tools/perf/arch/powerpc/util/dwarf-regs.c | 33 + > tools/perf/util/annotate.c| 22 - > tools/perf/util/disasm.c | 59 +-- > tools/perf/util/disasm.h | 4 +- > tools/perf/util/include/dwarf-regs.h | 4 +- > 5 files changed, 114 insertions(+), 8 deletions(-) > > diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c > b/tools/perf/arch/powerpc/util/dwarf-regs.c > index e60a71fd846e..3121c70dc0d3 100644 > --- a/tools/perf/arch/powerpc/util/dwarf-regs.c > +++ b/tools/perf/arch/powerpc/util/dwarf-regs.c > @@ -102,6 +102,9 @@ int regs_query_register_offset(const char *name) > #define PPC_OP(op) (((op) >> 26) & 0x3F) > #define PPC_RA(a) (((a) >> 16) & 0x1f) > #define PPC_RT(t) (((t) >> 21) & 0x1f) > +#define PPC_RB(b)(((b) >> 11) & 0x1f) > +#define PPC_D(D) ((D) & 0xfffe) > +#define PPC_DS(DS) ((DS) & 0xfffc) > > int get_opcode_insn(unsigned int raw_insn) > { > @@ -117,3 +120,33 @@ int get_target_reg(unsigned int raw_insn) > { > return PPC_RT(raw_insn); > } > + > +int get_offset_opcode(int raw_insn __maybe_unused) > +{ > + int opcode = PPC_OP(raw_insn); > + > + /* DS- form */ > + if ((opcode == 58) || (opcode == 62)) Can you re-use macros from arch/powerpc/include/asm/ppc-opcode.h ? #define OP_LD 58 #define OP_STD 62 > + return PPC_DS(raw_insn); > + else > + return PPC_D(raw_insn); > +} > + > +/* > + * Fills the required fields for op_loc depending on if it > + * is a source of target. 
> + * D form: ins RT,D(RA) -> src_reg1 = RA, offset = D, dst_reg1 = RT > + * DS form: ins RT,DS(RA) -> src_reg1 = RA, offset = DS, dst_reg1 = RT > + * X form: ins RT,RA,RB -> src_reg1 = RA, src_reg2 = RB, dst_reg1 = RT > + */ > +void get_arch_regs(int raw_insn __maybe_unused, int is_source > __maybe_unused, struct annotated_op_loc *op_loc __maybe_unused) > +{ > + if (is_source) > + op_loc->reg1 = get_source_reg(raw_insn); > + else > + op_loc->reg1 = get_target_reg(raw_insn); > + if (op_loc->multi_regs) > + op_loc->reg2 = PPC_RB(raw_insn); > + if (op_loc->mem_ref) > + op_loc->offset = get_offset_opcode(raw_insn); > +} > diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c > index 85692f73e78f..f41a0fadeab4 100644 > --- a/tools/perf/util/disasm.c > +++ b/tools/perf/util/disasm.c > @@ -760,11 +800,20 @@ static void ins__sort(struct arch *arch) > qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp); > } > > -static struct ins_ops *__ins__find(struct arch *arch, const char *name) > +static struct ins_ops *__ins__find(struct arch *arch, const char *name, int > opcode) > { > struct ins *ins; > const int nmemb = arch->nr_instructions; > > + if (arch__is(arch, "powerpc")) { > + /* > + * Instructions with a primary opcode from 32 to 63 > + * are memory instructions in powerpc. > + */ > + if ((opcode >= 31) && (opcode <= 63)) Could just be if ((opcode & 0x20)) I guess. By the way your test is wrong because opcode 31 is not only memory instructions, see example below (not exhaustive): #define OP_31_XOP_TRAP 4 ==> No #define OP_31_XOP_LDX 21 ==> Yes #define OP_31_XOP_LWZX 23 ==> Yes #define OP_31_XOP_LDUX 53 ==> Yes #define OP_31_XOP_DCBST 54 ==> No #define OP_31_XOP_LWZUX 55 ==> Yes #define OP_31_XOP_TRAP_64 68 ==> No #define OP_31_XOP_DCBF 86 ==> No #define OP_31_XOP_LBZX 87 ==> Yes #define OP_31_XOP_STDX 149 ==> Yes #define O
Re: [PATCH V2 4/9] tools/perf: Add support to capture and parse raw instruction in objdump
On 06/05/2024 at 14:19, Athira Rajeev wrote: > Add support to capture and parse the raw instruction in objdump. What's the purpose of using 'objdump' for reading raw instructions ? Can't they be read directly without invoking 'objdump' ? It looks odd to me to use objdump to provide readable text and then parse it back. > Currently, the perf tool infrastructure uses the "--no-show-raw-insn" option > with "objdump" while disassembling. An example from powerpc with this option > for an instruction address is: Yes and that makes sense because the purpose of objdump is to provide human readable annotations, not to perform automated analysis. Am I missing something ? > > Snippet from: > objdump --start-address= --stop-address= -d > --no-show-raw-insn -C > > c10224b4: lwz r10,0(r9) > > This line "lwz r10,0(r9)" is parsed to extract the instruction name, > register names and offset. Also, to find whether there is a memory > reference in the operands, the "memory_ref_char" field of objdump is used. > For x86, "(" is used as memory_ref_char to tackle instructions of the > form "mov (%rax), %rcx". > > In the case of powerpc, instructions using "(" are not the only memory > instructions. For example, the above instruction can also be of extended form (X > form): "lwzx r10,0,r19". In order to easily identify the instruction category > and extract the source/target registers, the patch adds support to use the raw > instruction. With the raw instruction, macros are added to extract the opcode > and register fields. > > "struct ins_operands" and "struct ins" are updated to carry the opcode and > raw instruction binary code (raw_insn). The function "disasm_line__parse" > is updated to fill the raw instruction hex value and opcode in the newly > added fields. There are no changes in the existing code paths, which parse > the disassembled code. The architecture using the instruction name and > the present approach is not altered. Since this approach targets powerpc, > the macro implementation is added for powerpc as of now.
> > Example: > representation using --show-raw-insn in objdump gives result: > > 38 01 81 e8 ld r4,312(r1) > > Here "38 01 81 e8" is the raw instruction representation. In powerpc, > this translates to instruction form: "ld RT,DS(RA)" and binary code > as:
> >  _____________________________________
> > | 58 |  RT  |  RA  |       DS     |  |
> >  -------------------------------------
> >  0    6     11     16             30 31
> > Function "disasm_line__parse" is updated to capture: > > line:38 01 81 e8 ld r4,312(r1) > opcode and raw instruction "38 01 81 e8" > Raw instruction is used later to extract the reg/offset fields. > > Signed-off-by: Athira Rajeev > ---
Re: [PATCH v1 1/9] selftests/powerpc/dexcr: Add -no-pie to hashchk tests
On Wed, 2024-04-17 at 21:23 +1000, Benjamin Gray wrote: > The hashchk tests want to verify that the hash key is changed over > exec. > It does so by calculating hashes at the same address across an exec. > This is made simpler by disabling PIE functionality, so we can > re-execute ourselves and be using the same addresses in the child. > > While -fno-pie is already added, -no-pie is also required. > > Fixes: ca64da7574f8 ("selftests/powerpc/dexcr: Add hashst/hashchk > test") > Signed-off-by: Benjamin Gray This matches the gcc documentation. Reviewed-by: Andrew Donnellan Tested-by: Andrew Donnellan > > --- > > This is not related to features introduced in this series, just fixes > the test added in the static DEXCR series. > --- > tools/testing/selftests/powerpc/dexcr/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile > b/tools/testing/selftests/powerpc/dexcr/Makefile > index 76210f2bcec3..829ad075b4a4 100644 > --- a/tools/testing/selftests/powerpc/dexcr/Makefile > +++ b/tools/testing/selftests/powerpc/dexcr/Makefile > @@ -3,7 +3,7 @@ TEST_GEN_FILES := lsdexcr > > include ../../lib.mk > > -$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie $(call cc-option,-mno-rop-protect) > +$(OUTPUT)/hashchk_test: CFLAGS += -fno-pie -no-pie $(call cc-option,-mno-rop-protect) > > $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c > $(TEST_GEN_FILES): ../utils.c ./dexcr.c -- Andrew Donnellan, OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited